Machine Learning to Build Artificial Proteins

Proteins are essential to the life of a cell: they perform complex tasks and promote chemical reactions. Scientists and engineers have long tried to harness this power by designing artificial proteins that can perform new tasks, such as treating disease, capturing carbon, or harvesting energy, but many of the processes used to produce these proteins are slow, complex, and prone to failure.

In a breakthrough that could have implications across the healthcare, agriculture, and energy sectors, a team led by researchers at the Pritzker School of Molecular Engineering (PME) at the University of Chicago has developed an artificial intelligence-led process that uses big data to design new proteins.

With the help of a machine learning model that evaluates protein information collected from a genomic database, the researchers found fairly simple design rules for creating artificial proteins. When the research team made this artificial protein in the lab, they found it very similar to naturally occurring proteins.

“We have all wondered how a simple process like evolution can lead to such a high-performance material as a protein,” said Rama Ranganathan, professor of biochemistry and molecular biology at the University of Chicago. “We found that genome data contains enormous amounts of information about the basic rules of protein structure and function, and now we’ve been able to bottle nature’s rules to create proteins ourselves.”

AI: the next thrust towards learning protein design

Proteins are made up of hundreds to thousands of amino acids, and the sequence of those amino acids defines a protein’s structure and function. Working out how to build these sequences to create novel proteins, however, has always been a challenge.

Although previous theory and research have produced methods for specifying a protein’s structure, how to specify its function has remained obscure.

What Ranganathan and his team have recognized over the past 15 years is that the exponentially growing genome databases contain huge amounts of information, ranging from the basic principles of protein structure to the rules governing protein function. The team developed statistical models based on this data and then began using machine-learning methods to reveal new information about proteins’ basic design rules.
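
To give a feel for what a statistical model built from sequence data can look like, here is a minimal Python sketch. It is not the team's method, which the article does not describe in detail. It is a much simpler stand-in that assumes each position in the protein can be treated independently. The function names and the toy alignment are hypothetical, invented purely for illustration.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_frequencies(alignment, pseudocount=1.0):
    """Estimate per-position amino-acid frequencies from aligned sequences.

    Assumption: every position is modeled independently of the others,
    which is a deliberate simplification for illustration.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for pos in range(length):
        # Count only standard amino acids (gap characters etc. are ignored).
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def sample_sequence(profile, rng):
    """Draw one new sequence, position by position, from the learned frequencies."""
    return "".join(
        rng.choices(AMINO_ACIDS, weights=[col[aa] for aa in AMINO_ACIDS])[0]
        for col in profile
    )


# Hypothetical toy alignment, made up for this example (not real protein data).
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVIA"]

profile = column_frequencies(toy_alignment)
designed = sample_sequence(profile, random.Random(0))
print(designed)
```

The idea is that positions where natural sequences agree (the first position here is always "M") get sharply peaked frequencies, while variable positions leave room for new choices. A model of real proteins would need far more data and would likely have to capture more than single-position statistics.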

A platform to decode other complex systems

Beyond the applications mentioned above, the platform is also proposed as a way to clarify the workings of other complex systems, ranging from deep neural networks to the physics of lipid structures. It can provide a robust basis for systematically engineering protein molecules, an approach that until now has existed only in theory. It may further help researchers find solutions to problems such as carbon capture and energy harvesting.