Powerful algorithms used by Netflix, Amazon, and Facebook can ‘predict’ the biological language of cancer and neurodegenerative diseases such as Alzheimer’s, researchers have found.

The vast data produced over decades of research was fed into a computer language model to determine if artificial intelligence can make more advanced discoveries than humans.

Researchers at St John’s College, University of Cambridge, found that machine learning techniques could interpret the ‘biological language’ of cancer, Alzheimer’s disease and other neurodegenerative diseases.

Their pioneering research has been published in the scientific journal PNAS today (April 8, 2021), and could be used in the future to “correct grammatical errors within cells that cause disease”.

Professor Tuomas Knowles, lead author of the paper and St John’s College Fellow, said: “Bringing machine learning technology to research in neurodegenerative diseases and cancer is an absolute game-changer. Ultimately, the goal is to use artificial intelligence to develop targeted drugs to dramatically alleviate symptoms or prevent dementia.”

Every time Netflix recommends a series to watch or Facebook suggests someone to befriend, the platforms are using powerful machine learning algorithms to make highly educated guesses about what people will do next. Voice assistants like Alexa and Siri can even identify individual people and “talk” back to you immediately.

Dr. Kadi Liis Saar, first author of the paper and a researcher at St John’s College, used similar machine learning technology to train a large-scale language model to look at what happens when something goes wrong with proteins inside the body and causes disease.

She said: “There are thousands and thousands of proteins in the human body, and scientists do not yet know the function of many of them. We asked a neural network-based language model to learn the language of proteins.

“We specifically asked the program to learn the language of shape-shifting biomolecular condensates – protein droplets found in cells – which researchers really need to understand in order to crack the language of biological function and dysfunction that cause cancer and neurodegenerative diseases such as Alzheimer’s disease. We found it could learn, without being told explicitly, what scientists have already discovered about the language of proteins over decades of research.”

Proteins are large, complex molecules that play many critical roles in the body. They do most of the work in cells and are needed for the structure, function and regulation of body tissues and organs – for example, antibodies are proteins that act to protect the body.

Alzheimer’s, Parkinson’s, and Huntington’s diseases are three of the most common neurodegenerative diseases, but researchers estimate there are several hundred in total.

In Alzheimer’s disease, which affects 50 million people worldwide, proteins go rogue, form lumps, and kill healthy nerve cells. Healthy brains have a quality control system that effectively disposes of these potentially dangerous protein masses, known as aggregates.

Researchers now think that some disrupted proteins also form liquid protein droplets, called condensates, that have no membrane and fuse freely with each other. Unlike protein aggregates, which are irreversible, protein condensates can form and re-form, and are often compared to the blobs of shape-shifting wax in lava lamps.

Professor Knowles said: “Protein condensates have recently attracted a lot of attention in the scientific world because they control key cellular events such as gene expression – how DNA is converted into proteins – and protein synthesis – how cells produce proteins.

“Any defects associated with these protein droplets can lead to cancer. Therefore, bringing natural language processing technology into research into the molecular origins of protein dysfunction is vital if we are to be able to correct the grammatical errors within cells that cause disease.”

Dr. Saar said: “We gave the algorithm all the known proteins so that it could learn and predict the language of proteins, in the same way that these models learn human language and that WhatsApp knows how to suggest words for you to use.

“Then we were able to ask it about the particular grammar that leads only some proteins to form condensates inside cells. It’s a very challenging problem, and unlocking it helps us learn the rules of the language of disease.”

Machine learning technology is evolving rapidly due to the increasing availability of data, increased computing power, and technical advances that have created more efficient algorithms.

Further use of machine learning could transform future research on cancer and neurodegenerative diseases. Discoveries could be made beyond what scientists currently know and speculate about diseases, and possibly even beyond what the human brain can understand without the help of machine learning.

Dr. Saar explained: “Machine learning can be free from the limitations of what scientists think are the targets of scientific research, and that means finding new connections that we haven’t even thought about yet. It’s really, really exciting.”

The network the team developed has now been made freely available to researchers around the world so that more scientists can build on it.


