The cell is a biochemical factory of immense complexity. As fundamental units of life, cells harvest energy from the environment and use it to synthesize complex molecular machinery, to build replicas of themselves and to move. This is a 4-billion-year-old trick that even today mystifies biologists who struggle to understand the processes and principles at work.
To change this, life scientists have modeled cells at various levels of detail. These models can simulate some important biomolecular processes in the synthesis of proteins, such as transcription and translation. The most advanced models can even predict some of the large-scale characteristics of a bacterial cell, its phenotype, given the organism’s genetic code.
But these models merely scratch the surface in capturing the full scale and complexity of cellular machinery. Perhaps the biggest challenge is modeling the huge range of biological processes and pathways at scales ranging from the atomic and molecular to the cellular and tissue level. This activity is highly non-linear, so a small change in initial conditions can be inconsequential or can lead to huge differences in outcome over timescales ranging from picoseconds to hours or days.
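To get a feel for what this kind of non-linearity means, here is a minimal Python sketch. It does not model a cell. It uses the logistic map, a textbook non-linear system chosen here purely as a stand-in, and the function name `max_divergence`, the starting value and the parameter values are all illustrative assumptions. The sketch nudges the starting point by one part in a billion and tracks how far the two runs drift apart.

```python
def logistic_step(x: float, r: float) -> float:
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def max_divergence(r: float, x0: float = 0.2, eps: float = 1e-9, steps: int = 60) -> float:
    """Largest gap seen between two runs whose starting points differ by eps.

    The logistic map is only a toy stand-in for a non-linear system;
    it is not a model of any cellular process.
    """
    a, b = x0, x0 + eps
    largest = abs(a - b)
    for _ in range(steps):
        a, b = logistic_step(a, r), logistic_step(b, r)
        largest = max(largest, abs(a - b))
    return largest


if __name__ == "__main__":
    # r = 2.5: the map settles to a fixed point, so the nudge stays tiny.
    print("stable regime :", max_divergence(2.5))
    # r = 4.0: the map is chaotic, so the same nudge grows enormously.
    print("chaotic regime:", max_divergence(4.0))
```

In the first regime the perturbation stays negligible, while in the second the same perturbation grows until the two runs bear no resemblance to each other. A realistic cell model has to cope with both behaviors at once, across many coupled processes.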
Step Change
Of course, improvements in processing power and algorithmic sophistication are gradually making the models better. But what computer scientists and life scientists would like is a step change that can leapfrog these improvements.
Just such a step is now in sight, say Charlotte Bunne at Stanford University in Palo Alto and colleagues. The group says artificial intelligence has the potential to model cells at previously unheard-of resolution and on a vast range of scales.
They are proposing to build an AI virtual cell that can simulate the behavior of real cells so accurately that it will be able to predict the cellular response to a wide range of stimuli, identify potential drug targets and even evaluate virtual versions of these drugs. With this AI approach, they say, “a comprehensive predictive understanding of cell mechanisms and interactions is within reach.”
A key ingredient in the success of this novel approach is the availability of training data. That’s one area in which bioinformaticians have flourished. “The scale of raw biological data is undeniable,” say Bunne and co. They point to the Sequence Read Archive, a public repository for DNA sequencing data that currently holds over 14 petabytes of information, one thousand times more than was used to train ChatGPT.
The difficulty, of course, is in choosing the training data wisely from this and other sources. Much of the data in these databases is likely to be redundant or of limited value for training purposes. Nor is it likely to be diverse enough to capture the full range of cellular behavior.
That’s because these databases are heavily biased towards organisms favored in laboratory experiments, such as Escherichia coli, mice and humans. That will inevitably create species bias in any AI model.
Bunne and co are clear that much more data will be needed from a wide range of sources, including DNA, RNA and protein sequences, the spatial locations of transcriptome and proteome activity, and tissue structure, to name just a few.
Innovative Insight
Another important ingredient will be the structure of the AI model itself. Bunne and co propose three interacting levels that will simulate the cell at the molecular, cellular and multicellular levels. Each level can be interrogated by virtual instruments that produce an output designed for human insight or as input for another virtual instrument. In this way, computer scientists can design experiments to assess the behavior of cells over the full range of scales. In essence, this will be a virtual laboratory for cellular science.
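The layered idea can be made concrete with a small Python sketch. This is not the architecture Bunne and co describe, which would rest on learned AI models. It is a toy under stated assumptions: the class names `VirtualInstrument` and `ToyVirtualCell`, the state keys and every number below are hypothetical placeholders. It shows only the plumbing: three levels, instruments that each interrogate one level, and the output of one instrument feeding the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LEVELS = ("molecular", "cellular", "multicellular")

# A probe takes (state of one level, reading from the previous instrument)
# and returns a new reading.
Probe = Callable[[Dict[str, float], float], float]


@dataclass
class VirtualInstrument:
    """Hypothetical instrument that interrogates a single level."""
    name: str
    level: str
    probe: Probe


class ToyVirtualCell:
    """Toy container for per-level state; all values are placeholders."""

    def __init__(self, state: Dict[str, Dict[str, float]]) -> None:
        missing = set(LEVELS) - set(state)
        if missing:
            raise ValueError(f"missing levels: {sorted(missing)}")
        self.state = state

    def read(self, instrument: VirtualInstrument, upstream: float = 0.0) -> float:
        """Run one instrument against the level it targets."""
        return instrument.probe(self.state[instrument.level], upstream)

    def chain(self, instruments: List[VirtualInstrument]) -> float:
        """Feed each instrument's output into the next one in the list."""
        reading = 0.0
        for instrument in instruments:
            reading = self.read(instrument, reading)
        return reading


# Made-up state, for illustration only.
cell = ToyVirtualCell({
    "molecular": {"mrna_copies": 40.0},
    "cellular": {"translation_rate": 0.5},
    "multicellular": {"cell_count": 100.0},
})

expression = VirtualInstrument("expression", "molecular", lambda s, _: s["mrna_copies"])
protein = VirtualInstrument("protein", "cellular", lambda s, up: up * s["translation_rate"])
tissue = VirtualInstrument("tissue", "multicellular", lambda s, up: up * s["cell_count"])

if __name__ == "__main__":
    print(cell.read(expression))                    # a reading meant for a human
    print(cell.chain([expression, protein, tissue]))  # a reading passed up the levels
```

Separating the instruments from the state they interrogate is the design choice worth noticing: the same simulated cell can be measured in many ways, and new "experiments" amount to chaining instruments in a new order.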
For the moment, the AI virtual cell is little more than a twinkle in the eyes of a group of researchers, albeit an influential one, who have begun to flesh out their plans. The ambition is so great that this work will not be possible through the efforts of one or two research groups. Instead, the coming months and years will need to see significant collaboration between groups in academia, government and industry.
That’s usually a challenging task to coordinate. But in this case, the stakes are high enough to provide incentives for all. “The AI Virtual Cell has the potential to revolutionize the scientific process, leading to future breakthroughs in biomedical research, personalized medicine, drug discovery, cell engineering and programmable biology,” say Bunne and co. We’ll be watching to see where this project goes next – and who takes part.
Ref: How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities: arxiv.org/abs/2409.11654
Source: Discover Magazine