Evo: Genome-Level Language Model
This is a news story, published by Quanta Magazine, about a genomic large language model trained on DNA.
The Poetry Fan Who Taught an LLM to Read and Write DNA | Quanta Magazine

Brian Hie's genomic large language model (LLM), Evo, was trained on large volumes of DNA.
The model picks up patterns that humans can’t see in DNA.
It uses those patterns to predict how changes to DNA affect the function of its downstream products, RNA and proteins.
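The idea of scoring a DNA change with a language model can be illustrated with a deliberately tiny stand-in. The sketch below is not Evo's architecture: it substitutes a simple bigram (Markov) model for the real transformer, and the function names (`train_bigram`, `variant_effect`) are hypothetical. It only shows the principle that a mutation can be scored by how much it lowers the model's likelihood of the sequence.

```python
import math
from collections import defaultdict

DNA = "ACGT"

def train_bigram(sequences):
    """Count nucleotide bigrams with add-one smoothing.
    (A toy stand-in for training a large model on genomes.)"""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a in DNA:
        total = sum(counts[a].values()) + len(DNA)
        probs[a] = {b: (counts[a][b] + 1) / total for b in DNA}
    return probs

def log_likelihood(seq, probs):
    """Sum of log P(next base | previous base) over the sequence."""
    return sum(math.log(probs[a][b]) for a, b in zip(seq, seq[1:]))

def variant_effect(ref, alt, probs):
    """Score a mutation as the log-likelihood ratio of the altered
    sequence versus the reference. Negative values mean the model
    finds the change less 'natural'."""
    return log_likelihood(alt, probs) - log_likelihood(ref, probs)

# Train on a repetitive toy "genome", then score a single-base change.
model = train_bigram(["ACGT" * 50])
print(variant_effect("ACGTACGT", "ACCTACGT", model))
```

Under this toy model, a base substitution that breaks the learned pattern yields a negative score; a real genomic LLM applies the same likelihood-ratio logic with a far richer model of context.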
Hie became interested in using language models for biology during graduate school.
Evo was trained on a “novel” consisting of many genomes — the E. coli genome alone is 2 million to 4 million base pairs.
Its training data set was also important: the model was exposed to 2.7 million genomes from bacteria, archaea and viruses.
This exposure shows the model evolutionary alternatives for life: different ways of expressing the same idea.
Evo is trained only on genomes from the simplest organisms, prokaryotes.
Hie hopes to expand it to eukaryotes: organisms such as animals, plants and fungi whose cells have a nucleus.
The model generated a million tokens freely from scratch — essentially, an entire bacterial genome.
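Generating a genome "from scratch" means sampling one token at a time, each conditioned on what came before. The sketch below shows that autoregressive loop in miniature, with a made-up transition table standing in for the probabilities a model like Evo learns from millions of genomes; the numbers and the `generate` function are illustrative assumptions, not Evo's actual sampler.

```python
import random

DNA = "ACGT"

# Toy conditional distribution P(next base | previous base).
# Illustrative numbers only; a real model learns these from data.
probs = {
    "A": {"A": 0.1, "C": 0.4, "G": 0.3, "T": 0.2},
    "C": {"A": 0.2, "C": 0.1, "G": 0.5, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},
    "T": {"A": 0.5, "C": 0.2, "G": 0.2, "T": 0.1},
}

def generate(length, start="A", seed=0):
    """Sample one base at a time, conditioning on the previous base:
    the same autoregressive loop, writ small, that lets a genomic LLM
    emit a million-token sequence."""
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length:
        weights = [probs[seq[-1]][b] for b in DNA]
        seq.append(rng.choices(DNA, weights=weights)[0])
    return "".join(seq)

print(generate(60))
```

Scaling this loop from 60 bases to a million tokens, with a transformer instead of a lookup table, is what produces an entire synthetic bacterial genome.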