OpenAI Starts Offering a Biology-Tuned LLM (arstechnica.com) 14
An anonymous reader quotes a report from Ars Technica: On Thursday, OpenAI announced it had developed a large language model specifically trained on common biology workflows. Called GPT-Rosalind after Rosalind Franklin, the model appears to differ from most science-focused models from major tech companies, which have generally taken a more generic approach that works for various fields. In a press briefing, Yunyun Wang, OpenAI's Life Sciences Product Lead, said the system was designed to tackle two major roadblocks faced by current biology researchers. One is the massive datasets created by decades of genome sequencing and protein biochemistry, which can be too much for any one researcher to take in. The second is that biology has many highly specialized subfields, each with its own techniques and jargon. So, for example, a geneticist who finds themselves working on a gene that's active in brain cells might struggle to understand the immense neurobiological literature.
Wang said the company had taken an LLM and trained it on 50 of the most common biological workflows, as well as on how to access the major public databases of biological information. Further training has resulted in a system that can suggest likely biological pathways and prioritize potential drug targets. "We're connecting genotype to phenotype through known pathways and regulatory mechanisms, infer likely structural or functional properties of proteins, and really leveraging this mechanistic understanding," Wang said. To address LLMs' tendencies toward sycophancy and overenthusiasm, OpenAI says it has tuned the model to be more skeptical, so it's more likely to tell you when something is a bad drug target. There was a lot of talk about GPT-Rosalind's "reasoning" and "expert-level" abilities. We were told that the former was defined as being able to work through complex, multi-step processes, while the latter was derived from the model's performance on a handful of benchmarks. Access to GPT-Rosalind is currently limited "due to concerns about the model's potential for harmful outputs if asked to do something like optimize a virus's infectivity," notes Ars. Only U.S.-based organizations can request access at the moment.
By 2030 this could be very bad and very good (Score:2)
On the very good side, this will lower the cost and lead times for new drugs.
On the bad side, nation-states, terrorists, and even just Evil Agents Of Chaos[TM] who have access to tools like this and the knowledge to (ab)use them will be able to unleash biological chaos on the world.
Imagine if someone created a virus that infected everyone, spread rapidly, but was asymptomatic or had only common-cold-like-symptoms on everyone but their intended target, but it killed their target. The target could be an indi
Re: (Score:2)
We can't do that yet, and may never be able to be that specific. Trying to do it, however, could be exceedingly dangerous.
N.B.: All bacteria and viruses have a very high mutation rate.
Eric Schmidt on AI used to make bioweapons soon (Score:2)
From the transcript, about 43 minutes into a public conversation with Eric Schmidt on Apr 10, 2025: https://www.youtube.com/watch?... [youtube.com]
====
"Question: Thanks for the great conversation so far. Leonard Justin. I'm a PhD student at MIT. Um, I was wondering if you could just discuss a bit more some of the risks you see coming specifically with respect to biology and how we should go about mitigating those. What's the role of the AI developers? What's the role of government? Um,
Re: (Score:2)
Generating bad pathogens is quite plausible. Generating narrowly targeted ones that will stay narrowly targeted is currently implausible, and probably will remain so until well after the singularity. It would require designing genomes that were strongly error correcting. Elephants and naked mole rats do a reasonable job of that, but I don't think it's plausible for bacteria.
Re: By 2030 this could be very bad and very good (Score:2)
Specialized models of various kinds are nothing new; even forcing LLMs into a "specialized" mode using instructions and "external knowledge" is old news.
Nothing radical happening and you can be sure there is already more than one crispr toolkit working without safety and restrictions that is hooked to a model at some outfit owned by scam slopman, not to mention thiel and elona's.
It is here whether you like it or not, so consider mitigation, not whining.
Re: (Score:2)
I have an idea (Score:2)
Re: (Score:2)
Re: I have an idea (Score:2)
Sure, you can ask, but beware of the answers you get.
Optimal Virus? (Score:2)
This is the same as curing cancer and virtually every disease, because if there was a way to get a "large" payload (say 10kb of RNA, or ideally, 30 kb like the Coronavirus) into every cell efficiently, you can cure cancer. With 10kb it can be done with genius level bioengineering skill. With 30 kb it's trivial.
Re: (Score:2)
Should point out humanity's "best" tool today for doing this is the adenovirus capsid, but it has 3 major shortcomings: it only holds about 4kb of code, can't get into all cells efficiently (relative to other things it can, but not good enough), AND it can (practically) only be dosed once (if you dose it again after a week or two the immune system destroys it).
I wonder how it compares to ours (Score:2)
But that said, every university in Europe is producing the same thing and, let's be honest, training new models has become a lot easier these days.