Waymo Explores Using Google's Gemini To Train Its Robotaxis (theverge.com)
Waymo is advancing autonomous driving with a new training model for its robotaxis built on Google's multimodal large language model (MLLM) Gemini. The Verge reports: Waymo released a new research paper today that introduces an "End-to-End Multimodal Model for Autonomous Driving," also known as EMMA. This new end-to-end training model processes sensor data to generate "future trajectories for autonomous vehicles," helping Waymo's driverless vehicles make decisions about where to go and how to avoid obstacles. But more importantly, this is one of the first indications that the leader in autonomous driving has designs to use MLLMs in its operations. And it's a sign that these LLMs could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment on the road. In its research paper, Waymo is proposing "to develop an autonomous driving system in which the MLLM is a first class citizen."
The paper outlines how, historically, autonomous driving systems have developed specific "modules" for the various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but has problems scaling "due to the accumulated errors among modules and limited inter-module communication." Moreover, these modules could struggle to respond to "novel environments" because, by nature, they are "pre-defined," which can make it hard to adapt. Waymo says that MLLMs like Gemini present an interesting solution to some of these challenges for two reasons: the chat is a "generalist" trained on vast sets of scraped data from the internet "that provide rich 'world knowledge' beyond what is contained in common driving logs"; and they demonstrate "superior" reasoning capabilities through techniques like "chain-of-thought reasoning," which mimics human reasoning by breaking down complex tasks into a series of logical steps.
Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encountering various animals or construction in the road. [...] But EMMA also has its limitations, and Waymo acknowledges that there will need to be future research before the model is put into practice. For example, EMMA couldn't incorporate 3D sensor inputs from lidar or radar, which Waymo said was "computationally expensive." And it could only process a small number of image frames at a time. There are also risks to using MLLMs to train robotaxis that go unmentioned in the research paper. Chatbots like Gemini often hallucinate or fail at simple tasks like reading clocks or counting objects.
Hey maaan, ... (Score:3)
Re: (Score:2)
At least they acknowledge that's an issue. The question is will they put hallucinating robots on the streets?
Re: (Score:2)
The cost/benefits for corporations are out of whack in the US. That's why you guys are bei
Leading the way (Score:2)
Isn't this what Tesla has been pursuing for years now?
In other words, train a neural net on enough examples of what a "good human" driver would do and eventually you will end up with a model that will do that for almost all scenarios. You will always be able to fool it, but less often than with a human driver.
No doubt the media still thinks that Waymo is still "ahead" of Tesla in some way.
Re: (Score:2)
Of course Waymo is ahead. They love Google and "AI".
Re: (Score:1)
Meanwhile Tesla has what's basically a fancy version of cruise control, except at very, very low parking lot speeds, and even then the Tesla Summon feature has a nasty habit of causing random accidents in parking lots...
So as long as you ignore the fact that Tesla is at best 15 years behind Waymo, yeah, they're de
Re: (Score:2)
Also as far as I can tell when Steve Jobs died his reality distortion field didn't go with him and transferred to Leon.
I assume you meant "Elon?"
Re: (Score:2)
Re: (Score:2)
No doubt the media still thinks that Waymo is still "ahead" of Tesla in some way.
LOL. Waymo _is_ ahead. They are operating actual real-life taxis in SF, LAS, and Phoenix right now. You just download the app, and request a ride. And it just works.
Re: (Score:2)
Anyways