Waymo Explores Using Google's Gemini To Train Its Robotaxis (theverge.com) 42

Posted by BeauHD on Friday November 01, 2024 @07:10PM from the AI-powered-cars dept.

Waymo is advancing autonomous driving with a new training model for its robotaxis built on Google's multimodal large language model (MLLM) Gemini. The Verge reports: Waymo released a new research paper today that introduces an "End-to-End Multimodal Model for Autonomous Driving," also known as EMMA. This new end-to-end training model processes sensor data to generate "future trajectories for autonomous vehicles," helping Waymo's driverless vehicles make decisions about where to go and how to avoid obstacles. But more importantly, this is one of the first indications that the leader in autonomous driving has designs to use MLLMs in its operations. And it's a sign that these LLMs could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment on the road. In its research paper, Waymo is proposing "to develop an autonomous driving system in which the MLLM is a first class citizen."

The paper outlines how, historically, autonomous driving systems have developed specific "modules" for the various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years but has problems scaling "due to the accumulated errors among modules and limited inter-module communication." Moreover, these modules could struggle to respond to "novel environments" because, by nature, they are "pre-defined," which can make it hard to adapt. Waymo says that MLLMs like Gemini present an interesting solution to some of these challenges for two reasons: the chatbot is a "generalist" trained on vast sets of scraped data from the internet "that provide rich 'world knowledge' beyond what is contained in common driving logs"; and they demonstrate "superior" reasoning capabilities through techniques like "chain-of-thought reasoning," which mimics human reasoning by breaking down complex tasks into a series of logical steps.

Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encountering various animals or construction in the road. [...] But EMMA also has its limitations, and Waymo acknowledges that there will need to be future research before the model is put into practice. For example, EMMA couldn't incorporate 3D sensor inputs from lidar or radar, which Waymo said was "computationally expensive." And it could only process a small number of image frames at a time. There are also risks to using MLLMs to train robotaxis that go unmentioned in the research paper. Chatbots like Gemini often hallucinate or fail at simple tasks like reading clocks or counting objects.


Comment removed (Score:4, Interesting)

by account_deleted ( 4530225 ) writes: on Friday November 01, 2024 @07:13PM (#64913703)

Comment removed based on user account deletion

- Re: (Score:2)
  
  by iAmWaySmarterThanYou ( 10095012 ) writes:
  
  At least they acknowledge that's an issue. The question is will they put hallucinating robots on the streets?
  - Re: (Score:2)
    
    by martin-boundary ( 547041 ) writes:
    
    Why wouldn't they? The difference between a hallucinating human taxi driver and a hallucinating AI taxi driver is that you can put the human driver in jail after an accident, ipso facto making the street a little bit safer. An AI can't be put in jail, and because a single AI model drives thousands of taxis, "retiring" it would imply removing thousands of taxis. AKA too big to fail. AKA slap on the wrist fine instead.
    The cost/benefits for corporations are out of whack in the US. That's why you guys are bei
      - Re: Hey maaan, ... (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        Every driver makes about 10 mistakes a day, and the difference between the one mistake which causes a fatal crash, and the 999999 ones which don't is basically luck
        -citation missing-
        
        Re: Hey maaan, ... (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        I see drivers make mistakes, but maybe one or two among hundreds of drivers on the road. Don't think that equates to 10 mistakes a day. Also I'm in a very small city so how does that rate change if everyone only drives 30 mins a day?
- Re: (Score:2)
  
  by dvice ( 6309704 ) writes:
  
  "to train"
  Taxis won't be hallucinating, they train to navigate in an environment that was created by hallucinating AI.
  - Re: (Score:2)
    
    by account_deleted ( 4530225 ) writes:
    
    Comment removed based on user account deletion
Leading the way (Score:2)

by AlanObject ( 3603453 ) writes:

Isn't this what Tesla has been pursuing for years now?
In other words, train a neural net on enough examples of what a "good human" driver would do and eventually you will end up with a model that will do that for almost all scenarios. You will always be able to fool it, but less often than with a human driver.
No doubt the media still thinks that Waymo is still "ahead" of Tesla in some way.
- Re: (Score:2)
  
  by iAmWaySmarterThanYou ( 10095012 ) writes:
  
  Of course Waymo is ahead. They love Google and "AI".
- Re: (Score:1)
  
  by rsilvergun ( 571051 ) writes:
  
  Well they are ahead in one way, their technology works. Waymo has thousands of hours of successful driving and is actively running full self-driving taxi services in Phoenix and San Francisco.
  
  Meanwhile Tesla has what's basically a fancy version of cruise control except at very very low parking lot speeds and even then the Tesla summon feature has a nasty habit of causing random accidents in parking lots...
  
  So as long as you ignore the fact that Tesla is at best 15 years behind waymo yeah they're de
  - Re: (Score:2)
    
    by ClickOnThis ( 137803 ) writes:
    
    Also as far as I can tell when Steve Jobs died his reality distortion field didn't go with him and transferred to Leon.
    I assume you meant "Elon?"
  - Re: (Score:2)
    
    by AlanObject ( 3603453 ) writes:
    
    Meanwhile Tesla has what's basically a fancy version of cruise control except at very very low parking lot speeds and even then the Tesla summon feature has a nasty habit of causing random accidents in parking lots...
    I don't own Tesla FSD but got the free trial twice now. It has driven me places using freeways, city streets with complex intersections and traffic situations just fine. Waymo isn't even allowed on freeways and can only operate in their supervised areas. I don't see how you can even compare the products. Tesla has such vastly superior capability.
    Haven't used the summon feature yet and I'm not really that interested.
    - Re: Leading the way (Score:2)
      
      by LindleyF ( 9395567 ) writes:
      
      Tesla, and just about everyone else, is building a glorified driver assist system. It may handle itself most of the time but it has a human backup and it can afford to take risks as a result. Waymo, by contrast, is fully autonomous from the ground up. There isn't a steering wheel. No one else is doing that.
      - Re: Leading the way (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        But Waymo needs an exact scan of all the places they drive, which doesn't scale to general use anywhere in the world.
        
        Re: Leading the way (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        Aren't engineering tradeoffs fun?
        
        Re: (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        When you are trying to solve an impossible problem, there will always be a trade-off that makes solving that problem impossible.
        
        Re: Leading the way (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        Today's impossible is tomorrow's difficult, and the next day's commonplace.
        
        Re: (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        It is clear to see none of these systems will work outside of their very specific use cases. Yet people keep saying it is coming in two years. There is also such a thing as beating a dead horse.
        
        Re: Leading the way (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        The internet is impossible, yet here we are. Complexity is defeated by layers. You have to build the foundation first.
        
        Re: (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        Ok but let's put this into perspective. Right now the car companies have 300 baud modems and they are all proclaiming "we will have the internet in 2 years!"
        
        Re: Leading the way (Score:2)
        
        by LindleyF ( 9395567 ) writes:
        
        That's a valid point.
- Re: (Score:2)
  
  by martin-boundary ( 547041 ) writes:
  
  Oh yeah, that's why Tesla was showing off human operated remote controlled robots at its fake demo the other day. BTW, would you like to own a bridge? I can get you one, it's cheap!
- Re: (Score:2)
  
  by Cyberax ( 705495 ) writes:
  
  No doubt the media still thinks that Waymo is still "ahead" of Tesla in some way.
  LOL. Waymo _is_ ahead. They are operating actual real-life taxis in SF, LAS, and Phoenix right now. You just download the app, and request a ride. And it just works.
  - Re: (Score:2)
    
    by AlanObject ( 3603453 ) writes:
    
    Yeah, Really. [fox7austin.com]
    In the probe, which began last Monday, NHTSA is looking at a total of 22 incidents, including 17 crashes or fires. No one was hurt or killed.
    Notably, none of the Waymo cars collided with other moving vehicles, but they did crash into all sorts of other things like chains, gates and parked cars. Some cars disobeyed traffic lights, entered construction zones, or even drove on the wrong side of the road with cars coming.
    - Re: (Score:2)
      
      by Cyberax ( 705495 ) writes:
      
      Yes, really. Try it if you're there. It works amazingly well.
      
      Of course, there are problems. But it _works_, while Tesla only promises it within "two years".
- Re: (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  I agree. I think the hope is that building on a language model would enable it to handle situations it had only read about, enabling it to handle the "long tail" (outlier) events that tend to stymie a direct data-driven approach, by leveraging general knowledge of the world. Empirically they show benchmark results and several examples to imply that it does handle unusual situations well, but I'm not seeing a very clear link to knowledge and reasoning from the LLM being applied to the camera data.
  Anyways
  - Re: (Score:2)
    
    by ceoyoyo ( 59147 ) writes:
    
    People who read a lot are certainly better drivers. Right?
    I assume Google is actually hoping that a more integrated model, which the OP correctly points out Tesla has been using, will work better than many separated subsystems, and the Gemini stuff is mostly hoping the visual training of the MLLM, rather than the language part, will give them a head start.
    Or they're pulling an OpenAI: "have big expensive hammer, must hit things."
    - Re: (Score:2)
      
      by timeOday ( 582209 ) writes:
      
      This is just research, so I do think they're just experimenting with building a driving model on a language model to see if useful knowledge bleeds through.
      On a lark, I just asked chatGPT:
      if I were driving down the road on the freeway behind a loaded-up pickup truck and some large rectangular object about the size of the bed and a foot thick fell off the back, what is it most likely? Answer in 5 words or less
      ChatGPT said:
      Plywood or large mattress
      
      not bad.
      - Re: Leading the way (Score:2)
        
        by fluffernutter ( 1411889 ) writes:
        
        You are making the rather outrageous assumption the sensors would pick up "about the size of a bed". Also, how long did it take chatgpt to return the result? Was it less than the tenth of a second needed to react in that situation?
        
        Re: (Score:2)
        
        by timeOday ( 582209 ) writes:
        
        "The" bed, as in the truck bed it's falling out of.
        But yes, I would be surprised if they could run full-sized Gemini even onboard a tractor-trailer with half of it devoted to a generator anytime soon.
- Re: (Score:2)
  
  by dvice ( 6309704 ) writes:
  
  No. Tesla trains with real data. Waymo is using real data to train AI that will generate new scenarios, which are then used to train the car. The latter method allows generating scenarios that have never occurred.
  Waymo is way ahead of Tesla. Tesla started with the easy part (long roads with high speeds), Waymo knew that the hardest part is driving in a city with low speeds, so they have spent years perfecting that area.
There's the rub of it (Score:2)

by OpenSourced ( 323149 ) writes:

For example, EMMA couldn't incorporate 3D sensor inputs from lidar or radar, which Waymo said was "computationally expensive." And it could only process a small amount of image frames at a time.

I've always thought that, when truly autonomous driving was developed, it would probably be too computationally expensive to be useful, either by the expensive hw needed or the amount of energy needed, or both.
Why, it's almost as if Waymo was owned by Google (Score:2)

by Rosco P. Coltrane ( 209368 ) writes:

and Google was desperately looking for a use case for their multi-billion dollar investment in AI...
- Re: (Score:2)
  
  by dvice ( 6309704 ) writes:
  
  For use cases please look into AlphaFold2 and AlphaFold3. They were worth the Nobel prize and they are the focus of Google AI research.
I'm going to have Optimus Prime (Score:2)

by zawarski ( 1381571 ) writes:

Carry me on its back everywhere I want to go.
End-to-end is not auditable (Score:2)

by larryjoe ( 135075 ) writes:

End-to-end autonomous driving systems have been proposed and even successfully demonstrated for many years. The key reason why no one seriously considers them for practical systems is that they are entirely opaque and cannot be audited, diagnosed, or debugged. If there is a problem, the system cannot be fixed and has to be scrapped. For a smartphone, this is no big deal. For safety-critical systems, it's a showstopper. This is good research, but it will never be greenlit for a commercial system. Well,
Child, Cyclists are Human Sacrifices (Score:2)

by BrendaEM ( 871664 ) writes:

From the get-go, self-driving cars were made to kill people, just ask DARPA
