Google Launches Two New Open LLMs (techcrunch.com)
Barely a week after launching the latest iteration of its Gemini models, Google today announced the launch of Gemma, a new family of lightweight open-weight models. From a report: Starting with Gemma 2B and Gemma 7B, these new models were "inspired by Gemini" and are available for commercial and research usage. Google did not provide us with a detailed paper on how these models perform against similar models from Meta and Mistral, for example, and only noted that they are "state-of-the-art."
The company did note that these are dense decoder-only models, though, which is the same architecture it used for its Gemini models (and its earlier PaLM models) and that we will see the benchmarks later today on Hugging Face's leaderboard. To get started with Gemma, developers can get access to ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia's NeMo. Once pre-trained and tuned, these models can then run everywhere. While Google highlights that these are open models, it's worth noting that they are not open-source. Indeed, in a press briefing ahead of today's announcement, Google's Janine Banks stressed the company's commitment to open source but also noted that Google is very intentional about how it refers to the Gemma models.
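Since the summary calls out the Hugging Face integration, here is a minimal sketch of loading one of the checkpoints with the transformers library. The checkpoint name google/gemma-2b and the need to accept the model terms on the Hub are assumptions, not details from the article.

# Minimal sketch, not from the article: loading Gemma with Hugging Face transformers.
# The checkpoint name "google/gemma-2b" and the gated-access step are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain what an open-weight model is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same snippet should work for the 7B variant by swapping in the corresponding checkpoint name.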
Some TL;DRs (Score:5, Informative)
(As we've been discussing them in LLM training groups)
Technical report: here [googleapis.com]
License: Viral, but not as insidious as LLaMA's. It basically sums up to "any derivatives (including things you made from this model's outputs, not just things built on it as a foundation) have to be similarly virally licensed, so that they can't be used for this List of Evil Things We Don't Want You To Do [google.dev]". It doesn't require that outputs only be used to train other Gemma models (as Meta does with LLaMA), and it doesn't require you to negotiate a license with them if any of your models becomes widely used.
Max context length: 8192 tokens
Trained on: 2T and 6T tokens for the 2B and 7B respectively (roughly 1000:1 and 850:1 tokens-to-parameters).
Performance: The 7B seems similar to or slightly better than Mistral 7B in instruction following. Both outperform in terms of "safety", though a lot of people might consider that a negative. But they put a lot of effort into making sure that e.g. there was no deanonymized personal information in the training dataset.
CO2 training footprint: 131t (about 9 years of the average American's footprint) - but 100% offset.
Tech: Modern but nothing spectacular.
Instruction format: Custom, with specialized tokens (see the sketch after this list)
Languages: Optimized for monolingual (English)
Overall: A solid additional set of models. Not sure the 7B will drag many people away from Mistral 7B, but the 2B is interesting, and worth trying vs. e.g. TinyLLaMA, Phi 2, etc. 8192 tokens is a nice context size for a model that small.
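On the instruction format mentioned above: the post doesn't spell it out, but a minimal sketch of building a prompt, assuming the <start_of_turn>/<end_of_turn> control tokens described in the model card (not confirmed here), might look like this.

# Sketch of the instruction-tuned prompt format, assuming the
# <start_of_turn>/<end_of_turn> control tokens from the model card
# (an assumption, not something stated in this thread).
def build_gemma_prompt(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(build_gemma_prompt("Summarize the Gemma license in one sentence."))

The base (non-instruction-tuned) checkpoints shouldn't need this wrapping; plain text completion is fine there.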
Re: (Score:2)
"4.6 Governing Law and Jurisdiction
This Agreement will be governed by the laws of the State of California without regard to choice of law principles"
Re: (Score:2)
That whole "without regard to choice of law principles" is nonsense contract filler. Jurisdiction isn't always something you can just claim immunity from. Judges tend to have strong feelings about that lol
Re: (Score:2)
One additional nice plus:
Vocab size: 256k tokens
Downside:
Head dim: 256
The reason the latter is concerning is here [github.com]: "All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800."
Aka, no finetuning with flash attention on home GPUs? Hmm....
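For anyone who wants to check this before downloading, here's a small sketch that reads the head dimension from the model config and applies the rule quoted above. The checkpoint name and the head_dim config attribute are assumptions.

# Sketch: check whether the head dimension allows flash-attention backward
# on consumer GPUs, per the rule quoted above ("Head dim > 192 backward
# requires A100/A800 or H100/H800"). The checkpoint name and the head_dim
# attribute are assumptions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-7b")
head_dim = getattr(config, "head_dim",
                   config.hidden_size // config.num_attention_heads)

if head_dim > 192:
    print(f"head_dim={head_dim}: flash-attention backward needs data-center GPUs")
else:
    print(f"head_dim={head_dim}: flash-attention backward should work on consumer GPUs")

With a reported head dim of 256, the first branch is what you'd expect to see here.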
Ignore the viral license (Score:2)
1) Make sure never to agree to a contract regarding the model. Don't use the Hugging Face version that requires you to agree to conditions. You may be bound by them otherwise, even when you don't need a license for the weights.
2) The weights themselves can't be copyrighted, so you can use them without a license. Without a license you aren't bound by what they write in the license.
Source for the claim that there can't be copyright on weights:
https://old.reddit.com/r/Local... [reddit.com]
TL;DR reason why there is no copyright: C
a new family of lightweight open-weight models (Score:2)
WTF does that mean?
Isn't that contradictory?
And how do you 'weigh' an LLM, anyway?
Re: (Score:2)
I think weight, in this case, refers to the model's learned parameters, i.e. the numbers that get applied to the vectorized tokens.
Re: (Score:2)
I'm not sure what's confusing about it?
A: Implies that it's one among many
New: Just released
Family: More than one model
Of: Pertaining to
Lightweight: Not having a huge number of parameters; something easily run on consumer-grade hardware, even older hardware
Open-weight models: Anyone can download the model weights and run them themselves.
stupid names (Score:2)
Choosing random names that do not evoke what the product does is stupid. Few are going to remember these names, except that they start with "G," much like "Garbage."
But the real question is: are they woke enough? (Score:1)
Re: (Score:2)
Pretty soon AI will refuse to chat with you if you refuse to use its preferred pronouns.