Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×
AI Google Technology

Google Launches Two New Open LLMs (techcrunch.com) 15

Barely a week after launching the latest iteration of its Gemini models, Google today announced the launch of Gemma, a new family of lightweight open-weight models. From a report: Starting with Gemma 2B and Gemma 7B, these new models were "inspired by Gemini" and are available for commercial and research usage. Google did not provide us with a detailed paper on how these models perform against similar models from Meta and Mistral, for example, and only noted that they are "state-of-the-art."

The company did note that these are dense decoder-only models, though, which is the same architecture it used for its Gemini models (and its earlier PaLM models) and that we will see the benchmarks later today on Hugging Face's leaderboard. To get started with Gemma, developers can get access to ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia's NeMo. Once pre-trained and tuned, these models can then run everywhere. While Google highlights that these are open models, it's worth noting that they are not open-source. Indeed, in a press briefing ahead of today's announcement, Google's Janine Banks stressed the company's commitment to open source but also noted that Google is very intentional about how it refers to the Gemma models.

This discussion has been archived. No new comments can be posted.

Google Launches Two New Open LLMs

Comments Filter:
  • Some TL/DRs (Score:5, Informative)

    by Rei ( 128717 ) on Wednesday February 21, 2024 @09:27AM (#64257026) Homepage

    (As we've been discussing them in LLM training groups)

    Technical report: here [googleapis.com]

    License: Viral, but not as insidious as LLaMA. Basically sums up to "any derivatives (including things you made from the outputs of this model, not just using it as a foundation) have to be similarly virally licensed to mandate that it not be used for this List of Evil Things We Don't Want You To Do [google.dev]". Doesn't require that outputs only be used to train other Gemma models (like Meta does with LLaMA) and doesn't require you to license with them if any of your models becomes widely used.

    Max context length: 8192 tokens

    Trained on: 2T and 6T tokens respectively (1000:1 training data / weights ratios).

    Performance: The 7B seems similar to or slightly better than Mistral 7B in instruction following. Both outperform in terms of "safety", though a lot of people might consider that a negative. But they put a lot of effort into making sure that e.g. there was no deanonymized personal information in the training dataset.

    CO2 training footprint: 131t (about 9 years of the average America's footprint) - but 100% offset.

    Tech: Modern but nothing spectacular.

    Instruction format: Custom, with specialized tokens

    Languages: Optimized for monolingual (English)

    Overall: A solid additional set of models. Not sure the 7B will drag many people away from Mistral 7B, but the 2B is interesting, and worth trying vs. e.g. TinyLLaMA, Phi 2, etc. 8192 tokens is a nice context size for a model that small.

    • by Rei ( 128717 )

      One additional nice plus:

      Vocab size: 256k tokens

      Downside:

      Head dim: 256

      The reason the latter is concerning is here [github.com]: "All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800."

      Aka, no finetuning with flash attention on home GPUs? Hmm....

    • 1) Make sure never to agree to a contract regarding the model. Don't use the hugginface version that requires you to agree to conditions. You may be bound to them otherwise even when you don't need a license for the weights.
      2) The weights itself can't be copyrighted, so you can use them without a license. Without a license you aren't bound to what they write in the license.

      Source for the claim that there can't be copyright on weights:
      https://old.reddit.com/r/Local... [reddit.com]

      TL;DR reason why there is no copyright: C

  • WTF does that mean?
    Isn't that contradictory?

    And how do you 'weigh' an LLM anyway

    • I think weight, in this case, refers to number of stored and fully vectorized tokens.

    • by Rei ( 128717 )

      I'm not sure what's confusing about it?

      A: Implies that it's one among many
      New: Just released
      Family: More than one model
      Of: Pertaining to
      Lightweight: Not having a huge number of parameters; something easily run on consumer-grade hardware, even older hardware
      Open-weight models: Anyone can download the model weights and run them themselves.

  • Choosing random names that do not evoke what the product does is stupid. Few are going to remember these names, except that they start with "G," much like "Garbage."

  • Gemini surprised everyone by out competing Goodie-2 and GPT in wokiness and nonsensical phrases. How will these compare? Who will win the fight to DEI supremacy?

If you don't have time to do it right, where are you going to find the time to do it over?

Working...