Google Launches Two New Open LLMs (techcrunch.com)
Barely a week after launching the latest iteration of its Gemini models, Google today announced the launch of Gemma, a new family of lightweight open-weight models. From a report: Starting with Gemma 2B and Gemma 7B, these new models were "inspired by Gemini" and are available for commercial and research usage. Google did not provide us with a detailed paper on how these models perform against similar models from Meta and Mistral, for example, and only noted that they are "state-of-the-art."
The company did note that these are dense decoder-only models, though, which is the same architecture it used for its Gemini models (and its earlier PaLM models) and that we will see the benchmarks later today on Hugging Face's leaderboard. To get started with Gemma, developers can get access to ready-to-use Colab and Kaggle notebooks, as well as integrations with Hugging Face, MaxText and Nvidia's NeMo. Once pre-trained and tuned, these models can then run everywhere. While Google highlights that these are open models, it's worth noting that they are not open-source. Indeed, in a press briefing ahead of today's announcement, Google's Janine Banks stressed the company's commitment to open source but also noted that Google is very intentional about how it refers to the Gemma models.
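Since the summary calls out the Hugging Face integration, here is a minimal sketch of loading one of the checkpoints with the transformers library. The checkpoint name google/gemma-2b and the need to accept the model terms on the Hub are assumptions, not details from the article.

# Minimal sketch, not from the article: loading Gemma with Hugging Face transformers.
# The checkpoint name "google/gemma-2b" and the gated-access step are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain what an open-weight model is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same snippet should work for the 7B variant by swapping in the corresponding checkpoint name.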
Some TL;DRs (Score:5, Informative)
(As we've been discussing them in LLM training groups)
Technical report: here [googleapis.com]
License: Viral, but not as insidious as LLaMA's. It basically sums up to "any derivatives (including things you made from this model's outputs, not just things built on it as a foundation) have to be similarly virally licensed, so that they can't be used for this List of Evil Things We Don't Want You To Do [google.dev]". It doesn't require that outputs only be used to train other Gemma models (as Meta does with LLaMA), and it doesn't require you to negotiate a license with them if any of your models becomes widely used.
Max context length: 8192 tokens
Trained on: 2T and 6T tokens for the 2B and 7B respectively (roughly 1000:1 and 850:1 tokens-to-parameters).
Performance: The 7B seems similar to or slightly better than Mistral 7B in instruction following. Both outperform in terms of "safety", though a lot of people might consider that a negative. But they put a lot of effort into making sure that e.g. there was no deanonymized personal information in the training dataset.
CO2 training footprint: 131t (about 9 years of the average American's footprint) - but 100% offset.
Tech: Modern but nothing spectacular.
Instruction format: Custom, with specialized tokens (see the sketch after this list)
Languages: Optimized for monolingual (English)
Overall: A solid additional set of models. Not sure the 7B will drag many people away from Mistral 7B, but the 2B is interesting, and worth trying vs. e.g. TinyLLaMA, Phi 2, etc. 8192 tokens is a nice context size for a model that small.
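On the instruction format mentioned above: the post doesn't spell it out, but a minimal sketch of building a prompt, assuming the <start_of_turn>/<end_of_turn> control tokens described in the model card (not confirmed here), might look like this.

# Sketch of the instruction-tuned prompt format, assuming the
# <start_of_turn>/<end_of_turn> control tokens from the model card
# (an assumption, not something stated in this thread).
def build_gemma_prompt(user_message: str) -> str:
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(build_gemma_prompt("Summarize the Gemma license in one sentence."))

The base (non-instruction-tuned) checkpoints shouldn't need this wrapping; plain text completion is fine there.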
Re: (Score:2)
"4.6 Governing Law and Jurisdiction
This Agreement will be governed by the laws of the State of California without regard to choice of law principles"
Re: (Score:2)
That whole "without regard to choice of law principles" is nonsense contract filler. Jurisdiction isn't always something you can just claim immunity from. Judges tend to have strong feelings about that lol
Re: (Score:2)
One additional nice plus:
Vocab size: 256k tokens
Downside:
Head dim: 256
The reason the latter is concerning is here [github.com]: "All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800."
Aka, no finetuning with flash attention on home GPUs? Hmm....
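For anyone who wants to check this before downloading, here's a small sketch that reads the head dimension from the model config and applies the rule quoted above. The checkpoint name and the head_dim config attribute are assumptions.

# Sketch: check whether the head dimension allows flash-attention backward
# on consumer GPUs, per the rule quoted above ("Head dim > 192 backward
# requires A100/A800 or H100/H800"). The checkpoint name and the head_dim
# attribute are assumptions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-7b")
head_dim = getattr(config, "head_dim",
                   config.hidden_size // config.num_attention_heads)

if head_dim > 192:
    print(f"head_dim={head_dim}: flash-attention backward needs data-center GPUs")
else:
    print(f"head_dim={head_dim}: flash-attention backward should work on consumer GPUs")

With a reported head dim of 256, the first branch is what you'd expect to see here.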
Ignore the viral license (Score:2)
1) Make sure never to agree to a contract regarding the model. Don't use the Hugging Face version that requires you to agree to conditions. You may be bound by them otherwise, even when you don't need a license for the weights.
2) The weights themselves can't be copyrighted, so you can use them without a license. Without a license you aren't bound by what they write in the license.
Source for the claim that there can't be copyright on weights:
https://old.reddit.com/r/Local... [reddit.com]
TL;DR reason why there is no copyright: C
a new family of lightweight open-weight models (Score:2)
WTF does that mean?
Isn't that contradictory?
And how do you 'weigh' an LLM, anyway?
Re: (Score:2)
I think weight, in this case, refers to the model's learned parameters, i.e. the numbers that get applied to the vectorized tokens.
Re: (Score:2)
I'm not sure what's confusing about it?
A: Implies that it's one among many
New: Just released
Family: More than one model
Of: Pertaining to
Lightweight: Not having a huge number of parameters; something easily run on consumer-grade hardware, even older hardware
Open-weight models: Anyone can download the model weights and run them themselves.
stupid names (Score:2)
Choosing random names that do not evoke what the product does is stupid. Few are going to remember these names, except that they start with "G," much like "Garbage."
But the real question is: are they woke enough? (Score:1)
Re: (Score:2)
Pretty soon AI will refuse to chat with you if you refuse to use its preferred pronouns.