Nvidia and Mistral's New Model 'Mistral-NeMo' Brings Enterprise-Grade AI To Desktop Computers (venturebeat.com)
Nvidia and French startup Mistral AI jointly announced today the release of a new language model designed to bring powerful AI capabilities directly to business desktops. From a report: The model, named Mistral-NeMo, boasts 12 billion parameters and an expansive 128,000 token context window, positioning it as a formidable tool for businesses seeking to implement AI solutions without the need for extensive cloud resources. Bryan Catanzaro, vice president of applied deep learning research at Nvidia, emphasized the model's accessibility and efficiency in a recent interview with VentureBeat. "We're launching a model that we jointly trained with Mistral. It's a 12 billion parameter model, and we're launching it under Apache 2.0," he said. "We're really excited about the accuracy of this model across a lot of tasks."
The collaboration between Nvidia, a titan in GPU manufacturing and AI hardware, and Mistral AI, a rising star in the European AI scene, represents a significant shift in the AI industry's approach to enterprise solutions. By focusing on a more compact yet powerful model, the partnership aims to democratize access to advanced AI capabilities. Catanzaro elaborated on the advantages of smaller models. "The smaller models are just dramatically more accessible," he said. "They're easier to run, the business model can be different, because people can run them on their own systems at home. In fact, this model can run on RTX GPUs that many people have already."
Mistral is doing great work. (Score:5, Informative)
I still use Mixtral as my go-to midsized model, and Mistral 7B v0.3 as my small model. Would love a new release of Mixtral - I've tried many other models, but there's always something that's important to my task that Mixtral just does better, whether it's creativity, instruction-following accuracy, not outputting superfluous info, handling non-English data, or whatnot. Hope they release a new version at some point like they did with Mistral 7B.
They're also big on new architectures. They were the first to release an open-source MoE. Now they've released an open-source Mamba model (Codestral Mamba 7B), so they're not even sticking to Transformers.
To top it all off, they don't use a pseudo-open-source license (like LLaMA, Qwen, etc.), but a true open-source license (Apache 2.0).
Go Mistral! :)
Re:Mistral is doing great work. (Score:5, Informative)
In certain industries you cannot send the data outside because nobody knows if the data will be used in model training. It is either outright illegal to process data without strong privacy guarantees (medical and financial data) or just ill-advised and therefore disallowed by the company's legal department (e.g. patent proposals).
Re: (Score:3)
So, first off, before going into training I'd get set up and comfortable with local inference. :) This means setting up CUDA, and then an inference program. If you want a GUI and chat interface, I'd recommend text-generation-webui, by Oobabooga. If you want an API, llama.cpp is probably a better option (be sure to compile it with CUDA support; it's disabled by default!). Remember to respect a model's max context length when running it; otherwise the model won't "see" part of your prompt on long queries.
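A minimal sketch of that workflow, assuming the llama-cpp-python bindings (the model path and context size below are placeholders, not recommendations):

# Minimal local-inference sketch using the llama-cpp-python bindings.
# Assumes llama.cpp/llama-cpp-python was built with CUDA support and a
# GGUF model file is already on disk; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.3.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,       # keep prompts within the model's max context length
    n_gpu_layers=-1,  # offload all layers to the GPU (needs the CUDA build)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain context windows in two sentences."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])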
Re: (Score:2)
As for what you can do with them, your imagination is the limit. Some people train them for porn. Some people for business tasks (structured output in target formats and the like). Some people train expert models on specific subjects. Some people feed them things having nothing whatsoever to do with text - like, you could teach it to do CFD or protein folding or whatnot. Some people make agent systems that talk together to coordinate more complex tasks. Some people try to create whole new advanced architectures.
It looks like you're trying to create a document... (Score:2)
Re: (Score:2)
My current project for example is developing a model optimized for tasks that require zero trivia knowledge, so that they can run extremely fast on a very small model. Summarization, search, sentiment analysis, topic hashtagging, smart pagination, RAG, translation, text cleanup, and so forth. The output is highly structured (originally I was using JSON but I switched to a custom format to avoid the need to escape non-English characters) so it can be automatically read into data structures.
The goal is a Python library.
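As a hypothetical illustration of that kind of format (all field names invented; raw UTF-8 passes straight through, with none of JSON's \uXXXX escaping):

# Hypothetical illustration only: parse tab-separated key/value lines,
# with "---" separating records, straight into Python data structures.
# Non-English text stays as raw UTF-8; no escaping needed.
def parse_records(text: str) -> list[dict]:
    records, current = [], {}
    for line in text.splitlines():
        if line == "---":  # record separator
            if current:
                records.append(current)
                current = {}
        elif "\t" in line:
            key, value = line.split("\t", 1)
            current[key] = value
    if current:  # flush the final record
        records.append(current)
    return records

sample = "topic\t気候変動\nsentiment\tnegative\n---\ntopic\tinflation\nsentiment\tneutral\n"
print(parse_records(sample))
# [{'topic': '気候変動', 'sentiment': 'negative'}, {'topic': 'inflation', 'sentiment': 'neutral'}]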
Re: (Score:2)
For a small model, I've found that Phi-3-mini is way better than Mistral 7B even though it only has 3.8B parameters. Haven't tried the small (7B) or medium (14B) versions. I'd expect them to be even better. They're also truly open source (MIT).
Re: (Score:2)
I'll admit that I haven't messed with the Phi series since the original Phi-3 Mini release (they now have a new version out that apparently incorporated some of my feedback; good on them!). The original version had a big weakness in the form of insisting on rambling on and adding context (or even moralizing) even when you told it not to, which made it very difficult to use for structured output. While not part of my tasks, there were other people complaining about how they'd give it a storywriting task and even in that it'd break the flow in the middle of the story to go off on asides or moralizing.
Re: (Score:2)
there were other people complaining about how they'd give it a storywriting task and even in that it'd break the flow in the middle of the story to go off on asides or moralizing.
You can only expect so much from a tiny model, but it's still far better than Mistral 7B, which is next to useless for storywriting tasks. It's nearly impossible to get it to write a story longer than three paragraphs, and everything comes out as a children's story about nice people where everything is good. Ask it to expand and it just writes a different three paragraph story. Ask it to add conflict and it adds a few ominous words before everything ends up happily.
So industrial strength stupid in your PC? (Score:3)
Why would I want that? Oh, right, I do not.
Enterprise-class only for enterprises (Score:5, Informative)
Minimum requirements: 1x A100 80GB
The one I could find was listed for $17,200 US.
Re: (Score:1)
Get a quantized version and offload a few layers to CPU and you can use a 3060 12GB (the card that seems to be the sweet spot for price/performance at the moment). You will have to wait a week for llama.cpp to support it (so people can start quantizing), though.
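Once support lands, the partial-offload setup might look like this, again assuming the llama-cpp-python bindings (the path and layer count are placeholders; tune n_gpu_layers down until the model fits in 12GB of VRAM):

# Quantized model with partial GPU offload: some layers on the 3060,
# the rest running on CPU. Path and layer count are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-nemo-12b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=28,  # as many layers as fit in 12GB; the remainder runs on CPU
    n_ctx=4096,
)
print(llm("Q: What does quantization trade away? A:", max_tokens=64)["choices"][0]["text"])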
Enterprise grade AI (Score:3)
But which Enterprise? NX-01? NCC-1701? NCC-1701-E?
Re: (Score:2)
NCC-1701, no bloody A, B, C, or D
A.I. Pyramid Scheme (Score:2)
It's like those pyramid schemes: you have to keep roping in new people to keep the bubble from bursting.
I mean, at some point NVIDIA will saturate the cloud data center market with its chips, so it needs to expand to keep selling.
These things just retch up and regurgitate what their models contain, trained on data they've been given. So much for A.I.
Powerinfer-2 does it better (Score:2)
https://arxiv.org/abs/2406.062... [arxiv.org]
The future of real-time local generative LMs is out of core (i.e. parameters stored in cache) and sparse. When you look at what PowerInfer-2 can do with models not even trained for activation sparsity, there is a ton of gas in the tank. Local models can be just as big as server models; they just need to be trained to use a very small subset of their parameters during inference, much smaller yet than current MoE models. Also, they need to be architecturally suited to retrieving that subset efficiently.
Re: (Score:2)
Meant to say parameters stored on drive.
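PowerInfer-2 itself is a much more involved system (see the paper), but a toy sketch of the general idea, with the correction above taken into account (parameters live on drive, and a predictor picks which neuron rows to read):

# Toy illustration, not PowerInfer-2: compute an FFN layer using only
# the rows a (hypothetical) predictor says will activate, reading them
# on demand from a memory-mapped weight file on drive.
import numpy as np

d_model, d_ff = 1024, 4096
rng = np.random.default_rng(0)

# Stand-in for one layer's up-projection weights living on disk.
np.save("ffn_up.npy", rng.standard_normal((d_ff, d_model)).astype(np.float32))
W_disk = np.load("ffn_up.npy", mmap_mode="r")  # out of core: rows paged in on demand

def sparse_ffn(x: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    rows = np.asarray(W_disk[predicted])  # read just the predicted rows from drive
    h = np.maximum(rows @ x, 0.0)         # ReLU over the active subset only
    out = np.zeros(d_ff, dtype=np.float32)
    out[predicted] = h
    return out

x = rng.standard_normal(d_model).astype(np.float32)
active = rng.choice(d_ff, size=d_ff // 16, replace=False)  # predictor stub: ~6% of neurons
print(f"computed {active.size} of {d_ff} neuron rows")
print(sparse_ffn(x, active).shape)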