AI

Nvidia and Mistral's New Model 'Mistral-NeMo' Brings Enterprise-Grade AI To Desktop Computers (venturebeat.com)

Nvidia and French startup Mistral AI jointly announced today the release of a new language model designed to bring powerful AI capabilities directly to business desktops. From a report: The model, named Mistral-NeMo, boasts 12 billion parameters and an expansive 128,000 token context window, positioning it as a formidable tool for businesses seeking to implement AI solutions without the need for extensive cloud resources. Bryan Catanzaro, vice president of applied deep learning research at Nvidia, emphasized the model's accessibility and efficiency in a recent interview with VentureBeat. "We're launching a model that we jointly trained with Mistral. It's a 12 billion parameter model, and we're launching it under Apache 2.0," he said. "We're really excited about the accuracy of this model across a lot of tasks."

The collaboration between Nvidia, a titan in GPU manufacturing and AI hardware, and Mistral AI, a rising star in the European AI scene, represents a significant shift in the AI industry's approach to enterprise solutions. By focusing on a more compact yet powerful model, the partnership aims to democratize access to advanced AI capabilities. Catanzaro elaborated on the advantages of smaller models. "The smaller models are just dramatically more accessible," he said. "They're easier to run, the business model can be different, because people can run them on their own systems at home. In fact, this model can run on RTX GPUs that many people have already."

  • by Rei ( 128717 ) on Thursday July 18, 2024 @11:17AM (#64635247) Homepage

    I still use Mixtral as my go-to midsized model, and Mistral 7B v0.3 as my small model. Would love a new release of Mixtral - I've tried many other models, but there's always something that's important to my task that Mixtral just does better, whether it's creativity, instruction-following accuracy, not outputting superfluous info, handling non-English data, or whatnot. Hope they release a new version at some point like they did with Mistral 7B.

    They're also big on new architectures. They were the first to release an open-source MoE. Now they've released an open-source Mamba model (Codestral Mamba, 7B), so it's not even a Transformer.

    To top it all off, they don't use a pseudo-open-source license (like LLaMA, Qwen, etc.), but a true open-source license (Apache 2.0).

    Go Mistral! :)

    • Comment removed based on user account deletion
      • Comment removed based on user account deletion
      • by test321 ( 8891681 ) on Thursday July 18, 2024 @11:46AM (#64635341)

        In certain industries you cannot send data outside because nobody knows whether it will be used in model training. It is either outright illegal to process the data without strong privacy guarantees (medical and financial data) or simply ill-advised and therefore disallowed by the company's legal department (e.g. patent proposals).

      • by Rei ( 128717 )

        So, first off, before going into training I'd get set up and comfortable with local inference. :) This means setting up CUDA, and then an inference program. If you want a GUI and chat interface, I'd recommend text-generation-webui, by Oobabooga. If you want an API, llama.cpp is probably a better option (be sure to compile it with CUDA support; it's disabled by default!). Remember when running models to respect their max context length, otherwise the model won't "see" part of your prompt on long queries.
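
        For example, a minimal sketch using the llama-cpp-python bindings (install them with CUDA enabled per the project README; the GGUF filename below is just a placeholder for whatever quant you download):

        from llama_cpp import Llama

        llm = Llama(
            model_path="mistral-7b-instruct-v0.3.Q4_K_M.gguf",  # placeholder path
            n_ctx=8192,       # stay within the model's max context length
            n_gpu_layers=-1,  # -1 offloads every layer to the GPU
        )

        out = llm("Q: Explain Mixture-of-Experts in one sentence. A:", max_tokens=128)
        print(out["choices"][0]["text"])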

        • by Rei ( 128717 )

          As for what you can do with them, your imagination is the limit. Some people train them for porn. Some people for business tasks (structured output in target formats and the like). Some people train expert models on specific subjects. Some people feed them things having nothing whatsoever to do with text - like, you could teach it to do CFD or protein folding or whatnot. Some people make agent systems that talk together to coordinate more complex tasks. Some people try to create whole new advanced architectures.

    • For a small model, I've found that Phi-3-mini is way better than Mistral 7B even though it only has 3.8B parameters. Haven't tried the small (7B) or medium (14B) versions. I'd expect them to be even better. They're also truly open source (MIT).

      • by Rei ( 128717 )

        I'll admit that I haven't messed with the Phi series since the original Phi-3 Mini release (they now have a new version out that apparently incorporated some of my feedback, good on them!). The original version had a big weakness in the form of insisting on rambling on and adding context (or even moralizing) even when you told it not to, which made it very difficult to use for structured output. While not part of my tasks, there were other people complaining about how they'd give it a storywriting task and even in that it'd break the flow in the middle of the story to go off on asides or moralizing.

        • there were other people complaining about how they'd give it a storywriting task and even in that it'd break the flow in the middle of the story to go off on asides or moralizing.

          You can only expect so much from a tiny model, but it's still far better than Mistral 7B, which is next to useless for storywriting tasks. It's nearly impossible to get it to write a story longer than three paragraphs, and everything comes out as a children's story about nice people where everything is good. Ask it to expand and it just writes a different three-paragraph story. Ask it to add conflict and it adds a few ominous words before everything ends up happily.

  • by gweihir ( 88907 ) on Thursday July 18, 2024 @11:41AM (#64635319)

    Why would I want that? Oh, right, I do not.

  • by monkeyporn ( 663471 ) on Thursday July 18, 2024 @11:43AM (#64635327)

    Minimum requirements: 1x A100 80GB
    The one I could find was listed for $17,200 US.

    • by Anonymous Coward

      Get a quantized version and offload a few layers to CPU and you can use a 3060 12GB (the card that seems to be the sweet spot for price/performance at the moment). You will have to wait a week for llama.cpp to support it (so people can start quantizing), though.
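
      Back-of-envelope: at ~4 bits per weight, 12B parameters is roughly 12e9 x 0.5 bytes ≈ 6 GB, so most of the model fits in 12 GB with room left over for KV cache. A hedged sketch of partial offload via llama-cpp-python (the filename and layer split are guesses you'd tune, not gospel):

      from llama_cpp import Llama

      llm = Llama(
          model_path="mistral-nemo-12b.Q4_K_M.gguf",  # hypothetical quant filename
          n_ctx=4096,
          n_gpu_layers=30,  # keep ~30 layers on the 3060, rest on CPU; lower this on OOM
      )
      print(llm("Hello,", max_tokens=32)["choices"][0]["text"])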

  • Comment removed based on user account deletion
  • by rossdee ( 243626 ) on Thursday July 18, 2024 @01:45PM (#64635613)

    But which Enterprise? NX-01? NCC-1701? NCC-1701-E?

  • It's like a pyramid scheme: you need to keep roping in new people to stop the bubble from bursting.

    I mean, at some point NVIDIA will saturate the cloud data center market with its chips, so it needs to expand to keep selling.

    These models just retch and regurgitate what they contain, trained on data they've been fed. So much for A.I.

  • https://arxiv.org/abs/2406.062... [arxiv.org]

    The future of real-time local generative LMs is out of core (i.e. parameters streamed from slower storage, with the hot subset cached) and sparse. When you look at what PowerInfer-2 can do with models not even trained for activation sparsity, there is a ton of gas in the tank. Local models can be just as big as server models, they just need to be trained to use a very small subset of their parameters during inference, much smaller yet than current MoE models. Also they need to be architecturally suited to be able to retrieve that subset quickly.
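
    To make the "small subset" idea concrete, here is a toy numpy sketch of predictor-gated FFN sparsity (for clarity the "predictor" below just computes the exact scores; real systems like PowerInfer train a cheap low-rank predictor so the dense matmul is never actually done):

    import numpy as np

    d, h, k = 512, 2048, 128            # model dim, FFN hidden dim, neurons kept
    W1 = np.random.randn(h, d) * 0.02   # up-projection (one row per neuron)
    W2 = np.random.randn(d, h) * 0.02   # down-projection (one column per neuron)

    def sparse_ffn(x):
        scores = W1 @ x                                # toy stand-in for the predictor
        idx = np.argpartition(-np.abs(scores), k)[:k]  # top-k "hot" neurons
        a = np.maximum(W1[idx] @ x, 0.0)               # ReLU over the active subset only
        return W2[:, idx] @ a                          # only those columns are touched

    y = sparse_ffn(np.random.randn(d))                 # (512,) output from ~6% of the FFN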

"Consistency requires you to be as ignorant today as you were a year ago." -- Bernard Berenson

Working...