Microsoft AI Technology

Microsoft Unveils AI Model That Understands Image Content, Solves Visual Puzzles (arstechnica.com)

Researchers from Microsoft have introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions. From a report: The researchers believe multimodal AI -- which integrates different modes of input such as text, audio, images, and video -- is a key step to building artificial general intelligence (AGI) that can perform general tasks at the level of a human. "Being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world," the researchers write in their academic paper, Language Is Not All You Need: Aligning Perception with Language Models.

Visual examples from the Kosmos-1 paper show the model analyzing images and answering questions about them, reading text from an image, writing captions for images, and taking a visual IQ test with 22–26 percent accuracy. [...] In this case, Kosmos-1 appears to be purely a Microsoft project, without OpenAI's involvement. The researchers call their creation a "multimodal large language model" (MLLM) because its roots lie in natural language processing, like a text-only LLM, such as ChatGPT. And it shows: For Kosmos-1 to accept image input, the researchers must first translate the image into a special series of tokens (basically text) that the LLM can understand.

  • Does anybody know where the source code for any of this is? I'd like to take the engine, feed it the Project Gutenberg database, and call it "VictorianAI", since that's 99% of the text in Project Gutenberg. I think it might be small enough to fit on a desktop machine and just crank away at it for a few months, and it would provide a better signal-to-noise ratio for how to be human than whatever they trained ChatGPT on.

  • by GargamelSpaceman (992546) on Friday March 03, 2023 @01:05PM (#63339173)

    If so, are we going to have to hold PGP keysigning parties to form a web of trust in which we all certify that each other's keys are owned by a flesh-and-blood human and nothing else, so as to retain anonymity while keeping AI chatbots out of discussions?

    • Don't be such an aiophobe. Shouldn't you care about the quality of the comment, not the author?
      • No, because it will drown out the content from humans in a wall of spam. All contents will be from bots run by people trying to keep me from seeing something, sell me something. They will be tailored to make my search for information as fruitless as possible. People will leave, and forums will die, populated by bots talking to themselves. If people want forums, they will need some way to prove they are human. But there's no need to prove identity. Let's keep anonymity.

        • No, because it will drown out the content from humans in a wall of spam. All contents will be from bots run by people trying to keep me from seeing something, sell me something.

          Yes. It will be a deluge of targeted misinformation and disinformation. It's inevitable.

        • by narcc ( 412956 )

          Sounds like Twitter.

  • by awwshit ( 6214476 )

    With "22–26 percent accuracy", it's a no-brainer.

  • is a key step to building artificial general intelligence (AGI) that can perform general tasks at the level of a human.

    And then at a level higher than that of a human. (And that will be done by AI itself.)

  • I need a Link (Score:5, Insightful)

    by Lando (9348) <lando2+slashNO@SPAMgmail.com> on Friday March 03, 2023 @03:33PM (#63339583)

    To solve all these damn CAPTCHAs that pop up. I seem to need to do at least six of them before it verifies me as not a robot; it would be a lot easier if I could have a robot do them for me.
