Microsoft Unveils AI Model That Understands Image Content, Solves Visual Puzzles (arstechnica.com) 46
Researchers from Microsoft have introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions. From a report: The researchers believe multimodal AI -- which integrates different modes of input such as text, audio, images, and video -- is a key step to building artificial general intelligence (AGI) that can perform general tasks at the level of a human. "Being a basic part of intelligence, multimodal perception is a necessity to achieve artificial general intelligence, in terms of knowledge acquisition and grounding to the real world," the researchers write in their academic paper, Language Is Not All You Need: Aligning Perception with Language Models.
Visual examples from the Kosmos-1 paper show the model analyzing images and answering questions about them, reading text from an image, writing captions for images, and taking a visual IQ test with 22–26 percent accuracy. [...] In this case, Kosmos-1 appears to be purely a Microsoft project, without OpenAI's involvement. The researchers call their creation a "multimodal large language model" (MLLM) because its roots lie in natural language processing, like a text-only LLM, such as ChatGPT. And it shows: For Kosmos-1 to accept image input, the researchers must first translate the image into a special series of tokens (basically text) that the LLM can understand.
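The "translate the image into tokens" step can be illustrated with a minimal sketch. This is not the paper's actual method (Kosmos-1 uses a trained vision encoder); it is a hypothetical toy version of the general idea: cut the image into patches and linearly project each patch into an embedding vector, producing a sequence of "visual tokens" that can be interleaved with text tokens.

```python
import numpy as np

# Hypothetical sketch of image-to-token conversion: split an image into
# fixed-size patches and project each patch to an embedding vector.
# In a real model the projection is learned, not random.
def image_to_tokens(image, patch=16, dim=64, rng=None):
    rng = rng or np.random.default_rng(0)
    h, w, c = image.shape
    proj = rng.standard_normal((patch * patch * c, dim))
    tokens = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            flat = image[y:y + patch, x:x + patch].reshape(-1)
            tokens.append(flat @ proj)  # one embedding per patch
    return np.stack(tokens)

img = np.zeros((32, 32, 3))
visual_tokens = image_to_tokens(img)
print(visual_tokens.shape)  # (4, 64): four 16x16 patches, each a 64-d token
```

The language model then treats these patch embeddings as just another stretch of its input sequence, which is why the researchers can describe the system as an LLM at heart.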
Re: (Score:2)
No more CAPTCHAS now? Then it's GOOD. Really good.
Re: (Score:2)
You do know CAPTCHAs exist for a purpose and that purpose isn't "just to annoy you", right?
Re: M$ BAD!! +eleventy billion insightful (Score:2)
Re: (Score:2)
You do know CAPTCHAs exist for a purpose and that purpose isn't "just to annoy you", right?
No I don't. CAPTCHAs were implemented as an IT fad that only sort-of-worked early on, when the puzzles were easy for humans to solve and impossible for bots. The bots got better at solving them so fast that desperate attempts to keep the idea viable by blurring the figures and adding more random squiggles just made the worst of them impossible for humans and solvable by bots, which is exactly the opposite of what was intended.
Site operators, get off your lazy butts and give us two-factor authentication.
Re: Nope (Score:5, Insightful)
Re: Nope (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
It's not just word correlation or image correlation, it's a model for the substantial attributes, functions and rules of operation of the thing being modeled.
Re: (Score:2)
> Prove to me that you understand things and aren't just responding to stimuli based on a huge neural network
Neural networks and psychotic people "hallucinate". Normal people -- chances are the poster is one -- do not. Hence, he almost certainly understands, in a way different from the way neural networks process stimuli.
Our huge neural network does its thing perhaps -- and perhaps not -- unlike the ML neural network, but there is another quality in our minds that is aware of what the neural network is doing.
Re: (Score:2)
Re: (Score:2)
Neural networks don't "hallucinate". That's nonsense intended to make you think that more is happening than is actually happening. Just like the word "neural" is supposed to make you think of biological brains, even though they're not similar in any way.
Again, there is absolutely nothing like "understanding" happening here. It's just statistics and probability.
I don't think that ChatGPT is conscious, but the case for it being not conscious is surprisingly difficult.
We don't need bad philosophy. We have a complete understanding of the system. You might as well be asking if nighttime is conscious. It's a meaningless question.
Re: (Score:2)
We don't need bad philosophy. We have a complete understanding of the system. You might as well be asking if nighttime is conscious. It's a meaningless question.
We have a pretty poor understanding of these systems, as demonstrated by the fact that we have trouble getting them to output what we want. Note how much trouble OpenAI has had preventing the system from giving out answers they deem controversial or biased or otherwise not what they want. Our understanding of what these systems are doing internally is still quite limited.
Re: (Score:2)
Note how much trouble OpenAI has had preventing the system from giving out answers they deem controversial or biased or otherwise not what they want.
If you had even a basic understanding of these systems, that wouldn't surprise you in the least. This magical thinking of yours would come to a quick and decisive end if you'd just take the time to learn something about the technology.
Re: (Score:2)
Re: (Score:2)
The point was that you asserted that we had a "complete understanding" of these systems. That is very much not true as demonstrated by the inability to control their output effectively.
Indeed. One thing my statistics prof back when I got my CS MA said was "statistics is exceptionally unintuitive" and he was right. The problem is that statistics uses massive amounts of exceptionally shallow correlations, while logical reasoning goes deep but very narrow with a comparably small number of steps (i.e. _implications_). That is why automatons like ChatGPT can simulate relatively simple behaviors, but cannot reason one bit. Statistical models are simply unsuitable for reasoning in general.
At the
Re: (Score:2)
That is very much not true as demonstrated by the inability to control their output effectively.
That's completely absurd. One thing has nothing to do with the other. Just because something is completely understood does not mean we can control it at will. We can have a full and complete understanding of some hash function, for example. Given an input and an output, we can give a full and complete accounting of how and why the output was produced. However, we can't go the other way, start with a desired output and work backwards to find all the possible inputs that will produce that hash.
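The hash-function analogy in the comment above can be made concrete with a short sketch (hypothetical illustration, using Python's standard `hashlib`): computing the forward direction is trivial and fully understood, while inverting it offers no better general strategy than guessing inputs.

```python
import hashlib

# Forward direction: deterministic, fully understood, trivial to compute.
msg = b"hello"
digest = hashlib.sha256(msg).hexdigest()

# Reverse direction: there is no known general method better than trying
# candidate inputs, even though we understand every step of the function.
def brute_force(target, candidates):
    for c in candidates:
        if hashlib.sha256(c).hexdigest() == target:
            return c
    return None

found = brute_force(digest, [b"foo", b"bar", b"hello"])
print(found)  # b'hello' -- found only because it was in our tiny candidate list
```

The point of the sketch is exactly the commenter's: complete understanding of a mechanism does not imply the ability to steer it toward a desired output.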
There is currently a massive number of researchers working on trying to understand what LLMs are doing internally.
You seem
Re: (Score:2)
Re: (Score:2)
Ahem, I claimed we do _not_ have a complete understanding? And I argued we may not be able to get one?
Re: (Score:2)
Re: (Score:2)
Again, there is absolutely nothing like "understanding" happening here. It's just statistics and probability.
Indeed. But Physicalists are quasi-religious fanatical idiots that are deep in delusion and do not understand anything. The fact of the matter is that a smart human being realizes he/she/it is intelligent and can have insight, and this is directly tied to consciousness. The "Eureka!" moment, if you will. (No idea whether dumb humans have that, but I know tons of smart people that all report having experienced this element of human existence...)
Machines, according to current Physics, cannot have consciousness.
Re: (Score:2)
It's good to remember we cannot even define life or measure if something is alive, and consciousness is more subtle than life.
The paper is interesting but is missing a reason for its existence: why are we even asking if LLMs are "conscious"? We don't have a definition or a way to measure it, yet, like life, we know it when we see it. I take it that the real meaning of the question is "are LLMs anything like us" in terms of the mind? The answer is: no, definitely not, just like we know that a robot doll is not alive.
Re: (Score:1)
It's good to remember we cannot even define life or measure if something is alive, and consciousness is more subtle than life.
Indeed. Life is still unknown as to how it works. Physicalist idiots will claim a cell is a mechanistic machine, but that is clearly a conjecture. If it were the case, why have all attempts to create life artificially failed? Unless and until we can create life artificially (and no, a virus is _not_ alive), we do not know. As to consciousness, the current Physics standard model does not have a mechanism for it. In fact you can credibly argue it does pretty clearly say "physical objects do not have identity."
Re: (Score:2)
Agreed, nothing I would add, or take away.
FWIW I have, like many, found Nietzsche's thinking to be instructive in tearing to the ground the delusional ideas you mentioned, even if I am not sold on what he suggests building up as a replacement.
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
You're confused, I can only assume because the word "neural" in the term "neural networks" makes you think that these things are anything at all like brains. They are not. That's silly nonsense. Brains can do many things that neural networks can not do.
Oh, and these AI things are doing a lot less than you believe them to be doing.
No, there is absolutely nothing like understanding here. Do you really need me to explain this yet again? I'm happy to, but it's getting really old.
Re: (Score:2)
Indeed. No idea why some people insist on refusing to see the facts. Anti-vaxers, flat-earthers, physicalist, etc. all believing the most demented nonsense in the face of overwhelming evidence to the contrary.
Re: (Score:2)
You obviously do not have effective intelligence (after all, you just claimed to be an automaton, and these provably cannot be intelligent), so proving that to you is impossible.
Re: (Score:2)
Incidentally, Physicalists are idiots pretty much on the same ignorance level as flat-earthers.
Re: (Score:2)
I'm starting to wonder if it's even worth the effort. It's like arguing with creationists. These people really, really, want to believe that their science fiction fantasy has come true. Some of them want their virtual girlfriend to really love them back. Others want to live until the batteries give out in Ray Kurzweil's promised video game afterlife. Countless other things, I'm sure, I can't keep up with the nonsense.
Maybe once the hype dies down and the limitations become too much to ignore, maybe the
Re: (Score:2)
I'm starting to wonder if it's even worth the effort. It's like arguing with creationists.
Yep. It is quasi-religious fanatical behavior. Flat-earthers, anti-vaxxers, creationists, etc. all being exceptionally sure of their stances in the face of overwhelming evidence to the contrary. These people cannot be reached. They are too deep into their delusion. I think the best we can hope for is them becoming one more bizarre sect when the hype has died down.
Of course, there is one, pretty bad, possibility: It is possible these people all only have a limited form of intelligence that is not capable of understanding.
Where can I fork a github project? (Score:3)
Anybody know where the source code for any of this is? I'd like to take the engine, feed it the Project Gutenberg database, and call it "VictorianAI" -- since that's 99% of the text in Project Gutenberg. I think it might be small enough to fit on a desktop device and just crank away at it for a few months, and it'll provide a better signal-to-noise ratio for how to be human than whatever they fed ChatGPT on.
Re: (Score:2)
https://writings.stephenwolfra... [stephenwolfram.com]
So does this mean the end of Captcha? (Score:4, Funny)
If so, are we going to have to have PGP keysigning parties to form a web of trust where we all certify that each others' keys are owned by a flesh and blood human and nothing else so as to retain anonymity while keeping out AI chatbots from discussions?
Re: So does this mean the end of Captcha? (Score:2)
Re: (Score:2)
No, because it will drown out the content from humans in a wall of spam. All content will be from bots run by people trying to keep me from seeing something, or to sell me something. They will be tailored to make my search for information as fruitless as possible. People will leave, and forums will die, populated by bots talking to themselves. If people want forums, they will need some kind of way to prove they are human. But there's no need to prove ID. Let's keep anonymity.
Re: (Score:1)
No, because it will drown out the content from humans in a wall of spam. All content will be from bots run by people trying to keep me from seeing something, or to sell me something.
Yes. It will be a deluge of targeted misinformation and disinformation. It's inevitable.
Re: (Score:2)
Sounds like twitter.
AI (Score:2)
With "22–26 percent accuracy" it's a no-brainer.
Humans are preparing their own demise (Score:1)
is a key step to building artificial general intelligence (AGI) that can perform general tasks at the level of a human.
And then at the level higher than that of a human. (And that will be done by AI itself).
I need a Link (Score:5, Insightful)
To solve all these damn CAPTCHAs that pop up. I seem to need to do at least 6 of them before it verifies me as not a robot; it'd be a lot easier if I could have a robot do them for me.