AI Technology

Researchers Say AI Tool Used in Hospitals Invents Things No One Ever Said (138 comments)

AmiMoJo shares a report: Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers.

Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.

[...] It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Nabla's chief technology officer Martin Raison said.
This discussion has been archived. No new comments can be posted.

  • But seriously, transcription is probably relatively safe. Like summarization, transcription doesn't rely on the LLM to "know" anything except language structure, which is what it's good at.
    • by fuzzyfuzzyfungus ( 1223518 ) on Monday October 28, 2024 @10:51AM (#64899947) Journal
      The problem is that 'knowing' language structure and more or less nothing else is the perfect recipe for apparently plausible, syntactically appropriate nonsense creeping in.

      Traditional speech-to-text is often a bit on the rough side; but it has the 'virtue' (in a sense) of breaking in stupid visible ways if it chokes on a bit of input. You'll get a similar-sounding word that has no business being in that part of a sentence, or a sentence-length or two of total word salad if there's a burst of background noise or a mic level issue or something. It's not a pretty looking failure; but for the same reason it's not all that sneaky. Not as good as a system that gracefully admits that the input is unusable from timestamp A to timestamp B and tells you as much; but a fair way from the exceptionally smooth confabulation you get out of LLMs.
    • by gweihir ( 88907 )

      If MS Teams uses an LLM to generate its transcripts, then no. That is _not_ a safe application. But MS may cause numerous errors in the Teams transcripts using another substandard technology. Does anybody know?

      • But MS may cause numerous errors in the Teams transcripts using another substandard technology.

        Teams already produces transcripts with numerous errors using substandard technology. Unless you speak slowly and use small words, expect to read the transcript a few times to make sure you understand what was said.
        • by gweihir ( 88907 )

          I usually just use it to check the microphone is working when I cannot easily listen in, e.g. because I am the one speaking. If it transcribes complete nonsense, the language setting is wrong or the audio is bad. If it mostly gets it right, the audio is good. But I just last week saw how bad some mistakes are even with good audio.

      • by AvitarX ( 172628 )

        It definitely uses some type of language modeling.

        I've seen parts of a transcript where people were speaking Chinese and its attempts to transcribe it as English (not translate, but interpret the sounds as English words) were significantly better at coherence than autopredict when typing on a phone.

        • by gweihir ( 88907 )

          Hahaha, yes. Or it is set to English and you are speaking German. Complete nonsense. At least as of some weeks ago it tells you that the language setting may be wrong.

    • by XXongo ( 3986865 ) on Monday October 28, 2024 @11:07AM (#64900001) Homepage

      But seriously, transcription is probably relatively safe.

      Sounds plausible, but the actual article we're discussing says otherwise.

      • Re: (Score:2, Insightful)

        by HiThere ( 15173 )

        Transcription itself is safe. It's just that depending on it for anything significant isn't. (And note that the original data has been erased for "data safety reasons".) So if the transcript says to use one drug, and the doctor says he ordered another, there's no way to check.

        This should all be obvious to those who read Slashdot, but folks who are the general public, or even medical professionals, might well not understand the problems.

        • by XXongo ( 3986865 )

          Transcription itself is safe.

          Apparently not.

          It's just that depending on it for anything significant isn't.

          So, it's safe, except when it's not.

      • Read between the lines. The article is saying "X uses Y for transcription. Y is capable of hallucinations." It is not saying "X produces wrong transcriptions."
    • Except didn't we just hear these people claim [slashdot.org] that bullsh***ing I mean "hallucinations" was a solved problem?

      • by Ol Olsoc ( 1175323 ) on Monday October 28, 2024 @11:14AM (#64900033)

        Except didn't we just hear these people claim [slashdot.org] that bullsh***ing I mean "hallucinations" was a solved problem?

        It won't ever be a solved problem of course. At some point, the "truth" will become AI referencing AI, so hallucinations will become whatever AI decides is truth.

        • by dfghjk ( 711126 )

          The post you quoted definitely did NOT say that.

          Also, literally everything an LLM produces is a "hallucination". Some just turn out to be pretty good.

        • This AI behaves more and more like humans... warts and all.
      • by HiThere ( 15173 )

        That won't ever be a solved problem. People do it too. But people have a contextual map of the problem that lets them estimate the likelihood of misinterpretation, and this lets them eliminate lots of possible interpretations of the stuff they heard. (And even so, sometimes they eliminate the things actually heard.) This is one basis of humor.

        • Seems like LLM transcriptions should be able to put text in different colors to indicate different confidence levels, or the like... and that ought to be a priority customer need. But it's better marketing to pretend your product is always 100% able to do the job and 100% certain that it's right, and never draw attention to the possibility of mistakes. So I think part of the problem is humans here, not allowing the machine to say "I can't really make out what you're saying but here's a wild guess."
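
          A minimal plain-text sketch of the confidence-marking idea above. It assumes the transcriber can hand back a probability for each word; that input format, the helper name `flag_uncertain`, the two thresholds, and the sample data are all made up for illustration and are not any real tool's API. Bracket markers stand in for colors.

```python
from typing import List, Tuple


def flag_uncertain(words: List[Tuple[str, float]],
                   low: float = 0.5,
                   high: float = 0.8) -> str:
    """Wrap words in markers according to an assumed per-word probability.

    prob <  low  -> [??word??]  (treat as a wild guess)
    prob <  high -> [?word?]    (worth a second look)
    otherwise    -> word        (left unmarked)
    """
    marked = []
    for word, prob in words:
        if prob < low:
            marked.append(f"[??{word}??]")
        elif prob < high:
            marked.append(f"[?{word}?]")
        else:
            marked.append(word)
    return " ".join(marked)


# Hypothetical (word, probability) pairs; not real transcriber output.
sample = [("take", 0.97), ("two", 0.62), ("hundred", 0.41), ("milligrams", 0.93)]
print(flag_uncertain(sample))  # take [?two?] [??hundred??] milligrams
```

          The thresholds here are arbitrary. A real deployment would have to calibrate them against audio that has been checked by hand, since a model can be confidently wrong.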

      • Except didn't we just hear these people claim [slashdot.org] that bullsh***ing I mean "hallucinations" was a solved problem?

        Clearly that was a meta hallucination, the problem seems to be getting worse...

    • by zekica ( 1953180 )
      They only need to add "do not hallucinate!" to the prompt.
    • by AmiMoJo ( 196126 )

      That seems to be the problem - it hears some speech and then hallucinates extra sentences that seem to fit the structure.

    • by dfghjk ( 711126 )

      LLMs don't "know" anything.

    • by jvkjvk ( 102057 )

      Transcriptions of medical interviews that include hallucinations are *not* "relatively safe" - they are inherently DANGEROUS.

      If you can't see that, I don't know how to help you.

      • If you can't see that, I don't know how to help you.

        Just ask ChatGPT. I'm sure it has an answer.

    • But seriously, transcription is probably relatively safe. Like summarization, transcription doesn't rely on the LLM to "know" anything except language structure, which is what it's good at.

      This is absolutely not true - or at least you don't know what knowing "language structure" actually entails. Listening to audio and matching sounds to phonemes is not sufficient or the transcription problem would have been solved without LLMs. When you add that "language structure" thing that means matching phonemes to potential words and word sequences and using context to determine what choice is correct. And that means that it has to have a representation of what context -- possible meanings -- and that

      • by Holi ( 250190 )

        "Listening to audio and matching sounds to phonemes is not sufficient or the transcription problem would have been solved without LLMs. "

        It was solved, in 1997. And today error rates are lower than human transcribers. No need for an LLM.

    • by Holi ( 250190 )

      Transcription is not summarization though, it is transcribing, word for word. There is no need for any "language structure". Apparently LLMs are not very good at that.

    • But seriously, transcription is probably relatively safe. Like summarization, transcription doesn't rely on the LLM to "know" anything except language structure, which is what it's good at.

      The trouble with transcription is it's insufficient.

      A lot of understanding language isn't just decoding the waveforms, it's matching the sounds to words based on context.

      It's like those videos floating around a few years ago with misheard lyrics for songs [youtube.com]. Once you set the expectation it really sounds like they're singing the other lyrics because the sounds aren't that different.

      That's why transcription needs LLMs to really work properly, it's not just what word it sounds like, it's what word makes sense in

    • by narcc ( 412956 )

      Like summarization, transcription doesn't rely on the LLM to "know" anything except language structure, which is what it's good at.

      LLMs don't "know" anything except "language structure". Summaries are just as likely to contain nonsense as any other LLM output.

      Transcription could be safe, but the approach Whisper uses clearly isn't. It doesn't work by identifying sequences of phonemes and converting those into text like other approaches, it works in a similar way to other encoder-decoder transformer models, with a few tweaks. You can even give it a prompt to guide the output. It is not surprising in the least that it "hallucinates

  • He who controls the past, controls the future. Or so the thinking goes among those who try.
    • Does it save a nickel in operating costs?

      • Save a nickel by randomly inserting racist tirades into unrelated content? No. That's clearly a quote-unquote human being imposing their agenda. But we've always known so-called AI was just going to be a megaphone for human narcissists.
  • by Viol8 ( 599362 ) on Monday October 28, 2024 @10:35AM (#64899891) Homepage

    ... no one really knows in detail how these things actually do what they do. They understand the high level feeding in data and guff about N-dimensional matrices of semantic relations, they understand the low level side of back propagation setting neural weights, but there's that fuzzy in-the-middle part where no one can quite get their head around what's happening. Frankly given these models have ever increasing billions of artificial neurons I wonder if anyone really will.

    • by gweihir ( 88907 ) on Monday October 28, 2024 @11:04AM (#64899993)

      Or to be more precise, while the actual mechanisms are somewhat understood, the training data is generally not understood at all.

      • by HiThere ( 15173 )

        That's important, alright, but you also don't know which part of the training data the AI was paying attention to. There have been examples where it was attending to a time of day code encoded into the photo, that people didn't even see.

        • by gweihir ( 88907 )

          It is both, actually. The base mechanisms are well understood, _but_ what actually happens when they are applied to large training data and the results from training on a large data-set is definitely not well understood. Your date-code example is a good one.

    • by dfghjk ( 711126 )

      People know how they work, it is only the "billions" that makes them beyond simple prediction. Humans created them, they didn't accidentally spring into existence.

  • by JustAnotherOldGuy ( 4145623 ) on Monday October 28, 2024 @10:39AM (#64899905) Journal

    Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments.

    Okay, that's a problem. A serious problem by any standard.

    Nabla's tool erases the original audio for "data safety reasons,"

    And that's a much, much bigger and more serious problem. Without the original how would you even know if anything was changed, added, or removed? Obvious things, sure, but what if a dosage was altered or the results of a biopsy (for example) were reported as "clean" when in fact it was not?

    • It almost sounds to me like the AI generated text is a dubious legal dodge to avoid being responsible for HIPPA compliance.

      Which raises the question of whether they're turning around and selling the (dubiously accurate, hallucinated) medical conversations to advertising partners or something.

      • by gweihir ( 88907 )

        This clearly is to make litigation harder. Avoiding HIPPA compliance may also be a factor. The deletion is in any case clearly malicious.

      • Re: (Score:3, Interesting)

        I doubt it. In my experience nefarious reasons are quite rare despite what we see in the media. More likely it was some uneducated person deciding that the transcriptions take up less data storage so deleting the recordings would save them some money, while thinking out very poorly the results of this particular action.

        Start with assuming someone had lazy logic, which is 9 times out of 10 the fault of something, before you jump to nefarious reasons.

        • That doesn't mesh with "data safety" as a stated rationale. I'm all for Hanlon's razor, but this theory doesn't quite match the data.

        • Artificial intelligence is not able to overcome natural stupidity. AI, in its current form, is not ready for prime time. But dumbass humans lookin' to save a few more pennies will happily let it play in prime time, while declaring great victory over some nefarious imagined foe.

          This has been the only real fear of AI I've had all along. Not that it's going to replace us well. But that it'll be used to replace us poorly. In critical roles. Like hospital administration. Oh well. Not like the uber-rich will use

        • When sending a recording to a third-party transcription service, it makes perfect sense for the service to delete the recording when they're done transcribing it. Why would a doctor ever send the one and only copy to the transcriptionist? I mean, back in the days of dictaphones and physical tapes, maybe. But not in the past 25 years.

          If the transcription is done in-house by a tool it makes a lot less sense to auto-delete the original. Still, the solution is easy enough - don't give the tool the one and o

      • ... HIPPA compliance...

        It's "HIPAA" - Health Insurance Portability and Accountability Act.

    • by wildstoo ( 835450 ) on Monday October 28, 2024 @11:03AM (#64899989)
      "Hey, as far as we know, Dr. Patsy really was recommending ethnic cleansing as an infection control method. Without the audio you'll just have to take our idiotic LLM's word for it."
    • Tee and save the audio stream yourself. It is a no-brainer on any half decent computer system.
      • That's fine if you're running the program on a regular computer, but what if it's on a special purpose appliance, that doesn't even have a CLI?
    • by taustin ( 171655 )

      Nabla's tool erases the original audio for "data safety reasons,"

      And that's a much, much bigger and more serious problem. Without the original how would you even know if anything was changed, added, or removed? Obvious things, sure, but what if a dosage was altered or the results of a biopsy (for example) were reported as "clean" when in fact it was not?

      Any doctor that uses such a system should lose their medical license, and be criminally prosecuted for every bad thing that happens before they get caught.

  • If it would specify the *exact* hospitals that are using these tools so everyone here can avoid those places. The article doesn't mention the names of these facilities so it's not generally useful to the readership that is trying to make an informed decision.
    • If it would specify the *exact* hospitals that are using these tools so everyone here can avoid those places. The article doesn't mention the names of these facilities so it's not generally useful to the readership that is trying to make an informed decision.

      No fear, citizen! Soon insurance will require AI transcripts from any hospital they provide payment to as a safety measure. You and your doctor need not fear. Neither of you will be in charge of the decisions.

  • by MpVpRb ( 1423381 ) on Monday October 28, 2024 @10:46AM (#64899939)

    ...picking up speed.
    LLMs exhibited unexpected emergent behavior. This got the train rolling.
    Investors hopped aboard, and the speed increased. Problem is, investors want profits NOW.
    Early adopters hopped aboard because they needed to convince their investors that they were using "the next big thing" and it allowed them to reduce costs.
    Problem is, AI is a research project that will take years to be really useful and today's offerings suck mightily for real work.
    Expect the crapfest to continue as the hype train continues gaining speed

    • by gweihir ( 88907 )

      Maybe we will get lucky and there will be a rather abrupt and terminal stop: An LLM may be involved in somebody getting killed by malpractice and the hospital responsible gets sued into the ground.

  • LLMs are good at identifying things (Cancer, cars, etc.)

    They are not good at making complex decisions or programming. That needs to be something else on top of LLMs.

    • LLMs are good at identifying things (Cancer, cars, etc.)

      They are not good at making complex decisions or programming. That needs to be something else on top of LLMs.

      And yet we are told by the AI bros that software engineers and programmers will be redundant inside of a few years because soon anybody can be a 'prompt engineer' and create sophisticated software easily using simple written instructions. Are you saying that these peerless geniuses are wrong?

    • No they are not.

      CNNs combined with other neural net forms are good at those tasks.

      LLMs are LANGUAGE models, and are related to structures that are good at transforming language: translation, summarization, etc.

      But media hype and LLM bros decided that any AI or algorithmic technique should be replaced by their bullshit generators.

      "I have a hammer to sell, so everything should be solved by hitting stuff."

  • by Chris Mattern ( 191822 ) on Monday October 28, 2024 @10:58AM (#64899961)

    "It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Nabla's chief technology officer Martin Raison said."

    "That's safety for us, not for you."

  • by RobinH ( 124750 ) on Monday October 28, 2024 @10:59AM (#64899967) Homepage
    My wife knows a psychologist who's using an AI technology to summarize her case notes, which seems like a sensible thing to do until you ask some fundamental questions like, "how do you know if it's accurate?" and "where is the data being stored and processed?" She might be knowledgeable and careful about it, but you just know there are professionals out there who believe AI is "accurate enough" and won't bother to check the results. This is a big problem which is going to take years for professional organizations to even acknowledge and then there will be a big fight over regulating it.
  • Who has been living under a rock here?

    • by MobyDisk ( 75490 )

      It's not an LLM. It's a voice-to-text tool.

      • by gweihir ( 88907 )

        It is by OpenAI and "AI powered". Do you really think it is not an LLM that does the work?

    • Who has been living under a rock here?

      Whisper is not an LLM.

      For people who need reliability: use the big model, take advantage of Whisper confidence scores, use a low temperature, and do a duplicate run of the same samples and compare results for similarity.

      "Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music playing."

      Or don't do any of that and be like these clowns.
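
      The duplicate-run comparison suggested above could be sketched roughly like this. It is only an illustration under stated assumptions: it does not call Whisper at all, `disagreements` is a hypothetical helper, and the two strings are assumed to be transcripts from two separate runs over the same audio. It uses Python's standard `difflib` to align the runs word by word and list the spans where they differ.

```python
import difflib
from typing import List, Tuple


def disagreements(run_a: str, run_b: str) -> Tuple[float, List[Tuple[str, str]]]:
    """Align two transcripts word by word.

    Returns (similarity ratio in [0, 1], list of (text_in_a, text_in_b)
    pairs for every span where the two runs do not match).
    """
    a, b = run_a.split(), run_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means the other run inserted text.
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return matcher.ratio(), diffs


# Made-up example transcripts, standing in for two runs on the same clip.
first = "the patient should take two tablets daily"
second = "the patient should take two tablets daily and also an antibiotic"

ratio, diffs = disagreements(first, second)
print(f"similarity: {ratio:.2f}")
for in_a, in_b in diffs:
    print(f"run A: {in_a!r}  |  run B: {in_b!r}")
```

      Any span that shows up in only one run is a candidate for human review. Agreement between runs is not proof of accuracy, though: two runs can repeat the same fabrication, particularly at low temperature where decoding is close to deterministic.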

  • For entertainment sure, for anything involving real life? WHY?
    • For entertainment sure, for anything involving real life? WHY?

      Because C-suites, which hospitals now answer to thanks to the profitization of the entire health system, are seeing the same dollar signs every other C-suite sees when the AI prophets start talking about all the savings to be had by cutting staff and replacing them with AI. Who cares if it's accurate? It might save money! And there is no greater moral imperative in our universe than saving money!

    • by pz ( 113803 )

      For entertainment sure, for anything involving real life? WHY?

      Because there are cases, well-documented cases, of trained ML systems that match or exceed human-level capacity at certain tasks, like reading radiograms to detect breast cancer. At this point in time, I'd rather trust an ML system than getting a radiogram read by someone who isn't in a major city. In not too long, we won't have humans reading radiograms at all, except in the most zebra of cases.

      But back to the subject at hand: transcription. Having done a similar task of recognizing templated signals in

  • by Some Guy ( 21271 ) on Monday October 28, 2024 @11:05AM (#64899997)

    ...computer software produces erroneous results.

    Let's stop anthropomorphizing these language models please.

    They don't think, they don't reason, they don't "make things up", and they don't hallucinate.

  • Aside from the very valid concerns about accuracy, what kind of idiot names a company NABLA?
  • That alone is a major lawsuit. And if the transcription is incorrect, and results in very bad followup - like amputation when it wasn't called for, or lack of treatment when it was, that's billions and billions.

  • We've had this for about 20 years already with the copy machines. Instead of letting a copy degrade from something nearly unreadable to something even worse they'll make it better. This whole LLM thing is different just in scale. Sure, one might say that's the same difference between a squid and Einstein but I don't see it yet.

  • I have not had it happen with the "advanced mode" version, but as recently as 2 weeks ago, the ChatGPT voice chat tool would sometimes interpret background noise as either a far-east Asian language or short phrases like "Thanks." I used the tool in my car and there is a fair bit of road noise, so if I paused during a conversation, it would think I said "Thanks" and keep replying with things like "You are welcome, I am happy to help!" At least it assumed I said good things?

  • Nabla's tool erases the original audio for "data safety reasons,"

    It's in the manual, indexed under CYA.

  • I have this conversation with my co-workers frequently. Generative AI solutions have no place in science settings. Science is not about making stuff up. We spend a lot of time carefully documenting empirical evidence and our forward decisions are made from the data, there is no room for fabrication of any kind.

  • If not, can they do multiple transcriptions of the same audio? The accurate ones will all be alike while the error ones will all be different from each other and the accurate one.

  • "Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments".

    Presumably in decreasing order of importance?

  • There are companies like Nexidia that have been providing good transcription tools for decades. Why in the world would one choose to use GENERATIVE AI for a task that can be consistently performed algorithmically? Especially for something like Medical Transcription where the impact of a hallucination could be someone's life?

    Because it operates so differently, Generative AI might make a good automated QA tool -- but not a primary transcriber.
  • I'm going to go out on a limb and predict that LLMs will soon:
    commit crimes
    need sleep
    probably watch porn

    Kurzweil said similar in his books... I paraphrase but basically he said, we will use ourselves, our minds, our brains, as the blueprint for Artificial Intelligence becauuuuse it's the best and perhaps only model we know. So I'll just stand on the shoulders of giants and take an obvious leap: these models will exhibit all the foibles of humanity. We have "hallucinations" now... crimes, lies, abusive beha
    • by dfghjk ( 711126 )

      "...So I'll just stand on the shoulders of giants ..."

      That is not what you're doing, you just can't tell the difference.

      " We have "hallucinations" now... crimes, lies, abusive behaviour, self aggrandizement can't be far behind."

      No. "Hallucinations" are precisely how neural networks work, it's not an accident, and it doesn't predict other undesirable behaviors.

      • yoo hoo.. LLM's aren't neural networks..
        also love or hate, Kurzweil has cast a long shadow on modern AI, rather longer than yours... or mine

        Looking forward to the first documented AI crime. You heard it first here.

        Now explain to the class how "Hallucinations are precisely how neural networks work".
        I'll make coffee. Also I'll leave now, talk directly to the class.
  • Nabla's tool erases the original audio for "data safety reasons,"

    What possible "data safety reason" could there be in having the audio of a transcript around?

    This isn't like Biden's interview tape, when the umms, stutters, and pauses would all be endlessly analyzed for competitive reasons.

    No, the only "data safety reason" they have is that it's much safer for them not to have that data so that they can't be found *wrong*.

    There's no way to go back and re-analyze the audio so they never have to admit to a mis

  • Would it be to hide the evidence before people realize your product is shit and doesn't faithfully transcribe anything? By the time they figure it out, management will be RICH and GONE
  • It's bad enough that physicians sometimes write the wrong thing - this happened to me and it screwed up life insurance and it was a dickens to get it fixed - but now we will have AIs saying that I said something I never did? Wow.
  • ChatGPT is equally guilty. It just makes up quotes and makes up who said them. At the same time it won't tell jokes featuring Mo but is happy to tell jokes on Jesus and Buddha. Won't produce any criticism of abortion and so on. The point is, the owners of ChatGPT are already siloing our access to information. If this is the future then I don't like it.
