What Happened When Alaska's Court System Tried Answering Questions with an AI Chatbot? (nbcnews.com)
An AI chatbot to answer probate questions from Alaska residents "was supposed to be a three-month project," Aubrie Souza, a consultant with the National Center for State Courts, told NBC News. "We are now at well over a year and three months, but that's all because of the due diligence that was required to get it right."
"With a project like this, we need to be 100% accurate, and that's really difficult with this technology," said Stacey Marz, the administrative director of the Alaska Court System and one of the Alaska Virtual Assistant (AVA) project's leaders... While many local government agencies are experimenting with AI tools for use cases ranging from helping residents apply for a driver's license to speeding up municipal employees' ability to process housing benefits, a recent Deloitte report found that less than 6% of local government practitioners were prioritizing AI as a tool to deliver services. The AVA experience demonstrates the barriers government agencies face in attempting to leverage AI for increased efficiency or better service, including concerns about reliability and trustworthiness in high-stakes contexts, along with questions about the role of human oversight given fast-changing AI systems. These limitations clash with today's rampant AI hype and could help explain larger discrepancies between booming AI investment and limited AI adoption.
The chatbot was developed with Tom Martin, a lawyer/law professor who designs legal AI tools, according to the article. But the project "had to contend with the serious issue of hallucinations, or instances in which AI systems confidently share false or exaggerated information." "We had trouble with hallucinations, regardless of the model, where the chatbot was not supposed to actually use anything outside of its knowledge base," Souza told NBC News. "For example, when we asked it, 'Where do I get legal help?' it would tell you, 'There's a law school in Alaska, and so look at the alumni network.' But there is no law school in Alaska." Martin has worked extensively to ensure the chatbot only references the relevant areas of the Alaska Court System's probate documents rather than conducting wider web searches.
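The constraint Martin describes, answering only from a curated knowledge base, is commonly implemented by retrieving matching passages first and refusing to answer when nothing matches well. A minimal sketch of that pattern (the corpus, the lexical scoring, and the threshold are illustrative assumptions, not details of the actual AVA system):

```python
# Minimal sketch of knowledge-base-restricted answering.
# Corpus, scoring, and threshold are illustrative assumptions,
# not details of the actual AVA implementation.

def score(query, passage):
    """Crude lexical overlap between a query and a passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def answer(query, corpus, threshold=0.3):
    """Answer only from the corpus; refuse when nothing matches.

    In a real system the retrieved passage would be handed to an LLM
    with instructions to use only that context; here we just return
    the best-matching passage or a refusal.
    """
    best = max(corpus, key=lambda p: score(query, p), default=None)
    if best is None or score(query, best) < threshold:
        return "I can only answer from the court's probate materials."
    return best

corpus = [
    "Probate forms must be filed with the Alaska Court System.",
    "A personal representative administers the estate.",
]
print(answer("where do I file probate forms", corpus))
```

A production system would use embedding similarity rather than word overlap, but the refusal branch is the part that prevents the "law school in Alaska" class of answer.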
The article concludes that "what was meant to be a quick, AI-powered leap forward in increasing access to justice has spiraled into a protracted, yearlong journey plagued by false starts and false answers." But the chatbot is now finally scheduled to be launched in late January. "It was just so very labor-intensive to do this," Marz said, despite "all the buzz about generative AI, and everybody saying this is going to revolutionize self-help and democratize access to the courts.
"It's quite a big challenge to actually pull that off."
Ok (Score:5, Interesting)
Re: (Score:2)
good luck trying to call the State of Alaska to the stand.
Re: (Score:2)
depends on the lawyer AI used by the judge dredd.
Re: (Score:2)
I expect we will see a court decide that. Generally, when legal questions are answered in a context that looks trustworthy to a layperson, it will be on the information provider, even with disclaimers in place, unless basically every statement is prefixed by IANAL.
Buckle up buttercup (Score:2)
But AI will never be 100% correct.
Re: (Score:2)
I know that. You know that. Even the AI grifters know that, they just pretend otherwise.
Re:Buckle up buttercup (Score:5, Interesting)
You are correct. The people quoted in this summary seem to believe that hallucinations are a solvable problem for AI, and that they solved it, it just took a lot longer than expected.
Nope, they didn't solve it. Their AI will still hallucinate, and can still be jailbroken too. Their optimism suggests a severe lack of due diligence, despite the extended period they have worked on this. I am imagining some egos are involved that simply cannot admit to failure, especially given what they have spent, but that is going to make it all the more embarrassing (and harmful) when the hallucinations cause real world harm and they take it offline at that time.
On the other hand, maybe they actually DID make an AI chatbot that never hallucinates! What an amazing leap forward for AI tech! This is really going to revolutionize the industry! (But does anyone at all actually believe this? I sure don't.)
Re: (Score:2)
Has any lawyer ever been 100% correct ?
Re: (Score:2)
What about a lawyer who won their first case and then immediately changed profession — would that qualify?
Re: (Score:2)
LOL that might just do it.
Does Trump hallucinate? (Score:3, Insightful)
Is the Justice Department 100% accurate? When the highest levels of government fail your accuracy tests, is it cherry-picking to seize on AI mistakes?
Re:Does Trump hallucinate? (Score:5, Insightful)
Human mistakes are of an entirely different nature and quality than AI 'mistakes'. A human won't accidentally make up facts, cases, or sources. A human won't write summaries of things that don't exist. A human won't accidentally directly contradict a source while citing it. A human is also actually capable of identifying and correcting mistakes, unlike an LLM. Stop with this absurd nonsense that it's okay for LLMs to "make mistakes" because humans also "make mistakes." These things are not the same and you know it.
As for this 100% business, with AI, you'd be lucky to get 60% accuracy. A human with that kind of track record offering legal advice would be arrested.
Re: (Score:3)
So, what are the signs that tell you somebody DOESN'T have TDS?
What test does someone take that tells you that, if they have something to criticise about Trump, it's factual and you should listen?
Re: Does Trump hallucinate? (Score:1)
"A human is also actually capable of identifying and correcting mistakes, unlike an LLM."
Why, then, can you teach an LLM to produce simple ASCII text that renders cleanly on Slashdot by posting its mistaken attempts back to it until it learns?
Are you hallucinating away my actual experience?
Re: (Score:3)
Have you ever met a human? People make up "facts" and references and misremember things, including sources, all the time. There are entirely fields of study related to false memories and people unintentionally sharing false information.
This is a big part of why witness testimony is so unreliable. Even when people want to be helpful they often make up false information or fail to remember facts properly.
Re: (Score:2)
Clearly you're not on social media. And you're not a politician.
I predict this will be short-lived (Score:5, Interesting)
One thing LLM-type chatbots do not do is reliable information supply. There will be hallucinations, misstatements, lies by omission, and eventually they will have to switch it off again, permanently.
Re: (Score:2)
Good luck with that. If they have to prefix every statement with "IANAL", it might not see much use.
Re: (Score:2)
oh no, you figured out the AI grift.
Re: (Score:2)
Or they can relocate this rig to a casino and then just say, "well you were unlucky with that info you got, better luck next time!"
Re: I predict this will be short-lived (Score:1)
Have you considered owning your projections? How many cases where LLM answers are good do you leave out?
Re: (Score:2)
You are confused as to what requirements information production has. As soon as there is a significant level of false statements, it becomes totally useless in most scenarios. You are just too clueless to understand that.
Re: (Score:2, Interesting)
Let's say we live in a fantasy land where LLMs are magically 95% accurate. Would you trust a car that only worked 95% of the time? What about brakes that only stopped your car 95% of the time?
What about legal advice? Would you hire a lawyer that would make up silly nonsense 5% of the time?
Sorry, kid. LLMs just aren't the science fiction fantasy that you want them to be. Your AI girlfriend does not and can not love you. You're not going to have a robot slave. Whatever nonsense it is that you're hoping
Re: (Score:2)
Indeed. In most production scenarios, unreliability is a very bad killer.
Re: (Score:2)
Impressive. You completely missed the point.
Re: (Score:2)
the question isn't if it is a tool or not (some question the intent here) but rather, if it is as stated, a flawed tool with a very high cost, why is it "needed" at all.
Re: I predict this will be short-lived (Score:2)
Why is my car mpg calculator not even close to 95% accurate compared to dividing indicated mileage by actual gallons pumped?
Re: (Score:2)
because you are driving an electric car and it decided that math is not important.
Re: (Score:3)
One thing LLM-type chatbots do not do is reliable information supply.
Exactly. For all of the hoopla, it is a stark fact that AI can and will train itself on lies as well as truth. And manually searching the web shows that lies are as common as truth. And even when it's not lies, so much is opinion.
A human in the loop can do better. Even though we are not infallible, especially when dealing with opinion-based "facts", we have a better bullshit filter. AI has none. Which loops us right back to your statement on reliable information supply.
Re: (Score:2)
Have you ever changed the AI's mind by arguing with it? If the AI is wrong why not give it the sane bs filter you use?
No, for the same reason I seldom argue with a person that is bullshitting me.
avoid probate in the first place (Score:5, Informative)
Have beneficiaries listed on IRAs, life insurance, or anything else that allows listing beneficiaries.
Sell your real estate or transfer title (while keeping life estate) or Transfer-on-Death (TOD) to your children soon after you retire.
Sell your cars, camper, etc. or transfer title or Transfer-on-Death (TOD) to your children soon after you retire.
Add Payable-on-Death (POD) to bank accounts.
Need more info just ask AI
Semantics (Score:2)
Re: (Score:2)
Already done, I thought; this is why trusts exist that take over ownership and never die, never having to pay inheritance tax.
Re: (Score:3)
Simple, they used AI to solve the problem. They just asked the chatbot if it had stopped hallucinating. The bot replied in the affirmative. That concluded the project. It's called vibe troubleshooting.
Try solving probate differently (Score:2)
The article and commenters agree: creating an AI to answer probate questions is tricky, unreliable, takes a lot of resources, and can cause serious problems when it gives wrong answers. So how about approaching the problem from a different angle: simplify probate laws.
We all know that's easier to say than to do, but ask yourself if it's easier than creating an accurate and reliable AI. When your people can't understand the laws and the processes they're required to live under, maybe the solution is to change the laws.
Re: (Score:3)
Probate laws are that complicated for a reason. They need to handle all those special cases and exceptions to the general rules, and you can't always just go "Well, we just won't handle them." They were added because they came up at some point and had to be handled; they'll come up again and will need to be handled for the same reasons they needed to be handled before.
Re:Try solving probate differently (Score:5, Insightful)
When that happens you inevitably get moved to a more complex state: in jurisdictions that are serious about precedent, or markets where one implementation gains a commanding lead, whoever winged it most successfully at the time of ambiguity becomes (de facto or de jure) part of the new codification. In cases where it's more of a mixed result, people might end up recognizing two dialects of a protocol, or there will be a 'test' named after whatever judge pulled it out of nowhere because it sounded good, which you then say you are applying in future cases to choose which of the uncodified behaviors to go with in a given instance. In some cases it remains more or less unsettled, the outcome is basically a surprise over and over, and then the codification is basically that you just wing it; which is not ideal.
This is, of course, not to say that all complexity is created equal: the line between 'flabby' and 'parsimonious' is much more subjective than between 'internally consistent' and 'overdetermined'; but there usually is at least a gradient if not a bright line. What gets extra tricky, though, is that law codes (more than some other types of spec) are something that you need to write both for everyone and to cover everyone.
It's basically fine that AS15531 or A478-95a (2019) are not really terribly accessible light reading. If you are dealing with now-aging military avionics or stainless steel cables those may well be your problems; but there's not a real sense of societal injustice in the fact that most people just want their aircraft flying and their wire ropes not snapping; so you have the luxury of nerding out however much your circle of professional specialists thinks is required by the problem and mandating accordingly. Something like probate law is going to end up happening to basically everybody, so the idea that it is impenetrable to the layman seems troublesome; but, because it happens to everybody, it's also not necessarily easy or simple to identify the equivalent of the 1040EZ case: maybe it's super boring and a guy in good health and generally agreed sound mind writes a straightforward will and then gets hit by a truck the next day. Or maybe some dementia patient's declining years see a fight between their children and, hey, look at that, now we need a section on how forensic psychiatry will assess 'undue influence' in the context of whether you helped grandma with that will or whether you strong-armed a feeble old lady while she was in your care, like the sibling you don't get on with alleges. That sounds simple and accessible; and not at all like something that will either be completely impenetrable or fairly overtly allow a judge to just spitball it based on whether he hears the dispute before or after lunch and which of the potential heirs looks more punchable.
None of this is to say that Alaska's probate system is not a nightmare accretion, that seems most likely; but it's probably a nightmare accretion with more parts that are actually load bearing than it appears; and possibly one that doesn't have a structurally sound variant that is also simple (especially in potentially adversarial contexts, like probate law, where one of the fairly common instances is "it's as simple as what this will says" v. "actually, there's a complication"; and therefore rules for both what actual complications count and how they work, in addition to 'here's how you read a low complexity uncontested will').
I suspect that it is broadly accurate; in a sense. (Score:2)
citation please (Score:2)
why not ask the ai chat bot to give a source for the answer?
Re: (Score:2)
That's one of the methods. The chatbot can access a database (e.g. of court cases) and is instructed to only answer using data fetched from the database. Hallucinations come from when the model needs to improvise: if the knowledge is in the model, it will "hallucinate" a correct answer; otherwise, a plausible but incorrect one. If you fetch the case from the database and add it to the input, the model draws on the case text in the input rather than generating/hallucinating from its weights, so it doesn't make things up.
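The fetch-and-cite step this comment describes can be sketched in a few lines. A minimal illustration, where the case ids, record text, and prompt wording are all hypothetical placeholders:

```python
# Sketch of grounding an answer in a fetched record so the model can
# cite its source instead of improvising. The case ids, record text,
# and prompt wording are illustrative assumptions.

CASES = {
    "3AN-24-00123PR": "Order appointing a personal representative ...",
    "3AN-24-00456PR": "Order admitting a will to informal probate ...",
}

def build_grounded_prompt(question, case_id):
    """Fetch the cited record and place it in the prompt, with an
    explicit instruction to answer only from it and name the source."""
    record = CASES.get(case_id)
    if record is None:
        raise KeyError(f"unknown case id: {case_id}")
    return (
        "Answer using ONLY the record below, and cite its id.\n"
        f"Record {case_id}: {record}\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt("Who was appointed?", "3AN-24-00123PR")
```

Because the record id travels with the prompt, the answer can be asked to quote it back, which gives the reader something concrete to verify.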
Deloitte, eh? (Score:2)
Any word on whether that report had to be corrected after the embarrassing discovery of bot slop in it, as a number of other Deloitte gems recently have? They insisted that the case in Australia was on the up-and-up, though not so insistently as to refuse to refund some of the $290k they took for the job; not sure what the final outcome of their fine work in Canada [theindependent.ca] ended up being.
Often research critical information with Copilot (Score:3)
"using lmstudio how can I prepare the context. In other words, I want a standard document which lists rules I want the llms to follow when answering my questions. For example, I don't want it to provide any information without also providing reference links. I often get responses like "There is a research paper named..." and I want the link to the paper and don't want to search for it."
The response it provides is long and detailed. It's really quite good. If you follow the steps, it's really much more reliable than getting constant hallucinations.
If you want it to work like a champion, then ask it
"Is there a way to keep an llm up to date? It would be amazing if I could tell the llm that later today I intend to ask it more information on a specific topic. Do some research while I'm gone. And then the llm would search the internet. It would be even cooler if it could chat on message forums and then check the answers for validity afterwards"
Which will help you set up a RAG.
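The "standard rules document" idea above amounts to prepending a standing system message to every query. A minimal sketch, where the rule text is illustrative and the message shape is the OpenAI-style chat format that most local servers (LM Studio's included) accept:

```python
# Sketch of a standing "context rules" document prepended to every
# question. The rule wording is an illustrative assumption; the
# role/content message shape is the common OpenAI-compatible format.

RULES = """\
1. Cite a reference link for every factual claim.
2. If no reference is available, say so instead of guessing.
3. Never mention papers or sources you cannot link."""

def with_rules(question):
    """Wrap a question in a chat-style message list, carrying the
    rules as a system message so they apply to every exchange."""
    return [
        {"role": "system", "content": RULES},
        {"role": "user", "content": question},
    ]

messages = with_rules("Summarize recent work on RAG evaluation.")
```

The rules travel with every request, so the model is re-reminded of the citation requirement even in a long session.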
It sounds like Lawyer dude started his project way too early. And I get it, after all, if he didn't rush in and start early someone else would have. It also sounds like he probably got to the point where the customer expected it to provide better results and if he didn't reach version 1.0 pretty soon, not only would they ditch him, but it would probably slam the doors shut for everyone else. And finally, he probably chased a rabbit down the wrong rabbit hole for far too long and delivered a shit product.
I think to run a project like this, if I were starting today, I would talk with Google (I'd prefer Ali these days, but the whole US government/China thing is an issue), and I'd ask to license Gemma for the base of my own LLM and then extend on that. After all, training your own model from scratch is not only insanely expensive, it's also impressively stupid. Let someone else waste a few gazillion GPU hours to lay down the base weights and deal with all the other training annoyances.
But, again, he sounds like he did a great job suckering some investors into giving him money and now he's trying to convince everyone else that it's not worth their effort to make a competing project because it's really hard to do.
Honestly, cutting a deal with ANY of the mainstream LLMs and uploading the entire legal library of Alaska as RAG data and creating a context rule document which would constrain the answers provided to verifiable fact with linked references would have been far cheaper and far more effective.
Of course, at the current rate of progress of LLMs, I expect by 2030, there won't even be a need for RAGs regarding things like legal references. But this might end up only being possible on Chinese computing systems since OpenAI just killed all western AI research. After all, we spent $32,000 a card on 340 H200 cards last year. They have 141GB each. This is way too small to run decent LLMs on current generation tech. I speculate that we'll see a breaking point closer to 512GB. And I don't think we'll see 512GB from anyone but the Chinese until there are A LOT more RAM factories up and running.
The technical debt is going to be huge (Score:1)
Lawyers and martians (Score:2)
Missing: "So we decided to bring back the humans" (Score:2)
"...to staff the phone lines just like we used to, so that our citizens will continue to have access to accurate advice."
I kept waiting for them to say that! Given all the stated difficulties, the need for constant vigilance and updates, and the risk of harm to citizens.
dun rite (Score:2)
If you want something dun rite, don't give a high tech project to people with average or below intelligence.
"Everybody" (Score:2)
By "everybody" who do you think he means? 5% of programmers? 10%? I think most people would not predict a project like this would work well.
Nothing against trying radical things, but know when you're doing that.