
What Happened When Alaska's Court System Tried Answering Questions with an AI Chatbot? (nbcnews.com)

An AI chatbot to answer probate questions from Alaska residents "was supposed to be a three-month project," Aubrie Souza, a consultant with the National Center for State Courts, told NBC News. "We are now at well over a year and three months, but that's all because of the due diligence that was required to get it right." "With a project like this, we need to be 100% accurate, and that's really difficult with this technology," said Stacey Marz, the administrative director of the Alaska Court System and one of the Alaska Virtual Assistant (AVA) project's leaders...

While many local government agencies are experimenting with AI tools for use cases ranging from helping residents apply for a driver's license to speeding up municipal employees' processing of housing benefits, a recent Deloitte report found that less than 6% of local government practitioners were prioritizing AI as a tool for delivering services. The AVA experience demonstrates the barriers government agencies face in trying to leverage AI for greater efficiency or better service, including concerns about reliability and trustworthiness in high-stakes contexts, along with questions about the role of human oversight of fast-changing AI systems. These limitations clash with today's rampant AI hype and could help explain the wider discrepancy between booming AI investment and limited AI adoption.
The chatbot was developed with Tom Martin, a lawyer/law professor who designs legal AI tools, according to the article. But the project "had to contend with the serious issue of hallucinations, or instances in which AI systems confidently share false or exaggerated information." "We had trouble with hallucinations, regardless of the model, where the chatbot was not supposed to actually use anything outside of its knowledge base," Souza told NBC News. "For example, when we asked it, 'Where do I get legal help?' it would tell you, 'There's a law school in Alaska, and so look at the alumni network.' But there is no law school in Alaska." Martin has worked extensively to ensure the chatbot only references the relevant areas of the Alaska Court System's probate documents rather than conducting wider web searches.
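The article doesn't spell out how that restriction is implemented, but a common way to keep a chatbot inside a fixed document set is an out-of-scope gate: embed the incoming question, compare it against the indexed corpus, and refuse before the model is ever invoked if nothing is close enough. A minimal sketch in Python; the corpus entries, embedding model, and 0.5 threshold are illustrative assumptions, not details from the AVA project:

    # Hypothetical out-of-scope gate: refuse before calling the LLM unless
    # the question is close to something in the indexed probate corpus.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # A real deployment would index the court's actual probate documents;
    # these entries are made-up stand-ins.
    corpus = [
        "How to file an informal probate petition in Alaska",
        "Small estate affidavit requirements",
        "Duties of a personal representative of an estate",
    ]
    corpus_vecs = encoder.encode(corpus, normalize_embeddings=True)

    def in_scope(question: str, threshold: float = 0.5):
        """Return the best-matching document, or None if nothing is close enough."""
        q_vec = encoder.encode([question], normalize_embeddings=True)[0]
        sims = corpus_vecs @ q_vec  # cosine similarity; vectors are normalized
        best = int(np.argmax(sims))
        return corpus[best] if sims[best] >= threshold else None

    hit = in_scope("Where do I get legal help with a probate case?")
    print(hit or "I can only answer questions covered by the court's probate guides.")

A gate like this only filters questions; keeping the answer itself grounded takes retrieval on top of it, which is why the "law school in Alaska" answer was the harder failure to stamp out.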
The article concludes that "what was meant to be a quick, AI-powered leap forward in increasing access to justice has spiraled into a protracted, yearlong journey plagued by false starts and false answers." But the chatbot is now finally scheduled to be launched in late January. "It was just so very labor-intensive to do this," Marz said, despite "all the buzz about generative AI, and everybody saying this is going to revolutionize self-help and democratize access to the courts... It's quite a big challenge to actually pull that off."
  • Ok (Score:5, Interesting)

    by liqu1d ( 4349325 ) on Saturday January 03, 2026 @10:51PM (#65900293)
    So when it fucks up (and it will!) who's at fault? Does the person relying on said information eat the consequences or the provider of the information?
    • Who would be at fault if an employee of the State of Alaska gave them bad information?
    • by gweihir ( 88907 )

      I expect we will see a court decide that. Generally, when legal questions are answered in a context that looks trustworthy to a layperson, it will be on the information provider, even with disclaimers in place, unless basically every statement is prefixed by IANAL.

    • It is government! Who is responsible? "No One!"
  • But AI will never be 100% correct.

    • by narcc ( 412956 )

      I know that. You know that. Even the AI grifters know that, they just pretend otherwise.

    • by Brain-Fu ( 1274756 ) on Sunday January 04, 2026 @12:13AM (#65900397) Homepage Journal

      You are correct. The people quoted in this summary seem to believe that hallucinations are a solvable problem for AI, and that they solved it; it just took a lot longer than expected.

      Nope, they didn't solve it. Their AI will still hallucinate, and it can still be jailbroken. Their optimism suggests a severe lack of due diligence, despite the extended period they have worked on this. I imagine some egos are involved that simply cannot admit failure, especially given what they have spent, but that will make it all the more embarrassing (and harmful) when the hallucinations cause real-world harm and they have to take it offline.

      On the other hand, maybe they actually DID make an AI chatbot that never hallucinates! What an amazing leap forward for AI tech! This is really going to revolutionize the industry! (But does anyone at all actually believe this? I sure don't.)

    • by Archfeld ( 6757 )

      Has any lawyer ever been 100% correct?

  • by blue trane ( 110704 ) on Saturday January 03, 2026 @11:04PM (#65900307) Homepage Journal

    Is the Justice Department 100% accurate? When the highest levels of government fail your accuracy tests, is it cherry-picking to seize on AI mistakes?

    • by narcc ( 412956 ) on Sunday January 04, 2026 @12:24AM (#65900411) Journal

      Human mistakes are of an entirely different nature and quality than AI 'mistakes'. A human won't accidentally make up facts, cases, or sources. A human won't write summaries of things that don't exist. A human won't accidentally directly contradict a source while citing it. A human is also actually capable of identifying and correcting mistakes, unlike an LLM. Stop with this absurd nonsense that it's okay for LLMs to "make mistakes" because humans also "make mistakes." These things are not the same and you know it.

      As for this 100% business, with AI, you'd be lucky to get 60% accuracy. A human with that kind of track record offering legal advice would be arrested.

      • > Human mistakes are of an entirely different nature and quality than AI 'mistakes'. A human won't accidentally make up facts, cases, or sources.

        Actually, yes, they do sometimes. The big difference between a human and an AI is that humans are accountable. If a human messes up, they get fired and there is remediation. If an AI messes up, we say the technology is maturing, good luck.
      • "A human is also actually capable of identifying and correcting mistakes, unlike an LLM."

        Then why can I teach an LLM to produce simple ASCII text that renders cleanly on Slashdot by posting its mistaken attempts back to it until it learns?

        Are you hallucinating away my actual experience?

      • > " A human won't accidentally make up facts, cases, or sources."

        Have you ever met a human? People make up "facts" and references and misremember things, including sources, all the time. There are entire fields of study related to false memories and people unintentionally sharing false information.

        This is a big part of why witness testimony is so unreliable. Even when people want to be helpful they often make up false information or fail to remember facts properly.
      • > A human won't accidentally make up facts, cases, or sources. A human won't write summaries of things that don't exist. A human won't accidentally directly contradict a source while citing it.

        Clearly you're not on social media. And you're not a politician.

  • by gweihir ( 88907 ) on Saturday January 03, 2026 @11:23PM (#65900323)

    One thing LLM-type chatbots do not do is reliable information supply. There will be hallucinations, misstatements, lies by omission, and eventually they will have to switch it off again, permanently.

    • by kmoser ( 1469707 )
      Or leave it running forever, and just add a disclaimer that information provided by the chatbot is not intended to be accurate, use at your own risk, caveat emptor. Problem solved!
      • by gweihir ( 88907 )

        Good luck with that. If they have to prefix every statement with "IANAL", they might not see much use.

      • Re: (Score:3, Insightful)

        by stabiesoft ( 733417 )
        If that is the "best" that Alaska can do, then a much cheaper alternative would be to direct people to google or bing for the answer. Zero cost, and possibly more reliable as the answer would include the source citations.
      • Or they can relocate this rig to a casino and then just say, "well you were unlucky with that info you got, better luck next time!"

    • Have you considered owning your projections? How many cases where LLM answers are good do you leave out?

      • by gweihir ( 88907 )

        You are confused about the requirements that information provision has to meet. As soon as there is a significant level of false statements, it becomes totally useless in most scenarios. You are just too clueless to understand that.

      • Re: (Score:2, Interesting)

        by narcc ( 412956 )

        Let's say we live in a fantasy land where LLMs are magically 95% accurate. Would you trust a car that only worked 95% of the time? What about brakes that only stopped your car 95% of the time?

        What about legal advice? Would you hire a lawyer that would make up silly nonsense 5% of the time?

        Sorry, kid. LLMs just aren't the science fiction fantasy that you want them to be. Your AI girlfriend does not and can not love you. You're not going to have a robot slave. Whatever nonsense it is that you're hoping
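        For what it's worth, even a generous 95% figure compounds badly across a session; this is pure arithmetic, not a claim about any particular model:

          # Probability that every answer in a session is correct at 95% per answer.
          p = 0.95
          for n in (1, 5, 10, 20):
              print(f"{n:2d} questions: {p ** n:.0%} chance of an error-free session")
          # At 20 questions that's ~36%, i.e. most sessions hit at least one bad answer.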

    • One thing LLM-type chatbots do not do is reliable information supply.

      Exactly. For all of the hoopla, it is a stark fact that AI can and will train itself on lies as well as truth. And manually searching the web shows that lies are as common as truth. And even when it isn't lies, so much of it is opinion.

      A human in the loop can do better, though even then we are not infallible, especially when dealing with opinion-based "facts". But we have a better bullshit filter. AI has none. Which loops us right back to your statement on reliable information supply.

  • by oumuamua ( 6173784 ) on Saturday January 03, 2026 @11:55PM (#65900371)
    You want to avoid probate to save thousands of dollars in lawyer costs and months of delays before your children get their inheritance.
    Have beneficiaries listed on IRAs, life insurance, and anything else that allows listing beneficiaries.
    Sell your real estate, or transfer title (while keeping a life estate) or use Transfer-on-Death (TOD) to your children, soon after you retire.
    Sell your cars, camper, etc., or transfer title or use Transfer-on-Death (TOD) to your children, soon after you retire.
    Add Payable-on-Death (POD) designations to bank accounts.

    Need more info? Just ask AI... oh wait.
    • Let's just get rid of death. Then there is no need for probate. Call it what you want, but there needs to be some process to resolve all the leftover financial details of a person's life.
      • by zlives ( 2009072 )

        Already done, I thought. This is why trusts exist: they take over ownership and never die, so they never have to pay inheritance tax.

  • The article and commenters agree: creating an AI to answer probate questions is tricky, unreliable, takes a lot of resources, and can cause serious problems when it gives wrong answers. So how about approaching the problem from a different angle: simplify probate laws.

    We all know that's easier to say than to do, but ask yourself if it's easier than creating an accurate and reliable AI. When your people can't understand the laws and the processes they're required to live under, maybe the solution is to change the laws.

    • Probate laws are that complicated for a reason. They need to handle all those special cases and exceptions to the general rules, and you can't always just say, "Well, we just won't handle them." Those cases were added because they came up at some point and had to be handled, and they'll come up again and need to be handled for the same reasons they needed handling before.

    • by fuzzyfuzzyfungus ( 1223518 ) on Sunday January 04, 2026 @01:57AM (#65900511) Journal
      The trouble with simplification isn't merely that it's a pain, but that there's only so much of it you can do without promptly wandering into the delightful world of undefined behavior, where the problem isn't merely that people don't understand what the law or the spec says, but that it doesn't actually address the matter at hand, even if you had an expert to interpret it.

      When that happens you inevitably get moved to a more complex state: in jurisdictions that are serious about precedent, or markets where one implementation gains a commanding lead, whoever winged it most successfully at the moment of ambiguity becomes (de facto or de jure) part of the new codification. In more mixed outcomes, people might end up recognizing two dialects of a protocol, or there will be a 'test' named after whatever judge pulled it out of nowhere because it sounded good, which you then claim to be applying in future cases to choose among the uncodified behaviors in a given instance. In some cases it remains more or less unsettled, the outcome is basically a surprise over and over, and the codification is effectively that you just wing it; which is not ideal.

      This is, of course, not to say that all complexity is created equal: the line between 'flabby' and 'parsimonious' is much more subjective than between 'internally consistent' and 'overdetermined'; but there usually is at least a gradient if not a bright line. What gets extra tricky, though, is that law codes (more than some other types of spec) are something that you need to write both for everyone and to cover everyone.

      It's basically fine that AS15531 or A478-95a (2019) are not terribly accessible light reading. If you are dealing with now-aging military avionics or stainless steel cables, those may well be your problems; but there's no real sense of societal injustice in the fact that most people just want their aircraft flying and their wire ropes not snapping; so you have the luxury of nerding out however much your circle of professional specialists thinks the problem requires, and mandating accordingly. Something like probate law is going to end up happening to basically everybody, so the idea that it is impenetrable to the layman seems troublesome; but, because it happens to everybody, it's also not easy to identify the equivalent of the 1040EZ case: maybe it's super boring and a guy in good health and generally agreed sound mind writes a straightforward will and then gets hit by a truck the next day. Or maybe some dementia patient's declining years see a fight between their children and, hey, look at that, now we need a section on how forensic psychiatry will assess 'undue influence' in the context of whether you helped grandma with that will or whether you strong-armed a feeble old lady while she was in your care, like the sibling you don't get on with alleges. That sounds simple and accessible; and not at all like something that will either be completely impenetrable or fairly overtly allow a judge to just spitball it based on whether he hears the dispute before or after lunch and which of the potential heirs looks more punchable.

      None of this is to say that Alaska's probate system is not a nightmare accretion; that seems most likely. But it's probably a nightmare accretion with more parts that are actually load-bearing than it appears; and possibly one that doesn't have a structurally sound variant that is also simple (especially in potentially adversarial contexts, like probate law, where one of the fairly common disputes is "it's as simple as what this will says" v. "actually, there's a complication"; so you need rules for both which complications count and how they work, in addition to 'here's how you read a low-complexity uncontested will').
  • The details will differ; but it sounds like the critical distinction between people who can afford good counsel and those who cannot is being retained by this exciting new tool.
  • Why not ask the AI chatbot to give a source for the answer?

    • by allo ( 1728082 )

      That's one of the methods. The chatbot can access a database (e.g., of court cases) and is instructed to answer only from data retrieved from the database. Hallucinations happen when the model has to improvise: if the knowledge is in the model, it will "hallucinate" a correct answer; otherwise, a plausible but incorrect one. If you fetch the case from the database and add it to the input, the model relies on the case text in the input rather than generating/hallucinating something from its weights, so it doesn't make things up as easily.
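      A minimal sketch of that retrieve-then-answer pattern; the function, prompt wording, and excerpt text are illustrative, not anything from the Alaska project:

        # Retrieve-then-answer: put fetched source text in the prompt and tell
        # the model to answer only from it, citing the source ids it used.
        def build_grounded_prompt(question: str, passages: list[tuple[str, str]]) -> str:
            """passages: (source_id, text) pairs fetched from the case database."""
            excerpts = "\n\n".join(f"[{sid}] {text}" for sid, text in passages)
            return (
                "Answer using ONLY the excerpts below, citing the [source id] for "
                "every claim. If the excerpts do not answer the question, say you "
                "don't know rather than guessing.\n\n"
                f"Excerpts:\n{excerpts}\n\nQuestion: {question}\nAnswer:"
            )

        # Example text is invented for illustration; it is not actual Alaska law.
        passages = [("probate-guide-3",
                     "A personal representative files an inventory of estate "
                     "assets after appointment by the court.")]
        print(build_grounded_prompt("Who files the estate inventory?", passages))

      The assembled prompt goes to whatever LLM endpoint you use; the citation requirement gives reviewers a way to check each claim against its source.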

  • "Deloitte report found that less than 6% of local government practitioners were prioritizing AI as a tool to deliver services."

    Any word on whether that report had to be corrected after the embarrassing discovery of bot slop in it, as a number of other Deloitte gems recently have been? They insisted that the case in Australia was on the up-and-up, though not so much as to refuse to refund some of the $290k they took for the job; not sure what the final outcome of their fine work in Canada [theindependent.ca] ended up being.
  • This isn't difficult. Ask an LLM; in this case, I ask Gemma running in LM Studio:

    "using lmstudio how can I prepare the context. In otherwords, I want a standard document which lists rules I want the llms to follow when answering my questions. For example, I don't want it to provide any information without also providing reference links. I often get responses like "There is a research paper named..." and I want the link to the paper and don't want to search for it."

    The response it provides is long and detailed. It's really quite good. If you follow the steps, it's really much more reliable than getting constant hallucinations.

    If you want it to work like a champion, then ask it

    "Is there a way to keep an llm up to date? It would be amazing if I could tell the llm that later today I intend to ask it more information on a specific topic. Do some research while I'm gone. And then the llm would search the internet. It would be even cooler if it could chat on message forums and then check the answers for validity afterwards"

    Which will help you set up a RAG pipeline.
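    For what it's worth, a minimal version of that "rules document" setup against LM Studio's local server, which speaks the OpenAI-compatible API (port 1234 by default); the rules text and model name are placeholders for whatever you load:

      # Hypothetical "context rules" document sent as the system message on
      # every request to LM Studio's OpenAI-compatible local server.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

      RULES = """Follow these rules in every answer:
      1. Never state a fact without a reference link to its source.
      2. If you cannot supply a link, say so instead of answering anyway.
      3. Quote sources rather than paraphrasing from memory."""

      resp = client.chat.completions.create(
          model="gemma-2-9b-it",  # placeholder: whatever model is loaded
          messages=[
              {"role": "system", "content": RULES},
              {"role": "user",
               "content": "Name a research paper on court self-help chatbots, with a link."},
          ],
          temperature=0,  # lower temperature reduces, but never eliminates, improvisation
      )
      print(resp.choices[0].message.content)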

    It sounds like Lawyer dude started his project way too early. And I get it, after all, if he didn't rush in and start early someone else would have. It also sounds like he probably got to the point where the customer expected it to provide better results and if he didn't reach version 1.0 pretty soon, not only would they ditch him, but it would probably slam the doors shut for everyone else. And finally, he probably chased a rabbit down the wrong rabbit hole for far too long and delivered a shit product.

    I think to run a project like this, if I were starting today, I would talk with Google (I'd prefer Alibaba these days, but the whole US government/China thing is an issue), and I'd ask to license Gemma as the base of my own LLM and then extend it. After all, training your own model from scratch is not only insanely expensive, it's also impressively stupid. Let someone else burn a few gazillion GPU hours laying down the base weights and dealing with all the other training annoyances.

    But, again, he sounds like he did a great job suckering some investors into giving him money and now he's trying to convince everyone else that it's not worth their effort to make a competing project because it's really hard to do.

    Honestly, cutting a deal with ANY of the mainstream LLMs and uploading the entire legal library of Alaska as RAG data and creating a context rule document which would constrain the answers provided to verifiable fact with linked references would have been far cheaper and far more effective.

    Of course, at the current rate of progress of LLMs, I expect that by 2030 there won't even be a need for RAG for things like legal references. But this might end up only being possible on Chinese computing systems, since OpenAI just killed all western AI research. After all, we spent $32,000 a card on 340 H200 cards last year. They have 141GB each. This is way too small to run decent LLMs on current-generation tech. I speculate that we'll see a breaking point closer to 512GB. And I don't think we'll see 512GB from anyone but the Chinese until there are A LOT more RAM factories up and running.
  • Yet another example of what happens when people use technology before trying to understand it. Not only will it cost more than expected, because of all the infrastructure they will have to set up to check what the LLM produces (which will never be 100% accurate, if that was the dream), but it will also create a huge technical debt, since that infrastructure will depend on a specific version of the LLM that won't be available for very long...
  • Well, I don't know; sure, there may be something to it, but I mostly wonder whether this story was tagged with help from this amazing new chatbot technology (it's currently showing up under Science with the Mars and NASA tags).
  • "...to staff the phone lines just like we used to, so that our citizens will continue to have access to accurate advice."

    I kept waiting for them to say that, given all the stated difficulties, the need for constant vigilance and updates, and the risk of harm to citizens.

  • If you want something dun rite, don't give a high tech project to people with average or below intelligence.

  • "It was just so very labor-intensive to do this," Marz said, despite "all the buzz about generative AI, and everybody saying this is going to revolutionize self-help and democratize access to the courts.

    By "everybody" who do you think he means? 5% of programmers? 10%? I think most people would not predict a project like this would work well.

    Nothing against trying radical things, but know when you're doing that.
