AI

KPMG Wrote 100-Page Prompt To Build Agentic TaxBot (theregister.com) 89

Professional services firms are engineering AI agents through massive prompt documents to automate complex knowledge work. KPMG Australia developed a 100-page prompt that transforms tax legislation and partner expertise into an agent producing comprehensive tax advice within 24 hours rather than the traditional two-week timeline.

The TaxBot searches distributed internal documents and Australian tax code to generate 25-page draft reports after collecting four to five inputs from tax agents. Chief Digital Officer John Munnelly said the system operates on KPMG Workbench, a global platform combining retrieval-augmented generation with models from OpenAI, Microsoft, Google, Anthropic, and Meta.
This discussion has been archived. No new comments can be posted.

  • by fmarshal ( 5342371 ) on Friday August 22, 2025 @10:06AM (#65607704)
    Can't wait until it recommends something illegal and the customer does it.
    • It takes 24 hours to respond, so I assume an actual employee goes over the 25 page draft and it's the actual employee making the recommendation.

      Will this firm improve or worsen its accuracy percentage? I don't know, but they will be giving 5x the advice.

    • by gweihir ( 88907 )

      You mean like KPMG and the others usually do via human consultants?

      The real problem is that this artificial moron will not know how to hide the criminal things.

      • by kmoser ( 1469707 )
        Don't worry, I'm sure several paragraphs of the prompt say something like "if anything you come up with is illegal, make sure to structure it to seem like an innocent mistake, and to absolve KPMG of all wrongdoing."
    • Recommendation probably gets reviewed by a junior, which means they don't know enough to validate, but then KPMG gets to assign accountability to the human, fire them for "their error," then tweak the bot some more.

      • by Anonymous Coward

        Plausible deniability; being a corporation, they can always push their crimes onto a subset of employees if they properly shield management.

        I knew somebody who worked at KPMG who had a corrupt government official cheating on their US taxes every year-- the firm simply appeased the corrupt, irate customer by transferring the account to a junior to "complete" after the experienced and honest accountant refused to risk JAIL. The crimes get committed either by a corrupt/scared employee or an ignorant one; should t

      • by AvitarX ( 172628 )

        That's not going to help them if they get a reputation for bad advice.

        Brave of them to tank their reputation to train a model. Though maybe it has better results than the junior spending 5 days.

        I would actually think the way this works is it replaces the juniors entirely.

    • Good luck doing a regression test on *that* recommendation!

  • by Brain-Fu ( 1274756 ) on Friday August 22, 2025 @10:06AM (#65607706) Homepage Journal

    The hallucination problem has not been fixed. That means that this tax agent cannot be trusted. The work it produces may be full of inaccuracies that look convincing.

    • by gweihir ( 88907 ) on Friday August 22, 2025 @10:14AM (#65607738)

      The hallucination problem _cannot_ be fixed. It is a fundamental part of the mathematical model. Getting it fixed is about as possible as making water not wet under standard conditions.

      • by davidwr ( 791652 )

        Getting it fixed is about as possible as making water not wet under standard conditions.

        Any accounting firm worth their high fee can define "standard conditions" in a way that will make the client happy. If the client wants water that is not wet under "standard conditions," they can make it happen.

        • by gweihir ( 88907 )

          Sure, you can always cheat and distort the reality of things. Does not change the facts, though.

      • Just as the law (including the tax code) cannot actually be applied using unambiguous logic to all circumstances.
      • by Ksevio ( 865461 )

        Of course it can be fixed, just not as part of a singular model. Have another layer of software that interacts with the model and can do fact-checking.

        • If software simpler than a LLM could do fact checking we wouldn't need LLMs

        • by gweihir ( 88907 )

          No, it cannot. "Fact checking" is not something machines can do at this time, with some very limited exceptions.

          Why do people keep pushing this BS?

          • by Ksevio ( 865461 )

            By fact checking I mean it can verify statements are consistent with reputable sources. That's absolutely something that could be done

            • by gweihir ( 88907 )

              And if that can be done, you can just find that reputable source directly, no LLM needed. You seem to be confused about what LLMs are used for.

              • by Ksevio ( 865461 )

                The LLM can be used to provide sources which can be verified. They're good at finding data among all the sources. It's much easier to verify something than to find it in the first place

      • by ljw1004 ( 764174 )

        The hallucination problem _cannot_ be fixed. It is a fundamental part of the mathematical model.

        I think it can. I've been working on getting an LLM (Claude Sonnet 3.7) to add missing type annotations to python code. When I naively ask it "please add types" then like you said it has about a 60% success rate and 40% hallucination rate as measured by "would an expert human have come up with the same type annotations and did they pass the typechecker".

        But when I have a much more careful use of the LLM, micromanaging what sub-tasks it does, then it has a 70% success rate, and 30% rate of declining because

        • by gweihir ( 88907 )

          You think wrongly. It cannot be fixed and there is _mathematical_ proof for that. As soon as you "fix" fact-checking, an LLM loses all its power.

        • (I got these numbers by spot-checking 200 cases).

          Did you look at those numbers? You got the success rate to improve by 10 percentage points by putting in a lot of effort, but it still fails nearly a third of the time. The whole point of using AI is that it can handle the task on its own, and it cannot.

          So I think hallucination can be solved for some tasks, by the right kind of task-specific micromanagement and feedback loops.

          You cannot ever be sure it's not going to hallucinate you an incorrect answer. You will eventually be bitten. And if it does decline, and you have to go back and change how you're asking it, you're doing the work again.

      • The hallucination problem _cannot_ be fixed. It is a fundamental part of the mathematical model. Getting it fixed is about as possible as making water not wet under standard conditions.

        I can't fix your hallucination problems either, but through training, repetition and reinforcement, we can overcome your fear of the dark even if the notion that there's a there there never goes away.

        We could reduce this to no model of reality can be perfect. I don't need you to be perfect to be useful.

      • by piojo ( 995934 )

        The hallucination problem _cannot_ be fixed. It is a fundamental part of the mathematical model. Getting it fixed is about as possible as making water not wet under standard conditions.

        Is hallucination equivalent to what we do when we remember something wrong or fail to update based on new evidence? I gather your point to be that hallucination and correct output are the same process. What about for humans? Surely this is also true of us, yes?

        • by gweihir ( 88907 )

          That is bullshit. Humans can be completely wrong, but even the most stupid human does some elementary fact-checking. LLMs do not and cannot.

    • by gtall ( 79522 )

      It seems to me a big problem is small hallucinations, ones that easily slip by humans. Those can compound in the reader's head into a big hallucination that will not jump up and down naked on the pages shouting, "Look at me!!"

      • Checking work applies as much to calculator handiwork as it does LLM output. Especially in something with high consequences.

    • How about running the answer through another model for sanity checks?

      • by Hadlock ( 143607 )

        Yes, and this is already done; it's called speculative decoding. It's mostly used to let smaller models write the majority of tokens (common words like "the", "and", "if", etc.), which are then fed to a larger model to check. Checking tokens is way, way faster than generating them.
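The scheme described above can be sketched as a toy greedy variant. Real speculative decoding compares probability distributions and uses rejection sampling; the `draft_model`/`target_model` stubs below are hypothetical stand-ins used only to show the accept-until-mismatch loop:

```python
# Toy sketch of (greedy) speculative decoding.
# A cheap "draft" model proposes k tokens; the expensive "target" model
# verifies them and keeps the longest agreeing prefix, supplying its own
# token at the first disagreement. Both models here are toy stubs.

def draft_model(context):
    # Hypothetical cheap model: predicts the next letter of the alphabet.
    return chr(ord(context[-1]) + 1)

def target_model(context):
    # Hypothetical expensive model: same rule, but diverges after 'd'.
    c = context[-1]
    return 'x' if c == 'd' else chr(ord(c) + 1)

def speculative_step(context, k=4):
    # 1) Draft k tokens with the cheap model.
    draft, ctx = [], context
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx += t
    # 2) Verify: keep draft tokens while the target agrees, then
    #    append the target's own token at the first mismatch.
    accepted, ctx = [], context
    for t in draft:
        want = target_model(ctx)
        if want == t:
            accepted.append(t)
            ctx += t
        else:
            accepted.append(want)  # target overrides the draft token
            break
    return context + ''.join(accepted)

print(speculative_step("a", k=4))  # → "abcdx"
```

Here three drafted tokens are accepted per expensive verification pass, which is where the speedup comes from: verifying a batch of proposed tokens takes one forward pass of the large model instead of one pass per token.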

      • by kmoser ( 1469707 )
        That's like asking one idiot to check another idiot's work.
    That's easy.

      Just have a disclaimer stating that it's on the client if the client follows the advice and it's found to be wrong / illegal.

  • by Mr. Dollar Ton ( 5495648 ) on Friday August 22, 2025 @10:08AM (#65607710)

    How is this an improvement over a specification of the same length implemented by real intelligence instead of a random number generator?

    • by Viol8 ( 599362 )

      Or even just a program of the same length. This nonsense proves that some people have not only drunk the AI kool aid, they've filled a pool with it and gone for a swim.

      Even if their "prompt" is 100% correct - which it almost certainly won't be - the AI can still mess up the output badly. Unfortunately this is what happens when non-programmers think programming is easy now and attempt to do it. I guess they'll find out the hard way that hard problems remain hard no matter what pretty wrapper and s

      • The only credible scenario is that the "tax application" consists entirely of boilerplate and trivially linkable libraries.

        Which must have come into existence before the "agentic" soup and be trivial to assemble.

        How, one wonders.

      • "Even if their "prompt" is 100% correct - which is almost certainly won't be"

        It's worse than that. *You'll never be able to tell if it's correct.* You have no real idea why it's doing what it's doing and you can only judge it by, "Well, it's produced the correct results whenever we've manually checked its output," which is no assurance at all.

    • by gweihir ( 88907 )

      It is cheaper and they will probably still charge an arm and a leg.

    • It's an improvement because this is formulaic and easy for a human to proof-read. Instead of choosing from a boilerplate and typing/pasting/selecting, the human can have the whole thing already typed up and probably with the questionable bits highlighted for more careful analysis.
    • I've worked on projects that went on for months and months with specs much shorter than 100 pages. Assuming the length of the prompt is the equivalent of a carefully crafted and detailed product spec, then a human engineering team could probably do a quicker and better job if they had such detailed requirements before starting on a project.

      So many projects I've been on have had marketing chiming in every week, micromanaging development all along. Most of the micromanaging was necessitated by the initial
    • How is this an improvement over a specification of the same length implemented by real intelligence instead of a random number generator?

      Presumably, it's specifying how to use the tax code rather than a specification of the tax code itself.

      Pros:
      * No changes are needed when the tax code is updated. (You just grab the latest version of the tax code)
      * No need to pay legal experts to analyze the tax code.
      * Interprets the tax code in a consistent fashion.
      * Clients can ask how doing XYZ would impact their taxes later.
      * Should be trivial to adapt for tax codes in other countries.

      Cons:
      * Likely not as accurate as experts
      * Uses more energy

      I think

      • by piojo ( 995934 )

        Uses more energy

        Keep in mind humans run at ~100 watts, and they need to be powered 24 hours a day, not just when working. There are economic and quality issues, but wasting power isn't one of them.

        Caveats: queries do draw much more than 100 watts while running, but you don't need 8 whole hours (a work day) of queries to be equivalent to one 24-hour day (one work day of calories) of human energy output. Training costs much more than queries. And this is a generous analysis because it doesn't consider weekends. And I'm not saying
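The arithmetic above can be made concrete. The ~0.3 Wh per-query figure is an assumed ballpark (published per-query estimates vary widely); the ~100 W human figure is from the parent comment:

```python
# Rough energy comparison: a human's continuous metabolic power
# vs the energy cost of LLM queries.
# WH_PER_QUERY is an assumed ballpark, not a measured value.

HUMAN_WATTS = 100             # continuous metabolic power (~100 W)
HOURS_PER_DAY = 24
human_wh_per_day = HUMAN_WATTS * HOURS_PER_DAY   # 2400 Wh = 2.4 kWh

WH_PER_QUERY = 0.3            # assumed energy per LLM query
queries_per_human_day = human_wh_per_day / WH_PER_QUERY

print(queries_per_human_day)  # → 8000.0 queries per human-day of energy
```

Under those assumptions, one day of a human's metabolic energy buys thousands of queries, which is the parent's point: the comparison turns on economics and quality, not raw energy.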

        • Keep in mind humans run at ~100 watts, and they need to be powered 24 hours a day, not just when working. There are economic and quality issues, but wasting power isn't one of them.

          You're not even wrong. The issue for companies is paying for energy. The energy for humans is included in the wage/salary of the worker. The comparison to be made is the amount of energy that must be paid for. As such, paying for a human worker is in competition with the cost of the energy for running computer systems.

          • by piojo ( 995934 )

            I find it unlikely most companies pay directly for the energy used by employees or LLMs. They pay salaries or licensing fees, respectively, so I don't see why the cost of energy to the company is as relevant as the cost of energy usage on the whole (the way it affects everybody).

            • I don't see why the cost of energy to the company is as relevant as the cost of energy usage on the whole

              Then you fundamentally do not understand how businesses operate.

              • by piojo ( 995934 )

                Were you in such a hurry to respond that you didn't read the following sentence? Businesses care about real costs, not theoretical ones.

  • by RobinH ( 124750 ) on Friday August 22, 2025 @10:10AM (#65607718) Homepage
    It seems to me that information like this belongs in training data, not in a prompt. Certainly it requires a very large context window.
    • by unrtst ( 777550 ) on Friday August 22, 2025 @10:48AM (#65607830)

      Use of RAG for the data seems appropriate, but the context window issues are very real. I found this to be very informative:
      https://github.com/NVIDIA/RULE... [github.com] - "RULER: What’s the Real Context Size of Your Long-Context Language Models?"

      TFA doesn't give anything more specific than "100-page prompt". Very rough estimate on size:
      * 100 pages * about 250-500 words per page = 25,000 - 50,000 words
      * my /usr/share/dict/words is just over 104k words, and is 962K (word frequency being completely ignored here)
      * so roughly 1/4 - 1/2 of that, or 250K - 500K bytes

      The open source model from that list with the largest claimed context size has a 1M context size. HOWEVER, the largest *effective* length is somewhere between 128K - 256K.

      Assuming these are professionals who actually spent significant time on this (as TFS claims), they'll be well aware of these issues and limits. They probably used every bit of the context window that they could feasibly use, as that sounds like it'd be awfully close to the limits of the open source models.

      Anyone happen to know if/what models include the output in the context window? 25 more pages may be really stretching its memory.
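The same back-of-envelope sizing can be redone in tokens, using the common rule-of-thumb ratio of ~0.75 English words per token (an assumption; real counts depend on the tokenizer, which is exactly why "100 pages" is hard to pin down):

```python
# Rough token estimate for a "100-page prompt".
# Ratios are rules of thumb; real counts depend on the model's tokenizer.

PAGES = 100
WORDS_PER_PAGE = (250, 500)   # low and high estimates, as in the parent
WORDS_PER_TOKEN = 0.75        # common English heuristic (assumed)

low = int(PAGES * WORDS_PER_PAGE[0] / WORDS_PER_TOKEN)
high = int(PAGES * WORDS_PER_PAGE[1] / WORDS_PER_TOKEN)
print(low, high)              # → 33333 66666 (~33K - ~66K tokens)
```

At that ratio the prompt lands around 33K-66K tokens, which sits inside the 128K-256K effective windows cited above, though adding retrieved documents and a 25-page output on top narrows the margin considerably.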

      • The way you're analyzing this shows you don't have extensive subject matter expertise. Hint: think in tokens (which vary a bit by model), not bytes.
        • by unrtst ( 777550 )

          The way you're analyzing this shows you don't have extensive subject matter expertise. Hint: think in tokens, which varies a bit by model, not bytes.

          1. How many tokens are in a "100-Page prompt"? Good luck answering that from the info provided.
          2. Did you look at the doc I linked?

          Given those points, what is the issue with this estimation using bytes?

          • I'm familiar with that and other studies related to context length. My observation stands. Do with it as you please.
            • by unrtst ( 777550 )

              Given those points, what is the issue with this estimation using bytes?

              My observation stands. Do with it as you please.

              Gee, thanks for nothing.

              I'll go out on a limb and assume you mean, "yeah, I guess that's fine for this estimate, especially given the limited info we have to go on, but I don't want to admit it." :-)

    • I do see a difference.

      Training data is the background information used to build the model, and is the same for each problem being solved. You don't build a new model, for example, for each application you intend to generate. That's the job of the prompt, or prompts.

      There might be a case for this to be *context* rather than prompt. But the dividing line between context and prompt is very thin, and it's possible to bury prompts within context. For example, in Google's NotebookLM, you can load as context, a co

  • by nealric ( 3647765 ) on Friday August 22, 2025 @10:17AM (#65607748)

    I am a tax attorney and I have worked with KPMG quite a bit. I've never heard of a two-week timeline as a "traditional" timeline for providing tax guidance. The limiting factor in tax tends to be the development of the facts (usually of an in-progress business deal).

    I've also worked with some tax-specific AI tools (not KPMG's). The one I use now uses ChatGPT as a backend. It is useful, but the problem is that it can still get questions completely wrong, so you have to check closely and still read all of the source material it cites. Regular ChatGPT will not consistently cite the law it relies on, so it is close to useless.

    The main barrier to AI in tax is that most businesses will not and cannot give the AI access to its ERP system to train with. Most systems are still too vulnerable to data leakage. The nightmare scenario is that a third party could get the LLM to spit out proprietary non-public financial information. That barrier isn't insurmountable, but current solutions do not provide sufficient comfort.

    There may come a day when most tax compliance is done by AI, but I think we are still some years off before it becomes mainstream. Tax planning will be human-driven for the foreseeable future because pulling the trigger on a particular plan is fundamentally a human judgment call. However, AI would allow the human to dispense with a lot of work and compare different planning options quickly.

    • by davidwr ( 791652 )

      The main barrier to AI in tax is that most businesses will not and cannot give the AI access to its ERP system to train with.

      This shouldn't be hard in principle.

      If a company wants to train an AI using its ERP data, clone the AI first, then put the cloned copy under the control of the company that owns the ERP data.

      In practice, this may be expensive, but in principle, it doesn't seem hard.

    • Any LLM will consistently provide citations (hallucinated or otherwise) if you simply ask it to. IANAL, but I have kicked the hell out of members of the bar with research assistance from various LLMs. Law is easier and more forgiving than my actual profession, namely electronics design and software.

      As a sideline, I'm helping some Stanford students figure out LLM applications for legal settings. The main obstacle is the backwards nature of law firms' document storage and retrieval, followed closely by man
      • The tax-specific LLM implementations I've used will annotate and hotlink to actual statutes/cases from an associated research database. I believe they prevent it from hallucinating by limiting all cites to what it can find in the database.

        I agree that LLMs will increase productivity quite a bit and will render lower-quality practitioners redundant and will make the remaining ones far more efficient. I can't tell you how many hours of my life I have wasted looking for standard contract language that I know is found in some precedent document.

        • I can't tell you how many hours of my life I have wasted looking for standard contract language that I know is found in some precedent document.

          You used to be able to bill for that. Lots of law offices are getting squeezed by larger clients to trim that fat. The whole profession is getting squeezed by other firms' efficiency initiatives, and cutting their own throats in the process. IANAL, so trot out the old reliable Shakespeare quotes. :-)

          As to "kicking the hell out of members of the bar": keep in mind that 1) like all professions, some lawyers are terrible, and 2) even the best lawyer will lose (and should lose) a case if the law/facts are against them. That said, I've come across a lot of very smart but non-legally-educated folks who believe themselves to be much better at understanding law than they actually are. It tends to be a particular problem with engineers precisely because they do tend to be smart and able to figure things out on their own.

          No true Scotsman, eh? But yeah, that's what they all say a week or two before caving. I'm undefeated--something like 10-0--though: were all of those lawyers bad? They sure had nice offices and talked a big game!

          • Oh, I know full well. I'm an in-house lawyer for a large company, which means part of my job is to squeeze law firms. We are generally happy to pay for deep expertise, but most of the law firms make their money by churning and burning the hours of junior lawyers who don't know much. The problem with AI is how do you create those experts if you can't train junior folks?

            Not sure what types of matters you are referring to with your pro se advocacy. Very rare for an individual to have 10 different discrete cont

            • I am well aware of the context in Henry VI--after my decades of bad encounters with members of the legal profession (two of my best friends excepted), I will seize on any justification to throw the (you ;-) ) rascals out.

              I've had a number of disputes where someone had an attorney and I did not (I've never hired one) in civil, business, and family law. I've had others where the beef was with the attorney directly--the last one of those was the founding partner of the firm. Had his grasp of the law been be
  • by SlashbotAgent ( 6477336 ) on Friday August 22, 2025 @10:18AM (#65607750)

    It sounds like they wrote a program with a programming language that produces fuzzy output in copious quantities.

    Good for them, I guess.

  • Two more weeks (Score:4, Interesting)

    by Dan East ( 318230 ) on Friday August 22, 2025 @10:35AM (#65607798) Journal

    Do they spend an additional two weeks then verifying that all that information is correct and not hallucinated, or is it enough that the information, real or fabricated, is simply presented in a well-written manner?

    If a client then acts upon that tax advice and it costs them millions in fines or jail time, is the tax advice company liable? Must not be...

  • "Agents" do stuff for you. "Bots" and "AI" do stuff for you. You don't need the "agentic" part, we get it the "bot" or "AI" can do stuff.
    • by KlomDark ( 6370 )
      "Agentic" is the new way where it runs through multiple AI models for better results. It's a real term; get familiar with it.
  • by argStyopa ( 232550 ) on Friday August 22, 2025 @10:58AM (#65607854) Journal

    "KPMG Australia developed a 100-page prompt that transforms tax legislation and partner expertise into an agent producing comprehensive tax advice within 24 hours rather than the traditional two-week timeline."
    So, certainly they're reducing their FEE for such advice proportionally, yes? I mean, aside from the initial hours (more or less a one-time input), no human time is taken, so what would we be paying their $500/hour rate on, again?

  • how AI is going to make it all simpler, daddy.

    And now we'll get to try to debug this shit when there is *nobody* who really knows how it's working in the first place.

  • A flat tax... It's one sentence long: "20% of all income; no more deductions, ever."

    Bonus, no loopholes. And we'd all be in the same voting boat of lowering taxes versus the class warfare nonsense that happens at the polls now.

    Try it, you'll like it!
    • A flat tax... It's one sentence long: "20% of all income; no more deductions, ever." Bonus, no loopholes. And we'd all be in the same voting boat of lowering taxes versus the class warfare nonsense that happens at the polls now. Try it, you'll like it!

      Define income and a good tax specialist will find the loopholes.

      • by kackle ( 910159 )
        "Income": All that comes in. Buy a building for a million dollars, sell it later for $800k, pay income tax on the $800k. Heck, the tax rate might fall to 15%, or even lower. Extra bonus: People would avoid sloppy spending.
        • "Income": All that comes in. Buy a building for a million dollars, sell it later for $800k, pay income tax on $800k. Heck the tax rate might fall to 15%, or even lower. Extra bonus: People would avoid sloppy spending.

          So I take out an 800K loan, sell the building for $1 and the buyer assumes the loan. No income since it was a loan.

          • by kackle ( 910159 )
            Interesting. Apologies, I didn't know loans could be passed-on in that manner. Maybe they should be outlawed. Doesn't the bank want its money back from the borrower, ever?
            • Interesting. Apologies, I didn't know loans could be passed-on in that manner. Maybe they should be outlawed. Doesn't the bank want its money back from the borrower, ever?

              No worries, and current tax law has various rules to make tax-free transfers harder, as companies did something similar to enable tax-free sales. The bank doesn't really care, as the new owner assumes the loan and it is still backed by the property. My point was that defining income is hard, and as soon as the rules are established, people look for ways to game the system. Eventually you have a system as complicated as today's as loopholes get plugged.

    • Several fallacies. As others have noted, how you define income is not trivial. If it is all the cash you receive, then no one can ever loan money.

      The other issue is that the simplicity is in no way related to a single rate. The rate can be progressive and the method still simple. Don't try to push for one using the other argument. Classic straw man.
      • by kackle ( 910159 )
        Why can't money be loaned? Receiving it back could, arguably, trigger the flat income tax, no?
    • A flat tax... It's one sentence long: "20% of all income; no more deductions, ever."

      Why do you want an inherently regressive tax? By not allowing deductions for necessities you've created a system which harms the poor, who have to spend a larger percentage of their income on those. I don't like it, and I don't understand why you do.

      • by kackle ( 910159 )
        The poor won't pay diddly because they don't make diddly.
        • The poor won't pay diddly because they don't make diddly.

          Yes, that's a good reason why it's pointlessly punitive to tax them so much on necessities. Why are the proponents of a flat tax always such heartless dumbshits?

          • by kackle ( 910159 )
            Define "so much", and why is it pointless to tax them; aren't they part of society?

            I think perceived fairness is an important, underrated attribute that prevents resentment and class warfare, especially in the voting booth. I'll bet we're stuck with "orangey" for 4 years because of it.
  • Me thinks the opposite. But, accounting is the most automate-able job, unless you want to pay even more for the illegal tax dodging service. Either is tax deductible of course. Sometimes I wonder if the laws are somehow bent in favour of certain people...
