CUDA Proves Nvidia Is a Software Company (wired.com)

Nvidia's real AI moat isn't "a piece of hardware," writes Wired's Sheon Han. It's CUDA: a mature, deeply optimized software ecosystem that keeps machine-learning workloads tied to Nvidia GPUs. An anonymous reader quotes a report from Wired: What sounds like a chemical compound banned by the FDA may be the one true moat in AI. CUDA technically stands for Compute Unified Device Architecture, but much like laser or scuba, no one bothers to expand the acronym; we just say "KOO-duh." So what is this all-important treasure good for? If forced to give a one-word answer: parallelization. Here's a simple example. Let's say we task a machine with filling out a 9x9 multiplication table. Using a computer with a single core, all 81 operations are executed dutifully one by one. But a GPU with nine cores can assign tasks so that each core takes a different column -- one from 1x1 to 1x9, another from 2x1 to 2x9, and so on -- for a ninefold speed gain. Modern GPUs can be even cleverer. For example, if programmed to recognize commutativity -- 7x9 = 9x7 -- they can avoid duplicate work, reducing 81 operations to 45, nearly halving the workload. When a single training run costs a hundred million dollars, every optimization counts.
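
A minimal CUDA sketch of that multiplication-table example (our illustration, not from the article): launch one thread per cell and compute only the upper triangle, letting the host mirror the rest, so the 81 cells cost 45 multiplications.

    #include <cstdio>

    // One thread per cell of the table. Threads below the diagonal
    // (col < row) do nothing; the host mirrors those entries afterward,
    // so the GPU computes only 45 of the 81 products (7x9 == 9x7).
    __global__ void multiplicationTable(int *table, int n) {
        int row = threadIdx.y;
        int col = threadIdx.x;
        if (row < n && col < n && col >= row)
            table[row * n + col] = (row + 1) * (col + 1);
    }

    int main() {
        const int n = 9;
        int *table;
        cudaMallocManaged(&table, n * n * sizeof(int)); // visible to CPU and GPU
        multiplicationTable<<<1, dim3(n, n)>>>(table, n);
        cudaDeviceSynchronize();
        for (int r = 0; r < n; r++)          // mirror the lower triangle
            for (int c = 0; c < r; c++)
                table[r * n + c] = table[c * n + r];
        printf("9 x 7 = %d\n", table[8 * n + 6]); // prints 63
        cudaFree(table);
        return 0;
    }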

Nvidia's GPUs were originally built to render graphics for video games. In the early 2000s, a Stanford PhD student named Ian Buck, who first got into GPUs as a gamer, realized their architecture could be repurposed for general high-performance computing. He created a programming language called Brook, was hired by Nvidia, and, with John Nickolls, led the development of CUDA. If AI ushers in the age of a permanent white-collar underclass and autonomous weapons, just know that it would all be because someone somewhere playing Doom thought a demon's scrotum should jiggle at 60 frames per second. CUDA is not a programming language in itself but a "platform." I use that weasel word because, not unlike how The New York Times is a newspaper that's also a gaming company, CUDA has, over the years, become a nested bundle of software libraries for AI. Each function shaves nanoseconds off single mathematical operations -- added up, they make GPUs, in industry parlance, go brrr.

A modern graphics card is not just a circuit board crammed with chips and memory and fans. It's an elaborate confection of cache hierarchies and specialized units called "tensor cores" and "streaming multiprocessors." In that sense, what chip companies sell is like a professional kitchen, and more cores are akin to more grilling stations. But even a kitchen with 30 grilling stations won't run any faster without a capable head chef deftly assigning tasks -- as CUDA does for GPU cores. To extend the metaphor, hand-tuned CUDA libraries optimized for one matrix operation are the equivalent of kitchen tools designed for a single job and nothing more -- a cherry pitter, a shrimp deveiner -- which are indulgences for home cooks but not if you have 10,000 shrimp guts to yank out. Which brings us back to DeepSeek. Its engineers went below this already deep layer of abstraction to work directly in PTX, a kind of assembly language for Nvidia GPUs. Let's say the task is peeling garlic. An unoptimized GPU would go: "Peel the skin with your fingernails." CUDA can instruct: "Smash the clove with the flat of a knife." PTX lets you dictate every sub-instruction: "Lift the blade 2.35 inches above the cutting board, make it parallel to the clove's equator, and strike downward with your palm at a force of 36.2 newtons."
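
To make the abstraction gap concrete, here is roughly what one line of CUDA becomes in PTX (a hand-simplified sketch; real compiler output adds address arithmetic and register setup):

    // CUDA C++: one line inside a kernel
    c[i] = a[i] + b[i];

    // Roughly the corresponding PTX (simplified illustration):
    //   ld.global.f32  %f1, [%rd1];    // load a[i]
    //   ld.global.f32  %f2, [%rd2];    // load b[i]
    //   add.f32        %f3, %f1, %f2;  // the actual addition
    //   st.global.f32  [%rd3], %f3;    // store c[i]
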
"You can begin to see why CUDA is so valuable to Nvidia -- and so hard for anyone else to touch," writes Han. "Tuning GPU performance is a gnarly problem. You can't just conscript some tender-footed undergrad on Market Street, hand them a Claude Max plan, and expect them to hack GPU kernels. Writing at this level is a grindsome enterprise -- unless you're a cracker-jack programmer at DeepSeek..."

Han goes on to argue that rivals like AMD and Intel offer competitive specs on paper, but their software stacks have struggled with bugs, compatibility issues, and weak adoption. As a result, Nvidia has built an Apple-like moat around AI computing, leaving the industry dependent on its expensive hardware.

Comments:
  • by Luckyo ( 1726890 ) on Monday May 11, 2026 @06:07PM (#66139252)

    AI could bypass this moat by enabling translation to OpenCL.

    Considering just how good AI is at this sort of work once properly trained, I would be surprised if this doesn't happen. Though Nvidia will certainly fight anyone trying to do this to slow it down.

    • Funniest post of the year!
    • by ShanghaiBill ( 739463 ) on Tuesday May 12, 2026 @01:57AM (#66139614)

      I'm not sure why this is modded Funny. It should be modded Insightful.

      Modern AI is pretty good at rewriting CUDA as OpenCL.

      It's not just one click (yet), but AI can do 90% of the work with some human guidance.

      AI can also create a test suite to verify that the translation is correct.
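
      For a sense of how mechanical much of that translation is, here is the same trivial kernel in both dialects (a hand-written sketch, not tool output); the hard part is matching years of performance tuning, not the syntax:

          // CUDA
          __global__ void add(const float *a, const float *b, float *c, int n) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) c[i] = a[i] + b[i];
          }

          // OpenCL: the qualifiers and the index intrinsic change; the body doesn't.
          __kernel void add(__global const float *a, __global const float *b,
                            __global float *c, int n) {
              int i = get_global_id(0);
              if (i < n) c[i] = a[i] + b[i];
          }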

      • It's modded funny because OpenCL is all but dead for new projects. It got weighed down by industry infighting to the point that the big feature of OpenCL 3.0 in 2020 was undoing everything added to the spec after 2011.

        So the idea of using OpenCL as a CUDA replacement, rather than something like ROCm or OneAPI, is funny. It's like rewriting C++ programs to use Pascal.

        • It's like rewriting C++ programs to use Pascal.

          I guess you'd be shocked to learn that that is actually an everyday occurrence. It's usually pronounced "Delphi" instead of "Pascal," though. It's popular in enterprise because it provides deep windoze integration at the same time as portability to mobile OSes.

  • by MIPSPro ( 10156657 ) on Monday May 11, 2026 @06:09PM (#66139254)
    If it's so super-awesome and mind blowing, then just use the current crop of AI to design the next crop and create an open source API or at least something better. What? That's challenging you say? Bah! Nothing is too challenging for AI! Anthropic told me so!
    • I made the same comment about Google's SDK. If AI is so awesome, why not just write a single SDK in a single language and have AI build the others on each push? Then devs can use their preferred language and it already has full first-party support. Seems so simple... and the fact that it isn't being done screams pretty loudly.

      • I know everyone makes the same joke about AI not being able to do it. And anyone who works at the edge of a field knows that the sort of large projects involved can't simply be handed to a swarm of agents to code; some things are, and will stay, out of reach for a while, maybe forever.


        But could any generous soul explain what specifically makes it that hard? What are the challenges? Is it the amount of code that needs to be translated? The secrecy of the
        • by SoftwareArtist ( 1472499 ) on Tuesday May 12, 2026 @12:24AM (#66139578)

          It's not just CUDA itself. AMD has HIP, which is basically a clone of CUDA and works well. But that's just the core pieces: the compiler and runtime. Then there are the higher-level libraries NVIDIA provides for special purposes: cuBLAS for linear algebra, cuSPARSE for sparse matrix operations, cuFFT for Fourier transforms, and so on. AMD has mostly managed to create clones of those too. But then there are all the even more specialized libraries [nvidia.com] NVIDIA has spent years creating. Look over the list to get a sense of just how many and how specialized they are. cuLitho for computational lithography. cuQuantum for quantum computing simulations. nvComp for compression and decompression. And on and on.

          And that's just the ones created by NVIDIA. Then there are the thousands of libraries other people have written with CUDA. In principle they could be ported to HIP for AMD, Metal for Apple, and whatever framework Intel is asking people to use this week. But most of them won't be.
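
          To the parent's point: for the core runtime, porting CUDA to HIP is often little more than renaming, which AMD's hipify tools automate. A rough fragment (assume d_x, h_x, bytes, n, blocks, threads, and kernel are declared as in any CUDA host program):

              // CUDA host code...
              cudaMalloc(&d_x, bytes);
              cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
              kernel<<<blocks, threads>>>(d_x, n);
              cudaFree(d_x);

              // ...and its HIP equivalent: s/cuda/hip/ almost everywhere,
              // with the same <<<>>> launch syntax under hipcc.
              hipMalloc(&d_x, bytes);
              hipMemcpy(d_x, h_x, bytes, hipMemcpyHostToDevice);
              kernel<<<blocks, threads>>>(d_x, n);
              hipFree(d_x);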

        • by gillbates ( 106458 ) on Tuesday May 12, 2026 @11:23AM (#66140066) Homepage Journal

          The biggest problem with replicating CUDA is not the technical aspects, but finding a VC with enough brains to know whom to hire. Most CS grads have the knowledge but not the drive. Most liberal arts grads have the drive and the creativity, but not the knowledge. You need to find someone with both, because creating the next Nvidia killer will require someone who is boring enough to reinvent the wheel but creative enough to find novel solutions to performance problems.

          The computer science and hardware engineering behind the hardware and software (Nvidia/CUDA) have been known for decades. The Nvidia hardware could be replicated with FPGAs - notwithstanding any patents Nvidia might have. The software API could be replicated rather easily; parallelism has been known and studied in computer engineering (again) for decades now. What Nvidia did was political - they provided both the hardware and the API to easily use it in one package which could be understood by the C-Suite class. The challenge was never technical, but marketing.

          More specifically, you'd need to understand how compilers work, and how to use YACC or bison, or something similar to generate the compiler code for you. You'd have to understand digital logic and how to create logic functions with NAND gates. If you see an FPGA development kit, know what it is, and think to yourself, "What I could do with that..." you're probably a good fit for the job. And you'd need someone willing to bankroll your project until you could demonstrate that you beat Nvidia on something marketable - like floating point performance. Or power consumption.

          From an engineering standpoint, what Nvidia has done is trivial - because the solution could be reproduced by an engineer using already known techniques. But what Nvidia did was to combine technical knowledge with an understanding of their market to produce the dominant position they have today. Any computer engineer worth his diploma could produce a design with FPGAs that would beat Nvidia GPUs, but Nvidia did it first.

    • create an open source API ...

      That's what OpenCL is.

      OpenCL [wikipedia.org]

      There's a small performance hit because OpenCL runs on any GPU, whereas CUDA is tuned only for Nvidia GPUs.

    • If it's so super-awesome and mind blowing, then just use the current crop of AI to design the next crop and create an open source API or at least something better.

      That's some Deep Thought there.

  • NVIDIA bought Groq, which is never going to run CUDA well, and Anthropic ported to Google's TPU.

    Unless the software stack is a complete disaster, OpenAI and Anthropic make do; architecture rules. NVIDIA leads in architecture -- no competition for NVLink and C2C in deployment, for instance -- and Groq in its niche competes only with Cerebras.

    Small players are more dependent on open source and more easily manipulated by lazy devs, but the biggest spenders don't give a shit about CUDA.

  • But a GPU with nine cores ...

    Or any number of now-obsolete general-purpose vector-processor systems, like the Cray-2, or even parallel systems like the Myrias Parallel System -- both of which I was an SA on *way* back. Parallel operations can speed up certain types of workloads.

    Vector supercomputers [wikipedia.org]
    Vector processor [wikipedia.org]

  • Cooperate or Die (Score:2, Insightful)

    by Tablizer ( 95088 )

    rivals like AMD and Intel offer competitive specs on paper, but their software stacks have struggled with bugs, compatibility issues, and weak adoption. As a result, Nvidia has built an Apple-like moat around AI computing, leaving the industry dependent on its expensive hardware.

    Nvidia's competitors need to work together to improve open-source software tooling and to standardize hardware interfaces, or else go the way of Commodore and Tandy.

  • added up, they make GPUs, in industry parlance, go brrr.

    The subtle roast of implying that the industry is a bunch of Gen Alpha middle schoolers. Are the graphics also skibidi?

  • by ndsurvivor ( 891239 ) on Monday May 11, 2026 @07:03PM (#66139334) Journal
    I guess the CEO of Nvidia played the long game on AI. They were nothing back in 2012, when they were just a cheap graphics-acceleration chip company, and now they have surpassed Microsoft in market capitalization. They don't seem "evil" to me; it seems like a thoughtful company that worked hard, took a long view, and reaped the rewards. I simply hope they don't catch the billionaire bug and become evil.
    • Agreed (Score:4, Interesting)

      by JBMcB ( 73720 ) on Monday May 11, 2026 @07:27PM (#66139382)
      They spent a lot of time and money making sure CUDA worked right. For a while AMD's compute API wasn't backwards *or* forwards compatible. You had to do some rewriting and a recompile every time a new API was released.

      Intel has gone through three completely different, and mostly incompatible, hardware stacks. Remember Phi? Altera? Now it's AVX for some compute tasks, and Xe for other tasks.
  • by IdanceNmyCar ( 7335658 ) on Monday May 11, 2026 @11:22PM (#66139540)

    I always hate how people take success in isolation. A lot of NVIDIA's success, I think, comes from its original strong partnership with ASUS, a hardware manufacturer. NVIDIA originally did the chip design, and at that level it's hard to ignore the software, especially on the driver front. This means they always had a "low-level" team that understood software issues. Then, when it came to really building out a commodity GPU, they worked with ASUS.

    For years I have been a huge fan of ASUS because, in general, they understand solid hardware design, and NVIDIA's partnership with them is a large part of its success. CUDA is pretty great for the role it has filled in computing, but it also seems like a natural conclusion. As others pointed out, AMD and Intel have both tried their hands at it, but they screwed the pooch in building an effective framework.

    NVIDIA might be getting too cocky -- or maybe it's just the fanboys. Either way, I think they are successful because they had very strong strategic partnerships that let them do what they do best. That important note is so often left out when talking about NVIDIA now.

  • It's just ordinary slang.

  • A successful hardware company incrementally shifts its resources from hardware to software, for financial reasons.
    As the shift progresses, it completely changes its industrial DNA until it reaches a point where hardware is just a relic of the past.
    This is how ... IBM became today's IBM.

  • by Laxator2 ( 973549 ) on Tuesday May 12, 2026 @04:50AM (#66139694)

    AMD bought ATI 20 years ago, back in 2006, to be the first company (or was it VIA first?) to have an integrated CPU-GPU offering.
    They kept on talking about HSA 1.0, which would make it possible to pass pointers between the CPU and the GPU without copying the actual data between main memory and graphics memory.
    I cannot find the original reference, but I still found these:
    https://hothardware.com/news/h... [hothardware.com]
    https://forums.anandtech.com/t... [anandtech.com]

    What happened after that? AMD stopped publishing drivers that would allow running OpenCL on their APUs.
    They actively blocked programmers from accessing the part that provides 99% of the processing power, just as Nvidia was taking off.
    Obviously, everyone working on AI flocked to CUDA, and now that platform is entrenched.
    Look at the HSA Foundation now; its most recent "news" is from 2020:
    https://hsafoundation.com/ [hsafoundation.com]
    I wonder where we would be now if developers had been able to run OpenCL on the APUs of their laptops.
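
    For reference, the zero-copy idea HSA promised (AMD marketed it as hUMA) is roughly what CUDA ships as unified memory. A minimal sketch, with a hypothetical scale kernel:

        #include <cstdio>

        // Hypothetical kernel, for illustration only.
        __global__ void scale(float *data, int n, float s) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) data[i] *= s;
        }

        int main() {
            const int n = 256;
            float *data;
            // One allocation, one pointer, visible to both CPU and GPU --
            // the effect HSA was promising, with no explicit cudaMemcpy.
            cudaMallocManaged(&data, n * sizeof(float));
            for (int i = 0; i < n; i++) data[i] = (float)i; // CPU writes
            scale<<<1, n>>>(data, n, 2.0f);  // GPU reads/writes the same pointer
            cudaDeviceSynchronize();
            printf("data[3] = %.1f\n", data[3]); // prints 6.0
            cudaFree(data);
            return 0;
        }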

  • They hope CUDA stays the lock-in it currently is. But the community is quick to build the required software once promising hardware is there. If AMD built consumer-grade cards with, say, 128 GB of VRAM, the interesting AI software would be adapted within a few weeks.

  • A long time ago, before anyone could spell "cellphone," I worked for a company that sold equipment for testing telephone lines (POTS). To help spur the larger independent phone companies into buying our equipment, we had a full software system for tracking their customers' line history and all that fun stuff. All based on the Motorola 6800 processor. Back when 200MB hard drives ruled the world.

    That company did not consider itself a software company either!

    • by ceoyoyo ( 59147 )

      Tech people love to classify things, including companies, as hardware or software. The really successful companies recognize that neither works without the other, that there are a lot of opportunities that come with making both, and that customers value not having to chase down various suppliers when they have a problem.

  • Is CUDA a lock-in because there is a critical mass of solutions written in CUDA, and of people who already think about problems in terms of CUDA, so nothing will unseat it that isn't a close clone of CUDA, and making such a clone is for some reason impossible? Or is the problem that you could make something else that lets you be as expressive as CUDA while giving the backend the same flexibility to schedule operations, but nobody else has made one that isn't

  • Someone finally said the quiet part out loud. And nVidia's software underpinnings come from Silicon Graphics / SGI back in the day. Can you say, "OpenCUDA"?

  • Thank you Wired for this insightful article about how CUDA is an impressive tool that creates a moat for NVidia's ongoing business success. Congratulations on waking up to the year 2016, when this was already well-known in the world of computing. The only thing interesting about this article appearing in 2026 is CUDA's continued dominance, which was never really assured.
  • “How do we make CUDA optional?” should be the real question, not “How do we build a better CUDA?”
