Japan Technology

ARM-Based Japanese Supercomputer is Now the Fastest in the World (theverge.com) 72

A Japanese supercomputer has taken the top spot in the twice-yearly Top500 supercomputer speed ranking. Fugaku, a computer in Kobe co-developed by Riken and Fujitsu, makes use of Fujitsu's 48-core A64FX system-on-chip. It's the first time a computer based on ARM processors has topped the list. From a report: Fugaku turned in a Top500 HPL result of 415.5 petaflops, 2.8 times as fast as IBM's Summit, the nearest competitor. Fugaku also attained top spots in other rankings that test computers on different workloads, including Graph 500, HPL-AI, and HPCG; no previous supercomputer has ever led all four rankings at once. While the top supercomputer rankings normally bounce between American- and Chinese-made systems, this is the first Japanese system to lead the Top500 in nine years, the last being Fugaku's predecessor, Riken's K computer. Overall there are 226 Chinese supercomputers on the list, 114 from America, and 30 from Japan. US-based systems contribute the most aggregate performance, with 644 petaflops.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    Hurr durr ARM chips are slow

    • by vlad30 ( 44644 )
      What's interesting is that this machine has 7.2M cores vs. 2.4M for the IBM machine and uses 28 MW vs. 10 MW, so the speedup is roughly on par core for core. Not bad for an ARM design versus a POWER architecture built specifically for supercomputers.
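
      A quick Python sketch of the arithmetic behind the core-for-core claim above, using only the round figures quoted in this thread (415.5 PFLOPS, 2.8x the runner-up, ~7.2M vs. ~2.4M cores); rough numbers in, rough estimate out:

        # Back-of-the-envelope core-for-core comparison from the figures in this thread.
        fugaku_pflops = 415.5
        summit_pflops = fugaku_pflops / 2.8       # "2.8 times as fast as IBM's Summit"

        fugaku_cores = 7.2e6
        summit_cores = 2.4e6

        fugaku_per_core = fugaku_pflops / fugaku_cores    # PFLOPS per core
        summit_per_core = summit_pflops / summit_cores

        print(f"Per-core ratio (Fugaku / Summit): {fugaku_per_core / summit_per_core:.2f}")
        # Prints ~0.93, i.e. Fugaku's cores come out roughly 7% behind on this metric.
        # Note that most of Summit's FLOPS come from its GPUs (as a reply below points out),
        # so "per core" is a loose comparison here.
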
      • I do wonder why these computers are worthwhile, in the era of distributed computing with near-infinite cloud resources at our disposal, and competition driving on-demand compute down to unbelievably low costs. When you need to solve a problem in a given time, you can spin up node after node until you have enough performance to get a solution.
        • by Mr307 ( 49185 )

          Interconnect speed is just as important as the CPU crunch rate; if you can't supply the data, then you go very slowly.

          This beast has 512-bit SIMD, which is stunningly powerful if you can keep the data loaded up.
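
          The "keep the data loaded up" point is essentially the roofline model: attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity. A minimal Python sketch with illustrative placeholder numbers (not published A64FX specifications):

          # Roofline bound: attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity).
          # Peak and bandwidth below are illustrative placeholders, not A64FX specs.
          def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
              return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

          peak = 3000.0   # assumed peak, GFLOP/s per node
          bw = 1000.0     # assumed memory bandwidth, GB/s per node

          for intensity in (0.25, 1.0, 4.0, 16.0):   # FLOPs performed per byte moved
              print(f"{intensity:>5} FLOP/byte -> {attainable_gflops(peak, bw, intensity):.0f} GFLOP/s")
          # Low-intensity kernels (stencils, sparse solvers) are bandwidth-bound, so wide
          # 512-bit SIMD only pays off when the memory system can keep it fed.
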

        • The issue isn't necessarily the number of nodes; it's I/O and communication (internal and between nodes). Supercomputers achieve high performance by optimizing I/O and communication; just having many nodes won't give the same performance.
        • I do wonder why these computers are worthwhile, in the era of distributed computing with near-infinite cloud resources at our disposal, and competition driving on-demand compute down to unbelievably low costs.

          Hard to tell if this was intended to be serious or sarcastic.

    • by Areyoukiddingme ( 1289470 ) on Tuesday June 23, 2020 @12:34PM (#60217950)

      Meanwhile, in the Apple thread: Hurr durr ARM chips are slow

      ARM is slow, core for core. This machine required 3X as many cores to be 2.8X faster than the IBM machine. That's a 7% gap, even with Fujitsu's custom floating point units. Apple is not going to put 256 cores in their ARM iMac and even if they did, it wouldn't help render a fucking web page any faster, which is what their users want it to do. Single thread performance of Xeon and EPYC chips is twice as fast [phoronix.com] as the very latest POWER9, and the IBM machine is using older chips than that.

      Yeah, ARM is slow. It's just cheap to run lots of them, so it's useful when solving embarrassingly parallel problems.

      • This machine required 3X as many cores to be 2.8X faster than the IBM machine.

        You're forgetting that the bulk of Summit's compute capacity is the 6 GPUs per node.

      • ... so it's useful when solving embarrassingly parallel problems.

        Researchers go to great efforts to parallelize problems that are not "embarrassingly parallel", so no, it is not only useful for embarrassingly parallel problems. That being said, the first part of your comment seems to be spot on.

  • by fuzzyfuzzyfungus ( 1223518 ) on Tuesday June 23, 2020 @11:29AM (#60217564) Journal
    Anyone have an article with more of the relevant details, like what interconnect it uses, and whether it's pure CPU, CPU plus GPU, or CPU plus specific ASICs or FPGAs?

    It's certainly interesting that Fujitsu went with ARM this time, since they have historically used SPARC64 for this sort of thing. But an article about a supercomputer is virtually meaningless without knowing how it's put together, since that stops being an out-of-box feature at around 8 sockets, and it's one of the biggest factors in how well or poorly the system scales if you just keep throwing more nodes at it.
    • by vlad30 ( 44644 )
      Actually, there is another, smaller machine with the same processor on the list at #37; maybe a test run for this one?
    • by pavon ( 30274 ) on Tuesday June 23, 2020 @12:33PM (#60217944)

      Here are the slides [hotchips.org] from a presentation that Fujitsu did earlier this year. Some interesting points: it has 32GB of HBM2 on each die, and beefed-up SIMD compared to consumer ARM chips. It uses the Tofu interconnect; not sure if/how it has evolved from their SPARC machines.

    • Fujitsu switched from SPARC to ARM mostly because ARM now belongs to SoftBank, hence Japan. And they're using their Tofu interconnect (a new generation, I presume), as for their previous SPARC-based system.

      So nothing surprising here. Just an impressive system that seems 100% CPU-based; it would have been an awesome system for traditional pure-MPI HPC codes if the interconnect had not been a 3D torus. Also, not the best for AI research.

  • by serviscope_minor ( 664417 ) on Tuesday June 23, 2020 @11:30AM (#60217576) Journal

    Funny thing is this didn't replace x86.

    It replaced the SPARC64 XIfx. It's even called the A64FX.

    Fujitsu likes custom CPUs so they can tightly integrate their interconnect (one of the most important parts of a supercomputer) and a fast, wide custom floating-point unit (it's not NEON). They used to use SPARC; now they've moved over to ARM.

  • Interestingly, while much of the world lauds ARM for being power efficient, Fugaku is only number 9 on the Green500 list. But then the top performer in that category has only 1/20th of the performance, and interestingly is also Japanese.

    It was surprising to see IBM POWER-based CPUs best this beast in GFLOPS/watt, though.
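
    For scale, a rough efficiency figure can be derived from the numbers quoted in this discussion (415.5 PFLOPS HPL and the ~28 MW mentioned above); a quick Python sketch, not the official Green500 measurement:

      # Rough GFLOPS/W estimate from the figures quoted in this thread.
      hpl_gflops = 415.5e6     # 415.5 PFLOPS expressed in GFLOPS
      power_watts = 28e6       # ~28 MW

      print(f"~{hpl_gflops / power_watts:.1f} GFLOPS/W")   # roughly 15 GFLOPS/W
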

    • by vlad30 ( 44644 )
      Actually, its prototype is higher on that list (#4), and it stayed in the top ten, suggesting it scaled well.
    • Interestingly, while much of the world lauds ARM for being power efficient, Fugaku is only number 9 on the Green500 list. But then the top performer in that category has only 1/20th of the performance, and interestingly is also Japanese.

      It was surprising to see IBM POWER-based CPUs best this beast in GFLOPS/watt, though.

      I know that Going Green is "in" right now, but worrying about power consumption when building massive supercomputers is akin to a NASCAR driver worrying about getting less than 30MPG during a race.

      Not like we're building millions of these things.

      • It matters even if you don't care about 'green':

        You can only fit so much cooling in a fixed space. Therefore, more power-efficient cores means more cores (and/or more RAM) can be packed into that space, which translates into a physically shorter interconnect between two average nodes.

        Typical supercomputer jobs are not like Bitcoin mining where it's 'all' (local) compute work + a small amount of network traffic between nodes. The speed of some jobs depends directly on how fast nodes can exchange data. So the intercon
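
        The physically-shorter-interconnect point above can be made concrete with propagation delay: signals in fiber or copper travel at roughly two-thirds the speed of light, about 5 ns per metre one way. A small Python sketch with made-up cable lengths:

        # Propagation delay at ~2/3 c, i.e. roughly 5 ns per metre one way.
        # The cable lengths are made-up examples, not measurements of any real machine.
        NS_PER_METRE = 5.0

        for cable_m in (2, 10, 50):
            print(f"{cable_m:>3} m cable: ~{cable_m * NS_PER_METRE:.0f} ns one-way delay")
        # Against end-to-end message latencies on the order of a microsecond, tens of
        # extra metres of cabling become a measurable fraction of every hop.
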

        • ...Not to mention the power bill... With the machine running 24/7 (as supercomputers do), using less-efficient / cheaper / older CPUs is just dumb when it adds $100k..$1M or more per year to the power bill.

          And what is that cost in terms of percentage of the overall project? Again, this argument appears as weak as a NASCAR driver worried about fuel economy.

          So designers will go for CPU / GPU / FPGA / ASIC combos that give good bang per joule in terms of compute work done for the kind of jobs the machine is built for. Not much different from how a gamer wouldn't buy the cheapest video card, since it would do poorly on the games he/she cares about.

          Not sure how the gamer analogy fits here, since I've never even heard of a gamer who was worried about the power bill to the point of being selective about hardware. If you're being that cheap, then don't even bother building it. Otherwise, accept that every massive computing environment is built for purpose, and each of them create metric fucktons of meas
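
          For scale, the power-bill side of this argument is easy to quantify; a quick Python sketch assuming an illustrative $0.10/kWh industrial rate (the actual tariff is not given anywhere in this thread):

          # Annual electricity cost for a machine drawing constant power,
          # at an assumed (illustrative) $0.10/kWh rate.
          def annual_cost_usd(power_mw, usd_per_kwh=0.10):
              return power_mw * 1000 * 24 * 365 * usd_per_kwh

          print(f"28 MW system:        ${annual_cost_usd(28):,.0f} per year")
          print(f"10% efficiency gain: ${annual_cost_usd(28) - annual_cost_usd(28 * 0.9):,.0f} per year saved")
          # Even single-digit efficiency differences are worth millions per year at this scale.
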

  • This is getting to be just like a bunch of PC fanboi builders all looking for top spot on FutureMark/3DMark, CPU-Z, SiSoft Sandra,.... you get the idea!

    Call us when a new order of magnitude is reached instead... that is newsworthy!

  • Unsurprising (Score:3, Interesting)

    by slack_justyb ( 862874 ) on Tuesday June 23, 2020 @11:59AM (#60217778)

    Three and a half reasons

    1. Popularity: Tablets and phones are selling like hotcakes, and traditional desktop sales have been less than stellar. So obviously, if everyone is buying these things, engineers are going to be working the most on these chips.

    2. Investments: The ARM ecosystem is made up of a lot of players; that's a lot more money and a lot more warm bodies working on it.

    3. Direction: The x86/64 platform is dictated by pretty much a single player (maybe two if we're being generous). So if your optimization doesn't get in, it's tough cookies. ARM gives a pretty clean base to start from and each vendor works up from there.

    3.5. Product: The fact that ARM isn't dictated by a single player gives companies a lot of room to make their end product better than the others. That's an extra incentive to build a product off ARM.

    Now, ARM hadn't, until maybe about three years ago, been at a point where it could give the x64 crew a run for their money. However, ARM has had the most investment dollars and engineer brain share for easily the last six to seven years. So with the numbers that have been behind it, it's unsurprising that we are where we are now.

    It doesn't hurt that ARM is a RISC machine versus the CISC Frankenstein that Intel has become over the decades. But the ARM platform has also become a lot more modular. So if you wanted to take a design for, say, a tablet SoC, lop off the thermal controller that assumes a passive cooler, and instead put in server circuitry that expects active cooling, well, that's pretty straightforward to do in silicon.

    Intel's tinfoil-level grasp on their platform has pretty much killed it, and they've literally no one to blame but themselves. ARM has way more dollars flowing into it now. Things like this supercomputer and Apple's transition to ARM didn't happen last week. This has been building up and up, and we're just starting to see the spring break from the ground and start flowing.

    If Intel wants a future, they have to get people excited for their platform again. They've got to start bringing back investment dollars to them. They've got to get engineer mind share again. Otherwise, they're going to find their x86/x64 going the way of Itanium. You just can't hold your IP with a steel grip. That's the most effective way to kill it.

    • You just can't hold your IP with a steel grip. That's the most effective way to kill it.

      I don't know. Apple's done pretty well with it.

    • 3. Direction: The x86/64 platform is dictated by pretty much a single player (maybe two if we're being generous). So if your optimization doesn't get in, it's tough cookies. ARM gives a pretty clean base to start from and each vendor works up from there.

      3.5. Product: The fact that ARM isn't dictated by a single player gives companies a lot of room to make their end product better than the others. That's an extra incentive to build a product off ARM.

      There are indeed a lot of companies working on ARM processors. However, aside from ARM itself, those companies pretty much keep all their progress to themselves, which means that there is no synergy from all those design efforts. Effectively, each design benefits from just two design efforts: from ARM and from that one company. For example, Apple has a leading-edge ARM design, and no other company gets to benefit from that progress.

  • of these things! I bet it could execute an infinite loop in 10 seconds!

  • The picture from TFA makes this supercomputer look very Matrix.

  • When using the sublist generator to determine operating system share, Linux returns 486 of the 500 while everything else returns "Server Error (500)".

    Yep, seems about right to me.

  • A fast computer with ARM processors. I wonder how fast it could compile a Linux kernel.

  • Could a Beowulf cluster of these run:
    (Choose one or more)
    1. Duke Nukem Forever
    2. CmdrTaco's homepage http://cmdrtaco.net/ [cmdrtaco.net]
    3. In Soviet Russia, Beowulf cluster of Fugaku runs YOU
  • Comment removed based on user account deletion

"Pok pok pok, P'kok!" -- Superchicken

Working...