ARM-Based Japanese Supercomputer is Now the Fastest in the World (theverge.com) 72
A Japanese supercomputer has taken the top spot in the twice-yearly Top500 supercomputer speed ranking. Fugaku, a computer in Kobe co-developed by Riken and Fujitsu, makes use of Fujitsu's 48-core A64FX system-on-chip. It's the first time a computer based on ARM processors has topped the list. From a report: Fugaku turned in a Top500 HPL result of 415.5 petaflops, 2.8 times as fast as IBM's Summit, the nearest competitor. Fugaku also attained top spots in other rankings that test computers on different workloads, including Graph 500, HPL-AI, and HPCG. No previous supercomputer has ever led all four rankings at once. While the fastest-supercomputer rankings normally bounce between American- and Chinese-made systems, this is the first Japanese system to rank first on the Top500 in nine years; the last to do so was Fugaku's predecessor, Riken's K computer. Overall there are 226 Chinese supercomputers on the list, 114 from America, and 30 from Japan. US-based systems contribute the most aggregate performance, with 644 petaflops.
Meanwhile in the Apple thread (Score:1, Insightful)
Hurr durr ARM chips are slow
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
Interconnect speed is just as important as the CPU crunch rate; if you can't keep the data flowing, everything slows to a crawl.
This beast has 512-bit SIMD (Arm's Scalable Vector Extension, SVE), which is stunningly powerful if you can keep the data loaded up.
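For the curious, the A64FX's wide vectors are exposed to programmers through Arm's SVE intrinsics. Here is a minimal, vector-length-agnostic sketch of an axpy-style kernel; the function name and the idea of compiling it with an SVE-enabled compiler for A64FX are illustrative assumptions, not anything from the article:

```c
#include <arm_sve.h>   /* Arm C Language Extensions for SVE */
#include <stddef.h>

/* Hypothetical kernel: y[i] += a * x[i].
 * Written vector-length-agnostically, so the same code uses the full
 * 512-bit vectors on an A64FX and narrower vectors on other SVE parts. */
void axpy(double a, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i += svcntd()) {          /* doubles per vector */
        svbool_t pg = svwhilelt_b64_u64(i, n);           /* mask off the tail */
        svfloat64_t vx = svld1_f64(pg, &x[i]);
        svfloat64_t vy = svld1_f64(pg, &y[i]);
        vy = svmla_n_f64_x(pg, vy, vx, a);               /* vy += vx * a */
        svst1_f64(pg, &y[i], vy);
    }
}
```

Whether Fugaku's real kernels look anything like this is another matter; production HPC codes typically lean on the vendor compiler's auto-vectorizer and tuned math libraries rather than hand-written intrinsics.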
Re: (Score:3)
Re: (Score:2)
I do wonder why these computers are worthwhile, in the era of distributed computing with near-infinite cloud resources at our disposal, and competition driving on-demand compute down to unbelievably low costs.
Hard to tell if this was intended to be serious or sarcastic.
Re:Meanwhile in the Apple thread (Score:4, Insightful)
Meanwhile in the Apple thread Hurr durr ARM chips are slow
ARM is slow, core for core. This machine needed 3X as many cores to be 2.8X faster than the IBM machine; that works out to roughly 7% less throughput per core, even with Fujitsu's custom floating point units. Apple is not going to put 256 cores in their ARM iMac, and even if they did, it wouldn't help render a fucking web page any faster, which is what their users want it to do. Single-thread performance of Xeon and EPYC chips is roughly twice that of the very latest POWER9 [phoronix.com], and the IBM machine is using older chips than that.
Yeah, ARM is slow. It's just cheap to run lots of them, so it's useful when solving embarrassingly parallel problems.
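For concreteness, the per-core arithmetic behind the figures above, using the June 2020 Top500 listings (note that Summit's listed core count lumps in its GPUs, as a reply further down points out, so "per core" is a fuzzy comparison):

$$\frac{415.5\ \text{PFLOPS}}{7{,}299{,}072\ \text{cores}} \approx 57\ \text{GFLOPS/core} \qquad\text{vs.}\qquad \frac{148.6\ \text{PFLOPS}}{2{,}414{,}592\ \text{cores}} \approx 62\ \text{GFLOPS/core},$$

i.e. roughly 7-8% less HPL throughput per listed core for Fugaku.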
Re: (Score:2)
Apple is optimizing their laptops to be iPads with a built-in keyboard.
Microsoft already went that direction with Surface Book. It even lets you detach the tablet/screen for use by itself and made the keyboard a docking station.
If you are running web apps and productivity software with some games mixed in, we hit the threshold for sufficient speed some time ago if the application efficiently uses the GPU. And Apple's integrated GPU is better than Intel's.
Meanwhile in the desktop space, AMD is dominating w
Re: (Score:2)
Better let Apple know they're wasting their time then.
If Apple were foolish enough to try to release a Mac Pro with ARM CPUs, they would be wasting their time. As it is, the writing is on the wall. The Mac Pro is done. When all iMacs and MacBooks are ARM, maintaining macOS for two different architectures will be too expensive and they will abandon AMD64 chips, at which point the Mac Pro vanishes.
Apple advertising will continue to lie about their platform being "for creators", but the creators will all leave. They will be forced to. There will be no product t
Re: (Score:2)
Maybe you're right on the Mac Pro and maybe not. Interesting take though.
I think you're wrong about everything else though. Apple MAY be able to make competitive processors for Macs, at least for a while. Maybe even at the Mac Pro class.
The problem is that history shows us that manufacturers start with their own custom processors but eventually exit that market. Apple would be the only example that enters it instead, and does so with technology driven up from below rather than down from above, no less.
Even if Apple were succe
Re:Meanwhile in the Apple thread (Score:4, Informative)
You better tell them that editing three 4K videos with real-time effects preview is not possible on the A12Z then, because they were foolish enough to show exactly that in their WWDC video.
Keep in mind the A12Z was designed for the battery and thermal constraints of the iPad Pro, and their custom chips for Macs will be even more powerful.
Re: (Score:3)
Because that's a task that is massively parallel and GPU accelerated.
All the stuff they showed off was carefully selected to be highly parallel or GPU bound or both.
Re: (Score:2)
And what are most Mac users doing, if not either simple tasks (web surfing, document editing, etc) or massively parallel and GPU accelerated tasks (audio and video editing)?
What you people are most probably talking about is extremely specialized tasks that only a minuscule sub-set of Mac Pro users do, which themselves are a minuscule sub-set of Mac users, themselves a minuscule sub-set of desktops/laptops computer users. And keep in mind Apple can add task-specific silicon to their future ARM-based Mac Pro
Re: (Score:1)
This machine required 3X as many cores to be 2.8X faster than the IBM machine.
You're forgetting that the bulk of Summit's compute capacity is the 6 GPUs per node.
Re: (Score:2)
... so it's useful when solving embarrassingly parallel problems.
Researchers go to great efforts to parallelize problems that are not "embarrassingly parallel", so no, it is not only useful for "embarrassingly parallel" problems. That being said, the first part of your comment seems to be spot on.
Re: (Score:2)
The systems it beat are based on IBM POWER, so they're already crying.
Re:uh oh AMD (Score:4, Informative)
To whoever modded me flamebait and troll, here's a link for you to read: https://www.top500.org/lists/t... [top500.org] AMD shows up 5 times in the top 100. You being a fanboi does not make me a troll.
Re: (Score:2)
Frontier will have as much processing power as the next 160 fastest supercomputers combined.
FWIW, I think the second sentence in your post is what the mods have trouble with.
Re: (Score:2)
FWIW, I think the second sentence in your post is what the mods have trouble with.
That I see a very strong correlation between very stupid actual flamebait posts and people who have accounts choosing not to associate themselves with the garbage they spew?
Mate that's insightful, not flamebait ;-)
Re:uh oh AMD (Score:4, Insightful)
"Why would they? AMD is barely present in the supercomputer scene."
That part got you the "informative" mods.
"Not surprised you posted AC, I wouldn't want my name displayed against such a stupid comment as yours either."
This part got you the "troll" and "flamebait" mods.
Stick to the facts and leave the ad-hominem to the trolls.
Re: (Score:2)
As though the high road works on /. ;)
Of course, bitching about mods doesn't either.
Re: (Score:2)
That part got you the "informative" mods.
False. I had no informative mods at the time I posted.
This part got you the "troll" and "flamebait" mods.
That I see a very strong correlation between very stupid actual flamebait posts and people who have accounts choosing not to associate themselves with the garbage they spew?
Mate that's insightful, not flamebait ;-)
Telling idiots they need to go FOAD is not trolling; it is my most sincere hope to improve this forum.
Remember, ACs have always been a cesspool of filth and shit. But now that you can't post AC without also having an account it means that pe
Re: (Score:1)
"AMD is barely present in the supercomputer scene"
Every AMD system excepting ONE in the top 500 is middle of the pack or higher. Most of the machines in the top 500 aren't even built for a REASON, just to blaze the shit benchmark and nothing else. Once you eliminate those useless fucks, AMD is *QUITE* present as there are only about 40 or so real machines in that whole list.
But they don't really NEED to be present, because they already have much, much larger supercomputers in the form of clustered cloud dat
Re:uh oh AMD (Score:5, Informative)
Unlikely, given that AMD and Cray are collaborating on an exascale system for the DoE, which will certainly take first place when it comes online later next year: https://arstechnica.com/gadget... [arstechnica.com]
DoE systems aren't always "available to academics" (Score:3)
The purpose for building some of the DoE beasts isn't for academia to run "a wide range of simulations and experiments" but rather for nuclear weapons simulations; the current #3 machine, Sierra, is absolutely not available to the public.
Re: (Score:1)
Re: DoE systems aren't always "available to academ (Score:3)
You missed that SOME DoE systems are classified. Not all, by any means, but it wouldn't be radical for them to build the world's most powerful system and then lock it away.
Re: (Score:2)
That's a common criticism of TOP500 systems. There are many that are built only to take advantage of the benchmark and never really put to practical use - this is a fairly common criticism of the Chinese systems in particular, like the Tianhe-1, which didn't do very much with its homegrown CPUs despite offloading work onto Xeon Phis, which also helped it achieve its ridiculously high core count. The Japanese approach is quite different, meaning that they're willing to go for longer periods of time knocked out o
Re: (Score:1)
Re: (Score:2)
It also goes the other way: supercomputer centres increasingly want to leverage things like Cloud for provisioning, monitoring, and workload management, especially as they open up access to industrial users outside of their institute. One of our partners in particular was quite enthusiastic about this, until they had to figure out how they were going to expose Prometheus metrics for some of their legacy simulation code. When faced with the joyous task of developing a REST API and metric exporter in Fortr
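For what it's worth, the exporter doesn't have to live inside the Fortran code at all: a tiny sidecar that reads whatever the legacy job already writes out and serves it in the Prometheus text exposition format is often enough. A rough sketch in C; the port, metric name, and progress-file path are all made up for illustration:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical: the legacy simulation periodically writes a single number
 * (fraction of steps completed) to this file. */
static double read_progress(void) {
    double p = 0.0;
    FILE *f = fopen("/tmp/sim_progress.txt", "r");
    if (f) { if (fscanf(f, "%lf", &p) != 1) p = 0.0; fclose(f); }
    return p;
}

int main(void) {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9200);                     /* arbitrary scrape port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) != 0) return 1;
    if (listen(srv, 8) != 0) return 1;

    for (;;) {
        int c = accept(srv, NULL, NULL);
        char req[1024];
        if (read(c, req, sizeof req) < 0) { close(c); continue; }  /* ignore request details */

        char body[256], resp[512];
        int blen = snprintf(body, sizeof body,
            "# HELP sim_progress_ratio Fraction of simulation completed\n"
            "# TYPE sim_progress_ratio gauge\n"
            "sim_progress_ratio %.4f\n", read_progress());
        int rlen = snprintf(resp, sizeof resp,
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain; version=0.0.4\r\n"
            "Content-Length: %d\r\nConnection: close\r\n\r\n%s",
            blen, body);
        write(c, resp, rlen);
        close(c);
    }
}
```

Point a Prometheus scrape job at port 9200 (the sketch ignores the request path entirely, which is fine for a sketch) and the monitoring side never has to know Fortran exists.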
Re: (Score:1)
"Are the people current writing codes just incapable of critical thought and comprehension?"
How the fuck do you think we have all these bloated programs and applications now days?
Dingdingding. Incapable of critical thinking and writing their own solutions.
Re: (Score:2)
Most of the people writing the code are coming from the scientific community, and often lack basic skills in systems-level programming. The new generation of researchers are also no longer learning languages like Fortran, so it's really in a state of simply doing what they can to keep things running while being fairly limited in their ability to extend anything. You also face a generational clash especially between HPC purists and Cloud developers, but this is gradually improving as Cloud becomes a more acc
Any information on interconnect and other details? (Score:4, Interesting)
It's certainly interesting that Fujitsu went with ARM this time, since they have historically used SPARC64 for this sort of thing; but an article about a supercomputer is virtually meaningless without knowing how it's put together. The interconnect stops being an out-of-the-box feature at around 8 sockets, and it's one of the biggest factors in how well or poorly the system scales as you keep throwing more nodes at it.
Re: (Score:2)
Re:Any information on interconnect and other detai (Score:4, Informative)
Here are the slides [hotchips.org] from a presentation that Fujitsu did earlier this year. Some interesting points are that it has 32GB of HBM2 on each chip, and beefed-up SIMD compared to consumer ARM chips. It uses the Tofu interconnect; not sure if/how it has evolved from their SPARC machines.
Re: (Score:3)
Fujitsu switched from SPARC to ARM mostly because ARM now belongs to SoftBank, hence Japan. And they're using their Tofu interconnect (a new generation, I presume), as with their previous SPARC-based systems.
So nothing surprising here. Just an impressive system which seems 100% CPU-based, and which would have been an awesome system for pure-MPI traditional HPC codes if the interconnect had not been a torus. Also, not the best for AI research.
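For readers unfamiliar with why the torus topology matters here: MPI applications usually map their domain decomposition onto the machine with a Cartesian communicator, so that a logical neighbour in the code is, ideally, a physical neighbour in the network. A minimal sketch of that pattern; the 3D decomposition is just for illustration (Tofu is Fujitsu's own 6D mesh/torus):

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical halo-exchange skeleton: place MPI ranks on a 3D periodic
 * grid and look up the nearest neighbours along one dimension. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int nranks;
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int dims[3] = {0, 0, 0};                 /* let MPI pick a factorization */
    MPI_Dims_create(nranks, 3, dims);

    int periods[3] = {1, 1, 1};              /* wrap around: a torus */
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* allow reorder */, &cart);

    int rank, coords[3], left, right;
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 3, coords);
    MPI_Cart_shift(cart, 0, 1, &left, &right);   /* neighbours along dim 0 */

    printf("rank %d at (%d,%d,%d): x-neighbours %d and %d\n",
           rank, coords[0], coords[1], coords[2], left, right);

    MPI_Finalize();
    return 0;
}
```

With reordering allowed, an MPI implementation tuned for the interconnect can place those logical neighbours physically close, which is also why people argue about whether a torus is really a drawback for traditional stencil-style MPI codes.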
This didn't replace x86 (Score:5, Informative)
Funny thing is this didn't replace x86.
It replaced the SPARC64 XIfx. It's even called the A64fx.
Fujitsu likes custom CPUs so they can tightly integrate their interconnect (one of the most important parts of a supercomputer) and a fast, wide custom floating point unit (it's not NEON). They used to use SPARC; they've moved over to ARM.
Re:This didn't replace x86 (Score:4, Funny)
A64fx is crap, there's barely any games for it.
I'll stick with my N64.
Re:This didn't replace x86 (Score:4, Interesting)
obligatory XKCD for people who just don't understand what censorship means
https://xkcd.com/1357/ [xkcd.com]
Not the most efficient (Score:2)
Interestingly, while much of the world lauds ARM for being power efficient, Fugaku is only number 9 on the Green500 list. But then the top performer in that category has only 1/20th of the performance, and is interestingly also Japanese.
It was surprising to see IBM POWER-based CPUs best this beast in GFLOPS/watt though.
Re: (Score:2)
Re: (Score:2)
Interestingly, while much of the world lauds ARM for being power efficient, Fugaku is only number 9 on the Green500 list. But then the top performer in that category has only 1/20th of the performance, and is interestingly also Japanese.
It was surprising to see IBM POWER-based CPUs best this beast in GFLOPS/watt though.
I know that Going Green is "in" right now, but worrying about power consumption when building massive supercomputers is akin to a NASCAR driver worrying about getting less than 30MPG during a race.
Not like we're building millions of these things.
Re: (Score:2)
It matters even if you don't care about 'green':
You can only fit so much cooling in a fixed space. Therefore, more power-efficient cores = more cores can be packed in that space. And/or more RAM. Translating into a physically shorter interconnect between 2 average nodes.
Typical supercomputer jobs are not like Bitcoin mining where it's 'all' (local) compute work + a small amount of network traffic between nodes. The speed of some jobs depends directly on how fast nodes can exchange data. So the intercon
Re: (Score:2)
...Not to mention the power bill... With a machine running 24/7 (as supercomputers do), using less-efficient / cheaper / older CPUs is just dumb when it adds $100k..$1M or more per year to the power bill.
And what is that cost in terms of percentage of the overall project? Again, this argument appears as weak as a NASCAR driver worried about fuel economy.
So designers will go for CPU / GPU / FPGA / ASIC combos that give good bang per joule in terms of compute work done for the kind of jobs the machine is built for. Not much different from how a gamer wouldn't buy the cheapest video card, since it would do poorly on the games he/she cares about.
Not sure how the gamer analogy fits here, since I've never even heard of a gamer who was worried about the power bill to the point of being selective about hardware. If you're being that cheap, then don't even bother building it. Otherwise, accept that every massive computing environment is built for purpose, and each of them creates metric fucktons of meas
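To put a rough, purely illustrative number on the power-bill point above (the rate and the extra draw are assumptions, not figures from any of these posts): one extra megawatt of continuous draw at a $0.10/kWh industrial rate is

$$1000\ \text{kW} \times 8760\ \text{h/yr} \times \$0.10/\text{kWh} \approx \$876{,}000\ \text{per year},$$

so a few megawatts of difference between candidate designs is real money even against a nine-figure build budget, which is also roughly why the Green500 exists.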
Re: (Score:1)
in other words... (Score:2)
In other words just before version 1.0 of GNU Hurd is released.
Summit&Sierra, built back in 2018 (Score:3)
POWER9-based; IBM open-sourced the POWER ISA last August. Haven't heard of anyone even dreaming of open-source GPUs yet though.
Re: (Score:2)
Shouldn't some of these be Open Hardware instead of Open Source?
Can we retire this? (Score:1)
This is getting to be just like a bunch of PC fanboi builders all looking for the top spot on FutureMark/3DMark, CPU-Z, SiSoft Sandra,... you get the idea!
Call us when a new order of magnitude is reached instead... that is newsworthy!
Unsurprising (Score:3, Interesting)
Three and a half reasons
1. Popularity: Tablets/Phones are selling like hotcakes and traditional desktop has been less than stellar. So obviously, if everyone is buying these things, engineers are going to be working the most on these chips.
2. Investments: The ARM ecosystem is made up of a lot of players; that's a lot more money and a lot more warm bodies working on it.
3. Direction: The x86/64 platform is dictated by pretty much a single player (maybe two if we're being generous). So if your optimization doesn't get in, it's tough cookies. ARM gives a pretty clean base to start from and each vendor works up from there.
3.5. Product: The fact that ARM isn't dictated by a single player gives companies a lot of room to make their end product better than the others. That's an extra incentive to build a product off ARM.
Now, until maybe three years ago, ARM hadn't been at a point where it could give the x64 crew a run for their money. However, ARM has had the most investment dollars and engineer brain share for easily the last six to seven years. So with the numbers that have been behind it, it's unsurprising that we're now where we are.
It doesn't hurt that ARM is a RISC machine versus the CISC Frankenstein that Intel has become over the decades. But the ARM platform has also become a lot more modular. So if you wanted to take a design for, say, a tablet SoC, lop off the thermal controller that assumes a passive cooler, and instead put in a server circuit that expects active cooling, well, that's pretty straightforward to do in silicon.
Intel's tinfoil-level grasp on their platform has pretty much killed it, and they've literally no one to blame but themselves. ARM has way more dollars flowing into it now. Things like this supercomputer and Apple's transition to ARM didn't happen last week. This has been building up and up, and we're just starting to see the spring break through the ground and start flowing.
If Intel wants a future, they have to get people excited for their platform again. They've got to start bringing back investment dollars to them. They've got to get engineer mind share again. Otherwise, they're going to find their x86/x64 going the way of Itanium. You just can't hold your IP with a steel grip. That's the most effective way to kill it.
Re: (Score:2)
You just can't hold your IP with a steel grip. That's the most effective way to kill it.
I don't know. Apple's done pretty well with it.
Re: (Score:2)
3. Direction: The x86/64 platform is dictated by pretty much a single player (maybe two if we're being generous). So if your optimization doesn't get in, it's tough cookies. ARM gives a pretty clean base to start from and each vendor works up from there.
3.5. Product: The fact that ARM isn't dictated by a single player gives companies a lot of room to make their end product better than the others. That's an extra incentive to build a product off ARM.
There are indeed a lot of companies working on ARM processors. However, aside from ARM itself, those companies pretty much keep all their progress to themselves, which means that there is no synergy from all those design efforts. Effectively, each design benefits from just two design efforts: from ARM and from that one company. For example, Apple has a leading-edge ARM design, and no other company gets to benefit from that progress.
Imagine a Beowulf Cluster (Score:2)
of these things! I bet it could execute an infinite loop in 10 seconds!
Love the Green Blinkenlights (Score:2)
The picture from TFA makes this supercomputer look very Matrix.
Amusing (Score:2)
When using the sublist generator to determine operating system share, Linux returns 486 of the 500 while everything else returns "Server Error (500)".
Yep, seems about right to me.
Finally (Score:1)
A fast computer with ARM processors. I wonder how fast it could compile a Linux kernel.
Old school oblig (Score:1)
(Choose one or more)
1. Duke Nukem Forever
2. CmdrTaco's homepage http://cmdrtaco.net/ [cmdrtaco.net]
3. In Soviet Russia, Beowulf cluster of Fugaku runs YOU
Re: (Score:2)