Intel Accused of Inflating Over 2,600 CPU Benchmark Results (pcworld.com) 47
An anonymous reader shared this report from PCWorld:
The Standard Performance Evaluation Corporation, better known as SPEC, has invalidated over 2,600 of its own results testing Xeon processors in the 2022 and 2023 versions of its popular industry-standard SPEC CPU 2017 test. After investigating, SPEC found that Intel had used compilers that were, quote, "performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability."
In layman's terms, SPEC is accusing Intel of optimizing the compiler specifically for its benchmark, which means the results weren't indicative of how end users could expect to see performance in the real world. Intel's custom compiler might have been inflating the relevant results of the SPEC test by up to 9%...
Slightly newer versions of the compilers used with the latest industrial Xeon processors, the 5th-gen Emerald Rapids series, do not include these allegedly performance-enhancing optimizations. I'll point out that both these Xeon processors and the SPEC 2017 test are high-end hardware and software meant for "big iron" industrial and educational applications, and aren't especially relevant to the consumer market we typically cover.
More info at ServeTheHome, Phoronix, and Tom's Hardware.
2600 counts of fraud (Score:4, Interesting)
Remember when companies like Boeing and Intel were companies with a proud engineering culture?
Yeah me neither, that was a long time ago. Intel has been caught doing this type of stuff for decades.
Re:2600 counts of fraud (Score:5, Informative)
They are accused of breaking this rule [spec.org]. IOW, they created optimizations in their compiler that do nothing but speed up one benchmark (two in total) in the suite, but not the other similar tasks that this benchmark is supposed to represent.
Pretty much the only reason the SPEC people are not making a big stink about this is because Intel removed these optimizations in later versions of the compiler.
Re: (Score:2)
My guess is that somebody tried the benchmark with the special optimization (and without), but a different workload, and realized the optimization suddenly didn't work as well.
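That kind of check is cheap to sketch: time the same build on the reference input and on a similar-but-different input, and see whether the speedup survives. Here's a minimal, hypothetical harness (the toy workload, the load() helper, and the dataset names are all invented stand-ins; nothing here is SPEC code):

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Invented stand-ins: in a real check these would be the compiled SPEC
// binary and its input files, not a toy string workload.
static std::vector<std::string> load(const std::string& name) {
    return std::vector<std::string>(100000, name + ": <doc>payload</doc>");
}

static volatile std::size_t sink;  // keep the optimizer from deleting the work
static void run_workload(const std::vector<std::string>& dataset) {
    std::size_t total = 0;
    for (const auto& s : dataset) total += s.size();  // pretend transform
    sink = total;
}

// Time one run of the workload in milliseconds.
static double time_run(const std::vector<std::string>& dataset) {
    auto t0 = std::chrono::steady_clock::now();
    run_workload(dataset);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // Same task, two datasets: the reference input a compiler could have
    // "studied" ahead of time, and a similar one it could not.
    auto reference_input = load("ref");
    auto similar_input   = load("other");

    std::printf("reference: %.2f ms\n", time_run(reference_input));
    std::printf("similar:   %.2f ms\n", time_run(similar_input));
    return 0;
}
```

If a compiler buys you ~9% on the reference data and roughly nothing on the similar data, the "optimization" is overfit to the benchmark, which is exactly what SPEC's rule forbids.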
Re: (Score:1)
Sounds like an issue with the benchmark then. There are plenty of benchmarks that can simulate real workloads. The SPEC benchmark has very specific applicability, and yes, Xeons are optimized overall in that realm; it's why people pay for specific SKUs of these Xeons. If you need a particular optimization, it's likely you will set your compiler accordingly. I don't understand the issue overall. Do they expect SPEC to simulate a scientific workload without optimizations? Because that would reduce a lot of the performance.
Re: (Score:1)
Have you ever seen a SPEC benchmark? It's not just changing the workload.
Re: 2600 counts of fraud (Score:1)
I'm not trolling, but: isn't optimizing for one target sometimes (oftentimes?) at the expense of performance for another target? If so, optimizing for a narrow, very specific set of operations would leave you underperforming in everything else. And while the benchmark was designed to RESEMBLE a real workload, if their meddling was so precise, then it would definitely only improve an extremely specific set of instructions. In which case we're back to the argument that you would not be buying...
Re: 2600 counts of fraud (Score:5, Insightful)
I'm not trolling, but: isn't optimizing for one target sometimes (oftentimes?) at the expense of performance for another target?
Not in this case. Intel cheated by modifying the compiler, not the silicon.
Intel's compiler detected when it was compiling a benchmark and emitted optimized code. But there was no cost to non-benchmark code other than a few milliseconds of delay.
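A toy sketch of that pattern, for the curious (this is not Intel's actual compiler code; the function names, the fingerprinting scheme, and the hash constant are all invented to illustrate what "a priori knowledge of the SPEC code" could look like in practice):

```cpp
#include <cstdint>
#include <string>

// Hash the source text (FNV-1a here, purely for illustration).
static std::uint64_t fingerprint(const std::string& source) {
    std::uint64_t h = 0xcbf29ce484222325ull;
    for (unsigned char c : source) { h ^= c; h *= 0x100000001b3ull; }
    return h;
}

// Invented constant standing in for "a priori knowledge of the SPEC code".
constexpr std::uint64_t KNOWN_XALANCBMK_HASH = 0xDEADBEEFCAFEF00Dull;

enum class Strategy { Generic, BenchmarkSpecial };

// Generic optimizations apply to everyone; the special path fires only when
// the compiler recognizes the benchmark itself. Non-benchmark code pays only
// for this compile-time check -- but the published numbers then describe
// code paths nobody else's programs will ever take.
Strategy pick_codegen(const std::string& source) {
    return fingerprint(source) == KNOWN_XALANCBMK_HASH
               ? Strategy::BenchmarkSpecial
               : Strategy::Generic;
}
```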
Most people aren't affected because they don't use Intel's compiler. They use Microsoft compilers, GCC, or Clang.
The lesson here is that you should never trust a benchmark from an interested party. Run your own benchmarks or get the results from someone you trust.
The best benchmark is to run a system on your actual workload.
Re: (Score:1)
INTEL isn't accused of submitting a modified processor or altering the benchmark software; they are accused of setting the compiler to make the best use of capabilities the processor already has via optimizations.
No.... They're accused of tweaking the compiler for the sole purpose of inflating the benchmark. This was a performance boost that nobody was going to see. This wasn't some improved code they created to solve a bottleneck or execute a few clocks faster.... The only damn thing their tweaks did was cause the benchmark to run faster.
That breaks the rules. You don't get to claim speeds that nobody and nothing is going to see, because they cannot be achieved in the real world under any circumstances.
That's
Re: (Score:2)
If you count each of those 2600 inflations as "fraud," you'd have to count every advertisement ever as fraud.
board meeting at intel (Score:2)
board member one: How are we going to beat AMD?
board member two: We are going to cheat and let our brand name make up for any ties
board member three: That's right, we are going to cheat and let our bonuses go high!
Re: (Score:2)
board member one: How are we going to beat AMD? board member two: We are going to cheat and let our brand name make up for any ties board member three: That's right, we are going to cheat and let our bonuses go high!
Optimized or not, if all it takes to feed millions in revenue directly into executive pockets is selling a “benchmark” report full of shit few would ever replicate, then I put the blame more on the suckers falling for executive lies. It’s not that they’re brilliant at sales. It’s that most consumers are that gullible.
Huh? (Score:2)
How aren't the test results relevant if the core designs are basically the same as in desktop CPUs? Only the number of cores in the SoC and the interface are different.
Re: (Score:2)
But let's not forget: SPEC numbers only tell you something about a specific computer model with specific hardware and a specific compiler and settings. That's basically why they invalidated the results for 2500 machines and not for a couple...
Huh? (Score:2)
So let me get this straight...
SPEC designed a benchmark to represent a real-world workload, then INTEL optimized their compiler to maximize performance in that benchmark, now SPEC is saying that by optimizing for their (real world simulation) benchmark, users of INTEL processors aren't going to see real world performance that matches the benchmark results?
Sounds like INTEL optimized for what SPEC considered real world workloads, and now SPEC is saying their benchmarks don't actually predict real world performance.
Re: (Score:2)
Sounds like INTEL optimized for what SPEC considered real world workloads
No, that's simply not how "representation" works. When something is representative of something else, it doesn't mean that cheating on it makes your case applicable to the other thing. Intel is not optimising for real world conditions. They are optimising specifically for something that is *not* the real world.
Re: (Score:1)
SPEC benchmarks are pretty close to real world. It's the entire raison d'etre of SPEC. If I want to know which CPU is best at eg. fluid dynamics, I go to SPEC and see what CPU is best for the price, the optimizations that were made, which programming languages and compilers were used to get a certain result etc. I don't go to SPEC to see an overall useless number like the PassMark scoring system.
OK, I'll bite (Score:2)
The benchmark is a representative example of a real calculation workload, not an exhaustive list of all workloads.
The compiler spitting out hand-tuned machine code when it recognizes the benchmark is somewhat, but not completely, unlike the Dieselgate scandal of cheating on emissions tests.
Re: (Score:2)
But if a specific compiler optimization only gives a notable improvement for a benchmark with the default workload, but not with (most) others, there is something fishy going on.
Re: (Score:1)
That's not how code/benchmarks work. You compile the code, then give it the data; the compiler cannot predict, when you compile a ray-tracing Fortran or C program, that you will then feed it a specific workload for benchmarking purposes.
Re: (Score:1)
Sure, but those are gaming benchmarks.
SPEC is a totally different beast. They give reference code for classic computer science problems in languages like C, Fortran etc. and then you can build your own code in HPC on that reference code. Hence why people and vendors use SPEC to benchmark systems and not just the CPU, because the same CPU but in a different build (eg. Dell vs SuperMicro, 1U, 2U etc) can have significant differences due to things like heat management in the chassis.
What Intel did here is optimize...
Re: (Score:2)
No. Imagine a test track for an autonomous vehicle. It is a standard track meant to be representative of city traffic but it plays out the same way every time. So instead of a system that truly drives the car autonomously, you simply build a clockwork that always operates the car the same way with no awareness of the surroundings.
You'll get a 100% in the test and fail miserably in the real world.
Similarly, Intel used a compiler rigged to do especially well on the benchmark and only the benchmark.
Re: (Score:3)
And? So what. You got sold something covered under warranty due to a manufacturer defect. You didn't get lied to, and you were entitled to a replacement which AMD have honoured without issue. Things randomly break, it's a fact of life. It's why the concept of warranty exists in the first place.
Comparing that with cheating a benchmark suite is not the same thing. Your whataboutism is lame.
Is anyone surprised? (Score:5, Informative)
Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas. AMD has beaten them handily since the launch of Ryzen in performance, efficiency, and sometimes cost. During that time, they lost Apple as a major customer because Apple would not wait year after year for chips that were no better than the previous generation. ARM-based CPUs are the de facto CPUs in smartphones and tablets.
Incidentally, there were shades of this cheating when Intel unveiled their "Go PC" campaign. After Apple launched their ARM-based M1 computers, Intel thought it would be a good idea to go after a former customer. Some people called out some of their points as misleading or dishonest. For example, benchmarking Intel CPUs vs. Apple M1 CPUs by using beta software on the Mac vs. released software on Intel. Or comparing performance and power efficiency between Intel and Apple processors while obscuring that they cherry-picked different Intel processors for different tests vs. a single M1 processor. Or scoring the Mac's "performance" as zero on games that were not available for the M1.
Re: (Score:2)
Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas.
Intel is the "Boeing" of semiconductors.
Re: (Score:2)
Yep, pretty much. Gigantic egos, fundamentally deficient skills. And a lot of "useful idiot" fanbois.
Re: (Score:2)
Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas. AMD has beaten them handily since the launch of Ryzen in performance, efficiency, and sometimes cost. During that time, they lost Apple as a major customer because Apple would not wait year after year for chips that were no better than the previous generation. ARM-based CPUs are the de facto CPUs in smartphones and tablets.
Apple were a rounding error. Sorry fanboys, but no-one's missed you. It's the same as when Apple left PPC for Intel: IBM had so many other customers (all 3 of the console manufacturers) that they couldn't get Apple out the door fast enough.
You're right about AMD though, as they've been making inroads into the two markets Intel dominated: laptops and servers. AMD has been eating Intel's lunch on the desktop for years, for most of the time since the Athlon 64 was released, but laptops eclipsed desktop sales years ago.
Re: (Score:2)
Apple were a rounding error. Sorry fanboys, but no-one's missed you. It's the same as when Apple left PPC for Intel: IBM had so many other customers (all 3 of the console manufacturers) that they couldn't get Apple out the door fast enough.
If Apple was a rounding error, why was Intel so butthurt that they left? Apple may not have bought as many CPUs as Dell, but behind the scenes Apple was contributing to Intel. For example, Apple worked with Intel on Thunderbolt 1, including their recommendation of using Mini DisplayPort as the connector. Intel had proposed Light Peak years earlier, but working with Apple they released a specification that laptop makers could use. Apple was always pushing the edge of lightness and thinness, which influenced Intel's designs.
If you want to claim this is a big deal... (Score:5, Informative)
... then you have to be able to explain how these specific benchmark values influence your personal purchasing decisions.
After investigating, SPEC found that Intel had used compilers that were, quote, "performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability."
What the hell do the 523.xalancbmk and 623.xalancbmk benchmarks measure?
https://www.spec.org/cpu2017/D... [spec.org]
https://www.spec.org/cpu2017/D... [spec.org]
Apparently they benchmark XML-to-HTML conversion via XSLT (both variants are builds of the Xalan-C++ XSLT processor).
Re: (Score:3)
Sorry, one link was bad - here's the correct link:
https://www.spec.org/cpu2017/D... [spec.org]
Re:If you want to claim this is a big deal... (Score:5, Informative)
The first thing you are missing here: Intel had prior knowledge of the benchmark code and their competition didn't, so they optimized the code using that prior knowledge, aka cheated.
The second thing you are missing here is that CPUs destined for data centers are a $300+ billion market; if Intel can cheat to increase their market share by 0.5%, for example, it translates to a revenue increase of more than a billion dollars.
That's why it's a big deal, and all this is par for the course for Intel, since they have a long history of sleaziness when it comes to benchmarks, especially when the competition is taking market share from them.
What do the benchmarks really measure? (Score:3)
What is the purpose of the benchmark tests? Do they validate raw processor performance, or do they validate performance in a software task-oriented environment?
Whatever anyone did in this story, it was not a hardware tweak (from what I can see reading the articles and other links).
If Intel programmers could wrangle better performance out of a testing regime by writing a better compiler to produce more algorithmically compact and efficient machine code, then doesn't that mean that there is room for improvement in how existing compilers are written? What would happen if the benchmarks were coded directly in assembler? If the benchmark tests then ran better or faster, doesn't this just mean that the existing non-Intel compilers used by the testing agency are un-optimized?
I can see where Intel or any company might use obfuscated results and numbers to their advantage, but I don't see how this impugns the Intel processors per se.
Re: (Score:3)
What is the purpose of the benchmark tests? Do they validate raw processor performance, or do they validate performance in a software task-oriented environment?
Whatever anyone did in this story, it was not a hardware tweak (from what I can see reading the articles and other links).
If Intel programmers could wrangle better performance out of a testing regime by writing a better compiler to produce more algorithmically compact and efficient machine code, then doesn't that mean that there is room for improvement in how existing compilers are written? What would happen if the benchmarks were coded directly in assembler? If the benchmark tests then ran better or faster, doesn't this just mean that the existing non-Intel compilers used by the testing agency are un-optimized?
I can see where Intel or any company might use obfuscated results and numbers to their advantage, but I don't see how this impugns the Intel processors per se.
FTA:
SPEC has ruled that the compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability.
Some optimization is about finding a better route from A to B.
Some optimization is about the fact that people are usually going from A to B, and very rarely C, so you can give them a shortcut from A to B at the cost of the people going to C.
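To make that concrete, here's an invented pair of functions (nothing to do with the real xalancbmk transformation): both are correct for any input, but the second is only fast because of a priori knowledge of what the benchmark's data looks like.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The "better route from A to B": a generic optimization that helps every caller.
void sort_generic(std::vector<int>& v) {
    std::sort(v.begin(), v.end());  // O(n log n) on any input
}

// The narrow "shortcut": still correct on any input, but it is only fast
// because the optimizer knows, a priori, that the benchmark's dataset is
// nearly sorted (an invented property, for illustration). On other data
// this insertion sort degrades to O(n^2).
void sort_overfit(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int x = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > x) { v[j] = v[j - 1]; --j; }
        v[j] = x;
    }
}
```

The overfit version never gives a wrong answer, which is why it sails through correctness checks; it simply stops being an optimization the moment the data isn't the benchmark's data. That's the "narrow applicability" SPEC objected to.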
Re: (Score:2)
Fine explanation - thank you very much.
In layman's terms, it's called cheating (Score:2)
In layman's terms, SPEC is accusing Intel of optimizing the compiler specifically for its benchmark, which means the results weren't indicative of how end users could expect to see performance in the real world. Intel's custom compiler might have been inflating the relevant results of the SPEC test by up to 9%...
No, in layman's terms, it is called cheating, plain and simple.
This is no different from a teacher who, knowing what questions were on the coming exam, used those same questions as "examples" when teaching his/her students in class. Doing that would be called cheating, just like what Intel did.
Still want to argue? Imagine it were some Chinese chip company doing this instead of Intel, would you still continue to defend this practice?
Re:In layman's terms, it's called cheating (Score:4, Funny)
When Volkswagen did the same thing (cheated the emissions test using prior knowledge of the test procedure), they got hit with billions in fines. https://en.wikipedia.org/wiki/... [wikipedia.org]