Microsoft Internet Explorer Mozilla

Internet Explorer 9 Caught Cheating In SunSpider

dkd903 writes "A Mozilla engineer has uncovered something embarrassing for Microsoft – Internet Explorer is cheating in the SunSpider Benchmark. The SunSpider, although developed by Apple, has nowadays become a very popular choice of benchmark for the JavaScript engines of browsers."
This discussion has been archived. No new comments can be posted.

  • I would think Microsoft would be used to embarassing by now..

    • Re: (Score:2, Insightful)

      by Pojut ( 1027544 )

      They're kinda like the rich fat cat who constantly puts his foot in his mouth. He knows he should shut up, but then again why should he care...he's rich, bitch!

    • Embarrassing the article got slashdotted. Try the web site on port 8090.

      http://digitizor.com.nyud.net:8090/2010/11/17/internet-explorer-9-caught-cheating-in-sunspider-benchmark/ [nyud.net]

      Also embarrassing that you spelt "embarrassing" incorrectly. ;)

    • Re:Embarassing? (Score:5, Informative)

      by Mushdot ( 943219 ) on Wednesday November 17, 2010 @09:11AM (#34253714) Homepage

      Another misleading tabloid headline from Taco et al.

      Short story: Someone notices a perhaps too-fast result for a particular benchmark test with IE 9 and modifies the benchmark code which then throws IE 9 performance the other way. One *possible* conclusion is that MS have done some sort of hardcoding/optimisation for this test, which has been thrown out by the modifications.

      • Re:Embarassing? (Score:4, Insightful)

        by BLKMGK ( 34057 ) <morejunk4me&hotmail,com> on Wednesday November 17, 2010 @09:22AM (#34253846) Homepage Journal

        Thanks for someone pointing this out. I mean really, if they were going to throw this test why would they throw it quite this much? And is this the ONLY portion of this test that seems to act this way? If so then why in the world would they throw only this portion and why this much? The original result was uber fast, the result on the modified test pretty slow - if they were going to try and hide something why make it uber fast and not just slightly better?

        Something is weird, possibly hinky, but to outright declare cheating based just on this? Really? O_o

      • I distinctly remember this being heavily discussed in the last few IE9 benchmark stories as well, so it's not new and it's not necessarily cheating.
        • Re: (Score:3, Insightful)

          by Targon ( 17348 )

          The purpose of a benchmark is to try to show how performance will be in the real world. If a given application has been programmed to do very well in a given benchmark yet does not do as well with a real-world situation, then the benchmark results are flawed. The idea of coding an application just to have good benchmark numbers that would not be seen in the real world is considered cheating. In this case, we are talking about JavaScript speeds, so you would be very surprised if you believed that IE 9

      • Re:Embarassing? (Score:5, Insightful)

        by Anonymous Coward on Wednesday November 17, 2010 @09:47AM (#34254178)
        Optimisations done purely for use only on a benchmark to achieve far better results than normal is the exact definition of cheating. Benchmarks are meant to test the browser with some form of real performance measure and not how good the programmers are at making the browser pass that one test. If the thing is getting thrown off by some very simple instructions to the tune of 20 times longer then it is seriously broken. Optimization or not.

        It is like when ATI/Nvidia made their drivers do some funky shit on the benchmarks to make their products seem way better; This was also called cheating at the time.
        • Re:Embarassing? (Score:4, Insightful)

          by shutdown -p now ( 807394 ) on Wednesday November 17, 2010 @02:26PM (#34258666) Journal

          The benchmark in question can be considerably optimized by dead code elimination, since a computationally expensive function in there (one that loops computing stuff) does not have any observable side effects, and does not return any values - so it can be replaced with a no-op. It is a perfectly legitimate optimization technique, but the one which tends to trip up naively written benchmark suites because they assume that "for(int i=0; i < 1000000; ++i) {}" is going to be executed exactly as written.

          There was actually a similar case with artificial tests in the past - Haskell (GHC) scores on the Programming Language Shootout. Most tests there were also written as loops with some computations inside and no side-effects, on the presumption that compilers will leave the computations intact even though their results are never used. Well, one thing that GHC has always had is a particularly advanced dead code eliminator, and it compiled most of those tests down to what is essentially equivalent to "int main() { return 0; }" - with corresponding benchmark figures. Once they changed the tests to print out the final values, this all went back to normal.

          In this case it's not quite that simple, because seemingly trivial changes to benchmark code - changes which do not change the semantics of the code in any way - trip off the dead code elimination analyzer in IE9 JS engine. However, it is still an open question on whether it is deliberate, or due to bugs in the analyzer. One plausible explanation was that analyzer is written to deal with code which at least looks plausible, and neither of the suggested optimizer-breaking changes (inserting an extra statement consisting solely of "false;" in the middle of the function, or "return;" at the end of it) make any sense in that context. Any dead code elimination is necessarily pessimistic - i.e. it tries to guess if the code is unused, but if there are any doubts (e.g. it sees some construct that it doesn't recognize as safe) it has to assume otherwise.

          The only true way to test this is to do two things:

          1. Try to change the test in other ways and see if there are any significant diffs (such that they are not reasonably detectable as being the same as the original code) which will still keep the optimizer working.

          2. Write some new tests specifically to test dead code elimination. Basically just check if it happens on completely different test cases.

          By the way, the guy who found the discrepancy has opened a bug [microsoft.com] in MS bug tracker regarding it, in case you want to repro or track further replies.
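A rough sketch of the second kind of test described above, for anyone who wants to try it. The names `deadWork`, `liveWork` and `timeIt` are made up for illustration and are not part of SunSpider. The idea is to time the same loop twice: once with the result thrown away (a candidate for dead code elimination) and once with the result returned and folded into a checksum, so the engine cannot legally drop the work.

```javascript
// Work whose result is discarded. An optimiser that can prove the loop has
// no observable side effects is allowed to remove it entirely.
function deadWork(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc = (acc + i * i) % 1000003;
  }
}

// The same loop, but the result escapes through the return value, so the
// computation is observable and has to be performed.
function liveWork(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc = (acc + i * i) % 1000003;
  }
  return acc;
}

// Hypothetical timing helper: runs fn(n) `reps` times and keeps a checksum
// of any returned values so the caller can verify the work really happened.
function timeIt(fn, n, reps) {
  let checksum = 0;
  const start = Date.now();
  for (let r = 0; r < reps; r++) {
    const out = fn(n);
    if (out !== undefined) checksum += out;
  }
  return { ms: Date.now() - start, checksum: checksum };
}

console.log("dead:", timeIt(deadWork, 100000, 50));
console.log("live:", timeIt(liveWork, 100000, 50));
```

If the "dead" timing is near zero while the "live" timing is not, the engine is probably eliminating the unused loop. That says nothing about intent, only that the optimisation fires on code other than the benchmark.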

      • Re: (Score:3, Insightful)

        by mwvdlee ( 775178 )

        The problem is that that is the most logical conclusion.

        The other possible conclusions are both more unrealistic and worse for Microsoft.

        The benchmark modifications were trivial and non-functional; they shouldn't have made that big of a difference.

      • Re:Embarassing? (Score:5, Informative)

        by chrb ( 1083577 ) on Wednesday November 17, 2010 @10:50AM (#34254892)

        Did you look at the diffs [mozilla.com]? The addition of the "true;" operation should make absolutely no difference to the output code. It's a NOP. The fact that it makes a difference indicates that either something fishy is going on, or there is a bug in the compiler that fails to recognise "true;" or "return (at end of function)" as being deadcode to optimise away, and yet the compiler can apparently otherwise recognise the entire function as deadcode. Just to be clear, we are talking about a compiler that can apparently completely optimise away this whole function:

        function cordicsincos() {
                var X;
                var Y;
                var TargetAngle;
                var CurrAngle;
                var Step;

                X = FIXED(AG_CONST); /* AG_CONST * cos(0) */
                Y = 0; /* AG_CONST * sin(0) */

                TargetAngle = FIXED(28.027);
                CurrAngle = 0;
                for (Step = 0; Step < 12; Step++) {
                        var NewX;
                        if (TargetAngle > CurrAngle) {
                                NewX = X - (Y >> Step);
                                Y = (X >> Step) + Y;
                                X = NewX;
                                CurrAngle += Angles[Step];
                        } else {
                                NewX = X + (Y >> Step);
                                Y = -(X >> Step) + Y;
                                X = NewX;
                                CurrAngle -= Angles[Step];
                        }
                }
        }

        but fails to optimise away the code when a single "true;" instruction is added, or when "return" is added to the end of the function. Maybe it is just a bug, but it certainly is an odd one.
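For anyone who wants to reproduce this locally, here is a minimal sketch of a harness that builds the three variants and times them. Several things in it are assumptions: the helper definitions (`FIXED`, `AG_CONST`, `Angles`) are modelled on the SunSpider cordic test rather than copied from it, the position of the inserted "true;" is a guess rather than taken from the diffs, and `build`, `timeVariant` and `runAll` are hypothetical names.

```javascript
// Helper definitions modelled on the SunSpider cordic test; the exact
// values are assumptions for illustration.
const AG_CONST = 0.6072529350;
function FIXED(x) { return x * 65536.0; }
const Angles = [45.0, 26.565, 14.0362, 7.12502, 3.57633, 1.78991,
                0.895174, 0.447614, 0.223811, 0.111906, 0.055953,
                0.027977].map(FIXED);

// Function body split in two so a no-op statement can be placed between
// the declarations and the loop.
const HEAD = `
  var X = FIXED(AG_CONST), Y = 0, NewX;
  var TargetAngle = FIXED(28.027), CurrAngle = 0, Step;
`;
const LOOP = `
  for (Step = 0; Step < 12; Step++) {
    if (TargetAngle > CurrAngle) {
      NewX = X - (Y >> Step); Y = (X >> Step) + Y; X = NewX;
      CurrAngle += Angles[Step];
    } else {
      NewX = X + (Y >> Step); Y = -(X >> Step) + Y; X = NewX;
      CurrAngle -= Angles[Step];
    }
  }
`;

// Compile a variant from source text; helpers are passed in as parameters.
function build(body) {
  return new Function("FIXED", "AG_CONST", "Angles", body);
}

const variants = {
  original: build(HEAD + LOOP),
  withTrue: build(HEAD + "true;\n" + LOOP),      // placement is a guess
  withReturn: build(HEAD + LOOP + "return;\n"),  // explicit return at the end
};

// Time `iterations` calls of one variant, in milliseconds.
function timeVariant(fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn(FIXED, AG_CONST, Angles);
  return Date.now() - start;
}

function runAll(iterations) {
  const results = {};
  for (const name of Object.keys(variants)) {
    results[name] = timeVariant(variants[name], iterations);
  }
  return results;
}

console.log(runAll(25000));
```

All three variants do the same work and return nothing, so any large gap between their timings comes from how the engine treats the extra statement, not from the code itself.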

        This shows the dangers of synthetic non-realistic benchmarks. I was amused to read Microsoft's comments on SunSpider: "The WebKit SunSpider tests exercise less than 10% of the API’s available from JavaScript and many of the tests loop through the same code thousands of times. This approach is not representative of real world scenarios and favors some JavaScript engine architectures over others." Indeed.

        btw the Hacker News [ycombinator.com] discussion is more informative.

  • by Anonymous Coward
    No, none what-so-ever.

    Welcome to your daily two minutes hate, Slashdot.
    • *although it was developed by Apple*

      Doesn't this only make sense if there is reason to suspect Apple doesn't develop things well?

      • No, it's just stating that it has basically become a standard benchmark despite Apple not being a standards organization.
        • But then, that's how most things become standards

          A, who is not a standards organisation develops it.
          B, who is also not a standards organisation uses it.
          If sufficient numbers of Bs use it, it becomes a de-facto standard
          Sometimes C, who is a standards organisation says it's a standard.
          Then it becomes a de-jure standard.

  • Benchmarks (Score:5, Insightful)

    by Mongoose Disciple ( 722373 ) on Wednesday November 17, 2010 @08:49AM (#34253490)

    This is the nature of benchmarks... whenever people start caring about them enough, software/hardware designers optimize for the benchmark.

    Next we're going to be shocked that 8th grade history students try to memorize the material they think will be on their test rather than seeking a deep and insightful mastery of the subject and its modern societal implications.

    • Re: (Score:3, Insightful)

      This is the nature of benchmarks... whenever people start caring about them enough, software/hardware designers optimize for the benchmark.

      Except that the article writer tries to claim that that couldn't possibly be the case and thus claims that Microsoft is "cheating" instead. Basically this is an invented controversy.

    • by eldavojohn ( 898314 ) * <eldavojohn.gmail@com> on Wednesday November 17, 2010 @08:53AM (#34253544) Journal

      Next we're going to be shocked that 8th grade history students try to memorize the material they think will be on their test rather than seeking a deep and insightful mastery of the subject and its modern societal implications.

      Some things to consider: 1) I'm not doing business with the 8th grader. Nor am I relying on his understanding and memorization of history to run Javascript that I write for clients. 2) You are giving Microsoft a pass by building an analogy between their javascript engine and an 8th grade history student.

      Just something to consider when you say we shouldn't be shocked by this.

      • 2) You are giving Microsoft a pass by building an analogy between their javascript engine and an 8th grade history student.

        Indeed. The student would make a better Javascript engine.

      • 1) I'm not doing business with the 8th grader. Nor am I relying on his understanding and memorization of history to run Javascript that I write for clients.

        No, but 5 years later you'll let him vote...

    • This is the nature of benchmarks... whenever people start caring about them enough, software/hardware designers optimize for the benchmark.

      It shows that Microsoft is more concerned about getting a good score on the benchmark than they are about providing a good customer experience.

      • It shows that Microsoft is more concerned about getting a good score on the benchmark than they are about providing a good customer experience.

        Could the same be said about the numerous bugs issued for Firefox about optimizing TraceMonkey's SunSpider performance?

        • No it couldn't. Firefox has for a long time lagged on pretty much all the tests, including that stupid ACID test. They lagged specifically because they were more focused on real improvements over faking it or optimizing for conditions that one is unlikely to encounter.

          Or, it could be that they're just incredibly incompetent at cheating. I suppose that's possible. But given the degree to which the real speed has improved with the 4.0b7, I think we can largely rule out that level of incompetence.
      • Re: (Score:2, Insightful)

        It shows that Microsoft is more concerned about getting a good score on the benchmark than they are about providing a good customer experience.

        For that to be true, you'll need to demonstrate that they put more effort into scoring well on the benchmark than they did in improving performance in general. I don't think you can.

        Improving performance in general is worth doing and I'm sure it's being done, but it's hard. Improving performance on a benchmark dramatically is often not that hard, and it's worth doing if it gets your product noticed.

        I'm sure all browser makers are doing the exact same thing on both counts -- anonymous Mozilla guy is just

    • by gl4ss ( 559668 )

      read the article. their js performance is quite suspect if their results are "too good to be true" when the benchmark is unmodified and then too bad to be true when it's very slightly modified. some more 3rd party testing should be done.. and actually it would be pretty easy to do.

      • by hey ( 83763 )

        Shows a problem with benchmarks in general. Too easy to game.

        • Shows a problem with benchmarks in general. Too easy to game.

          Benchmarks are great, for improving the performance of your code. Benchmarks are terrible, as soon as they start to get press and companies try to deceive users by gaming them. That's why it is important that we call out when they are caught so they get more bad press and maybe think twice about gaming the benchmark in the first place.

        • That's not a problem with benchmarks per se, that's a problem with the idiots that insist that benchmark performance is the same thing as good performance in general.

          It really depends how the benchmark is set up, certain things are known to be costly in terms of IO, RAM and processing time. And a benchmark which measures things like that and gives some meaningful indication where the time is being spent is definitely valuable.
    • Re:Benchmarks (Score:5, Informative)

      by TheRaven64 ( 641858 ) on Wednesday November 17, 2010 @09:24AM (#34253862) Journal

      There is a difference between optimising for a benchmark and cheating at a benchmark. Optimising for a benchmark means looking at the patterns that are in a benchmark and ensuring that these generate good code. This is generally beneficial, because a well-constructed benchmark is representative of the kind of code that people will run, so optimising for the benchmark means that common cases in real code will be optimised too. I do this, and I assume that most other compiler writers do the same. Cheating at a benchmark means spotting code in a benchmark and returning a special case.

      For example, if someone is running a recursive Fibonacci implementation as a benchmark, a valid optimisation would be noting that the function has no side effects and automatically memoising it. This would turn it into a linear time, rather than exponential time, function, at the cost of increased memory usage. A cheating optimisation would be to recognise that it's the Fibonacci sequence benchmark and replace it with one that's precalculated the return values. The cheat would be a lot faster, but it would be a special case for that specific benchmark and would have no impact on any other code - it's cheating because you're not really using the compiler at all, you're hand-compiling that specific case, which is an approach that doesn't scale.
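To make the distinction concrete, here is a small sketch with the optimisation written by hand rather than applied by a compiler. The names `fibNaive`, `makeMemoFib`, `fibCheat` and the `PRECOMPUTED` table are illustrative only, and the table inputs are an assumption about what a hypothetical benchmark might call.

```javascript
// Naive recursive Fibonacci, the sort of thing a benchmark might contain.
function fibNaive(n) {
  return n < 2 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// General technique: cache results of a side-effect-free function. The
// recursive calls go through the cache too, so each value is computed once.
function makeMemoFib() {
  const cache = new Map();
  function fib(n) {
    if (n < 2) return n;
    if (cache.has(n)) return cache.get(n);
    const value = fib(n - 1) + fib(n - 2);
    cache.set(n, value);
    return value;
  }
  return fib;
}

// The "cheat": a table of answers for the inputs the benchmark is known to
// use. Anything not in the table falls back to the slow path.
const PRECOMPUTED = new Map([[10, 55], [20, 6765], [30, 832040]]);
function fibCheat(n) {
  return PRECOMPUTED.has(n) ? PRECOMPUTED.get(n) : fibNaive(n);
}

const fibMemo = makeMemoFib();
console.log(fibMemo(50));   // fast for any input
console.log(fibCheat(30));  // fast only because 30 is in the table
```

The memoised version speeds up every call with any argument, while the table only helps on the three inputs it was built for. That is the difference between optimising for a benchmark and special-casing it.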

      The Mozilla engineer is claiming that this is an example of cheating because trivial changes to the code (adding an explicit return; at the end, and adding a line saying true;) both make the benchmark much slower. I'm inclined to agree. The true; line is a bit difficult - an optimiser should be stripping that out, but it's possible that it's generating an on-stack reference to the true singleton, which might mess up some data alignment. The explicit return is more obvious - that ought to be generating exactly the same AST as the version with an implicit return.

      That said, fair benchmarks are incredibly hard to write for modern computers. I've got a couple of benchmarks that show my Smalltalk compiler is significantly faster than GCC-compiled C. If you look at the instruction streams generated by the two, this shouldn't be the case, but due to some interaction with the cache the more complex code runs faster than the simpler code. Modify either the Smalltalk or C versions very slightly and this advantage vanishes and the results return to something more plausible. There are lots of optimisations that you can do with JavaScript that have a massive performance impact, but need some quite complex heuristics to decide where to apply them. A fairly simple change to a program can quite easily make it fall through the optimiser's pattern matching engine and run in the slow path.

  • by Tar-Alcarin ( 1325441 ) on Wednesday November 17, 2010 @08:52AM (#34253522)

    what can be attributed to stupidity.

    1) Microsoft cheated by optimizing Internet Explorer 9 solely to ace the SunSpider Benchmark. To me, this seems like the best explanation.
    2) Microsoft engineers working on Internet Explorer 9 could have been using the SunSpider Benchmark and unintentionally over-optimized the JavaScript engine for the SunSpider Benchmark. This seems very unlikely to me.

    I see no reason why explanation number one is more likely than explanation number two.

    • by Spad ( 470073 )

      Only because you could argue that Microsoft has a vested interest in doing #1 - I guess it depends on how malicious you think Microsoft is :)

    • Re: (Score:3, Insightful)

      by The MAZZTer ( 911996 )
      Let's check out some other benchmarks/parts of Sunspider IE9 does good on and tweak them similarly to see if the performance suddenly suffers.
    • by Inda ( 580031 )
      Accuse someone of something when fishing for information. Watch the reactions, watch people back-pedalling, listen for lies, listen for an overly reactive explanation, watch for the ultra-defensive, nose scratching, bullshitters, beads of sweat...

      Does no one else use this trick in life? I doubt I've invented it; I'm sure it's taught somewhere and there's probably a fancy name for it.

      Accuse Microsoft of cheating and see what information flows back.
    • Re: (Score:3, Insightful)

      by LordKronos ( 470910 )

      I see no reason why explanation number one is more likely than explanation number two.

      I do. Given the nature of the changes that were used to uncover this, to me (as a programmer) it seems very unlikely that such over-optimization could happen in such a way that it would degrade so severely with those changes. Here is what was changed (look at the 2 diff files linked near the bottom of the article):

      1) A "true;" statement was added into the code. It was not an assignment or a function call, or anything complex. Just a simple true statement. Depending on the level of optimization by the interp

    • In my opinion, a useful benchmark reflects standard usage patterns. Therefore, optimizing for the benchmark can only benefit the end user. If shuffling the "return" and "True" is just as valid an option, perhaps the benchmark should include both approaches.

      Maybe I'm a bit naive, but when I change my code, I expect the results to change as well.

    • The keyword in number two is "unintentionally". Happy accidents do happen, but they rarely go unrecognized and once recognized they should be reconciled. If recognized but not reconciled then you can't say it was unintentional and therefore I have to agree that number two seems unlikely.
  • Benchmarks are very nice and all, but in the end, users using different browsers for real should decide which *feels* faster or better (which isn't the same as being faster or better). If real-world users can't feel the difference, then benchmarks are just there for masturbation value, and quite frankly, on reasonably modern hardware, I've never felt any true difference in rendering speed between the various "big" browsers out there.

    I reckon the only thing that truly matters is the speed at which a browser

  • by davev2.0 ( 1873518 ) on Wednesday November 17, 2010 @08:57AM (#34253584)

    There are three possible explanations for this weird result from Internet Explorer:

    Microsoft cheated by optimizing Internet Explorer 9 solely to ace the SunSpider Benchmark. To me, this seems like the best explanation.
    Microsoft engineers working on Internet Explorer 9 could have been using the SunSpider Benchmark and unintentionally over-optimized the JavaScript engine for the SunSpider Benchmark. This seems very unlikely to me.
    A third option (suggested in Hacker News) might be that this is an actual bug and adding these trivial codes disaligns cache tables and such throwing off the performance entirely. If this is the reason, it raises a serious question about the robustness of the engine.

    Everything in italics is unsupported opinion by the author, yet is treated as fact in the summary and title by CmdrTaco and Slashdot. Perhaps if Slashdot would stick to actual news sites (you know NEWS for nerds and all that), this would be a balanced report with a good amount of information. Instead, it is just another Slashdot supported hit piece against Microsoft.

    • by gl4ss ( 559668 )

      it's just speculation on the possible reasons why it happened. should he have waited for the first replies to his post to post the replies to those obvious replies? of course not.

      now if you want, you could run the benches yourself - and this is what the blogger wants you to do.

      • by davev2.0 ( 1873518 ) on Wednesday November 17, 2010 @09:41AM (#34254088)
        So, instead the blogger should declare that MS cheated at the benchmarks with nothing more than his results for which he admits that there are at least three plausible explanations?

        And, then Taco should treat the author's biased opinion as fact? Remember, the title of this post is "Internet Explorer 9 Caught Cheating in SunSpider."

        I don't think so.

        And, where is the response from MS? Did anyone ask MS, or did someone find this and go "MS is CHEATING!!11!!one!" without actually investigating or even asking MS? Because, it really looks like the latter, which would make this just more MS bashing blogspam.
  • No proof? (Score:5, Informative)

    by 1000101 ( 584896 ) on Wednesday November 17, 2010 @08:58AM (#34253596)
    FTFA:

    There are three possible explanations for this weird result from Internet Explorer:

    1. Microsoft cheated by optimizing Internet Explorer 9 solely to ace the SunSpider Benchmark. To me, this seems like the best explanation.
    2. Microsoft engineers working on Internet Explorer 9 could have been using the SunSpider Benchmark and unintentionally over-optimized the JavaScript engine for the SunSpider Benchmark. This seems very unlikely to me.
    3. A third option (suggested in Hacker News) might be that this is an actual bug and adding these trivial codes disaligns cache tables and such throwing off the performance entirely. If this is the reason, it raises a serious question about the robustness of the engine.


    I'm not saying if what they have done is right or wrong, but this is a sensationalist headline that offers two other "less evil" alternatives to the outcome.
    • Headlines are supposed to be succinct summaries and that is enforced by the character limit here. Maybe a better headline would be "Internet Explorer 9 Probably Cheating On Sunspider, But Maybe Just Horribly Written In Ways That Make SunSpider Apply Poorly". Of course that is too long for the title.

      The important take away is that a particular SunSpider test is not a valid test for IE 9's performance in that category and that IE 9 will do much, much worse in many real world scenarios. The likelihood is that

  • by js3 ( 319268 ) on Wednesday November 17, 2010 @08:59AM (#34253606)

    Meh I think claiming they are cheating with no evidence seems a little too out there. I've never seen MS brag about how fast their browser is on this particular benchmark, and frankly seems more like a bug than a cheat.

    • by king neckbeard ( 1801738 ) on Wednesday November 17, 2010 @09:09AM (#34253696)
      They have shown their Sunspider results quite a few times on http://blogs.msdn.com/b/ie/ [msdn.com]
      • While MS-IE have disclosed a lot of information lately on their blogs, if they're going to discuss Sunspider results (as they did on 28 October with the IE9PP6 tests [msdn.com]) then use of sleight of hand to sex them up is fair game for criticism.

      • by BLKMGK ( 34057 )

        But did modifying this one test to near impossible speed make that much of a difference? It was obviously anomalous right? What about the other test results? If tweaked do things get screwy and if so what about the other browsers? So far I'm not convinced although certainly it's possible. Frankly if who they are trying to woo to their browser is Joe Average user then this benchmark, commented on in a blog no Joe Average likely reads, seems silly IMO.

    • I don't disagree with you, but I'd love your "bug" idea to become mainstream...

      "Critical bug in Internet Explorer boosts performance by 10%"

      "Microsoft keen to develop more bugs in the hope they boost performance in other products"

      "Mozilla unavailable for comment on how come they persist in reducing bugs when everyone else seems to want more of them"

  • by digitaldc ( 879047 ) * on Wednesday November 17, 2010 @09:05AM (#34253670)
    The article clearly states:
    There are three possible explanations for this weird result from Internet Explorer:
    1. Microsoft cheated by optimizing Internet Explorer 9 solely to ace the SunSpider Benchmark. To me, this seems like the best explanation.
    2. Microsoft engineers working on Internet Explorer 9 could have been using the SunSpider Benchmark and unintentionally over-optimized the JavaScript engine for the SunSpider Benchmark. This seems very unlikely to me.
    3. A third option (suggested in Hacker News) might be that this is an actual bug and adding these trivial codes disaligns cache tables and such throwing off the performance entirely. If this is the reason, it raises a serious question about the robustness of the engine.

    So, what proof do we have that Microsoft actually cheated?
  • Does this part of the benchmark produce a result or output, and if so is it correct?

    And if it doesn't produce any output or a result that's checked, there is plenty of scope for innocent explanations. It could be a bug that doesn't arise when the extra statements are added. Or it could be that part of the code is being optimised away (because the result isn't used) and the analysis isn't clever enough to handle it when the extra statements are present.

  • by VGPowerlord ( 621254 ) on Wednesday November 17, 2010 @09:54AM (#34254266)

    If Microsoft is cheating, why wouldn't they cheat a bit better? Of the five browsers, including betas, IE is second from last [mozilla.com]. Last place is, of course, Firefox, even with the new JS engine. Oh, and that stats image? Taken from the same blog post [mozilla.com] that originally discovered the Sunspider IE9 issue over a month ago.

    Rob Sayre, the Mozilla Engineer who discovered this, filed a bug [mozilla.com] with Microsoft to get them to look at this issue. However, he didn't file said bug until today, which is likely why this is in the news now rather than a month ago.

  • by TimSneath ( 139039 ) on Wednesday November 17, 2010 @04:37PM (#34260932)

    Hi, we've posted an official response and explanation at the bottom of this post:
    http://blogs.msdn.com/b/ie/archive/2010/11/17/html5-and-real-world-site-performance-seventh-ie9-platform-preview-available-for-developers.aspx [msdn.com]

    Bottom line - we built an optimizing compiler into the Chakra JavaScript engine that eliminates dead code where it finds it.

    It's a fun conspiracy theory that we'd somehow 'game' SunSpider in this way, but it doesn't make sense that we'd go to all that risk to cheat on just one test out of 26.

    Best wishes, Tim Sneath | Microsoft
