Bug Windows Chrome Microsoft Operating Systems Software Hardware

24 Cores and the Mouse Won't Move: Engineer Diagnoses Windows 10 Bug (wordpress.com) 352

Longtime Slashdot reader ewhac writes: Bruce Dawson recently posted a deep-dive into an annoyance that Windows 10 was inflicting on him -- namely, every time he built Chrome, his extremely beefy 24-core (48-thread) rig would begin stuttering, with the mouse frequently becoming stuck for a little over one second. This would be unsurprising if all cores were pegged at 100%, but overall CPU usage was barely hitting 50%. So he started digging out the debugging tools and doing performance traces on Windows itself. He eventually discovered that the function NtGdiCloseProcess(), responsible for Windows process exit and teardown, appears to serialize through a single lock, with each pass taking about 200 microseconds. So if you have a job that creates and destroys a lot of processes very quickly (like building a large application such as Chrome), you're going to get hit in the face with this. Moreover, the problem gets worse the more cores you have. The issue apparently doesn't exist in Windows 7. Microsoft has been informed of the issue and is allegedly investigating.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Tuesday July 11, 2017 @08:30PM (#54790397)

    Not 200S each, which is off by a factor of one million. But, hey.

  • by fustakrakich ( 1673220 ) on Tuesday July 11, 2017 @08:32PM (#54790407) Journal

    We just don't have priority...

    • Yeah, I spend way too much time watching the windows wheel spin around for no apparent reason other than the OS's inability to use more than one core.

  • I don't get it. (Score:5, Interesting)

    by fuzzyfuzzyfungus ( 1223518 ) on Tuesday July 11, 2017 @08:43PM (#54790457) Journal
    If there is an issue that keeps process termination and cleanup from being properly parallelized; I can understand why that might cause unexpectedly poor utilization of additional cores for computationally intensive tasks that also massacre lots of processes; but why would that cause the GUI to stop responding?

    Unless moving the cursor also depends on terminating a bunch of processes; and hangs until that task is finished, wouldn't the inefficiency imposed on the build process be expected to keep the GUI more responsive; by preventing it from occupying as much CPU time as it otherwise would?

    Am I just confused? Does keeping the desktop and cursor drawn actually involve lots of time sensitive process killing? Does this indeed not make sense?
    • Re:I don't get it. (Score:4, Informative)

      by Anonymous Coward on Tuesday July 11, 2017 @08:55PM (#54790507)

      The Windows GUI interface actually uses a separate process to update the mouse on the screen. Due to various historical reasons (compatibility with old applications, mostly), it was required to recycle this process every time the mouse moved, as the process could get a memory leak (which couldn't be fixed properly, in order to preserve compatibility with the aforementioned applications). Therefore, every time the coordinates of the mouse change, the process has to be killed and replaced, therefore putting it through the same lock that this build process is hogging. Combine that with the 200 second delay to get through the lock, and the responsiveness is easily explained.

      It's worth it to keep compatibility with the "After Dark" flying toasters screensaver, though.

      • It's not even wrong (to quote a famous scientist about a really ill-formed idea).

        At this point with multi-core computers, the GUI and mouse etc should be on a completely separate core that is managed somewhat separately than all of the others.

      • Re: (Score:2, Insightful)

        by epine ( 68316 )

        Combine that with the 200 second delay to get through the lock, and the responsiveness is easily explained.

        I didn't believe that number for the first microsecond. Where was your brain? Stuck on "easily explained"?

        From the original:

        And, if each of these readying events happened after the thread had held the lock for just 200 microseconds then the 5,768 readying events would be enough to account for the 1.125 second hang.

        Even Microsoft would notice 24 cores sharing a 200 s group hug.
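The quoted arithmetic is easy to check. Below is a minimal sketch that uses only the figures from the quote (5,768 readying events, 200 microseconds per lock hold, a 1.125 second hang); the variable names are illustrative, not from the article.

```python
# Figures taken from the quoted passage; names are illustrative only.
READYING_EVENTS = 5_768
HOLD_US = 200            # microseconds the lock is held per event
OBSERVED_HANG_S = 1.125  # hang length reported in the quote

# Total time if every hold is fully serialized behind one lock.
serialized_s = READYING_EVENTS * HOLD_US / 1_000_000
print(f"serialized lock time: {serialized_s:.4f} s")
print(f"observed hang:        {OBSERVED_HANG_S:.3f} s")

# The same event count at 200 *seconds* per hold, as the summary first read.
misread_days = READYING_EVENTS * 200 / 86_400
print(f"at 200 s per hold:    {misread_days:.1f} days")
```

At 200 microseconds the serialized total lands close to the observed hang; at 200 seconds it would run to roughly two weeks, which is why the original wording could not have been right.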

        If the question had been

        • by dissy ( 172727 )

          The slashdot summary originally said "200S" instead of "about 200 microseconds"
          It was silently changed without an update message saying so.

          I too was very confused when I first read it, both that a capital S and no space isn't any standard notation I know of, and that the only interpretation was 200 seconds which made no sense at all.

      • by gweihir ( 88907 )

        What an incredibly bad design!

        • by Megol ( 3135005 )

          Yes but I guess your mother and father did their best trying to make you a smart boy(?).

      • +1 Funny

        http://www.tothepc.com/pic/fak... [tothepc.com]

    • by gweihir ( 88907 )

      You are not confused. A sane kernel does not have this issue. A sane GUI stays responsive even with this issue. Unfortunately, Win10 does not have either.

      • You've never had the UI go unresponsive in X11 under heavy load?

        • You've never had the UI go unresponsive in X11 under heavy load?

          FTFA it appears to go unresponsive without a heavy load - the cores are unloaded. So, no, I've never had an unloaded Linux/BSD machine get unresponsive with X11.

          • by tlhIngan ( 30335 )

            FTFA it appears to go unresponsive without a heavy load - the cores are unloaded. So, no, I've never had an unloaded Linux/BSD machine get unresponsive with X11.

            I've had Linux go unresponsive without a heavy load - back in the bad old days of a decade and a half ago, untarring the Linux kernel itself would stall out the machine. The CPU was busy, but not so much - it was pure I/O locking up the kernel. So for the 5 minutes or so it took for the kernel to untar back in those days (this was when you didn't g

      • Let's be honest though, only the old commercial unix machines could do this in the 90s (IRIX, Solaris are two good examples). I don't use a GUI on my linux machines, so I don't know how well written the GUI is there. Now, neither Apple nor MS is capable of making a responsive GUI.

      • by ( 4475953 )

        No mainstream operating system has a responsive GUI under heavy load, especially not under heavy I/O load. GNU/Linux goes down very rapidly, Android is sluggish out of the box, and OS X has its spinning beachball of death. They are designed incorrectly.

        As a test, you may surf to this [haschek.at] page to see how your system handles an embedded zip bomb. (Warning: Don't click this link unless you're willing to kill your browser session or even hard-reset your machine.)

    • Short answer: context switching.

      I'm sure others can pipe in here with more detailed explanations because I am not that familiar with the Windows kernel, but the basic gist of the problem is that calls to this function (NtGdiCloseProcess) cause it to acquire a global kernel lock which blocks thread execution...for ~200 microseconds, usually. The problem in this scenario is that around 5,768 calls to this function are being serialized onto the Ready Thread call stack which, combined, are delaying all other pr

      • In summary, the delays in responsiveness and interactivity are being caused by context switches, which is the usual culprit. It has nothing to do with the speed and number of CPUs because it is not a CPU resource problem. It is purely a kernel scheduling issue.

        It has a bit to do with the CPUs: The reporter had a machine with 24 cores that actually managed to create and destroy 5,000 processes per second. My 4 core machine would have only created and destroyed less than 1,000 processes per second, so no problem.
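A rough way to see why the process exit rate matters: if every exit holds one global lock for about 200 microseconds (the figure from the summary), the share of each second that the lock is busy scales with exits per second. This is a simplified sketch; the function name is hypothetical and the rates are the approximate ones given in this comment.

```python
def lock_busy_fraction(exits_per_second: float, hold_us: float = 200.0) -> float:
    """Fraction of each second a single global lock is held, assuming every
    process exit holds it for hold_us microseconds (a simplification)."""
    return exits_per_second * hold_us / 1_000_000


# Rates from the comment: ~5,000 exits/s on 24 cores, under 1,000 on 4 cores.
for label, rate in [("24-core build", 5_000), ("4-core build", 1_000)]:
    busy = lock_busy_fraction(rate)
    print(f"{label}: lock held about {busy:.0%} of the time")
```

Under these assumptions the 24-core machine keeps the lock busy essentially all the time, leaving no slack for anything else that needs it, while the 4-core machine leaves it free most of each second.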

    • by Dog-Cow ( 21281 )

      The serialization happens in the kernel, which means that hardware events are not being processed and transmitted to the mouse driver, which in turn isn't informing the process responsible for drawing the cursor.

  • Marketing - "How do we monetize this...."

    Engineers - "You mean after we fix it?"

    Marketing just begins laughing - "Only if it gets more money than leaving it in and marketing it as a feature"

  • I remember BeOS (Score:5, Insightful)

    by rsilvergun ( 571051 ) on Tuesday July 11, 2017 @09:00PM (#54790527)
    being the only OS I've ever seen in my life move a window around the screen w/o tearing. Yeah, it doesn't make much difference, but you'd think in 2017 my quad core CPU and 8 gigs of RAM could do what a 400 MHz AMD K6 did in 1996 with 512 MB RAM.
    • Amiga.

      • Re: I remember BeOS (Score:4, Interesting)

        by Miamicanes ( 730264 ) on Tuesday July 11, 2017 @10:37PM (#54790953)

        The Amiga could scroll a "screen" vertically with zero tearing (and very little effort), because it was just updating a memory pointer during a horizontal retrace interval. Ditto, for updating the mouse pointer (it was just a sprite). Both worked even when the app (or OS) died because it was serviced semi-independently of the OS as a whole during the vertical retrace interrupt.

        Intuition-rendered windows were another matter entirely... I think window gadgets & outlines were rendered in the vertical retrace interrupt, but contents & outside-erasures depended on the app and/or os running properly.

        Likewise, the mouse pointer was only robust when it was a 320x200/400 sprite... apps like DeluxePaint & WordPerfect (which needed more precision on a 640x200/400 screen than sprites could provide) that used XOR'ed software-rendered overlays could still crash (though if you clicked outside of the crashed app's window, the sprite-rendered pointer returned)

        AmigaDOS was groundbreaking, but it still had some serious issues of its own. Like an event queue that used single-bit flags, allowing users to click BOTH 'ok' AND 'cancel' if the app stalled/crashed with a dialog on-screen.

        • The Amiga could scroll a "screen" vertically with zero tearing (and very little effort), because it was just updating a memory pointer during a horizontal retrace interval.

          Yes but no. If you had used CygnusEd [wikipedia.org] (a text editor), you'd know what it's like to have frame-perfect "kinetic" smooth scrolling even under CPU load. And scrolling text in a window is a little bit more complex than just updating a pointer.

        • by AmiMoJo ( 196126 )

          Until Windows Vista sprites were used for the mouse pointer on PCs too, including in Windows. VGA cards of the day supported hardware acceleration in the form of a single sprite used for the mouse pointer. Most Amiga graphics cards, which used the same chips, also supported that single sprite for the mouse but the Picasso96 driver did also support a "soft pointer".

          Having recently booted up my old Amiga system, one thing that struck me was that everything freezes when you open the drop-down menus. I had a fi

    • Yeah. You should try an Amiga....
    • I don't have window tearing issues on this laptop I'm using.
      Windows 8.1. Even with two of the monitors connected via a docking station connected to a single USB 3 port.

    • Re:I remember BeOS (Score:5, Interesting)

      by adolf ( 21054 ) <flodadolf@gmail.com> on Tuesday July 11, 2017 @10:08PM (#54790827) Journal

      I was accomplishing this on 486DX2 hardware using OS/2 in ~1994, and by 1995 on a P120.

      Several years ago I stopped by a buddy's retail establishment. He was transitioning the network to Ubuntu on more modern hardware (with OS/2 in a VM), but still had an old and crusty OS/2 machine (probably a K6-2, but maybe a DX4) on the bench by the back door.

      This was the last time I ever saw such a thing in the wild.

      It was remarkably snappy doing normal, productive things -- scanning documents, browsing web pages, writing and viewing proposals -- just like it was when it was built. (And what window tearing?)

      Sometimes I think that the more abstraction layers we add, the slower things get. I think this coupled with programmer laziness (and/or pay based on lines of code), makes human-interactive things continue to behave just as slow as they have been for ~20 years.

      Do we even use accelerated 2D desktop graphics anymore, or are we completely back to the bad old days of every application drawing into a dumb framebuffer?

    • "I once preached peaceful coexistence with Windows. You may laugh at my expense - I deserve it."

      -- Jean-Louis Gassee, CEO Be, Inc.

    • Furthermore, in BeOS user input was king: no matter what shit the OS was doing, mouseclicks and keypresses trumped all else. Boy did BeOS run smooth (from my perspective). Sure, some files got copied 100 milliseconds later - nobody gives a fuck!

  • by Anonymous Coward on Tuesday July 11, 2017 @09:25PM (#54790617)

    2 Core for DRM
    2 Core for DRM Protection
    2 Core for Telemetry
    2 Core for Telemetry Protection
    2 Core for Genuine Advantage
    2 Core for Genuine Advantage Protection
    2 Cores for Driver Signing Validation
    2 Cores for Driver Signing Validation Protection
    2 Cores for Cortana
    2 Cores for Cortana Telemetry
    2 Cores for Cortana Telemetry Protection
    1 Core for the Base OS
    1 Core, at 25% for user processes

  • So it's not just me (Score:2, Interesting)

    by blindseer ( 891256 )

    In my basement office I have six computers I use regularly. Two are running MacOSX, one is running Ubuntu, two are Windows XP, and one is Windows 10. I just went around the room and checked uptimes. All of them were up for more than 3 months, except the Windows 10 computer. This one computer is supposed to be pretty fast compared to the rest but it gets bogged down where I feel compelled to reboot it. It also has the nasty habit of demanding to reboot when I'm trying to get work done, but that's a diff

    • by fatboy ( 6851 )

      So Windows 10 is the only one that is actually patched? <Ducks> :)

      • You have something of a point there about updates. I'll update the computers when it is convenient for me, like I'm forced to reboot due to a power outage. I just checked and it looks like an update is waiting for a reboot on one of my Macs. It seems only Windows 10 has "critical" monthly updates that require a reboot.

        Sure, XP probably has security problems where it should not be on the internet but due to their age I don't go web surfing with them a lot, and they are behind a firewall, so risks are mini

        • > Serious question, is there something I should be doing different to keep these XP machines from becoming a security problem?

          Don't have them be members of your domain, have unique passwords on them, don't access the internet from them, don't check email on them, and don't allow internet access on them.

          There is still the possibility that they'll be compromised in a lateral traversal attack, but this minimizes the probability that they'll be the initial attack vector.

          • Hah, you sound like you work in security. I swear, my security team would rather have me just sit and stare at a powered off machine than get any work done,
    • by superwiz ( 655733 ) on Wednesday July 12, 2017 @02:34AM (#54791789) Journal

      People may ask why I run Windows XP. It's because I have some old software that I like and it won't run on my newer Windows 10 computer.

      It's why people virtualize old PCs now. You run your old PC in a window.

      One of the Windows XP computers claims to have been on for over 15 years.

      32 bits of milliseconds is 49 days. Windows XP is a 32 bit system and a common way to measure how long it's been up is by issuing a system call which returns the number of milliseconds since the system startup.
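The 49-day figure follows directly from the size of the counter. A small sketch of the arithmetic, with a hypothetical helper showing what an unsigned 32-bit millisecond counter would report after wrapping (the comment does not name the system call, so none is assumed here):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# An unsigned 32-bit millisecond counter overflows after 2**32 ms.
wrap_days = 2**32 / MS_PER_DAY
print(f"32-bit millisecond counter wraps after {wrap_days:.2f} days")


def reported_uptime_ms(true_uptime_ms: int) -> int:
    """Value a wrapping 32-bit millisecond counter would show (hypothetical helper)."""
    return true_uptime_ms % 2**32


# A machine that has really been up for 50 days reports well under one day.
fifty_days_ms = 50 * MS_PER_DAY
print(f"50 days up reads as {reported_uptime_ms(fifty_days_ms) / MS_PER_DAY:.2f} days")
```

Any uptime figure derived from such a counter can only be trusted within a single wrap period.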

  • This has happened when starting Chrome since first trying Chrome.

    Tried limiting Chrome to 3/6 cores and even then mouse goes jerky.

    It may not be the exact same cause, but it is the exact same symptom.

  • I wonder if Process Lasso would fix the problem. It solved a weird issue with Xplane11 for me.
  • by dbIII ( 701233 ) on Wednesday July 12, 2017 @01:08AM (#54791579)
    Win10 on something with so much grunt?
    Why turn an expensive system into a limited toy?
    If you need to run MS compatible stuff MS Win7 and various MS server systems are available.
  • Windows is rather hoggish on hardware resources.
  • Fork Bomb ! (Score:4, Insightful)

    by BESTouff ( 531293 ) on Wednesday July 12, 2017 @03:35AM (#54791979)
    Soo, that means that a simple DoS is possible via old-school fork bombs [wikipedia.org] ? In 2017 ? Well done Microsoft !
  • Windows has detected that your mouse has moved. Please restart your computer for the change to take effect.
  • Since when does moving the mouse involve closing a process? Oh wait! Microsoft Windows.
