Performance of 64-bit vs. 32-bit Windows Dual Core
mikemuch writes "ExtremeTech's Loyd Case has done extensive testing on the same dual-core Athlon X2 4800+ system to explore performance differences between Windows XP Professional x64 and good ole Win32. The biggest hurdle is getting the right drivers. There are a few performance surprises, particularly in 3D games."
Biggest Benefit (Score:5, Funny)
Re:Biggest Benefit (Score:2)
The spyware companies all SAY they won't touch the second core now, and they may stick to it for a little while, but that's so that when you buy 2 cores, they have twice the zombie power per PC at your expense!!!
Depends. (Score:2)
Generally, there's no benefit to locking to (or against) a given core, but if you've one core designated for supervisor operations, you don't want applications to interfere with it. In many-way SMP clusters, you also don't want threads geographically spread too far, or you'll lose performance.
The easy way to fix the problem on OS' that don't support CPU-boun
Swapfiles (Score:2)
You ought to know that by now.
Yet another great reason to use Linux, where the only type of swapfile you can create is contiguous!
Re:Swapfiles (Score:2)
Re:Actually... (Score:2)
Re:Biggest Benefit (Score:4, Funny)
Plenty of time to wait for 64 bit apps. (Score:5, Interesting)
Desktop applications (even games) don't need the one thing that 64 bit computing really excels at: massive addressing space. A database server that is compiled to 64 bit code will have access to much more RAM, and thus have much better performance if RAM bound (which many DBs are). Meanwhile for POV-Ray the fastest result of 383 seconds was the 32 bit application on the 64 bit OS!
I think that it is safe to hold off on 64 bit for your personal desktop until a larger share of applications are compiled with 64 bit optimizations, but unlike the 16 -> 32 bit shift, I suspect the results will be underwhelming except for extremely memory consuming applications.
Re:Plenty of time to wait for 64 bit apps. (Score:5, Insightful)
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Anyone care to comment on MSVC's capabilities in the 64-bit arena?
--S
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Re:Plenty of time to wait for 64 bit apps. (Score:4, Informative)
Seeing as how MSVC and icc both conform to the x86-64 ABI, I would assume that both are equally capable (they're already damn near the same anyway) of utilizing the extra registers.
Re:Plenty of time to wait for 64 bit apps. (Score:2)
In other words, if I have eight 32-bit counter variables used heavily in a short block of code, the AMD64 version should be able to stick them all in registers and use them. If it's using the IA32 register allocation algorithm though, they'd end up get
Re:Plenty of time to wait for 64 bit apps. (Score:3, Insightful)
The greatest speed increase will not come from the number of registers, per se, but rather the compiler's ability to explicitly access those registers. There is
Re:Plenty of time to wait for 64 bit apps. (Score:3, Informative)
Almost non-existent in 6, 7 (.Net 2002) and 7.1 (.Net 2003). We've switched some of our 64-bit test platforms at the office from using an SDK to using the beta of version 8 (VS 2005), which seems to be much better at targeting such a platform, but obviously it's unlikely you'll see the latest and greatest game today built with a compiler that's not due for release until 7 November...
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Is it possible that diminishing returns is kicking in on the register set size, or simply bad compilers (or use thereof)?
Re:Plenty of time to wait for 64 bit apps. (Score:4, Interesting)
Bad compilers or more likely they haven't hand optimised their inner loops.
Most high performance ia32 (Intel Architecture 32 bit) software has hand tuned assembler for the tight inner loops, but it takes time, experience and skill to create such assembler. Some discussions I've seen put recent gcc compiling generic C for amd64 at close to the performance of hand optimised assembler for ia32 on the same Athlon 64 (for tight inner loops).
There was an article about an assembler version of a cryptographic function that showed amd64 was capable of a *huge* performance increase over ia32, due to its increased register set.
However it can also come down to implementation quality. IIRC, benchmarks of early amd64 xeon chips showed that they performed worse than ia32 on the same chip for tests that athlon 64 shows a performance *boost* in its 64 bit mode.
Re:Plenty of time to wait for 64 bit apps. (Score:2)
More likely those tight inner loops are optimized for the current number of registers, so if they need to access the same address several times per frame they do all the accesses in a row, so they don't need to move it to and from a register.
Re:Plenty of time to wait for 64 bit apps. (Score:4, Interesting)
Register renaming eliminates, or at least minimizes most of the problems with a small register set.
(Athlon64 has something like 72 integer registers and 122 90 bit FP registers (two of these are combined to make an XMM register for SSE vectors), almost all of which are available in 32 bit mode).
The extra architectural registers will help with moderate to long term storage (more than a few dozen clock cycles between uses) as the programmer will explicitly specify the data remains in the register, whereas with current shuffling it's up to the CPU (and to some extent how the renamed registers are intended to work) to determine if a write to cache is in order, or not.
And really with the longer storage times, you often have the flexibility to write out to L1 and schedule the load so that there's no penalty for the load (i.e. issue the move back to the register the 3 clock cycles prior to when you need it that an L1 load usually takes).
The new registers probably won't make all that much difference in the end. But then again, nothing from the move to 64 bit will be a major impact for a while (at least on the desktop).
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Very telling, in fact, I think you'll find that the 16-32bit shift only helped in applications that were extremely memory consuming for their time.
Odd thing is, it's about 2^16 times easier to exhaust a 16-bit memory address space
Re:Plenty of time to wait for 64 bit apps. (Score:2)
One really nice thing about the 32 -> 64 bit move is with a few exceptions, it looks like we won't have to do any paging and overlay
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Unlike most kids I had fun as a child. If by fun you mean, drilling holes in my head by dealing with braindead architectures.
Re:Plenty of time to wait for 64 bit apps. (Score:2)
CPUs request data in cache-line sizes, and not in word-sizes. Since the CPU is requesting in cache-line sizes, then having a bus wide enough to transport the cache-line will make things faster, anything else expects waste.
Standard phallacy (Score:5, Interesting)
BTW, I don't know about windoze, but in the Linux world going from 32 bits to 64 bits almost always seems to produce a performance gain of 10->20%. I personally tried a simulator I'm using with 64 bits (recompiled with gcc), and got a speedup of 12%.
Re:Standard phallacy (Score:5, Funny)
No (Score:3, Funny)
Re:Standard phallacy (Score:2)
A typical, "yes, but..." is in play here. Additional registers also mean that you have more saves/loads to do on function entry/exit, as well as during thread and process context switches.
Re:Standard phallacy (Score:2)
Sure, there is some overhead in all that translation, and having a broken instruction set d
some debunking here (Score:3, Informative)
While the x86 ISA leav
Re:Plenty of time to wait for 64 bit apps. (Score:3, Insightful)
It's a bit like the jump to 32bit. When all we had was 16bit software to test, the performance numbers tended to be equal. Once the software started showing up that was written for 64bit processing, we started seeing a major performan
Re:Plenty of time to wait for 64 bit apps. (Score:2)
Re:Plenty of time to wait for 64 bit apps. (Score:2)
64bit operations - there are a lot of places where you can make use of 64 bit vs 32 bit integers to reduce (halve) the number of instructions you execute. This assumes that your performance is instruction bounded rather than memory bounded, which is sometimes the case.
easier handling of 64bit color formats without conversions
massive memory - games will happily use as much memory as you have, peddling of course to some least common denominator. But if ev
Why doesn't the submitter do this? (Score:5, Informative)
Printable Version [extremetech.com]
-theGreater.
Re:Why doesn't the submitter do this? (Score:3, Informative)
After reading the benchmarks... (Score:5, Interesting)
The reason why x86-64 is a win is because there are more registers as well. This allows compilers to do a better job.
Re:After reading the benchmarks... (Score:2, Flamebait)
Of course they didn't. You think these "ExtremeTech" guys have the slightest clue what a register even is?
They tested a whole bunch of 32-bit apps on a 64-bit OS. They found that the 64-bit OS was slightly broken in a couple cases. That's about it.
Marketing Hype (Score:4, Funny)
Re:Marketing Hype (Score:5, Insightful)
In addition, the kernel can provide the full 4GB of virtual address to userspace apps without having to resort to performance-robbing kludges.
Once you switch to 64-bit userspace apps with their huge virtual address space you can also do things like mmap() your entire 500GB disk and manipulate it as though it's all in memory.
The end user might not notice a lot but it's much nicer for coders.
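For anyone who hasn't used it, here is a minimal sketch of the mmap() idea in C, assuming a POSIX system. The name count_byte is made up for this example; it maps a file read-only and then scans it as though it were an ordinary in-memory array. Mapping something the size of a 500GB disk the same way only works when the process has a 64-bit virtual address space to put it in.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how often `needle` occurs in the file at `path` by mapping the
 * file read-only and walking it like a plain byte array.
 * Returns -1 on any error. count_byte is a hypothetical helper name. */
long long count_byte(const char *path, unsigned char needle)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {           /* zero-length mappings are rejected */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long long hits = 0;
    for (size_t i = 0; i < len; i++) /* no read() calls, just memory access */
        if (p[i] == needle)
            hits++;

    munmap((void *)p, len);
    return hits;
}
```

The kernel pages the file in on demand, so the loop never has to manage buffers or offsets itself.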
Re:Marketing Hype (Score:2)
Better solution than Linux? (Score:3, Interesting)
Is there a Linux equivalent available?
Having said all that I well remember getting MS to agree with me that there was a bug in their Win32 bolt on for Win16 that meant my software wouldn't run, but they then said they wouldn't fix it! No wonder I eventually switched to Linux... but that's a whole other story.
Re:Better solution than Linux? (Score:2)
They have the source code, you go back, you recompile, you get a 64-bit binary.
Linux is 64-bit kernel and userland.
WINDOWS is 64-bit kernel and device drivers, with 64/32-bit libraries (oftentimes both at the same time) and 32-bit binaries.
Re:Better solution than Linux? (Score:2)
Try doing that to openoffice. Every distro I've seen so far has only 32-bit office, and that alone drags in a huge train of 32-bit libraries, so you end up with almost a dual install of Gnome. Then there is 64-bit Firefox or Mozilla but it won't load 32-bit shared libraries, so if you want flash or acroread or Real plugins you need 32-bit Mozilla. By the time you're done half the libraries on your system are dual-arch.
Re:Better solution than Linux? (Score:2)
Re:Better solution than Linux? (Score:2)
To a degree. Sure, the compiler will create code that only runs on a 64 bit cpu if that's what it's supposed to do, and it may use the extra registers and the like to improve performance, but that doesn't really mean your code is really using 64 bits now.
(Granted, you didn't say it was, but I thought I'd be a bit more explicit.)
For example, if the application did a lot of integer math, and it was programmed to use
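To make the integer-width point concrete, here is a small C sketch (both function names are invented for the example). Rebuilding for AMD64 does not change what a 32-bit type means, and on the common 64-bit ABIs plain `int` stays 32 bits as well, so an accumulator declared 32 bits wide still wraps at 2^32 until someone edits the source.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulator is 32 bits wide: the total wraps modulo 2^32 no matter
 * whether the binary was built for a 32-bit or a 64-bit target. */
uint32_t sum_wrapping(const uint32_t *v, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Same loop with a 64-bit accumulator. On a 64-bit target each addition
 * can be a single register operation; on a 32-bit target the compiler
 * has to synthesise it from 32-bit pieces. */
uint64_t sum_wide(const uint32_t *v, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Only the second version benefits from 64-bit integer registers, and that takes a source change rather than just a recompile.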
Re:Better solution than Linux? (Score:2)
If you're using gcc's "long long" extension to achieve 64-bitness in 32-bit environments, then recompiling
Re:Better solution than Linux? (Score:5, Informative)
Re:Better solution than Linux? (Score:2)
Re:Better solution than Linux? (Score:3, Funny)
Re:Better solution than Linux? (Score:2)
Re:Better solution than Linux? (Score:2)
In the mean time, I suffer with 2005.1 and constantly adding packages that should really have been included as dependencies in the gnome meta-package (hal and dbus, anyone?). But it *is* more stable than anything else around, IMHO.
And it is *WAY* faster on AMD64 (although goin
Re:mod down retarded zealot (Score:3, Informative)
Means that you download and compile your software packages in-situ instead of waiting for your distro of choice to offer packages precompiled for 64-bit systems. Like the parent poster said, a lot of Linux distro don't offer 64-bit binaries (as of yet), but in a source based distro this is a non issue. Zealot my ass.
Re:mod down retarded zealot (Score:2)
> a lot of Linux distro don't offer 64-bit binaries
> (as of yet), but in a source based distro this is
> a non issue. Zealot my ass.
The parent was specifically mentioning choosing a distribution that does support x86_64.
Of course it is a huge issue even for Gentoo to make sure all the packages on offer are 64-bit clean at the source level. There are plenty of ways for a C/C++ applic
Re:mod down retarded zealot (Score:2)
Which is perfectly fine. I'm n
Re:mod down retarded zealot (Score:2)
Some of us like to compile older stable tried-and-true versions from source so we can get the compile-time options set the way we want them.
You should see how much faster some gui apps are when you eliminate GNOME support. How much smaller your memory footprint is when you remember to strip debugging symbols (which, believe it or not, distributors
Re:mod down retarded zealot (Score:2)
Nice "l33t" speak, by the way.
Re:Better solution than Linux? NOT! (Score:2, Interesting)
Re:Better solution than Linux? (Score:4, Interesting)
With 32-bit apps, you need a 32-bit userland. That's the WoW64 bit; it's the 32-bit Windows on Windows cruft.
The main difference is that the linux stuff is organized differently. lib is your 32-bit libraries, while lib64 is your 64-bit stuff.
On Windows, the 'normal' location is where you would find the 64-bit libraries, and the WoW64 stuff is loaded from a separate directory.
Implementation details: http://msdn.microsoft.com/library/default.asp?url
Select Quote:
The WOW64 emulator runs in user mode, provides an interface between the 32-bit version of Ntdll.dll and the kernel of the processor, and it intercepts kernel calls. The emulator consists of the following DLLs:
Wow64.dll provides the core emulation infrastructure and the thunks for the Ntoskrnl.exe entry-point functions.
Wow64Win.dll provides thunks for the Win32k.sys entry-point functions.
Wow64Cpu.dll provides x86 instruction emulation on Itanium processors. It executes mode-switch instructions on the processor. This DLL is not necessary for x64 processors because they execute x86-32 instructions at full clock speed.
Along with the 64-bit version of Ntdll.dll, these are the only 64-bit binaries that can be loaded into a 32-bit process.
At startup, Wow64.dll loads the x86 version of Ntdll.dll and runs its initialization code, which loads all necessary 32-bit DLLs. Almost all 32-bit DLLs are unmodified copies of 32-bit Windows binaries. However, some of these DLLs are written to behave differently on WOW64 than they do on 32-bit Windows, usually because they share memory with 64-bit system components. All user mode address space above the 32-bit limits (2 GB for most applications, 4 GB for applications marked with the IMAGE_FILE_LARGE_ADDRESS_AWARE flag in the image header) is reserved by the system.
It's a different methodology, but most likely one that works as well. I appreciate the Linux one better-- the "normal" 32-bit stuff lives in the "normal" places-- that way, you don't *need* an emulation layer for the 64-bit unaware apps. Rather, 64-bit aware apps know to look in the correct location for the libraries (well, they are told by the OS, anyways). The Linux Way (TM) is slightly more backward compatible, methinks. You'll *never* experience a problem with a 32-bit app on a 64-bit linux system, while there are some bugs in WoW64 which will probably never be fixed, rather, they'll be 'phased out', in the usual MS fashion (ignored until irrelevant).
Information on the Linux approach is here: http://www.hp.com/workstations/pws/linux/faq.html [hp.com]
Mainly, when recompiling your apps to be native 64-bit, you need to observe the following:
Simple. Just rebuild from scratch and the compiler will build 64-bit by default. This is true for most apps. However, some apps must be made 64-bit clean which means that the developers must review the code to get rid of any assumptions about 32-bitness, such as pointer arithmetic issues. Some makefiles that explicitly declare paths such as
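One classic assumption about 32-bitness is stashing a pointer in a 32-bit integer. A rough C sketch of the problem and the usual fix follows. The helper names are hypothetical, and it assumes a compiler that provides `uintptr_t` (optional in C99, but present on the mainstream toolchains).

```c
#include <stdint.h>

/* Not 64-bit clean: squeezes the pointer through a 32-bit integer.
 * Returns 1 if the pointer comes back unchanged, 0 if the upper
 * bits were lost on the way. */
int survives_32bit_cast(const void *p)
{
    uint32_t narrow = (uint32_t)(uintptr_t)p;   /* drops bits 32..63 */
    return (const void *)(uintptr_t)narrow == p;
}

/* 64-bit clean: uintptr_t is defined to be wide enough to hold any
 * object pointer on the target being compiled for. */
int survives_uintptr_cast(const void *p)
{
    uintptr_t wide = (uintptr_t)p;
    return (const void *)wide == p;
}
```

On a 32-bit build both helpers return 1 for every pointer, which is why this kind of bug tends to go unnoticed until the first 64-bit rebuild.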
Re:Better solution than Linux? (Score:3, Informative)
Here's what Gentoo says about it, but there really isn't much detail:
Are 32bit applications supported? Is it through emulation or native?
Yes, 32bit applications are fully supported by the CPU, and are executed natively. A standard x86 OS can be installed on an amd64 processor, and can execute 32bit applications from a 64bit operating system if it is capable of mapping the 32bit syscalls to the kernel's 64bit interfaces (such as Linux is capable
And another change in marching orders: (Score:5, Funny)
Re:And another change in marching orders: (Score:2)
Good deal (Score:3, Funny)
Re:Good deal (Score:2)
Until you try to find 64bit drivers for your hardware...
Re:Good deal (Score:2)
Until you try to find 64bit drivers for your hardware...
Sadly, that jet engine you thought you heard was actually a well-executed (and badly Underrated) joke about Windows' memory requirements soaring past your head.
Mod parent down Redundant, mod grandparent up t3h Insightfulz0r, and have a nice day.
whew! (Score:3, Funny)
Architecture change (Score:4, Informative)
Re:Architecture change (Score:3, Informative)
The "coolest" thing that you can actually do in x86 is called memory value forwarding. (Or something like that).
Basically, you assign an internal register to cache the value of the memory access in an unarchitected register. This means that you can write some code like:
ROR [mem], 1
ADD [mem], 2
ROL [mem], 1
And it will go faster than:
MOV reg, [mem]
ROR reg, 1
ADD reg, 2
Re:Architecture change (Score:2)
XP 64bit sux go with Linux 64 (Score:2)
10 years of support (Score:3, Insightful)
I think that's one of the reasons why everything works so well with AMD64 today under Linux.
Microsoft recommends other operating systems (Score:5, Funny)
Not in these apps (Score:5, Informative)
Re:Not in these apps (Score:2)
MOV32 reg, [src1]
ADD32 reg, [src2]
MOV32 [dst], reg
MOV32 reg, [src1+1]
ADC32 reg, [src2+1]
MOV32 [dst+1], reg
Which works nice and all, but it's a ripple carry adder. Ripple carry is SLOW, because you have to wait on definitive resolution of the c
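The ADD/ADC sequence above can be sketched in C as follows, purely as an illustration (the type and function names are made up). It adds two 64-bit values as pairs of 32-bit halves, propagating the carry by hand, next to the single native addition a 64-bit target can use instead.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as 32-bit code must do. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

u64_pair u64_to_pair(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

uint64_t pair_to_u64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Two dependent additions: the high half cannot be finished until the
 * carry out of the low half is known. */
u64_pair add_pair(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                 /* ADD: wraps modulo 2^32       */
    uint32_t carry = (r.lo < a.lo);     /* wrapped iff a carry came out */
    r.hi = a.hi + b.hi + carry;         /* ADC: add with carry in       */
    return r;
}

/* What a 64-bit target can do in one integer add. */
uint64_t add_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Both routes give the same result; the difference is the dependency between the two halves in the 32-bit version.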
Good to see them drop the old cruft... not quite (Score:5, Informative)
Hooray, it's about time. Further in the same paragraph:
"Program Files" is reserved for 64-bit apps, while "Program Files (x86)" is for 32-bit software. This will sometimes result in strange installer behavior, as with Steam, Valve Software's game download application. Steam insisted that the parentheses in "Program Files (x86)" were illegal characters, and refused to install. You can either install Steam into a different folder (e.g., \games\valve) or change the folder name in the installer to "Progra~2\valve".
Some things never change...
Oh dear... parentheses! (Score:5, Insightful)
And there we go, the MAIN DIRECTORY for storing the program files uses them! Don't they ever learn? We had the same problems when dealing with Program[INSERT BIG UGLY SPACE HERE]Files. Couldn't PROGRAMS work? And look, it's 8 characters long!
Sheesh... (/rant)
Re:Oh dear... parentheses! (Score:5, Insightful)
We had the same problems when dealing with Program[INSERT BIG UGLY SPACE HERE]Files. Couldn't PROGRAMS work?
That was kind of the point - forcing programs to deal with spaces forced (some) app developers to deal with spaces generally.
Re:Oh dear... parentheses! (Score:3, Funny)
Re:Good to see them drop the old cruft... not quit (Score:2)
And it's hard to know just which apps have 16 bit code in them sometimes. I hope I'm not upgrading myself into a terrible avalanche of secondary upgrades or "learn-to-live-without-it"-itis.
I want what I've got, just *faster.* What's wrong with that?
Sad to say (Score:3, Interesting)
I'm taking a big risk by asking this.. (Score:3, Interesting)
Re:I'm taking a big risk by asking this.. (Score:2)
Re:I'm taking a big risk by asking this.. (Score:3, Interesting)
Re:I'm taking a big risk by asking this.. (Score:3, Insightful)
I'd expect much the same thing to happen here. Microsoft will wait for proliferation of AMD64/EM64T chips before they make a strong push to 64-bit Windows. I'm actually surprised they've released it at all, personally...
--S
Re:I'm taking a big risk by asking this.. (Score:5, Informative)
The first x86 processor to feature 32-bit registers and addressing was the i386 [wikipedia.org] released in 1985. Support for the new 32-bit features of the chip was added to Windows slowly starting with Windows 2.1 in 1987 (also known as Windows/386) [wikipedia.org], and provided support for virtual memory and somewhat improved multitasking. The 32-bit features in Windows were optional right through to Windows 3.1 in 1992, in fact Win3.1 runs fairly well on a 286/AT with 2MB of memory. Although Windows included some 32-bit code as early as 1987, it did not provide a 32-bit API for applications until the introduction of the Win32 API with Windows NT 3.1 [wikipedia.org] (1993) and Windows 95. There was also a free update released for Windows 3.1 called Win32s [uiuc.edu] that provided a subset of the Win32 API for Windows 3.1 and Windows for Workgroups 3.11, though it provided rather poor compatibility; major features like comctl32.dll and a real registry were not provided.
The first version of Windows to offer a complete 32-bit kernel and drivers was Windows NT 3.1 [wikipedia.org]. It provided proper support for the 32-bit functionality as early as 1993, but it was not used much outside of a corporate environment. Home users had to wait for Windows 95, 10 frickin' years after the release of the 386!!! Even then, Windows 95 still contained a large amount of 16-bit code!
Anyhow, I find it funny that people with Athlon64s are complaining about having to wait a year or two for a version of Windows that can make proper use of the processors. At least users now have the option of running 64-bit Linux or BSD, but alternative operating systems for the 386 didn't become available until 1993 with the release of BSD/386 [wikipedia.org] and OS/2 2.0 [wikipedia.org], neither of which were free.
Well, enough of my rambling. Hope that answers your question :)
Re:I'm taking a big risk by asking this.. (Score:3, Informative)
Great post. Just a minor nitpick...Linux was released (albeit in a crude form) in 1991.
Coding practices need rethinking... (Score:5, Interesting)
Most obvious are char * fields. If the string is 8 characters or less, it is cheaper to just store in the structure (and pass by value, where possible).
Considering, that most such strings (and substructures) are malloc-ed (with a couple of pointers worth of malloc's overhead), the case for embedding them becomes even stronger...
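A minimal C sketch of the embedding idea, with made-up struct and function names: a short code of up to 8 characters lives directly inside the structure instead of behind a `char *`. On a typical 64-bit build the pointer itself already occupies 8 bytes, before counting the separately malloc-ed string and the allocator's bookkeeping.

```c
#include <stddef.h>
#include <string.h>

enum { CODE_MAX = 8 };   /* same size as one pointer on a 64-bit target */

/* Pointer version: the struct holds only an address; the characters
 * live in a separate heap allocation. */
struct item_by_pointer {
    char *code;
    int   qty;
};

/* Embedded version: up to CODE_MAX characters stored in place,
 * NUL-padded, with no terminator when the code is exactly 8 long. */
struct item_embedded {
    char code[CODE_MAX];
    int  qty;
};

/* Store `s` in place; returns -1 if it does not fit. */
int set_code(struct item_embedded *it, const char *s)
{
    size_t n = strlen(s);
    if (n > CODE_MAX)
        return -1;
    memset(it->code, 0, CODE_MAX);
    memcpy(it->code, s, n);
    return 0;
}

/* Compare the stored code against a C string. */
int code_equals(const struct item_embedded *it, const char *s)
{
    size_t n = strlen(s);
    if (n > CODE_MAX)
        return 0;
    char padded[CODE_MAX] = {0};
    memcpy(padded, s, n);
    return memcmp(it->code, padded, CODE_MAX) == 0;
}
```

The trade-off is the hard length limit and the missing terminator at exactly 8 characters, so the field can't be handed straight to functions that expect a C string.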
Re:Coding practices need rethinking... (Score:2)
--S
Re:Coding practices need rethinking... (Score:2, Informative)
To determine whether or not it's a pointer or a char string, you have to have some bit or set of bits dedicated to a switch.
Well you could say, dedicate the last byte to a true/false value. Then you can't address any memory that corresponds to a string at certain addresses. So if you pick 0x00 then you can't use low memory addresses. If you pick 0xFF you can't use high memory addresses.
If you use the first byte then you can't address any location
WOW (Score:2)
Re:WOW (Score:2)
Now, 32-bit applications can trip up on bugs in the WoW64 implementation, and since that is intimately tied to the kernel there's *another* thing to break
For people scratching their heads... (Score:2, Interesting)
Yes, I played the Sims, compiled gcc, ran Python chatterbots, had KDE in maximum eye-candy-mode and ran multiple processes in desktops 1-10, but the day I began trying to render a scene with transparent height-fields and looped ISOsu
Another Slashdot Classic (Score:3, Insightful)
A "new era", no, an incremental improvement (Score:4, Insightful)
Re:performance difference (Score:4, Interesting)
Re:performance difference (Score:2)
Is there a practical difference between the two? I mean, if I sat down and used a dual processor machine for a day, then somebody magically swapped it for a dual core machine the next day, would I notice?
Re:performance difference (Score:2, Funny)
Re:performance difference (Score:3, Insightful)
For that matter, dual core processors often report as two separate processors, which potentially would cause Windows user license violations
Re:Now that Win can finally run on 64 bit (Score:2)
Well, hey, that's why AtomChip built the 6,8 GHz Quantum Processor!
Re:Now that Win can finally run on 64 bit (Score:2, Informative)
Interestingly, (to me, anyway), 64 bits can address almost the number of silicon atoms in a typical silicon chip.
Re:Do more registers really help? (Score:2)
The limitations of this are described in more depth here http://64.233.179.104/search?q=cache:5mZte35ICdQJ:www.answers.com/topic/register-renaming+limitations+of+register+renaming&hl=en&client=safari [64.233.179.104]
And here http://arstechnica.com/cpu/03q1/x86-64/x86-64-3.html [arstechnica.com]
And here
http://www.aceshardware.com/Spades/read.php?article_id=53 [aceshardware.com]
One significant problem is that this process is not transparent to the coder; it's done