Philips, ARM Collaborate On Asynchronous CPU
Sean D. Solle continues: "Back in the early 1990s there was a lot of excitement (well, Acorn users got excited) about Prof. Steve Furber's asynchronous ARM research project, "Amulet". The idea is to let the CPU's component blocks run at their own rate, synchronising with each other only when needed. As in a normal RISC processor, one instruction typically takes one cycle; but in a clockless ARM, that cycle can take less time for different classes of instructions.
For example, a MOV instruction could finish before (and hence consume less power than) an ADD, even though they both execute in a single cycle. As well as saving energy, running at an effectively random frequency reduces a chip's RFI emissions - handy if it's living in a cellphone or other wireless device."
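A rough way to picture that difference is the sketch below (an editor's illustration in Python; the per-class delays are invented numbers, not figures from Amulet or Philips): a clocked stage always pays the worst-case delay, a self-timed one pays only what each instruction needs.

    # Toy comparison of a clocked pipeline stage vs. a self-timed one.
    # The per-instruction-class delays below are invented for illustration.
    DELAY_NS = {"MOV": 0.8, "ADD": 1.0, "LDR": 1.4}

    program = ["MOV", "ADD", "MOV", "LDR", "ADD", "MOV"]

    clock_period = max(DELAY_NS.values())              # clocked: every cycle pays the worst case
    clocked_time = clock_period * len(program)
    async_time = sum(DELAY_NS[op] for op in program)   # clockless: each op takes only what it needs

    print(f"clocked:   {clocked_time:.1f} ns")
    print(f"clockless: {async_time:.1f} ns")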
Intel were first... (Score:1, Troll)
Re:Intel were first... (Score:1, Funny)
Duh, everybody knows Intel is all about clock speed. How can they sell a clockless CPU? How could they claim their processor was better than AMD's without some silly numbers to use?
Re:Intel were first... (Score:2)
Re:Intel were first... (Score:4, Informative)
Re:Intel were first... (Score:2)
Re:Intel were first... (Score:4, Interesting)
Re:Intel were first... (Score:2)
Re:Intel were first... (Score:2)
I read about this once, ages ago. The reason it didn't get out of the gate was that it would still have taken time to produce. By the time it was done, it wouldn't be 3x faster anymore; it'd actually be slower than whatever was out at the time.
Take with a grain of salt, I'm not claiming to have a strong grasp of this particular topic.
Re:Intel were first... (Score:2, Insightful)
Re:Intel were first... (Score:5, Informative)
The AMULET1 microprocessor is the first large scale asynchronous circuit produced by the APT group. It is an implementation of the ARM processor architecture using the Micropipeline design style. Work was begun at the end of 1990 and the design despatched for fabrication in February 1993. The primary intent was to demonstrate that an asynchronous microprocessor can offer a reduction in electrical power consumption over a synchronous design in the same role.
Commodore/Amiga had ASYNC expansion bus (Score:2)
OK, it wasn't a processor, and one of the chips that drove it was clocked, but it goes to show you how clever the designers of that system were.
True, but look to the PC world (Score:2)
Re:Intel were first... (Score:2)
Re:Intel WAS first (Score:2, Funny)
Re:Intel WAS first (Score:1)
"Intel are a big corporation."
How can it be "are" if it is "a corporation"?
Re:Intel WAS first (Score:1, Insightful)
Irregular distinction (Score:1)
Re: (Score:1, Informative)
Which is exactly the point. A corporation is only considered a single entity in the United States of America because there is a legal basis there for treating a corporation as a person in its own right. In the rest of the world a corporation, while a distinct legal entity, is not spoken of as a single entity. A corporation is composed of many people, hence it is a collective noun.
Re: (Score:2)
It's the exact same thing in Canada and I guess in most countries.
More efficient (Score:1)
Re:Let's take a non-human group (Score:1)
The computer cluster are running. *Incorrect*
And yet, Pie are squared.
Self-contradictions (Score:1)
Re:Self-contradictions (Score:1)
Re:Self-contradictions (Score:1)
Re:All dialects are correct (Score:2)
as if such a thing exists...
I defer to logic itself to decide the greater dialect.
In that case, English in general is right out.
Re:Let's take a non-human group (Score:1)
Re:Let's take a non-human group (Score:1)
Re:Let's take a non-human group (Score:2)
Re:Let's take a non-human group (Score:2)
Bob said, "Joe said, 'I am Joe', but he was lying."
Re:Let's take a non-human group (Score:1)
Re:Let's take a non-human group (Score:1)
Such a processor already exists (Score:5, Informative)
Re:Such a processor already exists (Score:2, Interesting)
1) ARM like to license CPU core design IP, as mentioned in a later thread.
2) One of the major upsides of asynchronous CPU design (said Prof. Furber on the Manc. Uni course) is that because the subcomponents of the CPU aren't nearly so tied to temperature, voltage and clock speed requirements (which directly affect flip-flop "set up" and "hold" time), the intellectual property invested in cre
Re:Such a processor already exists (Score:2)
Interesting implications (Score:4, Interesting)
Re:Interesting implications (Score:3, Informative)
The big advantage is that not every flip-flop has to be active at every clock pulse, which saves a lot of energy. Also, the chip doesn't turn into a giant clock transmitter.
Jeroen
Re:Interesting implications (Score:1)
way more elegant (Score:5, Informative)
just that with higher speed, and hence brute force, performance could be achieved easily.
The problems which could not be solved back then were the obvious synchronisation issues. Setting up a common clock seemed the only way to resolve them.
The idea behind clockless designs is less a "back-to-the-roots" move than a step to gain the advantages of such a design, which include, amongst others:
Reduced Power Consumption
Higher Operation Speed
Moreover, highly sophisticated compilers could tune program code to match a given performance/power ratio.
Yet I would not bet on clockless cores becoming the new mainstream, far from it. Clockless cores will most likely be aimed at embedded appliances and low- and ultra-low-power applications.
Re:way more elegant (Score:2)
Mainly because Intel's marketing has depended on clock speed for the last 20 years. I wouldn't be at all surprised to see some of the technology used in future generations of mainstream processors - low power consumption is a selling point when your electricity and air con bills are somewhere up in the stratosphere, particularly if it can still achieve reasonable performance. I don't see it replacing x86 or x86-64, but I co
Re:way more elegant (Score:5, Interesting)
Let me explain: previously, the "easy" way to reduce power consumption was to use a process which created smaller transistors, but smaller doesn't mean "reduced power consumption" anymore.
So clockless CPU becomes more interesting now.
Re:way more elegant (Score:2, Insightful)
I think that "embedded appliances" are even more "mainstream" than anything else, since there are far more embedded systems around than general-purpose PC workstations, servers, laptops etc. put together.
Re:way more elegant (Score:1)
But it's those GP workstations etc. where a considerably bigger amount of money is being made.
Re:way more elegant (Score:2)
Re:way more elegant (Score:2)
Asynchronous is the right term, but it is misleading. In this context, it means there is no clock to synchronize ops; it doesn't mean no barrier sync at all.
The key to any out-of-order execution, asynchronous or clocked, is that only independent instructions (or those where register renaming can make them independent) are allowed to pass each other. If B is dependent on A, it will simply not be dispatched at all until A retires. It will wait while C and D are dispatched.
The whole idea behind hyperthreading is that
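A minimal sketch of the dependency rule described above (editor's illustration in Python; the instruction names, registers and latencies are made up): B is held back until A finishes, while the independent C and D pass it.

    # Toy out-of-order dispatch: at most one instruction is dispatched per cycle,
    # and an instruction is ready only when every producer of its source
    # registers has finished.  All names and latencies are invented.
    instrs = {
        "A": {"dst": "r1", "srcs": [],     "lat": 3},
        "B": {"dst": "r2", "srcs": ["r1"], "lat": 1},   # depends on A
        "C": {"dst": "r3", "srcs": [],     "lat": 1},   # independent
        "D": {"dst": "r4", "srcs": [],     "lat": 1},   # independent
    }
    program = ["A", "B", "C", "D"]

    finish = {}          # cycle at which each instruction's result becomes available
    dispatched = []      # (name, dispatch cycle)
    cycle = 0
    while len(dispatched) < len(program):
        done = {name for name, _ in dispatched}
        for name in program:
            if name in done:
                continue
            producers = [p for p in program if instrs[p]["dst"] in instrs[name]["srcs"]]
            if all(p in finish and finish[p] <= cycle for p in producers):
                dispatched.append((name, cycle))
                finish[name] = cycle + instrs[name]["lat"]
                break
        cycle += 1

    print(dispatched)    # [('A', 0), ('C', 1), ('D', 2), ('B', 3)]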
Re:way more elegant (Score:2)
and they need to know when they're going to be doing what they're doing in order to be efficient.
Certainly now they are clock-synchronized, but the only actual requirement is that dependent operations occur in order.
As long as there is a reasonable upper bound on the time an instruction might take the processor need be no less efficient than a clocked CPU. Put another way, a clocked CPU simply makes sure that all instructions execute in the worst case time by holding the result in a latch until the cl
Encouraging technology, but useful soon? (Score:3, Interesting)
Nowadays, not all of the CPU is powered at any one time. If an instruction does not access certain parts of the chip, they are dark. This does not hold for some predictive processors which may be processing not-yet-accessed instructions, but in general if an instruction is not using some part of the chip, that part of the chip does not require juice.
Taking out the clock and relying on the chip parts to fire and return means that each application in the system must return to the OS at some point to allow the OS a chance to queue up the next thread. Without the clock interrupt, the OS is at the mercy of the program, back to the bad old days of cooperative multitasking.
The clock is what tells the OS that it is time to give a time slice to another thread. If we say "OK, well we'll just stick a clock in there to fire an interrupt every x microseconds," then what have we accomplished? We are back at square one with a CPU controlled by a clock. No gain.
This kind of system would work in a dedicated embedded system which did not require a complex multitasking operating system. Industrial solutions for factories, car parts, HVACs, and other things that need reliability but don't really do that much feature-wise seem to be prime candidates for this technology. "Smart" devices? Not so much.
Re:Encouraging technology, but useful soon? (Score:4, Insightful)
For embedded systems where interrupt latency is the primary concern, other approaches have to be found. Also, if the CPU checks after every x instructions whether there is an interrupt to process, you get some margin on the timing behaviour.
I am no embedded / safety critical developer, but I know that the fastest response times on interrupts and worst-case response times vary greatly depending solely on the (RT)OS used.
Re:Encouraging technology, but useful soon? (Score:5, Informative)
Re:Encouraging technology, but useful soon? (Score:5, Informative)
The clock being dispensed with is the one that causes the registers inside the CPU to latch the new values that have been computed for them. At 3GHz, this happens every 333ps. The reason this clock exists is basically because it makes everything in a digital system much, much easier to think about, design, simulate, manufacture, test and re-use. But, it's not an absolute requirement that it be present, if you're clever. (Too clever by half, in fact.)
The other clock, which you were referring to, fires off an interrupt with a period on the order of milliseconds, to facilitate time-slicing. If your application requires such a feature, you can have one, regardless of whether your CPU is synchronous or asynchronous internally. It's a completely separate issue.
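To make the separation concrete, here is an editor's sketch (Python; the tick rate and per-instruction latencies are invented): the scheduling tick is defined in wall-clock time, and says nothing about how the instructions executed in between are timed.

    # Toy illustration: the OS time-slice tick is defined in wall-clock time,
    # independent of how the CPU's internal logic is timed.  Numbers invented.
    import random

    random.seed(0)
    TICK_MS = 1.0                        # scheduler tick period, i.e. roughly HZ = 1000

    # Pretend per-instruction latencies of an asynchronous core, in nanoseconds.
    latencies_ns = [random.uniform(0.3, 1.2) for _ in range(1000)]
    avg_ns = sum(latencies_ns) / len(latencies_ns)

    instructions_per_tick = (TICK_MS * 1e6) / avg_ns
    print(f"roughly {instructions_per_tick:,.0f} instructions between scheduler ticks")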
Not relevant (Score:5, Informative)
As for the power problem, all parts of the CPU are powered, except that gates that aren't switching consume less power (mostly leakage, which seems to be quite significant now). In synchronous circuits, at least the gates connected directly to the clock signal switch all the time, while in asynchronous circuits unused parts of the CPU can avoid switching altogether, so some power may be saved, but I don't know how much it will be.
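A back-of-envelope view of where that saving comes from (editor's sketch; the standard dynamic-power formula P = a*C*V^2*f is used, and the capacitance, voltage and activity factors are invented):

    # Back-of-envelope dynamic power: P_dyn = alpha * C * V^2 * f.
    # All numbers below are invented for illustration.
    C_SWITCHED = 1e-9      # effective switched capacitance (F)
    VDD = 1.2              # supply voltage (V)
    RATE = 1e9             # equivalent operation rate (Hz)

    def dynamic_power(alpha):
        """alpha = fraction of the capacitance that actually switches per operation."""
        return alpha * C_SWITCHED * VDD ** 2 * RATE

    # Synchronous: the clock tree and clocked flip-flops toggle every cycle.
    # Asynchronous: only the blocks doing useful work toggle.
    print(f"sync-ish  (alpha = 0.30): {dynamic_power(0.30):.2f} W")
    print(f"async-ish (alpha = 0.10): {dynamic_power(0.10):.2f} W")
    # Leakage power is unaffected by this and has to be added to both cases.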
Re:Not relevant (Score:3, Informative)
You didn't understand what this is about. It's not about timing.
You talk about "CPU frequencies". What is that? That's the frequency of the CPU clock signal. It runs everything inside the CPU - at every 'tick' of the clock, instructions move through the CPU, registers are updated, etc. This is about CPUs that don't use a clock signal at all; the different things happening inside aren't synchronised. These CPUs don't have a frequency.
(Probably wrong also, I don't have the time to express myself more clearly - just wan
Re:Not relevant (Score:2)
Re:Not relevant (Score:2)
Re:Encouraging technology, but useful soon? (Score:2)
Except said "clock" is a million times slower if we're talking millisecond-level granularity (HZ in the kernel is, what, 1024 now?), and a lot of asynchronous processing can happen between task switches.
Re:Encouraging technology, but useful soon? (Score:3, Informative)
We're talking about two different types of clocks: the timer the OS uses for scheduling and timekeeping, and the clock signal that drives the logic inside the CPU.
These two are completely different things. The former can have a pretty low resolution as well -- but is needed for other tasks as well. Any non-degenerate processor will need some kind of timing source, but there is no reason why it would be connected to the number of instructions executed.
In a multitasking operating sy
Quite impressive... (Score:3, Interesting)
To eliminate clocks you would need new circuitry, such as arbiters and some sort of completion logic which could be used to trigger a flip-flop. To break a Slashdot law, I haven't done any reading on modern techniques, so would someone enlighten me on some design issues involving simple tasks such as accessing a register file or making a memory read? Surely a bus would still maintain a clock.
Re:Quite impressive... (Score:1, Funny)
Hey, go Philips! Go ARM! I'd love to get John Kerry removed from my chips...
standardised implementation. (Score:2, Interesting)
Re:Quite impressive... (Score:1)
Re:Quite impressive... (Score:2, Interesting)
If you remember your digital design, there's the asynchronous counter. Basically, it involves handshaking, just like handshaking at the protocol level but lower down. Yes, there are arbiters, Muller C-elements (rendezvous), and other nifty components.
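For reference, the Muller C-element mentioned above is the gate whose output changes only when both inputs agree; a tiny behavioural model (editor's sketch in Python, not any particular library's API):

    # Behavioural model of a Muller C-element, the "rendezvous" gate used in
    # asynchronous handshaking: the output follows the inputs only when they
    # agree, and otherwise holds its previous value.
    class CElement:
        def __init__(self, out=0):
            self.out = out

        def step(self, a, b):
            if a == b:
                self.out = a
            return self.out

    c = CElement()
    for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
        print(a, b, "->", c.step(a, b))
    # The output rises only after both inputs are 1, and falls only after both are 0.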
Re:Quite impressive... (Score:3, Informative)
David Fang and Rajit Manohar. Non-Uniform Access Asynchronous Register Files. Proceedings of the 10th International Symposium on Asynchronous Circuits and Systems, April 2004.
http://vlsi.cornell.edu/~rajit/ps/reg.pdf
The fastest/lowest energy asynchronous circuits do not use clocks for anything. Moreover, very few arbiters are used in practice. The "completion logic" of course is always the hard part, but about 1
Re:Quite impressive... (Score:2)
It is also possible, though, to make fundamental mode logic. If you feed outputs b
Philips growing into a Major R&D company (Score:1)
Re:Philips growing into a Major R&D company (Score:4, Interesting)
Philips has been a world-class R&D company for a long time. Philips Research was established in 1914 [philips.com], and has contributed much, from the invention of the pentode [google.com] vacuum tube (valve) by Tellegen in 1929 to the audio cassette [google.com] in the 1960s and their more modern work developing CDs and DVDs.
The fire has been lit under IBM and other corporate research organizations for a long time.
ARM Business Model (Score:3, Interesting)
Also, anything that might boost my pitiful ARM shares value is most welcome! Why?... Why did I believe the hype?
ENIAC was first (Score:5, Funny)
Refer to 1944 for prior art.
Re:ENIAC was first (Score:1)
Re:ENIAC was first (Score:1)
You know, it's not a laughing matter.
Some of the ideas on this have been patented.
Try Google with "micropipelines patent"; you'll find plenty of them.
Re:ENIAC was first (Score:2)
I can guarantee that they have.
Benchmarking Nightmare ahead :) (Score:2)
(I can see the "the xxx-chip would have won against the yyy-chip if they had used a bigger HSF" flames coming...)
Re:Benchmarking Nightmare ahead :) (Score:1)
Re: What benchmarking nightmare? (Score:2)
AFAIK, that is already the case with all ICs (including analog) that I have ever come across... nothing new here. A fixed clock just limits operating speeds to 'known reliable' over specified voltage/temperature ranges.
And for including voltage/
no way - (Score:1, Funny)
Idea has been around for 30 years (Score:3, Informative)
The problem is that the first implementations were very slow.
I had an idea once (Score:5, Informative)
The clock is run at a speed that allows for the slowest propagation, with data being transferred in or out of the processor only on the rising or falling edges. This allows time for everything to get stable. It's also horrendously inefficient because propagation delays are actually variable, not fixed.
If you wire an odd number of NOT gates in series, you end up with an oscillator whose period is twice the sum of the propagation delays of all the gates. If you replace one of the NOT gates with a NAND or NOR gate, then you can stop or start the oscillator at will. Furthermore, by extra-cunning use of NAND/NOR and EOR gates, you can lengthen or shorten the delay in steps of a few gates. Obviously at least one of the gates should have a Schmitt trigger input to keep the edges nice and sharp; but that's just details.
My idea was to scatter a bunch of NOT gates throughout the core of a processor, so as to get a propagation delay through the chain that is just longer than the slowest bit of logic. Any thermal effects that slow down or speed up the propagation will affect these gates as much as the processing logic. Now you use these NOT gates as the clock oscillator. If you want to try being clever, you could even include the ability to shorten the delay if you were not using certain "slow" sections such as the adder. This information would be available on an instruction-by-instruction basis, from the order field of the instruction word. The net result of all this fancy gatey trickery is that if the processor slows down, the clock slows down with it. It never gets too fast for the rest of the processor to keep up with. Most I/O operations can be buffered, using latches as a sort of electronic Oldham coupling; one end presents the data as it comes, the other takes it when it's ready to deal with it, and as long as the misalignment is not too great, it will work. For seriously time-critical I/O operations that can't be buffered, you can just stop the clock momentarily.
The longer I think about this, the deeper I regret abandoning it.
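A quick check of the arithmetic in the idea above (editor's sketch; the gate delays are invented numbers): an odd-length ring of inverters oscillates with a period of twice the sum of the gate delays, so when the gates slow down, the derived clock slows down with them.

    # Ring-oscillator period: with an odd number of inverting stages the signal
    # flips once per pass round the loop, so one full period is two passes.
    # Gate delays are invented numbers for illustration.
    gate_delays_ps = [35, 38, 36, 40, 37]          # five NOT gates (odd count)
    period_ps = 2 * sum(gate_delays_ps)
    print(f"nominal period: {period_ps} ps  (~{1e3 / period_ps:.2f} GHz)")

    # If the chip heats up and every gate slows by 10%, the "clock" tracks it:
    hot_period_ps = 2 * sum(d * 1.10 for d in gate_delays_ps)
    print(f"hot period:     {hot_period_ps:.0f} ps")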
Re:I had an idea once (Score:1, Insightful)
Re:I had an idea once (Score:3, Interesting)
Re:I had an idea once (Score:1, Insightful)
Re:I had an idea once (Score:5, Informative)
I assume that you hope to use your self-timed logic (as it's known in the industry) to avoid all the problems associated with clocked logic and provide an easy-to-use asynchronous solution. Please do not forget manufacturing tolerances, and that you have to make your self-timed logic slower than the slowest asynchronous path with 99.99999% certainty. This means that you have to qualify your entire logic library with a specific technology, then guardband it to make sure that when manufacturing shifts for reasons you cannot explain, your chip still works. For this reason, in my experience, self-timed logic has been slower than clocked logic in nominal cases and much slower in fast cases (in special cases, better than breaking even in slow process conditions).
Self-timed logic of the kind you describe would likely still end up with latches to capture the result / launch into the next self-timed logic block. In this case, you're still paying the latch cycle time penalty for clocking your pipeline. You're still burning the power associated with the clock tree (although you are gating your clocks to only the active logic, known as "clock gating", an accepted practice), and you're additionally burning the power for each oscillator, which I suggest would likely be more than the local clock buffers in a traditional centrally PLL clocked chip.
An ideal asynchronous chip would be able to not use latches to launch / capture and still be able to keep multiple instructions in flight -- using race conditions for good and not evil. This would involve a great deal of work beyond simply using inverters and schmitt triggers. This is a larger architecture question requiring a team of PhDs and people with equivalent professional experience.
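A crude sketch of the guardbanding arithmetic described above (editor's illustration; the delay, corner spread and margin are invented):

    # Why the self-timed delay line ends up slower than the logic it guards:
    # it must exceed the slowest data path under worst-case manufacturing,
    # voltage and temperature, plus a guardband for shifts you cannot explain.
    # All numbers are invented for illustration.
    nominal_path_delay_ns = 1.00       # slowest logic path, typical silicon
    process_spread = 1.25              # worst-case process corner vs. typical
    guardband = 1.15                   # extra safety margin

    delay_line_ns = nominal_path_delay_ns * process_spread * guardband
    print(f"delay line must be >= {delay_line_ns:.2f} ns")
    # On typical or fast silicon the logic finishes well before the delay line
    # fires, which is where the self-timed design gives back its advantage.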
Re:I had an idea once (Score:2)
There are several ways to implement self-timed logic in general, and asynchronous architectures as well. I don't claim to know how Philips did this. My points were not meant to be insurmountable, just a statement about how one specific poster's ideas were difficult to implement and just the tip of the iceberg.
I was responding to my parent post, which stated:
Re:I had an idea once (Score:3, Interesting)
Way Back When (Score:5, Interesting)
Re:Way Back When (Score:3, Interesting)
Re:Way Back When (Score:2)
Interesting... (Score:5, Interesting)
FWIW, ARM have probably known (at least informally and at a level not much deeper than your average slashdot article) a large fraction of what Philips have been up to in this area for at least a decade.
speeds (Score:1)
The WIZ Processor (Score:4, Interesting)
It's a drastic departure from common CPUs. Definitely interesting.
Bye!
Re:The WIZ Processor (Score:2)
Re:The WIZ Processor (Score:2, Interesting)
The WIZ - a new and radical processor architecture [masmforum.com]
P.S. I'm not associated with Mr. Bush in any way; I simply like this kind of thing.
Bye!
The 68000 had async operation with /dtack pin (Score:2, Interesting)
Re:The 68000 had async operation with /dtack pin (Score:3, Informative)
Re:The 68000 had async operation with /dtack pin (Score:2)
Inconsiderate researchers (Score:1)
With chips that don't need a clock, there's no room for obsessive tweaking and ridiculous liquid nitrogen cooling systems.
What will all the overclockers do with their time now?
Congratulations Science, you've ruined another perfectly good hobby.
Re:Inconsiderate researchers (Score:2)
Re:Inconsiderate researchers (Score:2)
More voltage => more heat => same "overclocking" game.
Some numbers (sorry, old from the '90s) for an Asynchronous MIPS R3000
Vdd (V)   MIPS    Power dissipation (W)
1.00      9.66    0.021
1.51      66      0.29
3.08      165     3.4
3.30      177     4.2
3.51      185     5.1
4.95      233.6   13.7
source: http://resolver.library.caltech.edu/caltechCSTR:2001.012
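Reading the table another way (editor's arithmetic on the figures quoted above): dividing power by instruction rate gives the energy per instruction, which shows how steeply it falls as the supply voltage is lowered.

    # Energy per instruction = power / (MIPS * 1e6), computed from the figures
    # quoted above for the asynchronous MIPS R3000.
    data = [  # (Vdd in V, MIPS, power in W)
        (1.00, 9.66, 0.021),
        (1.51, 66, 0.29),
        (3.08, 165, 3.4),
        (3.30, 177, 4.2),
        (3.51, 185, 5.1),
        (4.95, 233.6, 13.7),
    ]
    for vdd, mips, watts in data:
        nj = watts / (mips * 1e6) * 1e9   # nanojoules per instruction
        print(f"Vdd = {vdd:4.2f} V: {nj:5.2f} nJ/instruction")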
Sun has also done work in this area (Score:3, Informative)
NO NO NO!!! (Score:3, Funny)
I want all my precioussss... gigahertz!!!
I worked at Philips for a while. (Score:2)
Re:speed (Score:2)
Async designs are reversible; hence the speed of the chip (and the power consumed) are directly related to the voltage drop over the calculation path. If you increase the voltage, there will be a stronger tendency to finish the calculation, although the power consumption will go up.
Re:speed (Score:2)
1) get the transistors to switch faster
2) reduce the number of transitions on the critical paths
This is the same for synchronous and asynchronous designs.
1) can be handled using voltage scaling (higher V, faster transitions) or simply with scaling (e.g. going from 180nm to 90nm).
2) This ultimately comes down to hyper-pipelining, something which is slightly easier in asynchronous circuits.
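A rough illustration of point 1 (editor's sketch; the alpha-power delay model and its constants are assumptions for illustration, not Philips data): raising the supply voltage shortens gate delays, for synchronous and asynchronous designs alike.

    # Rough gate-delay model: delay ~ Vdd / (Vdd - Vth)^alpha  (alpha-power law).
    # The constants are invented for illustration only.
    VTH = 0.4          # threshold voltage (V)
    ALPHA = 1.3        # velocity-saturation exponent

    def relative_delay(vdd):
        return vdd / (vdd - VTH) ** ALPHA

    base = relative_delay(1.0)
    for vdd in (0.8, 1.0, 1.2, 1.5):
        print(f"Vdd = {vdd:.1f} V: delay x{relative_delay(vdd) / base:.2f}")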
references:
http://www.async.caltech.edu/Pubs/PS/2002_energydelayefficiency.ps.gz