
Clockless Computing: The State Of The Art
Michael Stutz writes: "This article in Technology Review is a good overview of the state of clockless computing, and profiles the people today who are making it happen." The article explains in simple terms some of the things that clockless chips are supposed to offer (advantages in raw performance, power consumption and security) and what characteristics make these advantages possible.
Sounds very interesting, but... (Score:1)
Sort of reminds me of the Rolling Stone cover back in '90 (or so) that had Jesus Jones on the cover. "Will Jesus Jones save Rock & Roll?" (And notice where they are now)
Re:Sounds very interesting, but... (Score:1)
and where are they now? ;-)
How... (Score:2, Funny)
Re:How... (Score:2)
Re:How... (Score:1)
I believe a clock is still needed, but the CPU itself doesn't depend on one. The OS will surely require a clock.
Prove your statement...(was: Re:How...) (Score:1)
No changes required. (Score:2)
A computer's clock (as in date, time, etc) is on another part of the motherboard, and runs (correct me if I'm wrong) off the CMOS battery. That'll always be a "clock" in the sense we understand.
Re:No changes required. (Score:2)
Re:No changes required. (Score:1)
Except every frickin' mobo clock (date/time) I have ever come across can't keep proper time worth a damn. Why is it I could buy some cheap-ass cheesy Care-Bear watch and it is assured of keeping better time, by a long stretch, than any mobo clock? Why is that?!
Re:How... (Score:1)
Of course, this system relies on software to do all the work. A real system clock would just be the same thing implemented in hardware.
This is cool. (Score:1, Interesting)
I'm just wondering, would such a processor execute the same machine code using the same internal sequence of signals twice? I guess asynchronous communication between elements would introduce some kind of randomness.
Re: This is cool. (Score:1)
(Of course for the benefit of the consumer who doesn't know the differences between benchmarks, some standard benchmark would have to be used so the consumer can simply know "a bigger number on the box is better".)
Re:This is cool. (Score:1)
In the case of Null Convention Logic, it's the extra signal saying 'wait until I'm done!' to the next logic unit. This results in the relative dataflow looking more random. However, both approaches use design to ensure the end result is properly achieved.
With Null Convention Logic, this 'relative randomness' means the chip produces more 'white noise', since multiple clocks are not joining together to produce a steady electromagnetic frequency. This should make design easier, as you don't have to fight your own chip design to keep it from interfering with itself. This is a HUGE problem with current clocked designs. I believe it results in many forced redesigns to get it right, and the problem only gets worse with higher clock rates and bigger chips.
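For the curious, here's a toy sketch of the dual-rail idea behind NCL-style completion detection: each bit travels on two wires, where (0,0) means NULL and (1,0)/(0,1) mean data 0/1, and a stage knows the word is "done" when every bit has left the NULL state. The names and structure below are illustrative only, not any vendor's actual cell library:

    # Toy model of dual-rail (NULL Convention Logic style) signalling.
    NULL = (0, 0)          # "no data yet"
    DATA0 = (1, 0)         # logical 0
    DATA1 = (0, 1)         # logical 1

    def encode(bits):
        """Encode a list of booleans as dual-rail wire pairs."""
        return [DATA1 if b else DATA0 for b in bits]

    def is_complete(word):
        """Completion detection: every bit pair has left the NULL state."""
        return all(pair != NULL and pair != (1, 1) for pair in word)

    def decode(word):
        assert is_complete(word)
        return [pair == DATA1 for pair in word]

    # A downstream stage would only latch `word` once is_complete(word) is true,
    # then wait for the wavefront to return to all-NULL before accepting new data.
    bus = [NULL] * 4                      # nothing valid yet
    print(is_complete(bus))               # False
    bus = encode([True, False, True, True])
    print(is_complete(bus), decode(bus))  # True [True, False, True, True]

The cost, of course, is two wires per bit plus the completion tree, which is the overhead people usually complain about.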
What will they advertise now? (Score:3)
The real reason they haven't moved to this yet is their marketing team doesn't want to give up on the MHz race.
Re:What will they advertise now? (Score:2, Insightful)
Designing something as complicated as a CPU without clocks is a daunting challenge. Keeping everything in sync, removing race conditions, keeping the order of execution the same. There are a lot of challenges in a clockless design.
Re:What will they advertise now? (Score:1)
Re:What will they advertise now? (Score:2)
Sarcasm aside -- the SPEC benchmarks have been around for a long time and are well respected. You can see some SPEC CPU 2000 results here [spec.org].
Re:What will they advertise now? (Score:1)
Given that the K7 is 9-way superscalar, you worry about compiler quality.
Re:What will they advertise now? (Score:1)
digging (Score:1)
Power saving, yes.... Good performance???? (Score:1)
Suppose we want to increment a register 1000M times. A clocked circuit will generate a hell of a lot of noise as all the signals push through the circuit at, say, 2GHz, for a duration of, say, 0.5s.
In terms of noise generation, it will be on par with a conventional design. As all the gates still need to switch at pretty much the same speed, the other physical barriers still operate.
Does anyone have more detailed info on this topic?
Re:Power saving, yes.... Good performance???? (Score:1)
Re:Power saving, yes.... Good performance???? (Score:4, Informative)
Of course there is some overhead. There has to be a system telling other parts of the computer when something is finished. But if that is a long enough stage (perhaps thousands of instructions) then it'll be faster overall.
Re:Power saving, yes.... Good performance???? (Score:1)
Actually you can do even better. If the instruction executed does not need the memory stage of the pipeline, it can exit the pipeline before that stage. This will allow multiple quick instructions (e.g. shift) to execute and exit the pipeline while the slow memory instruction ties up the memory stage. This pseudo-parallel operation is what clocked processors can only do with multiple pipelines.
Re:Power saving, yes.... Good performance???? (Score:1)
Not so; many instructions take multiple cycles. Which ones depends on the machine, but multiplication, division, jumps, and of course memory accesses, are usually 2-20 clock cycles to execute.
More time is spent doing memory accesses and missed branches than anything else (IIRC: the Pentium Pro guesses 90% of branches correctly, and missed branches account for about 30% of the overall time of executing a typical piece of code). IA-64 does some interesting things to prevent missed branches from hurting the code (basically, it executes both branches in parallel, throwing away whichever one was wrong). IA-64 has so many functional units that I guess in the long run, it turns out to be a win.
If the advantages they cite for these chips are true, things could get very interesting in a couple of years.
Re:Power saving, yes.... Good performance???? (Score:1)
useful for transmeta? (Score:1)
Any comments from someone more knowledgeable than I?
Re:useful for transmeta? (Score:1)
crypto regulations (Score:1)
Not if you have a backdoor. Guess these guys don't read Wired [wired.com]..
and it doesn't even make sense... (Score:2)
No doubt this was a software path put in by a well-intentioned programmer trying to save battery life, but now all respected encryption systems recommend a "veil" strategy, where all encryption/decryption operations take the same amount of time and power regardless of the key.
In practice this means that you find out the max time and power (plus some margin), and if you finish early without using enough power, you waste time and power to pad out to the veil...
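A crude sketch of that "pad to the worst case" idea in plain Python (illustrative only; real constant-time crypto equalizes the actual operation sequence and power draw at a much lower level, not just a wall-clock timer, and the `decrypt` call below is hypothetical):

    import time

    def veiled(op, worst_case_seconds):
        """Run op(), then busy-wait so every call takes the same wall-clock time.

        Purely illustrative: real implementations equalize the sequence of
        operations (and power), not just the elapsed time measured here.
        """
        start = time.perf_counter()
        result = op()
        while time.perf_counter() - start < worst_case_seconds:
            pass  # burn time (and power) until the fixed budget is used up
        return result

    # veiled(lambda: decrypt(key, block), worst_case_seconds=0.010)  # hypothetical decrypt()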
Nice thought, but this just goes to show that cryptographic systems really need to be designed by experts...
Re:and it doesn't even make sense... (Score:1)
That's not really necessary. All you have to do is randomize the computation. For example, power analysis of a smart card doing RSA can recover the secret key, if the attacker knows what the input was (a reasonable assumption in many situations). But if you multiply (or is it exponentiate?) the input by a random number, then do the RSA op, then demask the output, poof! - PA, electromagnetic emission analysis, etc. all get very, very hard.
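The masking trick described there is usually called "blinding". A toy sketch with textbook-sized numbers (no padding, not constant time, needs Python 3.8+ for `pow(r, -1, n)`; a real system would use a crypto library):

    import secrets

    def blinded_rsa_decrypt(c, d, e, n):
        """RSA decryption with blinding: randomize the value actually exponentiated."""
        while True:
            r = secrets.randbelow(n - 2) + 2        # random r in [2, n-1]
            try:
                r_inv = pow(r, -1, n)               # needs gcd(r, n) == 1
                break
            except ValueError:
                continue
        c_blind = (c * pow(r, e, n)) % n            # c' = c * r^e mod n
        m_blind = pow(c_blind, d, n)                # m' = c'^d = m * r mod n
        return (m_blind * r_inv) % n                # unblind: m = m' * r^-1 mod n

    # Tiny demo key: n = 3233 (61*53), e = 17, d = 2753
    assert blinded_rsa_decrypt(pow(42, 17, 3233), 2753, 17, 3233) == 42

Since the attacker never sees the random r, the intermediate values being exponentiated look different on every run, which is what frustrates the power-trace averaging.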
Also, it can be hard to disguise your "wasting time" as being part of the computation, if the attacker can, for example, track which memory is being accessed when.
I wonder how well these clockless chips would fare against differential fault analysis; basically progressively destroying gates in the chip and looking at its output over time. Almost any chip will fail against this attack (but it requires lots of expensive equipment and a fair amount of expertise).
Clockless ARM (Score:2, Interesting)
Re:Clockless ARM (Score:1)
I was only about 14 and didn't have a clue about half the stuff that was being talked about, but the AMULET simulator they showed at the end looked kinda cool
Maybe it was longer than 4 years, I remember we were waiting for the first shipments of the StrongARM processor upgrade cards for our RiscPC 600's and 700's
Ah well, guess I'm getting old now...
Asynchronous vs. synchronous computing (Score:3, Interesting)
I have previously read (forgotten where) that in theory async computers will always be slower than sync computers. It seems that that is not true anymore. I guess that the latest-and-greatest CPUs have a non-trivial percentage of idle time for instructions which take slightly longer than an integral number of clock ticks. If an instruction takes 2.1ns and the clock runs at 1ns, everything has to assume that the instruction takes 3ns.
Also imagine a fully async. computer. No need for a new motherboard or even changing settings in the BIOS when new and faster RAM chips are available - the system will automatically adapt.
I think that we will see more and more async parts in the years to come. But I don't know if everything is going to be asynchronous.
Re:Asynchronous vs. synchronous computing (Score:2, Interesting)
Now I'm not an engineer, but the article mentioned that it was important to have wires and gates connected in a special manner so the data arrives in the proper order. It seems to me that this would make the microprocessor more dependent on the hardware and not less so. Maybe this wouldn't be a problem if all of your RAM was the same speed, but it could cause a problem if you had one 100MHz SIMM and one 133MHz SIMM. I would think that the information coming from the 133 could screw things up. Can anyone clarify this for me?
Re:Asynchronous vs. synchronous computing (Score:1)
If the RAM delivers data in a serial manner (one bit at a time on one wire), faster RAM would definitely cause problems, because the CPU would not know how to distinguish the individual bits.
On the other hand if the data bus is e.g. 33 wires = 32 data wires and one "handshake" wire, the protocol between the CPU and the RAM chip could be:
CPU -> "give me the contents of address 0x38762A63"
CPU then waits for the handshake wire to go high
The RAM chip sees the address, puts the contents on the data wires, and then sets the handshake wire high.
And then the CPU can read the data.
The above asynchronous protocol does not depend on the speed of the RAM chip. The RAM chip could be a future high-speed "zero-latency" chip, or a slow flash-ram chip. The CPU does not need to know.
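A toy software model of that exchange, with made-up names (threading.Event standing in for the request/handshake wires; a real bus is wires and latches, not Python threads):

    import threading, time

    class AsyncRam:
        """Toy model of a self-timed RAM: it answers whenever it's ready, no clock."""
        def __init__(self, contents, delay=0.01):
            self.contents, self.delay = contents, delay
            self.addr_lines = self.data_lines = None
            self.req, self.ack = threading.Event(), threading.Event()
            threading.Thread(target=self._serve, daemon=True).start()

        def _serve(self):
            while True:
                self.req.wait()              # an address appears on the bus
                self.req.clear()
                time.sleep(self.delay)       # however long this particular chip takes
                self.data_lines = self.contents[self.addr_lines]
                self.ack.set()               # raise the handshake wire

    def cpu_read(ram, addr):
        ram.addr_lines = addr                # "give me the contents of address ..."
        ram.ack.clear()
        ram.req.set()
        ram.ack.wait()                       # CPU waits for the handshake wire to go high
        return ram.data_lines                # now the data lines are valid

    ram = AsyncRam({0x38762A63: 0xCAFE})
    print(hex(cpu_read(ram, 0x38762A63)))    # same code whether delay is 1ms or 1s

The point is that `cpu_read` contains no notion of the RAM's speed at all; swap in a faster or slower part and nothing changes.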
There are problems with this too. The protocol is sort of request-reply / lock-step. And how do multiple devices share the same bus? And and and...
No one said it was easy
Re:Asynchronous vs. synchronous computing (Score:1)
I don't think the advantage of async is a direct increase in speed, but rather a decrease in die size (because the clock signal doesn't have to be propagated to all parts of the chip), which leads to a decrease in power requirements and allows the chip to operate faster without overheating.
Sun has made a prototype (Score:1)
(I'm sorry, I can't use HTML: the lameness filter doesn't want to allow the posting otherwise.)
I imagine the "perfect" laptop:
- an OLED screen (no need for backlighting)
- an asynchronous processor (low power)
- no HDD, but plenty of MRAM (this RAM is persistent)
Ideal Laptop (Score:2)
Old news? (Score:2, Informative)
http://www.cs.man.ac.uk/amulet/index.html [man.ac.uk]
Reliability (Score:3, Interesting)
BTW, does anybody here remember analog computing? A bunch of cleverly connected operational amplifiers? These things were asynchronous, just as mother nature is. If you can get the physics to work for you, bingo - compare the time nature needs to "raytrace" a complex scene with what a digital model needs
Re:Reliability (Score:1)
If some unit takes a bit long to respond, you don't get a glitch, as you would in synchronous designs; instead, the unit it is talking to slows down a bit.
Synchronous and asynchronous are really misnomers. Better terms would be "globally synchronized" and "locally synchronized".
Re:Reliability (Score:1)
Not so. In fact, one of the greatest problems with clocked boolean design is the interference caused by all the clocks on the chip. Fabrication will routinely result in broken chips, forcing multiple redesigns and long development cycles.
Tremendous resources are dedicated to getting around this problem. Also, you can't really just change the design 'a little bit', as doing so results in more interference issues. Want to add a new unit to a clocked boolean logic chip (a new cache, 3d unit, new pipeline, etc)? Sure you can do it, but it will require fundamental redesign, as the clocks associated with the new unit will interfere with other clocks on the chip, and other clocks will interfere with your new unit. The fact that they all have to fire off simultaneously, generating electromagnetic interference, is a real needle in the eye for chip designers.
With well thought out async, all you have to do (more or less) is add the unit to the design. The first fab should work, no redesign cycle required. You can add cache memory or whatever, and as long as the design is logically valid you will have a functioning chip in a few days' time (as long as it takes to fab the chip). Try that with synchronous logic.
Re:Reliability (Score:1)
Hmm... But in an async setup they may fire simultaneously - you simply don't know, it's up to the statistics. I fear that in such complex chips you will end up with a system that works by pure coincidence - some picosecond fluctuation somewhere and you get one glitch per 1000 hours of operation.
It is probably not that simple and, as someone wrote, the more proper names would be globally or locally synced. I fully agree with you that there is no reason to tie the bigger units to a single universal clock. But I think that on the lower levels you can get a more reliable design by using the traditional approach.
I have no experience in chip design (so I don't know specific problems of trying to stuff tens of millions transistors onto a square inch), but I designed some non-trivial circuits.
Re:Reliability (Score:1)
1. Lack of availability of design and synthesis tools.
2. Lack of engineers trained and experienced in the use of clockless logic.
3. Lack of multi-sourced, high volume production of clockless logic components.
4. Lack of a clear economic incentive to abandon clocked logic in favor of clockless logic.
I would dearly love to be able to experiment hands-on with a clockless CPU myself, but the cost and difficulty of obtaining just one such device is more than I can justify personally.
This isn't new... (Score:1, Informative)
The reason he built them clockless is that the propagation time to get the clock signal across the machines (which were fairly large) would have significantly slowed the performance. Instead, all of the wires are the right length so that all of the signals arrive at their destination at the right time. I've been told horror stories by ex-CDC salesmen that when they installed new machines, they would spend days or weeks clipping wires to different lengths and debugging hardware failure modes until it all ran smoothly.
Cray also solved the heat dissipation problem by designing the computer to run hot. This meant that when you turned it on it didn't work reliably until all of the ceramic boards heated up (and expanded) so that the connections were solid, etc.
F-ing brilliant.
Re:This isn't new... (Score:1)
CPU Primer (Score:2, Interesting)
If you design a multiplier circuit using a bunch of full-adders, you'll notice that the output takes a long time to settle. In fact, depending on what numbers you are multiplying together, the circuit may take more or less time before the output settles.
You can always determine the worst-case scenario for a multiply operation to settle. If the multiply takes longer than any other operation, then the multiply op is the "critical path".
A chip's frequency is the inverse of the period of the critical path (in most cases). So, if it's possible to do 100 million critical path operations in a second, then your machine can run at 100MHz.
What the article is hinting at is the amount of wasted time because everything is (currently) done on the clock cycle. Allow me to illustrate: Let's say a multiply takes 5 seconds, but an add only takes 1. A fixed clock rate (or having a clock at all) forces that add instruction to take the extra 4 seconds, and use it for nothing. Wasted computer time.
Now, the reason people are skeptical is because there is no efficient way to tell if a multiply operation (or any other operation) has actually completed and the outputs have settled.
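To put rough numbers on the wasted time described above, here's a back-of-the-envelope comparison (all latencies made up, and it ignores pipelining and overlap entirely) of a fixed clock set by the critical path versus letting each operation signal its own completion:

    # Illustrative latencies only (nanoseconds); real numbers vary wildly by design.
    latency = {"add": 1.0, "shift": 0.8, "mul": 5.0}

    program = ["add", "add", "shift", "mul", "add", "shift", "add", "mul"]

    clock_period = max(latency.values())                  # clock must cover the critical path
    clocked_time = clock_period * len(program)            # every op burns a full cycle
    selftimed_time = sum(latency[op] for op in program)   # each op signals its own completion

    print(f"clocked:    {clocked_time:.1f} ns")   # 40.0 ns
    print(f"self-timed: {selftimed_time:.1f} ns") # 15.6 ns

The gap is exactly the skeptics' point, though: the self-timed number is only achievable if detecting "the outputs have settled" costs less than the slack it recovers.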
Incidentally, if this interests you, go grab a free program called "diglog" or "chipmunk". The software (for linux/windows) allows you to simulate almost any digital circuit.
Another thing to keep in mind about current CPUs is the way they execute an instruction. Every instruction is actually made of smaller instructions (called microinstructions). Microinstructions take one clock cycle each, but there is an arbitrary number of microinstructions for each larger instruction. The microinstructions perform the "fetch execute cycle" - the sequence that decodes the instruction, grabs the associated data, performs the desired task, and goes back for more.
If you're interested in designing a CPU yourself, go grab a book by Morris Mano called "Computer System Architecture". With that book and DigLog, it's pretty easy, but it takes a long time.
There's a solution to one problem mentioned (Score:4, Informative)
But at least here there's an accidental solution - the Cross-Check Array.
Conventional clocked chips can be tested by scan: A multiplexer is added to the flop inputs, and a test signal turns them into one or more long shift registers. The old state of the flops is shifted out for examination while a new state is shifted in to start the next phase of the test. This only works when the flops to be strung together are all part of a common clocking domain.
The Cross-Check Array is more like a RAM. A grid of select lines and sense lines are laid down on the chip, with a transistor at each intersection. The transistor is undersized compared to those of the gates, forming a small tap on a nearby signal - or it can inject a signal if the sense line is driven rather than monitored. Select drivers are laid down along one edge of the chip, sense amplifiers/drivers along another.
This approach does not depend on the flip-flops to be active participants in the observation process (though it can still force their state), and thus can observe signals in asynchronous as well as synchronous designs. It also gives observability of testpoints in combinatorial logic without the addition of extra flops. Compared to a fullscan design it gives much greater observability and takes about half the silicon-area overhead.
Programming difference? (Score:2, Interesting)
Re:Programming difference? (Score:5, Informative)
Disclaimer, I was a student at Caltech, and I took 1 async VLSI course, and not very in depth at that.
One way to go about it is to make an async CPU that externally looks like a sync CPU; then you drop it into just about any system, and it works. Speed is wholly dependent upon VCore settings, cooling solutions, and drive strength, I think, though of course there's always gate and transistor performance bottlenecks. Programming and using such a chip would be no different than any other CPU.
Another method is to have a partially async system, in which the CPU, some of the motherboard, and the ram interface is async because of how fast they operate; go ahead and clock something like PCI, USB, etc, because those operate slow enough that the effort of async isn't worth it. This solution is just a question of degrees, really, on how much of the system is async and how much isn't.
Now, that aside, there's the software aspect; how do you program an async system? At the lowest level it resembles, slightly, multi-threaded programming, in which you have multiple threads equating to the multiple function units, execution units, decoders, and stages in the pipeline, etc.
You shuttle data around and wait for acknowledges that the data has been processed before you continue shuttling and processing data. You can synchronize around stages or functional units by making other stages or units dependent upon the output of said unit; instead of waiting for a clock to signal the next cycle of execution, you wait for an acknowledge signal.
To be a little more clear, at the ASM level you would mov data, wait for an ack before another mov data, wait for an ack before sending an instruction, etc. Due to the magic of pipelining, the CPU doesn't have to be finished before you can start stuffing the pipeline, and because it's asynchronous, that means you can actually feed in data as fast as the processor can receive it, even if the back end or the core is choking on a particularly nasty multiplication.
So you're feeding data at a furious rate into the CPU, while the CPU is processing prior instructions. If the front end gets full, or whatnot, it fails to signal an ack, so whatever mechanism is feeding data in (ram, cache, memory, whatever) pauses until the CPU can handle more data.
The core, independent of the front end, is processing the data and sending out more instructions, branches, setting bits. With multiple functional units, each unit can run at its own speed and its own rate. So if all it's doing is adds, checking conditionals, etc, it may be able to outrun the data feed mechanism, since an add can be completed in one pipeline unit, while data always has to wait upon a slower storage mechanism.
Or if the execution units are waiting because it's doing a square root or something, it just tells the prefetch or whatever front end units to wait, because it cannot handle another chunk of data or instruction yet, which propagates back to the data feed to wait as well.
When it finishes with its current instruction, a ready signal would get propagated back through all the stages or so, and then more data would get fed in.
So at the lowest levels it would start to resemble writing threaded code, in which you have to wait for the thread to be ready, to be awake, to be active before you send data, and if the thread is asleep, you wait until it awakes, or something like that.
Multiprocessor async is similar, except that each CPU is just another thread, and if there's a hardware front end that decides which CPU to send instructions to, then it's really just a function of stuffing instructions into the least loaded or fastest running CPU; each CPU could, more or less, look like just another functional unit, and clusters pretty well because they all run asynchronously, meaning you don't have to do anything particularly special for load balancing; just send the data to the first one who signals ready, or if there are multiple cpus ready, read a status register to see which is more empty or whatever.
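Since the comparison to threaded code keeps coming up, here's a toy two-stage "pipeline" in that spirit (bounded queues standing in for the request/acknowledge wires; all names made up, nothing here is a real CPU interface):

    import queue, threading, time

    DONE = object()

    def stage(name, work_time, inbox, outbox):
        """Each stage runs at its own rate and only passes work on when it's finished."""
        while (item := inbox.get()) is not DONE:
            time.sleep(work_time)                # this stage's own, unclocked latency
            outbox.put(f"{name}({item})")
        outbox.put(DONE)

    decode_q = queue.Queue(maxsize=2)            # a full queue = "no ack yet, wait"
    execute_q = queue.Queue(maxsize=2)
    results_q = queue.Queue()

    threading.Thread(target=stage, args=("decode", 0.01, decode_q, execute_q)).start()
    threading.Thread(target=stage, args=("execute", 0.03, execute_q, results_q)).start()

    for insn in ["mov", "add", "shift", "mul", "add"]:
        decode_q.put(insn)                       # blocks only when the front end is full
    decode_q.put(DONE)

    while (r := results_q.get()) is not DONE:
        print(r)                                 # e.g. execute(decode(mov))

The front end stalls automatically when the bounded queue (the "ack") fills up, which is the same flow-control behaviour described above, just in software.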
Apologies if I made some errors, especially to those who know much more than I; this is a 4 year old interpretation of my async vlsi class =)
Re:Programming difference? (Score:1)
> resemble writing threaded code,
In case of a pipeline stall, would SMT be advantageous?
To me it looks like asynchronous non-CPU devices, pipelined CPU, and SMT would be an ideal combination.
Re:Programming difference? (Score:2)
So in the case of a pipeline flush (and accompanying stall), it doesn't take N clocks (whatever the pipeline depth is), it goes as fast or as slow as the flush mechanism reset takes...
If done well, then a pipeline flush can operate at thousands of times faster than the normal operation of the pipeline because, well, you're just dumping data without doing any work; raise the proper bits and reset signals, and the whole pipeline dumps as fast as it can, while the front end feeder just slows down a bit (without stopping) in feeding data into the pipeline.
Above assembly, btw, the programming language for the CPU doesn't have to look like SMT; it can, but it doesn't have to.
Re:Programming difference? (Score:1)
> correctly, even force schedules with NOP instructions.
In async programming, NOP instructions can't be implemented. They don't really make sense anyway. The pipeline in an async chip is technically always stalling. So you will need to learn many more consequences of the instructions you choose. At times the data has an effect on the speed of the instruction.
Using logic like DCVSL, a 32-bit shift operation would finish faster if the data was a binary 1 versus any number that used multiple 1's in its representation. This makes optimization rather interesting. For example you can perform byte operations faster than word operations.
In the async processor I did for my thesis, we simply ran the optimized synchronous code, throwing out the NOP instructions. The result was faster execution even if the code was not optimized for the correct processor.
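The point that the data itself sets the speed is the classic async argument. A crude illustration (not DCVSL, just a toy ripple-carry adder model where "completion" comes after the longest carry chain actually exercised; all numbers illustrative):

    import random

    def carry_chain_length(a, b, width=32):
        """Rough count of how far a single carry actually ripples in a + b."""
        longest = current = 0
        for i in range(width):
            abit, bbit = (a >> i) & 1, (b >> i) & 1
            if abit and bbit:                    # generate: a carry starts here
                current = 1
            elif abit or bbit:                   # propagate: an incoming carry keeps moving
                current = current + 1 if current else 0
            else:                                # kill: any carry dies here
                current = 0
            longest = max(longest, current)
        return longest

    random.seed(1)
    runs = [carry_chain_length(random.getrandbits(32), random.getrandbits(32))
            for _ in range(10000)]
    # A clocked adder must always budget for the full 32-bit worst case;
    # a self-timed one could finish after the longest chain that actually occurred.
    print("worst seen:", max(runs), " typical:", sum(runs) / len(runs))

For random operands the typical chain is far shorter than the worst case, which is exactly the average-case win self-timed logic is after.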
Re:Programming difference? (Score:1)
Efficiency with growing clock speeds (Score:2)
Re:Efficiency with growing clock speeds (Score:1)
Even if you removed all other bottlenecks, a 1GHz version of a 500MHz CPU (with no other architectural improvements) will not perform twice the work of the 500MHz version due to clock overhead.
Re:Efficiency with growing clock speeds (Score:1)
Even with a hypothetical chip that doesn't incur speed decreases due to pipelining, the clock will still end up nearer to some parts of the chip than to others, which will result in latency at the end of the pipeline.
Hence if you've got a 500MHz chip with 2 stages and the clock physically placed near stage 1, then stage 1 of the pipeline will run at 500MHz, stage 2 will also run at 500MHz but with some latency, so the two-stage pipeline will complete an instruction in very slightly over 2 cycles. Add more stages, and you'll get a bigger effect at the end. And as clock speeds go faster, you'll eventually hit the ceiling -- the latency might actually be as long as a single cycle itself.
And having multiple clocks to offload the work (and to bridge the gap from the other stages) can only do so much -- eventually it becomes an issue of timing all these clocks together. You'll eventually wish to remove the clock altogether. =)
As for I/O with the rest of the system, it's not really an issue here -- what is being discussed is the processor's raw speed. I/O bottlenecks are already being solved via intelligent caching, and for more improvement we will probably have to wait for a totally new architecture.
Re:Efficiency with growing clock speeds (Score:2)
Mix&Shake (Score:2)
You guys are way behind (Score:1)
Gee...
Re:You guys are way behind (Score:1)
Gee indeed.
People are too stuck on MHz. (Score:1)
Smells Like Vapor Ware (Score:1)
Of course the new Pentium 4 contains some elements of asynchronous design... all synchronous chips do! In a synchronous design, the logic between registers (article calls Flip Flops) is asynchronous. The gating factor on the amount of asynchronous logic you can place between registers in a synchronous design is a function of the clock speed and the gate speed -- the faster the gates, and/or the slower the clock speed the more logic you can place between registers. Looks like the article is about a system with a clock rate of 0 without changing gate speed, so the processing rate will be the sum delay of the asynchronous logic -- I wonder what this would be on a chip the complexity of a P4 or G4?
The upside to slower clocks is reduced pipelining, which can be useful in designs with limited data paths.
The downside to slower clock speed is increased complexity. Data skew has to be monitored across the chip, so gate delays have to be accounted for in every gate in every possible data path (vewy complex). The chances for glitching increase with logic. With no clock it gets worse: every glitch can be seen -- not the case with a clock (glitches between clock edges may be tolerated).
I also disagree that clock distribution is the limiting factor. This problem is overcome in larger ICs by distributing PLLs throughout the silicon. The limiting factor in clock speed has more to do with the materials used in the chip -- gate speed, skin effect, etc.
Finally, there are quite a few ways to increase the performance of synchronous design. One way is to have multiple data and ALU paths like the Pentium and G4. Another is IC technology. Personally, I'm waiting for the day an all optical processor hits the market.
So an asynchronous chip runs a little faster; the trade is an enormous design risk, marketing, OS development, etc. I say leave the anarchy to the software.
Asynchronous mainframes (Score:2)
Parts of processors are already asynchronous. The basic way you get stuff done in a clocked machine is that you have a register feeding an array of logic gates some number of gates deep, with the output going to some other register. Within the array of logic gates, which might be an adder, a multiplier, or an instruction decoder, things are asynchronous. But the timing is designed so that the logic will, in the slowest case, settle before the register at the receiving end locks in its input states. The worst case thus limits the clock rate, which is why the interest in asynchronous logic.
The claims of lower power consumption are probably bogus. As Transmeta found out, the power saving modes weren't exclusive to their architecture. Once power-saving became a competitive issue, everybody put it in.
circles and squares (Score:1)
The usual problems... (Score:2, Informative)
However, it seems to have spawned the usual problems here with misunderstanding and confusion. Practically a
Whether you construct a processor using conventional or asynchronous logic makes no difference to the programmer. The programming paradigm can be completely independent from the underlying hardware. (Admittedly, if you want to squeeze the absolute most performance from a given hardware design, you need to program with it in mind, but there is no reason why an ix86, or PPC, or SPARC, or MIPS chip couldn't be implemented asynchronously.)
One of the most interesting advantages of asynchronous logic is that it allows the use of arbitrarily large die sizes. In synchronous logic, you're limited by the delays that arise from transmitting your clock pulses across the chip... at some point maintaining a global lock-step becomes infeasible.
One of the most marketable advantages of asynchronous logic is the power saved by not having to constantly drive the same clock circuitry. Most chips support a 'sleep' or 'low power' mode where they turn off the clock or provide it to only a limited portion of the chip. The chip then has to go through a 'wake up' cycle to re-establish the clock throughout the chip before returning to normal operation. The power saved by asynchronous operation can be substantial, and the lack of a 'wake up' latency can be critical in certain applications.
The biggest problem right now is that the vast Layout and Design masses are used to solving the synchronous problems and not the asynchronous problems, ditto for the available tools. However, with an asynchronous-savvy group, a given solution can be designed in less time than the equivalent synchronous solution (someone here was claiming otherwise...).
And this technology is -not- vaporware... it's real and it's here. And whether you believe it or not, it's at least one part of the future.
-YA
PS: BS in EE from Caltech. Working for a company mentioned in the article, although their opinions have no logical relation or tie to mine.
Null Convention Logic (Score:1)
Stating the Obvious- Human Brain (Score:1)
An interesting thing to think about is, with no clock speed, how we still can perceive time. We need to do this to predict the paths of moving objects, like birds and arrows and spears... or more recently car trajectories when we're driving. With no absolutely authoritative central time in our minds, how do we still have such an accurate sense of time when it comes to predicting these paths?
I personally imagine that the brain does have some sense of ratios... I imagine that neural loops have some sense of ratios... for example, what if, hypothetically, the motor loop between say the basal ganglia and the corpus callosum were twice the speed of an eyeblink? The exact milliseconds could vary between people but still give a basis for comparing motion and "time" in the real world. Of course, this would be affected by age as the loops break down - this would account for the way the old people I've seen tend to drive.
A practical example (Score:1)
Re:A practical example (Score:1)
Pulled up Intel's Instruction Set Reference (ftp://download.intel.com/design/Pentium4/manua), surprised to discover that they are apparently not giving the programmer any clue as to how long, or how many clock cycles, it takes these instructions to execute.
Likely this is because this is a very difficult question to answer, clock cycles per instruction being highly variable depending on what else is going on in the processor at the same time.
As I recall, in earlier versions of the Pentium, clock cycles per instruction would range from 110 to 20 cycles.
If we assume the average Pentium 4 instruction takes 30 clock cycles to complete, then a Pentium 4 running at 2 gigahertz is executing 66 million instructions per second.
The X18 executes 2.4 billion instructions per second. That's 36 times faster.
Further, the X18 in any quantity would probably cost several cents per CPU to produce. The Pentium 4 at 1.7 gigahertz cost about 209 dollars.
A little fairer comparison would be the X25, costing one dollar in quantity once one has gone a million units down the learning curve. This is an array of 25 CPUs, and its practical instruction processing rate is probably highly variable depending on the application. There might be special cases where one could use all the CPUs and deliver 60 billion instructions per second (909 times faster than the Pentium 4), but more typically I would guess it would be a fraction of that, although still of course much faster.
Re:A practical example (Score:1)
Although I suspect C++ would be a bear, simply because C++ is apparently difficult to implement for any processor, the others might actually be easy.
Java has a forth-like bottom layer.
Visual Basic, well, I am reminded of Marcel Hendrix (see http://home.iae.nl/users/mhx/). Some time back he developed a general procedure for implementing well-described computer languages in forth.
As test cases he did this for several languages one of which was Pascal. He was able to do these implementations quite rapidly, almost automatically, and as I recall, Pascal implemented in Forth was significantly faster than Borland's Turbo Pascal (implemented in, I think, assembler?).
Chip Security / # of wires (Score:1)
Don't you just have to look for the handshake signals instead?
Also, what are the implications of the "dual-rail" circuits -- doesn't this mean that you won't be able to fit as many transistors on the chip?
Intel (Score:1)
Who? (Score:1)
You know, as with the police, I have a lot less trouble with Bill & co. than with their sycophants. At least the way B.G. and M.S. operate makes sense for _them_; to hear these cheerleaders prate along as if Bill might actually _like_ them....
Re:what about other problems? (Score:2)
Busses can be made asynchronous. Handshaking is the key. New strategies will be needed, but people are bright, so I feel they will be developed. With a little thinking I've sketched out a packet-type asynchronous bus in my head. It would work nicely for up to a meter or so. Longer lengths would be slower than shorter ones. One thing I feel may work best is for any signal/data that needs to travel significant distances to then go into synchronous transmission. Otherwise you end up adding in delays from the back handshake signals.
I remember some of the first articles in SIGARCH and how they sparked my interest. I've always felt that async was the way to go when you don't know how long an operation will take. I'm happy to see it's still getting research dollars.
Re:what about other problems? (Score:2)
Okay, the way I suppose this would work, considering that Intel had developed a chip that was compatible with the Pentium series, would be an asynchronous design with some kind of logic translator to communicate with the bus. Yes, at first you would be wasting processor power, but eventually the bus technology would catch up (see ISA to EISA to VLB to PCI to AGP and on...). As for the RAM, it could either run on an independent clock-bus, or, I do not see why it would be a problem to develop asynchronous RAM if they have the technology for the chips. Also, the article states that the P IV utilises some asynchronous components; maybe that is part of the reason for the push to use RDRAM with it?