Self-Timed ARM Provides Low Power Consumption

hardcorebit writes: "The Amulet Group at the University of Manchester is working on a 'self-timed' or 'asynchronous logic' chip which uses the ARM architecture and instruction set. The benefits? Much lower power consumption, lower EMF emissions, and it works with everything written for the ARM. Their latest effort is 'broadly equivalent' to an ARM9. Anyone had a chance to get their hands on one of these beasts?"
  • Since CPU marketing has been reduced to saying my clock speed is faster than yours (whether my CPU can do more useful work being irrelevant), what will be the "dumbed down figure of merit" (tm) for systems that don't have clocks?
  • ARM chips already have a vast amount of market share -- they dominate the cellular phone market. (In fact, I don't know of a single cellular phone chip set based on any processor core except the ARM7 or ARM9. Somebody help me here -- there has to be one...)

    The thing is, that market is a great deal more power sensitive than anything you or I can imagine in a desktop machine, or even in a laptop or palmtop configuration. All the power cost of the system is in the processor (except when the user is actually transmitting), and any increases in processor power consumption come at the expense of standby battery life. If you want to add rich content to a cell phone, you need a faster processor, and standby life can't be reduced much more without losing users... bottom line, without improvements in processor technology, smart phones won't ever be a reality. We may have smart briefcases, but, in that case, why use a phone and not a wirelessly connected laptop?
  • Disclaimer: I'm a CS student at Manchester

    I believe EDSAC came one year later. Check the links on the above posts and you'll find Manchester Baby dates back to 1948 & EDSAC to 1949.

    I think Cambridge claim it was the first 'proper' computer as it was more like modern computers. Possibly had a teletype style keyboard/printer? Can't remember exactly.

  • For anyone curious about that statement, the way ARM does it is by making ALL instructions conditional. Rather than branch around a small piece of conditional code, you can just scream right through it! (A quick sketch of the idea follows below.)

    I used to work for Acorn (the original "A" in ARM before it was changed). The guy doing the Amulet work at U. Manchester is Steve Furber, who was one of the original Acorn design engineers and original architects of the ARM.
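
    A minimal C sketch of the predicated-execution idea described above; the ARM mnemonics in the comment are illustrative only, not actual compiler output:

        #include <stdio.h>

        /* Branchy C source.  On ARM a compiler can predicate the "then" body
         * instead of branching around it, roughly (illustrative only):
         *     CMP   r0, #0        ; set the condition flags
         *     RSBLT r0, r0, #0    ; r0 = 0 - r0, executed only if r0 < 0
         * Every ARM instruction carries a condition field, so the untaken
         * case simply flows through as a no-op -- no branch needed. */
        static int iabs(int x)
        {
            if (x < 0)
                x = -x;
            return x;
        }

        int main(void)
        {
            printf("%d %d\n", iabs(-42), iabs(7));   /* prints "42 7" */
            return 0;
        }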

  • Ever seen the Transmeta webpad? OOoooOOhhh I want one of those.

    Quite so. Kinda like the ARM-based NewsPAD of old, innit. Hope it's rather more successful.

    Has anyone ported linux to ARM?

    Yes [linux.org].


    --
    This comment was brought to you by And Clover.
  • Might not work just yet.


    I'm a 2nd year CS student at Manchester University, and one of the final year projects on offer this year is to port Linux to the Amulet.


    I didn't look too deeply into it, but I assume from the fact that this project is on offer that some work must need doing to get the present arm ports running on the Amulet.


    just a little insider information :-)

  • However, an asynchronous design requires that the delay lines be very conservatively designed, as if the delay line was a little faster, and the logic a little slower, on the worst case critical path, the chip would fail completely, which results in a slower processor by design.

    This is the same argument I've heard for using asynchronous logic. The idea was that the speed of the entire core is limited by the portion that can't accept a clock higher than N MHz. With asynchronous logic, in theory, each function block can work at its maximum speed, with the slow ones affecting only the operations they're involved with.

    I'm not necessarily disagreeing with you, but that's the general idea of the counterpoint ;-).

    Larry
  • O.K., I'm sorry if my original post implied that I thought PCs and Macs were the only computers. Certainly not; I used to own an Amiga (not more than two years ago), and have used Acorns before. So I know what the ARM is like in use. :)

    The question was a real one. Embedded systems honestly didn't cross my mind. You give some good examples. But does/would the self-timed ARM designs still consume more power than a Transmeta? Does anyone have any hard data on the two for a comparison?
  • It's what I did :)

    There are plenty of projects [man.ac.uk] available using the Amulet for students at Manchester University. One guy [www.axio.ms] I know was building a robot control board using one of these (it had a particularly slick offboard communications system - it was self-programming using a Xilinx array, and used the bootstrap code to soft-load it).

    One of next year's projects is to port Linux to Amulet. Should be interesting, as I don't know whether there's an MMU yet.

    Anyway, I have seen one of them running and it's quite impressive. It has genuinely low power consumption (almost literally nothing when not doing anything), and because of the wonderful CMOS speedup effect, you can increase the speed of the thing by whacking up the core voltage. Very cool.

    The implementation spec is fairly tightly controlled, though, so don't expect to get hold of it just yet.

    Anyway. Come to Manchester! It's got pubs!

  • Nice to hear what the enemy has to say ;-)

    they [both] would say that though wouldn't they.

    Cheers for the info.

  • First off, I'm not an expert here (again, I don't get to go too deep because I second as the sysadmin -- a "pee-on as needed" engineer if you will ;-). I hope to get deeper into NCL design as the sysadmin duties die down (they were quite heavy when I got here because they had been operating without a sysadmin for ~3 years as they grew).

    Secondly, I assume you understand the purpose of the "acknowledgement", which is essentially the "hey, I'm done with the previous result, I'm ready for the next set of inputs"? The acknowledgement, along with the normal properties of CMOS, prevents any "race condition" from occurring (I assume that is your fear?). Again, the physical design is pretty much "100% delay INsensitive". Again, I'm not exactly following you here, but remember, we're not using traditional NAND, etc... gates in CMOS, but NCL gates (e.g., 3 of 5) and they are designed specifically for NCL and the acknowledgement flow. (A rough behavioural sketch of such a gate follows below.)

    Third, the only problem 2NCL (the NCL math used in CMOS -- 4NCL is ideal, but NOT practical in physical design) has to deal with is what we call "orphans." They are unforeseen results that may either cause a condition where data can be "hung" from moving on, or (more likely), the input triggers a gate to open when it shouldn't (e.g., forgetting to take the acknowledgement into account). AFAIK (I've never messed with finding them myself) "orphans" are a "pure logic" problem and we can identify most of them through a post-design "orphan checker."

    Again, I do *NOT* speak for Theseus Logic and there are much better individuals here who can clear up any questions. Feel free to fire off some questions to the address(es) on the web site.

    -- Bryan "TheBS" Smith
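
    For readers wondering what a "3 of 5" gate does, here is a rough behavioural model in C (my own simplified illustration, not Theseus code): the output asserts once at least m of its n inputs are asserted and, thanks to hysteresis, only releases when every input has returned to null.

        #include <stdio.h>

        /* Behavioural model (illustrative only) of an NCL threshold gate THmn:
         * output asserts once at least m of the n inputs are high, and holds
         * (hysteresis) until ALL inputs have returned to null (low). */
        typedef struct {
            int n, m;
            int out;                        /* current output = gate state */
        } th_gate;

        static int th_eval(th_gate *g, const int *in)
        {
            int count = 0;
            for (int i = 0; i < g->n; i++)
                count += (in[i] != 0);

            if (count >= g->m)
                g->out = 1;                 /* threshold reached: assert  */
            else if (count == 0)
                g->out = 0;                 /* all inputs null: release   */
            /* otherwise hold the previous value (hysteresis) */
            return g->out;
        }

        int main(void)
        {
            th_gate th35 = { 5, 3, 0 };
            int in[5] = { 1, 1, 0, 0, 0 };
            printf("%d\n", th_eval(&th35, in));  /* 0: only 2 of 5 asserted     */
            in[2] = 1;
            printf("%d\n", th_eval(&th35, in));  /* 1: 3 of 5 -> output asserts */
            in[1] = in[2] = 0;
            printf("%d\n", th_eval(&th35, in));  /* 1: holds until all null     */
            in[0] = 0;
            printf("%d\n", th_eval(&th35, in));  /* 0: gate releases            */
            return 0;
        }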

  • Actually, tolerance of manufacturing variability is one of asynchronous design's strengths. Because each "chunk" of gates only performs its computation when all input data is available, it doesn't matter if a piece of data arrives early or late. The computation is data-driven. It runs as fast as the lines can switch. So it does not have to be "more conservative."

    Async designs require more silicon area for simple things, like data lines. Rather than having a single "high = 1, low = 0" data line + a clock line, our group used three wires. First wire high = 1, second wire high = 0, third wire high = downstream component got the data, reset please.

    The coolest geek feature of async processors is that if you improve the transistor physics (e.g., put an ice cube or some liquid nitrogen on the processor) the instruction rate increases. Whee!

    James Cook
    ex-"cook@vlsi.caltech.edu"
    now james@cookmd.com
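
    A small, hypothetical C model of the three-wire scheme described above (one wire per data value plus an acknowledge back from the consumer); it is only meant to show the request/acknowledge cycle, not any particular group's circuit.

        #include <assert.h>
        #include <stdio.h>

        /* Three-wire link: t high means "the bit is 1", f high means "the bit
         * is 0", and ack is driven back by the consumer once it has latched
         * the data.  Hypothetical sketch for illustration only. */
        struct link { int t, f, ack; };

        static void produce(struct link *w, int bit)
        {
            assert(!w->ack);            /* consumer must be ready            */
            w->t = (bit != 0);
            w->f = (bit == 0);          /* exactly one data wire goes high   */
        }

        static int consume(struct link *w)
        {
            assert(w->t ^ w->f);        /* data present and valid            */
            int bit = w->t;
            w->ack = 1;                 /* "got the data, reset please"      */
            return bit;
        }

        static void reset_phase(struct link *w)
        {
            w->t = w->f = 0;            /* producer returns to null ...      */
            w->ack = 0;                 /* ... then consumer drops its ack   */
        }

        int main(void)
        {
            struct link w = { 0, 0, 0 };
            int msg[] = { 1, 0, 1, 1 };
            for (int i = 0; i < 4; i++) {
                produce(&w, msg[i]);
                printf("%d", consume(&w));
                reset_phase(&w);
            }
            printf("\n");               /* prints 1011 */
            return 0;
        }
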
  • Remember, I said that when computers were invented, besides the "operator" and "operand," you had to account for the "control", which was previously the mathematician him/herself with boolean logic on paper.

    The first computers WERE async (e.g., Eniac, etc.). But by the 1960s, component speeds made race conditions severe enough to prompt a "formal" control line. This became the clock, which was simple and did the job. By the time ICs rolled around (early '70s with Intel's first memory package), the clock was the "thang" to use.

    But starting with speeds of 100MHz+ and transistor counts in the millions, clocks became limited by the speed of light (combined with the delays of the semiconductor material itself, silicon). As such, clocks are now localized to certain portions of the IC, yet still have to be "synchronized" somehow.

    Karl Fant, the man behind NCL, devised this embedding of control in logic over two decades at Honeywell. Theseus Logic was founded in 1996 to take NCL commercial (Honeywell had no interest in doing so). Our main argument is that WE FINALLY ADDRESSED "CONTROL" in the way computers should be designed. Remember, boolean logic and algebra were designed for mathematicians, NOT computers. And most of the industry is starting to side with us that dual-rail/acknowledgement is the way to go.

    -- Bryan "TheBS" Smith

  • Here's a quick summary of the benefits of NCL:

    1. Asynchronous logic with all the benefits -- Although not the most efficient async (Furber's "packetized boolean" is damn good at low power), it does inherit the low-power, low-EMF-generation, high-EMI-tolerance characteristics (over clocked boolean). Most people do not realize that can be up to an order of magnitude of power savings (yes, 1/10th of clocked boolean), let alone the EMF/EMI benefits (e.g., in devices like cell phones or, more critically, pacemakers).
      [ One really "neat" feature of NCL is the inverter gate: THERE IS NONE! Inversion in NCL is simply done by swapping the rails! A tiny sketch of this follows below. ]
    2. Nearly 100% delay insensitive -- NCL logic is delay insensitive, and will self-synchronize with any input or output, even if it is coming from a boolean input or going to a boolean output. This is the main problem with most asynchronous technologies! Likewise, it is easy to connect NCL combinational logic together for a complete circuit ... or an entire chip ...
    3. Completely solves the increasing problem of DESIGN REUSE! -- Following on from #2, because NCL combinational designs are self-synchronizing, we can reuse them over and over again as the feature sizes shrink and complexity grows. Ever wonder why Intel, AMD, IBM, Motorola and others have to "go back to the drawing board" every time the feature size shrinks 1-2 times? Because the logic no longer synchronizes between localized modules! NCL is the ONLY WAY to get to custom, repeatable SoC (system on a chip)!
    4. Lastly, NCL can be designed with existing tools -- This is a big plus and not found with really any other major, competing asynchronous method. NCL sounds all "fine and dandy", but who cares if we have to chuck our existing designs and learn it all over again -- well, you do NOT. Various, slightly modified tools (e.g., Synopsys Design Compiler) can take boolean or boolean-oriented designs and crank out complete NCL designs. Although not completely optimized, that's where our optimizer comes in and finishes the job. But you CAN use existing tools to start making working NCL circuits now.

    The only negatives to NCL are:

    1. Dual-rail implementation requires extra chip real estate -- This can translate into the die being up to 50% larger in the worst case. Fortunately, NCL still ends up requiring less power (even with the increase in transistors and wiring -- much less in fact, like 1/4th!) due to its asynchronous characteristics. This is the only area where alternative, single-rail asynchronous holds a slight advantage (again, NOT worth all the disadvantages).
    2. "Orphans" -- Not really a "negative" overall, as "orphan checking" and redesign due to them is minuscule compared to the wasteful redesign and design failures that are occurring today with clocked boolean logic in timing verification. Even comparing NCL against other asynchronous logic designs, it's no contest; orphans will not cause you to do major redesigns, because orphans are a localized issue, not a whole-chip timing issue. We're talking more than one order of magnitude less time to post-design verify (assuming the clocked version doesn't have to be chucked and fully redesigned ;-).

    -- Bryan "TheBS" Smith
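
    A tiny C sketch of the "no inverter gate" point above (my own illustration, assuming the usual dual-rail convention that exactly one of the two rails is high when data is present):

        #include <stdio.h>

        /* Dual-rail value: t high = logical 1, f high = logical 0, both low =
         * null.  Inversion is just swapping the two rails -- no inverter gate. */
        struct dual_rail { int t, f; };

        static struct dual_rail dr_not(struct dual_rail x)
        {
            struct dual_rail y = { x.f, x.t };   /* swap the rails */
            return y;
        }

        int main(void)
        {
            struct dual_rail one  = { 1, 0 };
            struct dual_rail zero = dr_not(one);
            printf("t=%d f=%d\n", zero.t, zero.f);   /* t=0 f=1 : logical 0 */
            return 0;
        }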

  • First off, you are comparing two entirely different markets. ARM is NOT designed to run in an end-user, general-purpose desktop.

    Secondly, StrongARM did go the same way as Alpha ... to Intel in the cross-license and fab buy-out! Everything points to the main, single reason Intel made the deal ... yes, StrongARM (of course, the nice side effect was the dropping of the lawsuit -- and Digital wanted to dump its fabs anyway). The damn thing was eating everything up -- MIPS, Hitachi, etc. -- and Intel's own i960 really needed a good replacement.

    The smartest thing Intel has done in a long time (at least technically ;-) was to buy StrongArm from Digital. Man is StrongArm just gaining market and mindshare or what?!?!?!

    But ARM itself (non-StrongARM) is far from dead. It's used in numerous products you use, just like MIPS.

    -- Bryan "TheBS" Smith

  • Secondly, I assume you understand the purpose of the "acknowledgement", which is essentially the "hey, I'm done with the previous result, I'm ready for the next set of inputs"? The acknowledgement, along with the normal properties of CMOS, prevents any "race condition" from occurring (I assume that is your fear?).

    My fear isn't a race condition; it's a spurious signal emitted from a previous output stage causing processing to begin before it should in the following stage, with invalid data. Spurious signals like this occur all of the time, and are called "glitches"; they result when multiple paths through a logic block have different lengths. The canonical solution is to ignore all outputs until enough time has passed for them to stabilize. Glitches can also be minimized by adding redundant logic terms.

    Again, I'm not exactly following you here, but remember, we're not using traditional NAND, etc... gates in CMOS, but NCL gates (e.g., 3 of 5) and they are designed specifically for NCL and the acknowledgement flow.

    However, your NCL gates are still composed of transistors set up using CMOS logic rules (or any of a variety of dynamic schemes that accomplish the same thing). This winds up giving effects similar to those you would see with standard boolean logic circuits. As far as I can tell from the documentation, in actual implementation NCL isn't so much a departure from boolean logic as a layer of meta-logic on top of it that makes it self-clocking. The actual physical signal encoding on individual lines is boolean (the lines are just grouped in interesting ways).

    Thus, while the gates are self-clocked, they seem to be as vulnerable to glitching as any other combinational logic blocks.

    Information regarding "orphans" noted. It's interesting, but doesn't relate to my question.

    Again, I do *NOT* speak for Theseus Logic and there are much better individuals here who can clear up any questions. Feel free to fire off some questions to the address(es) on the web site.

    Noted; thanks for posting the link, BTW. This is a very interesting approach to asynchronous circuitry.
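
    To make the "glitch" point above concrete, here is a toy unit-delay simulation in C (my own illustration, not from either poster): f = (a AND b) OR (NOT a AND c) with b = c = 1 should stay at 1 when a falls, but because the inverted path is one step slower the output momentarily drops -- exactly the kind of spurious transition being discussed. The classic fix is the redundant consensus term (b AND c).

        #include <stdio.h>

        /* Toy unit-delay model of f = (a AND b) OR (NOT a AND c).
         * The inverter on a is modelled as one time step slower than the
         * direct path, so when a falls with b = c = 1 the output glitches
         * low for one step (a static-1 hazard).  Illustrative only. */
        int main(void)
        {
            int b = 1, c = 1;
            int a_seq[] = { 1, 1, 0, 0, 0 };       /* a falls at t = 2        */
            int not_a   = 0;                       /* slow inverter's output  */

            for (int t = 0; t < 5; t++) {
                int a = a_seq[t];
                int f = (a && b) || (not_a && c);  /* sees stale inverter value */
                printf("t=%d a=%d not_a=%d f=%d\n", t, a, not_a, f);
                not_a = !a;                        /* inverter updates one step late */
            }
            /* f dips to 0 at t = 2 even though its steady-state value is 1
             * both before and after the transition. */
            return 0;
        }
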
  • PalmOS is also moving to the ARM processor.

    This is interesting because a cellular phone takes so much power for the RF transmission that the CPU consumption is relatively negligible. I don't know about you, but I really like the fact that a Palm runs forever on a pair of AAAs. I don't like the rechargeable Palm V, Palm IIIc, PocketPCs, etc.

    ----
  • Check out a Psion device then - they're even better at it, with MUCH more functionality than a Palm, and with a 32 bit OS ...

    I'd suggest a Psion Revo! Not much larger than a Palm V - and with a keyboard.

  • > Still, long live Arthur!

    Ah! This brings back memories.... none too pleasant ones. The days when GUIs were written in BASIC will not be missed.
    Of course if you're hankering for that Archimedes feeling there's a pretty decent emulator up at this guy's [geocities.com] page.
    Most games and demos run, but sadly there's no sound yet.
  • Hahahahaha

    Check your facts. ARM chips are used in many mobile devices, including the entire Psion family of organizers. They are also in the Cobalt Qube and RaQ, and are about to become the CPU in all the new devices from Palm. Calling them "rusty old chips" betrays your ignorance of technology: the ARM family of processors is one of the richest, most varied and most technologically advanced around.
  • The Alphas were not "head and shoulders above the 386." When they were first introduced, they were sucking up way, way more power and requiring much more cooling than an average Intel chip. Faster, yes, but at a price. Alphas were targeted at the "performance at all costs" CPU market, not something for the average desktop or laptop.
  • by BrianW ( 180468 ) on Monday May 15, 2000 @07:17AM (#1072319)
    I saw a demo of the Amulet a couple of years ago, when I was at Manchester University. They'd wired up one to a variable-voltage power supply, and a speaker.

    By putting it into a loop where it powered the speaker every couple of cycles, it generated a tone. By adjusting the voltage of the power supply, it was possible to make the tone higher or lower, as it was having a direct effect on the running speed of the processor.

    Also, when put into a 'halt' loop, it would power down until interrupted. An ammeter connected in series with it showed that it was using almost literally no power.
  • You could still use BogoMIPS, everybody's favorite!
  • Our main project for the semester was to build a behavioral and structural model of a pipelined ARM7 processor.

    That does sound kind of harsh, but then I'd hate even more to have to do it for any other kind of modern chip architecture.

    The ARM instruction set is pretty clean, and dead dead easy to program even large projects in. Mind you, some of the newer ARMv4, Thumb instructions must be pretty hairy from an implementation POV, especially keeping backwards-compatibility with 26-bit addressing.

    Hang on, what's this story doing on /., anyways? The Amulet project has been going a long, long time and achieved ARM9-level performance some time ago, IIRC. Asynchronous chips are interesting but the power of mainstream (particularly x86) processors has kept increasing at such a rate no-one has yet needed to make the huge change of design strategy. I don't expect to see async chips in the mainstream until Moore's law is well and truly broken.


    --
    This comment was brought to you by And Clover.
  • by Junks Jerzey ( 54586 ) on Monday May 15, 2000 @07:20AM (#1072322)
    Maybe you haven't been exposed to enough processor architectures? The ARM chips have the cleanest instruction set and overall architecture that I've seen, and that includes lots of hands-on experience with the PowerPC, x86, SHx, and MIPS chips. The ARM designers had some very good ideas for keeping instructions simple while getting a lot done, and they had a novel way of avoiding the usual branch prediction troubles. Very slick.
  • by BitMan ( 15055 ) on Monday May 15, 2000 @08:52AM (#1072323)

    Amulet's lead, Steve Furber (who also designed the original ARM), wrote a recent editorial cover story called "Kicking out the Clock" [isdmag.com] in the May 2000 edition of Integrated System Design (ISD) magazine [isdmag.com].

    In the article, he used an example of a "dual-rail" logic (as opposed to the "single-rail" found in most boolean designs) called Null Convention Logic (NCL) from Theseus Logic [theseus.com]. Theseus' NCL approach goes a long way toward solving not only the power and noise problems (like most asynchronous approaches), but also the greater problem of design reuse (a problem with both async and, especially, synchronous) -- the latter is something Furber was quoted on in a past EE Times [eetimes.com] article (cannot seem to find it on-line anymore?).

    Timing verification is becoming increasingly difficult in IC design, adding ridiculous amounts of extra effort and, in some cases, causing complete design failures (e.g., AMD, IBM and Intel have all had timing-related design failures). Clocks may soon disappear in favor of async designs, especially those like Theseus Logic's nearly-100% delay-INsensitive NCL technology. NCL's delay-INsensitive nature comes from the fact that it is NOT boolean-logic based, but a new method that breaks with the traditional foundation of boolean logic, which was designed for mathematicians, not computers.

    In addition to an "operand" and an "operator," as with traditional, human-based math, computers require a third "control" line. In synch/boolean, this is the clock. With the limitations of the speed of light, it is IMPOSSIBLE for one section of a 10M+ transistor IC to be timed synchronously with another. As such, most modern ICs have localized clocks, which further adds to design complexity.

    NCL removes the clock as the control (as with most async) *BUT* it places the control back in the data flow lines themselves! NCL is a 3-state logic of "true" and "false", plus the control, which is derived from NCL math to be "null" (no data). This representation is 2NCL in NCL math (see Theseus' site [theseus.com] for more details on NCL, including 4NCL and 3NCL, the latter being used with most off-the-shelf tools and optimizers). In 2NCL, the lines (again, "dual-rail") put the false value (0) on one line and true (1) on the other line *IF* voltage is present; otherwise, no voltage (or low) results in the state of "null" (again, no data). Acknowledgements are used to maintain a delay-INsensitive combinational logic circuit, including the fact that NCL can be placed alongside synch/boolean and maintain 100% data flow and integrity (again, totally delay INsensitive). So instead of data having to "wait" on a clock to move forward, data moves forward when it arrives! This further increases performance! (A small sketch of the 2NCL encoding follows below.)

    Although Theseus' NCL technology is NOT boolean based, it works with off-the-shelf synch/boolean IC design tools (unlike attempts like Cogency's), it is still CMOS-based, and it is not too difficult for an engineer to learn coming from the synch/boolean world.

    [Bias: I am an employee of Theseus Logic and know Mr. Furber, the Amulet lead. I am NOT an engineering lead, just a regular engineer (who seconds as the sysadmin ;-).]

    -- Bryan "TheBS" Smith
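
    Here is a small C sketch of the 2NCL dual-rail encoding described above (my own illustration of the idea, not Theseus code): two wires per bit, with "neither high" meaning null and "both high" being illegal.

        #include <stdio.h>

        /* One 2NCL dual-rail bit: voltage on the true rail means 1, voltage on
         * the false rail means 0, neither means NULL (no data yet), and both
         * high is not a legal state.  Sketch for illustration only. */
        enum ncl_state { NCL_NULL, NCL_FALSE, NCL_TRUE, NCL_INVALID };

        static enum ncl_state ncl_decode(int rail_t, int rail_f)
        {
            if (rail_t && rail_f) return NCL_INVALID;
            if (rail_t)           return NCL_TRUE;
            if (rail_f)           return NCL_FALSE;
            return NCL_NULL;                /* no data: downstream just waits */
        }

        int main(void)
        {
            static const char *name[] = { "NULL", "FALSE", "TRUE", "INVALID" };
            int rails[4][2] = { {0,0}, {0,1}, {1,0}, {1,1} };
            for (int i = 0; i < 4; i++)
                printf("t=%d f=%d -> %s\n", rails[i][0], rails[i][1],
                       name[ncl_decode(rails[i][0], rails[i][1])]);
            return 0;
        }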

  • by beebware ( 149208 ) on Monday May 15, 2000 @07:20AM (#1072324) Homepage
    The A(R)mulet has been in development for a few years now (as readers of Acorn User [acornuser.com] would be aware).

    Its processor core is based on the ARM [arm.com]9 series, but since it is asynchronous (i.e. it hasn't got 'clock cycles' like normal synchronous processors) it should go very, very fast (simple processes will rush through without being delayed by slightly harder/longer processes).

    While I haven't had a chance to get my hands on one of these yet, the specs I've seen (I can't remember if they are public or not) look good and the chips should be compatible with current ARM chips - as used in my RISC PC [castle.org.uk] (BTW a RISC PC is used to run the 'Who Wants to Be A Millionaire' shows!).

    It is difficult to place an exact MHz rating on these chips due to the way they work, but the current version (AMULET3i) runs at roughly 120MHz - but they have started from the basics, without using much 'proven technology', so expect development to last a few more years - but the 120MHz version should be out next month/late this month.


    Richy C. [beebware.com]
    --
  • Nah, ARM processors are used all over in embedded devices. This isn't just PDAs and palmtops, but all those other electronic devices that have some smarts and don't use '70s-derived OSes (*nix, MSDOS, WinX). (Not using those isn't so bad - do you really want to program your microwave oven from a command line or a GUI? At 5AM Monday morning? After a late night?)
  • These guys [theseus.com] have an interesting way to deal with it.

    They describe a way to build asynchronous circuits (using the same design even for different fabs) that run as quickly as the gate/wire delays allow. It takes more surface elements to build the same logic, but once you take removal of the clock lines into consideration, things look a lot closer.

    IMHO, the real beauty of async designs is that your bit shifter op can take 1 nanosecond, your add op can take 3 nanoseconds, and your subtract op can take 4 nanoseconds, rather than having them each take a 4 nanosecond cycle. It really disturbs me to see designs where a multiplication (inherently slower by a minimum factor of lg(bits)) takes the same amount of time as an addition.
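
    As a back-of-the-envelope illustration of that point (reusing the poster's hypothetical latencies and a made-up instruction mix, not measured figures), a clocked datapath pays the worst-case cycle for every operation while a self-timed one pays only what each operation needs:

        #include <stdio.h>

        /* Hypothetical numbers from the comment above, purely illustrative. */
        int main(void)
        {
            const double shift_ns = 1.0, add_ns = 3.0, sub_ns = 4.0;
            const double clock_ns = 4.0;          /* set by the slowest op   */

            /* Made-up mix: 30% shifts, 50% adds, 20% subtracts. */
            double async_avg = 0.3 * shift_ns + 0.5 * add_ns + 0.2 * sub_ns;
            double sync_avg  = clock_ns;

            printf("average latency: async %.2f ns vs clocked %.2f ns\n",
                   async_avg, sync_avg);          /* 2.60 ns vs 4.00 ns      */
            return 0;
        }
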
  • When I was in grad school at Caltech, I worked on software tools in Alain Martin's asynchronous microprocessor group. The group had actually developed and fabricated a processor before I arrived. To quote their web page (www.cs.caltech.edu/~alains/previous/uP.html):

    "Above is the layout of the 1.6 micron version of the Caltech Asynchronous Microprocessor, fabricated in 1989. It is a 16-bit RISC machine with 16 general-purpose registers. Its peak performance is 5 MIPS at 2V drawing 5.2mA of current, 18 MIPS at 5V drawing 45mA, and 26 MIPS at 10V drawing 105mA. The chip was
    designed by Professor Alain Martin and his group at Caltech. You can read about the chip in Caltech CS Tech Reports CS-TR-89-02 and CS-TR-89-07."

    Keep in mind that this is a 1.6 micron process. The chip was later fabricated in gallium arsenide with very few design changes. This is because the chip, being completely data driven, will perform computation as fast as the underlying device physics will allow. There are no "timing issues" as these must all be worked out in high-level design (or the chip won't function at all... race conditions in hardware really suck).

    Of course, the neatest geek feature is to pour liquid nitrogen on the chip and watch the instruction rate climb.

    Since I left the group, they have also fabricated an asynchronous "digital filter" or simple DSP. Details at http://www.cs.caltech.edu/~lines/filter/filter.html

    The downside of all this stuff is that the design process is very formalized and arduous. Our group designed by writing parallel programs in a special chip-design notation, then transforming the program by hand and by software into a VLSI gate layout. It was a completely different synthesis method than most designers are used to, so it requires completely new software and designer training to be productive. It's sad, really, because the output chips are so very very nifty.

    James Cook
    ex-"cook@vlsi.caltech.edu"
    now-"cook@alumni.caltech.edu"
  • This asynchronous ARM has been around for a while, and it has yet to hit the shelves.

    I recall reading announcements for it back in the mid 90s (I believe it was in Byte, or something silly like that), and despite my frantic attempts to acquire small quantities, I was not successful. It seems that, based on what they say on their web sites, they have no intention of manufacturing it unless you are a large corporation with a specific need.

    Bottom line: Who cares. It isn't available to the average silicon hacker.
  • (1, Troll)? Sorry? Anyway, I think porting Linux to the Amulet sounds fun...

    Disclaimer: Yet Another Manchester University Student...

    --

  • ...how many people actually know about the company that started up ARM in the first place? They made what are, in my opinion, the best desktops around. Acorn Computers may not exist anymore, but Castle Technologies has taken up the task of developing them. RISC OS was doing things in 1986 that Windows only implemented in Windows 95. RISC OS, now owned and developed by RISC OS Ltd. (I think), seems to be going from strength to strength! People might be interested in Acorn computers, which would have taken over the world if people at Acorn hadn't decided that there was no need to advertise their new products because they were so good, they would advertise themselves.
    BTW, I recommend you read Steve Furber's book on VLSI design (I can't remember the name). Very informative and interesting, using ARM chips as examples and such.
  • (ngh, pressing enter accidentally posted article)

    Compare the ARM's nice simple orthogonal instruction set with the crawling horror that is to be Merced. 128 general-purpose registers, 128 floating-point, 64 predicates, 8 branch registers. Background register loads/spills when you do a function call. Multiple instructions issuing at once. No page faults - until you explicitly "commit" a bunch of memory accesses. Rollback.

    ftp://download.intel.nl/design/ia-64/downloads/24535801.pdf

    (sadly not the 200 page full description, I can't find that)
  • I'd figure eCos [cygnus.com] would be ported before Linux. Amulet is not exactly something that you would use in a traditional thin-client/server system, but more, ultra-low-power/embedded systems.

    eCos is the Linux complement in small-footprint, real-time space. Blows Windows CE out of the water, and Cygnus/RedHat are working hard to make EL/IX an API for cross-Linux/eCos development. An excellent model IMHO. Linux is great, but it can't run in the smallest of footprints.

    -- Bryan "TheBS" Smith

  • by Anonymous Coward
    i use my arm quite a bit when i pour a hot bowl of grits down my pants, and it doesn't consume a lot of power, though it helps if i eat a chunky bar first. thank you.
  • The AMULET itself has been around for *years*. It's led by, IIRC, Steve Furber, one of the original designers of the ARM when it was part of Acorn.
  • by Anonymous Coward
    I was one of the members who made AMULET3. Here are some myths and truths about asynchronous design (as far as I know):

    1. Asynchronous design uses delay cells: not necessarily. AMULET3 uses delay cells because of commercial considerations, in terms of chip size and the use of synchronous CAD tools.
    2. Asynchronous design does not have CAD tools: false argument. There are several tools available. Even an industry-level tool is being used by Philips.
    3. There is no commercial asynchronous chip: wrong. Philips has made several chips and one of them is being used in their products.
    4. Asynchronous design is not safe: wrong. This problem is solved in terms of CAD algorithms; now it depends entirely on your brain.

    Using asynchronous design is mainly an engineering trade-off, in my opinion. There are advantages and disadvantages. However, it mostly depends on your brain.
  • Makes sense to me, Redundancy being the important part. However because it makes sense the military probably won't use such an idea.
  • This could mean just as much a new wave as the Transmeta Crusoe - meaning portable devices will become even better! Oh boy, it's great to be a nerd nowadays.
    This Slashdot story [slashdot.org] mentioned the US Government's plan to create a corps of 'connected soldiers' using palmtops and GPS equipment (among other things).

    I hope whoever is in charge of this project becomes aware of this technology - as other posters on the aforementioned story noted, EMF radiation could make these JEDIs a glowing target. Lower EMF means fewer KIA (Killed In Action, not the crappy car company) JEDIs.

    Besides, the low power consumption is something that nearly every PDA user can appreciate: and in field-critical situations, could be another lifesaver.

    --

  • Hm. Maybe now I can chat on a cellphone without worrying deep down that I'm going to get brain cancer.... And, thus, give a nervous mother (mine) one less thing to harangue me about during that same chat.
  • I'm still waiting for some consumer devices that will actually make use of these sorts of chips. Like, I'd love to see a laptop that will run for 10 hours and is cheap - if they were only 1000 bux or so, I'd pick one up, but they're still really expensive... it is going to be wonderful when these things get to market :)
  • The EMF from the CPU may be lower, but you still have all that Microwave radiation being bounced off your head. Time to invest in a lead balaclava if you want to stop your mother worrying ;)
  • That would depend on a couple of things:

    1. The power requirements are significantly lower for the ARM based CPU than they are for Transmeta based CPU's.

    2. The advantages of the Transmeta (being x86/PowerPC compatible) do not outweigh the advantages of using the ARM.

    In fact, if anyone can give some examples where the user's benefit is greater using an ARM solution than an x86 or even PowerPC based solution, I'd love to know what they are. ARMs are cool CPUs and all, but hardly predominant in the current marketplace.
  • Porting Linux to Amulet3 [man.ac.uk].

    Impress the dept. and gain the respect of /.! Certainly much more fun than the Java web-trawler I did this year...

  • Ever seen the Transmeta webpad? OOoooOOhhh I want one of those. First company to come out with one that runs decent gets my business.
    Has anyone ported linux to ARM? That would be cool to have linux on a low power, portable device.
  • and get lead poisoning....
  • by vanth ( 34291 ) on Monday May 15, 2000 @06:58AM (#1072346)
    Self-timed CPUs have been in the design stages for quite some time now, and some groups have built prototypes.

    A group at Caltech built a 16-bit RISC-style self-timed CPU some years back (early '90s, I believe) on a 1.5 micron process (I believe, somebody please correct me if I am wrong).

    One of the cool features is that as you cool the CPU, it literally becomes faster!
    The basic design of self-timed CPUs has been around for probably more than 20 years.
    S. Unger's Asynchronous Sequential Switching Circuits (Krieger, Malabar, FL, 1983) is probably one of the books one encounters when taking a course in this subject. (The book is pretty rough going, though.)

  • by mindstrm ( 20013 ) on Monday May 15, 2000 @07:23AM (#1072347)
    ARM has lots of market share. LOTS.
    You are assuming that the main market for this type of chip is the home PC. This is absolutely not the case.
  • In light of the recent article about Palm switching to the ARM family of processors, one can only hope that they would consider a low-power alternative like this.

  • DEC failed because they over-engineered. The Alphas were head and shoulders above i386, but nobody used them. Why? Because DEC kept "improving" them. They also acted like "if you build it they will come".

    Over-engineered? That's a misuse of the term. Also, I think you don't have any idea what you're talking about. If Alpha is such a failure, why didn't Compaq can it when they bought DEC?

    Just cuz all you've ever seen are architecturally obsolete x86s doesn't mean that's what the whole world uses. Ever hear of a little thing called VMS? Also, Alphas are big in scientific areas. So instead of trying to perfect the ARM -- why not work on getting some market share first?

    Better products increase market share.

  • Still, long live Arthur!

    Arthur Lives! [demon.co.uk]


    --
    This comment was brought to you by And Clover.
  • The Cobalt Qube and RaQ 2 products use the SGI MIPS processor, not ARM. (And the RaQ 3 uses an "Intel compatible" processor, according to the data sheets found here. [cobalt.com])

    I've seen information indicating expressions of interest in a port of PalmOS to StrongARM; I'll believe in there being product when I actually see it on store shelves.

  • Define over-engineered for me then.

    Alpha was doing poorly because of poor marketing. Compaq didn't can it because they want to remedy that.

    I have seen more than x86's. I'm not saying Alphas suck--I'm saying they aren't popular.

    "better products increase market share."

    So conversely, poorer products decrease market share? I guess that explains why Microsoft is doing so poorly...
    --
    Have Exchange users? Want to run Linux? Can't afford OpenMail?
  • I'm a Cambridge student from Manchester.

    Cambridge teaches us "EDSAC was first coz Baby was just a device to test the memory tubes."

    However, I've also heard Manchester's side of the story (having worked in the CS Dept. one summer) and nyaaaaaaaaah to Cambridge - I think Manchester has it.

    Maz
    -- not daring to walk on the streets for the next few days...

  • And they still have parts of it lying around the hallways. They used a CRT as the memory core, where a pixel / memory cell was lit / unlit.

    A true sign of the global-ness of the web. I grew up in Manchester, went to the University of Manchester, married a US Citizen, I now live in the US, go to /. and I end up reading about projects in my old University.

    It is a small world...

  • I too have seen the AMULET processor and in fact have a final-year degree exam on the very fundamentals of the processor. The new version of the Amulet (3i) is currently awaiting fabrication, from what I am aware. I have lectures from a number of the design team, including Steve Furber, and have seen working examples of the processor. I believe there is also an ARM9 available with the asynchronous multiplier from the AMULET processor. This allows the processor to be optimised and use less power. There are a number of aims surrounding the AMULET, mainly low power, low EMC, and actually proving that you can use asynchronous technology in real-world applications. For anyone who doubts the use of this technology, it has incredible potential. I think its best feature is the ability to enter an idle state where no power is consumed. Anyone wanting to use one of these processors should do a degree at Manchester!
  • that looks kinda cool actually. can it run linux? :)
  • Great! I have an apposite and timely post at last!

    Five minutes ago, I clicked the on-line submission button indicating that I would NOT choose (for my 3rd year project at Manchester University):

    Porting Linux to Amulet3
    Project: 704 Supervisor: DAE Categories: SH=C

    Amulet3 is the latest asynchronous version of the ARM microprocessor. Last year, a student designed a demonstrator board based around Amulet3 + an on-board Xilinx chip. This project is to port a cut-down version of linux to the board. Several ports of linux to similar systems exist. See DAE for Details.

    Ah well...
  • >NCL is a 3-state logic of "true" and "false", plus the control which is derived from NCL math to be "null" (no data).

    Interesting. This sounds a lot like the "data-driven" graphical language LabVIEW [ni.com], which I spent about three years programming in.

    In LabVIEW, the operators only execute when data is present. "Data present" is a condition inherent in the incoming data stream, and, as you said, does not require an extra control line to indicate.

    The operators themselves can be programs, being activated only when data appears. So, the language is extensible by creating custom "instruments" which are activated by the presence of data.

    I've always said implementing LabVIEW in hardware would be a kick!

    --The QuantumHack

    (no relationship to National Instruments except a satisfied customer)

  • IMHO, the real beauty of async designs is that your bit shifter op can take 1 nanosecond, your add op can take 3 nanoseconds, and your subtract op can take 4 nanoseconds, rather than having them each take a 4 nanosecond cycle. It really disturbs me to see designs where a multiplication (inherently slower by a minimum factor of lg(bits)) takes the same amount of time as an addition.

    Properly-implemented asynchronous circuitry ALSO has the quality of modularity - you can hook each individual module together w/o regard for inter-module timing - and if you come up with a better implementation for one of the modules, you can swap it in w/o adjusting timing in any of the other modules - just as long as the interfaces are well-defined and asynchronous.

    Most engineers would probably agree that this is a good thing.

  • The ideas presented in the papers on the Theseus Logic site are interesting. However, the True/False/Null logic scheme defined seems to be vulnerable to glitches in gate inputs. A brief transition to a valid state on all inputs as the previous stage's logic settled would be interpreted as a new input datum by the gate in question, possibly resulting in unwanted output being produced. In other words, using T/F/N logic seems to place stricter timing requirements on input signals than clocked logic with edge-triggered registers.

    Is this correct, or am I missing something? I realize that glitching can be reduced by careful logic design, but this seems to be an issue that is addressed neither in your post nor in the papers on the Theseus site.
  • Yes, all very interesting, but this is hardly a new project! I have a recollection of reading about the Amulet project back in the heady days of the ARM 3. I think it might have even had a fairly large lump of magazine dedicated to it when it was still called Micro User! But ancient history aside, it's good that people are still pushing ARM processors even though x86 seems to have all but won the war. Even Intel seem to think so, as there's a 400MHz StrongARM due real soon now, I hear.

    Still, long live Arthur!
  • ARM has some market share - the ARM chip is used in all sorts of small low-power devices. The most popular of which is probably the Psion range.
  • by nweaver ( 113078 ) on Monday May 15, 2000 @07:01AM (#1072363) Homepage

    Asynchronous logic appears, every once in a while, as a "new" hot topic within VLSI and computer architecture research. Yet it has consistently failed to offer the benefits it promises. Why?

    It is true that clocks in synchronous design consume a great deal of power, but when low power designs are required, it is well understood how to gate and conditionalize clocks so they don't use power when the associated logic is not operating.

    And asynchronous design has to be much more conservative than a synchronous design. With a synchronous design, a chip can be designed to operate at the maximum frequency, and then binned down if it fails to meet its target.

    However, an asynchronous design requires that the delay lines be very conservatively designed, as if the delay line was a little faster, and the logic a little slower, on the worst case critical path, the chip would fail completely, which results in a slower processor by design.

    Finally, the design methodology for building pipelined, synchronous devices is well understood as a purely digital system, while asynchronous logic relies on building delay lines -- essentially analog structures -- which is a great disadvantage.

  • Has anyone ported linux to ARM?

    Heh, what kind of a question is that?

    http://www.arm.uk.linux.org/ [linux.org]

    Also, the uh.. is it Corel now? Or Compaq? Whoever the hell owns the netwinder, it's an ARM box, and tiny and nice.

    Also, in response to the other guy, asking why someone would want ARM vs Transmeta, the answer is that code can be natively compiled for ARM, but the xmeta chip will only do translation. I dunno if there's any actual difference in that, speedwise, since they're both pokey little chips.

    I do think that you can do ASM for ARM, and not for Xmeta. Could be wrong, tho.

    --
    blue
  • I just finished a computer architecture course here at college (in fact, I'm just out of the final exam). Our main project for the semester was to build a behavioral and structural model of a pipelined ARM7 processor.

    At this point in my life, there is not much I hate more than the ARM architecture. Well, maybe complexity theory... but that final doesn't begin for another hour, so I'm okay with it, I guess...
  • And I'm still trying to figure out why asynchronous smaller bandwidth (number of lines) buses are faster than synchronous parallel (more data lines).

    They aren't; what asynchronous logic in an IC context deals with is reducing power consumption by not clocking all parts of the chip all of the time.

    In a synchronous microprocessor, the system clock is distributed to all functional units, and the functional units, even when not in use, usually wind up having some kind of internal state change every clock cycle. This results in a lot of heat production, because every time the state of a bit in a register or of a bus line changes, heat is dissipated (by nature of the way the parasitic capacitances are charged and discharged).

    In a truly asynchronous microprocessor, there is no master system clock distributed to the functional units of the chip. Instead, actions in a functional unit take place when input data changes (i.e. new input data arrives). This results in only the state of units being used changing, which in turn means much less power dissipation if only one or two units are being used at a given time.

    In practice, real systems don't fit into either category. Fully synchronous circuits burn a lot of power, but truly asynchronous circuits are difficult to design and are very sensitive to certain types of process variation. An often-used compromise is to use gated clocks - a synchronous clock is propagated, but only to the functional units that are being used. This principle is extended within the functional units themselves; internal clocks and data are propagated only when they need to be for the operation being performed. This results in a circuit that is much easier to design and fabricate than a truly asynchronous circuit, and that is almost as good from a power consumption point of view. (A back-of-the-envelope power comparison follows below.)

    I hope this clarifies what the debate over asynchronous computing is about.
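
    A back-of-the-envelope C sketch of the clock-gating argument, using the usual CMOS dynamic-power relation (power roughly proportional to switched capacitance times voltage squared times frequency); all of the numbers below are invented for illustration, the only point being that gating idle units cuts the switching activity:

        #include <stdio.h>

        /* Rough dynamic-power estimate, P = C * V^2 * f per fully active unit.
         * All figures are invented for illustration only. */
        int main(void)
        {
            const double c_farads = 2e-9;     /* switched capacitance per unit */
            const double v_volts  = 1.8;
            const double f_hertz  = 100e6;
            const int    units    = 8;        /* functional units on the chip  */
            const int    busy     = 2;        /* units actually doing work     */

            double p_unit = c_farads * v_volts * v_volts * f_hertz;

            double p_ungated = units * p_unit;              /* everything toggles */
            double p_gated   = busy * p_unit
                             + (units - busy) * 0.05 * p_unit;  /* idle residual */

            printf("ungated: %.2f W   gated: %.2f W\n", p_ungated, p_gated);
            return 0;
        }
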
  • by Anonymous Coward
    ARM has the largest marketshare of any 32 bit embedded processor. It overtook the 68K last year, when around 150-180 million ARM chips were sold.
  • by Troed ( 102527 ) on Monday May 15, 2000 @07:05AM (#1072368) Homepage Journal
    ARM is what Epoc [symbian.com], Symbian [symbian.com]'s OS, runs on. Considering that Ericsson, Nokia, Motorola, Psion and Matsushita (Panasonic) own Symbian and will use its operating system in palmtop computers with built-in phones [ericsson.com], handhelds [symbian.com] and smartphones [ericsson.se], the future looks extremely bright!

    Oh, forgot. Sony is also an Epoc licensee - and they make cool devices!

    Go ARM!

    Don't panic - all ARM processors are low-power in 'PC/handheld' terms; this just happens to be even lower again.

    A lot of people seem to think the Crusoe is fabulously conservative in terms of power consumption, but the ARM family is even less power hungry, so in PDA/mobile device terms the ARM has a definite advantage.

  • Theseus Logic [theseus.com] have some interesting papers on asynchronous logic design on their website, not directly connected to the story, but they're interesting nonetheless.
  • There's also a pretty decent Archimedes emulator called Arcem by David Alan Gilbert, who coincidentally used to work on Amulet a few years ago. Unfortunately his site is down ATM. It's based on the GPL'ed Armulator code released by ARM Ltd. (Why is it ARM Ltd when they're a publicly traded company?)
  • One of the purposes of clocking is to allow data to "resolve". That is, the output of a gate will change to its new state within so many nanoseconds, and this had better be less than the time it takes for the next clock edge to arrive. In this sense, much of the time used in computation is wasted, because the design was based on the worst-case published specs of the manufacturer. In reality, the gate may only take 6ns to change to a new state, but the design spec is 25ns so the minimum clock period is 25ns (roughly).

    In a self-timed circuit, the instant the gate changes, the next phase of the circuit is ready to go so there is no time "wasted" (19ns in the above example) waiting for the next clock.

    This concept of uncertainty (between how much time the gate really takes to propagate and what the published maximum is) is also the reason why a small number of asynchronous lines can be faster than more synchronous lines. The more lines you have, the higher the possibility that there will be "skew" (i.e., different propagation delays) through them, hence you have to wait longer for all of them to come to the same state. The fewer the lines, the lower the skew, the less you have to wait (there is a reason why USB is a serial bus, not a parallel one).

    There's some interesting reading on this topic at www.theseus.com [theseus.com]. (I have no connection to them)
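
    A quick C illustration of both points (the 25 ns vs 6 ns figures are the poster's example; the per-line variation below is invented): a clocked design budgets the published worst case regardless of how fast the part really settles, and the more lines you sample together, the longer you wait for the slowest one.

        #include <stdio.h>

        /* 1) Time wasted budgeting the worst-case spec instead of the real delay.
         * 2) Crude skew model: each line's settle time varies a little, and a
         *    bus is only ready when its slowest line has settled, so wider
         *    buses tend to wait longer.  Numbers are illustrative only. */
        int main(void)
        {
            const double worst_case_ns = 25.0, typical_ns = 6.0;
            printf("time wasted per stage: %.0f ns\n",
                   worst_case_ns - typical_ns);              /* 19 ns */

            for (int lines = 1; lines <= 64; lines *= 2) {
                double slowest = 0.0;
                for (int i = 0; i < lines; i++) {
                    double d = typical_ns + 0.3 * ((i * 7) % 10);  /* fake spread */
                    if (d > slowest)
                        slowest = d;
                }
                printf("%2d lines -> bus ready after %.1f ns\n", lines, slowest);
            }
            return 0;
        }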

  • I think your last point is the most important one - we have great (and expensive) CAD tools to build synchronous logic - but none for asynchronous stuff

    (Having said this, I have to spend today building something that's a perfect fit for an async solution - if I had a tool that would do a safe, glitch-free synthesis - instead I have to waste 100s of gates resolving metastability etc. etc. to get the signals I need near a clock, so I can live in a timing methodology I know works.)


  • Since most of you probably won't realise this, I thought I'd mention it, even though it's kind of semi-off-topic.

    The University of Manchester is where the world's first stored-program computer was built [man.ac.uk].

    D.

  • It has been done, apparently, but does not appear to have been folded into Debian as stated back in 1998:
    http://www.debian.org/News/1998/19980826b [debian.org]

    --

  • Yep. ARMs and StrongARMs are selling well, in PDAs (Palm may migrate to it soon) and network computerish type things. Very good for embedded devices.

    I once heard about an ARM chip running off the waste heat of a Pentium, almost as fast. Sod the heatsink... shove a co-processor on there! =)
  • http://developer.intel.com/design/strong/quicklist/eval-plat/sa-110.htm [intel.com]

    This is an ARM chip on a PCI card. You can also get it with a little backplane and build your own linux ARM box. Fun fun.

    I need more me's, or more time in the day. So many fun things to hack, so little time.

    --
    blue
    In fact, if anyone can give some examples where the user's benefit is greater using an ARM solution than an x86 or even PowerPC based solution, I'd love to know what they are. ARMs are cool CPUs and all, but hardly predominant in the current marketplace.

    Get out of your "everything is a PC or Mac" box and you will see why you are wrong.

    At work we make a device with (up to) 80 SA110s. It is designed for a job which is easier to do in parallel than serial. We need to do real-time tasks, and it is easier to do 64 real-time tasks on 64 different processors than to figure out the timing issues on a single [faster] processor, even if in theory the single processor would have as much power as all combined. With that many processors, heat is an issue. Hardware could not have made any other processor work (non-ARM, that is). Also, since our job parallelizes so well, it was easier for them to design the hardware with 64 processors than to run all the external ports to one chip.

    This is not the only example of where the StrongARM is good. I've seen microwaves for campers. They run off batteries. There is nothing that can be done to get 750 watts of microwave with less than 750 watts of power, but the less power you need over that the better. Not to mention the power consumed when not running. Here the asynchronous ARM shines. They don't need much of a processor, but it is easier to compare your sensors with tables in software than hardware. (I've seen microwaves that smell when the food is done; it is one sensor, and then 100 different lookup tables for each type of food.)

    I have intentionally covered non-computing devices. However, if I could buy a Linux laptop with a reasonably fast processor with ultra-low power consumption, I would. I currently am not using any significant processing power, and often I offload my hard tasks to the Sparc down the hall. Give me a Linux laptop that can supply bursts of power when I need it and I'll be happy. (Granted, my boss wouldn't be, because he needs Windows programs, whereas I compile from source anyway for all my programs.)

  • Not having been up on the topic, I just never thought that asynchronous integrated circuit logic was an alternative. And I'm still trying to figure out why asynchronous smaller-bandwidth (fewer lines) buses are faster than synchronous parallel (more data lines) ones. But I guess the speed has at least something to do with the noise tolerance. Anyway, I'm reading one of the links [man.ac.uk] followed from the site that seems to be a pretty good explanation/history of asynchronous logic.
  • Yup - EMF (and battery drain) from a cell phone or wireless modem is mostly from the transmitter, not from the rest of the electronics. That's not likely to get much less as the background noise at the cell site tower has nothing to do with support electronics in the portable devices.

    If you want wireless you're going to get microwaved.
