No, you're all reading this wrong. Look at the bottom of the patent, with the code samples. Notice the use of the word commit. That's basically an exception barrier. At that point, the stores actually occur. Trap barriers are used commonly on the Alpha. A number of instructions are executed, then when a trapb happens, any exception which occurred in that block of code fires. This is a hardware method of rolling back stores to get the exact state at the time of exception without tracking state on an instruction-by-instruction basis. Allows for pipelining, scheduling, etc. Read this [slashdot.org] comment for more from me!
This patent exists because they can't determine ahead of time if there will be an error. This patent, in conjunction with one of their other patents, provides a method for them to muddle on in the usual case of no error and still have a means of rolling back in the less usual case of an error.
This particular patent covers stores that are speculatively executed, but may need to be killed because an instruction that occurred logically before them in the original code stream faulted.
Pick up a copy of the Alpha Architecture Reference Manual and look up trap shadows. Rollback can be very useful when pipelining instructions because there is no guarantee of precisely where a fault issued from. Precise exception trapping is slower and more complicated. Read this [slashdot.org] for more!
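If it helps, here's a rough C model of that trapb pattern (every name here is mine, not from the Alpha manual or the patent): ops inside the trap shadow queue their stores and merely set a flag if they would fault; the barrier at the end either makes all of the stores real or throws the whole block away.

#include <stdint.h>

#define BLOCK_MAX 32

struct pending_store {            /* one store an op in the block wanted to do */
    uint8_t *addr;
    uint8_t  value;
};

static struct pending_store pending[BLOCK_MAX];
static int pending_count;
static int exception_flag;        /* any op in the shadow that faults sets this */

/* Ops in the trap shadow queue their stores instead of writing memory. */
static void deferred_store(uint8_t *addr, uint8_t value)
{
    pending[pending_count].addr  = addr;
    pending[pending_count].value = value;
    pending_count++;
}

/* The trapb/commit point: either everything in the block happened or none
 * of it did, so the exception handler sees a clean state. */
static int barrier(void)
{
    int ok = !exception_flag;
    if (ok)
        for (int i = 0; i < pending_count; i++)
            *pending[i].addr = pending[i].value;
    pending_count  = 0;
    exception_flag = 0;
    return ok;                    /* 0 means re-run the block the careful way */
}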
I'm thinking of a rewind button for my PC. I can execute some application and if I screw up, I can "rewind" back to where I was before. This sounds kind of stupid, but I can see consumer devices eating this up. It would also make it easier to replay that last death in Quake without having to go back to your last saved game.
you start with instruction set specific code and meta-compile it into custom hardware that is dynamically reconfigurable, resulting in very fast execution on hardware that is essentially optimized for each particular application.
Sure, and I use that state-of-the-art parallelizing compiler on my 8-way Xeon SMP system to get a blazing fast application... oh, sorry, there is no such compiler that turns a random program efficiently into a parallelized one? Oops.
So please tell me why the engineers at Intel or AMD don't fire up their logic optimizers to implement the processor instructions in such an optimal manner?
It's actually pretty simple. Essentially a PPro is emulating an x86, right? A PPro 200 is faster than a Pentium 200 and thus you get faster emulation than native!
They were awarded a patent for a universal hardware based processor emulator.
Generally speaking of course... this is similar to how AMD's K6 family of processors works... by converting x86 instructions to faster RISC instructions... but theirs operates on a broader scale.
When writing a patent I believe you are required to phrase the abstract as one sentence. I suppose it was originally intended to show the purpose of a patent as shortly and concisely as possible.
That clearly is not happening. But like so many other things to do with the patent office, this outdated requirement has been preserved.
If Transmeta is creating the next big thing in processors, I am almost positive they will have someone else manufacturing them. It is not uncommon for semiconductor companies to be fabless, and I am sure we would have heard it if Transmeta was purchasing/building a fab.
Yes, but what about the permanent bit? Some other comment (sorry, no link) thinks that it means the cache of another processor. Use one to translate, then ship it somewhere else for execution. But that would imply a multiprocessor, while I think it is a single CPU (possibly multiple processing units).
But why have any reference to permanent storage if the data just gets shipped back and forth between caches?
"circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor"
Here is how it should look:
"ciruitry for 'permanently storing memory stores' temporarily stored when...." It might be easier if you abbreviate it..
"ciruitry for PSMS's temporarily stored when.."
Make sense now? =] The whole dern document is like that. It's sick..I had to read certain sentences multiple times.. =] It's like reading a book with no periods, no capitals, no nothing. Insane. But yeah...Still sounds cool.. =]
Granted, it's a huge, very important idea. But we already have binary compatibility between x86 and x86 running 2 different OSes (which is what WINE does if I'm not mistaken), and it's getting better and better every day. But isn't the binary compatibility just half of the battle? This will be a kick ass chip, but it's not going to let a Fortune 500 company run werd on Linux still. Of course, this is my wild speculation. Could be totally wrong.
Ok, if TM is really inventing the Holy Grail, they're probably going to expect, um... high demand. So wouldn't they need to have either (1) built their own chip fab or (2) contracted with another chip fab? Seems like someone should be able to confirm (1) or (2), and if neither has occurred, it's likely we won't see this chip for some time.
From the comments I've been reading it seems like the patent is for a processor that would translate instructions for other processors into its own instruction set, make sure the translated instructions would work, and if so run them.
Read the patent, not the comments. Many people seem to think the processor would translate instructions itself, perhaps because the patent goes on about
a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor
but the patent later indicates that "the processor" does that translation by running translation software:
Typically, the target application is being designed for some target computer other than the host machine on which the emulator is being run. The emulator
software analyzes the target instructions, translates those instructions into instructions which may be run on the host machine, and caches those host instructions so that they may be reused.
what would it do if the translated instruction would cause an error?
Not bother storing the results of that instruction.
Would the processor just not carry the translated instructions out?
...or, at least, make it look as if it didn't.
If so, that would seem to be quite a flaw.
Why? The error could trap, and the trap handler (or code it invokes) could do whatever is necessary to simulate what the processor being emulated would do in that error situation (although the "exceptions" they talk about aren't necessarily errors - I scrolled past one example of "native-Transmeta" code, generated from x86 code, that assumed the code doesn't make unaligned memory references that cross a page boundary; if that happened, "either hardware or software alignment fix up" would detect this, and perhaps generate more pessimistic and slower code and restart the emulation running the new code).
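To make that alignment example concrete, here's a rough C sketch of the optimistic/pessimistic split (my names, not the patent's): the fast path assumes the 4-byte load doesn't straddle a page, and the slow path stands in for what you'd fall back to after the fix-up catches the bad case.

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

static uint32_t load32(const uint8_t *p)
{
    uintptr_t off = (uintptr_t)p & (PAGE_SIZE - 1);

    if (off <= PAGE_SIZE - 4) {       /* optimistic assumption holds */
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }

    /* Pessimistic fallback: assemble the value a byte at a time so each
     * page is touched (and can fault) separately. */
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}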
I think it said "we've got a cpu that pretends to be another cpu and it needs a place to store the instructions it's actually running to run the instructions it thinks it's running, and we're patenting how it does that." But I'm probably wrong.
So far this fits with the dynamically re-microprogrammable processor I speculated on [slashdot.org] last week. For "a first instruction set" in the patent, read "microcode", and the patent becomes more legible. In short, I think it's a combo of dynamically changeable microcode with JIT compilation of the interpreted instruction set, although there are a couple of different approaches to that.
Now, the specifics in this case seem to be a provision for caching the results of parallel instruction execution and then either voiding or writing that cache depending on whether the instructions cause an exception or not. That in itself is nothing new, but in combination with "just-in-time" compilation of, say, x86 code to Transmeta microcode it might be. Particularly if the Transmeta processor uses horizontal microprogramming (read, "very very very long instruction word") to speed up the processing. (Loosely speaking, with horizontal microcode, each bit in the very long instruction word (could be hundreds of bits) maps to a discrete piece of logic (gate, flip/flop, etc) in the processor. Given an appropriate processor design it might be possible to map several instructions in a more vertical set (x86, PPC, etc) to a single wide microinstruction, effectively executing all of those in parallel, but then you really need some way of flushing everything if it screws up. Which this patent provides.)
(It'll be interesting to see if the active microcode store is loadable from RAM (making it end-user microprogrammable) or just from a fixed set of microprograms on ROM (which may live on the CPU die)).
The error-prediction system looks like a hardware band-aid solution for "that other OS's" bad code.;) The possibility of instantaneous suspend/resume looks nice, though, and there might be interesting hardware tricks to play with if this were ever applied to anything resembling a "standard desktop".
Some of the cooler benefits of this new processor have gone unnoted by the otherwise snappy Slashdotters. A compiler that translates any and all instruction sets into its own native code provides an amazing means for JIT, open source, and all manner of programs written in their own customized instruction sets.
If the TransMeta ideal propagates widely, it will be a new world of software design. Instead of compiling for a processor, you only need to pre-compile into a pseudocode, provide a description of that pseudocode to TransMeta, and the processor takes care of final translation, and apparently a good deal of the debugging work too. As a development platform this sounds like a serious ideal. In a world of open source and platform independence, the TransMeta sounds like a real solution.
Yeah, I'm a Mac programmer. You got a problem with that?
If you're doing ISA->ISA translation, rollback is very, very useful. Suppose we have an x86-style instruction like this:
MOV [EAX*4 + 4], EBX
This particular instruction does several things: it reads EAX, multiplies it by 4, adds 4 to it, and then stores EBX to memory at the generated address. This might break up into several RISC-like ops: (these are written in the more traditional RISC form OP src, src, dst for clarity)
SHL EAX, 2, tmp1
ADD tmp1, 4, tmp2
STORE EBX, *tmp2
It doesn't take a brain surgeon to see that these steps could overlap their execution with other instructions. For instance, the instruction that calculates EBX could overlap execution with the left-shift and the add. If the original instruction was in a tight loop, then it could even overlap with itself! Why is this important?
Say you have some code which is stepping through the array, and say that the array spans a page boundary. And, say that the second page isn't "paged in." When the loop hits the page boundary, a fault will occur. Because the stores are being spooled extremely rapidly, the loop may not be informed that a particular STORE faulted until several other stores were executed. All stores after the faulting one need to be killed since we need to process exceptions in a precise order. Here's where the rollback becomes handy: We merely discard the extra, incorrect stores, and roll back the processor state to be consistent with the emulated state of the machine at the time of the fault.
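In C terms, the spooled stores might look something like this (names invented, just to illustrate the rollback, not how Transmeta actually does it):

#include <stdint.h>

#define QLEN 64

struct queued_store {
    int       seq;            /* program order of the emulated instruction */
    uint32_t *addr;
    uint32_t  value;
};

static struct queued_store q[QLEN];
static int qcount;

static void spool_store(int seq, uint32_t *addr, uint32_t value)
{
    q[qcount++] = (struct queued_store){ seq, addr, value };
}

/* The STORE with sequence number fault_seq page-faulted, but several later
 * stores were already spooled.  Commit the older ones, drop the rest, so
 * memory matches the emulated machine at the instant of the fault. */
static void rollback_to_fault(int fault_seq)
{
    for (int i = 0; i < qcount; i++)
        if (q[i].seq < fault_seq)
            *q[i].addr = q[i].value;   /* logically before the fault: keep */
        /* else: at or after the fault, so it simply never happens */
    qcount = 0;                        /* now deliver the precise exception */
}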
This is what most current x86 clones do when they translate x86 instructions into "RISCops" or whatever they decide to call them. I'm guessing Transmeta is aiming to do a similar sort of translation, only with a more configurable flavor.
Could be one hellacious gamer machine! This article made heavy use of the terms "prior art processor" and "modern computer". Last time I checked, modern computers ran under a thin layer of hydrogen (one example from ABC news). Since the speed difference between a modern cryogenic computer and "prior art" computers is substantial, it becomes easier to visualize needing to cache and distribute "prior art" computer instructions in this manner.
According to an old article I have misplaced, Transmeta once defined itself as "fabless" - They research, do all the logic design, but when it comes to manufacturing (if it ever comes - I certainly wish it does!), they will hire another company's facilities to do so.
Sounds like plans for a coprocessor that translates x86 instructions into some native instruction set to be executed by a processor. In other words, a translator for the x86 instruction set so that Transmeta's chip can pretend to be a Pentium. They claim it will be faster, though.
It is, therefore, an object of the present invention to provide a host processor with apparatus for enhancing the operation of a microprocessor which is less expensive than conventional state of the art microprocessors yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors. Whooh baby. Sounds cool.
Apparatus for use in a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor including circuitry for temporarily storing memory stores generated until a determination that a sequence of translated instructions will execute without exception or error on the host processor, circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor, and circuitry for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
This is a device for assisting in processor emulation: I believe it will hold commands in memory until it knows that they will execute without error. Quite a good idea.
Simple, elegant, and not obvious. All the requirements for a good patent.
This is really the sort of thing that Windoze needs: a 'this instruction would cause the program to do "bad stuff(TM)", so I won't allow it' check. It should stop a single process from taking the whole system down.
Actually I think it's slightly different from what you describe (see my full explanation below). They do this so they can translate the code without having to worry about x86 exception semantics (like where the PC is or what values the various registers or flags have). And they assume that they don't take any exceptions - if they don't, all is cool - they know the state at the end of the basic block (or whatever unit they are using for translation - hopefully they're doing better than simple basic blocks).
If they DO get an exception... then they use the described hardware to throw away the side effects of executing the code fragment, and interpret the x86 instructions from the start - doing all the proper instruction semantics. When they hit the exception again, this time they know what the PC is and what all the flags, registers, etc. are.
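Roughly, that control flow would look like this in C (every function here is a made-up placeholder; this is just the shape of it, not anything from the patent):

#include <stdbool.h>
#include <stddef.h>

struct x86_state;                    /* emulated registers, flags, PC */

/* Run the optimized host translation of one x86 fragment.  Its stores go
 * into a gated buffer; returns false if anything in the fragment trapped. */
bool run_translation(struct x86_state *s, const void *fragment);

/* Interpret exactly one x86 instruction with full architectural semantics;
 * returns false when that instruction raises the exception. */
bool interpret_one(struct x86_state *s);

void commit_gated_stores(void);
void discard_gated_stores(void);
void raise_precise_exception(struct x86_state *s);

void execute_fragment(struct x86_state *s, const void *fragment, size_t ninsns)
{
    if (run_translation(s, fragment)) {
        commit_gated_stores();       /* common case: no exception, commit */
        return;
    }

    /* Rare case: dump all the speculative side effects, then redo the
     * fragment the slow way so we know exactly which instruction faulted. */
    discard_gated_stores();
    for (size_t i = 0; i < ninsns; i++) {
        if (!interpret_one(s)) {
            raise_precise_exception(s);   /* PC, flags, registers now exact */
            return;
        }
    }
}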
OK, these are just a few other bits of interest I picked out of the patent:
In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels.
I'm not going to go into huge detail about VLIW machines (particularly since I don't know all that much about them:-). Suffice it to say that traditional VLIW CPUs fetch multiple instructions at once, and rely on the compiler to ensure that there are no dependencies between instructions in a fetch group (if the compiler can't find x independent instructions, it will pad the holes with non-operations, or NOPs). Looking at Transmeta's patent, it appears that rather than a compiler doing this checking, their code-translation software will be doing it on the fly. RISC/CISC machines, on the other hand, typically do this checking in hardware. But Transmeta's reasoning seems to be that doing it in hardware adds complexity, hence lower clock rates, and also doesn't make multiple instruction sets very feasible.
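For what it's worth, here's a toy C version of that "pad the holes with NOPs" packing (entirely invented, just to show the idea of grouping independent instructions into a fixed-width bundle):

#define SLOTS 4                        /* issue width of the toy VLIW machine */

enum op { NOP, ADD, MUL, LOAD, STORE };

struct insn { enum op op; int dst, src1, src2; };

/* True if b reads the register a writes (the only hazard this toy checks). */
static int depends_on(struct insn a, struct insn b)
{
    return a.op != NOP && (b.src1 == a.dst || b.src2 == a.dst);
}

/* Greedily fill one bundle: take instructions in order until one depends on
 * something already in the bundle, then pad the remaining slots with NOPs. */
static int pack_bundle(const struct insn *code, int n, struct insn bundle[SLOTS])
{
    int taken = 0;
    for (int s = 0; s < SLOTS; s++)
        bundle[s] = (struct insn){ NOP, 0, 0, 0 };

    for (int i = 0; i < n && taken < SLOTS; i++) {
        int hazard = 0;
        for (int s = 0; s < taken; s++)
            if (depends_on(bundle[s], code[i]))
                hazard = 1;
        if (hazard)
            break;                     /* dependent instruction waits for the next bundle */
        bundle[taken++] = code[i];
    }
    return taken;                      /* SLOTS - taken slots ended up as NOPs */
}

So if the second instruction reads a register the first one writes, it gets held for the next bundle and the empty slots are filled with NOPs; Transmeta's twist is apparently doing that packing in translation software at run time rather than in a compiler or in hardware.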
Regarding the instruction translation and subsequent caching I mentioned in my previous post, a quote from the patent illuminates the matter a little more:
The code morphing software of the microprocessor...includes a translator portion which decodes the instructions of the target application, converts those target instructions to the primitive host instructions capable of execution by the morph host, optimizes the operations required by the target instructions, reorders and schedules the primitive instructions into VLIW instructions (a translation) for the morph host, and executes the host VLIW instructions.
When the particular target instruction sequence is next encountered in running the application, the host translation will then be found in the translation buffer and immediately executed without the necessity of translating, optimizing, reordering, or rescheduling. Using the advanced techniques described below, it has been estimated that the translation for a target instruction (once completely translated) will be found in the translation buffer all but once for each one million or so executions of the translation. Consequently, after a first translation, all of the steps required for translation such as decoding, fetching primitive instructions, optimizing the primitive instructions, rescheduling into a host translation, and storing in the translation buffer may be eliminated from the processing required. Since the processor for which the target instructions were written must decode, fetch, reorder, and reschedule each instruction each time the instruction is executed, this drastically reduces the work required for executing the target instructions and increases the speed of the microprocessor of the present invention.
Transmeta seems to have an excellent idea here. They're caching optimized translations of the incoming instructions, so rather than have to translate and optimize over and over each time you see that bit of code, you do it once and then just grab it from the cache. Due to the spatial and temporal locality of programs (i.e. the fact that your accesses to instructions are not random, but are localized in loops, etc), this cache ("translation buffer") will only fail to have a translation present once every million instructions. So you're doing *one* translation every million cycles, rather than a million translations like current processors would have to do. Interestingly enough, a scheme like this was brought up as a discussion item in my Superscalar Processor Design class a couple of weeks ago, though my professor used the example of a specialized Alpha decoding/translating x86 and caching the results. One might even write the translations back out to disk as an attachment to the original executable, so that the next time you run the program that's fewer translations you have to do, and eventually you'll have a fully translated version on your hard disk for optimal speed. I guess we'll just have to wait to see if Transmeta does something similar.
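A translation buffer like that could be as simple as this C sketch (the names and the hashing scheme are mine, not from the patent):

#include <stdint.h>
#include <stddef.h>

#define TB_SLOTS 4096

typedef void (*host_fn)(void);       /* a cached, already-optimized translation */

struct translation {
    uint64_t target_pc;              /* address of the x86 code this came from */
    host_fn  host_code;
};

static struct translation tbuf[TB_SLOTS];

/* Hypothetical translator: decode, optimize, reorder, schedule into host code. */
host_fn translate_block(uint64_t target_pc);

static void run_target_block(uint64_t target_pc)
{
    struct translation *t = &tbuf[(target_pc >> 2) % TB_SLOTS];

    if (t->host_code == NULL || t->target_pc != target_pc) {
        t->host_code = translate_block(target_pc);   /* miss: pay the cost once */
        t->target_pc = target_pc;
    }
    t->host_code();                  /* hit: zero decode/translate work */
}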
One embodiment of the enhanced hardware includes sixty-four working registers in the integer unit and thirty-two working registers in the floating point unit. The embodiment also includes an enhanced set of target registers which include all of the frequently changed registers of the target processor necessary to provide the state of that processor; these include condition control registers and other registers necessary for control of the simulated system.
It seems this new chip is going to have a lot of registers. As Cartman would say, sweeeeeet!
The patent [164.195.100.11] also provides some sample C code, the corresponding x86 assembly, and some sample optimizations the Transmeta system may perform. It's a little more than half way down the page, if you want to look, just scroll until you see code:-)
A lot of nice explanations of how it does what it does... but not of why it does what it does. All computers run software to get to an end point. If this thing runs (almost) all software, it's to get to an end point. The effect of the machine... Windows is just another application. So is Linux, or BeOS, etc. One resource drive with a second configuration drive lets you drop in any OS disk and have a variable configuration of the type of machine it was written for.
Can't think of a situation (except for processor bugs like the F00F one) where the processor hangs in the middle of some instruction, having stumbled over some microcode gone crazy. So I simply see no benefit in rolling back an instruction, sorry.
Happens all the time, although there's already ways of dealing with it. Consider virtual memory. Having to redo an instruction, because some exception occurred in the middle of it, isn't very.. um.. exceptional.
But I can't think of how this relates to the Transmeta speculations. Well, actually I can, but my theory is so wild-ass that everyone would laugh at me.
Oh, what the hell. This is as good a place as any for me to make a complete fool of myself... I think Transmeta is making a display circuit that instead of fetching each pixel from a frame buffer, executes a little program for each pixel. The program must execute incredibly fast since the result must be available before the horizontal scan goes to the next pixel.
There, I said it. Now everyone can back away from me quietly, and then point and laugh when they reach a safe distance.
i believe it even says somewhere in the patent that it is more software based than not
The patent says that "emulation software" would translate x86 or whatever code into "native Transmeta" code (see other postings of mine in this thread, many of which amount to "software translation, dammit, not hardware translation").
As such, I don't know why this need involve any FPGAs at all - the patent doesn't seem to describe a processor that can be configured at the hardware level to run arbitrary instruction sets, it appears to describe a processor that lets software (presumably running on that processor) translate other instruction sets into the native instruction set making optimistic assumptions about what the code being translated does, get exceptions if those assumptions are invalid (with the exception handler presumably doing more pessimistic translations and retrying with the new code), and not have to worry about irreversible state changes having been made by overly-optimistically-translated code.
Some notes for those who may want a more in-detail explanation:
The beginning of the patent ("claims") is essentially just a list of things that all modern, superscalar, out-of-order processors do, and saying "hey we do this too".
Basically, out-of-order machines execute instructions out of their program order (hence the name:). This means that if your code sequence is A,B,C; the CPU may actually execute it such that B is done executing before A. But B's results cannot be written to system memory or the architected registers ("machine state") until you know that instruction A didn't generate an exception. That's so that you can provide precise exception handling, i.e. that the OS can service A's exception and then resume execution with B. If you don't wait to do your memory store, then you'll end up executing B twice, which you didn't intend. So that's what all the talk in the beginning of the patent about memory stores, etc, is about.
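If it helps, here's a toy C picture of that in-order retirement (very simplified, all the names are my own): B can finish executing early, but nothing touches the architected registers until everything older than it has retired cleanly.

#include <stdbool.h>
#include <stdint.h>

#define ROB_SIZE 16

struct rob_entry {
    bool     done;           /* execution finished (possibly out of order) */
    bool     faulted;        /* execution raised an exception */
    int      dest_reg;       /* architected register to update at retirement */
    uint64_t value;
};

static struct rob_entry rob[ROB_SIZE];
static uint64_t arch_reg[32];    /* the "machine state" the OS actually sees */
static int head;                 /* oldest instruction in program order */

/* Retire from the head only: B's result never reaches architected state
 * before we know that A (older than B) completed without an exception. */
static void retire(void)
{
    while (rob[head].done) {
        if (rob[head].faulted)
            break;           /* stop; the OS sees state as of this instruction */
        arch_reg[rob[head].dest_reg] = rob[head].value;
        rob[head].done = false;
        head = (head + 1) % ROB_SIZE;
    }
}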
If you get past all the uninteresting stuff like that in the beginning, you'll find the following:
"The present invention overcomes the problems of the prior art and provides a microprocessor which is faster than microprocessors of the prior art, is capable of running all of the software for all of the operating systems which may be run by a large number of families of prior art microprocessors, yet is less expensive than prior art microprocessors. "
The idea, it seems, is that rather than making complex hardware to execute the instructions and perform speed enhancements, they're doing speed optimizations in software. Which in turn allows very simple hardware (which in turn should translate to really high clock speeds). It seems that Transmeta's bet with this is that the penalty incurred by doing software rather than hardware optimizations is offset by the increase in clock speed and decrease in hardware cost.
Using such an approach should also make running multiple instruction sets a much easier task. Currently processors do their instruction decoding in hardware. But if Transmeta has managed to do this decoding (fast) in software, then they can just add a little more software to allow multiple instruction sets. They also seem to be caching the translations of non-native to native instructions in a memory structure of some sort, so that they minimize the redundant emulation computations.
Actually, to address gupg's comment, it also seems that they should not need *any* special compiler support, because they can run stuff that was compiled for any of the various instruction sets they choose to support. So they themselves should not need to do compiler work. I would guess that the reason they're hiring all sorts of compiler folks is that they need people to do the afore-mentioned software instruction translation, and the people best suited for that are compiler people since they work on the instruction level all the time. Most other programmers don't have to deal with anything other than high-level languages, and so would not be particularly well suited to doing what Transmeta is doing.
Anyway, hopefully this explained things a bit more to everyone. My reading and explanation of the patent was pretty quick since I have to go to class in a few minutes. I'll finish reading the patent afterwards and add anything else I think you might like to know.
Any emulator has to translate one instruction in the emulated instruction set into one or more instructions in the target instruction set. For instance, the Mac's 68K emulator does this... but it also stashes previously-interpreted 68K instruction sequences in a cache, where it can re-execute them without retranslating each instruction.
This patent (and, yes it's in English - but patent-lawyer English) apparently implies a hardware-based mechanism to store translated instructions in an on-chip cache and then execute them afterwards, hoping that at that point other tricks like pipelines and multiple instruction units will be able to do their thing.
In a normal emulator, you get relatively little benefit from the normal on-chip caches and pipelines. This would seem like an interesting way to speed up an x86 (or even PPC) emulator.
And if you think there's little use for this, think "Java Virtual Machine". Think "hardware-assisted just-in-time compiler"...
On a side note, why spend all of this effort to be x86 compatible when you have the source code?
Umm, because they don't have the source code to all the, say, x86-architecture programs they might want to run?
IMO open source software is going to make hardware architecture very competitive.
"Is going to make" isn't the same as "has made". Yes, typing make to get "native-Transmeta" machine code for your application may not require all the work that this patent involves, but it involves, instead, waiting for open source versions of the programs they're interested in showing up, and they may not be willing to wait for that.
Hate to be the pain in the ass demanding people be a bit consistent in their distaste for patents, but Ye Olde UltraHLE on Win32 *appears* to do a good chunk of automagic rewriting of processor instructions intended for another architecture.
I doubt it has the same kind of exception handling as we see described in this patent, though. Them TransMetans do some funky stuff;-)
Yours Truly,
Dan Kaminsky DoxPara Research http://www.doxpara.com
Once you pull the pin, Mr. Grenade is no longer your friend.
what would Transmeta do when Intel introduces a new opcode in their Pentium IV?
Add more code to their binary-to-binary translator software to handle that new instruction. The processor isn't doing the translation, except to the extent that it runs the translation software (see other postings of mine in this thread for the quote from the patent that speaks of "emulation software").
I concur, and I would advise reading of the DETAILED DESCRIPTION if you scroll the page down a bit.
What is interesting is that they appear to have created a CPU capable of running applications designed for one of many target systems (Intel x86/Pentium, PPC, Postscript and Java even) by buffering the instructions, optimising them, and then checking their execution for errors before execution occurs. Quite brilliant and mind-bogglingly complicated.
Note the business angles hinted at: speed and optimisation come at significant cost; the cost of producing any microprocessor is out of reach of most companies (implying a mass market); a large number of applications written for many targets (Windows, Java, etc.); and the problems associated with traditional thinking with regard to optimization and parallel processing.
To create a microprocessor which overcomes the above at viable cost to both manufacturer and customer would be enormous!
Just think of it, you're running the Transmeta CPU which is running some OS, and running Office 2000 through it and knowing that the CPU will trap any problems before they occur! This is a hardware VM-Ware!
One possibility it could be, would be for it to be more of a multi-CPU control centre. It is first fed an instruction from the software, then it picks which processor should get it and sends it off to that proc with any appropriate changes to the code to get it to work (and checking that it works before letting it out).
Combine it with a slot T scheme (T for transmeta:-) that lets them repackage any CPU in their own high speed cartridge, and their little processor can send tasks off to the appropriate intel/AMD/PowerPC/whatever chips (and there could be a mixture).
Perhaps their little CPU thing will also let it experiment if there are multiple CPUs and idle time. Send an instruction off to CPU A and CPU B at the same time and see who gets a valid answer back the soonest. That might be why it needs its own memory - so it can queue the same task up for multiple CPUs but then delete it from the other CPUs queue once it has a reply back from whichever CPU got to it first.
Combine that with a CPU bus and things could get interesting. Perhaps you stick the transmeta CPU in a regular CPU slot/socket, and then stick a daughter board into an AGP-like slot.
Or alternatively, they could be going a bit like the amiga, and have semi-specialised CPUs, but which can also be used for other things. So your sound card can also do general CPU tasks if it isn't playing enough audio to take up its full load, and your video card could do other stuff. Though I tend to think it'd be more sensible to just form a CPU bus for all the cards, and just have a set of adapter plugs on the m/b. So you could get a m/b with 5 peripheral ports, so if you want to use 1 visual, 1 audio, 1 joystick, 1 mouse and 1 keyboard then thats fine, but they could also do 2 visuals, 2 audio and 1 keyboard or whatever.
So that way you get a semi-specialized video CPU card if you want hot graphics, but its CPU can also be used for general stuff (and the transmeta controller helps with translations), and you might get a fast general CPU for regular stuff - but it can also do your graphics work as well if it gets to be too much for the video CPU card alone (or you just don't have one at all).
So you could mix'n'match differently specialized CPUs on the CPU bus, as well as just add a new one every year for the latest speed and keep the old one there too (presumably the controller CPU would be able to shut down the slowest CPUs to save on electricity if they weren't needed).
Though I don't think the CPU bus is too realistic, as is technology really up to that kind of thing?
(a fibre optic backbone between CPUs that multicasts requests and the CPUs just pick up requests based on how full their internal queues are?:-).
I don't know, the possibilities are endless. But I'm quite content to just wait and see what (if anything) transmeta actually comes out with. It's just fun to play around with guessing:-).
Hmm, maybe I should get an account here someday. --Vastor.
Because the only instructions sent to the processor (after optimization) are instructions that are known to succeed.
The process of optimization is based in software as well- the instruction translation (code morphing, they call it) software is written in code native to the VLIW chip.
I.e., there are no speculative instruction paths on the VLIW chip. There are something like 4? on the PIII.
In other words, the chip can have about 4x fewer transistors than a comparable x86 chip.
This means:
higher yield in fabrication (less cost)
More chips per wafer (less cost).
Simpler chips can also be run stably at higher clock frequencies than more complex chips of the same manufacturing process (.18 micron, .22, etc.).
Also, the optimized code ends up with 70% or fewer operations than the original instruction stream.
I'm getting some of this from their earlier patent.
From my understanding of Intel and AMD CPU's, what they do is convert the x86 instructions into groups of RISC instructions, which are then run by the core processor.
What the TransMeta CPU does is CACHE the results of the translations into a multi-megabyte on chip buffer.
So, while a Pentium III takes up to 20 cpu cycles to decode some of the more complex instructions, the TransMeta CPU takes even longer, but makes a better optimization. But once it's translated it's buffered away so that if it's needed again soon, it takes ZERO cycles to decode.
The TransMeta CPU then justifies the time cost by taking GROUPS of instructions, optimizing the hell out of them, taking as long as it needs, then filing the result for future use.
All the exception handling stuff is needed in case an exception happens in the middle of a group.
Say, for example your program contains the instructions "A,B,C,D".
The Transmeta CPU translates this into "1,2,3".
Further, lets say an error occurs at step "2".
The CPU then Rolls back (read up on Transaction processing) to the state before "1" executed.
Then it translates the instructions one at a time until it recreates (or fails to recreate) the exception.
It then commits the changes to the emulated registers, and reports the exception at the point where it occurs in the original code.
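In other words, something like this (a toy C sketch of the register checkpointing, not Transmeta's actual mechanism):

#include <stdint.h>

/* The emulated x86 register state the translator keeps for the program. */
struct regfile {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp, eip, eflags;
};

static struct regfile live;          /* updated freely while "1,2,3" runs */
static struct regfile checkpoint;    /* snapshot taken just before "1" */

static void begin_group(void)    { checkpoint = live; }
static void commit_group(void)   { /* nothing to undo; live is now official */ }
static void rollback_group(void) { live = checkpoint; }   /* as if "1" never ran */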
Put simply, this thing will KICK INTEL'S ASS. Possible speed improvements of over 10 times.
And the same principles could be applied to any other CPU instruction set.
This patent does not appear to cover emulating multiple instruction sets at once, but nothing stops it from being applied in that manner. It would be just as hard as doing it with a 'Prior Art' CPU design.
Nor does it seem to be FPGA related, but I suppose FPGAs could be used somewhere in it.
1. Set of instructions comes into processor in one instruction set (like x86).
2. This device stores the data for this series of instructions temporarily
3. The device translates the (x86) instructions into its own internal instruction set and figures out an ordering that will not cause it to have exceptions.
4. The device retrieves the temporary data and "fills in the blanks" in the "inner" processor to get results, the so called "permanent storage" is probably the inner processor's instruction cache.
5. The data is cleared from the interim area once it's acted upon.
I think that the "in practice" that turned out badly has been the price wars between Intel, AMD, and Cyrix.
Two years ago, there was room for some serious profits on CPUs as they were the most expensive component in a computer system.
That has changed such that the most expensive component is commonly the hard disk, followed (with MSFT software) by the OS, with the CPU in third or fourth place.
With that change, this leaves Transmeta without the viable "IA-32 market" they may have expected to have.
Based on the droppage in pricing, it is not clear that there is room to get vast decreases in pricing.
Of course, considering that Transmeta is fabless, and doesn't directly have a sales organization for CPUs, the goal might have been to construct technology to allow building cheaper IA-32 chips, and then license it to AMD or Intel or Cyrix.
I'm not sure any of them are necessarily interested to the tune of $Billions...
I think it means (from the abstract) that they are going to provide compatibility with other processors by converting their instructions to run on their host processor. So, the story unfolds. Obviously, they have a super fast processor and will provide for running Intel etc. instructions on their processors.
The patent itself is more concerned with making sure that the conversion process occurs without any exceptions taking place... or actually holding the processor state and waiting for a sequence of instructions to make sure no exception etc. happens and then executing it on the host processor.
They obviously also need strong compiler support for such a processor which explains all the software and compiler people they have been recruiting.
Fun, fun, fun.. who says Computer architecture is dead !
From what little I've read of it (actually now read most of it), it appears to be a way to allow for fast context switching between processor modes. Since everyone is speculating that their chip will emulate other chips (instead of providing their own ISA), this just goes hand in hand with that.
I also see a lot of stuff about pointer manipulations. Maybe this is at the core of how they will attempt this (i.e. keep all "processes" in memory with their own vm space and then "swap" 'em out when necessary).
In my rough perusal, I may have missed some very important details. =)
Either the people at Transmeta really need to take a course in basic English and especially punctuation, or they have a random patent generator that strings together random combinations of processor, executed, processing, circuits, determination and stores.
My money is on the latter, maybe Linus whipped together a Perl script in his lunch hour?
instrument for US one processing system t a capable processing host executing a first instruction ajust ajud functioning instruction a different instruction ajust that est translates first instruction ajust processing host including provisory circuit for storing memory armazen to ger until a determination that a sequence translates instruction execut without exception or error processing host, permanent circuit for storing provisory memory armazen stored when a determination est faç that a sequence translates instruction execut without exception or error processing host, and circuit for eliminating provisory memory armazen stored when a determination est faç that a sequence translates instruction to ger an exception error in the processor.
okay, I'm going to be moderated down, but is that english?!?!?
I think I can imagine the patent officers that were reading this going "um, billy-bob, do you know what any of this means?" and "um, no earl-ray, I have no clue what they're talking about. Must be that internet/computer mumbo-jumbo. Guess we'll just have to give it an okay..."
starting at point #7, it starts to make a bit more sense... basically a machine running a program and then wiping that program out of the memory...
this part had me tho...
"means for transferring memory stores to the means for permanently storing memory stores, and
means for storing memory data replaced by the memory stores, "
here we go...
"This invention relates to computer systems and, more particularly, to methods and apparatus for providing an improved microprocessor. "
Another line that confuses the hell out of me... "It is difficult and expensive to make a microprocessor run as fast as state of the art microprocessors"
It looks like they took a clue from Digital (a.k.a. DEC, now part of Compaq) with the FX!32 package they sent me for running x86-compiled Win32 apps on my Alphastation.
They're taking data/code from one processor/platform and shipping it to another for work, then (presumably) shipping the results back. This will be tremendously useful in load-sharing situations where you don't have all the same hardware.
Picture a multiplatform Beowulf cluster built of a mixture of G3's, G4's, Pentium II's, Alpha's, SGI's, and a couple of Amiga's just to make it fun.
I guess you'd have to call this a Beomutt cluster.;-)
Ok, it's for emulation, but it doesn't just speed emulation. This allows for instruction ROLLBACK. Want a journaling filesystem? How about a journaling processor? The patent is for a co-processing unit that not only translates a foreign instruction set into native instructions for a 'target processor', but acts as a go-between for that target processor and memory. It stores the processor state, and buffers any memory writes, until it is certain that a group of instructions has been run without exception or error... If the translated instructions crash, no damage is done. Not only is this amazing overall, but it allows for very speculative, and very fast, instruction translation and branch prediction...
Today, a press release for Transmeta, Inc. was cleverly disguised as a patent. Transmeta, Inc. was truly proud that the US Patent & Trademark Office (USPTO) allowed them to release a press release endorsing their vaporware, and was soon picked up by a local website (www.dotslash.org).
"It amazes us that the geeks were able to interpret 'Apparatus for use in a processing system' as 'Wow, they've got something faster than Intel!'. That was our intent, of course, but we hate to see our bretheren fail a Turing test..."
Those bastards have patented my favorite fish! Of all the nerve!
Really, tho', it could be a Red Herring. Transmeta could be cashing in on the popular assumption that they're going to create a wild new processor that'll be Everything to Everyone in order to disguise the fact that they're really in the process of opening the ULTIMATE multimedia porn site for cyber-trans-sexuals.
This would be useful if you follow both paths of a branch statement. Once it is determined which branch was supposed to be taken, that data gets posted, while the data generated by taking the wrong branch gets tossed.
This is the same method Merced/McKinley uses, isn't it? Does that count as prior art?
looks like a CPU which reads foreign instruction sets and then translates them into its own set and execs them in a highly parallel manner to produce faster execution than the original processor. The trick here looks to be finding out which things can be done in parallel without causing an exception. End result is a Transmeta chip that runs the instructions of other chips faster.
Simple, you first translate it into an architecture that is blazingly fast. Imagine if you will that I had a program to translate your Office 2000 program into a program to run on my high-end DEC Alpha. At the same clock speed it will run much faster on the Alpha. This simply puts the translation phase in the hardware. Also it optimizes the code and checks for errors, making it even faster as you don't have to deal with errors in the central processor. How do you do all this and make it cheaper? I HAVE NO F#%# IDEA!
Yes. It seems that Transmeta has perfected a run-on sentence processor, able to mutate any reasonable statement into a more obscured but equivalent sentence until comprehension is completely lost by the reader.
If you had scrolled way down the page, you would have found this:
SUMMARY OF THE INVENTION
It is, therefore, an object of the present invention to provide a host processor with apparatus for enhancing the operation of a microprocessor which is less expensive than conventional state of the art microprocessors yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors.
This and other objects of the present invention are realized by apparatus for use in a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor comprising means for temporarily storing memory stores generated until a determination that a sequence of translated instructions will execute without exception or error on the host processor, means for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor, and means for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
These and other objects and features of the invention will be better understood by reference to the detailed description which follows taken together with the drawings in which like elements are referred to by like designations throughout the several views.
It appears to be a system in which a processor is fed a sequence of instructions in a translated foreign set, and the results are held in cache until it can be ascertained that the entire stream of instructions will run without error, at which time the cache is released. They may be using this purely as a CISC-to-RISC mechanism, or they may be planning a platform where the actual program code is 'broken' into chunks, and the processors might encounter exceptions if the granularity of the sets is off. They may even be planning a platform that does multi-arch emulation on a transparent hardware/microcode level, a la AS/400. Heck, they might be doing all three! They also give an allusion to making a cheap processor run code designed for a more expensive one, so perhaps they're planning to give Intel a run for their money.
I'm sorry, but that is the closest I can get to an answer with the available information.
"...It is, therefore, an object of the present invention to provide a host processor with apparatus for enhancing the operation of a microprocessor which is less expensive than conventional state of the art microprocessors yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors..."
think of it like this: the CPU is capable of reading the instruction set for another architecture, figuring out what that architecture needs from the CPU, determining all possible instructions of that architecture, and "emulating" that architecture by a technique that allows the "emulation" to be as fast or faster than the original architecture (by taking advantage of the invented CPU's "extra" free stuff).
so, what that means is that the CPU would theoretically be able to run any OS designed for any instruction set (i.e. x86, Alpha, Mac, etc.)
It appears that the superscalar speed necessary for faster-than-target emulation comes from a VLIW design. A single VLIW instruction is generally going to correspond to more than one target (x86/whatever) instruction, hence the patent's subject matter of efficient cached memory stores and exception determination - you don't want to commit the VLIW memory stores until you've determined that *all* of the corresponding target instructions would have succeeded.
As the patent points out, it applies equally to emulation on other (non-VLIW) superscalar architectures, but the emphasis does appear to be on VLIW.
It also appears to be transaction-oriented: a sequence of instructions that would fault will have no effect, regardless of whether it would have executed a write to memory before the fault. This could be handy, because it means that bad code won't corrupt memory on the way down.
It's not really all that different from the way the recently announced supercomputer-on-a-desktop works, the one that translated microprocessor-type instructions into FPGA wiring on the fly, just in time, so that CPU instructions effectively run on dedicated logic instead of in generic microcode, i.e. *much* faster.
This area is called Reconfigurable Computing, and it's been around for quite a few years (there's some quite reasonable supporting hardware available for it from Xilinx).
Transmeta's patent differs from that in the detail of course, but the general principle is remarkably similar, so much so that they've probably included references to it among the prior art somewhere.
The Pentium Pro was out when this was filed, and it operates very similarly. It translates x86 instructions into micro-ops which are executed out of order and retired in order. Memory writes go into the retirement buffer, and if they are the result of a mispredicted branch, they are expunged from that buffer. This is pretty much exactly the same as what Transmeta is claiming in this patent.
Well now we know it! And what an unbelievably brilliant idea that will make them the next Intel. For the lay persons, what they are making is the combination of a new microprocessor and BIOS that interprets x86 commands and runs them on a RISC processor. This combination works as a hardware-accelerated DOS/Windows emulator. Their claim that it runs faster than Intel's systems makes sense. It's like having two processors that are cheaper than the Pentium that run faster than the Pentium. I have two questions: when is the IPO, and when do I get my hands on one of these? I imagine that the error handling will do what Microsoft and Intel have never accomplished... revert back to "last good state" when an error is generated. Oh my!!!!!!
It sounds to me like they have 1) a fast processor that speaks its own language, 2) some device that translates code from some other instruction set on the fly and independent of the first processor, and 3) once the translation is deemed to be correct, the original is tossed. What this means: they can run any instruction set they like as fast as they want because the main CPU is not doing the translation.
"circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor"
I vote for it being the random patent generator. My favorite part of the whole soliloquy is
you can't beat that! Maybe they're really working on an optical processor and want us to think they're working on a universal processor that'll run any other processor's code. Good one, Linus (and others), but what's it really do?
The latest patent surely looks related to the other patents previously awarded...
The basic idea for all of the patents has been to provide mechanisms to allow one to:
Create a new CPU that uses one instruction set;
That CPU is emulating the instruction set of some other CPU ( Oh, Say, Perhaps IA-32... );
The patent provides for some scheme whereby instructions are run in some sort of "emulation mode," where they try to execute in a sort of abeyance...
The system then seeks to detect situations where the emulation starts going astray, and provides mechanisms for "coping with this error."
The various patents have involved that mechanism for coping with the errors, with an attempt to construct ways of quickly working around them.
This parallels the notion of Lagrangian Relaxation, where you take a problem, with various restrictions, and relax those restrictions. In exploring the solution space, the system will find solutions that aren't in the feasible solution space of the (unrelaxed) problem.
In the case of Lagrangian Relaxation, the way of coping with that is to associate values with the objective function that penalize infeasible solutions, thus encouraging the system to head towards feasible solutions.
In the case of Patent 5958061, the "relaxation" is that the system performs the emulated instructions, modifying a temporary memory store, and rolling back when it hits cases where the preliminary emulation results in errors on the host processor.
Patent 5832205 concentrated, in contrast, on the apparatus to detect a failure of speculation.
That patent, plus one of the other patents (mentioned on Slashdot a while ago), seems to suggest that if what they end up building involves the patents they're filing (i.e., assuming those patents don't come from what they were working on at one time, but decided not to build), then it may be a processor with an instruction set different from that of other processors, plus something (quite possibly software, not necessarily hardware in the processor, as some appear to have inferred) that translates other instruction sets into the Transmeta instruction set, and does so "speculatively", in that it assumes that the translated code won't get a fault.
If the code does get a fault ("exception or error" - this could be an exception without being an error, e.g. a page fault), then anything that code did "speculatively" and that wouldn't have been done by the untranslated code had it gotten that exception hasn't made any permanent state change, so the fault cancels/backs out any uncommitted state changes and presumably traps to software that would do whatever is necessary to do what the untranslated code would have done.
Basically it is a fully configurable CPU that can be programmed on the fly to fully dedicate to a single goal and complete it very quickly. Your standard Intel is designed for general number crunching and nothing much else in particular. But if you had control over what each register and logic gate did, you could make your processor totally dedicated and streamlined to complete a single task. As well as being programmable, the cache is spread out so that each logic gate has its own cache rather than one lump of cache for the whole board, which also speeds things up.
It doesn't sound to me like this chip is actually doing the emulation; just the translation, and then buffering it so another chip can pick it up from there and run with the instructions... which would make sense with everything else you said.
So I got this feeling from reading it that this chip must have a shitload of registers (I mean, everything they're doing is all about avoiding memory i/o it seems), so I started grepping... it's certainly very register-based. But here was the good part:
> These improvements include a gated store buffer and a large plurality of additional processor registers
"large plurality" sounds to me like a whole boatload more than the ones we used in those silly MIPS simulators to learn assembly theory and certainly more than any x86 chip I've seen.
I also saw some references to VLIW conversion, so I did another grep; I think this is one of the best paragraphs, and it's not in Greek...
---------- FIG. 2 is a diagram of morph host hardware designed in accordance with the present invention represented running the same application program which is being run on the CISC processor of FIG. 1(a). As may be seen, the microprocessor includes the code morphing software portion and the enhanced hardware morph host portion described above. The target application furnishes the target instructions to the code morphing software for translation into host instructions which the morph host is capable of executing. In the meantime, the target operating system receives calls from the target application program and transfers these to the code morphing software. In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels. The overall operation of such a processor is further illustrated in FIG --------
This pretty much gives away what people have been saying since the beginning. Morphing hardware AND software elements that work in conjunction to provide (drum roll) a fast as HELL computer. And it will run software we already have. And pretty darn near anything you throw at it. Want to be a Playstation for a day? How about an O2? Now switch back to Pentium II so you can type up that report and then become a G4 so you can make the graphics to insert in the presentation that accompanies it.
This thing will be doing the code morphing in parallel (which is what this invention seems to be... the morpher) and then run it on another fast chip that's related to one of the earlier patents. And it will all be controlled by a little driver that turns into a "layer 1 vmware" (now that our computers will need an OSI layer model...:P )
DETAILED DESCRIPTION, Paragraph 1: "The present invention overcomes the problems of the prior art and provides a microprocessor which is _faster_ than microprocessors of the prior art, is capable of running _all the software_ for _all the operating systems_ which may be run by a _large number of families of prior art microprocessors_, yet is less expensive than prior art microprocessors." (my emphasis)
in other words, the Holy Grail of computer architecture: processor emulation that's faster than the native processors. yes, sounds too good to be true, but at least it won't be vaporware...;-)
Yes yes. But what about "circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor"?
Wha'dup with the "permanently storing memory stores temporarily stored"? It's pretty much decided that temporarily stored implies a cache. Take the code, translate it. Store it in the cache until it is verified, then execute it.
Sounds like their temporary cache of instructions can be sent somewhere else for storage once they have been verified.
Hmmm. HHHMMMMmmmmmm.....
So does that mean it can execute, say x86, code in emulation, and at the same time translate into native transmeta opcode in order to be run natively at a later time? It's doable. I can picture a basic flow diagram circuit in my head right now.
Basically you want to be able to translate, say, x86 code into some other native instruction set and not have to worry (too much) about x86 exception semantics or context... you do the high-performing translation for the common case and have your hardware pick up any exceptions - meanwhile you have your hardware not commit any memory system changes (the store buffer) until you know your code fragment ran without exceptions - then you commit the changes to memory - if you take an exception you back off and emulate the code fragment, probably x86 instruction by instruction (that way you find out which x86 instruction caused the problem etc etc and the x86 state looks clean to the x86 program).
Chances are the code fragments are basic blocks (between branches etc)
I think I've seen this idea by another name before so I'd guess there's prior art - but hey it's a patent you can read anything the lawyers can get away with into it....
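For anyone not following the jargon, a "basic block" is just a straight-line run of code with one entry and one exit, so a translator that works on basic blocks only has to scan forward until it hits a branch. A toy C sketch of that scan - the one-byte-instruction assumption is a gross simplification made purely for illustration, not something from the patent:

#include <stddef.h>

/* Simplified test: does this target opcode end a basic block? */
static int ends_block(unsigned char opcode)
{
    return opcode == 0xE9 ||   /* jmp  */
           opcode == 0xE8 ||   /* call */
           opcode == 0xC3;     /* ret  */
}

/* Length of the basic block starting at code[0], pretending every
 * instruction is one byte long so the sketch stays tiny. */
static size_t basic_block_length(const unsigned char *code, size_t max)
{
    size_t i;
    for (i = 0; i < max; i++)
        if (ends_block(code[i]))
            return i + 1;      /* include the branch that ends the block */
    return max;
}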
I think you're reading a little too much into what is said in the patent. From the abstract:
determination is made that a sequence of translated instructions will generate an exception or error on the host processor [emphasis mine]
It seems to me that what they are doing here is making sure that the translation is correct, i.e. that the native instructions make sense. It does not do anything about any memory writes that might take place as a result of the execution of those native commands. Remember that the BSOD in Windows comes as a result of the execution of a valid set of x86 instructions that mess up memory in a way that stops the application/system from functioning properly. What TM is talking about here would not affect that sort of thing at all (the chip logic would have no idea that writing to memlocation x would screw up the running of your app).
This implies to me that whatever mechanism TM is using to quickly translate (say) x86 -> TM instruction set can cause a set of instructions that make no sense (for instance a value is written to a register and then another value is written to the same register without the first value being used at all -- that might not be a very good example but it's that sort of thing).
It is cogent, well written and covers a lot of ground. Someone really did their homework on that!
Much of the rest of the patent application is as deliberately dense as they can make it. Including one run-on sentence that would take me three huge breaths to speak aloud:-)
For information on what this thing actually does, read the 'DETAILED DESCRIPTION' section. One interesting fact gleaned from there in a quick reading is that the emulation co-processor is called a 'morph host' and it apparently executes some kind of special opcodes used for emulation. So to do the emulation you write 'code morphing software' that translates incoming instructions to the 'morph host' instruction set. Very Cool! And, of course, the 'transactioning' and error checking stuff noted in prior posts.
It is looking more and more like the early rumors of a Transmeta 'emulate anything' design were on the nose...
OK, after reading a ton of messages, I'm thinking about this whole instruction translation issue. If Transmeta is making a "co-processor" that would translate instruction sets, and _THIS_ thing can store the existing state of the processor then....
Couldn't they theoretically (sic) be working on a system that would allow you to run MULTIPLE instruction sets inside of a single OS?? The implications would turn the existing software industry (of which I am a part) on its ear!
Could we actually have a box running some form of unix, and actually be able to run ANY application natively on it - no matter what OS it was written for?? Think about running a BeOS app next to a Win32 app, next to an application compiled for i386 Redhat! WOAH.
If this is even close to what actually exists in Transmeta's labs, then we are in for a serious roller coaster over the next couple of years!
Looks like a cpu which reads foreign instruction sets and then translates them into its own set and execs them in a highly parallel manner to produce a faster execution than the original processor.
It's gonna be a lot cheaper per CPU, but you'll need a bunch of them to do the same work -- no more single-processor systems! Remember, in theory, a large group of 6502's can emulate a PIII faster than a PIII... if you can manage to write the software to coordinate the breaking up and reordering of instructions. Finally, a true RISC machine -- why have multiple instruction pipelines on a single processor, when you can have multiple processors with single pipelines?
They may even be planning a platform that does multi-arch emulation on a transparent hardware/microcode level, ala AS/400.
PowerPC-based AS/400's don't have microcode in the CPU, as far as I know. The older IMPI ones had two levels of what was called "microcode", but the Inside the AS/400 book by Frank Soltis (one of the architects of S/38 and AS/400) said the "vertical microcode" was just machine code and was called "microcode" for legal reasons (if it was software, IBM would have to unbundle it; it was "microcode", however, which meant they could bundle it with the hardware). The "horizontal microcode" was conventional microcode, used to implement the IMPI instruction set.
I.e., the emulation is done largely in software, by translation of the high-level "MI" instruction set into the native instruction set (IMPI or extended PowerPC), although that software was, at one point, called "microcode".
The processor described in the various Transmeta patents also appears to do that translation in software, not hardware; this patent says
Typically, the target application is being designed for some target computer other than the host machine on which the emulator is being run. The emulator software analyzes the target instructions, translates those instructions into instructions which may be run on the host machine, and caches those host instructions so that they may be reused.
If the code does get a fault ("exception or error" - this could be an exception without being an error, e.g. a page fault), then anything that code did "speculatively" and that wouldn't have been done by the untranslated code had it gotten that exception hasn't made any permanent state change, so the fault cancels/backs out any uncommitted state changes and presumably traps to software that would do whatever is necessary to do what the untranslated code would have done.
That's exactly what I got out of that part. And it sounds pretty cool. This particularly would have applications in multiprocessing systems. On the Alpha, we handle exceptions using something called trap barriers, which is a software method of handling this sort of thing. What happens is a fault appears to issue from the trapb and you are left to your own devices (from a compiler perspective) to discover where the exception occurs. It isolates the exception down to what's called a "trap shadow". This translates to a pain in the ass because we don't know precisely where a fault issued from, only the "shadow". Multiprocs complicate this mess further. This makes for interesting, but complicated, compiler development.
Moving this to hardware, OTOH, would greatly simplify things, especially when emulation adds a layer of obfuscation.
That's why the exception handling part is what I zero'ed in on. It sounds really neat.
My take on this was that there would be part of a CPU devoted to this "multi-tasking." After all, in order for the communications to take place really quickly, it all needs to be on one chip.
You do, however, show an interesting notion; if the patented matters reveal a "protocol" for allowing this emulation, it may make it plausible to have multiple "little processors" doing work, and getting to change Real Memory when it makes sense to commit the work.
It would certainly be neat if this were amenable to putting a bunch of "little processors" working together. The communication takes place at a much lower level than Beowulf; it may even be at a lower level than is done with SMP.
Or rather, that Transmeta is developing processors. They might be produced by a different company. I don't know how large Transmeta is or whether it's capable of producing its own chips.
Is it possible that they could translate something like Java bytecodes at a speed on par with compiled C code? Well, I guess the answer is: no one has any idea.
But if they could do something like that -- not just for Java, but other environments that do better with dynamic compiling (like polymorphic OO systems, e.g. Smalltalk, Common Lisp) -- that would mean a real revolution in programming. The advantages of C for anything other than systems programming would be greatly diminished.
Of course, if the translation is all hardcoded, that's unlikely to be very helpful for higher-level languages. And maybe the translation assumes some sort of commonality -- registers and the sort -- that most processors share, but wouldn't be shared by most sorts of bytecodes. This reminds me of what Linus was talking about in his article on the portability of Linux.
(if you don't know, work out how you can add PPC to an AS400 and not recompile)
Much of the audience may not be familiar with AS/400's, so that's not necessarily much of a hint.
System/38 and AS/400 compilers generate code in a high-level pseudo instruction set; the low-level OS kernel, when told to run one of those programs, translates it into the native instruction set and runs that. (See Frank Soltis' Inside the AS/400; go to the 29th Street Press's home page [29thstreetpress.com] and select "General Interest" under "*** ALL AS/400 TOPICS ***", and then look for that book, which they claim to have online - the URLs on that site look depressingly dynamically-generated, so I'm loath to make a direct link.)
This let them change the native instruction set from the apparently 360-flavored "IMPI" to an extended PowerPC instruction set without requiring people to recompile programs (unless they tossed out the pseudo instruction set code to save disk space).
From the various Transmeta patents, it sounds as if they're building a chip intended to be used in an environment making use of binary-to-binary translation, as the S/38 and AS/400 do, but it's not at all clear that they intend to use B2B translation in exactly the same fashion - they appear to be targeting existing low-level instruction sets, e.g. x86, rather than some high-level instruction set like the S/38 and AS/400 "MI".
First, if you go to the patent office page again, and hit the next button, you'll see that they have a number of patents, the sum total of which is not a processor, but a computer.
There's a somewhat interesting write up on CNN [cnet.com] (from the time of the first patent, nov. 98). There seemed to be some posts that missed who transmeta really is - it's owned by Paul Allen, who also owns Interval (another think tank). His whole goal has been to recreate his PARC days, when really smart people could team up and work on just about anything they wanted (the result we all know, since we're using it).
Transmeta's computer does at the processor level what JIT and Java do for software. Java lets you write one program and run it on many OS's. JIT speeds that process by pre-translating java byte codes into native code.
The transmeta box will allow a chip manufacture to make a single chip, that will run any OS, and (by cacheing instruction conversions, as well as memorizing repeated instructions) actually run them all faster than the zillion chips AMD, Intel and the rest are cranking out.
Think about it: Universal hardware, universal applications, and plethora of invisible middleware.
Welcome to the future. You heard it here first. Too bad you can't buy stock in Transmeta....
Hmmmm..... looks like some hardware assist to help binary retargetting. For people not familiar with the concept, take a look at an overview [uq.edu.au]. The concept is sound in that, as ESR points out, 95% of the programming jobs out there are spent in maintaining old code on old machines. However, if there was a way of abstracting and specifying the hardware characteristics and mapping from one to another, then old binaries could be shifted onto newer and cheaper hardware with less hassle. I can think of cases like old Cray binaries where porting them to a new MPP would be too painful manually; some of those timing cases can be really subtle. Given that computer companies are very reluctant to support hardware which isn't current (ie not profitable) and others could potentially go belly-up (correct me if I'm wrong, I think IBM is one of the few giants left from the 60's), there is a need to protect the millions of man-years spent on specific packages. Of course, research has shown that retargetting works better with availability of the original compiler source :-).
Given the rate of corporate take-overs, you could quite easily end up running a zillion different systems and lose valuable time trying to consolidate everything.
Oh well, add this to the speculation pile along with everyone else.
Actually, it's a way to run any application for any processor and any OS, straight from Emacs. Unrelated planned features for Emacs include improved SMB support, an extremely light-weight httpd, and preliminary support for USB child-rearing devices.
Re:Behind the technology - the business (Score:2)
Re:Hypocrites! (Score:1)
Actually, it doesn't know there won't be an error (Score:2)
This patent exists because they can't determine ahead of time if there will be an error. This patent, in conjunction with one of their other patents, provides a method for them to muddle on in the usual case of no error and still have a means of rolling back in the less usual case of an error.
This particular patent covers stores that are speculatively executed, but may need to be killed because an instruction that occurred logically before them in the original code stream faulted.
--Joe--
Re:What it Really does (Score:1)
Re:What it Really does (Score:1)
Re:It means .. (Score:1)
Sure, and I use that state-of-the-art parallelizing compiler on my 8-fold XEON SMP system to get a blazing fast application.. oh, sorry, there is no such compiler that turns a random program efficiently into a parallelized one? Oops.
So please tell me why the engineers at Intel or AMD don't fire on their logic optimizers to implement the processor instructions in such an optimal manner?
Re:YES, that's what I got (Score:1)
Just in case you still dont get it... (Score:1)
They were awarded a patent for a universal hardware based processor emulator.
Generally speaking of course...this is similar to how AMD's K6 family of processors work..by converting X86 instructions to faster RISC instructions...but theirs operates on a broader scale.
Patent requirements (Score:2)
When writing a patent I believe you are required to phrase the abstract as one sentence. I suppose it was originally intended to show the purpose of a patent as shortly and concisely as possible.
That clearly is not happening. But like so many other things to do with the patent office, this outdated requirement has been preserved.

LoppEar
Run it through babelfish...And play backwards! (Score:1)
-BF
Re:Processors (Score:1)
Re:What about the "permanent bit" (Score:2)
But why have any reference to permanent storage if the data just gets shipped back and forth between caches?
Re:Hmm ... (Score:1)
Re:Huh? - Time to call on your Grammer classes =] (Score:1)
"circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor"
Here is how it should look:
"ciruitry for 'permanently storing memory stores' temporarily stored when...."
It might be easier if you abbreviate it..
"ciruitry for PSMS's temporarily stored when.."
Make sense now? =] The whole dern document is like that. It's sick..I had to read certain sentences multiple times.. =] It's like reading a book with no periods, no capitals, no nothing. Insane. But yeah...Still sounds cool.. =]
Re:Wait... (Score:1)
I didn't know he had a doctorate.
I didn't even know he owned Transmeta.
But is instruction level compatibility enough? (Score:1)
anybody work in chip fab? (Score:1)
Re:Hmm ...[wrong url sorry] (Score:1)
VLIW [ibm.com]
Stephen King??? (Score:2)
The attorney's name is Stephen King?
Is this whole thing some kind of cruel
quasi-technological/horrific hoax?
Re:Wait... (Score:1)
Translation for Geeks (Score:1)
0000000 7041 6170 6172 7574 2073 6f66 2072 7375
0000010 2065 6e69 6120 7020 6f72 6563 7373 6e69
0000020 2067 7973 7473 6d65 6820 7661 6e69 2067
0000030 2061 6f68 7473 7020 6f72 6563 7373 726f
0000040 6320 7061 6261 656c 6f20 2066 7865 6365
....
Geek talk vs. lawyer talk.
Re:What if... (Score:2)
Read the patent, not the comments. Many people seem to think the processor would translate instructions itself, perhaps because the patent goes on about
but the patent later indicates that "the processor" does that translation by running translation software:
Not bother storing the results of that instruction.
...or, at least, make it look as if it didn't.
Why? The error could trap, and the trap handler (or code it invokes) could do whatever is necessary to simulate what the processor being emulated would do in that error situation (although the "exceptions" they talk about aren't necessarily errors - I scrolled past one example of "native-Transmeta" code, generated from x86 code, that assumed that the code doesn't make unaligned memory references that cross a page boundary; if that happened, "either hardware or software alignment fix up" would detect this, and perhaps generate more pessimistic and slower code and restart the emulation running the new code).
Hmm ... (Score:2)
I think it said "we've got a cpu that pretends to be another cpu and it needs a place to store the instructions it's actually running to run the instructions it thinks it's running, and we're patenting how it does that." But I'm probably wrong.
It fits with horizontal microprogramming... (Score:1)
Now, the specifics in this case seem to be a provision for cacheing the results of parallel instruction execution and then either voiding or writing that cache depending on whether the instructions cause an exception or not. That in itself is nothing new, but in combination with "just-in-time" compilation of, say, x86 code to Transmeta microcode it might be. Particularly if the Transmeta processor uses horizontal microprogramming (read, "very very very long instruction word") to speed up the processing. (Loosely speaking, with horizontal microcode, each bit in the very long instruction word (could be hundreds of bits) maps to a discrete piece of logic (gate, flip/flop, etc) in the processor. Given an appropriate processor design it might be possible to map several instructions in a more vertical set (x86, PPC, etc) to a single wide microinstruction, effectively executing all of those in parallel, but then you really need some way of flushing everything if it screws up. Which this patent provides.)
(It'll be interesting to see if the active microcode store is loadable from RAM (making it end-user microprogrammable) or just from a fixed set of microprograms on ROM (which may live on the CPU die)).
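To make the "very long instruction word" picture concrete, here's a toy C layout for one wide word with a slot per functional unit. The slot names and widths are my own invention for illustration - the patent doesn't spell out the word format:

#include <stdint.h>

/* One slot per functional unit; everything in one word issues together. */
struct alu_slot { uint8_t op, src1, src2, dst; };
struct mem_slot { uint8_t op, addr_reg, data_reg; int16_t displacement; };
struct br_slot  { uint8_t cond; int32_t target; };

/* A wide "molecule": several primitive operations packed side by side. */
struct vliw_word {
    struct alu_slot alu0;     /* integer unit 0  */
    struct alu_slot alu1;     /* integer unit 1  */
    struct mem_slot mem;      /* load/store unit */
    struct br_slot  branch;   /* branch unit     */
};

The connection to this patent is that the stores issued from the mem slot of a whole group of these words can be held in the gated store buffer and dumped wholesale if anything in the group faults.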
Re:Hmm ... (Score:1)
A compiling processor! (Score:1)
If the TransMeta ideal propagates widely, it will be a new world of software design. Instead of compiling for a processor, you only need to pre-compile into a pseudocode, provide a description of that pseudocode to TransMeta, and the processor takes care of final translation, and apparently a good deal of the debugging work too. As a development platform this sounds like a serious ideal. In a world of open source and platform independence, the TransMeta sounds like a real solution.
Yeah, I'm a Mac programmer. You got a problem with that?
Rolling back is useful in translation. (Score:2)
If your doing ISA->ISA translation, rollback is very, very useful. Suppose we have an x86-style instruction like this:
MOV [EAX*4 + 4], EBX
This particular instruction does several things: It reads EAX, multiplies it by 4, adds 4 to it, and then stores EBX to the generated address in memory. This might break up into several RISC-like ops: (these are written in the more traditional RISC form OP src, src, dst for clarity)
SHL EAX, 2, tmp1
ADD tmp1, 4, tmp2
STORE EBX, *tmp2
It doesn't take a brain surgeon to see that these steps could overlap their execution with other instructions. For instance, the instruction that calculates EBX could overlap execution with the left-shift and the add. If the original instruction was in a tight loop, then it could even overlap with itself! Why is this important?
Say you have some code which is stepping through an array, and say that the array spans a page boundary. And, say that the second page isn't "paged in." When the loop hits the page boundary, a fault will occur. Because the stores are being spooled extremely rapidly, the loop may not be informed that a particular STORE faulted until several other stores have executed. All stores after the faulting one need to be killed since we need to process exceptions in a precise order. Here's where the rollback becomes handy: We merely discard the extra, incorrect stores, and roll back the processor state to be consistent with the emulated state of the machine at the time of the fault.
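A rough C sketch of that store spooling and kill, assuming (my assumption, not the patent's wording) that stores older than the faulting one are allowed to drain while the faulting one and everything younger get thrown away:

#include <stdint.h>

#define SPOOL_MAX 64

struct pending_store { uint64_t addr; uint64_t data; };

static struct pending_store spool[SPOOL_MAX];
static int spool_count = 0;

/* Record a store in the buffer instead of writing memory directly.
 * Returns an index identifying the store, or -1 if the buffer is full. */
static int spool_store(uint64_t addr, uint64_t data)
{
    if (spool_count == SPOOL_MAX)
        return -1;
    spool[spool_count].addr = addr;
    spool[spool_count].data = data;
    return spool_count++;
}

/* A store faulted: discard it and every store spooled after it. */
static void kill_from(int faulting_index)
{
    spool_count = faulting_index;
}

/* The fragment finished cleanly: let the spooled stores reach real memory. */
static void commit_all(void (*write_mem)(uint64_t addr, uint64_t data))
{
    int i;
    for (i = 0; i < spool_count; i++)
        write_mem(spool[i].addr, spool[i].data);
    spool_count = 0;
}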
This is what most current x86 clones do when they translate x86 instructions into "RISCops" or whatever they decide to call them. I'm guessing Transmeta is aiming to do a similar sort of translation, only with a more configurable flavor.
--Joe--
Looks like.... (Score:2)
Missing link? (Score:1)
Send the flying saucers in.
They will not produce processors (Score:1)
Re:It means .. (Score:1)
"The number of suckers born each minute doubles every 18 months."
Sounds like fast emulation (Score:1)
Summary of the Invention (Score:2)
Abstract (very) for the lazy folks (Score:3)
processor, and circuitry for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
hmmm???
Safe cache (Score:2)
This is a device for assisting in processor emulation: I believe it will hold commands in memory until it knows that they will execute without error. Quite a good idea.
Simple, elegant, and not obvious. All the requirements for a good patent.
This is really the sort of thing that Windoze really needs: a 'this instruction would cause the program to do "bad stuff(TM)", so I won't allow it' check. It should stop a single process from taking the whole system down.
Re:My layman's explanation (Score:2)
If they DO get an exception ... then they use the described hardware to throw away the side effects of executing the code fragment, and interpret the x86 instructions from the start - doing all the proper instruction semantics .. when they get the exception again, this time they know what the PC is and what all the flags and registers etc are.
A few other things (Score:5)
In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels.
I'm not going to go into huge detail about VLIW machines (particularly since I don't know all that much about them).
Regarding the instruction translation and subsequent caching I mentioned in my previous post, a quote from the patent illuminates the matter a little more:
The code morphing software of the microprocessor...includes a translator portion which decodes the instructions of the target application, converts those target instructions to the primitive host instructions capable of execution by the morph host, optimizes the operations required by the target instructions, reorders and schedules the primitive instructions into VLIW instructions (a translation) for the morph host, and executes the host VLIW instructions.
When the particular target instruction sequence is next encountered in running the application, the host translation will then be found in the translation buffer and immediately executed without the necessity of translating, optimizing, reordering, or rescheduling. Using the advanced techniques described below, it has been estimated that the translation for a target instruction (once completely translated) will be found in the translation buffer all but once for each one million or so executions of the translation. Consequently, after a first translation, all of the steps required for translation such as decoding, fetching primitive instructions, optimizing the primitive instructions, rescheduling into a host translation, and storing in the translation buffer may be eliminated from the processing required. Since the processor for which the target instructions were written must decode, fetch, reorder, and reschedule each instruction each time the instruction is executed, this drastically reduces the work required for executing the target instructions and increases the speed of the microprocessor of the present invention.
Transmeta seems to have an excellent idea here. They're caching optimized translations of the incoming instructions, so rather than have to translate and optimize over and over each time you see that bit of code, you do it once and then just grab it from the cache. Due to the spatial and temporal locality of programs (ie the fact that your accesses to instructions are not random, but are localized in loops, etc), this cache ("translation buffer") will only fail to have a translation present once every million instructions. So you're doing *one* translation every million cycles, rather than a million translations like current processors would have to do.

Interestingly enough, a scheme like this was brought up as a discussion item in my Superscalar Processor Design class a couple of weeks ago, though my professor used the example of a specialized Alpha decoding/translating x86 and caching the results. One might even write the translations back out to disk as an attachment to the original executable, so that the next time you run the program that's fewer translations you have to do, and eventually you'll have a fully translated version on your hard disk for optimal speed. I guess we'll just have to wait to see if Transmeta does something similar.
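Here's a small C sketch of what such a translation buffer might look like - a direct-mapped cache keyed on the target program counter, with all the names and sizing being my own guesses rather than anything from the patent:

#include <stdint.h>
#include <stddef.h>

#define TB_ENTRIES 4096                  /* power of two, direct-mapped */

typedef void (*host_code_fn)(void);      /* entry point of translated host code */

struct tb_entry {
    uint32_t     target_pc;              /* x86 address this translation covers */
    host_code_fn host_code;              /* NULL means the slot is empty */
};

static struct tb_entry tbuf[TB_ENTRIES];

/* Provided elsewhere: decode, optimize and schedule one fragment. */
extern host_code_fn translate_fragment(uint32_t target_pc);

static host_code_fn lookup_or_translate(uint32_t target_pc)
{
    struct tb_entry *e = &tbuf[target_pc & (TB_ENTRIES - 1)];

    if (e->host_code == NULL || e->target_pc != target_pc) {
        e->target_pc = target_pc;        /* miss: pay for translation once */
        e->host_code = translate_fragment(target_pc);
    }
    return e->host_code;                 /* hit: no decode, no reschedule */
}

On a hit you pay essentially nothing; only the rare miss pays the full decode/optimize/reschedule cost, which is the whole point of the paragraph quoted above.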
One embodiment of the enhanced hardware includes sixty-four working registers in the integer unit and thirty-two working registers in the floating point unit. The embodiment also includes an enhanced set of target registers which include all of the frequently changed registers of the target processor necessary to provide the state of that processor; these include condition control registers and other registers necessary for control of the simulated system.
It seems this new chip is going to have a lot of registers. As Cartman would say, sweeeeeet!
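My own guess at how the extra registers get used (the passage quoted above doesn't spell this out, so treat it as speculation): the "target registers" hold the committed x86-visible state while the working registers hold speculative values, and you only copy across when a translation finishes cleanly. In C terms, something like:

#include <stdint.h>

struct x86_state {                 /* the committed, target-visible registers */
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp, eip, eflags;
};

static struct x86_state committed; /* "target registers"          */
static struct x86_state working;   /* speculative working copies  */

static void begin_fragment(void)
{
    working = committed;           /* speculate from known-good state */
}

static void commit_fragment(void)
{
    committed = working;           /* no exception: the new state becomes real */
}

static void rollback_fragment(void)
{
    working = committed;           /* exception: throw the speculation away */
}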
The patent [164.195.100.11] also provides some sample C code, the corresponding x86 assembly, and some sample optimizations the Transmeta system may perform. It's a little more than half way down the page, if you want to look, just scroll until you see code
Re:Hmm ... (Score:2)
Re:What it Really does (Score:3)
Happens all the time, although there's already ways of dealing with it. Consider virtual memory. Having to redo an instruction, because some exception occurred in the middle of it, isn't very .. um .. exceptional.
But I can't think of how this relates to the Transmeta speculations. Well, actually I can, but my theory is so wild-ass that everyone would laugh at me.
Oh, what the hell. This is as good a place as any for me to make a complete fool of myself... I think Transmeta is making a display circuit that instead of fetching each pixel from a frame buffer, executes a little program for each pixel. The program must execute incredibly fast since the result must be available before the horizontal scan goes to the next pixel.
There, I said it. Now everyone can back away from me quietly, and then point and laugh when they reach a safe distance.
---
Have a Sloppy day!
Re:FPGA - Field Gate Processors (Score:2)
The patent says that "emulation software" would translate x86 or whatever code into "native Transmeta" code (see other postings of mine in this thread, many of which amount to "software translation, dammit, not hardware translation").
As such, I don't know why this need involve any FPGAs at all - the patent doesn't seem to describe a processor that can be configured at the hardware level to run arbitrary instruction sets. It appears to describe a processor that lets software (presumably running on that processor) translate other instruction sets into the native instruction set making optimistic assumptions about what the code being translated does, get exceptions if those assumptions are invalid (with the exception handler presumably doing more pessimistic translations and retrying with the new code), and not have to worry about irreversible state changes having been made by overly-optimistically-translated code.
Further clarification (Score:5)
The beginning of the patent ("claims") is essentially just a list of things that all modern, superscalar, out-of-order processors do, and saying "hey we do this too".
Basically, out-of-order machines execute instructions out of their program order (hence the name).
If you get past all the uninteresting stuff like that in the beginning, you'll find the following:
"The present invention overcomes the problems of the prior art and provides a microprocessor which is faster than microprocessors of the prior art, is capable of running all of the software for all of the operating systems which may be run by a large number of families of prior art microprocessors, yet is less expensive than prior art microprocessors. "
The idea it seems is that rather than making complex hardware to execute the instructions and perform speed enhancements, they're doing speed optimizations in software. Which in turn allows very simple hardware (which in turn should translate to really high clock speeds). It seems that Transmeta's bet with this is that the penalty incurred by doing software rather than hardware optimizations is offset by the increase in clock speed and decrease in hardware cost.
Using such an approach should also make running multiple instruction sets a much easier task. Currently processors do their instruction decoding in hardware. But if Transmeta has managed to do this decoding (fast) in software, then they can just add a little more software to allow multiple instruction sets. They also seem to be caching the translations of non-native to native instructions in a memory structure of some sort, so that they minimize the redundant emulation computations.
Actually, to address gupg's comment, it also seems that they should not need *any* special compiler support, because they can run stuff that was compiled for any of the various instruction sets they choose to support. So they themselves should not need to do compiler work. I would guess that the reason they're hiring all sorts of compiler folks is that they need people to do the afore-mentioned software instruction translation, and the people best suited for that are compiler people since they work on the instruction level all the time. Most other programmers don't have to deal with anything other than high-level languages, and so would not be particularly well suited to doing what Transmeta is doing.
Anyway, hopefully this explained things a bit more to everyone. My reading and explanation of the patent was pretty quick since I have to go to class in a few minutes. I'll finish reading the patent afterwards and add anything else I think you might like to know.
Cheers,
Stradivarius
Re:More from the Patent (Score:2)
You have to admit that translating code to native in parallel with actual execution would be pretty cool though.
It's a hardware-assisted just-in-time compiler... (Score:2)
This patent (and, yes it's in English - but patent-lawyer English) apparently implies a hardware-based mechanism to store translated instructions in an on-chip cache and then execute them afterwards, hoping that at that point other tricks like pipelines and multiple instruction units will be able to do their thing.
In a normal emulator, you get relatively little benefit from the normal on-chip caches and pipelines. This would seem like an interesting way to speedup a X86 (or even PPC) emulator.
And if you think there's little use for this, think "Java Virtual Machine". Think "hardware-assisted just-in-time compiler"...
Re:Summary (Score:2)
Umm, because they don't have the source code to all the, say, x86-architecture programs they might want to run?
"Is going to make" isn't the same as "has made". Yes, typing make to get "native-Transmeta" machine code for your application may not require all the work that this patent involves, but it involves, instead, waiting for open source versions of the programs they're interested in showing up, and they may not be willing to wait for that.
Transmeta Patents Dynamic Recompilation? (Score:2)
I doubt it has the same kind of exception handling as we see described in this patent, though. Them TransMetans do some funky stuff
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
Once you pull the pin, Mr. Grenade is no longer your friend.
Re:I see one problem.... (Score:2)
Add more code to their binary-to-binary translator software to handle that new instruction. The processor isn't doing the translation, except to the extent that it runs the translation software (see other postings of mine in this thread for the quote from the patent that speaks of "emulation software").
Behind the technology - the business (Score:2)
I concur, and I would advise reading of the DETAILED DESCRIPTION if you scroll the page down a bit.
What is interesting is that they appear to have created a CPU capable of running applications designed for one of many target systems (Intel x86/Pentium, PPC, Postscript and Java even) by buffering the instructions, optimising them, and then checking their execution for errors before the results are committed. Quite brilliant and mind-bogglingly complicated.
Note the business angles hinted at: speed and optimisation come at significant cost; the cost of producing any microprocessor is out of reach of most companies (implying a mass market), a large number of applications written for many targets (Windows, Java, etc.), and problems associated with traditional thinking with regard to optimization and parallel processing.
To create a microprocessor which overcomes the above at viable cost to both manufacturer and customer would be enormous!
Just think of it, you're running the Transmeta CPU which is running some OS, and running Office 2000 through it and knowing that the CPU will trap any problems before they occur! This is a hardware VM-Ware!
Ooh I'm drooling already!
James Green
Re:Looks like.... (Score:2)
Combine it with a slot T scheme (T for transmeta).
Perhaps their little CPU thing will also let it experiment if there are multiple CPUs and idle time. Send an instruction off to CPU A and CPU B at the same time and see who gets a valid answer back the soonest. That might be why it needs its own memory - so it can queue the same task up for multiple CPUs but then delete it from the other CPUs queue once it has a reply back from whichever CPU got to it first.
Combine that with a CPU bus and things could get interesting. Perhaps you stick the transmeta CPU in a regular CPU slot/socket, and then stick a daughter board into an AGP-like slot.
Or alternatively, they could be going a bit like the amiga, and have semi-specialised CPUs, but which can also be used for other things. So your sound card can also do general CPU tasks if it isn't playing enough audio to take up its full load, and your video card could do other stuff. Though I tend to think it'd be more sensible to just form a CPU bus for all the cards, and just have a set of adapter plugs on the m/b. So you could get a m/b with 5 peripheral ports, so if you want to use 1 visual, 1 audio, 1 joystick, 1 mouse and 1 keyboard then thats fine, but they could also do 2 visuals, 2 audio and 1 keyboard or whatever.
So that way you get a semi-specialized video CPU card if you want hot graphics, but its CPU can also be used for general stuff (and the transmeta controller helps with translations), and you might get a fast general CPU for regular stuff - but it can also do your graphics work as well if it gets to be too much for the video CPU card alone (or you just don't have one at all).
So you could mix'n'match differently specialized CPUs on the CPU bus, as well as just add a new one every year for the latest speed and keep the old one there too (presumably the controller CPU would be able to shut down the slowest CPUs to save on electricity if they weren't needed).
Though I don't think the CPU bus is too realistic, as is technology really up to that kind of thing?
(a fibre optic backbone between CPUs that multicasts requests and the CPUs just pick up requests based on how full their internal queues are?)
I don't know, the possibilities are endless. But I'm quite content to just wait and see what (if anything) transmeta actually comes out with. It's just fun to play around with guessing
Hmm, maybe I should get an account here someday.
--Vastor.
Re: How do you do this all and make it cheaper? (Score:2)
Because the only instructions sent to the processor (after optimization) are instructions that are known to succeed.
The process of optimization is based in software as well- the instruction translation (code morphing, they call it) software is written in code native to the VLIW chip.
IE: there are no speculative instruction paths on the VLIW chip. There are something like 4? on the PIII.
In other words, the chip can have about 4x fewer transistors than a comparable x86 chip.
This means:
Simpler chips can also be run stably at higher clock frequencies than more complex chips of the same manufacturing process. (.18 micron, .22, etc)
Also, the optimized code contains 70% or fewer of the operations in the original instructions.
I'm getting some of this from their earlier patent.
I think the Instruction Translation Cache is it. (Score:2)
From my understanding of Intel and AMD CPU's, what they do is convert the x86 instructions into groups of RISC instructions, which are then run by the core processor.
What the TransMeta CPU does is CACHE the results of the translations into a multi-megabyte on chip buffer.
So, while a Pentium III takes up to 20 cpu cycles to decode some of the more complex instructions, the TransMeta CPU takes even longer, but makes a better optimization. But once it's translated it's buffered away so that if it's needed again soon, it takes ZERO cycles to decode.
The TransMeta CPU then justifies the time cost by taking GROUPS of instructions, optimizing the hell out of them, taking as long as it needs, then filing the result for future use.
All the exception handling stuff is needed in case an exception happens in the middle of a group.
Say, for example your program contains the instructions "A,B,C,D".
The Transmeta CPU translates this into "1,2,3".
Further, lets say an error occurs at step "2".
The CPU then Rolls back (read up on Transaction processing) to the state before "1" executed.
Then it translates the instructions one at a time until it recreates (or fails to recreate) the exception.
It then commits the changes to the emulated registers, and reports the exception at the point where it occurs in the original code.
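In C-ish terms that transaction-style recovery looks something like the sketch below. Every helper name here is a placeholder of mine; the point is just the shape: a fast translated path, and a slow interpret-one-at-a-time path that pins the exception on the exact original instruction:

#include <stdint.h>

struct target_state { uint32_t regs[8], pc, flags; };

/* Assumed helpers, invented for the sketch: */
extern struct target_state save_state(void);
extern void restore_state(const struct target_state *s);
extern int  run_translated_group(uint32_t pc);   /* 0 = ran clean, -1 = trapped  */
extern int  interpret_one(uint32_t *pc);         /* 0 = ok, -1 = this one faults */
extern void commit_spooled_stores(void);
extern void discard_spooled_stores(void);
extern void report_exception(uint32_t faulting_pc);

static void run_group(uint32_t start_pc, uint32_t end_pc)
{
    struct target_state checkpoint = save_state();
    uint32_t pc;

    if (run_translated_group(start_pc) == 0) {
        commit_spooled_stores();                 /* the common, fast case */
        return;
    }

    discard_spooled_stores();                    /* roll back to the checkpoint */
    restore_state(&checkpoint);

    for (pc = start_pc; pc < end_pc; ) {         /* replay the original code */
        if (interpret_one(&pc) != 0) {
            report_exception(pc);                /* precise, at the right spot */
            return;
        }
    }
}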
Put simply, this thing will KICK INTEL'S ASS. Possible speed improvements of over 10 times.
And the same principles could be applied to any other CPU instruction set.
This patent does not appear to cover emulating multiple instruction sets at once, but nothing stops it from being applied in that manner. It would be just as hard as doing it with a 'Prior Art' CPU design.
Nor does it seem to be FPGA related, but I suppose FPGAs could be used somewhere in it.
Veil, indeed. (Score:2)
*digs around for his aspirin*
- dom
My layman's explanation (Score:5)
1. Set of instructions comes into processor in one instruction set (like x86).
2. This device stores the data for this series of instructions temporarily
3. The device translates the (x86) instructions into its own internal instruction set and figures out an ordering that will not cause it to have exceptions.
4. The device retrieves the temporary data and "fills in the blanks" in the "inner" processor to get results, the so called "permanent storage" is probably the inner processor's instruction cache.
5. The data is cleared from the interim area once it's acted upon.
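Strung together, those five steps are basically one loop. A compressed sketch, where every function name is a placeholder of mine rather than anything from the patent:

#include <stdint.h>

typedef void (*host_code_fn)(void);

/* Placeholders for the machinery described in steps 1-5: */
extern host_code_fn find_or_build_translation(uint32_t target_pc); /* steps 1-3 */
extern int          run_behind_gate(host_code_fn f);               /* step 4: 0 = clean */
extern void         commit_buffered_results(void);                 /* the "permanent storage" */
extern void         discard_buffered_results(void);                /* step 5 */
extern uint32_t     recover_and_resume(uint32_t target_pc);        /* precise exception path */
extern uint32_t     next_target_pc(void);

static void emulate(uint32_t target_pc)
{
    for (;;) {
        host_code_fn f = find_or_build_translation(target_pc);
        if (run_behind_gate(f) == 0) {
            commit_buffered_results();
            target_pc = next_target_pc();
        } else {
            discard_buffered_results();
            target_pc = recover_and_resume(target_pc);
        }
    }
}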
What didn't turn out? (Score:2)
Two years ago, there was room for some serious profits on CPUs as they were the most expensive component in a computer system.
That has changed such that the most expensive component is commonly the hard disk, followed (with MSFT software) by the OS, with the CPU in third or fourth place.
With that change, this leaves Transmeta without the viable "IA-32 market" they may have expected to have.
Based on the droppage in pricing, it is not clear that there is room to get vast decreases in pricing.
Of course, considering that Transmeta is fabless, and doesn't directly have a sales organization for CPUs, the goal might have been to construct technology to allow building cheaper IA-32 chips, and then license it to AMD or Intel or Cyrix.
I'm not sure any of them are necessarily interested to the tune of $Billions...
It means .. (Score:5)
The patent itself is more concerned with making sure that the conversion process occurs without any exceptions taking place .. or actually holding the processor state and waiting for a sequence of instructions to make sure no exception etc happens and then executing it on the host processor.
They obviously also need strong compiler support for such a processor which explains all the software and compiler people they have been recruiting.
Fun, fun, fun .. who says Computer architecture is dead !
Sumit [uci.edu]
Switching contexts in a rapid state??? (Score:2)
I also see a lot of stuff about pointer manipulations. Maybe this is at the core of how they will attempt this (i.e. keep all "processes" in memory with their own vm space and then "swap" 'em out when necessary).
In my rough perusal, I may have missed some very important details. =)
Justin
It means (Score:4)
My money is on the latter, maybe Linus whipped together a Perl script in his lunch hour?
It helps if you run it through babelfish... (Score:4)
That's from english->portugese->english
my gawd (Score:2)
I think I can imagine the patent officers that were reading this going "um, billy-bob, do you know what any of this means?" and "um, no earl-ray, I have no clue what they're talking about. Must be that internet/computer mumbo-jumbo. Guess we'll just have to give it an okay..."
starting at point #7, it starts to make a bit more sense... basically a machine running a program and then wiping that program out of the memory...
this part had me tho...
"means for transferring memory stores to the means for permanently storing memory stores, and
means for storing memory data replaced by the memory stores, "
here we go...
"This invention relates to computer systems and, more particularly, to methods and apparatus for providing an improved microprocessor. "
Another line that confuses the hell out of me...
"It is difficult and expensive to make a microprocessor run as fast as state of the art microprocessors"
um, I'm not sure whether to say "duh." or "huh?"
What they said... (Score:2)
They're taking data/code from one processor/platform and shipping it to another for work, then (presumably) shipping the results back. This will be tremendously useful in loadsharing situations where you don't have all the same hardware.
Picture a multiplatform Beowulf cluster built of a mixture of G3's, G4's, Pentium II's, Alpha's, SGI's, and a couple of Amiga's just to make it fun.
I guess you'd have to call this a Beomutt cluster. ;-)
D. Keith Higgs
CWRU. Kelvin Smith Library
What it Really does (Score:5)
The patent is for a co-processing unit that not only translates a foreign instruction set into native instructions for a 'target processor', but acts as a go-between for that target processor and memory. It stores the processor state, and buffers any memory writes, until it is certain that a group of instructions has been run without exception or error... If the translated instructions crash, no damage is done. Not only is this amazing overall, but it allows for Very speculative, and Very fast, instruction translation and branch prediction...
Transmeta speculation (segfault style) (Score:2)
"It amazes us that the geeks were able to interpret 'Apparatus for use in a processing system' as 'Wow, they've got something faster than Intel!'. That was our intent, of course, but we hate to see our bretheren fail a Turing test..."
A HERRING! (Score:4)
Really, tho', it could be a Red Herring. Transmeta could be cashing in on the popular assumption that they're going to create a wild new processor that'll be Everything to Everyone in order to disguise the fact that they're really in the process of opening the ULTIMATE multimedia porn site for cyber-trans-sexuals.
(Not that there's anything wrong with that...)
Re:Huh? (Score:2)
This is the same method Merced/McKinley uses, isn't it? Does that count as prior art?
Re:Hmm ... (Score:3)
Re:YES, that's what I got (Score:2)
Re:Veil, indeed. (Score:3)
Quick Summary (Score:3)
SUMMARY OF THE INVENTION
It is, therefore, an object of the present invention to provide a host processor with apparatus for enhancing the operation of a microprocessor which is less expensive than conventional state of the art microprocessors yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors.
This and other objects of the present invention are realized by apparatus for use in a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor comprising means for temporarily storing memory stores generated until a determination that a sequence of translated instructions will execute without exception or error on the host processor, means for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor, and means for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
These and other objects and features of the invention will be better understood by reference to the detailed description which follows taken together with the drawings in which like elements are referred to by like designations throughout the several views.
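If that claim sentence makes your eyes glaze over, the three "means" clauses map onto three very small operations. A bare-bones C rendering of just the interface, with names invented by me:

/* "means for temporarily storing memory stores generated"           */
void gate_store(unsigned long addr, unsigned long data);

/* "means for permanently storing memory stores temporarily stored"
 *  - the fragment ran without exception or error                    */
void gate_commit(void);

/* "means for eliminating memory stores temporarily stored"
 *  - the fragment generated an exception or error                   */
void gate_discard(void);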
Transmeta (Score:5)
I'm sorry, but that is the closest I can get to an answer with the available information.
Transmeta Patent (Score:2)
yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors..."
think of it like this: the cpu is capable of reading the instruction set for another architecture, figuring out what that architecture needs from the cpu, determining all possible instructions of that architecture, and "emulating" that architecture by a technique that allows the "emulation" to be as fast or faster than the original architecture (by taking advantage of the invented cpu's "extra" free stuff).
so, what that means is that the cpu would theoretically be able to run any OS designed for any instruction set (ie x86, alpha, mac, etc.)
or at least that's how i read it, but whoami
To be more specific... (Score:2)
As the patent points out, it applies equally to emulation on other (non-VLIW) superscalar architectures, but the emphasis does appear to be on VLIW.
Re:Safe cache (Score:2)
Quite like that FPGA supercomputer-on-a-desktop (Score:2)
This area is called Reconfigurable Computing, and it's been around for quite a few years (there's some quite reasonable supporting hardware available for it from Xilinx).
Transmeta's patent differs from that in the detail of course, but the general principle is remarkably similar, so much so that they've probably included references to it among the prior art somewhere.
This is not a new idea -- look at the PPro (Score:2)
Now we know what Transmeta does! (Score:2)
My take on this thing. (Score:2)
Huh? (Score:4)
I vote for it being the random patent generator. My favorite part of the whole soliloquy is
you can't beat that! Maybe they're really working on an optical processor and wanting us to think they're working on a universal processor that'll run any other processor's code. Good one, Linus (and others), but what's it really do?
They deserve to read it (Score:2)
Sure Looks Related To Their Other Patents (Score:3)
The basic idea for all of the patents has been to provide mechanisms to allow one to:
This parallels the notion of Lagrangian Relaxation, where you take a problem, with various restrictions, and relax those restrictions. In exploring the solution space, the system will find solutions that aren't in the feasible solution space of the (unrelaxed) problem.
In the case of Lagrangian Relaxation, the way of coping with that is to associate values with the objective function that penalize infeasible solutions, thus encouraging the system to head towards feasible solutions.
In the case of Patent 5958061, the "relaxation" is that the system performs the emulated instructions, modifying a temporary memory store, and rolling back when it hits cases where the preliminary emulation results in errors on the host processor.
Patent 5832205 concentrated, in contrast, on the apparatus to detect a failure of speculation.
Hardware assists for binary-to-binary translation? (Score:2)
WAKE UP (Score:2)
its for this > its for that
they are producing a system that runs a code template !
(if you don't know, work out how you can add PPC to an AS400 and not recompile)
the product will have multithreading in
OK
peace enough of this guessing
john
a poor student @ bournemouth uni in the UK (a dyslexic so please don't moan about the spelling but the content)
FPGA - Field Gate Processors (Score:2)
http://bersj.www.media.mit.edu/~vmb/papers/chid
http://www.atmel.com/atmel/products/prod3.htm
Basically it is a fully configurable CPU that can be programmed on the fly to fully dedicate to a single goal and complete it very quickly. Your standard Intel is designed for general number crunching and nothing much else in particular. But if you had control over what each register and logic gate did, you could make your processor totally dedicated and streamlined to complete a single task. As well as being programmable, the cache is spread out so that each logic gate has its own cache rather than one lump of cache for the whole board, which also speeds things up.
Re:What it Really does (Score:2)
-Chris
Of Registers and other less obvious things (Score:2)
> These improvements include a gated store
> buffer and a large plurality of additional
> processor registers
"large plurality" sounds to me like a whole boatload more than the ones we used in those silly MIPS simulators to learn assembly theory and certainly more than any x86 chip I've seen.
I also saw some references to VLIW conversion, so I did another grep; I think this is one of the best paragraphs, and it's not in Greek...
----------
FIG. 2 is a diagram of morph host hardware designed in accordance with the present invention represented running the same application program which is being run on the CISC processor of FIG. 1(a). As may be seen, the microprocessor includes the code morphing software portion and the enhanced hardware morph host portion described above. The target application furnishes the target instructions to the code morphing software for translation into host instructions which the morph host is capable of executing. In the meantime, the target operating system receives calls from the target application program and transfers these to the code morphing software. In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels. The overall operation of such a processor is further illustrated in FIG
--------
This pretty much gives away what people have been saying since the beginning. Morphing hardware AND software elements that work in conjunction to provide (drum roll) a fast as HELL computer. And it will run software we already have. And pretty darn near anything you throw at it. Want to be a Playstation for a day? How about an O2? Now switch back to Pentium II so you can type up that report and then become a G4 so you can make the graphics to insert in the presentation that accompanies it.
This thing will be doing the code morphing in parallel (which is what this invention seems to be... the morpher) and then run it on another fast chip that's related to one of the earlier patents. And it will all be controlled by a little driver that turns into a "layer 1 vmware" (now that our computers will need an OSI layer model...
-Chris
YES, that's what I got (Score:2)
"The present invention overcomes the problems of the prior art and provides a microprocessor which is _faster_ that microprocessors of the prior art, is capable of running _all the software_ for _all the operating systems_ which may be run by a _large number of families of prior art microprocessors_, yes is less expensive than prior art microprocessors. (my emphasis)
in other words, the Holy Grail of computer architecture: processor emulation that's faster than the native processors. yes, sounds too good to be true, but at least it won't be vaporware...
Re:What about the "permanent bit" (Score:2)
Wha'dup with the "permanently storing memory stores temporarily stored"? It's pretty much decided that temporarily stored implies a cache. Take the code, translate it. Store it in the cache until it is verified, then execute it.
Sounds like their temporary cache of instructions can be sent somewhere else for storage once they have been verified.
Hmmm. HHHMMMMmmmmmm.....
So does that mean it can execute, say x86, code in emulation, and at the same time translate into native transmeta opcode in order to be run natively at a later time? It's doable. I can picture a basic flow diagram circuit in my head right now.
it has to do with fast translation and exceptions (Score:2)
Chances are the code fragments are basic blocks (between branches, etc.).
I think I've seen this idea under another name before, so I'd guess there's prior art -- but hey, it's a patent; you can read anything the lawyers can get away with into it....
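If the fragments really are basic blocks, finding where one ends is pretty trivial -- something like this toy scan, where is_branch() and is_branch_target() are placeholders for real target-ISA decoding:
----------
/* Toy basic-block scan: a block ends at a branch, or just before an
 * instruction that is itself a branch target.  is_branch() and
 * is_branch_target() are placeholders, not real decoding. */
#include <stddef.h>
#include <stdint.h>

extern int is_branch(uint32_t insn);
extern int is_branch_target(size_t index);

size_t end_of_block(const uint32_t *insns, size_t start, size_t count)
{
    for (size_t i = start; i < count; i++) {
        if (is_branch(insns[i]))
            return i + 1;                  /* branch closes the block      */
        if (i + 1 < count && is_branch_target(i + 1))
            return i + 1;                  /* next insn starts a new block */
    }
    return count;
}
----------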
Re:What it Really does (Score:2)
determination is made that a sequence of translated instructions will generate an exception or error on the host processor [emphasis mine]
It seems to me that what they are doing here is making sure that the translation is correct, i.e. that the native instructions make sense. It does not do anything about any memory writes that might take place as a result of the execution of those native commands. Remember that the BSOD in Windows comes as a result of the execution of a valid set of x86 instructions that mess up memory in a way that stops the application/system from functioning properly. What TM is talking about here would not affect that sort of thing at all (the chip logic would have no idea that writing to memory location x would screw up the running of your app).
This implies to me that whatever mechanism TM is using to quickly translate (say) x86 -> TM instruction set can produce a set of instructions that make no sense (for instance, a value is written to a register and then another value is written to the same register without the first value ever being used -- that might not be a very good example, but it's that sort of thing).
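The "written twice without being read" case is exactly the sort of thing a trivial liveness pass would catch. A throwaway C version (the insn struct is invented for illustration):
----------
/* Trivial dead-write check over a translated block, matching the
 * "register written twice with no read in between" example above.
 * The insn struct is invented for illustration. */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int dest_reg;      /* register written, or -1 if none   */
    int src_regs[2];   /* registers read,   or -1 if unused */
} insn;

bool has_dead_write(const insn *block, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (block[i].dest_reg < 0)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            if (block[j].src_regs[0] == block[i].dest_reg ||
                block[j].src_regs[1] == block[i].dest_reg)
                break;                     /* value was used, not dead   */
            if (block[j].dest_reg == block[i].dest_reg)
                return true;               /* overwritten before any use */
        }
    }
    return false;
}
----------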
The Prior Art section is excellent... (Score:2)
It is cogent, well written and covers a lot of ground. Someone really did their homework on that!
Much of the rest of the patent application is as deliberately dense as they can make it. Including one run-on sentence that would take me three huge breaths to speak aloud :-)
For information on what this thing actually does, read the 'DETAILED DESCRIPTION' section. One interesting fact gleaned from a quick reading is that the emulation co-processor is called a 'morph host' and apparently executes some kind of special opcodes used for emulation. So to do the emulation you write 'code morphing software' that translates incoming instructions into the 'morph host' instruction set. Very cool! And, of course, there's the 'transactioning' and error checking stuff noted in prior posts.
It is looking more and more like the early rumors of a Transmeta 'emulate anything' design were on the nose...
Jack
Have we really thought this through yet?? (Score:3)
Couldn't they theoretically be working on a system that would allow you to run MULTIPLE instruction sets inside of a single OS?? The implications would turn the existing software industry (of which I am a part) on its ear!
Could we actually have a box running some form of Unix, and be able to run ANY application natively on it -- no matter what OS it was written for?? Think about running a BeOS app next to a Win32 app, next to an application compiled for i386 Red Hat! WOAH.
If this is even close to what actually exists in Transmeta's labs, then we are in for a serious roller coaster over the next couple of years!
Quivering with anticipation...
p.
Re:Hmm ... (Score:5)
TRANSlatingMETAprocessor?
Re:Code Morphing..... (Score:2)
Re:Transmeta (Score:2)
PowerPC-based AS/400's don't have microcode in the CPU, as far as I know. The older IMPI ones had two levels of what was called "microcode", but the Inside the AS/400 book by Frank Soltis (one of the architects of S/38 and AS/400) said the "vertical microcode" was just machine code and was called "microcode" for legal reasons (if it was software, IBM would have to unbundle it; it was "microcode", however, which meant they could bundle it with the hardware). The "horizontal microcode" was conventional microcode, used to implement the IMPI instruction set.
I.e., the emulation is done largely in software, by translation of the high-level "MI" instruction set into the native instruction set (IMPI or extended PowerPC), although that software was, at one point, called "microcode".
The processor described in the various Transmeta patents also appears to do that translation in software, not hardware; this patent says
(emphasis mine).
exception trapping and the Alpha (Score:2)
That's exactly what I got out of that part. And it sounds pretty cool. This would particularly have applications in multiprocessing systems.
On the Alpha, we handle exceptions using something called trap barriers, which is a software method of handling this sort of thing. What happens is that a fault appears to issue from the trapb, and you are left to your own devices (from a compiler perspective) to discover where the exception actually occurred. It isolates the exception down to what's called a "trap shadow". This translates to a pain in the ass because we don't know precisely where a fault issued from, only the "shadow". Multiprocessors complicate this mess further. This makes for interesting, but complicated, compiler development.
Moving this to hardware, OTOH, would greatly simplify things, especially when emulation adds a layer of obfuscation.
That's why the exception handling part is what I zeroed in on. It sounds really neat.
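The "gated store buffer" from the claims is presumably what makes the hardware rollback possible: stores sit in the buffer until the block is known to be clean. A software caricature of the idea, purely my own sketch:
----------
/* Software caricature of a gated store buffer: speculative stores are
 * queued, then either committed to real memory or thrown away if the
 * translated block faulted.  Purely illustrative -- the real thing is
 * hardware, and this sketch doesn't even bother with overflow. */
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t addr; uint8_t value; } pending_store;

#define MAX_PENDING 64
static pending_store buf[MAX_PENDING];
static size_t n_pending;

void speculative_store(uintptr_t addr, uint8_t value)
{
    buf[n_pending].addr  = addr;      /* held back, not yet visible */
    buf[n_pending].value = value;
    n_pending++;
}

void commit(void)                     /* block finished cleanly */
{
    for (size_t i = 0; i < n_pending; i++)
        *(volatile uint8_t *)buf[i].addr = buf[i].value;
    n_pending = 0;
}

void rollback(void)                   /* exception: drop everything */
{
    n_pending = 0;
}
----------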
Clustering? (Score:2)
You do, however, raise an interesting notion: if the patented matters reveal a "protocol" for allowing this emulation, it may make it plausible to have multiple "little processors" doing work and only changing Real Memory when it makes sense to commit the work.
It would certainly be neat if this were amenable to putting a bunch of "little processors" working together. The communication takes place at a much lower level than Beowulf; it may even be at a lower level than is done with SMP.
Re:Processors (Score:2)
Re:Hmm... (Score:2)
But if they could do something like that -- not just for Java, but other environments that do better with dynamic compiling (like polymorphic OO systems, e.g. Smalltalk, Common Lisp) -- that would mean a real revolution in programming. The advantages of C for anything other than systems programming would be greatly diminished.
Of course, if the translation is all hardcoded, that's unlikely to be very helpful for higher-level languages. And maybe the translation assumes some sort of commonality -- registers and the sort -- that most processors share, but wouldn't be shared by most sorts of bytecodes. This reminds me of what Linus was talking about in his article on the portability of Linux.
Re:WAKE UP (Score:4)
Much of the audience may not be familiar with AS/400's, so that's not necessarily much of a hint.
System/38 and AS/400 compilers generate code in a high-level pseudo instruction set; the low-level OS kernel, when told to run one of those programs, translates it into the native instruction set and runs that. (See Frank Soltis' Inside the AS/400; go to the 29th Street Press's home page [29thstreetpress.com] and select "General Interest" under "*** ALL AS/400 TOPICS ***", and then look for that book, which they claim to have online - the URLs on that site look depressingly dynamically-generated, so I'm loath to make a direct link.)
This let them change the native instruction set from the apparently 360-flavored "IMPI" to an extended PowerPC instruction set without requiring people to recompile programs (unless they tossed out the pseudo instruction set code to save disk space).
From the various Transmeta patents, it sounds as if they're building a chip intended to be used in an environment making use of binary-to-binary translation, as the S/38 and AS/400 do, but it's not at all clear that they intend to use B2B translation in exactly the same fashion - they appear to be targeting existing low-level instruction sets, e.g. x86, rather than some high-level instruction set like the S/38 and AS/400 "MI".
what it REALLY does (Score:2)
There's a somewhat interesting write-up on CNN [cnet.com] (from the time of the first patent, Nov. '98). There seemed to be some posts that missed who Transmeta really is -- it's backed by Paul Allen, who also owns Interval (another think tank). His whole goal has been to recreate his PARC days, when really smart people could team up and work on just about anything they wanted (the result we all know, since we're using it).
Transmeta's computer does at the processor level what JIT and Java do for software. Java lets you write one program and run it on many OS's. JIT speeds that process up by translating Java byte codes into native code on the fly.
The Transmeta box will allow a chip manufacturer to make a single chip that will run any OS, and (by caching instruction conversions, as well as memoizing repeated instructions) actually run them all faster than the zillion chips AMD, Intel and the rest are cranking out.
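The "memoizing repeated instructions" bit sounds like the usual interpret-first, translate-when-hot trick. A toy version in C (the threshold, table size and helper functions are all made up):
----------
/* Interpret-first, translate-when-hot: a toy take on "memoizing
 * repeated instructions".  The threshold, table size and helper
 * functions are all made up. */
#include <stdint.h>

#define HOT_THRESHOLD 50
#define SLOTS         4096

extern int  already_translated(uint32_t pc);
extern void translate_and_run(uint32_t pc);   /* cached host code   */
extern void interpret_block(uint32_t pc);     /* slow, one-off path */

static unsigned hit_count[SLOTS];

void execute(uint32_t pc)
{
    if (already_translated(pc)) {
        translate_and_run(pc);                /* reuse the cached translation    */
    } else if (++hit_count[pc % SLOTS] >= HOT_THRESHOLD) {
        translate_and_run(pc);                /* hot enough to be worth morphing */
    } else {
        interpret_block(pc);                  /* not hot yet, just emulate it    */
    }
}
----------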
Think about it: universal hardware, universal applications, and a plethora of invisible middleware.
Welcome to the future. You heard it here first. Too bad you can't buy stock in Transmeta....
$.02
Potential Hardware to Support Binary Retargeting (Score:2)
Given the rate of corporate take-overs, you could quite easily end up running a zillion different systems and lose valuable time trying to consolidate everything.
Oh well, add this to the speculation pile along with everyone else.
LL
Re:No you're all wrong: it's for Emacs (Score:5)