Transmeta Awarded Another Patent
Eric_Scheirer writes "You can read it here. Can someone explain to me what it means?"
An age is called Dark not because the light fails to shine, but because people refuse to see it. -- James Michener, "Space"
Re:Behind the technology - the business (Score:2)
Re:Hypocrites! (Score:1)
Actually, it doesn't know there won't be an error (Score:2)
This patent exists because they can't determine ahead of time if there will be an error. This patent, in conjunction with one of their other patents, provides a method for them to muddle on in the usual case of no error and still have a means of rolling back in the less usual case of an error.
This particular patent covers stores that are speculatively executed, but may need to be killed because an instruction that occurred logically before them in the original code stream faulted.
--Joe--
Re:What it Really does (Score:1)
Re:What it Really does (Score:1)
Re:It means .. (Score:1)
Sure, and I use that state-of-the-art parallelizing compiler on my 8-fold XEON SMP system to get a blazing fast application.. oh, sorry, there is no such compiler that turns a random program efficiently into a parallelized one? Oops.
So please tell me why the engineers at Intel or AMD don't fire on their logic optimizers to implement the processor instructions in such an optimal manner?
Re:YES, that's what I got (Score:1)
Just in case you still don't get it... (Score:1)
They were awarded a patent for a universal hardware-based processor emulator.
Generally speaking, of course... this is similar to how AMD's K6 family of processors works, by converting x86 instructions to faster RISC instructions, but theirs operates on a broader scale.
Patent requirements (Score:2)
When writing a patent I believe you are required to phrase the abstract as one sentence. I suppose it was originally intended to show the purpose of a patent as shortly and concisely as possible.
That clearly is not happening. But like so many other things to do with the patent office, this outdated requirement has been preserved.
LoppEar
Run it through babelfish...And play backwards! (Score:1)
-BF
Re:Processors (Score:1)
Re:What about the "permanent bit" (Score:2)
But why have any reference to permanent storage if the data just gets shipped back and forth between caches?
Re:Hmm ... (Score:1)
Re:Huh? - Time to call on your Grammar classes =] (Score:1)
"circuitry for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor"
Here is how it should look:
"circuitry for 'permanently storing memory stores' temporarily stored when...."
It might be easier if you abbreviate it..
"circuitry for PSMS's temporarily stored when.."
Make sense now? =] The whole dern document is like that. It's sick..I had to read certain sentences multiple times.. =] It's like reading a book with no periods, no capitals, no nothing. Insane. But yeah...Still sounds cool.. =]
Re:Wait... (Score:1)
I didn't know he had a doctorate.
I didn't even know he owned Transmeta.
But is instruction level compatibility enough? (Score:1)
anybody work in chip fab? (Score:1)
Re:Hmm ...[wrong url sorry] (Score:1)
VLIW [ibm.com]
Stephen King??? (Score:2)
The attorney's name is Stephen King? Is this whole thing some kind of cruel quasi-technological/horrific hoax?
Re:Wait... (Score:1)
Translation for Geeks (Score:1)
0000000 7041 6170 6172 7574 2073 6f66 2072 7375
0000010 2065 6e69 6120 7020 6f72 6563 7373 6e69
0000020 2067 7973 7473 6d65 6820 7661 6e69 2067
0000030 2061 6f68 7473 7020 6f72 6563 7373 726f
0000040 6320 7061 6261 656c 6f20 2066 7865 6365
....
Geek talk vs. lawyer talk.
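For anyone who would rather not decode the dump by hand, here is a minimal sketch. It assumes the dump came from something like od on a little-endian machine, so each 16-bit word holds two ASCII bytes in swapped order; the function name decode_od_words is made up for illustration.

```python
import struct

# The first five lines of the dump above (offset column, then 16-bit hex words).
DUMP = """\
0000000 7041 6170 6172 7574 2073 6f66 2072 7375
0000010 2065 6e69 6120 7020 6f72 6563 7373 6e69
0000020 2067 7973 7473 6d65 6820 7661 6e69 2067
0000030 2061 6f68 7473 7020 6f72 6563 7373 726f
0000040 6320 7061 6261 656c 6f20 2066 7865 6365
"""


def decode_od_words(dump: str) -> str:
    """Turn od-style 16-bit hex words back into text (little-endian assumed)."""
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        for word in fields[1:]:  # fields[0] is the offset column
            out += struct.pack("<H", int(word, 16))
    return out.decode("ascii")


decoded = decode_od_words(DUMP)
print(decoded)
```

The decoded text is the opening of the patent's title, cut off where the dump stops.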
Re:What if... (Score:2)
Read the patent, not the comments. Many people seem to think the processor would translate instructions itself, perhaps because the patent goes on about
but the patent later indicates that "the processor" does that translation by running translation software:
Not bother storing the results of that instruction.
...or, at least, make it look as if it didn't.
Why? The error could trap, and the trap handler (or code it invokes) could do whatever is necessary to simulate what the processor being emulated would do in that error situation (although the "exceptions" they talk about aren't necessarily errors - I scrolled past one example of "native-Transmeta" code, generated from x86 code, that assumed that the code doesn't make unaligned memory references that cross a page boundary; if that happened, "either hardware or software alignment fix up" would detect this, and perhaps generate more pessimistic and slower code and restart the emulation running the new code).
Hmm ... (Score:2)
I think it said "we've got a cpu that pretends to be another cpu and it needs a place to store the instructions it's actually running to run the instructions it thinks it's running, and we're patenting how it does that." But I'm probably wrong.
It fits with horizontal microprogramming... (Score:1)
Now, the specifics in this case seem to be a provision for caching the results of parallel instruction execution and then either voiding or writing that cache depending on whether the instructions cause an exception or not. That in itself is nothing new, but in combination with "just-in-time" compilation of, say, x86 code to Transmeta microcode it might be. Particularly if the Transmeta processor uses horizontal microprogramming (read, "very very very long instruction word") to speed up the processing. (Loosely speaking, with horizontal microcode, each bit in the very long instruction word (could be hundreds of bits) maps to a discrete piece of logic (gate, flip/flop, etc) in the processor. Given an appropriate processor design it might be possible to map several instructions in a more vertical set (x86, PPC, etc) to a single wide microinstruction, effectively executing all of those in parallel, but then you really need some way of flushing everything if it screws up. Which this patent provides.)
(It'll be interesting to see if the active microcode store is loadable from RAM (making it end-user microprogrammable) or just from a fixed set of microprograms on ROM (which may live on the CPU die)).
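As a rough illustration of the "one bit per piece of logic" idea, here is a toy sketch. The control-line names and the six-bit width are invented for the example (a real horizontal microword could be hundreds of bits), and nothing here is taken from the patent.

```python
# Hypothetical control lines; bit i of a microword drives CONTROL_LINES[i].
CONTROL_LINES = ("alu_add", "alu_shift", "reg_write",
                 "mem_read", "mem_write", "pc_increment")


def microword(*lines):
    """Build a microword with one bit set per named control line."""
    word = 0
    for name in lines:
        word |= 1 << CONTROL_LINES.index(name)
    return word


def active_lines(word):
    """List the control lines a microword would assert."""
    return [name for bit, name in enumerate(CONTROL_LINES) if (word >> bit) & 1]


def fuse(first, second):
    """Merge two operations into one wide word if they share no control line."""
    if first & second:
        raise ValueError("both operations need the same control line")
    return first | second


add_op = microword("alu_add", "reg_write")
load_op = microword("mem_read")
fused = fuse(add_op, load_op)
print(active_lines(fused))
```

Two "vertical" operations that touch disjoint logic collapse into one word here; operations that compete for the same line cannot be fused in this toy model.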
Re:Hmm ... (Score:1)
A compiling processor! (Score:1)
If the TransMeta ideal propagates widely, it will be a new world of software design. Instead of compiling for a processor, you only need to pre-compile into a pseudocode, provide a description of that pseudocode to TransMeta, and the processor takes care of final translation, and apparently a good deal of the debugging work too. As a development platform this sounds like a serious ideal. In a world of open source and platform independence, the TransMeta sounds like a real solution.
Yeah, I'm a Mac programmer. You got a problem with that?
Rolling back is useful in translation. (Score:2)
If you're doing ISA->ISA translation, rollback is very, very useful. Suppose we have an x86-style instruction like this:
MOV [EAX*4 + 4], EBX
This particular instruction does several things: it reads EAX, multiplies it by 4, adds 4 to the result, and then stores EBX at the resulting address in memory. This might break up into several RISC-like ops (written in the more traditional RISC form OP src, src, dst for clarity):
SHL EAX, 2, tmp1
ADD tmp1, 4, tmp2
STORE EBX, *tmp2
It doesn't take a brain surgeon to see that these steps could overlap their execution with other instructions. For instance, the instruction that calculates EBX could overlap execution with the left-shift and the add. If the original instruction was in a tight loop, then it could even overlap with itself! Why is this important?
Say you have some code which is stepping through the array, and say that the array spans a page boundary. And, say that the second page isn't "paged in." When the loop hits the page boundary, a fault will occur. Because the stores are being spooled extremely rapidly, the loop may not be informed that a particular STORE faulted until several other stores were executed. All stores after the faulting one need to be killed since we need to process exceptions in a precise order. Here's where the rollback becomes handy: We merely discard the extra, incorrect stores, and roll back the processor state to be consistent with the emulated state of the machine at the time of the fault.
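A minimal sketch of that scenario, under stated assumptions: a 4096-byte page, a dict standing in for memory, and made-up helper names (issue_stores, retire_stores). It is not how any real chip or the patent implements it, only the bookkeeping the paragraph describes: stores queue up speculatively, retire in program order, and the first faulting store kills itself and everything younger.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def issue_stores(first_eax, ebx_values):
    """Run the three micro-ops for successive EAX values, queueing each store."""
    pending = []
    eax = first_eax
    for ebx in ebx_values:
        tmp1 = eax << 2              # SHL EAX, 2, tmp1
        tmp2 = tmp1 + 4              # ADD tmp1, 4, tmp2
        pending.append((tmp2, ebx))  # STORE EBX, *tmp2 (buffered, not yet visible)
        eax += 1
    return pending


def retire_stores(pending, memory, mapped_pages):
    """Commit stores in program order; a fault kills that store and all younger ones."""
    for position, (address, value) in enumerate(pending):
        if address // PAGE_SIZE not in mapped_pages:
            killed = len(pending) - position
            return position, killed, address
        memory[address] = value
    return len(pending), 0, None


memory = {}
pending = issue_stores(first_eax=1021, ebx_values=[10, 11, 12, 13])
committed, killed, fault_address = retire_stores(pending, memory, mapped_pages={0})
print(committed, killed, fault_address)
```

With only page 0 mapped, the stores to 4088 and 4092 commit, and the stores to 4096 and 4100 are discarded, so memory matches what the emulated machine would show at the moment of the fault.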
This is what most current x86 clones do when they translate x86 instructions into "RISCops" or whatever they decide to call them. I'm guessing Transmeta is aiming to do a similar sort of translation, only with a more configurable flavor.
--Joe--
Looks like.... (Score:2)
Missing link? (Score:1)
Send the flying saucers in.
They will not produce processors (Score:1)
Re:It means .. (Score:1)
"The number of suckers born each minute doubles every 18 months."
Sounds like fast emulation (Score:1)
Summary of the Invention (Score:2)
Abstract (very) for the lazy folks (Score:3)
processor, and circuitry for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
hmmm???
Safe cache (Score:2)
This is a device for assisting in processor emulation: I believe it will hold commands in memory until it knows that they will execute without error. Quite a good idea.
Simple, elegant, and not obvious. All the requirements for a good patent.
This is really the sort of thing that Windoze needs: a 'this instruction would cause the program to do "bad stuff(TM)", so I won't allow it' check. It should stop a single process from taking whole systems down.
Re:My layman's explanation (Score:2)
If they DO get an exception ... then they use the described hardware to throw away the side effects of executing the code fragment, and interpret the x86 instructions from the start - doing all the proper instruction semantics .. when they get the exception again this time they know what the PC is and what all the flags and registers etc are
A few other things (Score:5)
In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels.
I'm not going to go into huge detail about VLIW machines (particularly since I don't know all that much about them
Regarding the instruction translation and subsequent caching I mentioned in my previous post, a quote from the patent illuminates the matter a little more:
The code morphing software of the microprocessor...includes a translator portion which decodes the instructions of the target application, converts those target instructions to the primitive host instructions capable of execution by the morph host, optimizes the operations required by the target instructions, reorders and schedules the primitive instructions into VLIW instructions (a translation) for the morph host, and executes the host VLIW instructions.
When the particular target instruction sequence is next encountered in running the application, the host translation will then be found in the translation buffer and immediately executed without the necessity of translating, optimizing, reordering, or rescheduling. Using the advanced techniques described below, it has been estimated that the translation for a target instruction (once completely translated) will be found in the translation buffer all but once for each one million or so executions of the translation. Consequently, after a first translation, all of the steps required for translation such as decoding, fetching primitive instructions, optimizing the primitive instructions, rescheduling into a host translation, and storing in the translation buffer may be eliminated from the processing required. Since the processor for which the target instructions were written must decode, fetch, reorder, and reschedule each instruction each time the instruction is executed, this drastically reduces the work required for executing the target instructions and increases the speed of the microprocessor of the present invention.
Transmeta seems to have an excellent idea here. They're caching optimized translations of the incoming instructions, so rather than have to translate and optimize over and over each time you see that bit of code, you do it once and then just grab it from the cache. Due to the spatial and temporal locality of programs (i.e., the fact that your accesses to instructions are not random, but are localized in loops, etc.), this cache ("translation buffer") will only fail to have a translation present once every million instructions. So you're doing *one* translation every million cycles, rather than a million translations like current processors would have to do. Interestingly enough, a scheme like this was brought up as a discussion item in my Superscalar Processor Design class a couple of weeks ago, though my professor used the example of a specialized Alpha decoding/translating x86 and caching the results. One might even write the translations back out to disk as an attachment to the original executable, so that the next time you run the program that's fewer translations you have to do, and eventually you'll have a fully translated version on your hard disk for optimal speed. I guess we'll just have to wait to see if Transmeta does something similar.
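To make the caching idea concrete, here is a small sketch. The class name TranslationBuffer, the toy_translate stand-in, and keying the cache by target address are all assumptions for illustration; the patent text quoted above does not spell out the data structure.

```python
def toy_translate(target_block):
    """Stand-in for the expensive decode/optimize/schedule step."""
    return tuple(f"host:{insn}" for insn in target_block)


class TranslationBuffer:
    """Cache of host translations, keyed here by target address."""

    def __init__(self, translator):
        self.translator = translator
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, address, target_block):
        translation = self.entries.get(address)
        if translation is None:
            # First encounter: pay the translation cost once and remember it.
            self.misses += 1
            translation = self.translator(target_block)
            self.entries[address] = translation
        else:
            self.hits += 1
        return translation


buffer = TranslationBuffer(toy_translate)
loop_body = ("mov", "add", "cmp", "jne")
for _ in range(1000):
    host_code = buffer.fetch(0x1000, loop_body)
print(buffer.misses, buffer.hits)
```

A loop body executed a thousand times is translated once and served from the buffer on every later pass, which is the locality argument in miniature.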
One embodiment of the enhanced hardware includes sixty-four working registers in the integer unit and thirty-two working registers in the floating point unit. The embodiment also includes an enhanced set of target registers which include all of the frequently changed registers of the target processor necessary to provide the state of that processor; these include condition control registers and other registers necessary for control of the simulated system.
It seems this new chip is going to have a lot of registers. As Cartman would say, sweeeeeet!
The patent [164.195.100.11] also provides some sample C code, the corresponding x86 assembly, and some sample optimizations the Transmeta system may perform. It's a little more than half way down the page, if you want to look, just scroll until you see code
Re:Hmm ... (Score:2)
Re:What it Really does (Score:3)
Happens all the time, although there's already ways of dealing with it. Consider virtual memory. Having to redo an instruction, because some exception occurred in the middle of it, isn't very .. um .. exceptional.
But I can't think of how this relates to the Transmeta speculations. Well, actually I can, but my theory is so wild-ass that everyone would laugh at me.
Oh, what the hell. This is as good a place as any for me to make a complete fool of myself... I think Transmeta is making a display circuit that instead of fetching each pixel from a frame buffer, executes a little program for each pixel. The program must execute incredibly fast since the result must be available before the horizontal scan goes to the next pixel.
There, I said it. Now everyone can back away from me quietly, and then point and laugh when they reach a safe distance.
---
Have a Sloppy day!
Re:FPGA - Field Gate Processors (Score:2)
The patent says that "emulation software" would translate x86 or whatever code into "native Transmeta" code (see other postings of mine in this thread, many of which amount to "software translation, dammit, not hardware translation").
As such, I don't know why this need involve any FPGAs at all - the patent doesn't seem to describe a processor that can be configured at the hardware level to run arbitrary instruction sets, it appears to describe a processor that lets software (presumably running on that processor) translate other instruction sets into the native instruction set making optimistic assumptions about what the code being translated does, get exceptions if those assumptions are invalid (with the exception handler presumably doing more pessimistic translations and retrying with the new code), and not have to worry about irreversible state changes having been made by overly-optimistically-translated code.
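A toy sketch of that optimistic-then-pessimistic loop, with loud caveats: the alignment check is a software stand-in for whatever trap the real hardware raises, and every name here (AssumptionFailed, translate_optimistic, translate_pessimistic, run_load) is hypothetical.

```python
class AssumptionFailed(Exception):
    """Raised when fast code meets a case it was translated to ignore."""


def translate_optimistic(address):
    def fast_load(memory):
        if address % 4:  # stand-in for a hardware alignment trap
            raise AssumptionFailed(address)
        return int.from_bytes(memory[address:address + 4], "little")
    return fast_load


def translate_pessimistic(address):
    def slow_load(memory):
        value = 0
        for offset in range(4):  # byte at a time, safe for any address
            value |= memory[address + offset] << (8 * offset)
        return value
    return slow_load


def run_load(memory, address, translations):
    """Run the cached translation; on a failed assumption, retranslate and retry."""
    code = translations.setdefault(address, translate_optimistic(address))
    try:
        return code(memory)
    except AssumptionFailed:
        translations[address] = translate_pessimistic(address)
        return translations[address](memory)


memory = bytes(range(16))
translations = {}
aligned = run_load(memory, 4, translations)
unaligned = run_load(memory, 5, translations)
print(hex(aligned), hex(unaligned), translations[5].__name__)
```

The aligned load keeps its fast translation, while the unaligned one is replaced by the slower, safer version after the first failure.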
Further clarification (Score:5)
The beginning of the patent ("claims") is essentially just a list of things that all modern, superscalar, out-of-order processors do, and saying "hey we do this too".
Basically, out-of-order machines execute instructions out of their program order (hence the name
If you get past all the uninteresting stuff like that in the beginning, you'll find the following:
"The present invention overcomes the problems of the prior art and provides a microprocessor which is faster than microprocessors of the prior art, is capable of running all of the software for all of the operating systems which may be run by a large number of families of prior art microprocessors, yet is less expensive than prior art microprocessors. "
The idea, it seems, is that rather than making complex hardware to execute the instructions and perform speed enhancements, they're doing speed optimizations in software, which in turn allows very simple hardware (which in turn should translate to really high clock speeds). It seems that Transmeta's bet with this is that the penalty incurred by doing software rather than hardware optimizations is offset by the increase in clock speed and decrease in hardware cost.
Using such an approach should also make running multiple instruction sets a much easier task. Currently processors do their instruction decoding in hardware. But if Transmeta has managed to do this decoding (fast) in software, then they can just add a little more software to allow multiple instruction sets. They also seem to be caching the translations of non-native to native instructions in a memory structure of some sort, so that they minimize the redundant emulation computations.
Actually, to address gupg's comment, it also seems that they should not need *any* special compiler support, because they can run stuff that was compiled for any of the various instruction sets they choose to support. So they themselves should not need to do compiler work. I would guess that the reason they're hiring all sorts of compiler folks is that they need people to do the afore-mentioned software instruction translation, and the people best suited for that are compiler people since they work on the instruction level all the time. Most other programmers don't have to deal with anything other than high-level languages, and so would not be particularly well suited to doing what Transmeta is doing.
Anyway, hopefully this explained things a bit more to everyone. My reading and explanation of the patent was pretty quick since I have to go to class in a few minutes. I'll finish reading the patent afterwards and add anything else I think you might like to know.
Cheers,
Stradivarius
Re:More from the Patent (Score:2)
You have to admit that translating code to native in parallel with actual execution would be pretty cool though.
It's a hardware-assisted just-in-time compiler... (Score:2)
This patent (and, yes it's in English - but patent-lawyer English) apparently implies a hardware-based mechanism to store translated instructions in an on-chip cache and then execute them afterwards, hoping that at that point other tricks like pipelines and multiple instruction units will be able to do their thing.
In a normal emulator, you get relatively little benefit from the normal on-chip caches and pipelines. This would seem like an interesting way to speed up an x86 (or even PPC) emulator.
And if you think there's little use for this, think "Java Virtual Machine". Think "hardware-assisted just-in-time compiler"...
Re:Summary (Score:2)
Umm, because they don't have the source code to all the, say, x86-architecture programs they might want to run?
"Is going to make" isn't the same as "has made". Yes, typing make to get "native-Transmeta" machine code for your application may not require all the work that this patent involves, but it involves, instead, waiting for open source versions of the programs they're interested in showing up, and they may not be willing to wait for that.
Transmeta Patents Dynamic Recompilation? (Score:2)
I doubt it has the same kind of exception handling as we see described in this patent, though. Them TransMetans do some funky stuff
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
Once you pull the pin, Mr. Grenade is no longer your friend.
Re:I see one problem.... (Score:2)
Add more code to their binary-to-binary translator software to handle that new instruction. The processor isn't doing the translation, except to the extent that it runs the translation software (see other postings of mine in this thread for the quote from the patent that speaks of "emulation software").
Behind the technology - the business (Score:2)
I concur, and I would advise reading of the DETAILED DESCRIPTION if you scroll the page down a bit.
What is interesting is that they appear to have created a CPU capable of running applications designed for one of many target systems (Intel x86/Pentium, PPC, Postscript and Java even) by buffering the instructions, optimising them, and then checking their execution for errors before execution occurs. Quite brilliant and mind-bogglingly complicated.
Note the business angles hinted at: speed and optimisation come at significant cost; the cost of producing any microprocessor is out of reach of most companies (implying a mass market); a large number of applications are written for many targets (Windows, Java, etc.); and there are problems associated with traditional thinking with regard to optimization and parallel processing.
To create a microprocessor which overcomes the above at viable cost to both manufacturer and customer would be enormous!
Just think of it, you're running the Transmeta CPU which is running some OS, and running Office 2000 through it and knowing that the CPU will trap any problems before they occur! This is a hardware VM-Ware!
Ooh I'm drooling already!
James Green
Re:Looks like.... (Score:2)
Combine it with a slot T scheme (T for transmeta
Perhaps their little CPU thing will also let it experiment if there are multiple CPUs and idle time. Send an instruction off to CPU A and CPU B at the same time and see who gets a valid answer back the soonest. That might be why it needs its own memory - so it can queue the same task up for multiple CPUs but then delete it from the other CPUs queue once it has a reply back from whichever CPU got to it first.
Combine that with a CPU bus and things could get interesting. Perhaps you stick the transmeta CPU in a regular CPU slot/socket, and then stick a daughter board into an AGP-like slot.
Or alternatively, they could be going a bit like the amiga, and have semi-specialised CPUs, but which can also be used for other things. So your sound card can also do general CPU tasks if it isn't playing enough audio to take up its full load, and your video card could do other stuff. Though I tend to think it'd be more sensible to just form a CPU bus for all the cards, and just have a set of adapter plugs on the m/b. So you could get a m/b with 5 peripheral ports, so if you want to use 1 visual, 1 audio, 1 joystick, 1 mouse and 1 keyboard then that's fine, but they could also do 2 visuals, 2 audio and 1 keyboard or whatever.
So that way you get a semi-specialized video CPU card if you want hot graphics, but its CPU can also be used for general stuff (and the transmeta controller helps with translations), and you might get a fast general CPU for regular stuff - but it can also do your graphics work as well if it gets to be too much for the video CPU card alone (or you just don't have one at all).
So you could mix'n'match differently specialized CPUs on the CPU bus, as well as just add a new one every year for the latest speed and keep the old one there too (presumably the controller CPU would be able to shut down the slowest CPUs to save on electricity if they weren't needed).
Though I don't think the CPU bus is too realistic, as is technology really up to that kind of thing?
(a fibre optic backbone between CPUs that multicasts requests and the CPUs just pick up requests based on how full their internal queues are?)
I don't know, the possibilities are endless. But I'm quite content to just wait and see what (if anything) transmeta actually comes out with. It's just fun to play around with guessing
Hmm, maybe I should get an account here someday.
--Vastor.
Re: How do you do this all and make it cheaper? (Score:2)
Because the only instructions sent to the processor (after optimization) are instructions that are known to succeed.
The process of optimization is based in software as well- the instruction translation (code morphing, they call it) software is written in code native to the VLIW chip.
I.e., there are no speculative instruction paths on the VLIW chip. There are something like 4? on the PIII.
In other words, the chip can have about 4x fewer transistors than a comparable x86 chip.
This means:
Simpler chips can also be run stably at higher clock frequencies than more complex chips of the same manufacturing process. (.18 micron, .22, etc)
Also, the optimized instructions have 70% or fewer operations than the original instructions.
I'm getting some of this from their earlier patent.
I think the Instruction Translation Cache is it. (Score:2)
From my understanding of Intel and AMD CPUs, what they do is convert the x86 instructions into groups of RISC instructions, which are then run by the core processor.
What the TransMeta CPU does is CACHE the results of the translations into a multi-megabyte on chip buffer.
So, while a Pentium III takes up to 20 cpu cycles to decode some of the more complex instructions, the TransMeta CPU takes even longer, but makes a better optimization. But once it's translated it's buffered away so that if it's needed again soon, it takes ZERO cycles to decode.
The TransMeta CPU then justifies the time cost by taking GROUPS of instructions, optimizing the hell out of them, taking as long as it needs, and then filing the result for future use.
All the exception handling stuff is needed in case an exception happens in the middle of a group.
Say, for example your program contains the instructions "A,B,C,D".
The Transmeta CPU translates this into "1,2,3".
Further, lets say an error occurs at step "2".
The CPU then Rolls back (read up on Transaction processing) to the state before "1" executed.
Then it translates the instructions one at a time until it recreates (or fails to recreate) the exception.
It then Commits the changes to the emulated registers, and reports the exception at the point when it occurs in the original code.
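The A,B,C,D walk-through can be sketched roughly as follows. This is a toy model under stated assumptions: a dict plays the emulated registers, copying the dict plays the checkpoint, the four op_ functions are invented stand-ins for target instructions, and the "translated" block is just the same operations run back to back without per-instruction commits.

```python
def op_a(regs):
    regs["a"] = 6


def op_b(regs):
    regs["b"] = regs["a"] * 2


def op_c(regs):
    regs["c"] = regs["b"] // regs["zero"]  # faults when zero == 0


def op_d(regs):
    regs["d"] = 1


def interpret_one_at_a_time(committed, target_block):
    """Slow path: commit after every instruction until the fault recurs."""
    for index, op in enumerate(target_block):
        scratch = dict(committed)
        try:
            op(scratch)
        except ArithmeticError:
            return index  # report the fault at this target instruction
        committed.clear()
        committed.update(scratch)
    return None


def run_translated(committed, target_block):
    """Fast path: run the whole group on a working copy, commit only on success."""
    working = dict(committed)  # checkpoint by copying
    try:
        for op in target_block:
            op(working)
    except ArithmeticError:
        # Roll back by discarding the working copy, then find the exact fault.
        return interpret_one_at_a_time(committed, target_block)
    committed.clear()
    committed.update(working)
    return None


registers = {"zero": 0}
fault_index = run_translated(registers, [op_a, op_b, op_c, op_d])
print(fault_index, registers)
```

After the rollback and replay, the registers hold the results of A and B only, and the fault is reported against C, which is where the original program would have seen it.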
Put simply, this thing will KICK INTEL'S ASS. Possible speed improvements of over 10 times.
And the same principles could be applied to any other CPU instruction set.
This patent does not appear to cover emulating multiple Instruction Sets at once, but nothing stops it from being applied in that manner. It would be just as hard as doing it with a 'Prior Art' CPU design.
Nor does it seem to be FPGA related, but I suppose FPGAs could be used somewhere in it.
Veil, indeed. (Score:2)
*digs around for his aspirin*
- dom
My layman's explanation (Score:5)
1. Set of instructions comes into processor in one instruction set (like x86).
2. This device stores the data for this series of instructions temporarily
3. The device translates the (x86) instructions into its own internal instruction set and figures out an ordering that will not cause it to have exceptions.
4. The device retrieves the temporary data and "fills in the blanks" in the "inner" processor to get results, the so called "permanent storage" is probably the inner processor's instruction cache.
5. The data is cleared from the interim area once it's acted upon.
What didn't turn out? (Score:2)
Two years ago, there was room for some serious profits on CPUs as they were the most expensive component in a computer system.
That has changed such that the most expensive component is commonly the hard disk, followed (with MSFT software) by the OS, with CPU in third or fourth place.
With that change, this leaves Transmeta without the viable "IA-32 market" they may have expected to have.
Based on the droppage in pricing, it is not clear that there is room to get vast decreases in pricing.
Of course, considering that Transmeta is fabless, and doesn't directly have a sales organization for CPUs, the goal might have been to construct technology to allow building cheaper IA-32 chips, and then license it to AMD or Intel or Cyrix.
I'm not sure any of them are necessarily interested to the tune of $Billions...
It means .. (Score:5)
The patent itself is more concerned with making sure that the conversion process occurs without any exceptions taking place .. or actually holding the processor state and waiting for a sequence of instructions to make sure no exception etc happens and then executing it on the host processor.
They obviously also need strong compiler support for such a processor which explains all the software and compiler people they have been recruiting.
Fun, fun, fun .. who says Computer architecture is dead !
Sumit [uci.edu]
Switching contexts in a rapid state??? (Score:2)
I also see a lot of stuff about pointer manipulations. Maybe this is at the core of how they will attempt this (i.e. keep all "processes" in memory with their own vm space and then "swap" 'em out when necessary).
In my rough perusal, I may have missed some very important details. =)
Justin
It means (Score:4)
My money is on the latter, maybe Linus whipped together a Perl script in his lunch hour?
It helps if you run it through babelfish... (Score:4)
That's from English->Portuguese->English
my gawd (Score:2)
I think I can imagine the patent officers that were reading this going "um, billy-bob, do you know what any of this means?" and "um, no earl-ray, I have no clue what they're talking about. Must be that internet/computer mumbo-jumbo. Guess we'll just have to give it an okay..."
starting at points #7, it starts to make a bit more sense... basically a machine running a program and then wiping that program out of the memory...
this part had me tho...
"means for transferring memory stores to the means for permanently storing memory stores, and
means for storing memory data replaced by the memory stores, "
here we go...
"This invention relates to computer systems and, more particularly, to methods and apparatus for providing an improved microprocessor. "
Another line that confuses the hell out of me...
"It is difficult and expensive to make a microprocessor run as fast as state of the art microprocessors"
um, I'm not sure whether to say "duh." or "huh?"
What they said... (Score:2)
They're taking data/code from one processor/platform and shipping it to another for work, then (presumably) shipping the results back. This will be tremendously useful in loadsharing situations where you don't have all the same hardware.
Picture a multiplatform Beowulf cluster built of a mixture of G3's, G4's, Pentium II's, Alpha's, SGI's, and a couple of Amiga's just to make it fun.
I guess you'd have to call this a Beomutt cluster. ;-)
D. Keith Higgs
CWRU. Kelvin Smith Library
What it Really does (Score:5)
The patent is for a co-processing unit that not Only translates a foreign instruction set into native instructions for a 'target processor', But acts as a go-between for that target processor and memory. It stores the processor state, and buffers any memory writes, until it is certain that a group of instructions has been run without exception or error... If the translated instructions crash, no damage is done. Not only is this amazing overall, but it allows for Very speculative, and Very fast, instruction translation and branch prediction...
Transmeta speculation (segfault style) (Score:2)
"It amazes us that the geeks were able to interpret 'Apparatus for use in a processing system' as 'Wow, they've got something faster than Intel!'. That was our intent, of course, but we hate to see our brethren fail a Turing test..."
A HERRING! (Score:4)
Really, tho', it could be a Red Herring. Transmeta could be cashing in on the popular assumption that they're going to create a wild new processor that'll be Everything to Everyone in order to disguise the fact that they're really in the process of opening the ULTIMATE multimedia porn site for cyber-trans-sexuals.
(Not that there's anything wrong with that...)
Re:Huh? (Score:2)
This is the same method Merced/McKinley uses, isn't it? Does that count as prior art?
Re:Hmm ... (Score:3)
Re:YES, that's what I got (Score:2)
Re:Veil, indeed. (Score:3)
Quick Summary (Score:3)
SUMMARY OF THE INVENTION
It is, therefore, an object of the present invention to provide a host processor with apparatus for enhancing the operation of a microprocessor which is less expensive than conventional state of the art microprocessors yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors.
This and other objects of the present invention are realized by apparatus for use in a processing system having a host processor capable of executing a first instruction set to assist in running instructions of a different instruction set which is translated to the first instruction set by the host processor comprising means for temporarily storing memory stores generated until a determination that a sequence of translated instructions will execute without exception or error on the host processor, means for permanently storing memory stores temporarily stored when a determination is made that a sequence of translated instructions will execute without exception or error on the host processor, and means for eliminating memory stores temporarily stored when a determination is made that a sequence of translated instructions will generate an exception or error on the host processor.
These and other objects and features of the invention will be better understood by reference to the detailed description which follows taken together with the drawings in which like elements are referred to by like designations throughout the several views.
Transmeta (Score:5)
I'm sorry, but that is the closest I can get to an answer with the available information.
Transmeta Patent (Score:2)
yet is compatible with and capable of running application programs and operating systems designed for other microprocessors at a faster rate than those other microprocessors..."
think of it like this: the cpu is capable of reading the instruction set for another architecture, figuring out what that architecture needs from the cpu, determining all possible instructions of that architecture, and "emulating" that architecture by a technique that allows the "emulation" to be as fast or faster than the original architecture (by taking advantage of the invented cpu's "extra" free stuff).
so, what that means is that the cpu would theoretically be able to run any OS designed for any instruction set (e.g. x86, alpha, mac, etc.)
or at least that's how i read it, but whoami
To be more specific... (Score:2)
As the patent points out, it applies equally to emulation on other (non-VLIW) superscalar architectures, but the emphasis does appear to be on VLIW.
Re:Safe cache (Score:2)
Quite like that FPGA supercomputer-on-a-desktop (Score:2)
This area is called Reconfigurable Computing, and it's been around for quite a few years (there's some quite reasonable supporting hardware available for it from Xilinx).
Transmeta's patent differs from that in the detail of course, but the general principle is remarkably similar, so much so that they've probably included references to it among the prior art somewhere.
This is not a new idea -- look at the PPro (Score:2)
Now we know what Transmeta does! (Score:2)
My take on this thing. (Score:2)
Huh? (Score:4)
I vote for it being the random patent generator. My favorite part of the whole soliloquy is
you can't beat that! Maybe they're really working on an optical processor and wanting us to think they're working on a universal processor that'll run any other processor's code. Good one, Linus (and others), but what's it really do?
They deserve to read it (Score:2)
Sure Looks Related To Their Other Patents (Score:3)
The basic idea for all of the patents has been to provide mechanisms to allow one to:
This parallels the notion of Lagrangian Relaxation, where you take a problem, with various restrictions, and relax those restrictions. In exploring the solution space, the system will find solutions that aren't in the feasible solution space of the (unrelaxed) problem.
In the case of Lagrangian Relaxation, the way of coping with that is to associate values with the objective function that penalize infeasible solutions, thus encouraging the system to head towards feasible solutions.
In the case of Patent 5958061, the "relaxation" is that the system performs the emulated instructions, modifying a temporary memory store, and rolling back when it hits cases where the preliminary emulation results in errors on the host processor.
Patent 5832205 concentrated, in contrast, on the apparatus to detect a failure of speculation.
Hardware assists for binary-to-binary translation? (Score:2)
If the code does get a fault ("exception or error" - this could be an exception without being an error, e.g. a page fault), then anything that code did "speculatively" and that wouldn't have been done by the untranslated code had it gotten that exception hasn't made any permanent state change, so the fault cancels/backs out any uncommitted state changes and presumably traps to software that would do whatever is necessary to do what the untranslated code would have done.
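A rough Python sketch of that fault path, under stated assumptions: the two-instruction "target" program, the register dict, and the names `interpret` and `run_with_rollback` are all made up for illustration. The translated version deliberately reorders the work, so when it faults it has already done something the original code would not have done yet. Rolling back to the checkpoint and re-running the original instructions in order recovers the state the untranslated code would have had at the fault.

```python
def interpret(target_code, regs):
    """Slow path: run the original (target) instructions one at a time, in order.

    Returns the index of the faulting instruction, or None if none faulted.
    """
    for i, (op, dst, a, b) in enumerate(target_code):
        try:
            if op == "add":
                regs[dst] = regs[a] + regs[b]
            elif op == "div":
                regs[dst] = regs[a] // regs[b]
            else:
                raise ValueError(f"unknown op {op!r}")
        except ZeroDivisionError:
            return i   # precise fault point in the original code
    return None


def run_with_rollback(translated, target_code, regs):
    """Try the fast translated code; on a fault, undo it and interpret instead."""
    checkpoint = dict(regs)        # snapshot of the committed register state
    try:
        translated(regs)           # may have reordered work speculatively
        return None
    except ZeroDivisionError:
        regs.clear()
        regs.update(checkpoint)    # back out the uncommitted changes
        return interpret(target_code, regs)


# Original order: r2 = r0 / r1, then r3 = r0 + r0.
target = [("div", "r2", "r0", "r1"), ("add", "r3", "r0", "r0")]


def translated(regs):
    # Hypothetical translation that hoists the add above the divide.
    regs["r3"] = regs["r0"] + regs["r0"]
    regs["r2"] = regs["r0"] // regs["r1"]
```

If `r1` is zero, the translated code writes `r3` before the divide faults; after the rollback and re-interpretation, `r3` is untouched and the fault is attributed to the first original instruction.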
WAKE UP (Score:2)
its for this > its for that
they are producing a system that runs a code template!
(if you don't know, work out how you can add PPC to an AS/400 and not recompile)
the product will have multithreading in
OK
peace enough of this guessing
john
a poor student @ bournemouth uni in the UK (a deltic so please dont moan about spelling but the content)
FPGA - Field-Programmable Gate Arrays (Score:2)
http://bersj.www.media.mit.edu/~vmb/papers/chid
http://www.atmel.com/atmel/products/prod3.htm
Basically it is a fully configurable CPU that can be programmed on the fly to fully dedicate to a single goal and complete it very quickly. Your standard Intel is designed for general number crunching and nothing much else in particular. But if you had control over what each register and logic gate did you could make your processor totally dedicated and streamlined to complete a single task. As well as being programmable, the cache is spread out so that each logic gate has its own cache rather than one lump of cache for the whole board; this also speeds things up.
Re:What it Really does (Score:2)
-Chris
Of Registers and other less obvious things (Score:2)
> These improvements include a gated store
> buffer and a large plurality of additional
> processor registers
"large plurality" sounds to me like a whole boatload more than the ones we used in those silly MIPS simulators to learn assembly theory and certainly more than any x86 chip I've seen.
I also saw some references to VLIW conversion, so I did another grep; I think this is one of the best paragraphs, and it's not in Greek...
----------
FIG. 2 is a diagram of morph host hardware designed in accordance with the present invention represented running the same application program which is being run on the CISC processor of FIG. 1(a). As may be seen, the microprocessor includes the code morphing software portion and the enhanced hardware morph host portion described above. The target application furnishes the target instructions to the code morphing software for translation into host instructions which the morph host is capable of executing. In the meantime, the target operating system receives calls from the target application program and transfers these to the code morphing software. In a preferred embodiment of the invention, the morph host is a very long instruction word (VLIW) processor which is designed with a plurality of processing channels. The overall operation of such a processor is further illustrated in FIG
--------
This pretty much gives away what people have been saying since the beginning. Morphing hardware AND software elements that work in conjunction to provide (drum roll) a fast as HELL computer. And it will run software we already have. And pretty darn near anything you throw at it. Want to be a Playstation for a day? How about an O2? Now switch back to Pentium II so you can type up that report and then become a G4 so you can make the graphics to insert in the presentation that accompanies it.
This thing will be doing the code morphing in parallel (which is what this invention seems to be... the morpher) and then run it on another fast chip that's related to one of the earlier patents. And it will all be controlled by a little driver that turns into a "layer 1 vmware" (now that our computers will need an OSI layer model...)
-Chris
YES, that's what I got (Score:2)
"The present invention overcomes the problems of the prior art and provides a microprocessor which is _faster_ than microprocessors of the prior art, is capable of running _all the software_ for _all the operating systems_ which may be run by a _large number of families of prior art microprocessors_, yet is less expensive than prior art microprocessors." (my emphasis)
in other words, the Holy Grail of computer architecture: processor emulation that's faster than the native processors. yes, sounds too good to be true, but at least it won't be vaporware...
Re:What about the "permanent bit" (Score:2)
Wha'dup with the "permanently storing memory stores temporarily stored?" It's pretty much decided that temporarily stored implies a cache. Take the code, translate it. Store it in the cache until it is verified, then execute it.
Sounds like their temporary cache of instructions can be sent somewhere else for storage once they have been verified.
Hmmm. HHHMMMMmmmmmm.....
So does that mean it can execute, say x86, code in emulation, and at the same time translate into native transmeta opcode in order to be run natively at a later time? It's doable. I can picture a basic flow diagram circuit in my head right now.
it has to do with fast translation and exceptions (Score:2)
Chances are the code fragments are basic blocks (between branches etc)
I think I've seen this idea by another name before so I'd guess there's prior art - but hey it's a patent you can read anything the lawyers can get away with into it ....
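For anyone unfamiliar with the term, the basic-block guess above can be illustrated with a short Python sketch. This is the textbook "leaders" method for carving straight-line fragments out of an instruction list, not something the patent spells out; the `(op, arg)` tuple format and the `jmp`/`br` opcode names are assumptions, with `arg` holding the target index for branches.

```python
def basic_blocks(code):
    """Split a list of (op, arg) instructions into basic blocks.

    For 'jmp' and 'br' instructions, arg is the index of the branch target
    (assumed to be a valid index into code). A block starts at the first
    instruction, at any branch target, and right after any branch.
    """
    leaders = {0} if code else set()
    for i, (op, arg) in enumerate(code):
        if op in ("jmp", "br"):
            leaders.add(arg)              # the branch target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)        # so does the fall-through instruction
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]
```

Each resulting fragment has one entry and one exit, which is what would make it a natural unit to translate, run speculatively, and then commit or throw away as a whole.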
Re:What it Really does (Score:2)
determination is made that a sequence of translated instructions will generate an exception or error on the host processor [emphasis mine]
It seems to me that what they are doing here is making sure that the translation is correct, i.e. that the native instructions make sense. It does not do anything about any memory writes that might take place as a result of the execution of those native commands. Remember that the BSOD in Windows comes as a result of the execution of a valid set of x86 instructions that mess up memory in a way that stops the application/system from functioning properly. What TM is talking about here would not affect that sort of thing at all (the chip logic would have no idea that writing to memlocation x would screw up the running of your app)
This implies to me that whatever mechanism TM is using to quickly translate (say) x86 -> TM instruction set can cause a set of instructions that make no sense (for instance a value is written to a register then another value written to the same register without the first value being used at all -- that might not be a very good example but it's that sort of thing )
The Prior Art section is excellent... (Score:2)
It is cogent, well written and covers a lot of ground. Someone really did their homework on that!
Much of the rest of the patent application is as deliberately dense as they can make it. Including one run-on sentence that would take me three huge breaths to speak aloud :-)
For information on what this thing actually does, read the 'DETAILED DESCRIPTION' section. One interesting fact gleaned from there in a quick reading is the fact the emulation co-processor is called a 'morph host' and it apparently executes some kind of special opcodes used for emulation. So to do the emulation you write 'code morphing software' that translates incoming instructions to the 'morph host' instruction set. Very Cool! And, of course, the 'transactioning' and error checking stuff noted in prior posts.
It is looking more and more like the early rumors of a Transmeta 'emulate anything' design were on the nose...
Jack
Have we really thought this through yet?? (Score:3)
Couldn't they theoretically (sic) be working on a system that would allow you to run MULTIPLE instruction sets inside of a single OS?? The implications would turn the existing software industry (of which I am a part) on its ear!
Could we actually have a box running some form of unix, and actually be able to run ANY application natively on it - no matter what OS it was written for?? Think about running a BeOS app next to a Win32 app, next to an application compiled for i386 Redhat! WOAH.
If this is even close to what actually exists in Transmeta's labs, then we are in for a serious roller coaster over the next couple of years!
Quivering with anticipation...
p.
Re:Hmm ... (Score:5)
TRANSlatingMETAprocessor?
Re:Code Morphing..... (Score:2)
Re:Transmeta (Score:2)
PowerPC-based AS/400's don't have microcode in the CPU, as far as I know. The older IMPI ones had two levels of what was called "microcode", but the Inside the AS/400 book by Frank Soltis (one of the architects of S/38 and AS/400) said the "vertical microcode" was just machine code and was called "microcode" for legal reasons (if it was software, IBM would have to unbundle it; it was "microcode", however, which meant they could bundle it with the hardware). The "horizontal microcode" was conventional microcode, used to implement the IMPI instruction set.
I.e., the emulation is done largely in software, by translation of the high-level "MI" instruction set into the native instruction set (IMPI or extended PowerPC), although that software was, at one point, called "microcode".
The processor described in the various Transmeta patents also appears to do that translation in software, not hardware; this patent says
(emphasis mine).
exception trapping and the Alpha (Score:2)
That's exactly what I got out of that part. And it sounds pretty cool. This would particularly have applications in multiprocessing systems.
On the Alpha, we handle exceptions using something called trap barriers, which is a software method of handling this sort of thing. What happens is a fault appears to issue from the trapb and you are left to your own devices (from a compiler perspective) to discover where the exception occurred. It isolates the exception down to what's called a "trap shadow". This translates to a pain in the ass because we don't know precisely where a fault issued from, only the "shadow". Multiprocessors complicate this mess further. This makes for interesting, but complicated compiler development.
Moving this to hardware, OTOH, would greatly simplify things, especially when emulation adds a layer of obfuscation.
That's why the exception handling part is what I zero'ed in on. It sounds really neat.
Clustering? (Score:2)
You do, however, show an interesting notion; if the patented matters reveal a "protocol" for allowing this emulation, it may make it plausible to have multiple "little processors" doing work, and getting to change Real Memory when it makes sense to commit the work.
It would certainly be neat if this were amenable to putting a bunch of "little processors" working together. The communication takes place at a much lower level than Beowulf; it may even be at a lower level than is done with SMP.
Re:Processors (Score:2)
Re:Hmm... (Score:2)
But if they could do something like that -- not just for Java, but other environments that do better with dynamic compiling (like polymorphic OO systems, e.g. Smalltalk, Common Lisp) -- that would mean a real revolution in programming. The advantages of C for anything other than systems programming would be greatly diminished.
Of course, if the translation is all hardcoded, that's unlikely to be very helpful for higher-level languages. And maybe the translation assumes some sort of commonality -- registers and the sort -- that most processors share, but wouldn't be shared by most sorts of bytecodes. This reminds me of what Linus was talking about in his article on the portability of Linux.
Re:WAKE UP (Score:4)
Much of the audience may not be familiar with AS/400's, so that's not necessarily much of a hint.
System/38 and AS/400 compilers generate code in a high-level pseudo instruction set; the low-level OS kernel, when told to run one of those programs, translates it into the native instruction set and runs that. (See Frank Soltis' Inside the AS/400; go to the 29th Street Press's home page [29thstreetpress.com] and select "General Interest" under "*** ALL AS/400 TOPICS ***", and then look for that book, which they claim to have online - the URLs on that site look depressingly dynamically-generated, so I'm loath to make a direct link.)
This let them change the native instruction set from the apparently 360-flavored "IMPI" to an extended PowerPC instruction set without requiring people to recompile programs (unless they tossed out the pseudo instruction set code to save disk space).
From the various Transmeta patents, it sounds as if they're building a chip intended to be used in an environment making use of binary-to-binary translation, as the S/38 and AS/400 do, but it's not at all clear that they intend to use B2B translation in exactly the same fashion - they appear to be targeting existing low-level instruction sets, e.g. x86, rather than some high-level instruction set like the S/38 and AS/400 "MI".
what it REALLY does (Score:2)
There's a somewhat interesting write-up on CNET [cnet.com] (from the time of the first patent, Nov. '98). There seemed to be some posts that missed who Transmeta really is - it's owned by Paul Allen, who also owns Interval (another think tank). His whole goal has been to recreate his PARC days, when really smart people could team up and work on just about anything they wanted (the result we all know, since we're using it).
Transmeta's computer does at the processor level what JIT and Java do for software. Java lets you write one program and run it on many OS's. JIT speeds that process by pre-translating Java bytecodes into native code.
The Transmeta box will allow a chip manufacturer to make a single chip that will run any OS, and (by caching instruction conversions, as well as memorizing repeated instructions) actually run them all faster than the zillion chips AMD, Intel and the rest are cranking out.
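Here is a small Python sketch of what "caching instruction conversions" could look like in software. Everything in it is hypothetical: the `TranslationCache` class, the toy `translate` function, and the `("inc", reg)` target instruction format are invented for the example and say nothing about how Transmeta actually does it. The point is only that a block of foreign code gets translated once and the result is reused on every later visit.

```python
class TranslationCache:
    """Translate a block of target code once, then reuse the host version."""

    def __init__(self, translate):
        self.translate = translate   # function: target block -> host callable
        self.cache = {}              # target address -> translated host callable
        self.misses = 0              # how many times we had to translate

    def run(self, addr, target_block, state):
        host_fn = self.cache.get(addr)
        if host_fn is None:
            # First visit to this address: pay the translation cost once.
            self.misses += 1
            host_fn = self.translate(target_block)
            self.cache[addr] = host_fn
        return host_fn(state)


def translate(target_block):
    """Hypothetical translator: folds ("inc", reg) instructions into one host function."""
    counts = {}
    for op, reg in target_block:
        if op != "inc":
            raise ValueError(f"unsupported op {op!r}")
        counts[reg] = counts.get(reg, 0) + 1

    def host_fn(state):
        # One addition per register instead of one per original instruction.
        for reg, n in counts.items():
            state[reg] = state.get(reg, 0) + n
        return state

    return host_fn
```

Keying the cache by address assumes the code at that address never changes; self-modifying code would need some kind of invalidation, which this sketch leaves out.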
Think about it: Universal hardware, universal applications, and a plethora of invisible middleware.
Welcome to the future. You heard it here first. Too bad you can't buy stock in Transmeta....
$.02
Potential Hardware to Support Binary Retargeting (Score:2)
Given the rate of corporate take-overs, you could quite easily end up running a zillion different systems and lose valuable time trying to consolidate everything.
Oh well, add this to the speculation pile along with everyone else.
LL
Re:No you're all wrong: it's for Emacs (Score:5)