Reconfigurable Supercomputers 181
VanL writes "
A previously unknown company
has come up with a supercomputer design using programmable logic components.
If this is right, this might be another case of a garage inventor changing
the computing paradigm. In the meantime, they are claiming incredible
things from their demo computer: It is the worlds fastest computer
(3-4x faster than IBM's Pacific Blue, and 10x faster than a Cray); it is
fault-tolerant enough that you could shoot a bullet through it and it would
keep on working; it will run any operating system out-of-the-box; and it
is the size of a normal desktop computer and runs off household current.
They call it HAL. ;) Check out the press
release, a news
story, and a more detailed description of the company and the technology
here."
Address is a Secret (Score:1)
'This once unheard-of company expects to be "heard of" this week, now that it has unveiled Hal for all to see. But its address is a secret, for security reasons.
When you have a computer that's worth more than its weight in solid gold, you lock it up tight, every night.'
From a Whois query on starbridgesystems.com...
Registrant:
Circa 65 (STARBRIDGESYSTEMS-DOM)
208 1/2 25th Street
Ogden, UT 84401
US
Domain Name: STARBRIDGESYSTEMS.COM
Administrative Contact, Technical Contact, Zone Contact:
Light, Doug (DL8191) dlight@LGCY.COM
(801)994-7337 (FAX) (801)994-7338
Billing Contact:
Gleason, Matt (MG11172) MGLEASON@CIRCA65.COM
(801)392-2600 (FAX) (801)392-6350
Record last updated on 17-Jan-99.
Database last updated on 9-Feb-99 15:17:13 EST.
Domain servers in listed order:
NS1.WHATEVERWEB.COM 209.160.196.62
NS2.WHATEVERWEB.COM 209.160.196.59
Cold Fusion II (Score:1)
PLEASE, open source your code, and patents (Score:1)
:)
All I can say is... (Score:1)
No Subject Given (Score:1)
But does it run Win 98?
Supercomputer (Score:1)
I'll take two please. Is this for real or is it an April Fools joke two months early?
That, and the Irish Girl's uber-crypto! (Score:1)
Hey, I have an interesting idea!
Let's grab one, load on that crypto software that the girl in Ireland wrote, toss on that mystical compression that compresses files into 256 bytes (from waaaay back..), and add to that the secret of how to get the caramel into the Caramilk bar.
I'm sure the aliens will come back and give us the keys to the pyramids, so we can find out what the fsck the 11 herbs and spices in Kentucky Fried Chicken's recipe are!
Sheeesh!
Fascinating stuff (Score:1)
> BTW, if you find a metalanguage with all possible tasks you could do on a processor, let me know.
What about Turing machines? At least if you mean "processor" in the traditional sense.
in principle... (Score:1)
Although I know next to nothing about this stuff, and although what they say sounds a lot like a hoax, the principle strikes me as a typical case of "why did I never think of that". I instantly liked it.
Sure, a "C++ to FPGA" compiler would be a bit too complex to imagine, but if you find the means to create good circuits from an algorithm description, why not?
And as for the speed increase, think about a simple AND operation. To do it on a conventional machine you load the instruction, decode it and execute it. Although that may run in one "cycle", it surely involves a lot of gate switches. Hardcoding this AND on the FPGA takes the time of... well, an AND. The time needed for one gate (or array, for a register AND) to calculate the operation is all it takes. It's not hard to believe that this is many times faster, even if the FPGA in itself is not as fast as a custom-made "real" chip.
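A toy timing comparison along those lines (every delay number here is invented for illustration, not measured):

    # Invented, order-of-magnitude numbers to illustrate the argument
    # above: a hardwired AND is one gate delay, while a CPU AND pays
    # for fetch/decode/execute even when it "runs in one cycle".
    gate_delay_ns = 2          # assumed delay of one FPGA logic cell
    cpu_cycle_ns = 5           # one cycle of an assumed 200 MHz CPU
    pipeline_stages = 3        # fetch, decode, execute (idealized)

    print(f"Hardwired AND: ~{gate_delay_ns} ns")
    print(f"CPU AND (pipeline latency): ~{cpu_cycle_ns * pipeline_stages} ns")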
Hardware is always faster than software. 3D graphics chips exist for a reason, don't they?
And stuff like I/O bandwidth or context switching might not be that important for a "supercomputer", which will probably not be targeted as your average multitasking Unix box, even if it could fit on the desktop.
Being reprogrammable "only" 1000 times a second is also no problem at all, because you can leave one part of it running as a general-purpose CPU all the time. Slow reprogramming is not a loss for its slowness, but a win for its reprogrammability.
Any OS? (Score:1)
I don't suppose they run a conventional OS. But does it run any OS at all? Can it be used as a general-purpose supercomputer? It sounds like a super-calculator to me.
Apples and Oranges (Score:1)
Interesting how they compare 16-bit addition (for their system) to IEEE floating point on a Cray. Also of note: 50ns main memory? SDRAMs are faster than that.
As to their comparison of computing on their system vs. another supercomputer: most analysts don't rebuild the computer to do a program run.
There are some interesting ideas there, but I'll bet that real world benchmarking won't look nearly as good as their estimates.
Superspecificity! (Score:1)
My warning bells started clanging at this point:
6. Superspecificity is also achieved by SBS through advanced artificial intelligence algorithms and techniques such as recursion, cellular autonoma, heuristics and genetic algorithms, which are incorporated into a single system which naturally selects the most efficient library element for achieving maximum specificity.
Buzzword alert! Buzzword alert! He forgot "neural nets" though. I'm sure they're in there.
7. Higher orders of specificity are also possible because SBS's Hypercomputer system is self-recursive. It uses its own algorithms to evolve itself. The system is capable not only of producing systems that are more simple than itself, it is also capable of producing systems that are more complex than itself.
Cool, they've hit that holy grail of science fiction, the self-evolving computer. All you have to do now is build a pair of robotic arms for it and it will build itself into a gigantic Übercomputer and take over the universe.
Oh, and my favourite part:
9. Because the Viva software system includes a formally accurate method for achieving an optimal solution to a problem, the layering of those optimal solutions is also optimal....
It looks like they've solved that nasty "computer science" problem we've all been working on! No more slaving over algorithm design! Just type in the problem (in English, I assume) and this thing will solve it optimally.
Big Deal... (Score:1)
Still, interesting.
Real numbers, please? (Score:1)
I wonder which Cray they're comparing it to. There's more than one, after all. They're probably comparing to something slow like a Y-MP or a J90. They might look stupid if they compared to a T90 or SV1...
--Troy
Real numbers, please? (Score:1)
"The Cray T3E #1024" (per their "About Us" page)
Well, that tells me a little more... but not much. There are 3 different models of that particular machine, depending on whether it uses 300, 450, or 600MHz Alphas. Based on the 1 TFLOP number they quote, I'm guessing they mean the model with the 600MHz chips (the T3E-1200E/1088).
I liked the following little piece of idiocy from SBS's "About us" page [starbridgesystems.com]:
The Cray machine, which costs approximately $76 million, can perform at a peak speed of one trillion instructions per second in a narrow class of applications. It cannot sustain performance at peak speed.
Well, duh!!! Of course it can't sustain peak! There's no machine in the world that can sustain more than about 50% of peak performance on useful, real-world code; usually that number's closer to 25-30% of peak. Unless SBS knows something serious about compiler technology that the rest of us haven't figured out yet, sustaining a system's peak performance is impossible.
Their 12 teraop number is very suspicious, too. They define an "op" as a 4-bit integer add. The ops they quote on the T3E are 64-bit floating-point adds and multiplies. Apples and apples? I don't think so. There aren't too many interesting problems you can do with 4-bit integers, either; maybe extremely lossy signal processing, but that's about it.
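To put the two "op" definitions on one axis, here's a crude bit-throughput comparison (my own illustration, and if anything generous to SBS, since a 64-bit floating-point operation is far more work than sixteen 4-bit adds):

    # Crude normalization of the two "op" definitions by bit width.
    # (The 12 Tera-op and 1 TFLOP figures are from the posts above;
    # the bit-width scaling itself is my assumption.)
    sbs_bit_ops = 12e12 * 4    # 12 Tera 4-bit integer adds
    t3e_bit_ops = 1e12 * 64    # 1 Tera 64-bit floating-point ops
    print(f"SBS: {sbs_bit_ops:.2e} bit-ops/s")   # 4.80e+13
    print(f"T3E: {t3e_bit_ops:.2e} bit-ops/s")   # 6.40e+13
    # Even by this generous measure, the T3E moves more bits per second.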
I also noticed that SBS's press release page [starbridgesystems.com] has been taken down some time in the last day or so... I'd love to believe these guys have some kind of breakthrough, but from everything I've seen they're either extremely naive or lying out their collective butts.
True Test (Score:1)
----------
Why is my baloney detector ringing? (Score:1)
I truly hope it's for real. I want to believe. But why is my baloney detector ringing?
Let's hope my detector is faulty, shall we?
--
The performance specs are bogus (Score:1)
This means that the memory and I/O subsystems aren't even exercised. Nobody uses a 4-bit addition as a performance spec, not even Intel.
The actual product description is unbelievable as well. The largest Xilinx FPGAs might be capable of being configured to fully emulate a 16-bit microprocessor. I haven't worked with them in a long time, but when I worked with the 4000 series I figured I could shoehorn a rudimentary 8-bit processor into the largest devices (which would mean a rudimentary 8-bit microprocessor produced for over one thousand dollars, incidentally; it's a bit cheaper to buy a PIC from Microchip).
They said that they reached these performance levels with 280 of the largest Xilinx FPGAs. My take on what they've done is cram as many 4-bit adders as possible onto a single FPGA, replicate it 280 times, have them all execute in parallel, and pretend that this makes up a supercomputer.
Keep in mind that performance on an FPGA isn't stunning. We're talking on the order of 10 nanoseconds to do the 4-bit addition.
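Here's what that implies for the 12.8 Tera-ops figure quoted elsewhere in the thread, using only the 280-chip count and the 10 ns add from above (the rest is arithmetic):

    # How many 4-bit adders per chip would the headline number need?
    claimed_ops = 12.8e12              # "ops"/s, per SBS's claims
    fpga_count = 280                   # chips, per the product description
    adds_per_sec = 1 / 10e-9           # one 4-bit add per 10 ns

    adders_per_chip = claimed_ops / (fpga_count * adds_per_sec)
    print(f"{adders_per_chip:.0f} adders per FPGA")   # ~457
    # Easily done if each chip is nothing but a sea of 4-bit adders,
    # which is exactly why the number says nothing about real workloads.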
So... if they've even designed and built this thing (which I doubt), the specifications are a complete fabrication.
I haven't checked yet, but browse through Xilinx's web site [xilinx.com]. If they don't mention this wonder of reconfigurable computing, then it doesn't exist.
Possible but Skeptical (Score:1)
This could be something along those lines. However, I must admit that I'm a little bit skeptical.
Imagine... (Score:1)
run Linux. Then imagine enough of them to take up the space of IBM's box, all Beowulf'd together.... BWHAHAHAHAHAH!
Beowolf (Score:1)
Can it convert lead into gold as well? (Score:1)
April Fools? (Score:1)
The question mark?s make it real!! (Score:1)
I'm having problems with this... (Score:1)
1) How the heck is he reloading the FPGAs fast enough to be all that hot? I've not worked with Xilinx parts lately, but the Altera (one of Xilinx's competitors in the FPGA market) stuff I have used takes many milliseconds to reload. Maybe this is a feature of the Xilinx architecture, but that leads to my second problem:
2) How the heck did he get enough information out of Xilinx to write his own compiler? Again, I've never tried, but when I asked Altera for information about the internal structure of their parts, I was told it was proprietary. Since the structure of the chips is the thing these companies are trading on, they are usually pretty closemouthed about this sort of thing.
3) Why isn't there an announcement on the Xilinx web site? If I were Xilinx and someone used my gear to beat Pacific Blue, I'd be shouting it from the rooftops and trying to drum up business with it.
I wonder if they are busy signing up investors as we type?
Michael Kohne
mhkohne@discordia.org
Here's one of their patents (Score:1)
http://www.patents.ibm.com/details?pn=US05600845__&language=en [ibm.com]
Hack the Gibson... (Score:1)
Spelling Flame (Score:1)
In addition to all of the tasks traditionally performed by supercomputers, SBS's Hypercomputer systems can perform the full range of functions requiring ultra-fast scaler processing, such as...
There, you anti-spelling flamers, is a great example of how spelling _matters_. The BS meter goes off even louder when you can't spell your wild claims about your invention.
No Subject Given (Score:1)
Kythe
(Remove "x"'s from
The benchmarking I really care about! (Score:1)
Time to change the map? (Score:1)
They do have an economy model for only $2 million. Maybe one of the ACs who're as rich as they are brilliant will put one on their AMEX card and report back to us on how well it works. Be the first one on your block. Operators are standing by.
(I've still got the articles on building that RCA computer around here somewhere, BTW, but I was strictly into analog back then.)
Xilinx FPGAs not programmable at that speed (Score:1)
But even then, this all assumes that you have the bitstreams already available to download to the FPGA. Dynamically creating them is not too simple: just calculating the routing can take hours on a P2-300. So you'd have to have a pre-compiled set of bitstreams, which could be reconfigurable to a lesser extent (swapping pieces out and in). But if you have that, why not make a bunch of ASICs that do what those precompiled bitstreams do?
Not that reconfigurable hardware isn't a neat and exciting paradigm, but these claims are so much cow feces (for now, at least).
No Subject Given (Score:1)
NT or Unix? (Score:1)
But they expect us to believe they've designed radically new hardware as well as a brand-new fancy programming environment which runs on multiple platforms, all in the short life of their startup, and all done by one guy?
Ok. Sure.
Think what he could do if he started hacking Linux.
FPGAs (Score:1)
No Patents (Score:1)
This is transparent nonsense. The only hint of legitimacy derives from the fact that they scammed a couple of Mormon newspapers and TV stations into buying it.
Nice little fantasy though.
Hmmmmm.... (Score:1)
Some HP researchers were working on this (Score:1)
There's also at least one company producing circuit-simulation platforms hundreds or thousands of times faster than pure software simulation, for the IC design industry.
What marks these guys out is that they can write hype with the very best Microsofties.
Heck, what do I know; they might be the same guys after some marketing courses.
Hmmm... (Score:1)
StarBridge pages moved to... (Score:1)
Looks like the perfect 3D accelerator... (-:
Baloney, huh? (Score:1)
this MUST be a joke... right? (Score:1)
sorry, VERY hard to believe....
They're not the only ones. (Score:1)
describes a similar machine where each FPGA simulates a bunch of neurons and cycles through 300 bunches a second, giving an effective neural net of 40 million neurons! They are trying to get it to control a robot kitten in an intelligent way.
Where I come from... (Score:1)
The owner of this domain is probably peeing his pants laughing, the domain registration money and site design time well spent.
It's patented. (Score:1)
Abstract:
An integrated circuit computing device is comprised of a dynamically configurable Field Programmable Gate Array (FPGA). This gate array is configured to implement a RISC processor and a Reconfigurable Instruction Execution Unit. Since the FPGA can be dynamically reconfigured, the Reconfigurable Instruction Execution Unit can be dynamically changed to implement complex operations in hardware rather than in time-consuming software routines. This feature allows the computing device to operate at speeds that are orders of magnitude greater than traditional RISC or CISC counterparts. In addition, the programmability of the computing device makes it very flexible and hence, ideally suited to handle a large number of very complex and different applications.
Apparently, yes. (Score:1)
If this is true... (Score:1)
yeah right! (Score:1)
Large dose of salt, table 2... (Score:1)
But you don't EVER get real-world performance like that, for several reasons. One, you have a very complex piece of software that "compiles" the program for that sea of FPGAs, and your utilization is only as good as that software (and complex doesn't begin to describe it). Second, once you introduce communication into your computational model, everything goes to hell. You have to include room to route data (and hence "wires") between the chips, and that just eats up everything.
Just for the record, I contracted for a company that built a virtually identical box, used for chip emulation. It had ~300 Xilinx chips, and had a VHDL-to-Xilinx compiler and router. You COULD run Win95 AND DOOM, at a clock rate of about 1000 Hz (i.e., VERY SLOW). But companies bought them. The price was about $750k to $1M, and that included a BIG profit margin. This thing is way overpriced. If you need more convincing of that, look up how much the EFF built Deep Crack for, and that was a one-of-a-kind box.
Large dose of salt, table 2... (Score:1)
I've seen start-ups like this before... (Score:1)
bull(cough)shit (Score:1)
You try to give everything to everyone, and you end up giving nothing; e.g. WinNT/9x...
And besides, who spends 15 years working in the very low-level processor design field and ends up with a *tremendous* breakthrough in OOP products (as they claim in their release)?
Flame On!
No, that's a nanotech issue. (Score:1)
Lead and gold are different atoms, so nanotechnology (building objects by precisely assembling atoms) can't help here.
But diamond is another story.
Fundamental problems with this architecture. (Score:1)
The largest FPGA that I've heard of had a million gates on it. Pick-your-random-processor has 10-20 million transistors, giving it an equivalent gate count in the high single-digit millions. Implementing anything with FPGAs will take up several times more space than using a custom chip.
This means that you will have a _big_ supercomputer.
While FPGAs are reconfigurable and hence very flexible, the implementations that they come up with for a given logic configuration aren't optimal. This, combined with the performance overhead incurred by the components that make it configurable, means that an FPGA with a given logic pattern burned into it will be slower than an equivalent, optimized logic pattern implemented in CMOS.
This is another important point - CMOS. While the machine on your desk may use CMOS or might add BiCMOS in there for a speed boost, supercomputers and servers have significant amounts of ECL circuitry in them to speed up critical logic paths. ECL technology is based on bipolar transistors, which switch much more quickly than the MOSFETs used in CMOS but generate far more heat. Used sparingly with aggressive cooling, they can double the performance of a chip or more. This leaves CMOS chips in the dust, and by extension anything built with an FPGA.
If you're shelling out the money for a supercomputer, then you have a good idea of the classes of problem that you're going to be running on it. This lets you choose the type of processor and interconnection architecture so that it matches the problems you plan to run. If necessary, you design a custom ASIC for even better performance (as was done with Deep Crack). A reconfigurable architecture that was magically as fast as hybrid ECL/CMOS still wouldn't get you much of a performance boost, because you're already fairly close to an optimum hardware implementation. With modern processors this is especially true, because the on-chip scheduling and pipelining is good enough to keep most of the chip busy if the problem even approximately matches the chip's logic capabilities.
There was much mention in the article about using processors that were tightly coupled. They'd need to be, to share logical functions with each other. However, this is extremely difficult to accomplish even with conventional processors. The communications traffic goes up with the clock speed and as the square of the number of processors (until it saturates the processors, at which point it goes up linearly). Processors have enough trouble communicating with other chips as it is; this is why new memory architectures are coming out. Asking n=lots processors to communicate tightly with each other and with memory in a reconfigurable manner is asking for a motherboard that can't be built. In practice, you'll wind up implementing either an n-cube architecture that allows fast communication but limits connectivity, an anywhere-to-anywhere mesh that has wonderful connectivity but seriously limits the amount of traffic that can be supported, or a hierarchical system using one or both of the above. The system as described just won't work.
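A toy illustration of that quadratic growth (my own numbers; it just counts the point-to-point links a full anywhere-to-anywhere mesh would need):

    # Links needed for full any-to-any connectivity among n processors.
    # Each link's capacity is fixed, so aggregate demand grows as n^2.
    def full_mesh_links(n: int) -> int:
        return n * (n - 1) // 2

    for n in (8, 64, 280, 1024):
        print(f"{n:5d} processors -> {full_mesh_links(n):7d} links")
    # 8 -> 28, 64 -> 2016, 280 -> 39060, 1024 -> 523776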
It's hard enough to optimize well with hardware that doesn't change. Figuring out the best way to implement an algorithm using both hardware and software feels like an intrinsically hard problem. Your compiler will have to try to solve this. IMO this will result in either a compiler that requires the user to explicitly state what they want done in hardware, or else a compiler that tries to optimize but does it badly, or else a compiler that is never finished.
In summary, I think that there are a number of issues that the writer of the article was not aware of. I hope that the designers of the system took them into account, because otherwise this will be a neat-sounding project that disappears once the investors realize that there isn't going to be a product.
It might be worthy (Score:1)
HERE'S SOMETHING TO THINK ABOUT... (Score:1)
- Nine devices, from 50,000 to 1,000,000 system gates (1,728 to 27,648 Logic Cells)
- Over 500 user I/O pins
- Many package options, including leading-edge 1.0mm FinePitch ball grid arrays and 0.8mm chip scale packages
- Leading-edge 2.5-Volt, 0.22 micron, five-layer-metal CMOS process
- Fully 5-Volt tolerant I/Os
- Timing-driven place and route tools allow compile times of 200,000 gates per hour (400 MHz Pentium II CPU)
- Vector-based interconnect for fast, predictable, core-friendly routing across all densities
- Fully 64-bit/66 MHz PCI and CompactPCI compliant
Okay, so say they take 280 of these at the 1-million-gate density = 280,000,000 gates. Currently, the Pentium II has 7.5 million transistors (probably ~1.875M logic gates) (http://www.zdnet.co.uk/news/1998/36/ns-5490.html [zdnet.co.uk]).
Just raw silicon. Let's say they had a bunch of pre-compiled circuits; then there wouldn't be any lag in switching, as they say. (I must admit that 1,000 switches per second sounds a little overblown.)
But, just for equivalence, let's say we had 149 Pentium IIs connected in PARALLEL (which is currently impossible; I think LLNL uses PPros, currently at 2-chip SMP). Such a system WOULD kill a Cray. But the Pentium can't do that.
Everyone who has read the Beowulf papers knows that the overall speed of the system is entirely dependent on the lag of the interconnect between systems. So, for fun, let's say we could put 18 chips on a board and put 20 boards on a 128-bit local bus. That would lead to some damn fast computing. (Remember Deep Crack in the last RSA contest? It only ran at a system speed of 80 MHz.)
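The 149-Pentium-II equivalence above, written out as a quick check (the figures are the poster's; the four-transistors-per-gate conversion is a common rough rule of thumb):

    # Gate-count equivalence between 280 FPGAs and Pentium IIs.
    fpga_gates = 280 * 1_000_000       # 280 chips at 1M system gates each
    pii_transistors = 7_500_000        # Pentium II transistor count
    pii_gates = pii_transistors / 4    # ~4 transistors/gate, so ~1.875M

    print(f"Equivalent Pentium IIs: {fpga_gates / pii_gates:.0f}")   # ~149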
I believe they are at least on to it.
Farce (Score:1)
Zebra X
Limited machine (Score:1)
Baloney is still rather tasty... (Score:1)
This definitely sheds light on what Transmeta has been up to, however, and why Linus is working for them.
Fascinating stuff (Score:1)
Actually, this seems like a quite elegant solution to the FPGA reconfiguration problem to me. The paragraph about how their Viva software determines, many times a second, whether each function would be best done in hardware or software, and reconfigures the FPGA appropriately, sounds rather elegant. I did some work with FPGAs once and was limited to "booting" the processor from a PROM and executing silly little instructions; with this sucker, you could specify a metalanguage with ALL the possible tasks you could throw at the processor, and have the processor make its own instructions to execute whatever subset of the language was needed! Any additional power could be tossed in there by creating additional parallel instruction units for the most frequently executed instructions... the possibilities are mind-boggling.
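A toy sketch of the kind of hardware/software decision loop being described (entirely my own guesswork: the reconfiguration cost, the speedup factor, and the function names are all invented, and nothing here reflects how Viva actually works):

    # Toy hardware/software partitioner: put a function in hardware
    # only if the reconfiguration cost pays for itself. (Illustration
    # only; all constants and names below are invented.)
    from collections import Counter

    RECONFIG_COST = 1e-3     # assumed: ~1 ms to swap in a new circuit
    HW_SPEEDUP = 50          # assumed: hardware version runs 50x faster

    def choose_placement(call_counts: Counter, sw_cost: float) -> dict:
        """Decide, per function, whether a hardware circuit pays off."""
        placement = {}
        for func, calls in call_counts.items():
            sw_time = calls * sw_cost
            hw_time = RECONFIG_COST + calls * sw_cost / HW_SPEEDUP
            placement[func] = "hardware" if hw_time < sw_time else "software"
        return placement

    profile = Counter({"fft": 100_000, "parse_config": 3})
    print(choose_placement(profile, sw_cost=1e-6))
    # {'fft': 'hardware', 'parse_config': 'software'}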
Of course, there is a limit to how many things you can throw at the system at one time. The ridiculously high benchmark was for a 4-bit adder; of COURSE a bunch of special-purpose parallel chips that did nothing but 4-bit ADDs would be able to outperform a Cray, with the speed and power reductions mentioned. But I'm betting that flipping the FPGA into a dynamic x86 emulation mode, with instruction parallelism, would slow the whole thing down to something more reasonable. Even still, it should outperform anything the x86 chipmakers currently manufacture. The graphics possibilities are what really grab me; the documents mention that as well. Imagine a 3D accelerator that doesn't have a fixed instruction set! If the application was coded to use a particular instruction 80% of the time, Viva would adapt the FPGA grid to create massive numbers of parallel execution units for that instruction, and the speed would go through the roof.
I'm really happy about this; I wonder how the mainstream processor manufacturers are going to react once the possibility of this thing becoming mainstream technology shows up in the press.
You'd think for $26 million... (Score:1)
How??? (Score:1)
I cannot understand how that technology is going to work at all.
Except by "magic" or "ghosts".
(Reprogrammable chips? Like EEPROM? Doesn't sound logical to me.)
Please explain...
Wolf! Wolf! (Score:1)
Bleep!
Perhaps more embarrassment for Salt Lake . . . (Score:1)
Is this DiMora above the greatest engineering minds, or will he give the media more obvious fun?
Will Salt Lake be dubbed the "Shifty City" theologically, commercially, . . . quite technically?
is the olympic committee behind this? (Score:1)
Maybe the former Olympic committee members have something to do with this...
Man, if I could get fuzzy dice for my rear-view mirror and put this thing in my trunk, I could have a car MP3 player AND composer, because it would write original kick-ass MP3s. The babes would be all over me.
I want to see the bullet test. (Score:1)
You call this *writing*? (Score:1)
"It's really fast! >" "You could shoot it and it'll keep working!! >" "It can rearrange its wires a thousand times a second!! >"
Next time, Utah News, some *fact* and *detail* would be nice.
Let's see it render (Score:1)
*shudder*
All we would need then is a neural VR interface, and then things could get really interesting...
No More Secrets (Score:1)
Hmmmm... (Score:1)
Wouldn't mind being wrong though...
Naaa, it's a scam (Score:1)
------------------------
Whois Query Results
Starbridge (STARBRIDGE2-DOM) STARBRIDGE.NET
Starbridge Communications (INTERNET37-DOM) INTERNET7.COM
Starbridge Communications (ADULT24-DOM) ADULT21.COM
Starbridge Communications (SEXSUPERSTORES-DOM) SEXSUPERSTORES.COM
Starbridge Communications (SEXYNUDEGIRLS2-DOM) SEXYNUDEGIRLS.COM
Starbridge Communications (FREDDIEFREAKER-DOM) FREDDIEFREAKER.COM
Starbridge Communications (FREAKER3-DOM) FREAKER.COM
Starbridge Communications (STARBRIDGE-DOM) STARBRIDGE.COM
-----------------------
Interesting scam though... what do THEY get out of it?
Ooops... my bad (Score:1)
-------------------------
Registrant:
Circa 65 (STARBRIDGESYSTEMS-DOM)
208 1/2 25th Street
Ogden, UT 84401
US
Domain Name: STARBRIDGESYSTEMS.COM
Administrative Contact, Technical Contact, Zone Contact:
Light, Doug (DL8191) dlight@LGCY.COM
(801)994-7337 (FAX) (801)994-7338
Billing Contact:
Gleason, Matt (MG11172) MGLEASON@CIRCA65.COM
(801)392-2600 (FAX) (801)392-6350
--------------------------------
So, maybe they're legit?
Superspecificity! (Score:1)
Are they like cellular automata, but loners?
And do you get the impression that they only have (at most) one working system?
The whole thing just seems like some CS-naive EE has managed to get carried away with himself. It reminds me of other naive projects, like 'The Last One' (UK, circa 1982, which was to be the last program ever written, because it would take a natural-language description of the problem and write the solution for it), or the miracle data-compression algorithm reported in Byte magazine a few years back that could repeatedly compress its own output without data loss. Both failed, because both were quite stunningly naive about the nature of the problems they were trying to solve. This super-specific hypercomputer sounds just the same.
Computer Benchmarks (Score:1)
My rebuttal of the HAL Super (Score:1)
1) If you read the "About Company" link, it states that in fact they have not actually built the machine that they claim will reach 12.8 Tops. It states that it will be built by Feb 99. I believe that they scaled the performance of a machine much much smaller than the one they claim betters the IBM machine. Note that scaling performance of a smaller machine is considered a big faux pas, as it totally neglects parallel overheads.
2) They quote 4-bit operations as though they are equivalent to the 32-bit Floating point ops that were tested on BlueMountain. They are not similar. The performance drops off to 3.8Tops when they use 16-bit operations, but the IBM is still doing twice as much real work because they are using 32-bit math.
3) They have not actually stated which benchmarks they used (if they actually had a machine to test). The IBM used (i think) the LinPeak, which is a matrix multiply operation. That benchmark is very good at showing off parallel architectures. But the size of the matrix must be quoted in the results. I don't see that information.
4) The HAL computer is very under memoried. This might be easy to fix. The rule of thumb is 1byte RAM per Flop. That's why the BLUE machine has 2.5Tbytes of RAM for 3.8TFlops. The max listed for HAL is 100GB, or about 1-2 orders of magnitude too small.
5) On one page at StarBridge, it lists the I/O performance in comparison to a Cray T3E-1024. But the numbers are differnt on the press release. In the press release is sez 50GB/s, but the company link indicates 50MB/s. 3 orders of magnitude different. I wonder if one of thoses is actual I/O data, and one is (50GB) is supposed to be memory bandwidth, which is a very very different concept from I/O. If the machine only has 50MB/s I/O, but 100GB of memory, then it will take about 2000 seconds to load memory from disk (or, say, to do a check point of the current data set).
6) The marketing data for the Cray T3E-1024 is wrong, which in my mind negates most of the comparisons. The Cray T3E-1024 does not cost $76m, but the Cray/SGI Blue Origin2000-3000 does cost about that much. It states the Cray does not have fault tolerence, but it most certainly does. It states that the maximum I/O of the T3E is less than 2GB/s, which i know is wrong. (Did someone in marketting write this without double checking the numbers??)
7) The company link states that the HAL 4rW1 has a minimum sustained rate of 3.8 Tera-ops. Would any company really claim that there is no program that they could run that would not perform worse. If that is a challenge, i'd be willing to put a huge bet down that i could write a program that will give LESS than 3.8 Tops!!!
8) programming. On the company page, it does not state that you can program this in C or Fortran or any other common language. Instead it only talks about the GUI that you can use to describe your problem, and the software will automagically start running (at 3.8Tops minimum!!). This is a HUGE drawback if this is true (no C or Fortran). Also how can they possibly claim that their machine does not have the same parallel exectution overheads that are common on other parallel machines?? Just because you can change your network on the fly (and just how long does that change take??) does not mean you are immune to Amdahl's law!!
So to sum up:
1) 4-bit math is NOT equivalent to 32-bit math.
2) What are the benchmarks and data sets? What were they programmed with? Was the test actually run with a full-scale machine, or are these just scaled results?
3) How do they deal with Amdahl's law?
4) The hardware is short on memory, and very likely has less-than-perfect I/O performance.
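To make points 5 and 8 concrete, here is the arithmetic written out (the 100GB and 50MB/s figures are SBS's own; the Amdahl's-law numbers are purely illustrative):

    # Point 5: checkpoint time from SBS's own published figures.
    memory_bytes = 100e9               # 100 GB of RAM (company page)
    io_rate = 50e6                     # 50 MB/s of I/O (company page)
    print(f"Checkpoint time: {memory_bytes / io_rate:.0f} s")   # 2000 s

    # Point 8: Amdahl's law, speedup = 1 / ((1 - p) + p/n).
    # With an invented 5% serial fraction, even unlimited parallel
    # hardware cannot exceed a 20x speedup.
    def amdahl_speedup(p: float, n: int) -> float:
        return 1 / ((1 - p) + p / n)

    print(f"95% parallel, 10,000 units: {amdahl_speedup(0.95, 10_000):.1f}x")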
-r.
HERE'S SOMETHING TO THINK ABOUT... (Score:1)
And a 149-processor Pentium II system would not kill a Cray. Well, not a large one, at least. I would guess a 149-processor P-II would come in around the same as a 100-processor Cray T3E or Origin2000. But since they are selling T3Es as large as 1380 processors, and Origins in the thousands of processors, a small 149-processor system would not come close.
-r.
Got to be fake... (Score:1)
So, I don't think it is fake.
Back-o-the-Napkin Debunking (Score:1)
The fastest reconfigurable Xilinx FPGA is a 200MHz part.
Assuming that you can do 1 FLOP in one cycle, that means their machine would need 16 TFLOPS / 200MHz = 80,000 computational elements.
This same FPGA has 500,000 gates per chip. Assuming that you could fit 100 floating-point pipelines onto one FPGA (5,000 gates per pipeline), they would need 800 FPGAs. Does this fit in the box they described? I think not!
Think about it from the power side as well. If we assume those FPGAs are 5W chips, that's a total of 4kW of power consumption. Minimum.
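The same napkin, in executable form (all three assumptions, 1 FLOP per cycle, 100 pipelines per chip, and 5W per chip, are the ones stated above):

    # Back-of-the-napkin check of the 16 TFLOPS claim, per the
    # assumptions stated in this post.
    claimed_flops = 16e12
    clock_hz = 200e6                        # fastest reconfigurable Xilinx part

    pipelines = claimed_flops / clock_hz    # 80,000 computational elements
    fpgas = pipelines / 100                 # 100 pipelines per 500k-gate chip
    power_w = fpgas * 5                     # 5 W per chip

    print(f"{pipelines:.0f} pipelines, {fpgas:.0f} FPGAs, {power_w/1000:.0f} kW")
    # 80000 pipelines, 800 FPGAs, 4 kW. Minimum.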
And we haven't even broached the subject of adding support components: real logic elements for control, RAM for storing any kind of software that is needed, I/O subsystems, software support layers... (Oh yeah, that's right, maybe Viva is the real-language OS that the government has been secretly developing all these years to do mind control on us all. ;)
Windows 2000 (Score:1)
---
gr0k - he got juju eyeballs - http://www.juju.org [juju.org]
fps in quake2 (Score:1)
If it's over 1000, I'll buy one!
RJ-11 connector?! (Score:1)
Witness http://www.starbridgesystems.com/about.html under the features of HAL-4rW1.
Why bother?