Adapteva Parallella Supercomputing Boards Start Shipping 98
hypnosec writes "Adapteva has started shipping its $99 Parallella parallel processing single-board supercomputer to initial Kickstarter backers. Parallella is powered by Adapteva's 16-core and 64-core Epiphany multicore processors that are meant for parallel computing unlike other commercial off-the-shelf (COTS) devices like Raspberry Pi that don't support parallel computing natively. The first model to be shipped has the following specifications: a Zynq-7020 dual-core ARM A9 CPU complemented with Epiphany Multicore Accelerator (16 or 64 cores), 1GB RAM, MicroSD Card, two USB 2.0 ports, optional four expansion connectors, Ethernet, and an HDMI port."
They are also releasing documentation, examples, and an SDK (brief overview, it's Free Software too). And the device runs GNU/Linux for the non-parallel parts (Ubuntu is the suggested distribution).
MAME? BitCoin? (Score:1)
The first comment to mention MAME or BitCoin wins.
Re: (Score:2)
Help me out here. The Adapteva sales pitch is claiming you get faster time to market by not having to do any FPGA programming (ANSI-C and OpenCL for the multicore coprocessors). The Zynq processor seems to be just for the host OS, which they say can run Ubuntu out of the box and they provide open source development tools for everything else. No mention of Xilinx anywhere that I can see. Am I missing something?
Re: (Score:3)
Help me out here. The Adapteva sales pitch is claiming you get faster time to market by not having to do any FPGA programming (ANSI-C and OpenCL for the multicore coprocessors). The Zynq processor seems to be just for the host OS, which they say can run Ubuntu out of the box and they provide open source development tools for everything else. No mention of Xilinx anywhere that I can see. Am I missing something?
he was probably confusing this with http://www.kickstarter.com/projects/1106670630/mojo-digital-design-for-the-hobbyist [kickstarter.com]
which pretty much means he didn't read even half of TFS.
Re: (Score:3)
manufacturing of physical goods can still be paid
How magnanimous of you.
In other words: You deal with organized crime.
By your standards, 100% of the electronics, computer and software industry is organized crime. That may stroke your ideological fervor, but it's of little practical value. Even Linus Torvalds uses a machine where less than 100% of the IP for all parts, software and manufacturing equipment is open. I'll happily continue using devices, participating in that industry and earning a living. I'm wondering how you made that post while avoiding any contact with the product of, as you label it
Re: (Score:1)
"By your standards, 100% of the electronics, computer and software industry is organized crime."
Haven't been paying attention to the Panasonic case, I see.
Every one of these companies is colluding and conspiring. This time, one got caught.
Welcome to reality, child.
Re: (Score:3)
It is a shame that you posted as an anonymous coward here. I'd love to understand your thinking on this. As far as I see it, this is a win as the source code for the FPGA logic will be open, making this much like using Visual Studio to build another Open Source project - hardly an Open Source fail.
I would also like to know if you run on Sparc CPUs as they are "open" (with published HDL source), rather than on Intel or ARM? If not, how can you defend that your favourite Open Source project (say Apache) run
Re: (Score:3)
the FPGA hosts the CPU side and handles communication with the Epiphany processor, so you never need to change the FPGA at all. it's the Epiphany processor you are developing for, not an FPGA. the functionality of the FPGA is open, so you could use it just like any other IC if you really wanted.
Re: (Score:1)
"you never need to change the FPGA at all."
Until you want to be able to handle the bandwidth of a huge parallel processing unit.
And a typical FPGA will struggle with more than 8 TRUE cores currently. We've tested it. It would not work for our requirements; it was insufficient in the bandwidth department.
Re: (Score:2)
you seem to be confusing the Epiphany chip (silicon) with the Zynq (FPGA+ARM) host chip or something. the Epiphany chips contain 16 or 64 "true" cores and the chips connect directly together.
what were you talking about?
Re: (Score:3)
Proprietary software which can be used for free with very reasonable size and device limitations. Plus if you don't like the GUI you can always run the traditional command line tools to build a bitstream if you want.
fail even at advertising! (Score:1)
If all you are gonna do is advertise, at least do it right!
There is no micro SD included by default and the connectors are micro USB and micro HDMI. Big fail!
Re: (Score:1)
Damn them for making it use the same connectors and including the same standard equipment in the base package as every other similar product...
half the Gflops, 64 cores, 80% lower cost, 5 watts (Score:4, Informative)
It uses 5-10 watts, whereas the Core i7 uses 100 - 200 watts, with the chipset.
So total cost of ownership is about 90% less than the Core i7. Ten of them would spank the heck out of a Core i7 and cost the same.
> and what can you run on it ?
16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff - things where you run the same function on many pixels / samples / rows. So for face recognition, for example, the image would be broken up into 64 blocks and all of the blocks analyzed simultaneously on the 64 cores.
A database designed for the many cores could work well. For example, say you need to sort a table with 100,000 rows. On a system like this with 64 cores,
each core could simultaneously sort a group of 1,500 rows, then you'd merge those 64 sorted groups together a la merge sort. As a firewall, it could handle a blacklist with a million entries, as each core would simultaneously apply 1/64 of that list.
Re:half the Gflops, 64 cores, 80% lower cost, 5 wa (Score:5, Informative)
Yeah but compare it to a GPGPU and you start to realize how slow it is; a $200 GTX 660 does 1880 GFLOPS at 140W.
1 GFLOPS/$ versus 9.4 GFLOPS/$
10 GFLOPS/Watt versus 13.4 GFLOPS/Watt
Re: (Score:3)
The only problem is you can't run a GPU standalone.
There was one project by someone who reverse engineered old Radeon HD2400
http://www.edaboard.com/thread236934.html [edaboard.com]
http://www.flickr.com/photos/73923873@N05/sets/72157631771354007/ [flickr.com]
but that guy deleted his git repo before publishing the news blurb and some photos, and they quickly shut up about it.
I would love to be able to use GPU cards standalone for Vision projects, or just as OpenCL accelerators for embedded systems.
Re: (Score:3)
I don't know how these parallella boards work, but hopefully they would be a bit more versatile.
There is almost no chance that a $100 board can be designed to have a memory interface that can keep 64 cores well fed at this point in time. They have almost certainly chosen a low latency cache model over a high bandwidth cache model due to this, so this product will probably only perform well on highly computational problems that don't require much memory - in other words, none of the problems that GPUs struggle with will likely be any better on it.
Re: (Score:2)
I agree with you 100% on that. If the cache isn't terrible, it might be okay if you have a problem amenable to OpenMP. But mainly I view these low-end things as kind of fun toys.
That said, there is a market for something reasonably compact and affordable in between a 4-8 core desktop and a large scale cluster. I occasionally test and debug problems on my de
Re: (Score:2)
The Parallella doesn't run standalone, either. It's an accelerator chip attached to an ARM system.
Re: (Score:2)
GPUs are SIMD, while this board is MIMD.
Hold the bus (Score:2)
So for some stuff they are very good, but for other stuff they are just not suitable at this time.
Re: (Score:2)
You are aware that this chip has the exact same problems, right? But unlike a GPU it has very limited on-chip memory, and no directly attached external memory at all. All communication happens through a FPGA-driven channel to the ARM, with the ARM being the only thing with DRAMs attached.
This is fundamentally, and properly labeled as, an external "accelerator chip" to add onto a computer.
Re: (Score:2)
You mean it's at the other end of a PCIe bus to where the memory is sitting? Thank you for playing but the communications channel looks a bit wider to me.
Re: (Score:2)
I don't know what you read, but the 6.4GB/sec chip interface bandwidth is less than a standard PCIe graphics card at 8GB/sec. Now, you may argue that there is more latency across PCIe or something, but also note that the Parallella system assumes a shared memory architecture with the host OS on the ARM.
Again, there are no dedicated RAM chips for the accelerator on this board, and the chip itself has no DRAM controllers. You can only load up to 32KB of RAM per core into the chip caches itself; it doesn't h
Re: (Score:2)
Well that sucks then for anything involving large data sets.
Re: (Score:3)
16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff
Low end ARM cores do that already in a low cost, low power package. I really can't see how this device would be economic for any of those things - even if you need to do facial recognition on multiple image streams at once low cost ARM cores will be cheaper. You also have the difficulty of interfacing so many video streams to a single parallel processing device; it would be easier to have lots of smaller devices.
As a firewall, it could handle a blacklist with a million entries
Again, current ARM based routers can handle such lists. IP address lists or simple URL lists wit
does for video what ARM does for photo (Score:2)
Re: (Score:1)
"It uses 5-10 watts, whereas the Core i7 uses 100 - 200 watts, with the chipset."
Wrong. Just so wrong. An i7-3770k, with a Radeon or Nvidia GPU drawing the desktop, running disks etc, while running a CPU heavy load, will draw 124 Watts, measured at the wall socket... Let's just say that if you subtract the GPU etc, you're down a significant chunk.
Re: (Score:1)
"Ten of them would spank the heck out of a Core i7 and cost the same."
Yea, if it were even a general-purpose usable piece of silicon. It's not.
"16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff "
We've had all of that in software since fucking Windows 98 on an Evergreen overdrive (180 MHz) chip. Please catch up with current technology or stop shilling, what you speak of is absolutely not new, and not even novel.
"A database designed for the many cores could
So everything since 1998 is useless? (Score:2)
So every processor since then is useless?
> "A database designed for the many cores could work well."
> As we've had for the past 30+ years I've been alive?
So no one will ever use another database, and there is no longer any use for hardware to run databases on?
Re: (Score:2)
So total cost of ownership is about 90% less than the Core i7.
TCO is a meaningless measure and it's sad that it persists. I have a used halfbrick here. It costs 99% less to buy (excluding shipping) and uses 0% of the power. The TCO is vastly better than either of the two options you present.
Now, return on investment is a much better measure...
But yeah, your other points stand. As always by using more specialised hardware you can get vastly better flops, etc in a given hardware/power/financial budget. There
You must be spending other people's money (Score:2)
> I have a used halfbrick here. It costs 99% less to buy (excluding shipping) and uses 0% of the power. The TCO is vastly better than either of the two options you present.
So the scorecard reads:
Item   Effective  Fast  TCO
hw1    yes        yes   6
hw2    yes        yes   2
brick  no         n/a   0
It looks to me like "brick" loses because it can't do the job. The other two options are
Do I have to be the first one? (Score:2, Funny)
Very well:
Imagine a Beowulf Cluster of these!
tis already a cluster - 64 cores (Score:3)
Re: (Score:2)
"With 64 cores, I'd say it's already a cluster. A dozen of these ($1200) would have 768 cores and fit in a microatx case. :)"
But what about performance? For example, how does it perform at parallel integer math (arguably the most common use for these things), as compared to a top-line, price-comparable GPU card?
That's what I want to know. I didn't search for a long time, but I didn't find info on that.
Re: (Score:2)
I can fit over 9000 bottle caps in a medium sized rainwater barrel. Not sure what I'd do with it though.
Here's a thought (Score:5, Funny)
I could buy enough of these to cover the underside of the floor of my house and mine Bitcoins during the winter. Then I get radiant heat and useless fake money (which is probably just NSA's password cracker anyways).
Re: (Score:1)
I literally don't even use heating. My heating is computer based.
I like to think of it as Efficient Heating.
Re: (Score:2)
I may be able to line the bottom of my floor with GPUs, connected via custom PCIe extension cables into a large (really large) chassis. But if I take numbers into account, I have about a 1,000 square foot house. Let's say an average sized GPU is 4" x 12" (just for round numbers). And let's assume that I place 2 GPUs per square foot. That comes out to 2,000 GPUs, and a lot of money.
Think I will stick to wearing slippers in the winter.
Tiny but useful? (Score:2)
So it's interesting, a lightweight ARM processor, without anything better than micro USB and micro HDMI. Neat yes, but really? Useful? Maybe as a wireless router, or some other PoE like device but as a useful processing system? Um...
Even linking many of these together - neat, but again, the world of MPI is based on completely different processor designs and interconnects; you're talking a huge amount of time and effort to replicate something on a unique platform which may or may not ever see widespread acce
Re: (Score:2)
Your video card, assuming you've got a fairly modern one which supports the various GPGPU programming models.
Re: (Score:2)
There is simply a set of "parallel" function calls which can be built directly into your code. You then just need to compile your code with the proper libraries, usually either mpich or OpenMP. I believe both are availa
Re: (Score:2)
https://computing.llnl.gov/tutorials/openMP/ [llnl.gov]
Real world use? (Score:3)
Anyone out there in /.-land plan on getting these for a real project?
Tell us about it! What language/OS/purpose?
Just curious...
Parallel is not necessarily better (Score:5, Insightful)
Supercomputers are usually just measured by their floating point performance, but that's not really what makes a supercomputer a supercomputer. You can get a cluster of computers with high end graphics cards, but that doesn't make it a supercomputer. Such clusters have a more limited scope than supercomputers due to limited interconnect bandwidth. There was even debate as to how useful GPUs would really be in supercomputers due to memory bandwidth being the most common bottleneck. Supercomputers tend to have things like Infiniband networking in multidimensional torus configurations. These fast interconnects give the ability to efficiently work on problems that depend on neighboring regions, and are even then a leading bottleneck. When you get to millions of processors, even things like FFT that have, in the past, been sufficiently parallel, start becoming problems.
Things like Parallella could be decent learning tools, but having tons of really weak cores isn't really desirable for most applications.
Re: (Score:3)
But indeed, it is the learning experience that is required, because cores are not getting particularly faster, and we are going to have to come to grips with how to parallelize much of our computing. The individual cores in this project may not be particularly powerful, but they aren't really weak either; the total compute power of this board is more than you are going to get out of your latest Intel processor, and uses a whole lot less power. Yes, it isn't ideal given our current algorithms and ways of wri
Re: (Score:2)
You are right that our current algorithms will have to change. That's one of the major problems in exascale research. Even debugging is changing, too, with many more visual hints to sort through millions of logs. Algorithms may start becoming non-deterministic to reduce the need to communicate, fo
Re: (Score:2)
Very *nice* comment -- spot on.
Only other thing to mention is that supercomputing trades latency for bandwidth, i.e. high latency but vastly higher bandwidth.
Intel does a great job of masking latency on x86, so we get "relatively" low latency for memory, but its bandwidth is crap compared to a "real" supercomputer or GPGPU.
Re:Parallel is not necessarily better (Score:5, Insightful)
This device in particular only has 16 or 64 cores, but the Epiphany processor apparently scales up to 4,096 processors on a single chip. And, the board itself is open source.
So, if you developed software that needed more grunt than these boards provide, you could pay to get it made for you quite easily.
That's a big advantage right there.
Re: (Score:2)
So it's an evaluation board that may lead you to contract them for a larger, as yet undeveloped device? That's fine, but isn't really selling it us.
Re: (Score:2)
Parallel computing is a way to get around the limitations on building insanely fast non-parallel computers
by limitations, i'm assuming you mean the laws of physics.
Parallel computing is ... not something that's particularly ideal
it's merely a new paradigm in order to continue processing data faster, and it won't be the last.
High core counts are making supercomputing more and more difficult. Supercomputing isn't about getting massively parallel ...
collective operations on supercomputers with hundreds of thousands to millions of cores are one of the largest bottlenecks in HPC code.
the Epiphany architecture is currently limited to 4096 interconnected cores because all the registers and memory (RAM) are memory mapped and the address space is limited. so if you are using 64-core chips, that's an 8x8 grid of chips.
Supercomputing isn't about getting massively parallel, but rather high compute performance, memory performance, and interconnect performance. If you can get the same performance out of fewer cores, then there will usually be less stress on interconnects.
communication between cores is actually quite fast, 102 GB/s Network-On-Chip and 6.4 GB/s Off-Chip Bandwidth. so for 4096 cores, memory ba
Re: (Score:2)
It's still within the realms of manufacturing constraints at this point. A co-worker was making diodes a couple of atomic layers thick before 2000 but making a circuit at that scale in 2D is going to take a lot more work.
Re: (Score:2)
A lot of it is even if it isn't all like that. For instance a lot of seismic data processing is about applying the same filter to tens of millions of traces, which are effectively digitised audio tracks. Such a task can be divided up to a resolution of a single trace or whatever arbitrary number makes sense given available hardware.
So even if it "isn't really desirable for most applications" there are still plenty of others where it is desirable.
not a FRICKING supercomputer! (Score:2)
where do people get their definition of supercomputer? a supercomputer is what you have when your compute needs are so large that they shape the hardware, network, building, power bill. this thing is just a smallish multicore chip, like many others (now and in the past!)
Re: (Score:2)
where do people get their definition of supercomputer?
From the 1960s. The CDC 6000 series machines designed by Seymour Cray were the first "supercomputers". Each "core" did about 3 MIPS.
Doesn't appear to be cost-effective (Score:2)
This thing is promised to do 90 GFLOPS and costs $100. A HD7870 can do 2500 GFLOPS for $300. Sure, you need to build a rig around it, but you'll still be way better off than soldering together a tower of 25 of these boards.
Re: (Score:3)
now mount that HD7870 inside an RC plane, or a quad drone
the closest you can get is the Mali T604 doing 68 GFLOPS or the Mali T658 at 272 GFLOPS (theoretical numbers, but everyone including AMD uses those)
Re: (Score:3)
bingo. if you've seen some of the crazy acrobatic stuff being done with quadcopters over on TED, that is using several remote PCs and remote control. The programming could probably all be packed into one of these boards and built right into each copter.
In the purist terms (Score:1)
A supercomputer is a system that has multiple processors functioning in parallel, be it many individual machines networked together or a single machine with several processors.
The term supercomputer is a very old one, from back before you could even fathom purchasing a machine capable of housing multiple CPUs, well, unless you were a university or a very well funded trust fund geek.
By the original definition, most of our phones are supercomputers.
Parallelism is software-intensive (Score:2)
These boards are only half the solution to a parallel problem. I used to write satellite imaging software that was parallelized on a 12-CPU server. A lot of work went into the code necessary to parallelize the mapping and DTM algorithms. It wasn't trivial either. I'm failing to see the usefulness of these boards for anything other than intensive scientific computation. Because if the code being run isn't written for parallel processors, you're getting no advantage to running it on a multicore/multiproc
Re: (Score:2)
You are missing specialized applications written specifically for that kind of system. Which is indeed an easy thing to miss, because they don't exist, as that kind of system didn't exist until today.
I would have bought a few 3 years ago, but I don't have a need for them now.
IBM Cell Processor Again? (Score:2)
Re: (Score:2)
How is this any different from IBM's Cell Processor?
Can you actually buy a single Cell processor or even a dev board for one?