Technology

All-Purpose Distributed Computing 150

Markee writes "Wouldn't it be great if any program you write could be executed automatically in a parallel, heterogeneous environment?
The TSIA model (Task System and Architecture) suggests a way of programming in which any given program is divided into so-called tasks. An underlying operating system extension (similar to a scheduler) could dispatch these tasks across processors and systems, providing for inherently parallel, fail-safe execution in a heterogeneous environment. Actually, the TSIA model is a generalization of what is already done in existing distributed computing environments like SETI@home. An implementation of this model would need only minor syntax extensions to given programming languages and a relatively modest OS extension for scheduling and dispatching. The TSIA web site looks primitive and features too many buzzwords, but nevertheless the documents are worth taking a look at."
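
A rough idea of what "dividing a program into tasks" could look like is sketched below in plain C. The task struct, the run_task function, and the sequential dispatch loop are illustrative assumptions rather than anything taken from the TSIA documents; a TSIA-style runtime would be free to hand each independent task to any local or remote processor instead of running them in order.

    #include <stdio.h>

    /* A "task": a self-contained unit of work with its own input and
     * output, and no communication with other tasks while it runs. */
    struct task {
        int input;
        int output;
    };

    static void run_task(struct task *t)
    {
        t->output = t->input * t->input;   /* stand-in for real work */
    }

    int main(void)
    {
        struct task tasks[8];
        int i;

        for (i = 0; i < 8; i++)
            tasks[i].input = i;

        /* A trivial "dispatcher": here the tasks run one after another,
         * but because they are independent, a scheduler could just as
         * well farm them out to other processors or machines and collect
         * the results afterwards. */
        for (i = 0; i < 8; i++)
            run_task(&tasks[i]);

        for (i = 0; i < 8; i++)
            printf("task %d -> %d\n", i, tasks[i].output);
        return 0;
    }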
This discussion has been archived. No new comments can be posted.

  • Ummm... If I remember right, Neural Nets are essentially parallel-style nodes with extremely limited intelligence. It is the density of the interconnections between the nodes that actually creates the "intelligence". Multiprocessor parallel systems are usually based on very intelligent nodes with limited interconnections (i.e., each node has a big Pentium or Alpha with local memory) and only one network connection.

    Parallel Processing is like a think-tank. Lots of very bright people working on individual problems. Communication and coordination of the individuals becomes a major task and any problem submitted has to be diced up into individually bite-sized morsels.

    Neural Nets are like a brain. A bunch of independently dumb cells with lots of interconnections. Communication between the cells occurs "naturally" and the problem flows through the cells like an amorphous blob.

    There is no easy way to take advantage of parallel processing. It has to be designed for early in the development process. What this library proposes is to write all applications for distributed processing, even if the software is normally run on single CPU or SMP boxes.

  • Of course, there are plenty of problems that are inherently easy to implement in parallel: brute-force crypto-cracking, scanning sections of data for coherent signals, packetized compression, graphics rendering, etc. Heck, even make can be parallelized easily.
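
    A minimal sketch of one of those embarrassingly parallel cases, assuming POSIX threads; the key range and the empty try_key placeholder are made up for illustration. Each worker scans its own slice of the key space and never needs to talk to the others:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define KEYSPACE 1000000UL

    /* Each worker independently scans its own slice of the key space. */
    struct range { unsigned long lo, hi; };

    static void *check_range(void *arg)
    {
        struct range *r = arg;
        unsigned long k;
        for (k = r->lo; k < r->hi; k++) {
            /* try_key(k) would go here */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct range r[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++) {
            r[i].lo = i * (KEYSPACE / NTHREADS);
            r[i].hi = (i + 1) * (KEYSPACE / NTHREADS);
            pthread_create(&tid[i], NULL, check_range, &r[i]);
        }
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        puts("all ranges scanned");
        return 0;
    }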
  • Yeah, it seems to me that after all of this guy's work and ideas are put into developing a compiler for all of his languages (this usually just means C and possibly C++) he will end up with Ada.
  • What about parallel gay geniuses? =)
  • As long as it happens in idle cycles, I really don't care what my computer is doing.
    I do. Any module running on my computer that accepts arbitrary jobs would need to have strict security and policy controls beyond "Run when idle." Such a module would have to impose security restrictions preventing hosted code from doing malicious stuff to my computer. And I will need to be able to set policies like, "Allow NOAA to run weather modeling software, but don't let Cracker Joe break passwords on this host."
  • Hmm...it sort of sounds like our professor's tales of the mainframe days when students had to pay for their computing time. It didn't sound like a lot of fun, but apparently you learned to make sure your code was bug-free before you handed your punch cards over to the batch-operator guy =)

  • I've never used it, but I think something similar already exists (I could be wrong, since I didn't read the link from the story yet).

    Compositional C++ [isi.edu]

    -ElJefe

  • Actually, no. There -are- parallelising compilers, developed by the University of Manchester. Such things exist, although they are horribly inefficient.
  • Before everything else let's just answer the initial question:

    "Yes, it would be very neat!"

    Needless to say, auto-partitioning of a function into blocks is an interesting topic.

    As I have a background as both a software guy (Ada, C, Java) and a hardware guy (mainly VHDL), I often think about how similar the two areas are becoming. In both areas you try to design modular components that can be collected in a library and re-used.

    To the point: will future compilers generate code for a processor, or will they generate function blocks that are loaded into a gate array and communicate with other function blocks on the same chip, or on another computer?

    Considering that you can only clock a processor so fast, you have a better chance to optimize the performance if your 'programs' run as function blocks on a gate array (using bit-serial algorithms?).

    Something that would interest me would be to design an 'operating system' that allowed a 'program' to allocate space on a gate array, provide I/O services, storage area in a RAM (if you don't want to waste space in the array) and 'inter-task' communications, both on the same chip and via a network. But hey, now I'm just day-dreaming.
  • Wait a minute! You mean that if I write my program as a parallel program to begin with, that I can run it in parallel on a parallel computer or cluster of workstations? Holy cow!


    sc
  • Well, sort of on/off topic, but I'm working on a cross platform distributed mp3 encoder.... Maybe this would help...
  • by l4m3 ( 25501 )
    This is very similar to how Plan 9 works, by offloading compute-intensive tasks to computers that can handle such tasks easily.
  • Can anyone with a strong CS background comment on this?
  • The long-dead corpse of the Ada mummy shall awaken... Abandon all hope, ye entering the university gates.

    I have seen what happens when someone decides to teach Ada to freshmen... And it is an ugly sight...

  • There are a couple projects out there in the HPC community that are aimed at something like this. The main ones are the Globus project [globus.org] (mainly distributed computing services) and the Cactus project [cactuscode.org] (an application framework). I saw presentations on both last week at the HPDC conference, and while they still have a lot of work to do both are rather impressive.

    --Troy
  • A Task System and Item Architecture (TSIA) provides an application with a transparent reliable, distributed, heterogeneous, adaptive, dynamic, real-time, interactive, parallel, secure or other execution.

    Ouch. I'm getting buzzword sickness. Anyone remember the first descriptions of Java? I think this beats it for buzzword density.

    This, as far as I can tell, is a generalized description of things like RPC [w3.org] and CRL [mit.edu]. Distributed computing is a cool idea, but (there's always a but) you still need to write your programs to be distributed. This thing seems to be basically about a distributable threading model ("tasks" == threads?). I'm pretty sure the day will never come when we can transparently run applications that are not designed to be distributed on many machines at once, but it'd be nifty if someone came up with a system that allows coders to not worry too much about distributing their program, and still have it work well.
    ----------------------
    "This moon-cheese will make me very rich! Very rich indeed!

  • If it runs as some user with no permissions to do anything (such as, say, 'nobody') then there's no problem. Well, assuming you're in a UNIX which isn't rootshellable from the 'nobody' user. (Hint: there's a reason most well-behaved daemons run as 'nobody'.)
    ---
    "'Is not a quine' is not a quine" is a quine.
  • Maybe I should have been more specific before. This is taken from the tutorial introduction...

    Is parallel programming difficult? Many programmers complain that architecture dependencies, confusing notations, difficult correctness verification, and burdensome overhead make parallel programming seem tedious and arduous. These barriers cast doubt on the practicality of parallel programming, despite its many potential benefits.

    Compositional C++ (CC++) was designed to alleviate the frustrations of parallel programming by adding a few simple extensions to the sequential language C++. It is a strict superset of the C++ language, so any valid C or C++ program is a valid CC++ program. Conversely, many classes of parallel CC++ programs can be simply rewritten as equivalent sequential programs. For these classes of programs, the development path can be very similar to that of the equivalent sequential program. This compatibility with the C and C++ languages facilitates the transition to the task of parallel programming for users knowledgeable in those languages.

    CC++ extends C++ with the following eight constructs:

    blah blah blah

    Despite the simplicity and the small number of extensions, the conjunction of these constructs, when combined with C++, results in an extremely rich and powerful parallel programming notation.

    The richness of this language is reflected in its suitability to a wide spectrum of applications. In fact, CC++ integrates many seemingly disparate fields:

    blah blah blah

    All of the object-oriented features of C++ are preserved in CC++. These features (especially generic classes and inheritance mechanisms) promote the reuse of code, structure, and verification arguments. Thus, CC++ provides an excellent framework for the development of libraries and for the use of software templates in program construction. This reuse is especially important and useful in the context of parallel programming.

    -ElJefe

  • What you're describing is still explicitly identifying available parallelization. That it's part of the language doesn't alter the fact that you're telling the compiler where it can parallelize the code.
    Christopher A. Bohn
  • You can try and think about it as just another optimization, but it won't work very well. Many programmers (especially younger ones) do a *very* poor job of writing code that can be easily adapted for parallel execution: loads of globals, poor structure, etc, etc, etc. When you try to automatically parallelize this, you end up with lots of locks all over the place, and performance, even in a distributed system, will certainly go down. Everything ends up serialized anyway, and now you have the communications and locking overhead on top of everything else.

    Plain and simple fact is that parallel programming is not easy to get right even now (I *know* this, having spent *way* too much time debugging such things), and automating it is NOT going to help anyone. Garbage in, parallel garbage out. You *have* to think about parallel execution, from the beginning, if you want to have any chance at decent parallel performance, either explicitly or via some kind of automatic conversion. I know that optimizing compilers can do a lot of good, but no one has yet convinced me that they can make up for poor, serialized designs.

    Pete
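
    A minimal sketch of the "locks all over the place" effect described above, assuming POSIX threads; the global total and the trivial loop body are made up for illustration:

    #include <pthread.h>
    #include <stdio.h>

    /* A global accumulator -- exactly the kind of shared state that forces
     * an automatic parallelizer to wrap every iteration in a lock. */
    static long total;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        long i, n = (long)arg;
        for (i = 0; i < n; i++) {
            pthread_mutex_lock(&lock);   /* every iteration serializes here */
            total += i;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, (void *)1000000L);
        pthread_create(&b, NULL, worker, (void *)1000000L);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* The two "parallel" workers spend most of their time contending
         * for the lock, so this can easily run slower than the plain
         * serial loop it was derived from. */
        printf("total = %ld\n", total);
        return 0;
    }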
  • Burkhard, the guy who is running the tsia website, and one of its only strong supporters, was out a few weeks ago for a presentation. I was profoundly unimpressed.

    The paragraph of buzzwords was mentioned by another /. poster, so I won't dwell on it. It was not well backed in the presentation though. When pressed for more details on security, for example, Burkhard was unable to do much more than mumble about Fortran and never gave a reasonable (e.g. security-related) answer. Even when I threw him the softball of Java Applets and their sandbox as being either an analogy to what he was doing or a direction to head, he seemed oblivious to the point.

    He seems completely convinced that TSIAs are the next big thing in computing. Specifically, the relationship implied in 'The Third Advance in Application Support' should be taken literally. He thinks that what has happened since the compiler is basically tool using, but TSIAs are such a profound paradigm shift that they represent a complete redirection of development.

    Page 2 of paper.pdf has a great line at the top: "TSIAs suitable for most applications don't exist yet." If a TSIA approach is to be taken seriously as the next great advance, then why is it application-specific? It seems that to be on par with OSs and compilers you would, at least, need to be general purpose.

    Check out page 9 of paper.pdf. The TSIA paradigm does not allow inter-task communication ("During its execution, a task does not communicate with other tasks"). That pretty much settles the issue for most jobs. Burkhard says that almost all tasks can be completed without communication; I remain unconvinced.

    Page 18 "As described in section 2, a TSIA provides an application with transparent[,] reliable, distributed, heterogeneous, adaptive, dynamic, real-time, interactive, parallel, secure or other execution". Section 2 only treats that ranging subject to the extent of saying look at the next section. Then section 3 has a list of applications that are alleged to be TSIAs (including Apollo) without explanation for how they answered the laundry list. Hint, the TSIA architecture doesn't accomplish any of the buzzwords, but a task oriented approach (and a sufficiently motivated programmer) can be used to implement any of them so by extension TSIAs provide those capabilities. At least that is the arguement in the paper.

    On the list of computing revolutions, TSIAs are pretty far down there.

  • While it may be infeasible for a compiler or OS to do this for you automatically (at least today), aspect-oriented programming may be a way to "weave" distributed operation into your code semi-painlessly. IIRC, there has been some work already done on using aspects for distributed computing.

    http://www.parc.xerox.com/spl/projects/aop/ [xerox.com]

    The guys at Xerox have already developed aspect extensions to Java, which look to be quite powerful.

  • I've been seeing this promise for years. There are a number of stumbling blocks...

    The problem has to lend itself to parallel computation. This means most simple inline code with simple branches doesn't take advantage of parallelism. So the program has to be written to break the problem into small chunks which can be processed asynchronously in parallel. The program has to re-assemble the results into a cohesive whole, re-calcing the missing bits as needed.

    For multi-machine parallel processing, there has to be a whole suite of network communication protocols. These protocols have to ensure parts are distributed correctly to waiting machines, and valid results are returned. On top of that, you have to re-assemble the parts before returning them to the calling program, since the processing power of individual nodes is generally unknown, forcing results to be returned at random times. There also has to be a mechanism to duplicate the work sent to a node in case no answer is returned within a reasonable amount of time.

    There was a parallel computing model called Linda put out in the early 1980s which tried to take advantage of networking. The idea was for something like distributed.net, with hundreds of machines all participating in parallelism. Some machines would be designated as Compute Servers, basically Crays sitting on the networks for any spreadsheet to take advantage of. The designers were eventually overwhelmed with the logistics of keeping track of all the outstanding queries and responses. The overhead in the main controlling program grew exponentially as large problems were spread to hundreds of other machines, and eventually the main machine was doing nothing but keeping track, not working on the problem at hand.

    Parallelism is a different mindset from sequentialism (Turing machines), where every large problem has to have clearly defined interfaces between each subset of small problems. It works for cracking MD5 keys, because each subset is a range of keys, and for S@H with their time and frequency ranges.

    the AC
  • I maybe totally wrong but here is what I know from my own experiences.

    I write some numerical simulations of heat flow from time to time which are run on an Origin 2000 array. I write my code normally and then compile it for multiprocessor support. The way this was explained to me seemed to imply that the compiler makes the code multiprocessor-capable.

    I.e. I don't have to do anything differently, I just recompile for multiprocessor. In any case it seems to work.
  • Less radical, but perhaps more practical:

    Cosm [mithral.com] is distributed.net ex-president Adam Beberg's current project. It's a distributed computing architecture which is sort of a generalization of the distributed.net software. This is the project formerly known as distributed.net "v3", for those of you who remember it.

    It's being designed to work with a keyserver/proxy/client model, like the existing d.net structure but somewhat more flexible. Anyone will be able to set up their own network for private projects. Clients will be driven by drop-in "cores" that contain the actual processing code for specific projects. There are also features planned not currently in d.net, like a distributed filesystem and built-in security through cryptographic signatures.

    The project is in design/early implementation stages currently. It's being developed under a quasi-open source license (it wouldn't meet the Open Source definition, because it restricts modification too much). The CVS tree is open to anonymous access.

  • It seems to me like this person is just asking:

    How about if we make something... circular.. we may even call it round.... And maybe someone out there has a thin thing we could put 2 of these "round" things on at both ends, and then I think it would be a great idea to combine several of these systems to produce something that would move. I call this movement - rolling.

    Is this guy Al Gore? If he's not, is he going to come up with some idea about a large network of computers across the globe?
    I think this guy just needs to learn how to search for things before he goes off asking. Then again, why does Roblimo think that this is something new?
  • All great things will come with time.
    Give it a chance. This looks like a program that is still in its theory stage.
    I am glad that people are talking about it now.
    In five years it may be possible to have a program that wasn't WRITTEN for parallel computing be easily recompiled for parallel environments. Or it may be possible (once these minor syntax differences have been hammered out) to have something like Apple had for their move to Macintosh: a program that guides programmers through the porting process.
    I am excited about the developing theory!!
    Your Squire
    Squireson
    squireson@bigfoot.com
  • Well, no, some things can be identified by a specialized compiler/translator.

    For instance, in a single task world:

    for (i = 1; i < 100; i++) {
        array1[i] = array2[i];
    }

    would actually execute 100 times. Why not, then, instead of executing 100 times, execute once on 100 processors! Sure, this is a trivial application, but imagine BigHonkingObjects[1].copyTo(array[i]), or a distributed QuickSort or MergeSort or Binary search? Where things are constant over time, they can be parallelized. It doesn't matter if you search or copy the lower or upper half first, it just needs to get done, and neither is dependent on the other. C++ operator overloading of course throws a gigantic monkey-wrench into this though.

    This would be much nicer in Java ;)
  • The idea is so old, CNN even has a story on it. And we all know how long it takes the mainstream just to acknowledge these things exist.

    http://www.cnn.com/TECH/computing/9908/09/parallel.idg/

  • Does anybody out there actually think that this would be a simpler programming consideration than multithreading? Theoretically it should be possible to pass a packet to a machine that states "do this processing with this data" and have it return results when it's done. A small server would dole out the packet to the next available "Processing Request Protocol" machine, possibly with a little flag requesting a minimum speed for the machine (don't want to send big tasks to small machines). Programming for this would have to take into account similar considerations as for multi-threading because the farmed-out processes would lock down resources until they returned. The biggest consideration would be dynamically determined memory items, which would require a protocol call -- a WHOLE lot slower to process than merely dereferencing a pointer. I wouldn't suggest this for any language that doesn't already have extensions for thread-safe operation, but beyond that it should be just another library.
  • You're right. A simple loop with no inter-loop dependences is something a parallelizing compiler should be able to handle on its own. The problem is such loops are often only a small part of the application, and Amdahl's Law tells us that the speedup we get by parallelizing an application is a function of the fraction of the code that gets parallelized. It would be a waste to throw 100 processors at this application if your parallelizing compiler could only parallelize 2% of the code.
    IMO, to get near-optimal parallelization, you need the insight only available to a human programmer.
    Christopher A. Bohn
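
    To put a number on that 2% case, Amdahl's Law gives speedup = 1 / ((1 - p) + p/n) for a parallel fraction p on n processors; the tiny C program below just evaluates that formula:

    #include <stdio.h>

    int main(void)
    {
        double p = 0.02;   /* fraction of the code that parallelizes */
        int    n = 100;    /* number of processors */
        double speedup = 1.0 / ((1.0 - p) + p / n);
        /* prints roughly 1.02: a hundred CPUs buy about 2% overall */
        printf("speedup = %.3f\n", speedup);
        return 0;
    }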
  • That was using Zilla, a cool little program that predated SETI and Distributed.net by five years or more. Still wasn't automagic, however.
  • IBM already has a product that does this, or something very like this. It's called LoadLeveler and it runs on AIX (duh), HP-UX, Solaris, and IRIX. Special account is taken for an RS/6000 SP. (you know, like Deep Blue?) It's reasonably mature, too, with the latest version being 3.0, I think. Obviously it isn't as inherently keen as, say, an open-source product with an industry-standard API for everyone's favorite penguin-mascotted (can I verb 'mascot?' why not) or lil'-devil-mascotted OS.

    Unfortunately AFAIK it only supports parallel-type ops of the divide-and-conquer variety described here on the SP, not on the other supported machines. And you also have to write the app to their API, which is anything but industry-standard.

    (Disclaimer: I consult to IBM - in SP support, no less - but I'm not trying to plug their stuff. Just pointing out that this isn't really a new idea per se.)
  • Check out page 9 of paper.pdf. The TSIA paradigm does not allow inter-task communication ("During its execution, a task does not communicate with other tasks"). That pretty much settles the issue for most jobs. Burkhard says that almost all tasks can be completed without communication; I remain unconvinced.

    As well you should be. Saying that tasks can't communicate pretty much restricts you to embarrassingly parallel problems and what I call "serially parallel problems" (e.g. doing parameter studies with a single-threaded code by running all the different cases on separate CPUs/machines simultaneously). Real parallel programs need to communicate.

    --Troy
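
    A minimal sketch of the kind of inter-task communication being talked about, assuming MPI; two ranks exchanging a single boundary value stand in for the real thing (run with something like mpirun -np 2):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double boundary = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Two cooperating tasks exchanging a boundary value -- the sort
         * of communication a "tasks never talk to each other" model
         * rules out. */
        if (rank == 0) {
            boundary = 3.14;
            MPI_Send(&boundary, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&boundary, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %f\n", boundary);
        }

        MPI_Finalize();
        return 0;
    }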
  • by HeghmoH ( 13204 )
    Cosm [mithral.com].
  • As I see it there are two main obstacles to making this scheme practical. The first, as many people have mentioned, is that automatically parallelizing a serial code is a nontrivial task, even when the underlying problem has a lot of parallelism.


    For instance, I once had the task of parallelizing a subroutine that used an expansion in spherical harmonics to calculate the gravitational potential on the boundary of a fluid dynamics grid. There is a lot of parallelism in this problem; you can parallelize over the mass integrals, over the boundary points, or both. Balancing these advantages, however, is that you would like, as much as possible, to reuse the sin and cos terms that recur throughout the expansion (rather than recalculating them every time they crop up). Consequently, there are a lot of scratch arrays and temporary variables scattered throughout the code, as well as weird loop orders intended to allow coefficients to be precalculated and reused.


    The code was to run on an SGI Origin 2000 (an SMP machine), and there was an automagic parallelizing compiler for the machine. To make a long story short, the auto-parallelizer failed miserably on this piece of code. It turned out to be easier to recode the algorithm by hand to run in parallel than it would have been to tweak it so that the compiler could figure it out "automatically." You can expect these problems only to get worse as you try to adapt the automatic parallelizing system to work with distributed machines. (Indeed, to my knowledge all automatic parallelizing systems run on SMP systems.)


    Still, I'm hesitant to say "never." A programmer hand-hacking assembly in the 60s might well have been equally skeptical that optimizing compilers would ever outperform human assembly programmers; yet here we are. However, there is another, IMO more fundamental, problem with "all-purpose" distributed computing. The tasks into which you divide the program have to be large enough to cover the overhead of farming them out to the remote hosts. This is not a problem for SETI@home because the 24 hours or so of computing that you get on each block easily dominates the time required to fetch the block. But how many people work on problems that break down so neatly into huge tasks with no communication requirements? For most people's calculations the communication latency will likely overwhelm any parallel gains you might hope for. Consequently, I expect that heterogeneous distributed computing will for the foreseeable future remain exclusively the province of large, special-purpose, compute-intensive software written specifically to work in that environment.

  • My main reason is for encoding CDs at 256 kbit, which admittedly take a short amount of time to encode, but not exactly 1 minute of encode time for 1 minute of audio... It will use the Blade encoder as the encoder, and be accessible through win32 clients, although I believe I'm only going to write the "server" portion to run on Linux... Being new to Perl, I'm going to try to use it to be cross-platform...
  • Well, that's mostly true. But I think you guys are getting stuck in the details. There are a number of interesting ways to use NN's to simulate intelligence. An easily parallelizable (??) method is known (I believe) as voting: in this scenario, instead of having one large Net with many interconnections, you have many smaller nets, each of which comes to a decision on an output and "votes" on the total output. There's been some work done on this (can't find the references anywhere) and IIRC it seems that this works -- in general -- better at lower numbers of total nodes than a monolithic NN.

    In any case, I just wanted to point out that an NN-like scheme can be devised that works well in a parallel environment.

    stevec
  • I hate to break this to you, but some purely functional programming languages can already do this. Haskell is a lazy, purely functional programming language with strict typing. More information on Haskell can be found at www.haskell.org [haskell.org]. The GHC compiler [microsoft.com] is capable of executing Haskell programs in parallel without any additional work on the programmer's part. However, it is not very efficient right now.

    From the c.l.functional FAQ [bton.ac.uk]: Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands. The expressions in these languages are formed by using functions to combine basic values. A functional language is a language that supports and encourages programming in a functional style.

    Programming in a purely functional language does require a very different style of programming, however.
  • ...as does pointer aliasing, doesn't it? (e.g. if, say, array2 actually pointed inside array1...).

    Perhaps that's a point in favor of explicit hints to the compiler; in, say, straight C, if both array1 and array2 are pointer types, there's not a clear guarantee from the snippet that the assignments are all independent. As you noted, ditto for C++ operator overloading, too.
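
    A minimal illustration of that aliasing case in plain C; the overlap between array1 and array2 is contrived, but it shows how a loop that looks independent suddenly isn't:

    #include <stdio.h>

    int main(void)
    {
        int buf[6] = {1, 2, 3, 4, 5, 6};
        int *array1 = buf + 1;    /* destination overlaps the source */
        int *array2 = buf;        /* source */
        int i;

        /* Looks like 5 independent copies, but iteration i writes the
         * value that iteration i+1 reads: a genuine loop-carried
         * dependence hidden behind the pointers. */
        for (i = 0; i < 5; i++)
            array1[i] = array2[i];

        /* Sequentially this smears buf[0] across the array (1 1 1 1 1 1);
         * a parallelized version would instead produce a one-element
         * shift (1 1 2 3 4 5). */
        for (i = 0; i < 6; i++)
            printf("%d ", buf[i]);
        printf("\n");
        return 0;
    }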
  • This entire story should never have been posted. I mean, why talk about something which has been in the literature for many years? The concept of writing a program and not letting it know if it will be run in parallel or serially is an old topic in computer science.

    The problem is you don't have to be in computer science to program, which I personally see as a problem... Every course I take I look back a year or two and think "wow, man, i was dumb then". No doubt in 1-2 years I'll look at today and say the same thing. When I say "dumb" I mean "write things in silly, inefficient ways" and using brute force rather than elegant theory.

  • Partially OT, but at the end I think you'll see why I had to use this example.

    Global War, THE final combat game you will ever play.

    I'm thinking of a massive distributed game server with dynamic on-the-fly level creation: a foot-soldier-level FPS with a big bad-ass flight sim, while the generals direct your actions from a strategy-type console. Imagine you and your squad of F/A-18 Hornets are escorts to B-52s on an early morning bombing run over enemy territory. You're out in front with 5 wingmen to the left and 5 more to the right. All of a sudden, on the ground, 6 shoulder-fired SAMs launch. Even though you've done your best to evade them you lose 3 wingmen to them.

    Then all of a sudden AA guns light up the sky. Before you can launch an AARAM and take out their radar you lose 2 more wingmen.

    Just when you think that the worst is over, you fly into MiG soup. You and the 5 wingmen you have left confront 10 MiG 29s.

    Somehow at the end you and your one surviving wingman finish off the last of the MiGs and get the bombers safely to the target. They let loose and the once modern industrial city below you becomes a 10 mile wide open sore of fire and rubble.

    Meanwhile somewhere else, you're on guard duty just inside the coastline of your country. The sirens sound and you know that an air raid is coming. You sit down at your station and check your radar display and you see a few dozen aircraft moving in towards you who don't respond properly to the IFF query that you just sent them. You begin to lock on and prepare to fire your radar-guided SAMs, and all of a sudden you see that one of the escort fighters has fired an anti-radar missile. You have two choices. Shut off your radar and let them pass, or eat the missile.

    You kill the radar and let the missile crash into a hillside safely away from you. You get on the horn and let the airbase 250 miles to the northwest know of the incoming enemy bombers...

    Elsewhere, you're a marine. You're finishing up a combat sweep inside the borders of the enemy nation. You were just a corporal until a Dragunov sniper rifle was used to ventilate the chest of your Sarge. You and your unit let loose with the hot stuff. M60s and M16s riddle the side of a crumbling, blown-in-half building while you call in for an artillery strike. The boys with the BIG guns keep firing until they've turned 4 city blocks into a sandbox. The sniper has to be dead; the building he was in and even the whole damned block aren't there any more. You're safe, for the time being.

    Now you've got 20 scared, pissed-off, fresh-out-of-Parris-Island marines looking to YOU for leadership.

    The generals & admirals pick and choose strategic targets that will best assist them in the war effort. At the top of each food chain is the sovereign of each nation, who not only decides on what rules shall be adhered to but negotiates to end the fighting (of course all demands made shall be unreasonable just to keep the fighting going).

    I'm thinking about a game where 200 people in at a time would be lame. I'm thinking about a game where 500 people is a minimum to play a decent game. I'm thinking about a game where just the communication between the distributed servers would saturate a T3. This is the type of thing that could capture the imagination of the ordinary moe so that distributed computing will be understood and supported by the masses.

    If it's done through modules that can be plugged in and unplugged again, you can change everything from the type of airplanes to the ammo the tanks can fire.

    Something like this would need to be done on hecka fast distributed machines or perhaps quantum computers because of the amount of data involved and the number of variables which must be calculated. In a situation like that you'd be happy to take all of the computing horsepower you can get, and heterogeneous interoperation is the best way to get it.

    I'm thinking about a game that nobody would ever write because it would be too expensive to code because they'd never get their investment back.


    LK
  • There are several toolkit libraries that allow for this already. There are experimental and not-so-experimental OS's that shift a process among multiple nodes of a cluster. I see some of the claims akin to Ford putting out a release saying, "Hi, we're building cars now."

    The holy grail of parallel programming is a compiler or run-time that can detect a highly iterative task with little or no dependencies on the results of previous iterations and farm those tasks to other CPUs without the use of tags in the code or special function calls. The best I can imagine is to start with specially designed compilers that emit code that attempts to parallelize every loop and assumes the loop will be non-dependent unless it fails a dependency check at run-time or something obvious at run-time.

    As far as networking the task goes, given some of today's easily available technology (Beowulf) we've really seen that it makes little difference if the CPU is on the same motherboard or in another case attached by a network. In fact, networks are getting fast enough that I predict it won't be long until the CPU bus residing on a network becomes useful in some applications.
  • I've toyed with the idea of recompiling a Linux kernel in parallel. Picture using 10-20 older computers that would each chew on its own piece of the kernel, and then the final putting together would be done by a master host.

    Is there an easy / existing way to do this or is it just one of the weird dreams only I get ?



    Sun Tzu must have been running Linux...
    - Hold out baits to entice the enemy. Feign disorder, and crush him. (Sun Tzu, The art of war)

  • Unless you identify the parallel nature of your software, then this won't be able to help an individual application (though it might help with distributing multiple apps similar to what you can see with SMP). No OS can be expected to identify the parallel nature on its own. The programmer must do it, either with threading or with message passing. You might try a parallelizing compiler, but they have only limited insight into possible parallel nature, and cannot be expected to provide near-optimal parallelization.
    Christopher A. Bohn
  • I think largely parallel processing would involve very different programming languages and, more importantly, very different ways of thinking about problem solving. The language I use has an inherent looping structure to it. Most languages have an inherent sequential nature to them. To efficiently utilize parallel processing you don't do it that way. Even language processing is only partially adapted to parallel processing.
  • Oh, yeah -- nearly forgot. Another issue is that software that provides good performance on SMP can actually run slower on a distributed system than it would on a uniprocessor, due to the communication overhead introduced by the network.
    Christopher A. Bohn
  • > Most languages have an inherent sequential nature to them.

    Ach! Das von Neumann bottleneck!!!
    -- ----------------------------------------------
    Vive le logiciel... Libre!!!

  • And there's always the Legion Project [virginia.edu], which is the successor to research (Hydra) started in the late 1960s.
  • I know, bad form to reply to self, but an example will clarify much.

    Sorting
    sequential: bubble sorting is fastest
    parallel: quickly examine the list and determine how mixed it is; apply a metric to determine the number of processors to use optimally; break into the optimal(?) number of lists and sequentially sort each; reissue the metric and redeploy processors for merging the lists; then sequentially check the main list
    When it works it is much faster, but the algorithm is several magnitudes more complex.
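
    A minimal sketch of that split/sort/merge idea, assuming POSIX threads and the C library's qsort; two fixed halves stand in for the metric-driven choice of processor count:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 8

    struct slice { int *base; int len; };

    static int cmp_int(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* Each thread sorts its own half independently. */
    static void *sort_slice(void *arg)
    {
        struct slice *s = arg;
        qsort(s->base, s->len, sizeof(int), cmp_int);
        return NULL;
    }

    int main(void)
    {
        int data[N] = {7, 3, 6, 1, 8, 2, 5, 4}, merged[N];
        struct slice lo = {data, N / 2}, hi = {data + N / 2, N - N / 2};
        pthread_t t1, t2;
        int i = 0, j = N / 2, k;

        pthread_create(&t1, NULL, sort_slice, &lo);
        pthread_create(&t2, NULL, sort_slice, &hi);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Sequential merge of the two sorted halves. */
        for (k = 0; k < N; k++) {
            if (j >= N || (i < N / 2 && data[i] <= data[j]))
                merged[k] = data[i++];
            else
                merged[k] = data[j++];
        }
        for (k = 0; k < N; k++)
            printf("%d ", merged[k]);
        printf("\n");
        return 0;
    }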
  • Or are they finding ways to teach programming without pointers (!)?

    Java, perhaps? No pointers there, right? *smirk*

    I guess they only have to explain how two different references can really be the same object. But some students even get hung up on that.

  • I think that MOSIX is a kernel patch. One downloads the 2.2 kernel, downloads the MOSIX changes, and recompiles the kernel. By definition, this is a sourcecode kernel patch. MOSIX isn't a module, but that's different. Also, I do believe that MOSIX works with process threads, and doesn't require any special programming; that is, it will work with all applications, MOSIX-aware, or not.
  • Not quite. Mosix *is* a kernel patch. It does require a specific version of the kernel (2.2.9 last time I checked), although so do most patches :)

    Also, Mosix will transparently migrate processes from busy machines to others in the cluster with more idle cycles available. The processes in question aren't even aware of the change, and they most certainly don't need to be designed to be parallel-processing.

    Run two rc5des clients on the same box, for example, and it'll move one of them to another machine on the cluster with a lower loadavg.

  • I think a neat distributed project would be some sort of AI system. Neural nets and GAs are suited for parallel projects, so it would be really cool if we could get a few thousand computers hooked together in a neural net. I'd be curious to see how many processors would have to be hooked up before it got really could at natural language processing or playing Go or something.

    Oddly, I had this idea while sipping coffee right before I took a look at /. this morning =)

    -dl


  • Sure: it's what we in the Computer Science community refer to as "old news." I wrote a Ph.D. on language-based parallelism of this kind more than 12 years ago, and even then the field was pretty mature.
  • There has been research into automagically parallelising programs.

    One approach was to develop a compiler which attempted to parallelise at the statement level, rather than the procedure level. From what I understand, this worked (kind of), but the message overheads were very high.

    A second approach was to manually parallelise as far as physically possible. The OS would then automatically simulate the "ideal" number of processors over the physical number. (There is a subtle difference between this, and simply allocating threads in a round-robin fashion. For a start, it's adaptive.)

    To the best of my knowledge, though, conventional parallel processing uses PVM or MPI, which can be royal pains. Give me a Transputer and Occam any day. In any of these cases, though, adaptive parallelising is not possible. The program is either written for N processors, or it isn't.

  • Damn! That should read 'really good at natural...'

    OH! That's what the preview button is for =)

  • While not exactly of the same style as the article (um, threads?), we've created a distributed batch system called Condor [wisc.edu] that uses an opportunistic model of batch queueing. In it we can run job-parallel (i.e. you run 100 jobs that all do a part of the problem), or PVM (Parallel Virtual Machine - message passing).

    That said, if it was really nice and easy to make a program parallel, it would be done by now. The closest thing I've ever seen is some nifty magic done by the SGI power compilers to take advantage of SMP machines... but that's only for one machine. That doesn't even touch inter-machine problems.

    Now, if you are re-writing your code from scratch (as the paper seems to indicate that you'll have to), then by all means you can use an existing package (most likely MPI or PVM) to get the parallelism that you are looking for...
  • One of the biggest stumbling blocks that distributed computing runs into is the ability of the programmer to break the task up into independent subtasks. Most programmers are trained to program in a monolithic, serial manner. Many system programmers can break out of this by further subdividing tasks into threaded execution. To fully take advantage of this capability, the programmer has to be able to break the process down into queued, discrete tasks that can be handled by "anyone" and farmed out to an available compute unit (thread or CPU). The only type of generic programs that lend themselves to compiler generated distributed processing are large array processing types of things like the big old Fortran array processors of the past. Even there, SOMEONE had to generate the library that actually does the computation distribution.

    This is really a technology enablement, not an end user solution.
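
    A minimal sketch of the "queued, discrete tasks handled by anyone" pattern, assuming POSIX threads; the fixed task count and the do_task placeholder are made up for illustration. Any idle worker grabs the next task, which is exactly the property a compute unit needs in order to be interchangeable:

    #include <pthread.h>
    #include <stdio.h>

    #define NTASKS   16
    #define NWORKERS 4

    static int next_task;                 /* index of next unclaimed task */
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

    static void do_task(int id)
    {
        printf("task %d done\n", id);     /* stand-in for real work */
    }

    /* Any idle worker grabs the next task off the shared queue. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            int id;
            pthread_mutex_lock(&qlock);
            id = next_task < NTASKS ? next_task++ : -1;
            pthread_mutex_unlock(&qlock);
            if (id < 0)
                return NULL;              /* queue drained */
            do_task(id);
        }
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        int i;
        for (i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }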
  • Plan 9 on Earth yet another B movie
  • Most of the previous comments seem to make criticism of the poster's vague generalism over implementation. I think the more interesting point is the idea of running some small module that used your computer's processing time for ANY program as opposed to RC5 or SETI. This is just an idea, but I'd like to see people's responses. Consider if each participating computer ran a tiny servlet (similar to the RC5 crackers or the SETI processors) that ran code during idle cycles. From a global point of view, the entire system is a widely distributed parallel computer. Fine, so let everyone have access to it. Administer a system where one gets an account on the system as a whole, and runs programs on it. This could be very useful for large-scale simulations where computing mass is a critical point. That this is doable with current systems is beyond a doubt. As long as it happens in idle cycles, I really don't care what my computer is doing (and considering it's a dual Celeron 500, it's got spare cycles to boot).

    The immediate concern is computing congestion. I'm sure if every Tom, Dick and Harry wanted to run his or her gigantic, overly complex simulation on the world's biggest parallel network, cycle response would get sluggish indeed, so perhaps some arbiter is needed to choose who gets access.

    If anyone's read the novel "Permutation City," the author's solution to congestion was to make the system a pay-per-cycle one where people bought computing time.

    I'm not sure to what extent the logistics of the implementation would affect the simplicity of the essential idea, but I think it'd be great if the desktop computers of the world contributed to the world's largest supercomputer in its idle time..
  • MP3 encoder, eh? Last time I checked, MP3 encoding was too patented to really do anything worthwhile.

    But I may be wrong.
  • Anyone familiar with Cosm? (Do a search on Google.) A guy who split off of distributed.net is creating a general-purpose protocol for doing this. Also, Jini doesn't split things up, but it makes it easy to move code... just wanted to mention these as no one else has.
  • It is so confusing and sideways, though. I had no problem with C, C++, Java, any of that, but Scheme killed me. It does teach fabulous ways of looking at problems (and recursion), but there's still all of the useless syntax crap. And if you deal with C/C++ syntax crap, you can at least use it later. With Scheme, you can't really use the useless syntax crap later when you're coding for some sort of work-related project...


    itachi
  • NNs don't work well on parallel computers, and even worse with distributed computing (e.g. a network of PCs). The reason is that each node does some very simple processing (e.g. adds, multiplies and passes through a simple function). So the communications overhead is so massive that you don't gain very much at all - in fact you would probably lose.

    Parallel computing is suited to tasks which can be broken into many chunks, each of which takes substantially longer to process than it takes to send the tasks over the communications infrastructure. These include RC5 (tiny data - a range of keys - but each block needing lots of number crunching) and SETI@home. With a neural net you'd congestion-collapse your network and the CPUs would be 99.9% idle.

    - Daniel
  • I say, that's rather defeatist logic, isn't it? It hasn't been done, ergo it can't be, and the first law of Computer Science is the Law of No Progress?


    LOL! To quote Guy L. Steele (who was working at Thinking Machines at the time), "just because parallel computing is the only answer [to getting past Moore's Law] doesn't mean it is an answer."


    People have been trying for decades to do what the paper describes, at the cost of other research. Compiler-generated massively fine-grained parallel code is one of those great vaporware holy grails (like natural language processing) that gets people riled up but never materializes. Coarse-grained parallel projects like distributed.net (and its more generalized fork, the name of which escapes me) are more feasible, and have direct benefits for math and science researchers. "Weak AI" has gained popularity over "strong AI" for good reasons, hopefully coarse-grained projects will continue to gain similar popularity over fine-grained parallelism.

  • here ya go;

    http://lorenzini.com/jlorenzi/encoder.html [lorenzini.com]

    Hope that does it for you...

    Found it in 1 minute with Google [google.com].

  • I'm working on such a project. It's a distributed virtual machine that features automatic parallelization. It's currently in the very early stages of development, but so far it does parallelization of several test programs such as Fibonacci number calculation and others...

    The homepage is at http://www.sslug.dk/TONS/, but you may find my report (finished earlier today) available from my homepage at http://ostenfeld.dk/~jakob/ an interesting read as it has a lot more information about what we're actually doing.

    In short, we define a new language, a scheduler and some node servers to make up the distributed virtual machine, and we're in business.

    However, it will take some time before this project becomes useful. But I believe that we will be able to achieve very good performance (often comparable to hand-parallelized Fortran/C) in a large number of scientific computing applications. Read the report for the real argumentation :)
  • There is a nice online tutorial [haskell.org] (the language definition [haskell.org] is, of course, also online). And if you are capable of reading German, there is a second tutorial [uni-bonn.de] - and after you have got the hang of the basics, some of the academic papers [haskell.org] may not seem as weird as before ;-)

    Haskell has some sophisticated (and of course free) compilers, but for learning the language and for small applications, the interpreter Hugs [haskell.org] is IMHO more suitable.

    Chilli

    PS: Though Haskell is a super-cool language, unfortunately, it also does not magically solve the problem of parallel programming - the programmer still has to devise algorithms that contain suitable parallelism (IMHO, this is not a problem of the programming language, but a fundamental restriction given today's - and probably also tomorrow's - machine architectures).

  • The literature is full of attempts at systems like this. Some of them fairly old. This kind of modification needs to be thoroughly researched before it's just tossed out willy nilly. It's likely that a good deal of insight can be had by just checking the journals.
  • But with simple constructs added to languages, having the compiler create parallelized code would be much more trivial. For instance, when you want to do something 100 times in C, you would do

    int i, foo[100], baz[100];
    /* stuff in here */
    for (i = 0; i < 100; i++) {
        /* Some stuff in here not dependent on */
        /* previous iterations */
        /* We'll add foo and baz */
        foo[i] += baz[i];
    }

    Now, that's silly. You don't want to do it for i=1, then for i=2, then for i=3, you just want to have it done 100 times, for values between 0 and 99. Adding two arrays, for instance, or zeroing them out. So, we need a construct such as

    multido(i,0,99) {
    /* that stuff in here */
    foo[i] += baz[i];
    }

    Now, in most current and traditional program designs, this would still be non-trivial, as the compiler would have to start up two threads, let them process, wait for them to shut down, and so on, but in an architecture where overhead was less (pre-existing threads where you just pass code for them to run, maybe?) things like the above could be extremely beneficial.
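
    A minimal sketch of what such a multido-style construct might expand to, assuming POSIX threads; here the hypothetical multido(i, 0, 99) from above is written out by hand as a chunked loop handed to a few worker threads:

    #include <pthread.h>
    #include <stdio.h>

    #define N        100
    #define NTHREADS 4

    static int foo[N], baz[N];

    struct chunk { int lo, hi; };

    /* One thread runs the loop body over its own slice of [0, N). */
    static void *body(void *arg)
    {
        struct chunk *c = arg;
        int i;
        for (i = c->lo; i < c->hi; i++)
            foo[i] += baz[i];
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        struct chunk c[NTHREADS];
        int i;

        for (i = 0; i < N; i++) { foo[i] = i; baz[i] = 1; }

        /* What "multido(i, 0, 99)" might expand to: split the iteration
         * space into NTHREADS independent chunks and run them at once. */
        for (i = 0; i < NTHREADS; i++) {
            c[i].lo = i * N / NTHREADS;
            c[i].hi = (i + 1) * N / NTHREADS;
            pthread_create(&tid[i], NULL, body, &c[i]);
        }
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);

        printf("foo[99] = %d\n", foo[99]);   /* 99 + 1 = 100 */
        return 0;
    }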

  • Distributed computing is a cool idea, but (there's always a but) you still need to write your programs to be distributed
    Yes and no. You don't need to think specifically about where your code will run if you use certain languages/frameworks like Erlang [erlang.org] or this thing. Ponder this: If functional programming had always been the norm, might we consider the jumps from single threading to multithreading, from UP to SMP, from local computing to distributed computing to be simply neat compiler/interpreter optimizations that in no way command a change in the way we design and implement applications?
    --
  • by Erich ( 151 )
    That for loop should read

    for (i = 0; i < 100; i++)

  • heh. "any program you write" is a little bit inconsistent with "minor syntax changes" unless you write everything for the parallelizing compiler. I'm afraid that, if the programmer isn't designing the program explicitly for parallel execution, performance will suck. Add me to the skeptics list too, please. ;)
  • To me it makes not much sense to distribute MP3 encoding (of a single file) over a network, as it can be done fast enough today IMHO.
    What really matters is encoding large numbers of tracks, which in fact can be done quite easily using some little Perl scripts & NFS; each system/CPU could process a single wave file. Have a look at my "spoon" (@ freshmeat [freshmeat.net]) for simple workload distribution.
    xris
