Collaborative Map-Reduce In the Browser

igrigorik writes "The generality and simplicity of Google's Map-Reduce are what make it such a powerful tool. But what if, instead of using proprietary protocols, we could crowd-source the CPU power of the millions of users online every day? JavaScript is the most widely deployed language (every browser can run it), and we could use it to push the job to the client. Then all we would need is a browser and an HTTP server to power our self-assembling supercomputer (proof of concept + code). Imagine if all it took to join a compute job was to open a URL."
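The linked proof of concept isn't reproduced in the summary, but the basic loop is easy to picture. Below is a minimal sketch of the browser-side half, assuming a hypothetical /job endpoint that returns a JSON chunk and a /emit endpoint that accepts results; the endpoint names, payload shape, and the word-count map step are invented for illustration, not taken from the linked code.

    // Hypothetical browser-side worker: fetch a chunk, run the map step, post the result.
    // Endpoint names and payload shapes are illustrative only.
    function mapWords(lines) {
      // Example map step: count word occurrences in this chunk.
      var counts = {};
      lines.forEach(function (line) {
        line.split(/\s+/).forEach(function (word) {
          if (word) counts[word] = (counts[word] || 0) + 1;
        });
      });
      return counts;
    }

    function workOnce() {
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "/job", true);
      xhr.onload = function () {
        var job = JSON.parse(xhr.responseText); // e.g. { "id": 7, "data": ["line one", "line two"] }
        var result = mapWords(job.data);
        var post = new XMLHttpRequest();
        post.open("POST", "/emit", true);
        post.setRequestHeader("Content-Type", "application/json");
        post.onload = workOnce; // immediately ask for the next chunk
        post.send(JSON.stringify({ id: job.id, result: result }));
      };
      xhr.send();
    }

    workOnce(); // joining the compute job really is just opening the URL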
  • Random Thoughts (Score:5, Interesting)

    by AKAImBatman ( 238306 ) * <akaimbatman@gmaYEATSil.com minus poet> on Tuesday March 03, 2009 @04:51PM (#27056153) Homepage Journal

    Two comments:

    1. He places the map/emit/reduce functions in the page itself. This is unnecessary. Since Javascript can easily be passed around in text form, the packet that initializes the job can pass a map/emit/reduce function to run. e.g.:

    var myfunc = eval("(function() { /*do stuff*/ })");

    In fact, the entire architecture would work more smoothly using AJAX with either JSON or XML rather than passing the data around as HTML content. As a bonus, new types of jobs can be injected into the compute cluster at any time.

    2. Both Gears and HTML5 have background threads for this sort of thing. Since abusing the primary thread tends to lock the browser, it's much better to make use of one of these facilities whenever possible. Especially since multithreading appears to be well supported by the next batch of browser releases [owensperformance.com].

    (As an aside, I realize this is just a proof of concept; I'm merely adding my 2 cents' worth on a realistic implementation. A rough sketch of both points follows below. ;-))
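    Here is one way those two suggestions could fit together: the job arrives as JSON with the map function shipped as a string, and the work runs off the main thread in an HTML5 Web Worker. This is only a sketch; the /job endpoint and the {mapSrc, data} payload shape are invented for illustration, and the inline Blob-based worker is just a way of keeping the example self-contained.

    // Sketch: pull a job as JSON, rebuild the shipped function, run it in a Web Worker.
    var workerSrc =
      'onmessage = function (e) {' +
      '  var map = eval("(" + e.data.mapSrc + ")");' + // rebuild the shipped function
      '  postMessage(map(e.data.data));' +             // run it off the main thread
      '};';
    var worker = new Worker(URL.createObjectURL(new Blob([workerSrc])));

    worker.onmessage = function (e) {
      // e.data is the map output, ready to be emitted back to the server
      console.log("result:", e.data);
    };

    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/job", true);
    xhr.onload = function () {
      // e.g. { "mapSrc": "function (d) { return d.length; }", "data": [1, 2, 3] }
      worker.postMessage(JSON.parse(xhr.responseText));
    };
    xhr.send();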

  • Pay Me (Score:5, Interesting)

    by Doc Ruby ( 173196 ) on Tuesday March 03, 2009 @05:21PM (#27056529) Homepage Journal

    If there were several orgs competing to use my extra cycles, outbidding each other to put money in my account, I might trust them to control those extra cycles. If they sold time on their distributed supercomputer, they'd have the money to pay me.

    As a variation, I wouldn't be surprised to see Google distribute its own computing load onto the browsers creating that load.

    Both models, though, raise the question of how to protect that distributed computation from a hostile participant who poisons the results to damage the value the central controller gets from it (or its core business).
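    One common answer (not from the article; just the standard trick in volunteer computing) is to hand the same chunk to several independent clients and only accept a result once a majority agrees. A toy server-side sketch, with the function name and replication factor made up for illustration:

    // Accept a chunk's result only when a majority of the clients assigned
    // to it returned the same answer.
    function acceptResult(submissions, replication) {
      // submissions: JSON-encoded results for one chunk, one entry per client
      var tally = {};
      submissions.forEach(function (r) {
        tally[r] = (tally[r] || 0) + 1;
      });
      for (var result in tally) {
        if (tally[result] > replication / 2) return result; // quorum reached
      }
      return null; // no agreement yet; keep the chunk in the work queue
    }

    // acceptResult(['{"a":3}', '{"a":3}', '{"a":7}'], 3) === '{"a":3}'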

  • by wirelessbuzzers ( 552513 ) on Tuesday March 03, 2009 @06:28PM (#27057431)

    It would need to be 10000x at the very minimum.

    If a user downloads, say, folding@home, it's running all day, every day, on all cores of the machine, whenever the computer is on and idle, which is most of the time. The user doesn't have to remember to run it, doesn't have to devote screen real estate, attention and so on, and the program is less annoying because of its low priority and relatively low memory footprint (less boxing).

    Additionally, the 40x I cited was in the fastest available browser (Chrome), compared to a relatively slow implementation (OpenSSL), for code that doesn't benefit from vectorization (at least, not on x86-based processors). I expect that the difference between a scientific compute kernel in JS and in assembly would be at least 100x, maybe 200x or more.

    Let's suppose that everyone in your rosy world uses FF 3.1 with JIT. That's 3-5x slower than Chrome in my benchmarks; say 4x. Let's suppose that Chrome is 25x slower than unvectorized C, which is in turn 4x slower than optimized assembly. Let's say people run the site 5 hours a day on one core for a week, but have their dual-core computers on for 10 hours a day, 90% idle, and would keep folding@home installed for a year.

    Then the EXE is 4 (FF vs. Chrome) * 25 (Chrome vs. C) * 4 (C vs. assembly) * 2 (hours on vs. hours browsing) * 2 (cores) * 50 (weeks vs. one week) * 0.9 (idle fraction) = 72,000x more productive. (The same arithmetic is spelled out below this comment.)

    Use the right tool for the job.
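    For anyone who wants to check that figure, here is the same estimate with the parent's assumed factors spelled out (they are the comment's assumptions, not measurements):

    var ffVsChrome   = 4;    // FF 3.1 with JIT vs. Chrome
    var chromeVsC    = 25;   // Chrome vs. unvectorized C
    var cVsAsm       = 4;    // unvectorized C vs. optimized assembly
    var hoursRatio   = 2;    // 10 hours/day powered on vs. 5 hours/day browsing
    var coreRatio    = 2;    // both cores vs. one core
    var weeksRatio   = 50;   // roughly a year installed vs. one week of visits
    var idleFraction = 0.9;  // fraction of powered-on time that is idle

    var advantage = ffVsChrome * chromeVsC * cVsAsm *
                    hoursRatio * coreRatio * weeksRatio * idleFraction; // 72000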

  • by Nebu ( 566313 ) <nebupookins@NosPAm.gmail.com> on Tuesday March 03, 2009 @06:31PM (#27057463) Homepage

    Something to think about: if I'm sending names to your PC, what can I derive from that list without having the entire list?

    Frequency of each name? Frequency of characters in names? Bayesian probability of one character following another in names? Number of names of a particular length?

    Each worker would compute the stats for its chunk of work (the "Map" part of MapReduce) and then send the results back to the server to be aggregated (the "Reduce" part of MapReduce); a rough sketch follows at the end of this comment.

    Some of these may seem interesting, but then again, what interesting data can you derive at all from a list of names, even if you had the whole list?
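    Picking up on the frequency examples above, here is roughly what the name-statistics case could look like, with the map step running in each browser and the reduce step merging the partial counts. The function names are mine, not from the linked code.

    // Map step (runs in each browser): count names in this worker's chunk.
    function mapNames(chunk) {
      var counts = {};
      chunk.forEach(function (name) {
        counts[name] = (counts[name] || 0) + 1;
      });
      return counts;
    }

    // Reduce step (runs on the server, or on another client): merge partial counts.
    function reduceCounts(partials) {
      var total = {};
      partials.forEach(function (counts) {
        for (var name in counts) {
          total[name] = (total[name] || 0) + counts[name];
        }
      });
      return total;
    }

    // reduceCounts([mapNames(["Ann", "Bob"]), mapNames(["Ann"])]) => { Ann: 2, Bob: 1 }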

  • Re:Random Thoughts (Score:3, Interesting)

    by maxume ( 22995 ) on Tuesday March 03, 2009 @08:45PM (#27058853)

    I found this somewhat startling:

    http://code.google.com/p/doctype/wiki/ArticleHereComesTheSun [google.com]

    If you create a JavaScript object named 'sun' (or one of several other names), Netscape and family (including Firefox) load Java into memory.

  • by fractoid ( 1076465 ) on Wednesday March 04, 2009 @01:55AM (#27061247) Homepage
    This could be a way for popular websites to generate revenue... instead of selling something of such dubious quality as "advertising impressions", high-volume sites such as /. could support themselves by taxing, say, 10% of a visitor's CPU with an unobtrusive background thread and selling the aggregated processing power to customers. I'd certainly be happier donating a percentage of my otherwise totally wasted CPU time to a site than having to read crappy ads for products I don't want.
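    A rough sketch of what that 10% "tax" could look like: a background Web Worker that alternates a short burst of work with a longer pause, so the duty cycle stays around one tenth of one core. The timings and the doWork() placeholder are made up for illustration.

    // Inside a Web Worker: burst/sleep loop aiming for roughly a 10% duty cycle.
    var BURST_MS = 100; // work for about 100 ms ...
    var PAUSE_MS = 900; // ... then rest for about 900 ms

    function doWork(deadline) {
      // Placeholder for a chunk of the compute job: crunch until the deadline passes.
      while (Date.now() < deadline) { /* number crunching */ }
    }

    function tick() {
      doWork(Date.now() + BURST_MS);
      setTimeout(tick, PAUSE_MS); // yield the CPU for the rest of the second
    }

    tick();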

"God is a comedian playing to an audience too afraid to laugh." - Voltaire

Working...