Technology

Forget Moore's Law?

Roland Piquepaille writes "On a day when CNET News.com releases a story named "Moore's Law to roll on for another decade," it's refreshing to look at another view. Michael S. Malone says we should forget Moore's Law, not because it isn't true, but mainly because it has become dangerous. "An extraordinary announcement was made a couple of months ago, one that may mark a turning point in the high-tech story. It was a statement by Eric Schmidt, CEO of Google. His words were both simple and devastating: when asked how the 64-bit Itanium, the new megaprocessor from Intel and Hewlett-Packard, would affect Google, Mr. Schmidt replied that it wouldn't. Google had no intention of buying the superchip. Rather, he said, the company intends to build its future servers with smaller, cheaper processors." Check this column for other statements by Marc Andreessen or Gordon Moore himself. If you have time, read the long Red Herring article for other interesting thoughts."

Comments Filter:
  • BBC Article (Score:4, Informative)

    by BinaryCodedDecimal ( 646968 ) on Tuesday February 11, 2003 @09:38AM (#5278854)
    BBC Article on the same story here [bbc.co.uk].
  • Re:clustering (Score:5, Informative)

    by e8johan ( 605347 ) on Tuesday February 11, 2003 @09:53AM (#5278935) Homepage Journal
    Google supports thousands of user request sessions, not one huge straight-line serial command sequence. This means that a big bunch of smaller servers will do the job quicker than one big super-server: not only because of the raw computing power, but because of the parallelism that is extracted by doing so and because it avoids the overhead introduced by running too many tasks on one server.
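
    A rough sketch of that pattern (Python; handle_request() and the sample queries are made up for illustration -- the point is just that independent sessions spread across cheap workers with no shared state):

    from concurrent.futures import ProcessPoolExecutor

    def handle_request(query):
        # stand-in for the real per-session work (parse, look up, rank, ...)
        return query.lower().split()

    if __name__ == "__main__":
        queries = ["forget moore's law", "itanium price", "cluster vs mainframe"] * 1000
        # each request is small and independent, so a pool of cheap workers
        # scales almost linearly -- no single fast CPU required
        with ProcessPoolExecutor(max_workers=8) as pool:
            results = list(pool.map(handle_request, queries, chunksize=100))
        print(len(results), "requests served")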
  • render farms (Score:3, Informative)

    by AssFace ( 118098 ) <stenz77@gmail. c o m> on Tuesday February 11, 2003 @10:12AM (#5279059) Homepage Journal
    Google doesn't really do much in terms of hardcore processing - it just takes in a LOT of requests - but each one isn't intense, and each is short-lived.

    On the other hand, say you are running a render farm - in that case you want a fast distributed network, the same way Google does, but you also want each individual node to be as fast as freakin possible.
    They have been using Alphas for a long time for that exact reason - so now, with the advent of the Intel/AMD 64-bit chips, prices will come down on all of it - so I would imagine the render farms are quite happy about that. That means that they can either stay at the speed at which they do things now, but for cheaper - or they can spend what they do now and get much more done in the same time... either way leading to faster production and arguably more profit.

    The clusters that I am most familiar with are somewhere in between - they don't need the newest, fastest thing, but they certainly wouldn't be hurt by a faster processor.
    For the stuff I do, though, it doesn't matter too much - if I have 20 hours or so to process something, and I have the choice of doing it in 4 minutes or 1 minute, I will take whichever is cheaper, since the end result is otherwise the same in my eyes.
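
    A rough sketch of that split (Python; render_frame() is a stand-in): the frames parallelize trivially across the farm, but each frame's wall-clock time is set by the single node rendering it, hence wanting fast nodes and lots of them.

    from concurrent.futures import ProcessPoolExecutor
    import time

    def render_frame(frame_no):
        time.sleep(0.01)              # pretend this is minutes of ray tracing
        return frame_no

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as farm:
            done = list(farm.map(render_frame, range(100)))
        print("rendered", len(done), "frames")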
  • Re:Misapprehensions (Score:3, Informative)

    by SacredNaCl ( 545593 ) on Tuesday February 11, 2003 @10:16AM (#5279089) Journal
    Mainframes don't have fantastic computing power either -- 'cos they don't need it. Yeah, but they usually have fantastic I/O -- where they do need it. There are still a ton of improvements that could be made in this area.

  • NoW (Score:4, Informative)

    by Root Down ( 208740 ) on Tuesday February 11, 2003 @10:19AM (#5279104) Homepage
    The NoW (Network of Workstations) approach has been an ongoing trend over the last few years, as the throughput achieved by N distinct processors connected by a high-speed network is nearly as good as (and sometimes better than) that of an N-processor mainframe. All this comes at a cost that is much less than that of a mainframe. In Google's case, it is the volume that is the problem, and not necessarily the complexity of the tasks presented. Thus, Google (and many other companies) can string together a whole bunch of individual servers (each with its own memory and disk space so there is no memory contention - another advantage over the mainframe approach) quite (relatively) cheaply and get the job done by load balancing across the available servers. Replacement and upgrades - yes, eventually to the 64-bit chips - can be done iteratively so as not to impact service, etc. Lots of advantages...

    Here is a link to a seminal paper on the issue if you are interested:
    http://citeseer.nj.nec.com/anderson94case.html [nec.com]
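
    A minimal sketch of the dispatch side of that idea (Python; the host names and dispatch() are invented -- a real setup would use a proper load balancer and actually ship the job over the network):

    import itertools

    # N cheap workstations instead of one big box; each job goes to the
    # next node in round-robin order
    servers = ["node%02d.example.org" % i for i in range(16)]
    next_server = itertools.cycle(servers)

    def dispatch(job):
        target = next(next_server)
        # in reality: serialize `job` and send it to `target` over the network
        return target, job

    for job_id in range(5):
        print(dispatch(("index-shard", job_id)))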

  • by DrSkwid ( 118965 ) on Tuesday February 11, 2003 @10:30AM (#5279186) Journal
    With all those hands [club-internet.fr]
  • by imnoteddy ( 568836 ) on Tuesday February 11, 2003 @10:59AM (#5279432)
    According to this article [nytimes.com] the issue had to do with both price and power consumption.

    From the article:

    Eric Schmidt, the computer scientist who is chief executive of Google, told a gathering of chip designers at Stanford last month that the computer world might now be headed in a new direction. In his vision of the future, small and inexpensive processors will act as Lego-style building blocks for a new class of vast data centers, which will increasingly displace the old-style mainframe and server computing of the 1980's and 90's.

    It turns out, Dr. Schmidt told the audience, that what matters most to the computer designers at Google is not speed but power -- low power, because data centers can consume as much electricity as a city.

    He gave the Monday keynote at the "Hot Chips" [hotchips.org] conference at Stanford last August.
    There is an abstract [hotchips.org] of his keynote.

  • A one-off matrix inversion - well, parts of it can't be done efficiently in parallel.

    Though the resulting matrix would probably be applied across a lot of data, and that can be done in parallel.

    A matrix inversion can be done very fast if you have a very massively parallel (MPP) system (say, effectively 2^32 processors!) like a quantum computer.
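
    A small sketch of that second point (NumPy; the sizes are arbitrary): even if computing the inverse is mostly serial, applying it to many independent columns of data is just a big matrix multiply, which vectorizes and parallelizes well.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500))
    data = rng.standard_normal((500, 20000))   # many independent columns

    A_inv = np.linalg.inv(A)    # the hard-to-parallelize part, done once
    out = A_inv @ data          # embarrassingly parallel across columns
    print(out.shape)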

  • by oliverthered ( 187439 ) <oliverthered@nOSPAm.hotmail.com> on Tuesday February 11, 2003 @11:25AM (#5279667) Journal
    it is a cluster.... [jamstec.go.jp]

    640 processor nodes, each consisting of eight vector processors, are connected by a high-speed interconnection network.

    That makes it a cluster (640 processor nodes) of clusters (8 vector processors)
  • by minektur ( 600391 ) <junk@clif t . org> on Tuesday February 11, 2003 @11:28AM (#5279705) Homepage Journal
    Matrix inversion comes to mind -- it is very difficult to parallelize.

    I found a nice little read about how to decide if any particular problem you are looking at is easily parallelizable.

    It is a PDF (looks like a PowerPoint presentation).

    http://cs.oregonstate.edu/~pancake/presentations/sdsc.pdf
  • Re:clustering (Score:3, Informative)

    by Zeinfeld ( 263942 ) on Tuesday February 11, 2003 @11:38AM (#5279786) Homepage
    Google's approach is good for Google. If Google wanted to make good use of significantly faster CPUs, they would also need significantly more RAM in their machines (a CPU that is faster by a factor of 10 can't yield a speed-up factor of ten if the network can't deliver the data fast enough).

    I think you have the right idea, slightly misstated. The crux for Google is that their problem is actually creating a huge associative memory, many terabytes of RAM. The speed of the processors is not that important; the speed of the RAM is the bottleneck. Pipelining etc. has little or no effect on data lookups, since practically every lookup is going to be outside the cache.

    That does not support the idea that Moore's Law is dead. It merely means that Google is more interested in bigger and faster RAM chips than in bigger and faster processors.

    Long ago, when I built this type of machine, the key question was the cost of memory. You wanted to have fast processors because you could reduce the total system cost if you had fewer, faster processors with the same amount of RAM. Today, however, RAM cost is not the big issue; the faster processors tend to require faster RAM, so you can make savings by having 10 CPUs running at half the speed rather than 5 really fast processors at three times the cost.

  • by Anonymous Coward on Tuesday February 11, 2003 @11:47AM (#5279852)
    Another hard-to-parallelize problem is the N-body problem. Simulations the size of galaxies are still pretty tough to do. Remember, every star interacts with every other star.
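
    A toy direct-sum sketch of why (NumPy; units and values are arbitrary): every body needs every other body's position, so one force pass is O(N^2), and splitting it across machines means shipping all positions everywhere.

    import numpy as np

    def gravity_accels(pos, mass, G=1.0, eps=1e-3):
        d = pos[None, :, :] - pos[:, None, :]      # pairwise displacements, (N, N, 3)
        r2 = (d ** 2).sum(axis=-1) + eps ** 2      # softened squared distances
        np.fill_diagonal(r2, np.inf)               # no self-interaction
        return G * (mass[None, :, None] * d / r2[..., None] ** 1.5).sum(axis=1)

    pos = np.random.rand(1000, 3)
    mass = np.ones(1000)
    print(gravity_accels(pos, mass).shape)         # (1000, 3)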
  • by Troy Baer ( 1395 ) on Tuesday February 11, 2003 @11:47AM (#5279854) Homepage
    Check out who's on top of the TOP 500 supercomputers. US? Nope. Cluster? Nope. The top computer in the world is the Earth Simulator in Japan. It's not a cluster of lower end processors. It was built from the ground up with one idea -- speed. Unsurprisingly it uses traditional vector processing techniques developed by Cray to achieve this power. And how does it compare with the next in line? It blows them away. Absolutely blows them away.

    It's worth noting that the Earth Simulator is actually a cluster of vector mainframes (NEC SX-6s) using a custom interconnect. You could do something similar with the Cray X-1 if you had US$400M or so to spend.

    I recently read a very interesting article about this (I can't remember where - I tried googling) which basically stated that the US has lost its edge in supercomputing. The reason was twofold: (1) less government and private funding for supercomputing projects and (2) a reliance on clustering.

    If you're referring to the article I think you are, it was specifically talking in the context of weather simulation -- an application area where vector systems are known to excel (hence why the Earth Simulator does so well at it). The problem is that vector systems aren't always as cost-effective as clusters for a highly heterogeneous workload. With vector systems, a good deal of the cost is in the memory subsystem (often capable of several tens of GB/s of memory bandwidth), but not every application needs heavy-duty memory bandwidth. Where I work, we've got benchmarks that show a cluster of Itanium 2 systems wiping the floor with a vector machine for some applications (specifically structural dynamics and some types of quantum chemistry calculations), and others where a bunch of cheap AMDs beat everything in sight (on some bioinformatics stuff). It all depends on what your workload is.

    --Troy
  • by emil ( 695 ) on Tuesday February 11, 2003 @12:27PM (#5280220)

    While you may validly question his business acumen, he has worked with RMS and JWZ, and he knows everybody. He is a reasonable coder and a team player; we need more of him.

  • by po8 ( 187055 ) on Tuesday February 11, 2003 @02:03PM (#5281149)

    Can someone please give an example of a computing task that CANNOT be subdivided into smaller tasks and run in parallel on many processing elements?

    The technical issue here is known as "linear speedup". Take chess, for example: the standard search algorithm for chess play is something called minimax search with alpha-beta pruning. It turns out that the alpha-beta pruning step effectively involves information from the entire search up to that point. With only a subset of this information, exponentially more work will be needed: a bad thing.

    How do parallel chess computers such as Deep Blue work, then? Very fancy algorithms that still get sublinear but interesting speedups, at the expense of a ton of clever programming. This is a rough explanation of why today's PC chess programs are probably comparable with the now-disassembled Deep Blue: the PC chess programmers can use much simpler search algorithms and concentrate on other important programming tasks; also, a 10x speedup in uniprocessor performance yields roughly a 10x search speed increase, whereas using 10 slower processors isn't nearly so effective. Note that Deep Blue was decommissioned largely because of maintenance costs: a lot of rework would have to be done to make Deep Blue take advantage of Moore's Law.
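
    A bare-bones sketch of that algorithm (Python; the Game interface here is hypothetical, and this is nothing like Deep Blue's actual code). The point is that alpha and beta summarize everything searched so far, which is exactly what a naively parallel search doesn't have:

    def alphabeta(state, depth, alpha, beta, maximizing, game):
        if depth == 0 or game.is_terminal(state):
            return game.evaluate(state)
        if maximizing:
            best = float("-inf")
            for move in game.moves(state):
                best = max(best, alphabeta(game.apply(state, move), depth - 1,
                                           alpha, beta, False, game))
                alpha = max(alpha, best)
                if alpha >= beta:   # prune: this cutoff depends on earlier siblings
                    break
            return best
        else:
            best = float("inf")
            for move in game.moves(state):
                best = min(best, alphabeta(game.apply(state, move), depth - 1,
                                           alpha, beta, True, game))
                beta = min(beta, best)
                if alpha >= beta:
                    break
            return best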

    That said, many tasks are "trivially" parallelizable. Aside from the pragmatic problem of coding for parallel machines (harder than writing serial code even for simple algorithms), there is also the silicon issue: given a transistor budget, are manufacturers better spending it on a bunch of little processors or one big one? This is the real question, and so far the answer is generally "one big one". YMMV. HTH.

    (BTW, why can't I use HTML entities for Greek alpha and beta in my Slashdot article? What are they protecting me from?)

  • by pclminion ( 145572 ) on Tuesday February 11, 2003 @02:18PM (#5281296)
    Can someone please give an example of a computing task that CANNOT be subdivided into smaller tasks and run in parallel on many processing elements? The kind of task that requires an ever faster single processor.

    Computing the MD5 sum of 1TB of data. :-) MD5 depends on (among other things) being non-parallelizable for its security.
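
    A tiny sketch of why (Python's hashlib; the chunking and data are arbitrary): each update() feeds the digest state left by all previous blocks, so the blocks have to be consumed in order.

    import hashlib

    def md5_of_stream(chunks):
        h = hashlib.md5()
        for chunk in chunks:          # strictly sequential: state carries forward
            h.update(chunk)
        return h.hexdigest()

    print(md5_of_stream([b"block-%d" % i for i in range(1000)]))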

  • by Anonym0us Cow Herd ( 231084 ) on Tuesday February 11, 2003 @02:33PM (#5281429)
    the standard search algorithm for chess play is something called minimax search with alpha-beta pruning.

    This algorithm is something I'm familiar with. (Not chess, but other toy games in LISP, like Tic Tac Toe, Checkers, and Reversi, all of which I've implemented using a generic minimax-alphabeta subroutine I wrote.) (All just for fun, of course.)

    If you have a bunch of parallel nodes, you throw all of the leaf nodes at it in parallel. As soon as leaf board scores start coming in, you min or max them up the tree. You may be able to alpha-beta prune off entire subtrees. Yes, at higher levels, the process is still sequential. But many boards' scores at the leaf nodes need to be computed, and that could be done in parallel. Yes, you may alpha-beta prune off a subtree that has already had some of your processors thrown at its leaf nodes -- you abort those computations and re-assign those processors to the leaf nodes that come after the subtree that just got pruned off.

    Am I missing anything important here? It seems like you could still significantly benefit from massive parallel processing. If you have enough processors, the alpha-beta pruning itself might not even be necessary. After all, alpha-beta pruning is just an optimization so that sequential processing doesn't have to examine subtrees that wouldn't end up affecting the outcome. But let's say each board can have 10 possible moves made by each player, and I want to look 4 moves ahead. That is 10,000 leaf boards to score. If I have more than 10,000 processors, why even bother to alpha-beta prune? Now, if I end up needing to examine 1 million boards (more realistic, perhaps) and I can do them 10,000 at a time, I may still be able to take advantage of some alpha-beta pruning. And 10,000 boards examined at once is still faster than 1 at a time.

    Vector processors wouldn't be any more helpful here than massively parallel ones, would they?

    Of course, whether a mere 10,000 processors constitutes massively parallel or not is a matter of interpretation. Some people say a 4-way SMP is massively parallel. I suppose it depends on your definition of "massively".
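
    A rough sketch of that "throw processors at the leaves" idea (Python; score_board() and the toy boards are made up): score the leaves in parallel, then min/max the results up the tree serially.

    from concurrent.futures import ProcessPoolExecutor

    def score_board(board):
        return sum(board)                           # placeholder static evaluation

    def best_move(leaves_per_move):
        # leaves_per_move: {move: [leaf boards reachable through that move]}
        with ProcessPoolExecutor() as pool:
            scored = {m: list(pool.map(score_board, leaves))
                      for m, leaves in leaves_per_move.items()}
        # trivial 2-ply min/max: assume the opponent steers us to the worst leaf
        return max(scored, key=lambda m: min(scored[m]))

    if __name__ == "__main__":
        print(best_move({"e4": [[1, 2], [0, 3]], "d4": [[2, 2], [1, 1]]}))   # -> e4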
  • by Pig Bodine ( 195211 ) on Tuesday February 11, 2003 @04:39PM (#5282890)

    Inversion is always a bad idea due to the effect of round-off error. What you really want to do is solve the linear system Ax=b using a stable method without actually computing the inverse of A. For solving Ax=b it is clearer to speak of parallelizable algorithms instead of asking whether solving a general system is parallelizable; there are too many different types of matrices A. Some algorithms work well on some A and not on others.
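
    A small illustration of that advice (NumPy; the random matrix is just a placeholder): factor and solve rather than form the inverse explicitly.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((300, 300))
    b = rng.standard_normal(300)

    x_solve = np.linalg.solve(A, b)        # LU factorization + substitution
    x_inv = np.linalg.inv(A) @ b           # explicit inverse: more work, and round-off
                                           # tends to hurt more when A is ill-conditioned

    print(np.linalg.norm(A @ x_solve - b)) # compare residuals of the two approaches
    print(np.linalg.norm(A @ x_inv - b))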

    Gaussian elimination is the one method that works on any system, but it does not exploit any structure in A (presence of zero elements, etc.). Also, assuming that is what you mean by "matrix inversion", you are correct: it is not terribly parallelizable. The amount of communication required increases pretty quickly as you increase the number of processors, until you don't really gain anything at all. And since it doesn't exploit the structure of A, it is typically not very fast. Nevertheless, it's the slow, steady method that will eventually give you a solution to a problem on which other methods fail.

    Iterative methods can easily exploit the structure of matrices A that come from partial differential equations. (This covers fluid flow and many other physics/engineering simulations). In some cases there are methods (e.g. domain decomposition) which can make the methods highly parallel. But whether iterative methods work well in parallel, or even whether they work at all, depends on the problem. These methods can be efficient when they work, but can sometimes require an expert to make them to work at all.
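
    A toy sketch of the iterative flavor (NumPy; Jacobi iteration on an artificially diagonally dominant A -- real PDE solvers use far more sophisticated methods, e.g. multigrid or Krylov with domain decomposition):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # force diagonal dominance
    b = rng.standard_normal(n)

    D = np.diag(A)                    # Jacobi splits A into diagonal + remainder
    R = A - np.diag(D)
    x = np.zeros(n)
    for _ in range(100):
        x = (b - R @ x) / D           # the R @ x sweep is the parallel-friendly part
    print(np.linalg.norm(A @ x - b))  # residual shrinks each sweep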

    In practice if you pick the right algorithm and have a bit of luck you might get a good speedup solving a linear system on a parallel computer. More often than not, unless you put a lot of work into matching your algorithm to a very specific matrix A, the gains from parallel solution will not be amazingly impressive. Having a faster computer that doesn't depend on parallelism is better. This was true even in the old days when supercomputers had fewer processors; in my experience the gains from vectorization were easier to get and more dramatic than gains from parallelization. Now that we try to divide up the problems on supercomputers with many more processors, the limits imposed by communication are even more significant.
