Twitter Not Rocket Science, but Still a Work in Progress
While it may not be rocket science, the Twitter team has been making a concerted effort to effect better communication with their community at large. Recently they were set upon by a barrage of technical and related questions, and the resulting answers are actually somewhat interesting. "Before we share our answers, it's important to note one very big piece of information: We are currently taking a new approach to the way Twitter functions technically with the help of a recently enhanced staff of amazing systems engineers formerly of Google, IBM, and other high-profile technology companies added to our core team. Our answers below refer to how Twitter has worked historically--we know it is not correct and we're changing that."
Re:And Twitter is... (Score:3, Insightful)
Obligatory xkcd (Score:5, Insightful)
Re:That's "effect", not "affect" (Score:2, Insightful)
Re:Big Brother(s) (Score:5, Insightful)
Re:Big Brother(s) (Score:4, Insightful)
Like the AC said, I think you're wildly exaggerating how ideological workplaces are, particularly from the point of view of a server monkey.
Re:No mention or Ruby/Rails? (Score:1, Insightful)
Re:Plurk seems pretty stable so far (Score:3, Insightful)
Re:Big Brother(s) (Score:3, Insightful)
Re:It's the algorithm, stupid (Score:3, Insightful)
Good scalability is not about how fast something processes, it's about how much the speed degrades as the load increases. It sounds simple - but it's NOT.
Re:It's the algorithm, stupid (Score:3, Insightful)
Re:It's the algorithm, stupid (Score:3, Insightful)
The language you choose may affect your ability to scale when you take its concurrency model (or lack thereof, in some cases) into account. For instance, I can have an O(1) algorithm, using a hashmap, but that doesn't mean I'll actually get constant-time performance at runtime. For a solid example, let's use Java (Java 6, with java.util.concurrent, as this is the concurrency framework I'm most familiar with).
I can take a regular HashMap with constant-time access (get and put) and make it thread-safe with a synchronized wrapper (Collections.synchronizedMap). However, that wrapper guards the whole map with a single lock: every time one thread modifies the map, every other thread that wants to touch it at that moment has to wait. That means I've got my optimal algorithm, but due to competition for resources, access is effectively serialized, and throughput craters if everyone wants to write to that data structure. (ConcurrentHashMap reduces the problem with finer-grained internal locking, but a single hot, shared structure is still a bottleneck.)
Granted, there are ways around this, but you can't just throw hardware at the problem and pretend it doesn't exist. In reality, more hardware raises the communication overhead of the process. Your language of choice may or may not scale at the same pace as others when you take into account your concurrency needs and perhaps the backend you're given to interface with. All that being said, I'm not sure how the runtime characteristics of various languages (say, a JIT-compiled runtime versus native machine code) scale with respect to each other. I would suspect that a JIT would have much more overhead when you layer it on top of an OS's concurrency model (handling locks within your own code, the JIT managing resources, and the OS doing the same, all at the same time... probably with a database and filesystem doing so to varying degrees). Of course, I could be wrong.
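To make the shared-map point concrete, here's a minimal sketch (class and method names are mine, and it uses the later Java 8 merge() API rather than Java 6, purely for brevity): many writers hammer one entry of a ConcurrentHashMap. Per-entry updates are atomic, so the count comes out exact, but every writer still contends on that one hot key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContentionSketch {
    // Hypothetical workload: `threads` writers each bump the same key
    // `increments` times. merge() is atomic per entry in
    // ConcurrentHashMap, so no update is lost despite the contention.
    static int hammerOneKey(int threads, int increments) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < increments; i++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return counts.get("hits");
    }

    public static void main(String[] args) throws InterruptedException {
        // All writers target one entry, so they still serialize on that
        // entry: the result is correct, but the hot key caps throughput.
        System.out.println(hammerOneKey(8, 10_000)); // prints 80000
    }
}
```

The takeaway matches the comment above: the algorithm is O(1) on paper, yet a single shared structure turns parallel writers into a queue.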
Re:It's the algorithm, stupid (Score:4, Insightful)
If your whole web application is bottlenecked by one hashmap, you're going to run into scalability problems as soon as you need more than one machine anyway. On the other hand, if the performance of the web application as a whole does not depend on the hashmap, then your argument is irrelevant to the scalability of the application as a whole.
I concede that a more efficient runtime environment might make better use of the same hardware, supporting, say, 70 clients per machine instead of 50. But that's not the kind of scalability I'm talking about. Even a platform that handled only one client per machine but scaled linearly would be better than one that handled 70 clients per machine but could never grow beyond that one machine.
And yes, on one machine, a bad choice of data structure can affect scalability. But the blame for that rests with the data structure itself, not the language in which it is implemented. As an associative array, a Python hash table (dict) will scale far better than a C linked list. Why? Because one is a hash table and the other is a linked list!
Which data structures are available in which language might factor into the choice of language, but it's only a convenience: you can always create your own data structure implementations.
Creating a scalable application means being able to throw hardware at the problem.
Let's assume you've gotten your application to scale beyond one machine anyway. That's a prerequisite for this section.
Now, if the machines don't communicate and users don't care, you automatically win O(N) scalability.
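That shared-nothing case can be sketched in a few lines (the routing function and names here are hypothetical, not any real system's): each request is mapped to exactly one machine from the user id alone, so no node ever consults another, and adding machines adds capacity linearly.

```java
public class ShardSketch {
    // Hypothetical router: derive the owning machine purely from the
    // user id. No cross-node communication is needed to serve a
    // request, so capacity grows O(N) as machines are added.
    static int shardFor(String userId, int machines) {
        return Math.floorMod(userId.hashCode(), machines);
    }

    public static void main(String[] args) {
        for (String user : new String[] {"alice", "bob", "carol"}) {
            System.out.println(user + " -> shard " + shardFor(user, 4));
        }
    }
}
```

One caveat on the design: plain modulo reshuffles nearly every user when the machine count changes; schemes like consistent hashing exist precisely to limit that movement.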
If your machines must communicate, they do so over some kind of network. The way this communication is achieved determines the scalability of the application. While some environments might have more intuitive network facilities than others (think Erlang), ultimately one can use any approach to networking with any language.
Again, we're reduced to choice of data structures and algorithms, not language, as the marker of scalability.
The choice of language does not dictate the data structure the designer of the application uses, and so the language is not a serious barrier to scalability. I concede it may be more difficult to implement efficient protocols in some languages than in others, but we're dealing with Turing-complete languages here, aren't we?
I should note that languages typically thought of as "slower" are often more expressive. It often takes less effort to write efficient algorithms in expressive languages.
(Returning to our previous example, since writing a hash table is more complex than writing a naive linked list in C, a C programmer is more likely to use a linked list at the expense of scalability. In Python, using a hash table is as simple as writing {}, so an equally-skilled programmer is more likely to use the more efficient data structure, resulting in better performance in a "slower" language.)
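The same contrast can be shown while holding the language fixed; in this sketch (Java, with names of my own invention) only the data structure changes, and the worst-case lookup cost changes with it.

```java
import java.util.HashMap;
import java.util.LinkedList;

public class LookupSketch {
    // Count node visits for a worst-case lookup (the last element) in a
    // linked list of n items: the traversal must walk all n nodes.
    static long worstCaseListSteps(int n) {
        LinkedList<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) list.add(i);
        long steps = 0;
        for (int v : list) {
            steps++;
            if (v == n - 1) break;
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println(worstCaseListSteps(n)); // prints 100000
        // A HashMap answers the same membership question with a handful
        // of bucket probes regardless of n: same language, very
        // different scaling.
        HashMap<Integer, Boolean> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(i, true);
        System.out.println(map.containsKey(n - 1)); // prints true
    }
}
```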
The bottom line is that if communication between nodes is required, complexity must be > O(N). And if complexity is greater than O(N), then as N increases without bound, the communication overhead grows without bound anyway. The key is to make that growth as slow as possible.
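One way to see "make that growth as slow as possible" is to count messages under two standard communication patterns (the formulas are the textbook ones; the class and method names are mine): all-to-all exchange grows quadratically, while a tree-structured exchange grows roughly as N log N. Both exceed O(N), as the comment above says, but one explodes far faster.

```java
public class CommSketch {
    // All-to-all: every node sends to every other node per round.
    static long allToAllMessages(long n) {
        return n * (n - 1);
    }

    // Tree/hierarchical aggregation: each of n nodes exchanges with
    // about log2(n) peers per round.
    static long treeMessages(long n) {
        long log2 = 63 - Long.numberOfLeadingZeros(n); // floor(log2 n)
        return n * log2;
    }

    public static void main(String[] args) {
        for (long n : new long[] {16, 256, 4096}) {
            System.out.println(n + " nodes: all-to-all=" + allToAllMessages(n)
                    + ", tree=" + treeMessages(n));
        }
    }
}
```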
The tools and techniques used to slow that growth (thinking about the problem, designing efficient algorithms) are features of the human mind, not of any particular language.
Saying that one language is better at scaling than another is like arguing that one human language is better for building cars than another!