The London Stock Exchange Goes Down For Whole Day
Colin Smith writes "TradElect, the Microsoft .Net based trading platform for the London Stock Exchange, was offline for about seven hours, meaning that their 5-nines SLAs are shot for approximately the next 100 years. The TradElect system was launched back in June of 2007 and was designed for increased speed and system capacity."
The London Stock Exchange Goes Down For Whole Day (Score:5, Funny)
...now if only my wife would do that! /rimshot!
Re:The London Stock Exchange Goes Down For Whole D (Score:5, Funny)
nudge nudge, wink wink.
Using Microsoft for a 5-nines SLA? Is that a joke? (Score:5, Informative)
That was their first mistake. What were they thinking? You need three highly available Unix clusters with three SANs. You need three to elect a quorum. If you don't know what a quorum is you shouldn't be attempting to design a system that is supposed to deliver on a 5-nines SLA. Each geographic location should include 1 cluster and 1 SAN. All three locations should be networked with dark fiber. Fiber routing should be set up so that a cluster can fail over to a SAN in another location. As far as hardware is concerned, I would go with a cluster of IBM P6-570 and use an EMC Symmetrix DMX SAN at each site. .Net trading platform.. I have to laugh! Microsoft .net = 5.none SLA! .Net is only good for people who would like to create a light duty website. Under a load it breaks. The London Stock Exchange proves my point.
Who the heck designed this?
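For anyone wondering why the number three matters for a quorum, here is a minimal sketch of majority voting between sites. The site names and the tie-break rule are made up for illustration; real cluster software is far more involved than this.

```python
from typing import Dict, Optional


def has_quorum(votes: int, total_sites: int) -> bool:
    """A strict majority of all configured sites must be reachable."""
    return votes > total_sites // 2


def pick_active_site(reachable: Dict[str, bool]) -> Optional[str]:
    """Pick the site to run production, or None if no majority is reachable.

    `reachable` maps a hypothetical site name to whether it is currently up.
    """
    live = sorted(name for name, ok in reachable.items() if ok)
    if not has_quorum(len(live), len(reachable)):
        return None  # refuse to run rather than risk split-brain
    return live[0]  # deterministic tie-break: lowest site name wins


# Three sites survive the loss of any one of them.
print(pick_active_site({"site_a": False, "site_b": True, "site_c": True}))
# Two sites cannot: a lone survivor cannot tell a dead peer from a cut link.
print(pick_active_site({"site_a": False, "site_b": True}))
```

With two sites, losing one leaves exactly half the votes, which is not a majority, so the survivor has to stop. With three, any single failure still leaves two out of three.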
Re:Using Microsoft for a 5-nines SLA? Is that a jo (Score:5, Interesting)
WTF did a moderator mark this as flamebait? The poster was right, HA is a) hard and b) expensive.
I designed some of the HA stuff many years ago for Eurex [eurexexchange.com]. We used OpenVMS and had two clusters (over 40 km apart) for the main and standby, with the standby system also being used for development; with a flick of a switch the standby cluster could take over in production. We had no SANs in those days but used Digital's Hierarchical Storage Controllers. These days it runs with SANs but the host systems still run VMS and there are now product specific clusters.
The next level down there are access points containing communications servers providing connectivity to member systems and routing to the hosts which are scattered around the globe. A member normally has connectivity to two access points. The only single point of failure for a member is where both lines come together for the last few metres into their building and some idiot digs a hole in the road.
Re:Using Microsoft for a 5-nines SLA? Is that a jo (Score:5, Informative)
I work in London as a freelancer in IT in Investment Banking. My professional experience was mostly with IT Products/Services companies.
Although I haven't worked in the LSE, from the places I've worked in around here I came out with the impression that most people in IT in this industry are amateurs (and that includes those in other geographical locations).
Any kind of more advanced IT concepts such as technical analysis, software/hardware architecture, iterative software development processes are pretty much either not done or done by people who don't have a clue about what they're doing.
I'm hardly surprised with what happened in the LSE.
Re:Using Microsoft for a 5-nines SLA? Is that a jo (Score:5, Insightful)
I have a feeling that the 'normal' IT situation was to blame for this.
Preamble: Technical Expertise provided a wonderful architecture that was HA and robust, fast, and scalable.
Bean Counters looked at the cost and said "You Tech guys spend too much money."
IT architects: "How much is your data worth?"
Bean Counters: "Not this much. Look we don't really need all of these systems. My home system has been working for 4 years with no problems. And I've talked with Microsoft Execs and they will cut us a deal for their platform. Now go away, I've just decided how the architecture will be done. Why did we hire you anyways?"
Re:Using Microsoft for a 5-nines SLA? Is that a jo (Score:4, Insightful)
Um, wrong - Ever heard of the Mono Project?
Mono provides the necessary software to develop and run .NET client and server applications on Linux, Solaris, Mac OS X, Windows, and Unix.
http://www.mono-project.com/ [mono-project.com]
Glad you fell for Microsoft's marketing campaign. There is a reason they don't crush mono. It gives an illusion that there is choice. Name me
Re:The London Stock Exchange Goes Down For Whole D (Score:5, Funny)
If she's down then who will do the dishes and laundry? How do you reboot her? Does it really take 7 hours? Don't they make drugs for that?
What.. what's a wife?
It's like a mother, but requires less therapy.
Re:The London Stock Exchange Goes Down For Whole D (Score:5, Funny)
What.. what's a wife?
WIFE: Specialized form of WIFI, indicating one of two stations engaged in a (semi-)permanent point-to-point link, the other station typically called HUSBAND. Unsecured transmission often leads to packet loss 9 months after initial association, resulting in long-term elevated QoS requirements. Roaming is usually forbidden by link protocol, although experiments with mesh networks have been reported. DOS attacks often lead to severed links, litigation and possibly material and financial damages.
Re:The London Stock Exchange Goes Down For Whole D (Score:5, Funny)
Maybe he meant rim-job...
That's okay (Score:5, Funny)
most of the american stock exchanges have been going down all year.
Re:That's okay (Score:5, Funny)
most of the american stock exchanges have been going down all year
My wife did that once. Nearly killed me. Come to think of it, it was just after we signed up for the life insurance ...
Re:That's okay (Score:5, Funny)
Ya laugh, but I've been trying to describe this to people all day...
"The FTSE has crashed!"
"What, like another Black Monday?"
"No, no, crashed, as in gone down!"
"Errr..."
99.9967% Uptime if up the next 100 years (Score:5, Informative)
Re:99.9967% Uptime if up the next 100 years (Score:5, Funny)
Yeah, because they turn it on when trading starts and turn it off when trading ends.
Oh, my. (Score:4, Interesting)
So what happens when this happens again?
Re:Oh, my. (Score:5, Funny)
The same thing that happened this time?
Re:Oh, my. (Score:4, Insightful)
Ah. Some blamecasting, after which everybody pretends it had never happened?
Re:Oh, my. (Score:5, Funny)
So what happens when this happens again?
Well, first "Have you tried turning it off and on again?"
Otherwise, "Are you sure it's plugged in?"
Re:Oh, my. (Score:5, Insightful)
Re:Oh, my. (Score:5, Funny)
Well, it looks like it's hosed. You should probably reinstall the OS.
Re:Oh, my. (Score:5, Interesting)
Actually this is "again".
The LSE used to run on HP-NonStop (w/ Cobol and C as far as I can find) but still managed to take itself down for 8 hours in 2000.
If they're going to go down for a day every 7-8 years it might as well be cheaper and faster. (Articles quote the CTO as citing 10x performance increases).
(All based on a quick google search)
So before the hounds descend upon Microsoft it would seem the LSE has a history of managing to bring down whatever system they run on.
Re:Oh, my. (Score:5, Funny)
Re:Oh, my. (Score:5, Informative)
Which from the sounds of this article http://www.computerweekly.com/Articles/2008/06/12/231031/agile-trading-software-critical-to-london-stock-exchange.htm [computerweekly.com] was the intent.
One very interesting note is at the end of the article:
Timeline for Tradelect upgrades
18 June 2007: Tradelect launched, reducing the time taken to process trades from 140 milliseconds to 10 milliseconds. Capacity increased from 593 to 2,500 orders a second.
November 2007: Version 2 upgrade. Trading time reduced from 10 milliseconds to about 6 milliseconds. Capacity increased by 70% from 2,500 to 4,200 orders a second. Introduced full suite of Mifid-compliant services.
September 2008: Planned migration of Italian trades to Tradelect platform.
September 2008: Tradelect Version 2 to launch. Plans to double trading capacity to 10,000 continuous messages per second. Aims to cut average time taken to complete a trade by half from 6 milliseconds to 3 milliseconds.
Coincidence that this month was when they intended to release a new version?
Re:Oh, my. (Score:4, Interesting)
I think it's all about network latency - i.e. the marketing machine says 3ms, but they are referring to the time taken to get the message to the stock exchange's switch.
A Computer Weekly [computerweekly.com] article (and its first link) explains it - basically, they replaced the old networks with new fibre-based ones and colocated servers for brokerages.
Latency is kinda pointless for this kind of stuff (Score:5, Insightful)
I mean, that might be what they worked on, but it's kinda pointless; what's interesting is the # of transactions per second, and that can usually be improved at the expense of individual latency. For example, databases can be configured to wait a few milliseconds to group transactions, so as to write several to disk in one single write/sync.
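To make the trade-off concrete, here is a toy model of group commit. It is only a sketch under simple assumptions: one disk, a fixed cost per sync, and a fixed wait window. The function name and the numbers are invented for illustration and do not describe any particular database.

```python
from typing import List, Tuple


def group_commit(arrivals_ms: List[float], window_ms: float,
                 sync_ms: float) -> Tuple[int, float]:
    """Return (number of disk syncs, average commit latency in ms).

    arrivals_ms must be sorted. A batch opens when the first waiting
    transaction arrives and the disk is idle, stays open for window_ms,
    then everything collected is written with one sync costing sync_ms.
    """
    n = len(arrivals_ms)
    if n == 0:
        return 0, 0.0
    syncs = 0
    total_latency = 0.0
    disk_free_at = 0.0
    i = 0
    while i < n:
        open_at = max(arrivals_ms[i], disk_free_at)
        close_at = open_at + window_ms
        j = i
        while j < n and arrivals_ms[j] <= close_at:
            j += 1  # this transaction joins the current batch
        done_at = close_at + sync_ms
        total_latency += sum(done_at - t for t in arrivals_ms[i:j])
        syncs += 1
        disk_free_at = done_at
        i = j
    return syncs, total_latency / n


arrivals = [0, 2, 4, 6]  # one transaction every 2 ms (made-up workload)
print(group_commit(arrivals, window_ms=0, sync_ms=1))  # no waiting
print(group_commit(arrivals, window_ms=6, sync_ms=1))  # wait and batch
```

In this toy workload the waiting version does one sync instead of four, but each transaction waits longer on average, which is exactly the throughput-for-latency swap described above.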
Re:Latency is kinda pointless for this kind of stu (Score:4, Insightful)
Based on this description, seems to me that "arbitrage" is a nice word for inserting yourself into a trade which has nothing to do with you for the purpose of bleeding both the seller and the buyer out of some profit without producing or contributing anything of value. Making it more difficult would make the actual productive parties in the trade better off, and likely help economy as a whole.
Or, to put it even more bluntly: arbitrage, as described by you, is a nicer name for parasitism.
Re:Battery-backed write through cache (Score:5, Insightful)
Oh, yes.. battery backed write cache. With batteries produced by the lowest bidder. The warranty is for 3 years, and the battery lasts just that long before silently failing. When the power goes, well you really didn't need that data written to disk on your database server, did you?
We now do not allow any server to be put into production with any kind of write cache on it. Ever.
Re:Oh, my. (Score:5, Insightful)
Re:Oh, my. (Score:5, Insightful)
I couldn't disagree more. Although automatic garbage collection is nice, this doesn't mean that you'll get "five nines uptime" systems by working with "less experienced" coders.
If you're building a system that must guarantee 999.99% uptime, you wait until your best professionals become available, because it doesn't only involve code. You DON'T give the job to the less experienced ones, no matter how great the programming language. Five nines uptime requires a very robust design and very solid code quality running on a very solid platform which is running on a very solid OS on a very solid infrastructure. You'll want everything to be tested by unit tests, integration tests, regression tests, and whatnot. That involves a whole lot more than 'just' coders, but whoever works on it, they better be good at it.
Re:Oh, my. (Score:5, Funny)
999.99% uptime: The system never crashes, and after you turn it off, it keeps running 9.9999 times as long as you had it running.
Tee Hee (Score:5, Insightful)
Re:Tee Hee (Score:5, Insightful)
I also, long ago, used to believe that language features could improve software reliability. Nowadays the idea just makes me cackle
Why? Certain languages have features that eliminate large classes of errors. Whilst it's possible that programmers will find other ways to screw up, I'd have thought that reducing the set of errors that are actually possible would go some way to improving reliability.
Out of curiosity, what languages are you familiar with? Have you worked much in languages with very tough compile-time checks, like Haskell?
Re:Tee Hee (Score:5, Insightful)
I also, long ago, used to believe that language features could improve software reliability. Nowadays the idea just makes me cackle
Why? Certain languages have features that eliminate large classes of errors. Whilst it's possible that programmers will find other ways to screw up, I'd have thought that reducing the set of errors that are actually possible would go some way to improving reliability.
Out of curiosity, what languages are you familiar with? Have you worked much in languages with very tough compile-time checks, like Haskell?
Y'know, I agree with the grandparent. On my first coding job there was a guy (Chris Burton) who'd worked on the Manchester Mark One [computer50.org]. He was retirement age when I met him. We had a new model of inkjet printer, which had a new processor none of us had ever seen before. It printed characters, we needed it to print bitmaps.
Chris took the datasheet for the printer and the datasheet for the processor home on the train with him, and came back next morning with new code for the printer PROM written out - in opcodes, not assembler mnemonics - in longhand on a pad of paper. That code was blown into the PROM and worked first time, and continued to work without any errors reported for the three years I was on that project.
Programmers like that just don't seem to exist any more. Automatic memory allocation, bounds checking, type checking, etc. are great technology, and I wouldn't choose to live without them. But they mean we are all sloppy and careless, because we can get away with it, and when humans can, they do.
Re:Oh, my. (Score:5, Funny)
Well, that gives a new meaning to opening Windows to Dungeon Dimensions.
Ugly Day (Score:5, Informative)
It was an ugly day of finger-pointing and near-fixes, but in the end, it just left all the financial firms standing there staring at the Exchange. Definitely was a big deal--and it seemed like a lot of volume spilled over to US markets, creating volume related issues here.
MS should hurry up and patent.... (Score:5, Funny)
.... a method of controlling the market.
Patch Tuesday (Score:5, Funny)
Reliability? (Score:5, Funny)
Looks like someone needs to brush up on their buzzwords, specifically "mission critical" and "services no longer required".
Re:Reliability? (Score:5, Funny)
Looks like someone needs to brush up on their buzzwords, specifically "mission critical" and "services no longer required".
More like "Would you like fries with that?" and "Would you like to upsize?"
single page (Score:5, Informative)
I wish people would get into the habit of linking to the single page version of the FA [reuters.com].
Misleading summary (Score:5, Informative)
Re:Misleading summary (Score:5, Insightful)
Why the heck they were using MS Windows for this type of environment is stunning... Transactional processing which is the bulk of this type of setup is where Solaris and Linux excel. Any company that builds a system like that on .Net should be thrown out on the street.
In short.. Not to rock on Windows, but different platforms always offer different strengths..
Re:Misleading summary (Score:5, Insightful)
As is normally the case M$ threw lots of money at the exchange to get it to switch from a unix/linux base to windows .net so that M$ can tout that a major exchange is running windows.
Full page ads touted the switch, and the reasons they cited were better throughput and better uptime.
They even had ads touting it here on /.
Re:Misleading summary (Score:5, Interesting)
Yep, I remembered and laughed so hard I had to put the images next to each other:
http://tipotheday.com/2008/09/08/microsofts-foot-in-mouth-london-stock-exchange/ [tipotheday.com]
Re:Misleading summary (Score:5, Insightful)
No... Actually I deal with this everyday. Windows is great for places where you need desktop apps or such. It also does well when you must have generic developers for web development.
Where Unix/Linux/BSD truly shines is on back office type transactional processing. There are many reasons for this, and they have a long history of doing exactly this. Meaning, mainframes may not have ever been considered sexy, but they ran critical systems in companies for decades with very few problems... Actually they built such a reputation that when they failed most instantly assumed it was a hardware failure... Working on them, however, takes a more polished developer...
Re:Misleading summary (Score:4, Insightful)
The point is you shouldn't be running mission critical systems on new and shiny (it's bound to have bugs); you should be running them on old and reliable (or at least where the bugs and workarounds are well known).
Re:Misleading summary (Score:5, Informative)
You have no clue. When people mention Linux in these environments they mean Linux running on one of these [ibm.com], not a home-brew distro running on a $150 PC.
Choice quotes (Score:4, Insightful)
Nick Illidge [onwindows.com] Financial Markets Sales Manager at Microsoft UK "We are delighted that the London Stock Exchange has selected the Windows platform to base a significant part of its business on. This is further evidence of the enterprise scalability of the Windows franchise. We see our relationship with the Exchange and Accenture as a strong partnership. The Exchange is bold in its technology vision, Accenture provides the capability to deliver this vision, and Microsoft is providing the core technology to help provide the business benefits that the Exchange is looking for."
David Lester CIO at the LSE says [advancedtrading.com] ... that the LSE "is the only exchange in the world not to have had a single outage in six years."
"This is all about the question, 'How are we going to take over the world?'" says Lester, "... I believe this system -- because it's fast, agile and reliable -- will help us compete better. Our current system has to go down for four hours every evening to get ready for the next day's trading," he says. "The batch processing is '80s and '90s technology. You can't run a global market with a system that has to be down for four hours."
Here [londonstockexchange.com]'s a great factoid
Before joining the Exchange in 2001, David worked for Thomson Financial and Accenture.
Re:Misleading summary (Score:5, Informative)
Internal? Dual(+) homed servers, redundant switches, redundant AC, redundant power.
External? BGP on 2 or more transits on separate physical runs.
What, you say that you need to account for natural disasters? Then get a second site, at least a few hundred miles away, and repeat.
Virtual 100% uptime is a solved problem in the networking world.
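A back-of-the-envelope sketch of why doubling everything up gets you there. It assumes failures are independent, which is a big assumption (shared cable ducts and shared bugs break it), and the 99% figure is made up purely for illustration.

```python
def parallel(availability: float, copies: int) -> float:
    """Availability of N redundant copies where any single one suffices.

    Assumes the copies fail independently of each other.
    """
    return 1.0 - (1.0 - availability) ** copies


def series(*parts: float) -> float:
    """Availability of a chain where every part must be up."""
    result = 1.0
    for p in parts:
        result *= p
    return result


single = 0.99  # hypothetical availability of one switch, link or feed
print(parallel(single, 2))  # two redundant copies
print(parallel(single, 3))  # three redundant copies

# A path through redundant switches, redundant power and two transits:
print(series(parallel(single, 2), parallel(single, 2), parallel(single, 2)))
```

Each extra independent copy multiplies the chance of total failure by the failure rate of one copy, but every stage chained in series drags the overall number back down, so the weakest non-redundant link ends up dominating.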
Re:Misleading summary (Score:5, Interesting)
The Johannesburg Stock Exchange, which uses the LSE's trading platform TradElect, also suspended trading.
Hmm. Smells like a new version to me.
Re:Misleading summary (Score:5, Informative)
Re:Misleading summary (Score:4, Insightful)
I thought unfair advantage was the whole point of capitalism...I have it you don't! what kind of communists run the place?
Re:Potentially misleading summary (Score:5, Informative)
Well, the Reuters article does say that trading started normally, but some traders were unable to connect, so the whole exchange was brought down to avoid unfair advantage/disadvantage occurring, so actually both stories are consistent.
performed as expected... (Score:5, Funny)
"and was designed for increased speed and system capacity"
and see - it went down far faster and more completely than the previous system would have been able to. So that's progress. It's all in how you present it.
5 nines? (Score:5, Funny)
So their 9.9999% uptime is screwed?
Re:5 nines? (Score:5, Funny)
Maybe they should shoot for 9 fives instead. When the problem is too hard, just lower the goal posts.
Nothing taxes can't fix (Score:5, Funny)
After the malfunction, TradElect was immediately bought by UK's government for $200 billion and all its debts waived. In an unrelated story, medicare tax was raised yet again because of an unexpected shortfall.
What, no ads? (Score:5, Funny)
Does anyone else remember the "The london stock exchange chose windows 2003 for reliability, they didn't choose linux" ad banners that used to run all over the place, including slashdot if i remember?
Funny how it's all come crashing down...
"The london stock exchange chose windows, but after 7 hours of downtime wishes they had chosen linux".
5-nines SLA (Score:5, Informative)
"5-nines SLA"
I had to look this up, so I imagine other people didn't know it either (I thought it was a stock exchange term). First Google search result reveals the answer,
The Battle With "3 Nines" and The Goal of "5 Nines" [cubiccompass.com]
ketan (Score:5, Interesting)
Quote .NET (Score:4, Funny)
It's official. (Score:4, Insightful)
Let me explain computers to you. See, the developer uses a set of platforms, languages, integration components, etc.. to deliver his functionality to the end user. A failure at any level can cause the application to fail. It could be application logic, network issues, hardware issues, integration with third party systems, a dipship systems administrator, etc...
And yet the 90-105 IQ SlashDweeb set comes out in numbers with no data and says "lolz Windoze! .NET haha!". Crikey.
Re:It's official. (Score:4, Insightful)
Well, no, it's just that Microsoft shouted long and hard about how reliable the LSE would be now it was running on Windows Server System 2003. So it's deliciously ironic that after all this trumpet tooting, it still fell flat on its face, regardless of the reason...since Microsoft's ads were obviously to get everyone to believe that the system would be highly reliable.
Re:It could be .. but wasn't :) (Score:5, Insightful)
Why did the upgrade fail, I guess is what an intelligent person would ask. You haven't asked that. You've hilariously assumed it's .NET or Microsoft's fault.
As a matter of like for like, I'm going to assume it was because some Linux dweeb walked in and tripped over a network cable. Ergo, I now claim Linux dweebs are clumsy oafs who should be banned from computer rooms.
Comment from an affected trader: (Score:5, Funny)
President of Exchange: [Randolph Duke has just collapsed with shock] Mortimer, your brother is not well. We better call an ambulance.
Mortimer Duke: Fuck him! Now, you listen to me! I want trading reopened right now. Get those brokers back in here! Turn those machines back on!
[shouts - it echoes pathetically throughout the trading hall]
Mortimer Duke: Turn those machines back on!
Get The Facts (Score:4, Informative)
"In the past six years, there have been no production outages at the London Stock Exchange, and the new systems running on Microsoft technologies are critical to maintaining this 100 per cent reliability record."
http://www.microsoft.com/casestudies/casestudy.aspx?casestudyid=200042 [microsoft.com]
Re:Get The Facts (Score:5, Insightful)
Right from your article "and be cheaper to manage"
sounds like the LSE fired expensive, knowledgeable admins and went for 'cheaper' ones, there is your problem right there. windows server isn't perfect, but clearly they had good hardware, were running mission critical apps, but went with cheaper less experienced admins.
also, your fine article specified there were 'no production outages', they don't claim the system ran 24/7/365 with no reboots or glitches, but that there was no production outages for six years. there is quite a bit of difference. the former states that admins and hardware were able to offer the specific services needed at the time it was needed for 6 years, but not on the amount of redundant hardware, etc required to accomplish everything.
so given everything i've read here, under experienced windows admin approves an under tested system upgrade that epic fails, and takes down the production server for the first time in 6 years. no shock here, they wanted to cut corners on admin costs, they brought the epic fail on themselves.
Re:Get The Facts (Score:5, Interesting)
Interesting since they haven't been "running on Microsoft technologies" for "the past six years"...
Bad upgrade (Score:5, Informative)
Re:Bad upgrade (Score:5, Insightful)
this same kind of thing( replace *nix with Windows ) is what took out the LAX comm system a few years ago and left dozens and dozens of airplanes in the air and on the ground at/over LAX without communications.
What blows me away is that for years, UNIX systems were one of the de facto standards for mission critical OSs. Along comes a marketing company, Microsoft, and people are saying it is capable of mission critical use even when there are constant disruptions from virus attacks, Ctl-Alt-Del and BSoD are well-known features, and any of a hundred other reasons it is NOT ready for mission critical systems.
What kinds of morons are running the show anyways? And it is about time people start getting fired for this junk. From my experience on operating systems, UNIX was the one OS where when you wrote code, you dealt with the business logic/code and not OS issues. Only once in a blue moon did an OS patch or structure tweak get in the way of coding the application(s). OS/2 was pretty good but not as good as UNIX and Windows was the worst. Gawd, I still hear people complaining about that little Windows Mobile OS crashing. They can't even get a small chunk of code working properly let alone the behemoth that is the Windows desktop and server OS.
LoB
Link to incident status page (Score:5, Informative)
Notice that there were several unsuccessful attempts to bring it back up.
What's really pitiful, LSE has just a fraction of data/trade volume of major US exchanges like Nasdaq or NYSE and still, their systems are regularly getting hosed, albeit not as much as today's meltdown.
Hopefully in coming years LSE will lose market share to Nasdaq/Europe, BATS/Europe, Chi-X and other electronic markets - that should teach them well.
Back in the day - Not only London (Score:5, Insightful)
IIRC, Brazil Bovespa had a small glitch last month or two.
Back in the day when Wall Street and financial markets ran on Solaris systems (AFAIK), this shit wasn't common.
Now it's probably going to become *acceptable* for stock exchanges and aviation reservation software to crash.
Apparently, there's a new generation of a-holes on the system administration markets who grew up with Windows and the Blue Screen of Death, and who think it's acceptable for operating systems to crash, once in a while. Is it evolution?
Re:That's some strange math... (Score:5, Insightful)
Since when is 7 hours even close to "a whole day"? Maybe you meant "almost a whole business day"?
It's a whole trading day--and that's all that really matters when it comes to a major market.
Re:That's some strange math... (Score:5, Funny)
Well, I'm a state employee, and I can tell you that a few 7 hour days in a row would outright kill me.
Re:Still don't know why... (Score:5, Insightful)
Re:Still don't know why... (Score:5, Insightful)
Wait! Are you suggesting that downtime can be caused by application problems, network problems, hardware problems, dumbass systems administrators and a whole slew of other things completely unrelated to the platform on which it is running?
I am *shocked*! *Shocked* I tell you!
A critical system should be RELIABLE! (Score:5, Insightful)
It's about the same thing when people say that "XP does not crash, it's faulty device drivers that crash".
If a system should be reliable, then it should be reliable, no excuses accepted. It does not matter if it's system bugs, application bugs, hardware failures or power outages, a system that pretends to achieve 99.999% availability should take all that into account.
The operating system is not at fault if the power goes down, of course, it's a sloppy engineer that designs a system without redundant power supply. But, likewise, a sloppy engineer will prefer a system that lets him configure and operate it by click-and-drag, instead of a carefully designed and tested set of procedures.
A critical system should NEVER depend on an operating system that does not have a proper batch language. That should be a compact and powerful script language, using TEXT files for configuration that can be hand edited if needed, that can be stored and archived in a version control system, so that bugs can be tracked.
Re:Still don't know why... (Score:5, Informative)
Re:Good lord, they're running on Windows? Why? (Score:5, Insightful)
Oh please. Persuasive marketers can get Windows installed just about anywhere including US war ships.
While it is commonly accepted by many techies (and strongly denied by others) that Microsoft Windows is not a suitable platform for that level of computing, sales people often bypass the techies who know better and sell to managers and executives who still believe "you can't get fired for using Microsoft."
With all this said, it will be quite some time (and possibly never) that we will ever know for certain what is at the root cause of the failure. You can be sure that Microsoft is all over this problem both technically and P.R.-wise. They won't let the facts get out if they are damaging. Recall the major power outage that many still believe was caused by a worm attacking Microsoft servers? As far as I can see, the true cause of that failure has yet to be revealed.
But if this was a planned event, or an unplanned disaster resulting from a planned event gone bad (updates, upgrade, other maintenance), you would think they would have provided for mishaps in some way or another.
But as this news story is all I have to go on, there is no indication of cause and so I will not presume this is a Microsoft problem. But it says a lot that NYSE runs on Linux and not Microsoft. It seems SOMEONE did listen to the techies.
Re:Good lord, they're running on Windows? Why? (Score:5, Insightful)
Perhaps the bit you're missing is that windows isn't quite as bad as the /. crowd likes to say it is. Especially if its an older (translation: fixed & stable) variety like win2k or even nt4.
I'm not sure if you're serious or not, but surely you aren't trying to compare NT4 uptime with the 5 9s of a solid System z platform?
Re:How many failures before.. (Score:5, Informative)
Also he said support was crucial for his company. If something went down, he wanted to be able to call someone immediately. He couldn't afford to just post a question on a message board and hope someone replies. He wanted contracts with 3rd party support that had experience with similar huge enterprise systems that he had.
When I said there were companies who could provide excellent Linux support, he said his ass was on the line if something broke so he wanted to be able to justify his software choice to the C-level guys. And those guys knew the name Microsoft. So he didn't see anything else as an option.
Re:How many failures before.. (Score:5, Insightful)
In other words, he used the "no one ever got fired for buying IBM" defense.
Re:How many failures before.. (Score:5, Informative)
No, but I can point to the New York Stock Exchange, which uses AIX and Linux [techtarget.com].
Re:How many failures before.. (Score:4, Informative)
Off the top of my head, I know that all the LiffeConnect-based systems (London Financial Futures Exchange, EuroNext, Amsterdam, CBOT Metals Complex, Tokyo Futures Exchange, probably a couple of others) run on Linux (a relatively recent change from Sun boxen). NYSE now owns that codebase, and I'm pretty sure that the NYSE uses Linux and AIX on its own platform.
The Chicago Mercantile Exchange's GLOBEX trading engine (running CME, CBOT non-Metals, NYMEX plus a couple smaller exchanges like Minneapolis and Kansas City) platform runs on Linux. They migrated from Solaris to Red Hat back in 2004.
The Intercontinental Exchange's WebICE platform is written in Java and I believe it's running on Linux, but there may be some Solaris still around.
The CBOEdirect system is Java but runs mostly on Sun Enterprise hardware. There is some Linux in the mix, and they certainly use it on some of their other trading systems.
In the (futures and options) trading world, running on Windows servers is considered to be a sure sign of being bush-league. Demand for UNIX/Linux is huge. And I'm not saying this as a Java/UNIX/Linux snob - most of the systems I've written were Microsoft-based (for a variety of reasons - most started out as technology demonstrations that grew way beyond their intended lifespan - "the client's always right").
Re:100 years? (Score:5, Funny)
Guess that depends on what hours it is supposed to be working doesn't it?
c/o User Friendly
"Sid, Stef
- Stef: How reliable is our network?
Sid: As far as our customers are concerned, five nines.
Stef: What does "five nines" mean?
Sid: 99.999% uptime.
Sid: Wait... Why?!
Stef: So would "reliable to nine fives" in our newspaper ad be not very good?"
To be fair (Score:5, Informative)
Of course it is very unlikely that MS achieves five 9s on any installation, let alone as an average.
Re:100 years? (Score:5, Informative)
5 nines does not mean what you think it means.
No, you're right. By my calculation, the actual figure is more like 360 years.
(Remember, this is a system that only operates 7.5 hours per day, 250 days per year)
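For anyone who wants to check the arithmetic, here is a rough sketch of the calculation. It assumes an outage of exactly seven hours (the summary only says "about seven") and the 7.5-hour, 250-day trading schedule quoted above; the function names are made up for illustration.

```python
# How many years of flawless operation does it take to absorb one outage
# and still average "five nines" over the whole period?

def downtime_budget_hours(availability, hours_per_day, days_per_year):
    """Allowed downtime per year, in hours, for a given availability target."""
    return (1.0 - availability) * hours_per_day * days_per_year

def years_to_absorb(outage_hours, availability, hours_per_day, days_per_year):
    """Years of scheduled service over which the outage fits within the budget."""
    return outage_hours / downtime_budget_hours(
        availability, hours_per_day, days_per_year
    )

FIVE_NINES = 0.99999
OUTAGE_HOURS = 7.0  # assumption: "about seven hours" taken as exactly 7

# Measured against trading hours only (7.5 h/day, 250 days/year).
trading = years_to_absorb(OUTAGE_HOURS, FIVE_NINES, 7.5, 250)

# Measured against a 24x365 clock, for comparison.
round_the_clock = years_to_absorb(OUTAGE_HOURS, FIVE_NINES, 24.0, 365)

print(f"trading hours only: {trading:.0f} years")
print(f"24x365:             {round_the_clock:.0f} years")
```

With these inputs the trading-hours figure comes out a little above 370 years and the 24x365 figure at roughly 80, so the summary's "approximately 100 years" fits a round-the-clock clock and the 360 above fits the trading-hours one (an outage of 6.75 hours gives 360 exactly).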
Re:100 years? (Score:5, Funny)
Nah.
They'll be back at "5 nines" by next week.
The trick is to either redefine what the term means (so they are actually referring to 9.9999% uptime), or the timeframe ("we've been at '5 nines' for the whole year", said Jan 1 2009), or both ("so, we use 1 day as a data point, then if we've been up for any part of that day, we're good... so we've always operated at '5 nines' reliability")
Re:100 years? (Score:5, Insightful)
It's called framing and it is making public debate in western society increasingly difficult.
Re:100 years? (Score:5, Interesting)
In business, it generally means that the solution provider (software + hardware) bears direct responsibility for all unplanned downtime.
If the solution cannot provide that level of availability, the provider has to be ready to cover all the damages. And it is often planned that way from day one: some downtime is covered by the "5 nines", some is covered monetarily by the solution provider.
That's why 5 nines solutions cost as much as they do: on one side to let providers bring the quality of the solution up to the desired level, on the other, in case of emergency, to let them cover some downtime with money.
But covering seven(!) hours(!!) can be lethal to the solution provider. Then again, it all depends on their support contract. Some (cheaper) 5 nines are delivered without any guarantees: they are only theoretically 5 nines and provide only "best effort" service availability.
Re:100 years? (Score:5, Funny)
which vista version are you using?
Re:What do Brits say when stuff like this happens? (Score:4, Interesting)
You've seen the first scene of "four weddings and a funeral", surely?
Re:It appears high load/usage crippled the system. (Score:4, Insightful)
No different than what can happen on a unix box I suppose.
Note that the current system is built around a large cluster of 2.2GHz servers, while the unix-based system it replaced (which coped perfectly happily with a substantial portion of the same traffic) ran from a smaller cluster of much slower servers.
The primary purpose for the new system, introduced less than a year ago, was to expand capacity. For it to have failed within a year due to lack of capacity basically means that it has failed in that objective.
Re:It appears high load/usage crippled the system. (Score:5, Insightful)
No, actually the Windows system (10 ms per transaction) was a 13x speedup over the older system (135 ms per transaction), followed quickly by an additional 50% speedup (6 ms per transaction). The Windows system was just recently updated to double performance again (3 ms per transaction), so it's now 45 times as fast as the unix-based system it replaced.
You may be able to fault it on reliability (though the old system wasn't perfect either), but you can't fault it on performance.
Re:It appears high load/usage crippled the system. (Score:4, Informative)
I'm not sure I understand the distinction you're trying to draw,
Latency versus throughput. If the new system processed those serially while the old could handle 130 in parallel, then the old system would be 10x faster even though the new was 10x quicker.
but total transaction capacity of the system increased along the same lines.
Yes, after throwing massive amounts of hardware at the problem.
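To put numbers on the latency versus throughput point, here is a back-of-envelope sketch. The 10 ms and 135 ms latencies are the figures quoted upthread; the concurrency levels (1 for the new system, 130 for the old) are the hypothetical from this comment, not measurements of either system.

```python
# Throughput depends on how many transactions are in flight at once,
# not only on how long each one takes.

def throughput_per_second(latency_ms, concurrency):
    """Little's law rearranged: throughput = in-flight transactions / latency."""
    return concurrency / (latency_ms / 1000.0)

# Hypothetical: the new system handles one transaction at a time at 10 ms each.
new_serial = throughput_per_second(10.0, 1)

# Hypothetical: the old system keeps 130 in flight at 135 ms each.
old_parallel = throughput_per_second(135.0, 130)

ratio = old_parallel / new_serial

print(f"new (serial):   {new_serial:.0f} tx/s")
print(f"old (parallel): {old_parallel:.0f} tx/s")
print(f"old/new:        {ratio:.1f}x")
```

Under those assumptions the slower-per-transaction system moves close to ten times as many transactions per second, which is why a per-transaction latency figure alone says nothing about capacity.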
Re:It appears high load/usage crippled the system. (Score:4, Funny)
6.40K transactions/second ought to be enough for everyone.
Re:In other NEWS... (Score:5, Informative)
No, he'd waggle his arse.
A fanny would be a vagina in Britain.
Come on +5 informative!
Re:d'OH (Score:4, Funny)
yes, but how many times [thewebsiteisdown.com] did they reboot it?
Re:Vietnam outperforms London (Score:4, Interesting)
Ok, so here's the tally I've seen so far:
- LSE today (7 hours downtime)
- Ho Chi Minh City stock exchange (3 days downtime)
- Brazil futures, BM&F, Aug 26, 2008, and Bovespa, Nov 30, 2007,
that I've heard of.
It's incredible! This looks systemic and widespread.
I guess it's a great marketing achievement for Microsoft.
When will people in the financial sector wake up and learn they've been duped?