Turing Award Winner On The Future of Storage
weileong writes "Ars Technica highlights an interview at ACM Queue with Jim Gray, a winner of the ACM Turing award (among other things) by one of the pioneers of RAID (among other things). Many issues touched upon, including: "programmers have to start thinking of the disk as a sequential device rather than a random access device." "So disks are not random access any more?" "That's one of the things that more or less everybody is gravitating toward. The idea of a log-structured file system is much more attractive. There are many other architectural changes that we'll have to consider in disks with huge capacity and limited bandwidth."
Actual interview has MUCH detail, definitely worth reading."
dupe (Score:5, Informative)
And an old one! (Score:3, Funny)
Re:dupe (Score:2)
We know it's a dupe - but we're simulating stream storage. We can't use that old random access thingy to go back. That would be cheating.
Re:dupe (Score:2)
This sounds familiar.. (Score:2, Informative)
I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?
Let me just read your mind... (Score:5, Funny)
I must have read an article earlier about this same thing, probably by this same guy. Can anybody confirm that?
Thanks to my well-developed powers of telepathy, I can tell you that you have read a previous article on the topic by the same author. So I'm happy to confirm that for you.
I can also tell you, thanks to my equally well-honed powers of clairvoyance, that this post will soon be modded up as funny.
(Sheesh. And I thought that some recent "Ask Slashdot" questions were dumb.)
I can help you with that too... (Score:2)
Uhhh, I can't use my telepathy to tell you where they are because you don't know yourself. And if you don't know where you left them then how am I supposed to read that info from your mind?
However, I can use my clairvoyance to tell you where they will be. They'll be in the last place that you look.
Glad to be of assistance.
Solid state is the way to go. (Score:5, Interesting)
I think we'd all be better off when solid state, non-mechanical disks become commonplace.
Is there any reason other than cost why we can't have 100GB solid-state drives yet?
Re:Solid state is the way to go. (Score:3, Funny)
HDD = a buck a gig, solid-state = 100 bucks a gig.
Though supposedly magical MRAM will come along and revolutionize the world. OLED screens too. And oh yeah, Duke Nukem Forever.
Re:Solid state is the way to go. (Score:2)
I think someday cost will be less of an issue than convenience. Think of the state of monitors today: LCD sales are going well, and while they haven't replaced CRTs yet, they're on their way. Apple no longer sells CRTs at all. This is despite the fact that CRTs are cheaper for the same size screen, because LCDs have a significant edge in size, weight and power consumption.
Flash memory may be around $100/GB right now, but if that drops low enough (say $20/GB), it'll
Re:Solid state is the way to go. (Score:2)
There's no way that people will pay $20/GB for primary storage. The cost of a HDD is around $1/GB and dropping fast. It would be exceedingly difficult to convince people to pay a 100% premium (2x the price) for solid state storage. $20/GB would be a 1900% price premium! Smaller size and lower energy consumption are all very nice and good, but $2000 for a 100GB drive seems a little steep
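The premium arithmetic in the comment above is easy to check. A minimal sketch in Python, assuming only the per-gigabyte prices quoted in this thread ($1/GB for hard disk, a hypothetical $20/GB for flash); the function and constant names are made up for illustration:

```python
def price_premium_percent(base_price: float, new_price: float) -> float:
    """Return how much more new_price costs than base_price, in percent."""
    return (new_price - base_price) / base_price * 100.0


HDD_PER_GB = 1.0     # dollars per GB, figure quoted in the thread
FLASH_PER_GB = 20.0  # dollars per GB, the hypothetical price under discussion

premium = price_premium_percent(HDD_PER_GB, FLASH_PER_GB)
drive_cost = FLASH_PER_GB * 100  # cost of a 100 GB solid-state drive

print(f"premium over HDD: {premium:.0f}%")
print(f"100 GB solid-state drive: ${drive_cost:.0f}")
```

By the same formula, paying twice the price is a 100% premium, which matches the "2x the price" remark.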
Re:Solid state is the way to go. (Score:2)
Tho' I still don't think that would be enough to justify it.
I'm actually afraid to move to the higher Gig'ed drives, I don't back up enough now and larger drives will just let me put it off even longer.
Having 30 gigs die would be bad. Having 100 snuffed out could kill me.
Re:Solid state is the way to go. (Score:2)
Sure, they may eventually catch up, but the attract
Re:Solid state is the way to go. (Score:2)
Re:Solid state is the way to go. (Score:3, Informative)
So sure, you could replace your current 80Gb disk drive with 80Gb of solid state, but where are you going to store your 50Gb 3D movies in 1000x1000x1000 resolution? They're going to be on disk, and you'll have to deal with the increasing size:bandwidth and size:access-speed ratios. After all, I can buy a smartmedia card with the capacity of my first hard drive for about what I u
Re:Solid state is the way to go. (Score:5, Informative)
I think we'd all be better off when solid state, non-mechanical disks become commonplace.
A company named SolidData [soliddata.com] sells solid state "drives".
next level up is more sequential (always?) (Score:2)
"programmers have to start thinking of the disk as a sequential device rather than a random access device."
"I think we'd all be better off when solid state, non-mechanical disks become commonplace."
Now that you mention it, I don't think so. In reality, there's a lot of time spent dissing the latency (and even bandwidth) of DRAM (any flavor). That's why caching is being elevated to a fine art form. That's why Intel is i
near-infinite storage (Score:2)
I believe that sometime in the future, we'll look back on our spinning disks and chuckle. I think we will eventually get to near-infinite storage, and sequential will be the way to go. There won't be any erasing necessary, you will just write to the next available space, move the pointer to it, and move on.
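The "write to the next available space, move the pointer" idea is roughly what an append-only (log-structured) store does. A toy sketch in Python, not any real file system: the class name, the in-memory `bytearray` standing in for the medium, and the key-to-offset index are all assumptions made for illustration.

```python
class AppendOnlyLog:
    """Toy append-only store: records are never erased or overwritten."""

    def __init__(self) -> None:
        self._data = bytearray()  # stands in for the storage medium
        self._index = {}          # key -> (offset, length) of the latest version

    def write(self, key: str, value: bytes) -> int:
        """Append value at the tail and point key at it; return the offset."""
        offset = len(self._data)                 # the "next available space"
        self._data += value                      # purely sequential write
        self._index[key] = (offset, len(value))  # move the pointer
        return offset

    def read(self, key: str) -> bytes:
        """Return the most recently written value for key."""
        offset, length = self._index[key]
        return bytes(self._data[offset:offset + length])


log = AppendOnlyLog()
log.write("a.txt", b"first draft")
log.write("a.txt", b"second draft")  # the old bytes stay where they were
print(log.read("a.txt"))
```

Old versions are never reclaimed here, which only works if space really is "near-infinite"; real log-structured designs have to add some form of cleaning.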
Re:near-infinite storage (Score:2)
Re:near-infinite storage (Score:2)
30 years ago there weren't any personal computers. :-)
As long as it's physical, it's limited. And as long as it's limited some of us will reach those limits.
Hence, my use of the term "near-infinite", and by that I mean "infinite for most uses". If storage goes to the organic molecular level, you would be talking about near-infinite storage, depending on the size of the media. But if we get to the point where w
Next on Slashdot (Score:4, Funny)
Next week: Cars that can haul less can be more fuel-efficient!
The week after: Algorithms that use more memory, but are faster to execute!
Huge disks (Score:5, Insightful)
It's still random access: I can choose and access an object, even individual photos, without scanning through large amounts of unwanted data.
Bandwidth... (Score:5, Funny)
The biggest problem I have mailing disks is customs. If you mail a disk to Europe or Asia, you have to pay customs, which about doubles the shipping cost and introduces delays.
Thereby adding a corollary to the old adage "Never underestimate the bandwidth of a vanload of tapes barrelling down the highway"...
"Never underestimate the bottleneck caused by a far-Eastern customs inspector."
The van metaphor (Score:2, Funny)
It's "A station wagon full of..." (Score:2)
That's because the original is "station wagon" (or "stationwagon"). Another common variant is "a 747 full of...". See e.g. this story [bpfh.net]
And no, it's certainly not Tanenbaum 1996; it was (IIRC) mentioned in Bentley's "Programming Pearls" CACM column/book in the 1980s. It's unclear that anything original can be attributed t
Re:It's "A station wagon full of..." (Score:3, Interesting)
The highway in question (as in station wago
Re:It's "A station wagon full of..." (Score:2)
I don't believe this is true, having both lived/programmed through that era and also having read all the various histories...but at best you're speculating that the existence of Minix was important historically... I didn't think so at the time, although it was somewhat interesting. People might say the same thing about Xinu, Bill Jolitz' Unix efforts, Cromemco Unix, Xenix, etc etc etc.
I saw and see nothing critically important about Minix, even th
Very much a pioneer, even IF he works for MS (Score:4, Informative)
But he's a Microsofty! (Score:2)
And here he is singing the praises of open source software, MySQL, Linux, PostgreSQL, Oracle, IBM etc! He'll most likely be getting a visit from Ballmer in person I think. Obviously the brainwashing didn't work on this guy.
Network speed (Score:4, Interesting)
they are part of Internet 2, Virtual Business Networks (VBNs), and the Next Generation Internet (NGI). Even so, it takes them a long time to copy a gigabyte. Copy a terabyte? It takes them a very, very long time across the networks they have
Is this really true? Wasn't there a recent Slashdot story where researchers transferred a gigabyte of data, in fourteen seconds or so, on Internet 2 from California to the Netherlands?
I suppose that disk access times will be the limiting factor at both ends if you were to read and write the data from/to a disk.
Tweaked (Score:5, Informative)
Re:Network speed (Score:4, Informative)
Couldn't find the article with the Slashdot search, but Google produced it. Here it is. [slashdot.org]
The real numbers were 8,609 Mbps, which translates roughly into a DVD transferred every five seconds. Btw., it was Switzerland, not the Netherlands.
Also, I don't understand the part where he mentions bandwidth costs of $1 per gigabyte. Maybe you have to pay that much on the Internet 2, but my DSL cost is somewhere in the region of $0.05 per gigabyte, I figure. Maybe I'm just spoilt.
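The "DVD every five seconds" figure can be sanity-checked from the quoted 8,609 Mbps. A rough sketch in Python, assuming a 4.7 GB single-layer DVD and decimal units (1 GB = 8,000 megabits), and ignoring protocol overhead; the function name is made up for illustration:

```python
def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Seconds needed to move size_gb decimal gigabytes at rate_mbps megabits/s."""
    megabits = size_gb * 1000 * 8
    return megabits / rate_mbps


RATE_MBPS = 8609  # sustained rate quoted above
DVD_GB = 4.7      # assumption: single-layer DVD capacity

dvd_seconds = transfer_seconds(DVD_GB, RATE_MBPS)
terabyte_minutes = transfer_seconds(1000, RATE_MBPS) / 60

print(f"one DVD: {dvd_seconds:.1f} s")
print(f"one terabyte: {terabyte_minutes:.1f} min")
```

Under these assumptions a DVD takes a bit under five seconds, so the rounding in the comment holds up, and a full terabyte at that rate would still take on the order of a quarter of an hour.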
I2 != Research (Score:2)
Re:Network speed (Score:2)
Now certainly, the broadband companies don't expect you to be downloading at maximum speed constantly, and if you were you'd be in the top
Re:Network speed (Score:2)
You can easily buy bandwidth in the sub-$1 per gigabyte range when buying bandwidth per gigabyte transferred to a colo. If you need to lease physical lines to your office, the cost may end up a bit higher. $41 per GB sounds like someone has been smoking crack.
The reason there's a gap for your DSL, though, is exactly as you mention - that most users only utilize a fraction, so you're only paying for the average amount transferred per user plus some contingency.
Re:Network speed (Score:2)
Re:Network speed (Score:2)
They basically got alpha versions of 10Gig Ethernet cards from Intel, and got the ISP to give them a light path directly between two points. I think they got one of the underwater cables entirely to themselves for a few hours or something like that. The cost for that was what makes it so expensive.
Yes, they did tweak the heck out of that software (by the way, they did use Linux - don't remember which distro).
Ouch... (Score:5, Insightful)
Re:Ouch... (Score:2)
Or:
He seems to be suggesting that rather than try to make access quicker, we should stop making hard drives bigger. ?!?!
pr0n (Score:5, Funny)
That's a good excuse to use on my wife: "No honey, those are my
ACM Turing Award Winner (Score:5, Funny)
Re:ACM Turing Award Winner (Score:4, Funny)
My wife likes to tell people that her first job, back in the late 70's, was with a Civil Engineering firm in New York, where her job title was "Computer". She did the calculations (and error checking
They've since changed the job title.
Funny how quickly such terminology can change.
Re:ACM Turing Award Winner (Score:2)
Nobody has a job as a computer anymore... they've all been replaced by computers.
2 quotes... (Score:3, Interesting)
Gray, head of Microsoft's Bay Area Research Center, sits down with Queue and tells us (...)
JG: If it is business as usual, then a petabyte store needs 1,000 storage admins. Our chore is to figure out how to waste storage space to save administration.
MS bashers will have a field day on this one...
Re:2 quotes... (Score:2)
Tell that to the high energy physics community; they use petabyte size stores as local caches.
Troll in the article (Score:5, Insightful)
Apart from speculating as to whether this attempt at FUD was the real payload of the article, did it really say anything that most of us haven't already noticed? Whether Flash or fast SCSI, we could do with an intermediate layer of backing store, with faster random access than current IDE HDDs. And we are fast heading for removable IDE drives to be a better and cheaper tape replacement. And the Internet has limited bandwidth. I'm sorry, but you don't need a Turing prize to work any of that out.
Defending Jim Gray (Score:4, Insightful)
Chrisd
Re:Troll in the article (Score:3, Interesting)
Re:Troll in the article (Score:2)
As for FUD about MySQL, I don't see it in the article. MySQL is lacking some features that keep it from competing in the same spaces as Oracle, but that's a decision on the part of the MySQL team
Re:Troll in the article (Score:2)
His point was simply that DB2 and Oracle, and, to a lesser extent, SQL Server, are mature database products, and MySQL is just a baby. "Real" databases are optimized like crazy for each OS version/CPU combination. MySQL isn't there y
Re:Troll in the article (Score:3, Insightful)
> Look at all the stuff about MySQL and Linux in the middle. It's as if a Microsoft Marketoid had suddenly taken over the interview. Or someone who didn't understand the difference between many thousands of developers working on Linux and the smaller number that work on MySQL.
He's correct as far as he goes.
MySQL and MS SQL Server actually have the same problem, and it is called SQL; both even go downhill from there.
SQL is simply too complex to implement properly, and it only gets worse when you sta
Re:Troll in the article (Score:2)
LSFS (Score:4, Informative)
http://citeseer.nj.nec.com/rosenblum91design.ht
Re:LSFS (Score:2)
Really good idea. The canonical criticisms, as described by OS teachers I had (hint: one of them WAS Mendel...):
TAOCP (Score:2)
So I guess the disk algorithms from Knuth's TAOCP are still useful after all those years?
The hierarchical object file system (Score:4, Insightful)
One final thing that is even more speculative is what my co-workers at Microsoft are doing. They are replacing the file system with an object store, and using schematized storage to organize information. Gordon Bell calls that project MyLifeBits. It is speculative--a shot at implementing Vannevar Bush's memex [http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm]. If they pull it off, it will be a revolution in the way we use storage
I've talked about it before [slashdot.org]. This guy thinks what Microsoft is doing is revolutionary. Come on, all you people, can't you see the problem with today's file systems? The problem is that the type information is lost!!! We need objects, and we need type information to be stored along with those objects!!! This is the only way lots of problems will go away.
Re:The hierarchical object file system (Score:2)
MRAM saves the day (Score:4, Interesting)
This doesn't just affect file storage and virtual memory. It also changes the economics of cache and main memory, and makes deployment of 64-bit CPUs more urgent. It also makes system crashes much less tolerable, because turning the computer off and on doesn't involve long shutdown and boot procedures any more.
Re:MRAM saves the day (Score:2, Funny)
Yup. And Duke Nukem Forever will eat Half-Life 2's panties.
Fuzzy numbers, or can this be right? (Score:4, Funny)
So, one could rent a $20K device for $240/year? Those must have been the days...
That can't be right.
Interesting Idea... (Score:3, Interesting)
There is a current trend towards cramming as much storage as possible into something the size of a 3in hard drive.
I wonder why they don't make larger hard drives in the physical sense? A hard drive the size of a washing machine using today's technology would store a phenomenal amount of stuff, but what about something more reasonable, like a hard drive merely twice the physical size of today's? How much more storage could you get just by scaling up the platters? Anyone here g
Re:Interesting Idea... (Score:2)
Re:Interesting Idea... (Score:2)
But 5.25 HDs, single height, could hold (judging by platter area) over twenty times the data. Still easy to swap, but a nicer size.
They'd be slow, but I connect to my fileserver over 100 Mbps, and even with 10 people using it I rarely see any delay.
Three letters: F, U, and D (Score:4, Insightful)
My buddies are being killed by supporting all the Linux variants. It is hard to build a product on top of Linux because every other user compiles his own kernel and there are many different species.
Ain't it sweet? I count five lies:
(1) people being killed by supporting (gasp) operating systems... gosh, horror and violence, not nice at all!
(2) all the Linux "variants", are in fact pretty much one standard, LSB, with several skins
(3) "hard to build a product on top of Linux", rather than, hmmm, Windows? Linux is incredibly easy to build for. I suspect the fact that it's very standard helps.
(4) "every other user compiles his kernel"... maybe at Microsoft. I suspect less than 1 in 20 Linux users ever compiled a kernel.
(5) compiling a kernel means you can't support it... WTF? The kernel is incredibly stable, since most changes are in external modules. And I can't remember a single case where a kernel change broke one of my apps.
(6) (sorry, I was not counting well), "many different species"... well, AFAICS the only difference between the Linux distributions is that they have different packaging methods, different timelines as to their versions, and different UI tools for hardware detection, configuration, etc. Nothing at all that makes life hard.
Look: I just installed Xandros, which is Debian with a nice face. On two different types of machine, and it installed without asking a single question about my hardware except whether the mouse was left or right-handed. Check my journal...
Windows never worked this nicely. Where is the support issue?
In the writing industry we call this "to condemn with faint praise".
Yeah, Windows kinda works, I mean, it'll run Office without crashing too often, but it's just killing my buddies to have to maintain Win2K, WinXP, and even some older Win98 machines, not to mention we have a whole cupboard simply filled with driver CDs for every PC we have.
Re:Three letters: F, U, and D (Score:3, Insightful)
FUD: The sound made by someone attempting to wish away inconvenient facts.
http://www.eod.com/devil/archive/fud.html
He is right, but nothing to do with the kernel (Score:5, Insightful)
Three answers and an observation (Score:2)
Second, it is trivial and cheap to build packages for RedHat, Debian, and SuSE as you need them; we do this automatically. See, when the OS is free, it costs you little to set up development systems. If you're tight for hardware, use UML.
Third, there are serious arguments against delivering binary-only packages, and in favour of building from source, and these arguments ar
Re:Pay attention. (Score:2)
It takes a few hours to entirely automate the build process for a product under Linux. You have this CVS thing at one end, a bunch of Linux distros at the other, you press a button, and ten minutes later you get a bunch of neat binary packages back.
So painful it hurts.
Ah, insult me again, I'm not doing anything special today.
Re:He is right, but nothing to do with the kernel (Score:2)
Re:He is right, but nothing to do with the kernel (Score:2)
Re:Three letters: F, U, and D (Score:2)
MySQL vs. Oracle (Score:2)
I just love your sense of humour. I remember when we switched an ISAM application to Oracle in the mid 1990's, on a Unix box. A single record access by primary key was 20,000 times faster with the ISAM system than under Oracle.
I retested this with later versions of Oracle and found that the performance was worse, not better.
Now, I have a nice server under a desk here, and we reloaded an Oracle 9 database on it, it took something like 8 hours to rebuild. Since we make port
Re:MySQL vs. Oracle (Score:2)
Because, of course, performance is the sole indicator of a product's worth. Using that reasoning, everyone should be driving an Indy car instead of a station wagon.
Sure, the Volvo ($enterprise_DBMS) may not be as quick off the line as a Ferrari (MySQL) but I'd rather be in the Volvo when something goes wrong.
Performance (Score:2)
Let me list my criteria for, e.g. a database product:
1. accuracy
2. performance
3. ease of administration
4. ease of installation
5. price
Not in any specific order. I've used Oracle databases for about 12 years, and on every single one of these counts, MySQL wins. Every single one, without exception.
Oracle wins on a number of other criteria:
1. profitability
2. complexity
3. need for expensive DBAs
4. consumption of excess time
5. image
6
Re:Performance (Score:2)
You have not used Oracle's database (unless you write low-level driver calls to the actual data itself) - you use Oracle's DBMS product.
At any rate, I find it difficult to believe that you can say that MySQL is more 'accurate' than Oracle (and by extension PostgreSQL, or MS SQL Server, or Sybase ASE or
The constraint handling is poor at best (you can only have very minimal constraints). You have no such thing as triggers or views. The datatype
Re:Three letters: F, U, and D (Score:2)
3 Terabytes on a credit card? (Score:4, Interesting)
invented a way of cramming 3 terabytes on a credit card. Apparently it would have cost about 35 pounds to manufacture. This was a couple of years ago, why hasn't it happened yet?
Surely something like this is the real future of storage?
Terabyte on a credit card [cmruk.com]
Re:3 Terabytes on a credit card? (Score:3, Insightful)
Sneaker net? (Score:3, Funny)
"Sneaker net" was when you used your sneakers to transport data?
Oh my. How old I feel when someone has to ask what "sneaker net" was. And someone has to answer...
AMAZING!!! (Score:5, Funny)
Can we download a copy of this "Jim Gray" yet?
Re:AMAZING!!! (Score:3, Funny)
No, too big to transfer over the Internet at this point. You'll have to use UPS.
sequential vs direct-access (Score:2)
This is partially already true for classic UNIX userspace behavior. You pipe the data from the input file(s) through a filter and generate the output, sequentially.
A completely different model from the FS drivers or a SQL database.
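The pipe-and-filter model described above can be sketched with Python generators. This is only an illustration of the sequential style, not how any shell implements pipes; the filter names and the in-memory `io.StringIO` objects standing in for input and output files are assumptions.

```python
import io
from typing import Iterable, Iterator


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines that contain needle, one at a time."""
    for line in lines:
        if needle in line:
            yield line


def upper(lines: Iterable[str]) -> Iterator[str]:
    """Upper-case each line as it streams past."""
    for line in lines:
        yield line.upper()


source = io.StringIO("alpha\nbeta\nalphabet\n")  # stands in for the input file
sink = io.StringIO()                             # stands in for the output file

# Each stage reads strictly front to back; nothing ever seeks backwards.
for line in upper(grep(source, "alpha")):
    sink.write(line)

result = sink.getvalue()
print(result, end="")
```

Neither filter needs the whole input in memory or any random access to it, which is the contrast with a file system driver or database engine that the comment draws.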
New File System (Score:3, Interesting)
Imagine, media files stored in such a way that both random and sequential access was optimized, where the file structure was automagically defragmented and organized behind the scenes.
Imagine a computer that watched what files were used at bootup, and organized them so that the hard drive streamed the bootup data sequentially, straight into memory.
Imagine being able to start PRELOADING applications before you even finish the second of your double clicks on the datafile.
Imagine Database files that were automagically indexed as part of the file system.
Imagine Security and encryption being built into the filesystem beyond today's capabilities, where the security and encryption does not rely upon a master controller or centralized security policies, but rather has the ability to follow the file, seamlessly.
I am sure that I haven't even begun to tap the possibilities.
Re:New File System (Score:3, Interesting)
For Meta Data to work, there has to be some sort of STANDARDS-based way of describing said data.
For instance, a table. How would you describe a table? Is it tab-delimited text, a spreadsheet or an HTML-based table? Does it reference cells and/or other tables? Are those available? Is the data from missing tables available as a static value?
Is the data within the
Re:New File System (Score:2)
The relational model also allows for definite relationships between the tuples (and indeed the different element
IDE replaces DVD (Score:5, Interesting)
I currently have about 100 GB of images and it takes more than 20 4.7 GB DVD-R discs to create a full backup. Although DVD media is still slightly cheaper than new large capacity IDE drives, the added time and hassle factor of burning 20 discs far outweighs any minor cost savings. Moreover a 3.5" drive in a padded anti-static bag takes up less room in the safe deposit box than 20 DVDs (especially if you have the DVDs in protective jewel cases). And if HD-based-backup lets me avoid some future artists' tax on burnable media, so much the better.
A Firewire enclosure and a rotating collection of IDE drives is the way to go.
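The disc count in the comment above follows from a simple ceiling division. A quick sketch in Python using the 100 GB and 4.7 GB figures quoted there; the function name is made up, and real burns lose a little capacity to file system overhead, which this ignores:

```python
import math


def discs_needed(data_gb: float, disc_gb: float) -> int:
    """Smallest whole number of discs that can hold data_gb of data."""
    return math.ceil(data_gb / disc_gb)


discs = discs_needed(100, 4.7)
print(discs)
```

Since 100 / 4.7 is a little over 21, the last partial disc still has to be burned, which is where "more than 20" comes from.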
Re:IDE replaces DVD (Score:2)
CD and DVD media is great as a replacement for the floppy disk. But hard drives have been the only affordable transportable mass-storage media for years. DVD has never been an option for me, but I suspect it will replace CDRW media within a couple of years.
$1/GB or less since 2001.
Turing Award? (Score:2)
Re:[OT] Your sig (Score:2)
Re:[OT] Your sig (Score:2)
Who knows... dictionary.com doesn't list either virii or viri as a valid plural of virus (and I assumed you were commenting on virii vs. virus; apologies if that's not the case). I guess "viri" looks too short, or not imposing enough, or something.
Hell, it's all in fun anyway. I hope so, at least.
Re:[OT] Your sig (Score:2)
However, when you apply the same plural construction to "virus" as you do to other words like "octopus" or "cactus" or "radius", one would think the result would be "viri", not "virii". I don't know where the extra 'i' comes from, but seeing as how both constructions are not standard, I suppose it
The Ol' Roadmaster Scenario (Score:2, Informative)
Missing the logical boat (Score:3, Interesting)
> To some extent you can think of Codd's relational algebra as an algebra of punched cards. Every card is a record. Every machine is an operator.
Interesting how the guy literally wrote the book on transactions, yet grossly misrepresents Codd's work, which BTW wasn't simply the relational algebra, but even higher level: the relational model of database management, including the relational calculus.
While the algebra is somewhat procedural, the calculus is set-oriented, and they are fully equivalent. The idea is exactly not looking at records and operators, but describe what you want -- just leave the relational system set the procedures to get that in the most efficient way it can.
Incidentally this has a big impact on all Gray is discussing -- without a fairly simple and powerful data model, so much data is basically a waste. He's thinking too low level, including the object stuff he touts, but we will only find use for so much data the day we get proper relational implementations, and this excludes SQL in general and MySQL in particular.
Re:Missing the logical boat (Score:3, Informative)
You said it yourself:
While the algebra is somewhat procedural, the calculus is set-oriented, and they are fully equivalent.
and, uncoincidentally, the isomorphism extends further to machines that manipulate physical punch cards. You go on to say:
The idea is exactly not looking at records and operators, but describe what you want -- just leave the relational system set the procedures to get that in the most efficient way it can.
Right. And what Gray ha
Hey, does this sound familiar?.... (Score:2)
Or what happens when the concept of taking complexity (made up of simpler things) and automating it such that it is easy to use and reuse, by the user, is applied here?
The exception that proves the rule (Score:2)
Actually I got about halfway through and decided to skip the rest of it.
Re:I tried to use a tape drive this way :-) (Score:2)
Re:I tried to use a tape drive this way :-) (Score:2)
I was really wondering why my !"!" machine was suddenly so totally slow. Fortunately they fixed this in some later SP.
Re:Wait (Score:2, Interesting)
Now, when software opens a file, it gets a handle to the storage and seeks all over it to get the data it needs and finally write it back. This is particularly true of files that consist of many records. Some software mmaps (memory maps) the file, mapping it into the memory address space and making it appear as a large, s
Re:Wait (Score:2)