Researchers Create Database-Hadoop Hybrid
ericatcw writes "'NoSQL' alternatives such as Hadoop and MapReduce may be uber-cheap and scalable, but they remain slower and clumsier to use than relational databases, say some. Now, researchers at Yale University have created a database-Hadoop hybrid that they say offers the best of both worlds: fast performance and the ability to scale out near-indefinitely. HadoopDB was built using PostGreSQL, though MySQL has also successfully been swapped in, according to Yale computer science professor Daniel Abadi, whose students built this prototype."
Please stop (Score:3, Interesting)
Uber-cheap is not a word, and it doesn't even make sense because you're saying it's "above cheap". Stop making up stupid shit.
Re:Please stop (Score:5, Insightful)
It's called a prefix. We use them in the English language. This one has recently been adopted into our language. Pick up the pace or shut up about things you don't know.
Re: (Score:3, Funny)
I am a metaphor for war.
You are for metaphor war.
Re: (Score:3, Informative)
Considering that "Ubermensch" was translatable to "Superman" then "Ubercheap" would be "Supercheap"
No, it wouldn't. It would be word soup that any German would find to be awkward. To say something is "super cheap" they would say something like "superpreiswertes" which would literally translate as "super inexpensive". They wouldn't use über in such a situation.
Re: (Score:2)
Word soup? Have you *HEARD* German lately? Most of their speech is made up of huge conglomerations of words and prefixes and suffixes.
For example, the word for CPR in German? Herzkreislaufwiederbelebung (heart-circle-run-again-enlivenment).
Re: (Score:2)
Oh sure, German is an extremely verbose language at times, but the fact of the matter is that CorporateSuit, despite all his blusterings, is about as clueless in German as he tries to claim others are.
Re: (Score:2)
the fact of the matter is that CorporateSuit, despite all his blusterings, is about as clueless in German as he tries to claim others are.
I suppose that all comes from living in Germany for several years, speaking nothing but German with around 1,000 people a week, face to face. I suppose anyone who had gone through such rigors would end up being "clueless" in German as well. All sarcasm aside, perhaps you are more right than you think. Some Germans don't consider Koelsch or Hessisch (the dialects I ended up speaking) to be real German at all (Although they are more understandable than Bayerisch or Frankfurterisch - which is like Hessisch o
Re: (Score:3, Insightful)
Compounds parse easier with correct parentheses: (herz)(kreislauf)(wiederbelebung) or (heart)(circle-run)(re-activation), where each of the bracketed words is itself a common compound. FWIW, cardiopulmonary resuscitation has more characters than the German term. German and English aren't very different, in fact, in terms of compounds; English also has a huge number of compound words, even though they are often not spelled as a single word: circuit breaker, for instance. As English compounds get increasingly
Re: (Score:3, Insightful)
We've been co-opting other languages' words into English for a long, long time now. To a growing number of US citizens, prefixing anything with "uber" is the same as saying "ultra" or "super". You know the saying "it's all over except for the shouting"? Yeah, that's pretty much where this is.
Feel free to mod this entire thread, including the parent, uber off-topic.
Re: (Score:2)
You're right that ueber would not conventionally be used as a prefix in this situation, but we weren't talking about the German prefix ueber, but about the English prefix uber, which was adopted from German. The fact that you wouldn't say ueberbillig in German doesn't mean that it's improper to use ubercheap. It makes you sound a bit like an ass, but I would argue that it's in line with other conventional uses of the "uber" in English. To put this in a different perspective, English probably uses the Latin
Re: (Score:2)
Uber and Super both mean "above", knucklehead. Same proto-indo-european root, in fact.
Today may just be the day that you learn that a word may have more than one definition. In fact, the word you use "root" refers not just to a word's origin, but it can also refer to a very important part of plants. Do not squander this opportunity. It will open an entire new world of linguistics. I have nothing but hope for the grand future that awaits you and your once-tunneled view of the English language.
Re:Please stop (Score:5, Funny)
This thread is uber-dumb.
Cartman would say it was "hella-stupid".
Re: (Score:1)
hmmmm...
Re: (Score:1, Insightful)
Uber-cheap is not a word, and it doesn't even make sense because you're saying it's "above cheap".
You remind me of my English teachers. Every year they kept saying that "ain't" isn't a word because it's not in the dictionary. Then one day I looked in the dictionary and it was there. The lesson I learned was that humans create words and "rules" of language aren't really rules at all. They are merely traditions. I suppose you think the French are just speaking bad Latin? No, languages change. From Old English to Middle English to Modern English it changed. I bet all along the way there was some know-it-al
Re: (Score:2)
The way I see it, the real question should be "does it increase the ambiguity of the language or decrease its expressive power?". As long as someone understands what is being said (with slang like "ain't" that has been in use long enough so it is widely known) then I don't see a problem with it. We may become somewhat Balkanized in the short-term, but, hopefully, this will serve to get those conservatives used to living in a pluralistic society and will wear down some of their xenophobia. I see the rea
Re: (Score:2)
Since a high price is above a low price, "above cheap" means "expensive".
Re: (Score:2)
Uber doesn't mean 'above' in popular parlance; it's an absolute measure of greatness. Therefore 'uber' cheap would refer to more cheapness.
PostGreSQL (Score:5, Informative)
It's PostgreSQL [postgresql.org]... but I sympathize with the mixed case confusion and refer you to this Postgres vs PostgreSQL permathread [postgresql.org].
Re: (Score:2)
I do try to capitalize PostgreSQL in more formal communication / documentation, however.
If it works as described it will be VERY important (Score:2, Interesting)
If both the performance and the scalability are as good as described, I can safely say that this is the most important thing of the decade, and not only for DBMS.
Handling large portions of data would get cheaper by an order of magnitude at least and scaling out would be way cheaper than now as well. I do hope it's true.
Re: (Score:1, Insightful)
It won't deliver. In the meantime, for those of us living and working in the real world, hard drives will get bigger and faster, file systems will get better, and SSDs will start to shit all over spinning platters.
Re: (Score:1)
If it delivers, it will change much. Not for your average blogger with $10 hosting, WordPress and all his 100 readers, but for all the folks that have sites successful enough to go beyond what a single DB server can deliver. Right now you have to work really, really hard to make it all work with replication, as pretty much no free CMS offers data sharding. Now you won't have to. Just get a DB cluster (as a service) that works out of the box with no or very little modification to the software you are using. The
Re:If it works as described it will be VERY import (Score:4, Funny)
What about Essbase? (Score:2)
I thought Essbase was supposed to be one of the best databases for managing too much information. Is this supposed to be an alternative, or act as something in-between using Essbase and a mysql server?
Re: (Score:2)
Transaction speed has never been a high point of Essbase, nor has storing anything but numerical data.
Changes to the data are not reflected immediately except in the lowest level members until it is re-calculated. It is not unusual to find calc scripts that run for 8+ hours.
Typical (Score:1)
Re: (Score:2)
I hate to disagree with you ... but ....
That anyone can come up with ideas is true; HOWEVER, not all ideas are GOOD ones. The problem with coming up with GOOD ideas is that people often don't have a basic understanding of the problem or of the implications of the various ways of implementing an idea.
Getting people to do the work is often not quite as easy as it seems. First you have to have qualified people. They have to be motivated to actually complete the work given.
As for degree programs at schools and such, a MS is noth
Re: (Score:1)
I take a look at what I know, verses what I knew graduating college, and I know substantially more, and more practical knowledge, things that no MS piece of paper can show.
Does your extensive post-collegiate learning include constructing a multi-clause sentence in the English language (...and I wouldn't even mention the spelling error)? Ordinarily I wouldn't be an asshole about this except you screwed up the exact sentence where you're bragging about your amazing skills, acquired over a long professional career. And whatever that career might be, writing a readable sentence in your main language is a basic skill (and I know you're an American from your other posts).
Re: (Score:2)
Can't spell worth shit, doesn't negate my intelligence. I know idiots who spell perfectly. I know very intelligent people who spell well, and idiots who can't spell like me. Spelling is NOT a sign of intelligence nor education level.
And I didn't know I was going to be "graded" on a spur of the moment post to a web log. Had I known you were a lurking grammar nazi, I would have proofread my post more carefully. Perhaps even hiring someone to draft(write) it for me to post as my own.
Knowing my weaknesses (spel
Re: (Score:1)
It's not just your writing that sucks. Going by your response, your reading comprehension is not so hot either. I like your rant (which, ironically, you probably did get proofread) but it was about *THE WRONG THING*. I'd leave it at that. Maybe you should spend some time on reflection rather than defending the indefensible.
The SQL language is also an issue (Score:5, Insightful)
Scalability is one thing, but what we appreciate in SQL-free databases is also that they don't require SQL.
When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.
The Tokyo Cabinet API is absolutely excellent in this regard. And there's no need to learn yet another domain-specific language like SQL, just use the language you use for the rest of the app.
Now, SQL-zealots would troll "but how would you do with ?".
And yes, for complex requests as in data mining, SQL and XPath make sense. For people who aren't developers, SQL makes sense as well. For interoperability with 3rd-party apps, SQL is also useful, just as FAT is still useful today in order to share filesystems between operating systems.
But for the rest of us, SQL is cumbersome. Databases like MongoDB let you achieve similar results in a more natural way instead of forcing you to learn SQL and to rethink everything in a tabular way.
Re: (Score:1)
Is there something terribly dangerous about parameterized queries?
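For anyone who hasn't seen one, here is a minimal sketch of a parameterized query using Python's bundled sqlite3 module. The users table, its columns and the get_user helper are made up for illustration. The point is only that the value is bound separately from the SQL text, so a classic injection string is treated as an ordinary value that matches nothing.

```python
import sqlite3


def get_user(conn, user_id):
    """Fetch one row by id; the driver binds user_id as data, not as SQL text."""
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return cur.fetchone()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("42", "alice"))

found = get_user(conn, "42")

# A classic injection attempt is just a string that matches no id.
injected = get_user(conn, "42' OR '1'='1")
```

Most database drivers offer some equivalent placeholder syntax, though the exact marker (?, %s, :name) varies by driver.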
Re: (Score:1)
When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.
"Cheaper" in what sense? You can't mean cost, as I'm not aware of ORMs introducing any cost overhead. So presumably you mean the cost of marshaling the data between SQL and your object model.
So, how, pray tell, do these new magical DBs convert the stored data to/from objects that your software can consume? Do they not marshal at some level? And i
Re: (Score:2)
Marshaling is still required, but you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.
Look at MongoDB: http://www.mongodb.org/ [mongodb.org]
Re: (Score:2)
That's not a disadvantage in many cases, especially for the long term. There's a benefit of forcing people to use SQL to talk to the DB. It becomes a layer of abstraction, somewhat like a standard protocol or interface.
When you use SQL, your database can be used by 100 different people and programs, and when you add co
Re: (Score:2)
Marshaling is still required, but you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.
Well, that's all very interesting, but it doesn't sound any different than your average OODBMS, something which has been around for a *long* time (I worked with Versant nearly a decade ago doing exactly what you describe). Heck, the Smalltalk world has had intelligent object databases for
Re: (Score:2)
I would argue that all solutions that currently exist for databases are ideal for some specific set of problems AND some specific set of users for each problem within that initial set.
There is no "perfect" solution that will work for all types of data, be it a flatfile structure, a hierarchical structure, a relational structure, object-oriented or some combination of those. (The star-structure of OLAP databases is a hybrid, for example.)
What would be good is if there was a suitable metalanguage in which you
Re: (Score:1)
I would just like to state for the record, that IMHO SQL is a beautiful thing. Its ease of interoperability (both between languages and backends) has saved my butt on numerous occasions (not to mention the ease with which you can go from very simple to very complex depending on the need of your application) ...
...and you can get rid of it and replace it with OOP when you pry it from my cold dead hands.
Re: (Score:2)
Not exactly since Excel macros can perform loops at least.
Re: (Score:2, Insightful)
Yes, I'm an SQL troll, but ... if using SQL to get a row by a unique ID is too hard for you and too insecure, there is no amount of code which is going to fix your problem, which is that you are a shitty developer who is far too lazy to make a function or macro to wrap around the simple sql request.
There are PLENTY of reasons to not like SQL, but you
Re: (Score:1)
What's so hard about building a wrapper function for the times you want simple:
And you'd still have SQL for the times you want fancier stuff.
Re: (Score:2)
"id" is MP3 data. How do I find the title of the song? Will your 3 liners do the trick?
In a document-oriented database id wouldn't be an issue. No need for any wrapper.
Re: (Score:1)
My example is based on the original example given. But this illustrates part of my point: when you need features you didn't originally anticipate (or ask for), you have the power of SQL to help out.
Re:The SQL language is also an issue (Score:5, Insightful)
"a record, calling get(id)"
So you're relating "id" to "a record." I assume that the record in question is a blob of potentially binary data that your program parses however it wants. So you want to relate unique identifiers to blobs. You can do that quite easily with SQL. Looking up a given unique identifier quickly is something your average relational database is very good at. And writing the wrapper function to implement your hypothetical get() function is trivial in most languages. I'm completely at a loss for what your SQL-free database is offering me in this case. It's saving you from the horror of writing 10 lines of code, once, to implement get(id)? 60 minutes with a good SQL tutorial will teach you everything you need to know. Sure, there is a lot more you can learn, but for the simple case you're describing you can understand SQL at only the most simple level.
Or are you handwaving that "a record" is actually automatically squeezed into one or more variables or objects in your code? You say get("ChaosDiscord") and out pops the UserObject populated with the relevant information. Of course, at this point you need to start teaching your database, or at least your database wrapper, how your objects are structured, and how to serialize them. This is admittedly a bit of a nuisance, but an SQL-free database doesn't magically make the problem go away. Sure, an SQL-free database can provide a layer to simplify or automate it, but so can a layer on an SQL database (Ruby on Rails is perhaps the best known). Sure, you'll need to tell it that username is a string, userid is an integer, and so on, but you only have to say it once in SQL instead of in your program. The total work hasn't gone up.
Ultimately, you appear to be complaining that SQL is too powerful (and thus complex) for your needs. But you can easily learn and use a subset of SQL that corresponds to what you claim you're looking for in an SQL-free database! You might as well complain that Java is too powerful because it has thousands of classes you don't need. The time to learn the relatively minor amount of SQL you need is insignificant compared to the time to develop any non-trivial application. If even that hour is too much, you can outsource the work to a geeky college student for some pizza and soda.
There are some compelling reasons to look at SQL-free databases, but "SQL is too powerful" isn't one.
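To make the "10 lines of code, once" claim concrete, here is a rough sketch of a get()/put() wrapper over a single SQL table, using Python's bundled sqlite3 and pickle modules. The BlobStore class, the blobs table and the sample record are all invented for illustration. This is one way such a wrapper could look, not a claim about how any particular product does it.

```python
import pickle
import sqlite3


class BlobStore:
    """Hypothetical key/blob store layered on one SQL table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS blobs (id TEXT PRIMARY KEY, data BLOB)"
        )

    def put(self, key, obj):
        # pickle is used only for brevity; never unpickle untrusted data.
        self.conn.execute(
            "INSERT OR REPLACE INTO blobs (id, data) VALUES (?, ?)",
            (key, pickle.dumps(obj)),
        )
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT data FROM blobs WHERE id = ?", (key,)
        ).fetchone()
        return default if row is None else pickle.loads(row[0])


store = BlobStore()
store.put("ChaosDiscord", {"userid": 7, "username": "ChaosDiscord"})
user = store.get("ChaosDiscord")
```

The caller never sees SQL, and the table is still there for the day someone needs a query that get() can't express.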
Re: (Score:2)
Some people are able to grasp new concepts and others can't???
I'll get off your lawn now.
Re: (Score:1)
Some people are able to grasp new concepts and others can't???
I'll get off your lawn now.
And the latter types are the ones that write shitty code on a regular basis? /sarcasm-gone-amok :-/
Re: (Score:2)
Didn't realize Joe Celko was a Slashdotter.
Re: (Score:2)
Judging from your response to ahabswhale, it seems that you've pretty much made up your mind that everyone else is wrong and you're right, but I'll take a stab anyway at why I think you're missing the point:
Re: (Score:2)
The grandparent ultimately asserted that "for the rest of us, SQL is cumbersome," calling out that an "SQL-free" database is "easier", "more secure," and "cheaper."
If we're talking about essentially key-value stores, SQL can do it well.
It's harder, but we're talking about an hour or so of work, an hour's worth of value you can reuse for future projects. For all of the complaints about needing to worry about tables with lots of fields and serialization, it's moot if you just want a key store. All that
Re: (Score:3, Insightful)
I understand putting another API above SQL to make it simpler to use, but avoiding using SQL because it's powerful makes no sense.
Re: (Score:2)
Not to be a troll, but this sure sounds a lot like IMS. Write a program to analyze the data.
Some mainframers would be laughing their asses off.
Re: (Score:2)
And there's no need to learn yet another domain-specific language like SQL,
SQL, "domain specific"? Wow. I am taken aback. Over 30 years of coding, I think SQL is singlehandedly the most productive addition to the development environment I can think of since the compiler. There are a lot of reasons that using a SQL database might not make sense (small platform, single user, low cost, small required footprint, etc) but domain specificity isn't on my list. I can't think of a less domain specific development
Re: (Score:2)
> WTF! I think that ranks as one of the stupidest statements I have ever read on slashdot!
Tons of people aren't exactly writing PHP websites, but are still able to install vBulletin, phpBB, PHP-Nuke, Joomla, or WordPress on shared hosting. And then they fire up phpMyAdmin in order to remove bogus users, count the number of posts or visits, etc. SQL makes perfect sense for this.
Re: (Score:1)
I can not agree more. I started working with RDBMS on dBaseII in the early 80's. ...
Hate to nitpick, but if you are talking of the IBM product, I think you better say "DB2".
dBaseII was an Ashton-Tate product, originally conceived for CP/M. The dBase line introduced RELATIONAL functionality only with dBase IV (and it allegedly never really worked).
Re: (Score:2)
I talked about "records" and "id" because people familiar with SQL might not be familiar with other kinds of databases, but you took it the wrong way.
Now, ask Google. How many critical vulnerabilities were due to SQL injections? How many similar vulnerabilities were found in SQL-free databases?
I agree with you that workarounds do exist, and that developers are to blame instead of SQL, but in the real world, SQL is how a lot of services are compromised by kiddies.
Why do we need to invent wrappers?
Why do tools
Re: (Score:2)
And there's nothing wrong with SQL. There's a lot wrong with people who think SQL will solve every single problem under the sun. Unfortunately, those people seem to be employed writing 3rd-party abstraction layers and ORMs.
Re: (Score:2)
It's a bit like saying "Lets blame the internet for Cross-site scripting vulnerabilities!".
No, it's about having fewer issues by using modern tools, rather than trying to find who's to blame.
If HTML/JS/CSS/HTTP could be redesigned today, do you think that the way a browser manages cookies, XHR requests and sandboxing in general would be the same as it is today? Do you think that the SMTP protocol that was good enough 30 years ago is not a big pile of crap nowadays, even, just like ORMs, their content is now
Re: (Score:2)
It's a bit like saying "Lets blame the internet for Cross-site scripting vulnerabilities!".
No, it's about having fewer issues by using modern tools, rather than trying to find who's to blame.
If HTML/JS/CSS/HTTP could be redesigned today, do you think that the way a browser manages cookies, XHR requests and sandboxing in general would be the same as it is today? Do you think that the SMTP protocol that was good enough 30 years ago is not a big pile of crap nowadays, even, just like ORMs, their content is now shown in webmails? SQL is just like SMTP. Or the FAT filesystem. An old thing. There are worthy proposals and even working products that could supersede them, but because of legacy applications and people who want to stick with the same technologies till the end of the universe, these old things remain. They just get bloated with new extensions instead, in order to keep up with mandatory requirements.
Of course, if you were designing something today to do the job of 'relational database', you'd probably get something different from SQL. That doesn't change the fact that today, the SQL / RDBMS combo is the best tool for solving a lot of problems. That doesn't mean that people won't try and use it improperly, but those people are idiots. People don't stick with SQL because it's old, they stick with SQL because it gets the job done, and in the hands of someone who knows what they're doing, it gets the job d
Re: (Score:2, Insightful)
The answers to those questions will say a whole lot about why PHP sucks, but very little about SQL.
in particular:
Why does a stock PHP have 5 different APIs just to issue basic MySQL queries?
Because the PHP developers have re-invented the wheel five times and still haven't figured out it's not supposed to have sharp corners. Nothing to do with SQL. Perl's DBI is a good example of a database abstraction layer done right.
Unobligatory Monty Python Reference (Score:2)
But for the rest of us....
Sorry, but I could not help thinking of this line from "Life of Brian":
But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system, a
Re: (Score:2)
Actually it's quite the opposite. For any complex task, writing a script for MongoDB, CouchDB or TC/TT is way easier and faster than an unbearable 100-line SQL statement that even you are unable to understand the day after. Plus it's able to get things that just can't be written as an SQL query.
And your "we'll have to run it over the weekend because it'll kill the server" is also why when you need to extract stuff out of a large dataset, you write a script to process data in chunks, not a single SQL state
Re: (Score:1)
Actually it's quite the opposite. For any complex task, writing a script for MongoDB, CouchDB or TC/TT is way easier and faster than an unbearable 100-line SQL statement that even you are unable to understand the day after. Plus it's able to get things that just can't be written as an SQL query.
Can you provide concrete proof of this? "For any complex task" pretty much uses universal quantification over all problems dealing with data representation. Considering that some (not all) data (with its associated meaning and function) is best represented using relations, while others are better represented using network models (think cyclical graphs), and others are best represented as simple key-value mappings, I would find it hard to believe that truly and verily any complex data manipulation task is best represented writing a MongoDB/CouchDB/TC/TT script.
Now, if you decide to reply by saying "well, not all, but most", please provide proof, or at least some logical demonstration that this is indeed true. For, if it is, holy crap, you need to write a dissertation on this.
And your "we'll have to run it over the weekend because it'll kill the server" is also why when you need to extract stuff out of a large dataset, you write a script to process data in chunks, not a single SQL statement. If SQL is so wonderful and the answer to everything, why do stored procedures exist?
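For reference, processing in chunks does not require leaving SQL behind. Below is a minimal sketch using Python's bundled sqlite3 module, where a single query is consumed a batch at a time with fetchmany(). The orders table, the amount column and the sum_in_chunks helper are hypothetical, and the batch size is arbitrary.

```python
import sqlite3


def sum_in_chunks(conn, chunk_size=1000):
    """Total the amount column while holding at most chunk_size rows in memory."""
    cur = conn.execute("SELECT amount FROM orders ORDER BY id")
    total = 0
    chunks = 0
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        total += sum(amount for (amount,) in rows)
        chunks += 1
    return total, chunks


# Hypothetical table with 2,500 rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)", [(n,) for n in range(1, 2501)]
)

total, chunks = sum_in_chunks(conn, chunk_size=1000)
```

Whether the per-chunk work lives in a script like this or in a stored procedure is a separate question; the sketch only shows that the two approaches can be combined.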
Re: (Score:1)
Looks like you never dealt with denormalized and sharded data.
Denormalized data (except in certain cases) is usually a sign of bad design, not an intrinsic RDBMS attribute.
As for sharded data (assuming that it's properly normalized, otherwise see previous paragraph), and assuming that it's properly sharded among functionally-sound partitions, what's the trouble in implementing the hypothetical request? Badly partitioned data is just like denormalized data: a sign of someone who didn't know what he/she was doing.
One could also imagine a hypothetical scenario where dat
MySQL? (Score:4, Funny)
Re:MySQL? (Score:4, Informative)
MySQL has its fan boys from circa 1994-2001. During this period, the MySQL license was much more permissive, and gained a certain momentum from PHP that carries it through to this day. At the same time, PostgreSQL was still using Cygwin on Windows, the INSTALL had a table of contents, and was lacking performance enhancements (particularly on Windows). Eventually Cygwin was dropped and the threading was happy on Windows, and the performance enhancements were good. Along with this came a much shorter INSTALL file and all reason to use MySQL had disappeared. But once you know something, people like to keep on using it. Then MySQL got things like triggers, foreign key constraints and full ACID compliance. So in the end it ended up being a wash. However, and not to start a flame war, it seems that PostgreSQL, having been feature-complete (ACID, foreign keys, etc) maintained a performance edge. But also to this day MySQL has a very fast table implementation, provided you don't need things like ACID compliance. For a variety of applications, this is "good enough" and the trade-offs of feature completeness vs performance are worth it. Disclaimer: I have used both extensively in the past. I prefer PostgreSQL, but now use neither. Now I only do SQLite (embedded tables) or Oracle (for hot replication).
Bits of software are tools.. (Score:2)
So why is it that people feel the need to rally around or defend them? After all, only the developers who have done the work are capable of understanding the snips and criticism leveled against them, and these are the
Re: (Score:2)
I've used both pretty extensively in a wide variety of environments, and I don't take such a balanced view at all. IMHO, the best answer to most database-related problems is to use PostgreSQL or SQLite. MySQL sits somewhere between them in terms of reliability, scalability, ACIDity, etc, and kinda fails at being good at anything in particular. For that matter, even if you *like* where MySQL lies on those tradeoffs, compared to either of the other two mentioned products (especially Pg), the quality of the
Re: (Score:2)
Not sure what you were talking about, but Hadoop and Postgres are open source. Unless they're stupid, they wouldn't make the resulting product closed source.
I'm not going to make the whole free software pitch here, but let's just say I believe in the superiority of the development process and the end product through my experiences developing and using software.
I have no confidence in Intersystems Cache's long term survival.
Yaaaaay (Score:2)
(In my best Special Ed impersonation)
Yaaaaaay, now we can scale out Hadoop! Yaaaaay! Yaaaay Hadoop! Yaaaaay!
Other approaches to scalable SQL (Score:2)
There are also two Hadoop subprojects that either support SQL or will shortly. They both translate SQL queries into map/reduce programs. They are:
http://hadoop.apache.org/pig/ [apache.org]
http://hadoop.apache.org/hive/ [apache.org]
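As a rough illustration of what "translating a query into a map/reduce program" means, here is a toy, single-process sketch in plain Python of how something like SELECT dept, COUNT(*) ... GROUP BY dept breaks down into map, shuffle and reduce steps. The function names and the sample rows are invented, and this is a conceptual sketch only, not the plan either project actually generates.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (group key, 1) pair per input row."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Shuffle: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input rows standing in for a table.
rows = [{"dept": "eng"}, {"dept": "ops"}, {"dept": "eng"}]
counts = reduce_phase(shuffle(map_phase(rows)))
```

On a real cluster the map and reduce steps run on many machines and the shuffle moves data between them, which is where most of the cost and most of the scalability come from.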
Don't you read Information Week? (Score:1)