Researchers Create Database-Hadoop Hybrid

ericatcw writes "'NoSQL' alternatives such as Hadoop and MapReduce may be uber-cheap and scalable, but they remain slower and clumsier to use than relational databases, say some. Now, researchers at Yale University have created a database-Hadoop hybrid that they say offers the best of both worlds: fast performance and the ability to scale out near-indefinitely. HadoopDB was built using PostGreSQL, though MySQL has also successfully been swapped in, according to Yale computer science professor Daniel Abadi, whose students built this prototype."
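The summary does not say how the hybrid works internally, so the following is only a toy sketch of the general idea of pairing many independent SQL databases with a map/reduce-style coordinator. It assumes in-memory SQLite databases as stand-ins for the per-node PostgreSQL instances; the `visits` table and the helper names are hypothetical, and none of this is HadoopDB's actual code.

```python
import sqlite3
import zlib
from collections import Counter


def make_node():
    """Create one 'worker node': an independent in-memory SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (url TEXT, hits INTEGER)")
    return conn


def load(nodes, rows):
    """Hash-partition rows across the nodes by URL."""
    for url, hits in rows:
        node = nodes[zlib.crc32(url.encode("utf-8")) % len(nodes)]
        node.execute("INSERT INTO visits VALUES (?, ?)", (url, hits))


def total_hits(nodes):
    """Run the same SQL on every node, then merge the partial sums."""
    totals = Counter()
    for node in nodes:
        query = "SELECT url, SUM(hits) FROM visits GROUP BY url"
        for url, hits in node.execute(query):
            totals[url] += hits
    return dict(totals)


nodes = [make_node() for _ in range(3)]
load(nodes, [("/a", 1), ("/b", 2), ("/a", 3), ("/c", 4), ("/b", 5)])
result = total_hits(nodes)
print(result)
```

Each node does the heavy lifting in SQL on its own slice of the data, and the coordinator only merges small partial results, which is the division of labor the summary hints at.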
This discussion has been archived. No new comments can be posted.

  • Please stop (Score:3, Interesting)

    by Anonymous Coward on Tuesday July 21, 2009 @03:06PM (#28773339)

    Uber-cheap is not a word, and it doesn't even make sense because you're saying it's "above cheap". Stop making up stupid shit.

    • German prepositions do not have direct English equivalents. I suppose being an "Ubermensch" would be talking about the HATS that people wear, since that's what's Over the Mensch (person). Stop getting your panties in a twist over things you're wrong about.
    • by spatley ( 191233 )
      how about Ultra-Cheap
      hmmmm...
    • Re: (Score:1, Insightful)

      by Anonymous Coward

      Uber-cheap is not a word, and it doesn't even make sense because you're saying it's "above cheap".

      You remind me of my English teachers. Every year they kept saying that "ain't" isn't a word because it's not in the dictionary. Then one day I looked in the dictionary and it was there. The lesson I learned was that humans create words and "rules" of language aren't really rules at all. They are merely traditions. I suppose you think the French are just speaking bad Latin? No, languages change. From Old English to Middle English to Modern English it changed. I bet all along the way there was some know-it-al

    • by jd ( 1658 )

      Since a high price is above a low price, "above cheap" means "expensive".

      • Uber doesn't mean 'above' in popular parlance, it's an absolute measure of greatness. Therefore 'uber' cheap would refer to more cheapness.

  • PostGreSQL (Score:5, Informative)

    by tcopeland ( 32225 ) <tom AT thomasleecopeland DOT com> on Tuesday July 21, 2009 @03:08PM (#28773349) Homepage

    It's PostgreSQL [postgresql.org]... but I sympathize with the mixed case confusion and refer you to this Postgres vs PostgreSQL permathread [postgresql.org].

    • And... a lot of times in written communication, I simply say "pgsql".

      I do try to capitalize PostgreSQL in more formal communication / documentation, however.
  • If both the performance and scalability are as good as described I can safely say that this is the most important thing of the decade and not only for DBMS.
    Handling large portions of data would get cheaper by an order of magnitude at least and scaling out would be way cheaper than now as well. I do hope it's true.

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      It won't deliver. In the meantime, for those of us living and working in the real world, hard drives will be bigger and faster, file systems will get better, and SSDs will start to shit all over spinning platters.

      • If it delivers, it will change a lot. Not for your average blogger with a $10 hosting plan, WordPress and all his 100 readers, but for all the folks that have sites successful enough to go beyond what a single DB server can deliver. Today you have to work really, really hard to make it all work with replication, as pretty much no free CMS offers data sharding. With this you won't have to. Just get a DB cluster (as a service) that works out of the box with no or very little modification to the software you are using. The

      • by gandhi_2 ( 1108023 ) on Tuesday July 21, 2009 @03:47PM (#28773795) Homepage
        I can't say I'm looking forward to bigger, faster, shit-covered platters...but hey. Who am I to stand in the way of progress?
  • I thought Essbase was supposed to be one of the best databases for managing too much information. Is this supposed to be an alternative, or act as something in-between using Essbase and a MySQL server?

    • by hemp ( 36945 )

      Transaction speed has never been a high point of Essbase, nor has storing anything but numerical data.

      Changes to the data are not reflected immediately except in the lowest level members until it is re-calculated. It is not unusual to find calc scripts that run for 8+ hours.

  • The grad students do all the work, and the professor takes all the credit. Anyone can come up with ideas, the real work is in actually getting things done. This is the reason I stopped grad school with my MS even though I LOVE computer science, more than anyone I've ever met.
    • I hate to disagree with you ... but ....

      Anyone can come up with ideas is true, HOWEVER not all ideas are GOOD ones. The problem with coming up with GOOD ideas is often people don't have a basic understanding of the problem or the implications of various ways of implementing an idea.

      Getting people to do the work is often not quite as easy as it seems. First you have to have qualified people. They have to be motivated to actually complete the work given.

      As for degree programs at schools and such, a MS is noth

      • by grepya ( 67436 )

        I take a look at what I know, verses what I knew graduating college, and I know substantially more, and more practical knowledge, things that no MS piece of paper can show.

        Does your extensive post-collegiage learning include constructing a multi-clause sentence in the English language (..and I wouldn't even mention the spelling error) ? Ordinarily I wouldn't be an asshole about this except you screwed up the exact sentence where you're bragging about your amazing skills, acquired over a long professional career. And whatever that career might be, writing a readable sentence in your main language is a basic skill (and I know you're an American from your other posts).

        • Can't spell worth shit, doesn't negate my intelligence. I know idiots who spell perfectly. I know very intelligent people who spell well, and idiots who can't spell like me. Spelling is NOT a sign of intelligence nor education level.

          And I didn't know I was going to be "graded" on a spur of the moment post to a web log. Had I known you were a lurking grammar nazi, I would have proofread my post more carefully. Perhaps even hiring someone to draft(write) it for me to post as my own.

          Knowing my weaknesses (spel

          • by grepya ( 67436 )

            It's not just your writing that sucks. Going by your response, your reading comprehension is not so hot either. I like your rant (which, your probably did get proofread ironically) but it was about *THE WRONG THING*. I'd leave it at that. Maybe you should spend some time on reflection rather than defending the indefensible.

  • by chrysalis ( 50680 ) on Tuesday July 21, 2009 @03:19PM (#28773465) Homepage

    Scalability is one thing, but what we appreciate in SQL-free databases is also that they don't require SQL.

    When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.

    The Tokyo Cabinet API is absolutely excellent in this regard. And there's no need to learn yet another domain-specific language like SQL, just use the language you use for the rest of the app.

    Now, SQL-zealots would troll "but how would you do with ?".
    And yes, for complex requests as in data mining, SQL and XPath make sense. For people who aren't developers, SQL makes sense as well. For interoperability with 3rd-party apps, SQL is also useful, just as FAT is still useful today in order to share filesystems between operating systems.

    But for the rest of us, SQL is cumbersome. Databases like MongoDB let you achieve similar results in a more natural way instead of forcing you to learn SQL and to rethink everything in a tabular way.

    • by maxume ( 22995 )

      Is there something terribly dangerous about parameterized queries?

    • When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.

      "Cheaper" in what sense? You can't mean cost, as I'm not aware of ORMs introducing any cost overhead. So presumably you mean the cost of marshaling the data between SQL and your object model.

      So, how, pray tell, do these new magical DBs convert the stored data to/from objects that your software can consume? Do they not marshal at some level? And i

      • Marshaling is still required, but you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.
        Look at MongoDB: http://www.mongodb.org/ [mongodb.org]
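As a rough illustration of "store your objects as they are in memory", here is a minimal sketch of a schema-less document store. It is an assumption-laden toy built on Python's `json` module and a plain dict; the `DocumentStore` class and its `put`/`get` methods are hypothetical names and are not MongoDB's API.

```python
import json


class DocumentStore:
    """A toy in-memory document store: one JSON document per key."""

    def __init__(self):
        self._docs = {}

    def put(self, key, obj):
        # Marshaling still happens here: the object is serialized to JSON,
        # but no table, column list, or type declaration is needed first.
        self._docs[key] = json.dumps(obj)

    def get(self, key):
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)


store = DocumentStore()
store.put("song:1", {"title": "Example Song", "tags": ["demo", "mp3"], "length_s": 215})
# A document with a different shape needs no schema change.
store.put("song:2", {"title": "Another Song", "artist": {"name": "Someone"}})
print(store.get("song:2")["artist"]["name"])
```

The trade-off raised elsewhere in the thread still applies: nothing here stops two documents from disagreeing about what a "song" looks like.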

        • by TheLink ( 130905 )
          > you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.

          That's not a disadvantage in many cases, especially for the long term. There's a benefit of forcing people to use SQL to talk to the DB. It becomes a layer of abstraction, somewhat like a standard protocol or interface.

          When you use SQL, your database can be used by 100 different people and programs, and when you add co
          • > That's not a disadvantage in many cases, especially for the long term.

            You Aren't Going to Need It.
        • Marshaling is still required, but you don't need to think about being restricted to a schema, columns, types, to define identifiers for everything, to do explicit joins, etc. Just store your objects as they are in memory.

          Well, that's all very interesting, but it doesn't sound any different than your average OODBMS, something which has been around for a *long* time (I worked with Versant nearly a decade ago doing exactly what you describe). Heck, the Smalltalk world has had intelligent object databases for

    • by jd ( 1658 )

      I would argue that all solutions that currently exist for databases are ideal for some specific set of problems AND some specific set of users for each problem within that initial set.

      There is no "perfect" solution that will work for all types of data, be it a flatfile structure, a hierarchical structure, a relational structure, object-oriented or some combination of those. (The star-structure of OLAP databases is a hybrid, for example.)

      What would be good is if there was a suitable metalanguage in which you

    • I would just like to state for the record, that IMHO SQL is a beautiful thing. Its ease of interoperability (both between languages and backends) has saved my butt on numerous occasions (not to mention the ease with which you can go from very simple to very complex depending on the need of your application) ...
       
      ...and you can get rid of it and replace it with OOP when you pry it from my cold dead hands.

    • Re: (Score:2, Insightful)

      by BitZtream ( 692029 )

      When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement, and way cheaper than using an ORM.

      Yes, I'm an SQL troll, but ... if using SQL to get a row by a unique ID is too hard for you and too insecure, there is no amount of code which is going to fix your problem, which is that you are a shitty developer who is far too lazy to make a function or macro to wrap around the simple sql request.

      There are PLENTY of reasons to not like SQL, but you

    • by Tablizer ( 95088 )

      When what we want is just to retrieve a record, calling get(id) is way easier and more secure than building an SQL statement

      What's so hard about building a wrapper function for the times you want simple:

      // Pseudocode: query() stands in for a parameterized-query helper; % marks the bound parameter.
      function getRecord(int id) {
        return query("select * from foo where id=%", id);
      }

      And you'd still have SQL for the times you want fancier stuff.
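A runnable version of that wrapper might look like the sketch below. It assumes Python's standard `sqlite3` module and keeps the `foo` table from the pseudocode; the `name` column and the `get_record` helper are made up for the example.

```python
import sqlite3


def get_record(conn, record_id):
    """Return the row of table foo with the given id as a dict, or None."""
    # The ? placeholder binds record_id as data, never as SQL text.
    cur = conn.execute("SELECT * FROM foo WHERE id = ?", (record_id,))
    row = cur.fetchone()
    if row is None:
        return None
    columns = [col[0] for col in cur.description]
    return dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(1, "first"), (2, "second")])
print(get_record(conn, 2))
```

Callers only ever see `get_record(conn, id)`, while the full query language stays available on the same connection for the fancier cases.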

      • "id" is MP3 data. How do I find the title of the song? Will your 3 liners do the trick?
        In a document-oriented database id wouldn't be an issue. No need for any wrapper.

        • by Tablizer ( 95088 )

          My example is based on the original example given. But this illustrates part of my point: when you need features you didn't originally anticipate (or ask for), you have the power of SQL to help out.

    • by ChaosDiscord ( 4913 ) * on Tuesday July 21, 2009 @04:38PM (#28774475) Homepage Journal

      "a record, calling get(id)"

      So you're relating "id" to "a record." I assume that the record in question is a blob of potentially binary data that your program parses however it wants. So you want to relate unique identifiers to blobs. You can do that quite easily with SQL. Looking up a given unique identifier quickly is something your average relational database is very good at. And writing the wrapper function to implement your hypothetical get() function is trivial in most languages. I'm completely at a loss for what your SQL-free database is offering me in this case. It's saving you from the horror of writing 10 lines of code, once, to implement get(id)? 60 minutes with a good SQL tutorial will teach you everything you need to know. Sure, there is a lot more you can learn, but for the simple case you're describing you can understand SQL at only the most simple level.
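To make the "unique identifiers to blobs" case concrete, here is a minimal sketch of a key-value store layered on SQL. It assumes SQLite through Python's `sqlite3` module; the `kv` table and the `SqlKeyValueStore` class are hypothetical names chosen for the example.

```python
import sqlite3


class SqlKeyValueStore:
    """A key -> blob store backed by a single two-column SQL table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (id TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def put(self, key, blob):
        # 'with' commits on success and rolls back if the statement fails.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (id, data) VALUES (?, ?)", (key, blob)
            )

    def get(self, key):
        row = self._conn.execute(
            "SELECT data FROM kv WHERE id = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]


store = SqlKeyValueStore()
store.put("ChaosDiscord", b"\x00\x01binary payload")
print(store.get("ChaosDiscord"))
```

The primary key gives the indexed lookup the comment describes, and the application remains free to parse the blob however it wants.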

      Or are you handwaving that "a record" is actually automatically squeezed into one or more variables or objects in your code? You say get("ChaosDiscord") and out pops the UserObject populated with the relevant information. Of course, at this point you need to start teaching your database, or at least your database wrapper, how your objects are structured, and how to serialize them. This is admittedly a bit of a nuisance, but an SQL-free database doesn't magically make the problem go away. Sure, an SQL-free database can provide a layer to simplify or automate it, but so can a layer on an SQL database (Ruby on Rails is perhaps the best known). Sure, you'll need to tell it that username is a string, userid is an integer, and so on, but you only have to say it once in SQL instead of in your program. The total work hasn't gone up.
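The "out pops the UserObject" step can be sketched as a thin mapping layer over SQL. This is an illustrative assumption, not how Rails or any particular ORM does it: the `User` dataclass, the `users` table, and `fetch_user` are invented for the example, which uses Python's `sqlite3` and `dataclasses` modules.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class User:
    userid: int
    username: str


def fetch_user(conn, username):
    """Load one User by name, or return None if there is no such row."""
    # Column names come from the dataclass definition, not from user input,
    # so building the column list into the SQL text is safe here.
    cols = ", ".join(f.name for f in fields(User))
    row = conn.execute(
        f"SELECT {cols} FROM users WHERE username = ?", (username,)
    ).fetchone()
    return None if row is None else User(*row)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)"
)
conn.execute("INSERT INTO users VALUES (?, ?)", (4913, "ChaosDiscord"))
print(fetch_user(conn, "ChaosDiscord"))
```

The field types are declared twice here (once in the table, once in the class); a fuller layer would generate one from the other, which is the kind of automation the comment attributes to frameworks.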

      Ultimately, you appear to be complaining that SQL is too powerful (and thus complex) for your needs. But you can easily learn and use a subset of SQL that corresponds to what you claim you're looking for in an SQL-free database! You might as well complain that Java is too powerful because it has thousands of classes you don't need. The time to learn the relatively minor amount of SQL you need is insignificant compared to the time to develop any non-trivial application. If even that hour is too much, you can outsource the work to a geeky college student for some pizza and soda.

      There are some compelling reasons to look at SQL-free databases, but "SQL is too powerful" isn't one.

      • Some people are able to grasp new concepts and others can't???

        I'll get off your lawn now.

        • by elnyka ( 803306 )

          Some people are able to grasp new concepts and others cant???

          I'll get off your lawn now.

          And the latter types are the ones that write shitty code on a regular basis? /sarcasm-gone-amok :-/

      • Didn't realize Joe Celko was a Slashdotter.

      • It's because SQL isn't just SQL. It's all the cruft that goes with it. Accessing a DB from an OO language is simply a major fucking pain in the ass and much harder than it should be even when using the ORM du jour. A lot of this complexity comes from the fact that OO and RDBMS just don't play well together no matter how you slice it. Instead of focusing on the business domain you end up spending far too much time dicking with the data layer. A lot of this would go away by using an OO database but then y
        • I believe the great-grandparent poster was talking about simple key-value stores, similar to the Tokyo Cabinet system he mentioned. When people talk about Anti-SQL or SQL-Free, that seems to be what they're always talking about, although usually on the larger end with things like BigTable and HBase. My criticism was directed in that direction. Comparing a key-value store to a subset of SQL, or even a key-value store implemented in SQL, the complexity difference is negligible for any but the most simplisti
      • Judging from your response to ahabswhale, it seems that you've pretty much made up your mind that everyone else is wrong and you're right, but I'll take a stab anyway at why I think you're missing the point:

        Looking up a given unique identifier quickly is something your average relational database is very good at. And writing the wrapper function to implement your hypothetical get() function is trivial in most languages. I'm completely at a loss for what your SQL-free database is offering me in this case. It

        • The grandparent ultimately asserted that "for the rest of us, SQL is cumbersome," calling out that an "SQL-free" database is "easier", "more secure," and "cheaper."

          If we're talking about essentially key-value stores, SQL can do it well.

          It's harder, but we're talking about an hour or so of work, an hour's worth of value you can reuse for future projects. For all of the complaints about needing to worry about tables with lots of fields and serialization, it's moot if you just want a key store. All that

    • Re: (Score:3, Insightful)

      by MikeBabcock ( 65886 )

      I understand putting another API above SQL to make it simpler to use, but avoiding using SQL because it's powerful makes no sense.

      • by chez69 ( 135760 )

        Not to be a troll, but this sure sounds a lot like IMS. Write a program to analyze the data.

        some mainframers would be laughing their asses off.

    • And there's no need to learn yet another domain-specific language like SQL,

      SQL, "domain specific"? Wow. I am taken aback. Over 30 years of coding, I think SQL is singlehandedly the most productive addition to the development environment I can think of since the compiler. There are a lot of reasons that using a SQL database might not make sense (small platform, single user, low cost, small required footprint, etc) but domain specificity isn't on my list. I can't think of a less domain specific development

    • I talked about "records" and "id" because people familiar with SQL might not be familiar with other kinds of databases, but you took it the wrong way.

      Now, ask Google. How many critical vulnerabilities were due to SQL injections? How many similar vulnerabilities were found in SQL-free databases?

      I agree with you that workarounds do exist, and that developers are to blame instead of SQL, but in the real world, SQL is how a lot of services are compromised by kiddies.

      Why do we need to invent wrappers?
      Why do tools

      • by growse ( 928427 )
        SQL injection vulnerabilities don't exist because of the database, they exist because of the crappy programmer who doesn't know how to use the database being let loose writing production code. It's a bit like saying "Let's blame the internet for Cross-site scripting vulnerabilities!".

        And there's nothing wrong with SQL. There's a lot wrong with people who think SQL will solve every single problem under the sun. Unfortunately, those people seem to be employed writing 3rd-party abstraction layers and ORMs.
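The difference being argued over can be shown in a few lines. The sketch below assumes SQLite via Python's `sqlite3` module, with a made-up `users` table; it contrasts a query built by string concatenation with the same query using a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    # Vulnerable: the input is pasted into the SQL text and parsed as SQL.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def lookup_safe(name):
    # Safe: the input is passed as a bound value and is never parsed as SQL.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


attack = "nobody' OR '1'='1"
print(len(lookup_unsafe(attack)))  # the injected OR clause matches every row
print(lookup_safe(attack))         # no user has that literal name
```

Both sides of the thread are visible here: the database offers the safe form, and nothing forces a programmer to use it.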
        • It's a bit like saying "Let's blame the internet for Cross-site scripting vulnerabilities!".

          No, it's about having fewer issues by using modern tools, rather than trying to find who's to blame.

          If HTML/JS/CSS/HTTP could be redesigned today, do you think that the way a browser manages cookies, XHR requests and sandboxing in general would be the same as it is today? Do you think that the SMTP protocol that was good enough 30 years ago is not a big pile of crap nowadays, even, just like ORMs, their content is now

          • by growse ( 928427 )

            It's a bit like saying "Let's blame the internet for Cross-site scripting vulnerabilities!".

            No, it's about having fewer issues by using modern tools, rather than trying to find who's to blame.

            If HTML/JS/CSS/HTTP could be redesigned today, do you think that the way a browser manages cookies, XHR requests and sandboxing in general would be the same as it is today? Do you think that the SMTP protocol that was good enough 30 years ago is not a big pile of crap nowadays, even, just like ORMs, their content is now shown in webmails? SQL is just like SMTP. Or the FAT filesystem. An old thing. There are worthy proposals and even working products that could supersede them, but because of legacy applications and people who want to stick with the same technologies till the end of the universe, these old things remain. They just get bloated with new extensions instead, in order to keep up with mandatory requirements.

            Of course, if you were designing something today to do the job of 'relational database', you'd probably get something different from SQL. That doesn't change the fact that today, the SQL / RDBMS combo is the best tool for solving a lot of problems. That doesn't mean that people won't try and use it improperly, but those people are idiots. People don't stick with SQL because it's old, they stick with SQL because it gets the job done, and in the hands of someone who knows what they're doing, it gets the job d

      • Re: (Score:2, Insightful)

        by mabinogi ( 74033 )

        The answers to those questions will say a whole lot about why PHP sucks, but very little about SQL.

        in particular:

        Why does a stock PHP have 5 different APIs just to issue basic MySQL queries?

        Because the PHP developers have re-invented the wheel five times and still haven't figured out it's not supposed to have sharp corners. Nothing to do with SQL. Perl's DBI is a good example of a database abstraction layer done right.

    • And yes, for complex requests as in data mining, SQL and XPath make sense. For people who aren't developers, SQL makes sense as well. For interoperability with 3rd-party apps, SQL is also useful, just as FAT is still useful today in order to share filesystems between operating systems.

      But for the rest of us....

      Sorry, but I could not help but think of this line from "Life of Brian":

      But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system, a

    • PostgreSQL allows turning any programming language into a query language, AFAIK. BTW, this may be off-topic, but I'm pretty sure I'll get the most DBMS geek eyes on this article, so here it goes - would it be possible/feasible to integrate the compiler system and cache with the VCS within the database? My idea is about getting the flexibility of Portage/pkgsrc systems without the hassle of compiling the whole thing, start to finish. I'm pretty sure most compile time options can be recalculated quickly, and r
  • MySQL? (Score:4, Funny)

    by trisweb ( 690296 ) on Tuesday July 21, 2009 @03:38PM (#28773711) Journal
    No offense to the creators (well, maybe some offense) but why the heck would you want to put MySQL in where PostgreSQL already was? That's like taking out your star quarterback and putting in, well, me!
    • Re:MySQL? (Score:4, Informative)

      by scorp1us ( 235526 ) on Tuesday July 21, 2009 @04:31PM (#28774363) Journal

      MySQL has its fanboys from circa 1994-2001. During this period, the MySQL license was much more permissive, and it gained a certain momentum from PHP that carries it through to this day. At the same time, PostgreSQL was still using Cygwin on Windows, the INSTALL had a table of contents, and it was lacking performance enhancements (particularly on Windows). Eventually Cygwin was dropped, the threading was happy on Windows, and the performance enhancements were good. Along with this came a much shorter INSTALL file, and all reason to use MySQL had disappeared. But once people know something, they like to keep on using it. Then MySQL got things like triggers, foreign key constraints and full ACID compliance. So in the end it ended up being a wash. However, and not to start a flame war, it seems that PostgreSQL, having been feature-complete (ACID, foreign keys, etc), maintained a performance edge. But also to this day MySQL has a very fast table implementation, provided you don't need things like ACID compliance. For a variety of applications, this is "good enough" and the trade-offs of feature completeness vs performance are worth it. Disclaimer: I have used both extensively in the past. I prefer PostgreSQL, but now use neither. Now I only do SQLite (embedded tables) or Oracle (for hot replication).

    • We might create the software intending it to do and be used in one way, but how it will actually be used is determined by the users. Postgre and MySQL don't carry any intrinsic values, only the values which their users discover and, well, use. Without users they have no good or bad features.

      So why is it that people feel the need to rally around or defend them? After all, only the developers who have done the work are capable of understanding the snips and criticism leveled against them, and these are the
      • I've used both pretty extensively in a wide variety of environments, and I don't take such a balanced view at all. IMHO, the best answer to most database-related problems is to use PostgreSQL or SQLite. MySQL sits somewhere between them in terms of reliability, scalability, ACIDity, etc, and kinda fails at being good at anything in particular. For that matter, even if you *like* where MySQL lies on those tradeoffs, compared to either of the other two mentioned products (especially Pg), the quality of the

  • (In my best Special Ed impersonation)

    Yaaaaaay, now we can scale out Hadoop! Yaaaaay! Yaaaay Hadoop! Yaaaaay!

  • There are also two Hadoop subprojects that either support SQL or will shortly. They both translate SQL queries into map/reduce programs. They are:

    http://hadoop.apache.org/pig/ [apache.org]
    http://hadoop.apache.org/hive/ [apache.org]
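To give a feel for what "translating a query into map/reduce" means, here is a hand-written sketch of how a query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` decomposes into a map step, a shuffle, and a reduce step. It is plain single-process Python with invented function names and a made-up `logs` list; it is not the code that Pig or Hive generate.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (page, 1) pair per log record, like the GROUP BY key plus a count of 1."""
    for record in records:
        yield record["page"], 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which is the COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


logs = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts)
```

In a real cluster the map and reduce functions run on many machines and the shuffle moves data between them; the decomposition of the query is the part this sketch is meant to show.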
