Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed
ericatcw writes "The inaugural NoSQL meet-up in San Francisco during last month's Yahoo! Apache Hadoop Summit had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party.
Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of burdensome, expensive relational databases in favor of more efficient and cheaper ways of managing data, reports Computerworld."
A time and place for everything (Score:4, Insightful)
SQL is great for financial data. SQL is terrible for genetic data.
Tilting at windmills (Score:5, Insightful)
Seems to be a silly thing to be against. Relational databases and the structured query language may not be perfect, but I bet these people could die in their 90s and people will still be using relational dbs and SQL.
If you want to tout open or cheap dbs and more lightweight types of storage/db servers, then they might have some points, but being against sql is just plain dumb.
Re:Quit Whining (Score:4, Insightful)
The horrible lag I get when using address completion in Firefox 3 makes me wish more people thought that way!
Flat Earth (Score:3, Insightful)
I've seen strong reactions from various camps with regard to concern over saying no to SQL. I'm not sure why people freak out over it. First, you have to strike out toward new things if you want to progress the world. Second, SQL hasn't caused people to stop using spreadsheets or Access databases. Third, there are groups that get together to dispute that the earth is round; insisting that it is flat. Or that gray aliens are visiting earth regularly and probing our anuses.
Bring on the next fascinating data technology. SQL will continue to have a major place for many years to come, no matter what happens.
Not mutually exclusive (Score:3, Insightful)
Just because a crowbar can pull out a stubborn nail better doesn't mean they should replace all the hammers. Then what would we put nails in with? Different tools for different jobs.
Re:A time and place for everything (Score:4, Insightful)
It would be interesting to hear why this is.
Yeah, so why are they better? (Score:5, Insightful)
If I was to read the article, I bet somewhere someone would be wittering on about Key Value Datastores.
The brainchild of a generation brought up on high level collections, they learn one (in this case Map) and apply it to everything.
Sadly SQL, and RDBMS, works for most people. It maps object data well (oh whaaaa, i have to do foreign keys - GROW SOME FUCKING BALLS YOU LAZY GRADUATE!) and it is well understood. And with abstractions like LINQ to query them, even the lazy dumb Windows .NET programmer doesn't have to strain their brain to learn SQL.
And when you have terabytes of specific unique data, you clearly should go away to work out how best to store it. Even a RDBMS/SQL solution is too generic for all problems.
Re:Flat Earth (Score:5, Insightful)
And yet where are the other corporations: the oil companies, the banks, the large merchant conglomerates? In IT we seem to have this sort of myopic view that if it isn't an IT company of some kind, it doesn't exist. Google, as compared to the huge companies that use tools like Oracle, is a bit player. I know that's hard to deal with for all of us who have sucked at the teat of Silicon Valley for so long, but a significant amount of data handling that has nothing to do with social networking and finding pr0n goes on and does use tools like SQL.
What's the benefit exactly? (Score:4, Insightful)
I'm not seeing anything that offers a real advantage over using advanced features like one finds in postgres combined with memcached. Some of my program likes to think of its data as a structured object while other parts like seeing that data as rows in a table (they even link up to other tables through foreign keys!).
Re:Tilting at windmills (Score:5, Insightful)
SQL isn't the only way possible to query relational databases. It's nice and does a really good job for even mildly complex queries and I would not want to ditch it just yet, but seriously... who hasn't had a business need for multiple levels of aggregates (e.g. averages of sums across multiple groupings, say "average across all customers' total balances")? As it is, you end up splitting the logic between the database and the application, or creating a view of the first level of aggregation, then querying against that and hoping that the performance doesn't suck total ass.
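For illustration, here is a minimal sketch of the "average across all customers' total balances" case, written as a derived table (a subquery in FROM) so both levels of aggregation stay in the database. The accounts table and its data are made up, and SQLite via Python's sqlite3 is used only so the example is self-contained. It says nothing about how a given engine will perform on real data, which is the poster's actual worry.

```python
import sqlite3

# Hypothetical schema: one row per account, possibly several accounts per customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (customer TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", 100.0), ("alice", 50.0), ("bob", 30.0), ("carol", 20.0)],
)

# Inner query: first level of aggregation (total balance per customer).
# Outer query: second level (average of those per-customer totals).
avg_total = conn.execute(
    """
    SELECT AVG(total)
    FROM (SELECT SUM(balance) AS total
          FROM accounts
          GROUP BY customer)
    """
).fetchone()[0]

print(avg_total)  # average of the totals 150, 30 and 20
```

The derived table is essentially the "view of the first level of aggregation" written inline, so the same performance caveat applies.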
Re:A time and place for everything (Score:4, Insightful)
Design an efficient table relating a tree structure. Then design queries to answer questions such as:
* Find the nodes in the subtree under B.
* Find all ancestors of G
* Find the nearest common ancestor of D and H
Trees are a well-known problem for SQL, but the fact is that SQL can't handle most datastructures and complex relations, only very simple one dimensional ones.
Re:A time and place for everything (Score:3, Insightful)
It would be interesting to hear why this is.
My guess would be that because SQL is a Structured Query Language it is best used for handling structured data. If you have serial, unstructured data you have to invent your own format for it to use inside the database, and then the query language isn't helping you.
How about saying yes to the alternative (Score:5, Insightful)
Saying no to SQL and relational databases is just fine if you've got something better to replace it with. However I know of no such thing. The reason they're popular is that they are so powerful for data storage. If something better came along you wouldn't even need to say no to SQL. You'd just say yes to the newer better rival.
Re:Tilting at windmills (Score:3, Insightful)
I agree, there are problems SQL doesn't solve well. But I think it's unlikely that other, better solutions to those problems will also be superior to SQL where it *does* perform well. As such, "no SQL" is probably not the right plan any more than "SQL only".
SQL is not a database (Score:5, Insightful)
SQL is not a database, it is a standard interface to a feature set commonly associated with relational models. Before everyone standardized on SQL, there were other relational query languages. The "No" part of "NoSQL" refers to the fact that some basic elements of relational implementations cannot be usefully expressed using a much simpler distributed hash table model.
All "NoSQL" does is eliminate the parts of traditional relational databases that do not scale -- discarding the bottleneck rather than fixing it. These are things like joins and external indexing. Unfortunately, discarding those things means you discard a lot of very important functionality as a practical matter, notably the ability to do fast, complex analytics. Adopting the NoSQL architecture runs contrary to the trend toward more real-time, contextual analytical processing. There are a great many analytical applications that are not amenable to batch-mode pattern-matching, and the NoSQL model is a lot less applicable than I think some people want to acknowledge. In its domain, it is a great tool but it has many, many prohibitive limits. We are essentially trading power for scale.
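To make the trade-off concrete, here is a rough sketch of what losing joins and secondary indexes can look like from the application side. Plain Python dicts stand in for a distributed hash table; the users and orders stores, the get helper, and total_by_user_name are all hypothetical names invented for this example, not any real product's API.

```python
# Hypothetical key-value "tables": plain dicts standing in for a distributed hash table.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
orders = {
    "o1": {"user": "u1", "amount": 40},
    "o2": {"user": "u1", "amount": 10},
    "o3": {"user": "u2", "amount": 25},
}


def get(store, key):
    """The only fast operation the store offers: lookup by primary key."""
    return store[key]


def total_by_user_name(orders, users):
    """Application-side join plus aggregate: scan every order, look up its user."""
    totals = {}
    for order in orders.values():  # full scan, since there is no index on "user"
        name = get(users, order["user"])["name"]
        totals[name] = totals.get(name, 0) + order["amount"]
    return totals


print(total_by_user_name(orders, users))  # {'Ann': 50, 'Bob': 25}
```

Key lookups stay cheap and easy to partition, but anything analytical becomes a hand-written scan, which is the "trading power for scale" point.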
That said, do not take this as an endorsement of traditional SQL relational databases either, as they have a number of serious limitations themselves. As just mentioned, a number of the core analytical operations those models support are based on algorithms that scale poorly. The SQL language itself has mediocre support for many abstract data types (e.g. spatial) and data models (e.g. graph), which in part reflects the inadequacies of the assumed underlying database algorithms (e.g. B-trees) that are implicit in SQL. The inability to efficiently do event-driven/real-time applications is also more a reflection of the access methods used in databases than any intrinsic weakness in SQL; SQL may be clunky for that purpose, but that is not the real limiter.
A truly revolutionary deviation from SQL would usefully implement a superset of the features SQL supports, not take them away. Of course, we would need access methods more capable than hash tables and B-trees to usefully implement those features, which is a lot more work than discarding features that scale poorly. NoSQL is a stopgap technical measure for that small subset of applications where the serious tradeoffs are acceptable.
Re:Quit Whining (Score:3, Insightful)
Data out-lives applications (Score:5, Insightful)
First: my mantra: Data belongs to the organization, not the application... if the app fails and data is accessible then we all go on - if the data fails or is locked away - what was the point of the app again?
In a SQL database then data is understood by the organisation, DBAs and data architects. If left to app developers taking an app-centric approach to data... I get nervous quickly.
So long as the data is just as definable and accessible as current SQL databases then all good - give me an app with some odd-ball storage then it is bye-bye.
Re:Yeah, so why are they better? (Score:4, Insightful)
Saying RDBMSs map object data well is a bit of a stretch; they map relational data well and that's it.
http://www.codinghorror.com/blog/archives/000621.html [codinghorror.com] for some good background on the problems.
For me, using an RDBMS as the persistence layer for an object-oriented application has ALWAYS felt like a bit of a kludge. Like we're using it just because it's what we have, rather than the best tool for the job.
Re:A time and place for everything (Score:4, Insightful)
Cartesian Product? (Score:2, Insightful)
Re:The problem is performance not SQL (Score:5, Insightful)
It's just that now that we can assume local clusters and WANs worth of co-operating data stores, there are probably better, more performant ways of implementing persistence, replication, distribution of data than traditional RDBMS implementations.
You can also assume magical fairy dust and free energy, but that doesn't make it so. You can ask if there are better ways, but you can't assume it, and in the end you will find there is no magic.
Clusters and replication are NOT NEW. Not even remotely new. There is, in fact, nothing new architecturally at all that would indicate some new capability that hasn't already been repeatedly analyzed and tried. That doesn't mean you can't tweak something for a situation, or that you need a giant Oracle database for everything, but "the web" and "cheap hardware" change the equation by precisely nothing.
What has changed the equation is cheap, unimportant data, which covers the majority of the web. "Real" applications, where data integrity is important (like say, your bank account), and immediate accuracy guaranteed, require the main thing you use a database for: data integrity. Your facebook page, your google search, that blog entry, or some video on youtube: these don't matter. If it's a little slow, or doesn't update immediately, or you get an error, no one is losing money. No one cares.
In essence, if a reliable database isn't important for your app, your app isn't really handling important data. This may be fine; in the mainstream, there's a lot of noncritical stuff. But this doesn't make databases unimportant.
RDBMS and application logic (Score:5, Insightful)
That is one view. It's nice and all, but incomplete. The issue is performance.
Any time you're dealing with a large quantity of data, it's always easiest to process or filter where it's located. Transmitting it, processing it, and transmitting back changes adds an unreasonable amount of overhead. Hence, SQL is a "Query" language. In other words, you have the RDBMS do reasonable data processing and filtering of records for you. Your application should only need to specify the operations performed, and should only process data if your computation is particularly unusual. This makes feasible computations that would otherwise be entirely unreasonable. (note that an application working on the same machine generally has the same issue as one working on a separate system. SQL servers present the application with a stream of data - pipe, socket, etc)
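A small sketch of the difference, assuming a made-up events table: the first approach ships every row to the application and filters there, the second describes the operation and lets the database return a single value. SQLite runs in-process, so there is no real network hop here; the example only shows how much data crosses the interface in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, size INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("error", 10), ("info", 5), ("error", 7), ("debug", 1)],
)

# Approach 1: fetch every row, then filter and sum in the application.
all_rows = conn.execute("SELECT kind, size FROM events").fetchall()
app_side = sum(size for kind, size in all_rows if kind == "error")

# Approach 2: specify the operation and let the database do the work;
# only one value comes back.
db_side = conn.execute(
    "SELECT SUM(size) FROM events WHERE kind = ?", ("error",)
).fetchone()[0]

print(len(all_rows), app_side, db_side)  # 4 17 17
```

With four rows the difference is invisible. With a few hundred million rows behind a socket, the first approach is the "unreasonable amount of overhead" described above.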
My opinion: SQL is horrendous. It's a pain to use, and many basic data transforms cannot be described in that language (at least without some huge, awful, convoluted command == maintenance nightmare).
Re:Quit Whining (Score:3, Insightful)
Re:A time and place for everything (Score:3, Insightful)
The genetic data you describe is not too different from other things we all have to deal with. Trace or log data. Video streams. Sequences of real time events from practically anything. All of these things consist of partly structured streams from which we need to extract meaning. And yes, for all these things storage in a relational database doesn't add any value.
SQL is to data query languages (Score:2, Insightful)
The worst ever devised excepting everything else that has ever been tried.
Re:A time and place for everything (Score:2, Insightful)
PostgreSQL also has specialized index algorithms for handling exotic arrangements of data. One index type that I'll be looking into in the near future [I've been told] makes it possible to efficiently take two dimensional data, and return rows that all fit within the specified radius starting from two coordinates. Although this can be done with a combination of indexes and various formulas, it's not as elegant as what PostgreSQL can do now. So, when I see statements that SQL hasn't progressed, I question the level of expertise of those making such statements.
As for "all they want is their objects," I think that's true -- just look at all the PHP newbies out there who like to create tables with hundreds of columns instead of taking advantage of one of SQL's greatest hallmarks: Relationships between tables
Java (with JDBC) and Perl/mod_perl/mod_perl2 (with DBIC, a.k.a., DBIx::Class) should make these folks happy though because they do provide slick OO interfaces to SQL that, unless you keep all your hundred or so columns in a single table, also require at least a basic understanding of SQL.
Anybody working with databases in a serious way has to have at least a basic understanding of the underlying technology. Creating an alternative to SQL seems ludicrous to me because it will eventually take away from the large pool of SQL expertise that exists today (I never knew about an anti-SQL movement until I read about it here today on SlashDot); many other alternatives do already exist too, such as BTrieve which has a totally different interface from SQL, and although it performs well it's just not as popular anymore (didn't Pervasive, the company that currently owns the BTrieve technology, switch to a customized rendition of PostgreSQL or something like that anyway? I wonder what their real reasons were for focusing on SQL in favour of BTrieve?).
As a developer, I see that performance (speed) is a very important factor, but it's not the only factor for me. Quality of code, helpful documentation, and overall system reliability matter too, because they mean easier development with fewer potential problems for me to deal with in the future if something does fail. Reliability very importantly includes the ability to always recover gracefully from a power outage that occurred while thousands of INSERTs, UPDATEs, and DELETEs are active; in some informal "pull-the-plug" testing many years ago I confirmed that PostgreSQL and Oracle both do this by issuing ROLLBACKs at database mount time after the OS recovers.
Re:A time and place for everything (Score:4, Insightful)
There are certainly common CS themes running between all three. We have three languages not because people haven't thought about those things, but because they make our lives easier.
Whenever I hear people bitching about 'doing away' with SQL, I always wonder what they think is wrong with it. SQL certainly has some limitations, don't get me wrong, but it is a great language for the vast, vast majority of cases. If your application is so specialized that SQL isn't appropriate, well, bravo, but that does not mean that the relational database concept is flawed. Personally, I think if people spent a few moments doing some formal analysis before they built their databases (imagine that, thinking before doing?!), they would find that SQL is a beautiful thing. If your implementation of SQL doesn't cut the mustard, maybe you just need a better query optimizer?
Re:Quit Whining (Score:4, Insightful)
Flat files are a perfectly viable option in some circumstances. Not everything requires data uniformity or the ability to run complex ad-hoc queries, nor does everything need information to be controlled by a separate process running on a different machine. Not every system integrates multiple applications through a shared data-store. The NoSQL crowd isn't arguing that SQL is bad, just overused. There are a great many situations where something like flat files or Berkeley DB is more than sufficient, and yet people still use relational technology. In my experience it's generally because that's all they know. In their mind, if one needs to store data one uses SQL. They don't select the right tool for the job because they honestly don't know there are other tools.
How hard is it to understand? (Score:2, Insightful)
Use the appropriate tool. Always. There are tons.
Don't use a relational database to try to represent hierarchical data. Don't try to use LDAP to do analytics. Think of the performance implications before you have more than two users accessing your system. Data storage is a very different animal: you are often (though not always) I/O bound. This is very different from being limited by the number of instructions you can deal with per unit of time. Don't think otherwise because it will bite you in the ass.
And still I see people making the same stupid mistakes over and over. But it's pretty simple really:
A solution designed to be generic will ALWAYS be slower than a solution that is customized. This shouldn't be surprising. If you have serious performance requirements (ESPECIALLY if they are coupled with huge amounts of data) then a custom solution is definitely something you should look into. At some point you will run into a brick wall and find out that there is stuff you can't do with the solution you have in place. This is natural. Custom solutions to hard problems always lead to restrictions in terms of future features. Always. You will NEVER be able to anticipate all features that you would like to have. (Yes, this is true for Google as well. No they don't have any special kind of magic dust that they sprinkle on their things there, they do the best they can and then they get bitten in the ass too, just like everybody else.)
Re:A time and place for everything (Score:5, Insightful)
The only real show stopper and a real reason to replace RDBMSs is #5. All the others can be worked around by just deeper study of data modeling techniques. Data modeling is not something most developers can figure out intuitively. There is a lot of theory to be learned to do it right, and it can very easily be done badly, leading to severe performance problems and an unmaintainable application.
With regard to #5: I went to a presentation at JavaOne where some eBay engineers explained that they do not use transactions in any of their database operations. They just leave junk rows around in the db if a transaction half completes, and as long as they aren't reachable they don't consider it a big deal. They have to very carefully organize the order in which they manipulate data to avoid data corruption, but that lets them get around #5.
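To be clear, the following is not eBay's code; it is only a guess at the general shape of "order your writes so a half-finished operation leaves unreachable junk". The bids and items tables and the place_bid and visible_top_bid helpers are hypothetical, and SQLite in autocommit mode stands in for a database used without multi-statement transactions.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: every statement commits
# on its own, so no multi-statement transaction protects the sequence below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bids (bid_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, top_bid INTEGER)")
conn.execute("INSERT INTO items VALUES (1, NULL)")


def place_bid(item_id, amount, crash_before_link=False):
    # Step 1: write the new row. Nothing points at it yet, so readers never see it.
    bid_id = conn.execute(
        "INSERT INTO bids (amount) VALUES (?)", (amount,)
    ).lastrowid
    if crash_before_link:
        raise RuntimeError("simulated crash")  # leaves an unreachable junk row
    # Step 2: one single-statement update makes the new row reachable.
    conn.execute(
        "UPDATE items SET top_bid = ? WHERE item_id = ?", (bid_id, item_id)
    )


def visible_top_bid(item_id):
    row = conn.execute(
        "SELECT b.amount FROM items i JOIN bids b ON b.bid_id = i.top_bid "
        "WHERE i.item_id = ?",
        (item_id,),
    ).fetchone()
    return row[0] if row else None


place_bid(1, 10.0)
try:
    place_bid(1, 99.0, crash_before_link=True)
except RuntimeError:
    pass

bid_rows = conn.execute("SELECT COUNT(*) FROM bids").fetchone()[0]
print(visible_top_bid(1), bid_rows)  # 10.0 2
```

The junk row for 99.0 stays in the table but is never reachable through items, which matches the behaviour described. The price is that every write path has to be ordered this carefully by hand.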
Re:Quit Whining (Score:5, Insightful)
One of the reasons is because RDBMSs offer a lot of tools, like atomicity, durability, backup/restore, centralization, point-in-time-recovery, etc. Many application developers need these things without actually needing the abstraction of a relational system.
Re:A time and place for everything (Score:3, Insightful)
As a web programmer, I wanted to take offense at your statement, but something that happened to me only a few weeks ago is making me have a hell of a lot less faith about the available pool of web programmers out there:
During a round of interviews, we sent out a take-home quiz. We mostly wanted to know if the candidates either knew the actual answers, or could at least google it. One of the questions involved simple aggregates in SQL. Given a table with a unique id and a date of birth, I wanted a query that would produce a list of the months of the year, and how many unique records had a DOB that fell on that month. It's a one-liner.
One of my candidates wrote TWELVE counting queries, each one counting DOBs between the (hardcoded) start and end of each month, then she used UNION to make it send out the 12 one-row queries as one 12-row query.
Both of us evaluating the results screamed when we read her answer, and we did not pursue her further. I used to complain that programmers simply didn't give a shit about learning beyond the querying aspects of the RDBMS, which kept us at the mercy of overpaid DBAs. Now? Now we are starting to see that programmers don't even give a shit about learning how to query.
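For anyone wondering, one plausible shape for that one-liner is a GROUP BY on the month part of the date. This is a sketch against a made-up people table in SQLite (run through Python's sqlite3), not the actual quiz answer; the month-extraction function differs between databases, and months with no births simply don't appear unless you join against a list of months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a unique id and a date of birth stored as ISO text.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, dob TEXT)")
conn.executemany(
    "INSERT INTO people (dob) VALUES (?)",
    [("1980-01-15",), ("1975-01-30",), ("1990-06-02",), ("1988-12-25",)],
)

# One query: bucket rows by month of birth and count each bucket.
rows = conn.execute(
    "SELECT strftime('%m', dob) AS month, COUNT(*) FROM people "
    "GROUP BY month ORDER BY month"
).fetchall()

print(rows)  # [('01', 2), ('06', 1), ('12', 1)]
```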
Re:A time and place for everything (Score:2, Insightful)
I want to say you're breaking fourth normal form, but I can't.
I want to say you're storing derived data, but I can't.
I CAN say that data structure is just butt-ass-ugly.
Re:What? (Score:3, Insightful)
How many Googles or Yahoos are there? Like, 5. Let them do whatever broken things they want -- it works for them... for now. It's still expensive, probably just as much as "big iron". Not to mention the countless engineer hours and hosting/electricity costs for their "scale out" systems. It's what happens when you let a bunch of ivory tower PhDs solve real engineering problems.
In the end, the rest of us serious enterprise engineers will allow Oracle, Microsoft, and the people who have been doing this for 30 years to optimize their code to run on multicore mainframes ... which is where massive computing belongs. Then we query it with a few lines of SQL instead of convoluted algorithms in some "Map Reduce" environment, and move on with our lives.
Re:A time and place for everything (Score:4, Insightful)
Design an efficient table relating a tree structure. Then design queries to answer questions such as:...
I don't know, but I recall reading that Postgres 8.4 is now out and includes support for recursive queries. [slashdot.org] (trees) Not sure about the reputation of the blog in question, but you may have heard of it?
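As a rough sketch of what recursive queries look like for the first two tree questions, here is an adjacency-list table queried with WITH RECURSIVE. Assumptions: the tree shape is made up (the original post doesn't give one), the subtree and ancestors helpers are hypothetical, and SQLite via Python's sqlite3 is used instead of Postgres 8.4 so the example runs without a server. The syntax should be close, but it has not been tested on Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Adjacency list: each node stores its parent. The tree shape is invented.
conn.execute("CREATE TABLE tree (node TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [("A", None), ("B", "A"), ("C", "A"), ("D", "B"),
     ("E", "B"), ("G", "E"), ("F", "C"), ("H", "F")],
)


def subtree(root):
    """All descendants of root, found by following child links downward."""
    sql = """
        WITH RECURSIVE sub(node) AS (
            SELECT node FROM tree WHERE parent = ?
            UNION ALL
            SELECT t.node FROM tree t JOIN sub s ON t.parent = s.node
        )
        SELECT node FROM sub ORDER BY node
    """
    return [r[0] for r in conn.execute(sql, (root,))]


def ancestors(node):
    """All ancestors of node, nearest first, found by following parent links upward."""
    sql = """
        WITH RECURSIVE up(node, depth) AS (
            SELECT parent, 1 FROM tree WHERE node = ?
            UNION ALL
            SELECT t.parent, u.depth + 1
            FROM tree t JOIN up u ON t.node = u.node
            WHERE t.parent IS NOT NULL
        )
        SELECT node FROM up WHERE node IS NOT NULL ORDER BY depth
    """
    return [r[0] for r in conn.execute(sql, (node,))]


print(subtree("B"))    # ['D', 'E', 'G']
print(ancestors("G"))  # ['E', 'B', 'A']
```

The nearest common ancestor of two nodes could presumably be built from two ancestors-style walks, but that is left out here.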
the fact is that SQL can't handle most datastructures and complex relations, only very simple one dimensional ones
You are kidding, right? Just today I cooked up a 7-table query including 2 subselects, and a left outer join to a meta table consisting of 2 inner joined tables. Total of some 11 tables comprising a highly complex data set. Don't know what you mean by "very simple one dimensional ones" but 11 tables each joined in either a one-to-many or many-to-many mapping provides at least 11 dimensions. (more if you self-join tables, often needed) And this isn't particularly hard for me - often I have joins combining 12 or more very large tables with unrestrained combinations somewhere in the billions to trillions of possibilities that all somehow seem to parse in just a few seconds thanks to a few well-placed indexes and a well-structured query.
Methinks you don't really understand SQL?
Re:Quit Whining (Score:3, Insightful)
If your data isn't complex enough to require a RDBMS, you almost certainly don't need a program.
Really? IM, word processor, spreadsheet, vector graphics, photo editor, ... Google probably uses MapReduce without a "normal" RDBMS behind it - is that data complex enough?
Re:A time and place for everything (Score:4, Insightful)
So what you are saying is that you do not know enough of SQL to understand that query; therefore you are not qualified to comment on the practicality and viability of using SQL for complex structures.
I suspect the same applies to Perl.
-dZ.
Re:The RDBMS responds to the nitpick (Score:1, Insightful)
Ok, you owe me a new keyboard. Coffee EVERYWHERE!