Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed
ericatcw writes "The inaugural NoSQL meet-up in San Francisco during last month's Yahoo! Apache Hadoop Summit had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party.
Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of burdensome, expensive relational databases in favor of more efficient and cheaper ways of managing data, reports Computerworld."
The problem is performance not SQL (Score:4, Interesting)
The problem is the performance of transaction, persistence, and data-distribution techniques, not
whether we are using a logic-like STRUCTURED QUERY LANGUAGE to ask for data matching certain conditions.
The latter is still, and will continue to be, very useful.
It's just that now that we can assume local clusters and WANs worth of co-operating data stores, there
are probably better, more performant ways of implementing persistence, replication, distribution of data
than traditional RDBMS implementations.
The two concerns, the logical model of how we QUERY for data (or combine it in bulk), which is the core of SQL,
and how we persist it and retrieve it quickly, can now be separated in more ways.
Re:Flat Earth (Score:4, Interesting)
I've seen strong reactions from various camps with regard to concern over saying no to SQL... Third, there are groups that get together to dispute that the earth is round; insisting that it is flat.
Corporations represented in this group included the likes of Google, Last.fm, Amazon, and Facebook. Hardly the same caliber of people who claim the earth is still flat. I'm inclined to listen to engineers from these companies if they say that an SQL database does not scale well for vast amounts of data.
Re:Flat Earth (Score:3, Interesting)
The whole thing is just reactionary mumbo-jumbo. There are kinds of data that relational databases are fantastic for, and kinds of data they're not, and sometimes none of it is exactly perfect. SQL is actually a pretty damned good, single-purpose language. It's not hard to learn, and once you learn it, the differences between RDBMS implementations become a little like JavaScript: just something you have to put up with. Not that a lot of people actually have to worry all that much about writing fully portable SQL queries.
Re:A time and place for everything (Score:3, Interesting)
Re:Flat Earth (Score:1, Interesting)
You sure there's absolutely no difference between the nature of a bank and the nature of a massive search engine?
And how sure are you that a bank's IT staff are on the leading edge of innovative technologies? If anything, they lag behind because it's "safer" than risking the untested new thing.
Try a few of the Post-Relational databases, read up on the CAP Theorem, understand the -nature- of the problem you're talking about, and then come back.
Or I'll save you some time. RDBMS systems focus on Consistency, and trade Availability for it. Your bank's computer can be down for an hour... inconvenient, but acceptable. But they cannot, under ANY circumstances, be incorrect. Period. Google, on the other hand, can handle some slightly incorrect data... but being offline is totally unacceptable.
Amazon's CTO gave a great example. He talked about how a Shopping Cart must have Availability, and slight inconsistencies in the data as that data propagates across a network are acceptable. In the end, the data is eventually consistent anyways, and you NEVER want your customer to not be able to add a cart item. The checkout, however, is financial, and heavily needs Consistency. After the order is done, by contrast, the list of past transactions can again lose a tiny bit of consistency (since it's read-mostly anyways) in exchange for always being up.
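To make the shopping-cart trade-off concrete, here is a minimal sketch of two cart replicas that always accept adds locally and reconcile later. It is not Amazon's design; the `CartReplica` class, its methods, and the per-replica counting scheme are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class CartReplica:
    """One copy of a shopping cart that accepts adds without coordination.

    Hypothetical illustration only, not any vendor's actual implementation.
    """

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # item -> {replica_id -> units added at that replica}
        self.adds = defaultdict(dict)

    def add(self, item, qty=1):
        # Always succeeds locally: availability is favoured over consistency.
        mine = self.adds[item].get(self.replica_id, 0)
        self.adds[item][self.replica_id] = mine + qty

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge no matter what order the merges happen in.
        for item, counts in other.adds.items():
            for rid, n in counts.items():
                current = self.adds[item].get(rid, 0)
                self.adds[item][rid] = max(current, n)

    def contents(self):
        return {item: sum(c.values()) for item, c in self.adds.items()}


a, b = CartReplica("a"), CartReplica("b")
a.add("book")
b.add("book")
b.add("pen", 2)
before = (a.contents(), b.contents())  # the replicas disagree here

a.merge(b)
b.merge(a)
after = (a.contents(), b.contents())   # and agree once they have synced
```

Between the adds and the merges the two copies give different answers, which is the "slight inconsistency" being tolerated. A checkout step would need something stronger than this.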
Hmm... more to the issue than you thought? XD
Pros & Cons of non-relational solutions (Score:5, Interesting)
Note that most of these solutions come from the interwebs, social networks, etc. And it isn't so much anti-sql as it is anti-relational database (sql != rdb).
The basic premise is that we need different solutions that: can scale very high for very narrowly scoped reads & writes, don't need to perform ranged queries, reporting, etc., and don't need ACID compliance. And that may be the case. Sites like Slashdot, Facebook, Reddit, Digg, etc. don't need the data quality that eBay needs.
On the other hand, eBay achieves scalability AND data quality with relational databases. And when I've worked with teams whose architectures scale massively and avoid the relational trap for better solutions, they inevitably regret, later, the lack of data quality and the complete inability to actually get trends and analysis out of their data. It *always* goes like this:
Me: So, is this thing (msg type, etc) increasing?
Developer: No idea.
Me: Ok, so let's find out.
Developer: How?
Me: I don't know - typical approach - let's query the database.
Developer: It'll take four+ hours to write & test that query and then days to run. And when it's done we might find that we wrote the query wrong.
Me: What?!?
Developer: We had to do it this way; you can't report on 10TB databases anyhow.
Me: What?!? Are you on crack? There are dozens of *100TB* relational databases out there that people are reporting on.
Developer: Well, we probably don't need to know what that trend is anyhow.
Me: I'm outta here
Re:A time and place for everything (Score:5, Interesting)
Sorry, that's not true. Have you tried analytical functions? You would be amazed how complex scenarios can be handled easily with them. And they are part of ANSI SQL standards. And db providers (Oracle etc) have taken the concept and improved a lot on it.
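As a rough illustration of what analytical (window) functions buy you, here is a small sketch using Python's built-in sqlite3 module. The `sales` table and its columns are made up for the example, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "ann", 300), ("east", "bob", 500),
     ("west", "cy", 200), ("west", "dee", 100)],
)

# One pass over the table yields a per-region rank and a per-region
# total on every row, with no self-join and no application code.
rows = conn.execute(
    """
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, rank_in_region
    """
).fetchall()

for row in rows:
    print(row)
```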
I think the anti-sql 'movement' has more to do with new (internet era) languages and their developers than the so-called 'lack' of features. In my limited experience, I have observed that people coming from a C (or similar) background have no problem with sql, while java developers (and this is probably true for most developers working on web-based applications) are the worst kind when it comes to understanding even the basics of sql. All they want is their objects.
I strongly believe that a competent programmer designing or developing a system which includes data and data storage should at least know normalization, indexes, and what 3NF means. Programming language is one thing, database is another, and knowledge of both is required to build a decent system.
Re:Flat Earth (Score:2, Interesting)
The RDBMS wasn't the first thing. If you bother to crack open a
text book and review the history, you might find a data storage
model that's better suited to your problem.
If your problem is "big", any solution is bound to appear overly
complex and too expensive. That's just how the solutions to big
problems tend to work out.
Re:A time and place for everything (Score:2, Interesting)
TIMESTAMPs are very useful for retaining temporal order of data. Those articles don't deal with timestamps at all. If they knew about TIMESTAMPs (like what PostgreSQL has), things could be a lot better.
Cumulative totals can be achieved as well, either within a SELECT statement (that sums from the starting date to the current date being processed) or with a stored procedure (which I'd prefer to ensure efficiency since I could just keep adding to an internal variable along the way instead of re-summing everything for each row returned).
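Both approaches to a cumulative total can be sketched with Python's built-in sqlite3 module. The `payments` table is invented for the example, and since SQLite has no stored procedures, a plain Python loop stands in for the "keep adding to an internal variable" version.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (paid_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("2009-07-01", 10), ("2009-07-02", 25), ("2009-07-03", 5)],
)

# Approach 1: a SELECT that re-sums from the first date up to the
# date of the row currently being produced.
in_select = conn.execute(
    """
    SELECT p.paid_on, p.amount,
           (SELECT SUM(q.amount)
              FROM payments q
             WHERE q.paid_on <= p.paid_on) AS running_total
    FROM payments p
    ORDER BY p.paid_on
    """
).fetchall()

# Approach 2: a single ordered pass that carries the total along,
# which is what a stored procedure would do server-side.
single_pass = []
total = 0
for paid_on, amount in conn.execute(
    "SELECT paid_on, amount FROM payments ORDER BY paid_on"
):
    total += amount
    single_pass.append((paid_on, amount, total))
```

The two produce the same rows. The first does work proportional to the square of the row count unless the engine optimises it, which is the efficiency concern raised above.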
Moving averages (with varying temporal ranges), Ranks, etc., can all be handled with a stored procedure using some fairly straightforward "plpgsql" or other DBMS back-end language. With enough inventiveness, someone could probably figure out how to do the same with a SELECT statement (possibly needing sub-SELECTs).
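For what it's worth, the sub-SELECT version is not too bad. The sketch below uses Python's built-in sqlite3 module with a made-up `readings` table, integer day numbers instead of real dates to keep the window arithmetic trivial, and an arbitrary three-day trailing window.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)],
)

rows = conn.execute(
    """
    SELECT p.day, p.value,
           -- trailing average over the current day and the two before it
           (SELECT AVG(q.value)
              FROM readings q
             WHERE q.day BETWEEN p.day - 2 AND p.day) AS moving_avg,
           -- rank = 1 + number of rows with a strictly larger value
           (SELECT COUNT(*)
              FROM readings q
             WHERE q.value > p.value) + 1 AS value_rank
    FROM readings p
    ORDER BY p.day
    """
).fetchall()

for row in rows:
    print(row)
```

Changing the window length only means changing the `- 2` in the BETWEEN clause.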
Adding features such as "accreting" (see bottom of first article referenced at dbmsmag.com above) seems like a nice idea to me -- I'm certainly in favour of adding more features to an RDBMS since it will make it more useful to people. I worry, however, that those authors are expecting SQL to do more than simply return data sets (which their reporting code should be responsible for formatting for the user) since that would be missing the point of what SQL is for.
Re:A time and place for everything (Score:4, Interesting)
You've got a bunch of information with one shape, but you really need it in another shape.
The code is about dealing with the embarrassment of it all.
If you've got tabular stuff, and you so very often do, the relational model is fantastic.
If you're dealing with some kind of graph, you're hating life.
People coming in complaining that their aircraft makes a poor submarine are initially amusing, but become tedious.
Re:A time and place for everything (Score:3, Interesting)
>> Find all ancestors of G
> SELECT * FROM rows WHERE left < [left hand value of G] AND right > [right hand value of G]
Now promote a tree node so it's before G...
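To show both sides of that exchange, here is a small sketch of the nested-set model using Python's built-in sqlite3 module. The `nodes` table, the node names, and the `lft`/`rgt` column names (chosen because LEFT and RIGHT are SQL keywords) are assumptions for the example. Inserting a new sibling before G is used as a simpler stand-in for promoting an existing node; a real move renumbers in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, lft INTEGER, rgt INTEGER)")
# Hypothetical tree: A is the root, B and C are its children,
# G and H are children of B.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [("A", 1, 10), ("B", 2, 7), ("G", 3, 4), ("H", 5, 6), ("C", 8, 9)],
)


def ancestors(name):
    # An ancestor's interval strictly encloses the node's interval.
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM nodes WHERE name = ?", (name,)
    ).fetchone()
    found = conn.execute(
        "SELECT name FROM nodes WHERE lft < ? AND rgt > ? ORDER BY lft",
        (lft, rgt),
    )
    return [row[0] for row in found]


before = ancestors("G")  # the read is a single cheap range query

# The write is the expensive part: putting a new node F in front of G
# means shifting every boundary at or after G's left value by two.
g_left = 3
touched = conn.execute(
    "UPDATE nodes SET rgt = rgt + 2 WHERE rgt >= ?", (g_left,)
).rowcount
touched += conn.execute(
    "UPDATE nodes SET lft = lft + 2 WHERE lft >= ?", (g_left,)
).rowcount
conn.execute("INSERT INTO nodes VALUES ('F', ?, ?)", (g_left, g_left + 1))

after = ancestors("G")
```

In this five-node tree the insert performs eight row updates, and the count grows with the size of the tree to the right of the insertion point. That is the cost the reply above is pointing at.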
Re:Yeah, so why are they better? (Score:2, Interesting)
But damn, Linq to everything else fucking rocks faces, and anyone who says otherwise seriously needs to buy a linq book and actually use the shit. Linq to XML/collections
Yeah, linq is handy with Entities, but you run into a whole mess of problems if you aren't careful with it. (And people who don't understand relational databases should stay away from it.)
At least, that is my opinion...but don't take it too seriously
Seriously misguided (Score:3, Interesting)
Trash SQL in favour of coding all your data access needs. Welcome back to 1973, guys.
It's not like [wikipedia.org] we could do parallel SQL in the 1980s. Or that you can't do parallel SQL in a compute cloud [vertica.com] today.
No, it basically seems like they don't want to pay software vendors any money for database technology. That's mostly what the arguments boil down to. Oracle RAC is very scalable, arguably easier to do at massive scale than MySQL - but you have to pay Oracle money. For an Internet startup, I can understand why you'd take your chances with "roll your own". For an enterprise... I think not.
I see SQL as more of a "protocol" (Score:3, Interesting)
SQL is going to be around for a long time, because it's useful as an "API" - as a protocol or layer of abstraction.
Programmers can write all sorts of programs in all sorts of programming languages and then use SQL to talk to the DB. If the DB changes a bit, they can often use the same SQL or modify it slightly.
You often see lots of grumbling and cursing in various companies because people actually end up doing that and companies end up with lots of stuff hooked to the DB - MS Access, perl, python, ruby, java, radius servers, openvpn, accounting and finance stuff...
They grumble, but the fact is the database is being used. The data has become more useful.
If you have your database locked up behind some new fangled protocol that only 20 people in the world know, it's not going to be as easy to do that - and often each bunch will start creating their own databases and you end up with a different mess, and a mess that's not as useful.
Having everyone use SQL to talk to the DB is not actually a bug; it's a feature.
One man's impedance mismatch is another man's layer of abstraction.
Ditch SQL, not Relational DBs! (Score:3, Interesting)
SQL syntax sucks, is inconsistent, and is just non-standard enough at its corners that it's completely annoying to write anything for more than one DB. It also lacks various features which logically _should_ be there, because of the relational back-end. SQL is a toy, and though I'm the guy everyone in the office turns to if they want to write a query that does more than SELECT * FROM sometable, that doesn't mean I have to like it.
But that's not the fault of relational databases. The relational logic makes sense, and we'll be seeing it referenced in countless "new ideas" that come along for years, just as ideas which Lisp already had in 1970 will be touted as new features for the next millennium (you hear? PHP can do Lambda functions as of yesterday!)
SQL sucks, but SQL is NOT what makes something relational.
Cloudscape/Derby (Score:3, Interesting)
Derby 10.5, meanwhile, still has a tiny footprint, and can do most if not all of the SQL you will ever want for a typical Java application, along with features like the ability to do live backups, live table compaction from within the application while running, and now at last the ability to do cursoring in SELECT statements. Installation and configuration are simple.
I think the actual problem is that we old C programmers learned programming and data structures, and as a result know a lot about the kind of problems for which SQL is well suited, while a lot of modern programmers learn a lot of theory about OO, but don't actually learn to program. Therefore, they have to try to reinvent wheels that were in fact designed in the 70s, and have no idea of what tools are available and how they map onto typical real-world application-level problems.
Re:Tilting at windmills (Score:3, Interesting)
I don't know what kind of language people will be using for HPC programming in 20 years. I don't know the features it will have. I do know that it will be called Fortran.
I wouldn't be surprised if the same applies to SQL. The language has evolved a lot over the years, to better express different kinds of data. In 20 years time, I wouldn't be surprised if the most commonly-used subset of SQL is nothing like the subset currently popular, but I'd be surprised if the thing to replace SQL isn't called SQL.