Facebook Crawler Speaks Back
Last week we ran a story about Facebook suing to get a crawled dataset offline. This week we have a bit of a response written by Pete Warden, the guy who actually did the crawling. He followed robots.txt, and then Facebook's lawyers went after him. It's actually quite an interesting little tale and worth your time.
Pretty naive (Score:5, Insightful)
Did this guy really think he could just give away the data that Facebook sells (or intends to sell) to third parties and NOT have them sue him for it? It's no secret that the business model of most of the social sites and big search engines factors in the massive amounts of data they collect on users as a major corporate asset, to be used internally for data mining and also sold (supposedly after being anonymized) to advertisers and other third parties. It takes a babe in the woods to think he can just waltz in and take that away with a "But your robots.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Is the guy in the right? Probably. Would he have a case? Probably. Does either of those facts matter if he doesn't have the big $ needed to hire lawyers and fight through several courts? Nope.
Re:Pretty naive (Score:5, Insightful)
If he's in the right, but not having as much money as a big corporation means he'll lose anyway, then your U.S. court system is broken. Please fix it.
Re:Pretty naive (Score:5, Insightful)
Want to fix the ELECTION laws, while not breaking the First Amendment Rights to Free Speech? It is really quite simple. One simple rule.
Only People (persons, not legal entities) who are eligible to vote can donate to political campaigns.
This doesn't stop corporations from running ads, they just have to do it on their own, and out in the open where everyone can see who they are telling people to vote for. They have to buy their own ads to tell people to vote for Harry Reid or Mitch McConnell.
This also goes for Unions and all other organized groups. Make them buy their own ads for their own causes.
Simple rule, clear, concise, straightforward, and it solves all sorts of problems with current campaign laws, without any bias towards or against anyone.
AND that is why it won't ever be implemented.
And I'm sure that someone is going to be upset because their favorite group won't be able to donate money to a candidate/campaign, while at the same time wanting anyone that might oppose them (it) restricted from doing likewise.
Re: (Score:2, Offtopic)
Only People (persons, not legal entities) who are eligible to vote can donate to political campaigns.
So no first amendment rights til you turn 18 then?
Re: (Score:3, Insightful)
You can speak all you want, post blogs, bitch on slashdot, just not be able to donate. Donating is NOT speech and should not be covered under the 1st amendment.
Re: (Score:3, Informative)
This doesn't stop corporations from running ads, they just have to do it on their own, and out in the open
Why out in the open? The Supreme Court has held for a very long time now that the right to free speech means the right to anonymous speech, especially political speech. Having explicitly granted corporations the right to free speech means that they can no longer be required to identify themselves, especially with regard to political speech.
AND that is why it won't ever be implemented.
That was what th
But that can be fixed (Score:3, Informative)
The Supreme Court has held for a very long time now that the right to free speech means the right to anonymous speech, especially political speech.
Yes.. for people. But not necessarily for organizations.
Of course, making such a distinction will require reversing a very old (and recently reinforced) precedent in US law, where organizations have personhood. Probably requiring an amendment. So it won't happen.
Re: (Score:3, Insightful)
The real problem is that people are being influenced by ads. You shouldn't base your vote on how noisy somebody is, that's terrible and that's what we should be focusing on.
In an ideal world, McDonald's spending tons of money on political ads would be useless because people wouldn't pay attention to them.
Re: (Score:2)
So, you already gave up.
"We'd like to have a fair system, but hey, we can't win anyway."
Your government is a good government and a true representation of the people that live in the USA. The US government doesn't fix this problem, because its people just don't care.
More on topic: I believe that regarding sites like Facebook, we're going through a phase of "awareness" where the general public has no clue how much this website knows about them.
And therefore I salute the guy who tried to screw Facebook and
Re: (Score:3, Insightful)
Nope, I haven't given up, I'm hoping for an uprising. Problem is that 99.997% of all Americans are placated with their cable TV. Fat, dumb and happy is the American way. Almost nobody here will even inconvenience themselves for "freedom". Then we have these "tea party" idiots. Loudmouths simply looking for 10 minutes of fame who really have no desire to protect freedom.
Re:Pretty naive (Score:4, Interesting)
If they wanted their freedom, they'd be crying out over the mass imprisonment of Americans for victimless crimes. They'd also realize that being able to acquire health insurance when you're between jobs, or starting your own consulting business makes you MORE free. Nothing I've seen suggests that they actually care one bit about freedom.
Re:Pretty naive (Score:4, Insightful)
[...] where were they when Bush and the Republican congress took Clinton's balanced budget and ran up the biggest deficit in history? [...] Where was their cry to vote against incumbents when the Republicans held the majority and were running up the debt?
So because they are late you're going to discount their argument? Doing the right thing late is still better than not doing the right thing because you think you may have taken too long.
Re: (Score:2, Interesting)
Then we have these "tea party" idiots. Loudmouths simply looking for 10 minutes of fame who really have no desire to protect freedom.
I suppose you attended one of their rallies and spoke with a few of them about their views before making that judgment. Oh, what's that? You didn't? Oh, you heard from someone that tea partiers are all ultra conservative nutbags. You saw on TV how redneck they all look? Right.
Two serious points I have to make. One is that these "tea parties" have become the only outlet that many conservatives have for expressing ourselves politically. Many of us feel totally disillusioned by the Republican party, and are
Re: (Score:2)
Yes, people used to do that to fix systems.
Re:Pretty naive (Score:5, Insightful)
WE have the best court system money can buy!
Re:Pretty naive (Score:5, Interesting)
matter if he doesn't have the big $ needed to hire lawyers
Thank you. I ran an open source project for a few years and came home one night to find that my webhost had taken its site down after being contacted by a company with a similar name. The company claimed they'd tried to contact me and explained how my project was causing them harm, but the simple fact of the matter was that my project's name did not infringe on theirs.
I ended up renaming the project. I've told the story dozens of times, and the response is always the same. "That's BS! They can't do that! Go to court!" People don't understand that $20 a month in unmanaged Google ads doesn't cover lawyers the same way that company's actual paying customers do.
Re:Pretty naive (Score:5, Interesting)
American justice might be blind, but it knows what money smells like. One more reason why we need judicial reform to prevent abuses like this. Of course fighting it wouldn't be worth it, as even if you won, your "winnings" would have only been the ability to continue using the name. Another good example is http://www.nissan.com [nissan.com], where he actually fought and won, at a great price. His name is Nissan, and his computer business and name existed back when the cars were called "Datsun", but they sued anyway. This is another one of those "We are bigger than you, thus more deserving of the domain name than you" cases.
Re: (Score:3, Insightful)
And then if your lawyer loses the case, you get to pay for the company's team of 20 $1000/hr lawyers?
Re: (Score:2)
Right. Consequently, the courts are the exclusive preserve of the rich.
US courts can and sometimes do award legal fees to defendants who can prove that a suit brought against them was frivolous. IMHO they should do it more often, but it does happen.
Re: (Score:2)
Declaring bankruptcy when in debt to the tune of 50K and declaring bankruptcy when in debt to the tune of a million makes little difference.
On the other hand losing 50K or losing a million of your personal fortune are 2 very different things.
Loser pays hurts the rich far more and it's vastly superior to the current US system.
It also means lawyers are far less willing to work on bullshit cases suing poor people unless they have a contract ensuring they get paid anyway even if they win and the poor person gets
Re: (Score:2)
Such insurance is available in the USA.
Re: (Score:2)
yea, but I'm betting you need to have the insurance before someone comes to sue you.
If you knew that someone was going to come after you for the name of your open source project you probably wouldn't have used that name in the first place.
It's tough to justify paying for insurance to ensure your own rights, at least before you have experienced being the little guy in a lawsuit, and by then it is too late.
His startup "Mailana" is "Anal I Am" in reverse (Score:2, Funny)
FTFA:
All Facebook is doing is nailing his "anal".
Re:Pretty naive (Score:5, Interesting)
It takes a babe in the woods to think he can just waltz in and take that away with a "But your robots.txt didn't say I *couldn't* do it" defense, without expecting a big legal fight.
Yes. Apart from anything else, he's just about entirely missing Facebook's point. Facebook don't give a shit how he accesses their site; this has nothing to do with the fact that he spidered it in a way that their robots.txt file allows, and everything to do with the fact that he was *redistributing their data* without consent.
Now, the question becomes whether what he was distributing falls under fair use. This is a very tricky question, and has nothing to do with how he acquired it.
Re: (Score:3, Informative)
> *redistributing their data*
No one owns data. Data is not protected by copyright in the US.
Re:Pretty naive (Score:4, Insightful)
It ain't their data, it's the owners' data. They are simply hijacking ownership.
Re: (Score:2)
> It ain't their data, it's the owners' data.
Under US law data cannot be owned.
Re: (Score:2)
Just as a point of clarification - Does Facebook claim to own the copyright to information entered by the users of their site?
If not, the strongest claim FB could make would seem to boil down to theft of service ("stealing" their bandwidth via his spider). And the very fact that they have a robots file defines what they consider fair game in that regard (if they allow Google and Yahoo etc to do it, tough to say
Re:Pretty naive (Score:4, Informative)
Actually, I believe all of this data was publicly accessible, even without an account. This is part of the updated privacy controls (which set most everything to public by default if someone never adjusted their privacy). Thus it seems a ToS would never have applied, though FB obviously wants it to.
Re:Pretty naive (Score:5, Interesting)
Facebook got lucky - the data was gathered by just an average Joe without the backing to fight a legal battle. Had it been someone significantly larger, the result may have been "go ahead and sue - we'll see you in court." And, quite frankly, I'd be shocked if Facebook would win that sort of battle. And that's a battle that Facebook decidedly does not want to lose - it would mean the end of their business...
I'd be curious to learn if that information is still available (as I am certain it is...) because someone/some company might decide that's pretty valuable _PUBLIC_ information and might, just might, decide they're willing to battle Facebook's legal team for it... Expensive legal battle over very valuable marketing data... If you have the resources for the fight, it might be a fight worth waging...
Facebook may have gotten lucky once but they may not be so lucky next time...
Re: (Score:2)
That's a pretty dead-on first post. He absolutely has a case though; in fact it's quite solid. It's public information. If it were private, it would be another story. I do agree he probably doesn't have the money, but the gamble is that if the case is solid enough the judge might prevent Facebook's lawyers from going after him - aka ANTI-SLAPP or equivalent.
I have no idea if that would happen or not, but it's certainly possible. Depends on how clued in the judge is to the interwebs.
Yea he could. (Score:2)
Because they put a robots.txt file in their root folder which allowed him to crawl everything.
It's Facebook's fault.
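For anyone unsure what "following robots.txt" means in practice, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the crawler name and the example.com URLs below are made up for illustration; this is not Facebook's actual file and not necessarily how Pete Warden's crawler worked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only,
# NOT a copy of any real site's file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleCrawler" and the URLs are assumed names for this sketch.
    for url in (
        "https://example.com/people/alice",
        "https://example.com/private/notes",
    ):
        print(url, allowed(EXAMPLE_ROBOTS_TXT, "ExampleCrawler", url))
```

Note that this only tells a well-behaved crawler what the site operator asks for. It says nothing about terms of service or what you may do with the data afterwards, which is what the rest of this thread is arguing about.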
Ballsy. (Score:3, Funny)
Stupid, but ballsy. Gotta give credit where it's due.
Re:Ballsy. (Score:5, Insightful)
Not really ballsy considering he didn't actually let Facebook's challenge of "The only legal way to access any web site with a crawler was to obtain prior written permission" go to court. Maybe he should have gone to the EFF for help as the repercussions of a judge actually deciding in Facebook's favor would have been devastating to the web.
Re: (Score:2)
I meant it was ballsy to assume a beast as huge as Facebook would let him do this.
I can understand not wanting to go bankrupt, but I agree with you and others...he likely could have found someone willing to work on this case at no charge. Still, the guy seems quite talented and capable...I'm sure he will find a way to get the professional recognition he deserves.
Re: (Score:2)
I'm bummed that Facebook are taking a legal position that would cripple the web if it was adopted (how many people would Google need to hire to write letters to every single website they crawled?)
I was just thinking the same thing... that looks like an excellent dragon for Google to slay to the benefit of the entire internet community. And they're in a good position to do it. One way or another, someone's going to have to do it. This is a battle that either Google is going to go looking for, or that is
Mark Zuckerberg (Score:5, Interesting)
If he is the face of the next generation entrepreneurs, then god save the industry.
Re: (Score:2)
He's a CEO. Being a sociopath is pretty much a requirement for the job.
Re: (Score:2, Offtopic)
Capitalizing god's name means applying a human characteristic to an omnipotent and all-powerful force...in other words, it's as silly as applying one sex or the other to god.
Re: (Score:3, Funny)
If he is the face of the next generation entrepreneurs, then Bugs Bunny save the industry.
Happy now?
Re: (Score:2, Troll)
"If he is the face of the next generation entrepreneurs, then [insert imaginary friend(s)] save the industry"
There. Fixed that for ya.
Annoying having someone tell you about your own beliefs isn't it?
Re: (Score:2)
There.
Re:Mark Zuckerberg (Score:4, Informative)
I never read much into it, but Slashdot covered this story a while back: Facebook Founder Accused of Hacking Into Rivals' Email [slashdot.org].
Publicity (Score:3, Interesting)
The guy's work looks somewhat interesting. I don't see why he can't just make it a Facebook app or something that just happens to cross over onto the rest of the internet as well. Maybe that would have helped him fly under their radar, if it was seen as something that enhanced Facebook.
But seems like his problem all along was lack of publicity, which /. will surely help with.
That said, call me old-school, but I've had more fun with things like ircstats [humdi.net]. So I'm mostly still waiting for this new social crap to catch up.
Arachnophobia (Score:4, Insightful)
I might be alone here but spiders revolt me to a point where I simply respect them and leave them alone.
But that said, Google operates a spider, pretty much. So we have to look at any potential spider on the internet like we might look at Google. If he followed the robots.txt as Facebook set it up and he didn't try to misunderstand it, then there isn't anything they can do. Although, I'm pretty sure the Facebook EULA says you can't spider them, so he's SOL anyway if that's the case. This should be a long and drawn out case unless there is a settlement.
Facebook is ripe. People put up EVERYTHING about themselves on there. I never accept a friend request unless I know the person and I offer a challenge question often. If it's not responded to adequately, I simply ignore them. But in the end there isn't much you can do. If you put it on Facebook -- consider it public, like if it was in the phone book.
Re: (Score:2, Informative)
Disregard this, he settled.
Re: (Score:2)
I might be alone here but spiders revolt me to a point where I simply respect them and leave them alone.
Spiders in my house are OK; spiders in my bathroom must die. Incidentally, this is why I don't run Google Desktop :D
If he followed the robots.txt as Facebook set it up and he didn't try to misunderstand it, then there isn't anything they can do.
Would that this were true.
This should be a long and drawn out case unless there is a settlement.
Too true.
Re: (Score:2)
Yes it is, but spiders in my bathroom is a more specific declaration and so will override the spiders in my house clause.
Re: (Score:2)
I may be going out on a limb, but I doubt that it matters if he operated the spider in a legal manner. Selling data from Facebook isn't the same thing as the attempt at fair use that Google engages in.
Re: (Score:3, Funny)
Google sells our information by what we like. They do it in a way that somewhat protects our privacy and it's part of their service. Gmail targets ads directly to you based on keywords in your emails. If you had enough money you could know what people are talking about by how the ads played out. Therefore there is no real privacy on Google email, and Google reads our emails.
Google collects all kinds of websites and offers search. They build stats and sell off residual information based on information coll
Re: (Score:2)
"Therefore there is no real privacy on Google email, and Google reads our emails."
Actually, most email systems are set up so that there is no "Privacy" at all in any of them. And if by "reading" email you mean having automated bots skim emails looking for key words and phrases, then every time you do a search, set up a filter or otherwise, your computer is reading every email you ever wrote. "OH NOES"
And the Post Office, they read all those mail envelopes they deliver to your house! GASP THE HORROR. And
Re: (Score:2)
Yahoo has my email, does that mean they own my email account? I don't think so. When people say that somehow Facebook owns this data it is a load of crap. They no more own that data than Google owns the content of the web sites they have crawled. They provide a place to host information, they provide a way to relate users to each other and users have a way of sharing it. In fact all the users of the system enter the information, not Facebook. So at what point do you people believe that this magically bec
Re: (Score:2)
Because their TOS says that they own whatever you put into it. As far as I know, that would stand up in court.
Re: (Score:2)
No one owns data under USA law. Their TOS may get them some sort of license for any copyrightable content (creative expression on Facebook? I suppose there is some...) but it is very unlikely that it can get them ownership of the copyrights: that requires an explicit instrument of conveyance.
Re: (Score:2)
Ok. Sure. But you still illustrate the point. The "Facebook Crawler" doesn't have permission of Facebook's users to sell this copyrighted data. Facebook does.
Re:Arachnophobia (Score:5, Informative)
From the Statement of Rights and Responsibilities [facebook.com], Section 3 "Safety":
2. You will not collect users' content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.
The question then becomes how enforceable is the agreement? Sure, if he has an account Facebook can close it, but if he is just accessing Facebook without an account do they have a case? Last I saw you can browse parts of profiles without being logged in, and without ever agreeing to any terms.
Re: (Score:2)
> ...if he is just accessing Facebook without an account do they have a case?
No.
There's something I don't understand (Score:3, Interesting)
Assuming what he did produces a valuable result.
If it's defensible in court by an entity with enough cash or lawyer might, why is there no such entity doing the same thing and then fighting facebook in court?
If it isn't defensible in court, why does it matter that he didn't fight because he didn't have the money?
obviously this is abusive (Score:5, Insightful)
this is what the guy should do:
1. engage the lawsuit
the downside is financial exposure. so incorporate your work in such a way that it can't hit your personal finances. the upside is massive exposure. you will achieve some level of fame: the guy who finally gave the robots.txt convention a legal status quo. this will help you professionally, as well as make your life story
2. whine to google
you are completely right that google shouldn't have to get permission every time it wants to crawl the site. therefore GET GOOGLE TO DEFEND YOU
Re:obviously this is abusive (Score:5, Interesting)
brilliant (Score:2)
mod +6
Re: (Score:2)
how about if he rejigged his crawler to get the data from the google cache instead? So he'd never get anything from facebook or enter into any implied agreement with them.
That's a good one. And he thought he had a problem with Facebook lawyers? I hear that Google lawyers eat Facebook lawyers by the bowlful.
Re: (Score:2)
When have you ever had to agree to a TOS to view a cached page from google?
Re: (Score:2)
Either that, or there's another tactic that you could use.
Do step 1.
But then, instead of doing step 2, get a crap lawyer. Intentionally lose the case.
Then, Google will lobby Congress to push through a law legalizing robots.txt, which will trump the case law.
Re: (Score:2)
Then, Google will lobby Congress to push through a law legalizing robots.txt, which will trump the case law.
Only if the case hinges entirely on robots.txt
The real "infringement" isn't crawling to collect this data, it's actually collecting it. If you were insane enough to collect usage, friendship network, and other statistics by hand-clicking Facebook pages and tallying numbers with a pad of paper and a pencil, Facebook would still be down your throat.
Those numbers, in Facebook's ego-inflated universe, b
Re: (Score:3, Insightful)
Welcome to the new corporate feudal system where the top 2% of the people own half the resources and the bottom half of the people own 1% of the resources (resources includes the law in this case). Do not offend the corporate liege lords, for they have unlimited legal irresponsibility and a virtually unlimited supply of lawyers and judges in their pocket.
Re: (Score:2)
Good lord, I hope not. There's two sides to that coin.. if you're going to give legal clout to "it wasn't listed in robots.txt therefore it's legal to index it" you give the same legal clout to the notion "it was listed in robots.txt, so your crawler which disregards it is in violation of legal statutes". Next they would suggest that the pages behind the little "Terms of use" links hidden away somewhere should get legal clout as well.
rob
Re: (Score:2)
so incorporate your work in such a way that it can't hit your personal finances
This is not going to happen. Incorporate away, but when push comes to shove they'll be able to 'pierce the corporate veil', make the guy personally liable, and take everything he owns, especially if one side has expensive lawyers and the other side has few or none. You have to have a legitimate business in the eyes of the court, plus fulfill other requirements, to have that protection work.
"Don't be evil"... (Score:2)
I suspect this was totally legal (Score:4, Interesting)
Ooo, deja vu (Score:5, Insightful)
Pre-Facebook, Zuckerberg created a site that let Harvard students compare each other, a bit like Hot or Not. Obviously nobody was going to go to a site that wasn't populated with their classmates, so he basically crawled the websites of the various residential houses that put their students' info online (but behind passwords and auth) and copied it into his own site.
He actually got into a fair bit of trouble for this, and ended up being sent to Harvard's ad-board for discipline (I think he got put on probation, but I'm not entirely sure).
The key difference here is that this guy actually did everything by the book and followed robots.txt, whereas Mark Zuckerberg didn't.
Facebook's privacy policy (Score:5, Informative)
I'd also like to point out in their terms [facebook.com]:
Re: (Score:2)
"we may not have control over what they do with it."
Haha - guess Facebook has everything covered. You can't sue Facebook if your info gets into the wrong hands. But the info can't get into the wrong hands because Facebook won't allow it. Unless they give it out. That's why we shouldn't use Facebook.
Privacy ... (Score:4, Funny)
Facebook did not sue. (Score:3, Interesting)
Threats of legal action are not a lawsuit. He didn't get sued. He got bluffed. I don't blame him for caving in, but he shouldn't mislead people by referring to the receipt of threats from lawyers as being sued (this is the sort of error I expect from the Slashdot editors, of course).
All the more reason why copyright should go (Score:3, Informative)
imagine - you put a robots.txt in your root directory, allow crawlers to crawl everything, and then sue those who crawled your stuff.
facebook is not even an established, long standing part of the big capital elite, they are startups, who are from the new generations and from the new tech age.
but see, when they became big capital, they are similarly trying to stomp down others by their copy'right' and big money, despite having come out of our own lot in the recent decade.
this shows, regardless of generation, or culture, having copyrights and big capital eventually causes intellectual feudalism favoring the rich elite, EVEN if they are in the wrong.
Loser pays... (Score:3, Interesting)
Missed Point (Score:3, Interesting)
I've read through the visible comments, and all of them seem to miss the point: the legal system has just operated in reverse. Rather than preventing the stronger entity from stealing from the weaker, it was actually the means by which the stronger DID the stealing.
Here, so far as I can tell, is what happened: The guy pulled a bunch of PUBLICLY AVAILABLE data from Facebook, connected it in new and interesting ways, offered to sell the product of his hard work to other entities, and then had to delete it all because Facebook got antsy and sued him, and he didn't have the money to defend himself. And of course, Facebook will now take the same ideas, and build up and sell their own datasets.
This is akin to bullies using school rules to steal homework from nerds and turn it in under their own names.
Re: (Score:3, Informative)
It's important to realise that media / arts can be copyrighted, as they are ostensibly physical products (although that tends to include digital media these days) that have been created by someone, so your MP3 can be copyrighted and the rights holder protected.
Your name, address and phone number are NOT copyrightable, because they are not considered artforms with a physical manifestation, they are merely facts.
I am a human male, 42 years old, living in the Philippines. These facts can NOT be copyrighted.
Now app