Yahoo Releases Largest Ever Machine Learning Dataset To Researchers (tumblr.com)
An anonymous reader writes: Yahoo Labs has released a record-breaking dataset containing 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data is intended for research initiatives in artificial intelligence, including user-behavior modeling, collaborative filtering techniques and unsupervised learning methods.
Garbage in... (Score:2, Insightful)
Re: (Score:2)
because, you know, typical humans are intellectual powerhouses.
Re: (Score:2)
> because, you know, typical humans are intellectual powerhouses.
Yup - which is why I often suggest to some of our dumber users to go to Yahoo and comment there.
The whole issue with website commentary is that most postings are based on disagreeing with whatever is being commented on. In Yahoo's case, when someone gets shot and killed, probably 90 percent of the comments are against gun control. Very few express any sympathy for the dead person's family.
A negative story about Donald Trump is immediately responded to by supporters who believe that our cons
Re: (Score:2)
Re: (Score:2)
> So you will see a lot of backlash in their comments, NOT just to be contrary, but to show just how out of touch the SJW/Regressive faction of the left is with the mainstream. It's pretty fascinating from a cultural point of view.
You are proving my point. You are seeing and replying to stuff that pisses you off, and using that as a universal attribute.
People with a liberal bent will probably go to conservative-looking articles to troll, and those with a conservative or neocon bent will likewise go to any story that sounds liberal and troll there.
I mostly go to the sports section, particularly NHL ice hockey. As a Pittsburgh Penguins fan, the comments after every story are largely fans of other teams whining
Re: (Score:2)
Only if student or faculty at university... (Score:2)
Otherwise no access is granted. Which means I'll have to wait a few hours for a torrent to appear, fine...
Re: (Score:1)
Troll, incorrect. I'm downloading my first 423MB chunk now. It requires a Yahoo! account, which I just created.
Re:Only if student or faculty at university... (Score:5, Informative)
Wait wait wait... mod me down... it made me sign up, then it made me fill in more forms, then agree to all sorts of EULAs, THEN it demanded a university email address.... Sorry everyone. My download is stopped. And I just corrected the GP, wrongly. Sorry! (ducks and prepares to lose karma)
Re: (Score:2)
At a guess, it's because they've provided the data set for academic research purposes.
It's their data, it's a reasonable restriction and if there isn't already a torrent available with it anyway then I'll be surprised.
Re: (Score:2)
"is intended for research" (Score:2)
Something Useful/Relevant?! (Score:2)
Holy crap! Yahoo released something actually useful and arguably innovative? I'm genuinely surprised.
This could be an interesting direction for Yahoo.
ML is the bee's knees.
PS: I just looked up the etymology of 'the bee's knees' and it's moderately interesting:
https://en.wiktionary.org/wiki... [wiktionary.org]
Re: (Score:2)
>> What kind of news company is Yahoo when they have to make their own press releases on Tumblr.
I think Couric was busy covering the Kardashians' newest pet.
Somewhat outdated (Score:3)
Haven't been to Yahoo since. My reasons for going have been either A) removed or B) made untrustworthy.
Icing on the cake? For about a week I kept trying to get the comics page, hoping its absence was a mistake. Then my Google newsfeed told me that Yahoo had deliberately deleted it. Not Yahoo News, Google News. Good job, Yahoo.
Re: (Score:2)
Forbes and Yahoo seem to be the leading entry points for viruses. I read about it consistently, so you might have been very lucky.
And to cite sources:
Forbes https://www.hackread.com/forbe... [hackread.com]
and Yahoo's https://blog.malwarebytes.org/... [malwarebytes.org]
Side note: Yahoo's finance page was considered one of the best until recently (sorry, no source to cite), so I'm going to guess that a new attack point will show up in due time.
Re: (Score:2)
Um.... that DOES sound like a good idea. Maybe not for Yahoo stockholders, but for anyone who wants free video without those annoying ads...
Not going to be anonymous for long. (Score:2)
My evil AI machine learning algorithms should have this problem licked post haste.
how_not_to_build_a_news_site.zip (Score:2)
The file is named "how_not_to_build_a_news_site.zip"
I'm guessing the university email address requirement is because they don't want someone using the data for commercial purposes, and ending up becoming as successful as Yahoo currently is...
It's nice of them to look out for us like that.
I humbly request the wisdom of the Cube and Xalton (Score:1)
How many SOMADs will this dataset create? I shudder to contemplate what pure depravities will be distilled from these "interactions."
"Anonymized Data" (Score:2)
>> 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data
Which will be DE-anonymized in 3...2...1...
Re: (Score:2)
Yeah, I recall when AOL released some anonymized data about 10 years ago, and it was de-anonymized pretty quickly.