Yahoo Releases Largest Ever Machine Learning Dataset To Researchers

An anonymous reader writes: Yahoo Labs has released a record-breaking dataset containing 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data is intended for research initiatives in artificial intelligence, including user-behavior modeling, collaborative filtering techniques and unsupervised learning methods.
  • Garbage in... (Score:2, Insightful)

    by Anonymous Coward
    Garbage out. Enjoy your 1.5TB of crap.
  • Otherwise no access is granted. Which means I'll have to wait a few hours for a torrent to appear, fine...

  • best of intentions, road paved.
  • Holy crap! Yahoo released something actually useful and arguably innovative? I'm genuinely surprised.

    This could be an interesting direction for Yahoo.

    ML is the bee's knees.

    PS: I just looked up the etymology of 'the bee's knees' and it's moderately interesting:
    https://en.wiktionary.org/wiki... [wiktionary.org]

  • by Snotnose ( 212196 ) on Thursday January 14, 2016 @09:56PM (#51304635)
    For the last couple of years I've been hitting their comics page daily; from there I'd sometimes go to finance and then regular news. Last month they nuked the comics page, and when I went to the finance page they had one of those annoying floating opaque ads that want you to click on them to make them go away. No thanks.

    Haven't been to yahoo since. My reasons for going have been either A) removed; or B) made untrustworthy.

    Icing on the cake? For about a week I kept trying to get the comics page, hoping it was a mistake. Then my google newsfeed told me that yahoo had deliberately deleted it. Not yahoo news, google news. Good job, yahoo.
    • Forbes and Yahoo seem to be the leading attack points for virus entry. I consistently read about it, so you might be very lucky.

      And to cite sources:

      Forbes https://www.hackread.com/forbe... [hackread.com]

      and yahoo's https://blog.malwarebytes.org/... [malwarebytes.org]

      SideNote: Yahoo's finance page was considered one of the best until recently (sorry, no source to cite), so I am going to guess that a new attack point will show up in due time.

  • My evil AI machine learning algorithms should have this problem licked post haste.

  • The file is named "how_not_to_build_a_news_site.zip"

    I'm guessing the university email address requirement is because they don't want someone using the data for commercial purposes, and ending up becoming as successful as Yahoo currently is...

    It's nice of them to look out for us like that.

  • How many SOMADs will this dataset create? I shudder to contemplate what pure depravities will be distilled from these "interactions."

  • >> 110 billion interactions from 20 million Yahoo News users in 1.5TB of zipped data. The anonymized data

    Which will be DE-anonymized in 3...2...1...

    • Yeah, I recall when AOL released some anonymized data about 10 years ago, and it was de-anonymized pretty quickly.

