The Internet AI

RSS Co-Creator Launches New Protocol For AI Data Licensing

A group led by RSS co-creator Eckart Walther has launched a new protocol designed to standardize and scale licensing of online content for AI training. Backed by publishers like Reddit, Quora, Yahoo, and Medium, Really Simple Licensing (RSL) combines machine-readable terms in robots.txt with a collective rights organization, aiming to do for AI training data what ASCAP did for music royalties. However, it remains to be seen whether AI labs will agree to adopt it. TechCrunch reports: According to RSL co-founder Eckart Walther, who also co-created the RSS standard, the goal was to create a training-data licensing system that could scale across the internet. "We need to have machine-readable licensing agreements for the internet," Walther told TechCrunch. "That's really what RSL solves."

For years, groups like the Dataset Providers Alliance have been pushing for clearer collection practices, but RSL is the first attempt at a technical and legal infrastructure that could make it work in practice. On the technical side, the RSL Protocol lays out specific licensing terms a publisher can set for their content, whether that means AI companies need a custom license or to adopt Creative Commons provisions. Participating websites will include the terms as part of their "robots.txt" file in a prearranged format, making it straightforward to identify which data falls under which terms.
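
To illustrate what machine-readable terms in robots.txt could look like to a crawler, here is a minimal sketch in Python. The summary does not spell out the prearranged format, so the `License:` directive name, the example URL, and the `find_license_urls` helper are all assumptions made for illustration, not the protocol's confirmed syntax.

```python
from typing import List


def find_license_urls(robots_txt: str, directive: str = "license") -> List[str]:
    """Return the value of every `<directive>: <value>` line in a robots.txt body.

    The directive name is an assumption; the article only says that terms
    appear in robots.txt "in a prearranged format".
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.strip().lower() == directive and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt body; the URL is a placeholder.
SAMPLE = """\
User-agent: *
Allow: /
License: https://example.com/license.xml  # hypothetical terms file
"""

print(find_license_urls(SAMPLE))
```

A crawler that honored such a directive would fetch the referenced terms before deciding whether it may use the site's content for training. Note that this sketch treats every `#` as the start of a comment, so a URL containing a fragment would be cut short.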

On the legal side, the RSL team has established a collective licensing organization, the RSL Collective, that can negotiate terms and collect royalties, similar to ASCAP for musicians or MPLC for films. As in music and film, the goal is to give licensees a single point of contact for paying royalties and provide rights holders a way to set terms with dozens of potential licensees at once. A host of web publishers have already joined the collective, including Yahoo, Reddit, Medium, O'Reilly Media, Ziff Davis (owner of Mashable and CNET), Internet Brands (owner of WebMD), People Inc., and The Daily Beast. Others, like Fastly, Quora, and Adweek, are supporting the standard without joining the collective.

The RSL Collective includes some publishers that already have licensing deals -- most notably Reddit, which receives an estimated $60 million a year from Google for use of its content as training data. There's nothing stopping companies from cutting their own deals within the RSL system, just as Taylor Swift can set special terms for licensing while still collecting royalties through ASCAP. But for publishers too small to draw their own deals, RSL's collective terms are likely to be the only option.

Comments:
  • How will this benefit the share holders of our trillion dollar corporations?
    • How would it benefit anyone but the share holders of our trillion Dollar corporations?
      • Something I wrote on this in 2001 and posted to gnu.misc.discuss: https://groups.google.com/g/gn... [google.com]
        "... I definitely do not want to see a future world of only proprietary intellectual property where basically everything I want to do requires agreeing to endless licenses and royalty payments, such as described in [Richard Stallman's essay] "right-to-read". ...
        However, on a practical basis, living in our society as it is right now, any software develop

  • the goal was to create a training-data licensing system that could scale across the internet.

    Except people don't want that, only AI companies want that. Specifically, they want a "you didn't say, so we don't have to comply" excuse that will allow them to violate copyright.

    No normal website is going to agree to this bullshit.

    • Not even AI companies want that. They claim to have technology capable of understanding content on the internet, and thus of understanding the license associated with that content, and yet they take everything irrespective of the license.

    • No, indeed.
    • Reminds me of the old pay to spam proposals from the first dot com crash. Economics majors seriously proposed that we should change SMTP and MUAs so that spammers could pay (fractions of a penny) to put unsolicited messages in your inbox guaranteed to bypass the filters. You'd have to open the message to collect the cash. The market knows what's best for you, don't you know.

      I think those guys ended up at Facebook.

      • My memory was that the amount the spammers would have to pay was supposed to be set by the recipient, and they could set the amount they were willing to pay, then if you marked the email not spam they would get the money back, and if not then the recipient would get it.

        • That wouldn't actually reduce the recipient's labour scanning the messages. The reason we have automatic spam filters is precisely because humans doing the scanning is labour intensive. Getting paid for scanning still requires scanning, so remains labour intensive.

          In effect, it was proposing to solve the spam problem by forcing every human being to have a second gig on the side (or pay someone to handle it for them). Either way, the recipients are paying to receive unsolicited messages, with the option of

  • " the RSL team has established a collective licensing organization, the RSL Collective, that can negotiate terms and collect royalties, similar to ASCAP for musicians or MPLC for films"

    Anyone who has looked at organizations like ASCAP/BMI and the like (most countries around the world have similar copyright licensing collectives) can see how well that works out for the average musician/author/whatever.

    The middleman (the collective) takes their cut, then the remainder is doled out using an arbitrary formula th

    • by pjt33 ( 739471 )

      So if a creator can sign up with one of these collectives (a significant feat in itself) then their royalty cheque may be 37 cents at the end of the year.

      You omitted the bit where they've paid a thousand times as much to play their own music in their concerts. Truly an impressive scam.

  • AI is so much smarter and betterer than the human according to its peddlers, let it come up with its own stuff.

  • by oumuamua ( 6173784 ) on Wednesday September 10, 2025 @11:30PM (#65652596)
    In the current skewed capitalist system, Taylor Swift accounted for 1.79% of the entire music market in 2023. https://www.reddit.com/r/Taylo... [reddit.com]
    Now add in a few hundred more superstars and you have the typical case where 1% of the people get 99% of the wealth. Meanwhile the vast majority of authors, musicians, sport players, artists, etc. scrape by on peanuts. Instead of fighting for your peanuts, why not fight to ensure AI ushers in a post-scarcity future?
    • by mccalli ( 323026 )
      Because for music, we're in a post-scarcity future. The world is not short of new music, and the tools for producing it get better and better and better. There's no shortage of people wanting to write, you can reasonably easily self-publish (and on a completely unrelated note...check out my two albums and my singles...)...there's no scarcity here.

      The problem isn't availability. The problem is gaining an audience.
  • by topham ( 32406 )

    Have you ever looked at the specs?

    Not something I'd claim ownership in, even if I had contributed.

  • No, indeed, just because you have a lot of money--it does not mean you can steal everything from everyone, and set your own penalty.
  • We do not need anyone underselling people's legal rights.
  • Now that Google has escaped justice and has no more fu*ks to give, watch them require sites to be in the AI or they don't get in the index. Worse yet, Google will make no demarcation between search and AI. We are basically here already.
  • by madbrain ( 11432 ) on Thursday September 11, 2025 @09:50AM (#65653390) Homepage Journal

    Not at the robots.txt level, as this gives all the licensing power to the entity that owns the server hosting that file.

    Instead, the protocol should allow for AI licensing at the individual document level, so that content creators can choose their terms. This would for example include Youtube videos, blog posts, news articles, social media posts, etc. In other words, micro-licensing.

  • by whitroth ( 9367 )

    Published emails from AI companies' CEOs said "it's just too hard to deal with licensing all those books, use the stolen files."

    You think they'll have their scrapers look at robots.txt?

  • "aiming to do for AI training data what ASCAP did for music royalties"

    That is to say, extort the people consuming the data, keep most of the money, and screw the people who generate the data. I'm not saying there shouldn't be some better definition of how content creators should be compensated by the AI trainers, but modeling after ASCAP is about the worst idea I've heard.

  • Nobody will give a damn about conditions expressed in RSL as long as they deem copyright exemptions to be valid. The standard will become relevant when (or if) someone decides that copyright IS applicable and the crawlers need to obtain a license. Then it will be hard to negotiate a license with every little website, and a standard will be needed. But without anything to enforce obtaining a license, nobody will ask what license conditions you offer.
