Social Networks AI

Bluesky Says It Won't Train AI On Your Posts

Bluesky, the social network surging in popularity, says it has "no intention" of training AI tools on users' content. "The social network made the announcement on the same day that X (formerly Twitter) is implementing its new terms of service that allow the platform to use public posts to train AI," notes TechCrunch. From the report: "A number of artists and creators have made their home on Bluesky, and we hear their concerns with other platforms training on their data," Bluesky said in a post on its app. "We do not use any of your content to train generative AI, and have no intention of doing so." The company went on to note that it uses AI internally to help with content moderation and that it also uses the technology in its "Discover" algorithmic feed. However, Bluesky says "none of these are Gen AI systems trained on user content."

  • by Baron_Yam ( 643147 ) on Friday November 15, 2024 @07:27PM (#64949211)

    Everyone else with a scraper will train their AI on your posts, and good luck catching them at it. And if you do, good luck trying to reverse that or get a judgement against them.

    Anything you post on the Internet as an individual is available for a corporation to steal and there is almost nothing you can do about it. And they WILL steal it.

    • by dfm3 ( 830843 )
      If it can be seen it can be copied, yes, but there's a world of difference between someone scraping a service, and that service using your raw data via the back end along with all the additional metadata and account data that's associated with it.

      FWIW, there's been a mass exodus of artists and creative types from X/Twitter over to Bsky lately, along with their followers, causing its user numbers to grow quickly from six digits, past a million, and now into the tens of millions in just a few short months, and
      • The piece that is currently missing and needs to be created by the community is a way to automate DMCA takedowns on AI owning corporations. The tool needs to communicate with the AI, save the output of the interaction, and run it against a community database of artworks and forum comments with attributions. If a close enough match is found to suspect infringement, the artist is automatically alerted and a DMCA takedown is generated, ready to be sent out if and only if the artist gives the go-ahead.

        This AI
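The matching step described in this comment could be sketched roughly as follows. This is a minimal illustration only, assuming a tiny in-memory database of text and plain string similarity from Python's standard `difflib`. The names `KNOWN_WORKS`, `find_suspected_matches`, and `draft_notice`, the sample texts, and the 0.8 threshold are all hypothetical, and nothing here covers images or the step of querying an AI service.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical community database: attribution -> text of the work.
KNOWN_WORKS = {
    "alice": "The quick brown fox jumps over the lazy dog near the river bank.",
    "bob": "Public posts are public, and scrapers will read them sooner or later.",
}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two texts, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_suspected_matches(
    model_output: str, works: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """List (author, score) pairs at or above the threshold, best match first."""
    scored = ((author, similarity(model_output, text)) for author, text in works.items())
    hits = [(author, score) for author, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def draft_notice(author: str, score: float, approved: bool) -> Optional[str]:
    """Produce a draft notice only when the artist has given the go-ahead."""
    if not approved:
        return None
    return f"Draft takedown notice for {author}: similarity {score:.2f}"
```

The threshold is the hard part: set it too low and incidental similarity produces false accusations, set it too high and paraphrased output slips through. Keeping the artist's approval as an explicit argument mirrors the "if and only if the artist gives the go-ahead" condition.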

        • That is a brilliant idea - automate search for infringement. But have you thought what would happen if you do that? Every text written anywhere could be AI generated, we don't know. So we'd have to test all your messages as well, you might be secretly infringing copyright with AI. But that means you could also be falsely accused by incidental similarity. Prepare to get sued or forever shut up from now on!
      • by Bongo ( 13261 )

        I have the impression that there is always a mass exodus to a new service. Only time will tell.

      • Don't you agree it is silly to accuse LLMs of infringement? They are the worst tool possible for that task. Not large enough to properly memorize, expensive to use for long-form texts, and we already have the internet for exact copying, and it's already faster and free.
      • ATProto is a fully open protocol. All the data in the network can be downloaded. It's not even hard - just run an ATProto relay, and every PDS will push all their data at you in realtime.

        They're looking into ways to try to add protocol-level privacy, but it's nontrivial due to the distributed architecture. But I suspect they'll get it eventually.

    • Anything you post on the Internet as an individual is available for a corporation to steal and there is almost nothing you can do about it. And they WILL steal it.

      While I get that it's popular here to bash the evil corporations for any reason you can fathom, even when it doesn't make sense to do so, I can't help but wonder...this is stealing...how...? Reminds me of stuff like this:

      https://www.youtube.com/watch?... [youtube.com]

      You know you're an asshole when, any time somebody does something that pisses you off, you feel you need to invent a law against it. Sure, I'm not a fan of relying on AI models that rely on data harvested from shitposting trolls like me. But at the same time

    • Fully expecting them to change their terms of service to allow AI training on your data once they get a large enough data training set.

    • "Steal" is the wrong word. What they do is load all the text in the world and encode it in the parameters of a model. But the model is 1000x smaller, so it can only record abstract patterns, not exact books. Each text generates a small update, and the updates add up, so an idea like "water is wet" comes from a million sources and you can't tell which one influenced the model most. When you use the model, your own words and prompt will steer it in such a way as to produce novel combinations of concepts,
    • I do not think a scraper/spider knows my username and/or password.

    • By this point, social media platforms, blog platforms, etc., are awash with GPT LLM generated content. The trick now is to try to distinguish between people who post human generated content versus that of GPT LLMs to build future models. One will be more valuable than the other. I'll leave you to guess which.

      Once more, we see that "content is king" & whoever owns the biggest, highest quality libraries of content will have the most valuable assets.
    • What's to stop anyone from scraping Slashdot? Seems to me like there's more "meat" here.

  • If the posts are visible - they will be scraped and used for any purpose whatsoever... including training AI. Maybe they are saying that nobody can see posts made on BlueSky yet....
    • They're not even saying that. They're only saying that *they* won't do it or facilitate it.

      (Not that they have the resources to do so anyway - it's a small team with a small budget)

      • Eh, not too small. bsky is Jack Dorsey's pet project, and Jack got a *lot* of money off Musk's goonish acquisition of Twitter.

        It's not monster-scale social media level financing, but it's not nothing either. It's well funded.

        • by Rei ( 128717 )

          Jack is not involved with Bluesky in any way, shape or form. This is a common misconception.

          He started a chatroom to discuss a new protocol (several hundred devs took part; Jack didn't) and gave seed funding for the winning RFP (which happened to be Jay's). The initial plan was that it would be run inside Twitter but Jay negotiated to retain majority control as an independent entity. Jack gained one board seat (of three). However, he quickly discovered that most people at Bluesky at the time didn't like

  • Partial truth? (Score:5, Insightful)

    by GoJays ( 1793832 ) on Friday November 15, 2024 @08:06PM (#64949311)
    While Bluesky says it won't train AI with your posts... it doesn't mean Bluesky won't sell your data to companies that WILL use it to train AI. Technically Bluesky isn't the company training the AI, so it is true when they say, "Bluesky has no intention of using user data to train AI."
  • This must be in the TOS or it doesn't count. While we're at it, make sure that when the TOS changes to allow AI, it's opt-in.
  • by Mr. Dollar Ton ( 5495648 ) on Friday November 15, 2024 @11:04PM (#64949539)

    How is it better than Mastodon, which is completely free, federated and not reliant on any one company? And how does it get traction here?

    • by Cyberax ( 705495 ) on Saturday November 16, 2024 @03:49AM (#64949797)
      Bluesky is also completely free, with a protocol that is better than Mastodon, federated, and doesn't rely on any company. The biggest advantage over Mastodon: opt-in moderation and curated channels.

      You can run the entire Bluesky stack locally, if you want. And it will work just fine, interoperating with the bluesky.social
      • How is it "better"? And how come all media outlets keep talking about that one instance run by a private company that has or has had some rock'n'rolling with the fucktard from twitter, whatever his name was, the one before rocket man?

        • by Cyberax ( 705495 )
          The AT Protocol ( https://atproto.com/ [atproto.com] ) is just better designed to deal with large-scale data delivery. It supports proper streaming from the start, and crucially, it supports federated moderation ("labeling"). And it's not linked to one instance, so you can mix-and-match content and moderation.

          Right now, the Bluesky.social server is the largest one, but you absolutely can host it on your own. You actually can even use the official Bluesky client apps, including the web app, with your private PDS server.
    • My guess is: catchier name.

  • I hadn't realized one had to subscribe to the political philosophy of the owner in order to post on a social media forum.
    • I've been on Bluesky for over a year, and I still don't know what Jay's "political philosophy" is, and nor do I care. She never talks politics. If you want a CEO obsessed with US politics, go to Twitter.

      • > I've been on Bluesky for over a year, and I still don't know what Jay's "political philosophy" is, and nor do I care. She never talks politics. If you want a CEO obsessed with US politics, go to Twitter.

        You just unintentionally and unironically made my point. But tell me this, seeing as billg is making a bundle out of selling a vaccine to Africans that is banned in the West because of its mortality rate. Shouldn't we all boycott the Microsoft product?
        • I can't even tell your goal in this conversation. You claimed that you have to subscribe to the owner's ideology to use Bluesky, which makes no sense when *the owner never talks about ideology at all*.

          I'd say most *users* are on the left, and that might mean that a conservative feels they're outnumbered in an argument, or get blocked a lot. And Bluesky's moderation is based on labels, and users can choose to subscribe to whatever labelers they want to block whatever type of content they want, so if that p

          • (Also, I'd add, with the increasing influx from ex-Twitter, the mean ideology is moving more toward the centre. Though again, the *mean* is left of centre, IMHO - which should be unsurprising, given ex-Twitter's shift to the far right)

            • Great point, but I'll hazard a guess that the mean position is not the same as the median position, and the gap may continue to widen. Outliers y'know.

              Then again, maybe the outliers will stay on Twitter/X.

        • > I've been on Bluesky for over a year, and I still don't know what Jay's "political philosophy" is, and nor do I care. She never talks politics. If you want a CEO obsessed with US politics, go to Twitter.

          You just unintentionally and unironically made my point.

          Did Rei actually do that? Let's review your point:

          I hadn't realized one had to subscribe to the political philosophy of the owner in order to post on a social media forum.

          [Emphasis mine.] That's not what Rei said. Not at all. S/he said if you want to be on a platform with a politically-outspoken CEO, join Twitter/X.

          But tell me this, seeing as billg is making a bundle out of selling a vaccine to Africans that is banned in the West because of its mortality rate. Shouldn't we all boycott the Microsoft product?

          And now I'm sure you're a troll. [apnews.com]

  • Hopefully this won't just devolve into another Twitter dominated by piss poor governance and greed.

    • If it does, we will move on to a new thing. And they are normalizing domain-based usernames (if you want a username that ends in something other than bsky.social, you need to either put a file on a web [v]host or add a DNS record, and then you can be named after a FQDN) so hopefully other sites will take that up and then it will be easier to find people when that happens.
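The two verification routes mentioned here (a file on a web host or a DNS record) can be sketched as below. This is a rough illustration, assuming the AT Protocol convention of a TXT record at `_atproto.<handle>` holding `did=<DID>` and a plain-text file at `/.well-known/atproto-did`. The function names and the example DID are made up for illustration, and no network lookup is performed.

```python
from typing import Optional


def dns_record_name(handle: str) -> str:
    """Name of the DNS TXT record that would hold the account's DID."""
    return f"_atproto.{handle}"


def well_known_url(handle: str) -> str:
    """URL of the file-based alternative, served by the handle's web host."""
    return f"https://{handle}/.well-known/atproto-did"


def parse_txt_value(value: str) -> Optional[str]:
    """Extract the DID from a TXT record value such as 'did=did:plc:...'."""
    value = value.strip().strip('"')
    if value.startswith("did="):
        did = value[len("did="):]
        if did.startswith("did:"):
            return did
    return None  # not an AT Protocol handle record


# Hypothetical example values; "did:plc:example123" is not a real account.
print(dns_record_name("example.com"))
print(well_known_url("example.com"))
print(parse_txt_value('"did=did:plc:example123"'))
```

Either route proves control of the domain, which is why the handle can be any FQDN the user owns rather than a name under bsky.social.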

  • Sure we believe you...... wink wink !
  • As long as they don't lock down the APIs, the posts will be scraped regardless.

    Do you think the Fediverse with all its ideals about such things is safe? You can just tap into the global stream and get everything your instance sees. And there is no way to lock that down that doesn't affect users. And no way at all to lock it down against instance admins that doesn't prevent federation itself completely.

    Just get used to it. Public posts are public and you probably should worry less about a part of them being mashe

  • in a few years when it's valuable enough
  • How hard would it be to create a "related entity" called We Are Not BlueSky, and give them access to all your data to train AI?
  • I think that's the operative word. At some future time, a pointy hair will discover that they can squeeze a few dollars from monetizing "their" data.
  • If I'm posting on a publicly visible site, why would I give a shit if my posts are being read by some AI bot? I have fairly good grammar and punctuation skills. Maybe I can teach it something.

    What I do care about is not participating on a platform that lends influence to Elon Musk. That bootlicker's personal propaganda platform needs to shrivel and die.
