Operating Systems

NetBSD Bans AI-Generated Code (netbsd.org)

Seven Spirals writes: NetBSD committers are now banned from using any AI-generated code from ChatGPT, CoPilot, or other AI tools. Time will tell how this plays out with both their users and core team. "If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution," reads NetBSD's updated commit guidelines. "Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code. Code generated by a large language model or similar technology, such as GitHub/Microsoft's Copilot, OpenAI's ChatGPT, or Facebook/Meta's Code Llama, is presumed to be tainted code, and must not be committed without prior written approval by core."
  • why is this a thing? AI can't write code. no matter how much people who can't write code tell you it can...
    • It does alright copying code it has seen before, but without attribution. That's a pretty good reason to consider anything an LLM generates to be unoriginal and tainted.

      • So wait, would this also apply to anyone getting code from, say, Stack Exchange? Or from some other location where a human wrote it? What about auto-complete code? Most LLM coding at the moment is just glorified auto-complete with a pretty big DB behind it. Additionally, what's the degree of taint here? Would lifting a code example from a UNIX manual and adapting some lines of it to modern style be taint?

        I think the entire point is that LLM is an "unknown" at the moment for how copyright applies to it

        • by aldousd666 ( 640240 ) on Thursday May 16, 2024 @10:32PM (#64478273) Journal
          Yes, it applies to everyone getting code from stackexchange. And it's not about legal or not, per se; it's that some code snippets are so well known they've been quoted in a lot of places on the internet and are likely to appear nearly verbatim in generated snippets with the right prompt. They may be misattributed by that point. NetBSD regularly errs on the side of caution. Whatever your own personal theory about how LLM code turns out in court, it's not worth them taking the untested legal risk at this point in history. They can just do what they usually do, treat unknowns as tainted, and move on. If that changes in the future, then let the people running the place in the future decide.
          • Whatever your own personal theory about how LLM code turns out in court, it's not worth them taking the untested legal risk at this point in history

            That's literally what I said.

            I think the entire point is that LLM is an "unknown" at the moment for how copyright applies to it

            • by Guignol ( 159087 )
              It looks like you are complaining
              The guy just agrees with you, it starts with 'Yes', that was a big hint :)
              Also, you phrase it like it's 'just' what you think, not 100% convinced, happy to be shown otherwise, great.
              So this guy just helps you feel more comfortable I suppose, agreeing with you, saying it a bit differently, bringing some nuance... I don't see why you complain, really.
              • The guy just agrees with you....

                slack_justyb: It's about legal issues.
                aldousd666: It's not about legal issues per se, but rather it's about legal issues.
                slack_justyb: That's what I said.

                I hope that helps.

                • Honestly I think this whole conversation is wasted; we shouldn't have come down from the trees, let my opinion be known.

                  I have learned from this that I need to be a lot less ready to randomly press keys on my keyboard and sign it off as my opinion, when my opinion living solely within my head is good enough.

    • by Pieroxy ( 222434 )
      I just asked ChatGPT the following:

      Can you write the function "String reverseWordsInString(String arg)" in Java 17?

      Here is the result:

      public class StringUtil {

          public static String reverseWordsInString(String arg) {
              if (arg == null || arg.isBlank()) {
                  return arg;
              }

              // Split the string into words using whitespace as the delimiter
              String[] words = a
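      The snippet is cut off there; a plausible completion, assuming the usual split/reverse/join approach (a reconstruction, not necessarily what ChatGPT actually returned), would be:

      import java.util.Arrays;
      import java.util.Collections;

      public class StringUtil {

          public static String reverseWordsInString(String arg) {
              if (arg == null || arg.isBlank()) {
                  return arg;
              }

              // Split the string into words using whitespace as the delimiter
              String[] words = arg.trim().split("\\s+");

              // Reverse the word order in place: Arrays.asList returns a view backed by
              // the array, and Collections.reverse only swaps elements, so words[] is mutated
              Collections.reverse(Arrays.asList(words));

              // Join the reversed words back together with single spaces
              return String.join(" ", words);
          }
      }

      Collections.reverse works here because Arrays.asList returns a fixed-size list view backed by the array, and reversing only swaps elements rather than resizing.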
      • Did ChatGPT tell you where it copied that from? And the license it was published under (which could be "no re-use without BigCorp prior written permission"), and ChatGPT has already illegally copied it.

        As NetBSD say, this is presumed "tainted" until proven clean. As the person submitting it, you are the one required to prove it clean. Not them. It's your job, as the submitter.

        I struggle to see where the difficulty is, unless you object to actually doing your job as a code-submitter.

        • Every possible combination of keywords has probably been written by someone. Variable names are probably the only difference. This is like melody. Every possible melody (in a single octave) has been created https://www.vice.com/en/articl... [vice.com]

          Unless your AI is writing full programs, having it write basic boilerplate functions shouldn't taint anything. Otherwise I submit that all code is owned by whoever has the largest git repo.

          • I can't say I understand the musical analogy, but for any particular language you should be able to work out the number of possible programmes for a set code length. Elimination of duplicates would be the hardest bit, but the branching of syntax diagrams quickly gets to pretty large numbers of possible programmes. "Large" as in, combinatorial numbers rather than science's small number exponential notation.

            Regardless of which, a significant part of your job as code-submitter remains to prove that the code y

        • by Pieroxy ( 222434 )
          I was responding to a comment claiming that AI can't write code. They do.

          Stop reframing the discussion to your own question.
          • Prove that whatever AI you used wrote that, as opposed to copying it from elsewhere. Which seems to be precisely the point you want to emphasise, but you're missing the point of proving it. I assert that your example is actually copied from somewhere. Prove me wrong.

            That's "prove" in the mathematical sense - not the limp version of the word that lawyers use.

            • by Pieroxy ( 222434 )

              Asking me to prove some piece of code cannot be found anywhere is disingenuous at best. Proving something doesn't exist is impossible, and I suspect you know it. Nice trolling.

              Why don't you provide an example of ChatGPT spouting more than a line of text verbatim from anywhere? If what you claim is true, it should be trivial. And countless lawsuits would have been filed by now.

              What I suggest is for you to go play half an hour with ChatGPT. While severely limited in many aspects, it is clearly more than what you suspe

              • I have no idea how to log into ChatGPT and even less interest in giving it my details to sell. I don't see any reason to use it.

                As in previous decades, the option exists of composing your own code, asserting that it is your own composition, making that declaration and submitting it as code for an Open Source project under those terms. If it's your composition, you're OK; if you're a thieving liar (viz. someone who uses AI and claims it as their own work), then you've every chance of being proved wrong, in

                • by Pieroxy ( 222434 )

                  You really don't get it. ChatGPT is free. If you want to test it, just do. You won't give details to anyone. Free emails are a dime a dozen these days. But if you have no interest in it, shut the fuck up and don't give your opinion as to the quality of its responses as you have never ever tried it. You don't know.

                  I'm not advocating its use. I'm not saying it's great. I was just answering a comment claiming it cannot write code. It can and it does.

                  How about you stick to posting about stuff you have at least

                  • ChatGPT is free.

                    As the saying goes - and has for decades - if the product is free, then YOU are the product. And when they've got you addicted to it, the price is going to start going up. See also "enshittification" [wiktionary.org].

        • In one of those attacks of hilarious coincidence the internet is prone to, while opening several tabs to reply to these Slashdot comments, another tab was following a different (vague) interest and exposed a "faker" of a content writer:

          They're very easy to apply, requiring only a thorough degreasing before application. For fun I compared the finish from two Brownell's cold bluing products: You did not provide any text to rewrite. Please provide the text to proceed. Dicropan T-4

          (from t [cnccookbook.com]

  • by Mr. Dollar Ton ( 5495648 ) on Thursday May 16, 2024 @09:46PM (#64478199)

    The code you copy-pasted from the glorified databases of scraped examples on the web that are called "AI" is quite obviously admissible; it just needs to undergo some additional scrutiny. And there's nothing wrong with this approach.

    • This is almost certainly more about legal issues. Not all github code is licensed to copy, but if AI reproduces it too closely, that could have a catastrophic effect on its destination if the lawyers sniff around too closely.

      • Yes, as it is spelled out in the summary already.

      • if the lawyers sniff around too closely.

        I think you mean

        if the lawyers do the job they're paid to do

        A subtle difference, I grant, and I'm quite cynical about lawyers in general. But in this case, that is their job, and they're doing it.

        Unlike people who post tainted code whose license they have chosen to not investigate and determine if it's acceptable to the place the code is being submitted to.

    • Envision the SCO lawsuit being redone today after LLM-generated code has entered the kernel.

      If you remember what happened then, they found ludicrous examples of code copying that were easily disposed of. Imagine what they might find if LLM-generated code were rampant in the space.

      • Well, that's the risk of scraping shit from the Internet, when you could have used your documentation, properly licensed code, and your brain.

  • by LindleyF ( 9395567 ) on Thursday May 16, 2024 @09:50PM (#64478203)
    Auto-complete is getting really good. If I need to make similar but not identical changes in a dozen places, it figures out what I'm doing after the third or so. That's AI too, but no one would complain about it.
    • by AmiMoJo ( 196126 )

      That's just algorithmic, not a large language model.

      The core issue is that AI generated code tends to ignore the licence of the source material, so it's hard to know if it just copied a large chunk that can't be incorporated under the BSD licence.

      • The core issue is that AI generated code tends to ignore the licence of the source material, so it's hard to know if it just copied a large chunk that can't be incorporated under the BSD licence.

        So does the human mind. Did I just come up with Hello World by myself, or did I inadvertently "remember" it from code published elsewhere? That is the question. There are only so many ways of coming up with a solution to a problem. You will likely "invent" something that someone else has already "invented". You will likely "invent" something that you actually "remembered" seeing somewhere.

        If you ask me to write a Hello World bit of code now, I won't be able to come up with something that isn't already in a

        • by AmiMoJo ( 196126 )

          Microsoft Copilot often reproduces code verbatim, with the same variable names, the same comments, even the copyright and licence notice.

          • If you can reproduce verbatim code then you can reproduce the attribution. Not all LLMs work like that.

        • by Bongo ( 13261 )

          A line has to be drawn somewhere, otherwise you could spend years developing a work only for it to be copied and sold before you get any credit, obviously. Another angle: why should you be paid for any of your work when you were simply taught everything you know by society? We're always both individuals and a collective. Also, OpenAI et al. are profiting from copying everything, so they as individual orgs profit from the collective. So it's a question of where to draw the lines.

          • So what I'm getting from your post is that AI, such as a LLM, is basically just massive copyright infringement and should be therefore banned. Is that correct?

            • by HBI ( 10338492 )

              I'm sure the lawyers had precisely this conversation and were banking on making the argument that sufficient transformation was happening that it wasn't infringement, but rather fair use.

              I doubt it.

      • The autocomplete I'm using invents class methods just often enough that I'm pretty sure there's an LLM back there somewhere.
    • About your comment "too broad": an argument I read on LWN when Gentoo banned AI is that such a ban is intended to streamline the rejection process for when it is needed (low-quality code), while a good-faith actor can continue to use auto-complete without problems:

      "If it's allowed [to use AI code generation] provided the quality is good enough, you [Gentoo moderators] could end up spending way too much time arguing with bad-faith actors about whether the contributions they submitted are good enough or not. Whereas bei

  • by jenningsthecat ( 1525947 ) on Thursday May 16, 2024 @10:58PM (#64478297)

    It strikes me that LLM output is an environmentally persistent, often toxic part of the modern Web. Like micro-plastics and PFOAs, it's showing up everywhere, including in unexpected places. While it's useful and beneficial in some ways, it's also clearly dangerous and damaging in other ways. And it's appearing in computer code, literature, music, visual arts, scientific papers, legal documents, probably government legislation, and all sorts of other places.

    And like micro-plastics, once it's there, it's often difficult to detect, and it's going to be a bugger to get rid of. We knew better - or should have - yet we've let it contaminate just about everything anyway. I think the net result is going to be the kind of 'interesting' that most sane people don't welcome.

    I know this analogy is far from exact, but I think it may be a useful way of looking at the consequences of letting this particular genie out of the bottle.

    • I figure either they get sued out of existence for copyright infringement or anything they generate cannot be copyright. Either one seems like a win to me, given our draconian copyright laws.

  • I wonder how difficult it will be to tell apart NI-generated code from AI's.

    I bet it can (and will) be pretty hard.
    Any idea?

  • by vbdasc ( 146051 ) on Friday May 17, 2024 @01:20AM (#64478425)

    It's not a ban, just a friendly reminder that the NetBSD project views matters of copyright seriously and responsibly, and an appeal to the developer community to adhere to the same standards.

    • It's not a ban, just a friendly reminder that the NetBSD project views matters of copyright seriously and responsibly, and an appeal to the developer community to adhere to the same standards.

      About as seriously as all those developers stealing movies, music, and other software, right?

      • That's not stealing, it is copyright infringement. Also, not sure anyone really cares if you infringe on a copyright if you aren't redistributing stuff. Download a movie? Nearly impossible you'll get noticed or held accountable. Start redistributing those same files? Much different.

        It's kind of like drug possession versus being a dealer. Without a traffic stop (essentially an illegal search..) vast majority of drug users are never getting busted but someone actively distributing looks significantly differen

    • Re:Not a ban

      by HiThere ( 15173 ) on Friday May 17, 2024 @07:54AM (#64478771)

      I think it's effectively a ban at this point. After a few court cases are decided that may change. At the moment it's just playing safe...which is normal NetBSD activity.

  • I guess even HelloWorld.c is tainted.

    • by HiThere ( 15173 )

      I'm not sure. Copyright isn't supposed to apply to functionally required elements. And I don't think "hello" can be copyrighted.
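      For what it's worth, the canonical C hello world is essentially dictated by the language and the task; a minimal sketch, not copied from any particular source:

      #include <stdio.h>

      int main(void) {
          /* About the only room for creative expression here is the greeting text and the whitespace */
          printf("Hello, world!\n");
          return 0;
      }

      Any two independently written versions will differ by little more than that, which is why it's hard to see a copyrightable "original work" in it.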

  • Somebody has to hold the line. Where are new mid-level architects going to come from if they don't start out writing code from scratch? AI seems, at first glance, to be able to write code that looks plausible, but it never works "out of the box"; it always needs tweaking and debugging, because AI doesn't have an understanding of configuration, thus far anyway. At least that's what I've been able to get from it. To me it seems like a supercharged search engine, definitely an order of magnitude shift, but far from being a
    • It's only a matter of time before we "nail" merging AI and robotics to replace most field work. Sure, there will always be edge cases but we will probably shift to a more modular way of doing those edge cases specifically so they can be addressed by our future AI+robotics.

      Going to be quite interesting to see how people learn to become experts when most of the junior stuff is automated away.

  • by cascadingstylesheet ( 140919 ) on Friday May 17, 2024 @06:00AM (#64478629) Journal

    This is getting to be like those non-compete agreements, where supposedly a company can keep you from going elsewhere and using that Java or C or php special sauce that, er, they supposedly taught you? That they invented?

    You know, oh so proprietary and secret stuff ...

  • I'm a Microsoft Evangelist but I am concerned that in 2026 or sometime soon, Microsoft will claim that CoPilot output is copyrighted... and in 2046 the Gen AI industry could be crippled when the lawsuit finally runs its course.
  • "If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution,"

    Seems like the major concern is copyright, which is valid. That would seem to imply that code assistants that indemnify users against copyright issues with whatever is generated would be OK? E.g., IBM's Code Assistants?

    (Full disclosure: I currently work for IBM)

  • Everyone knows that developers copy code from "less privileged" code contributors. It's even encouraged, the reasoning being that there is no oversight or way to prove from the outside that the poacher ever even saw the code. That's it. So expect this AI question to resolve the same way in the end. Some metaphor about lying in the bed one has made comes to mind.....
  • by glum64 ( 8102266 )
    About a year ago I and a colleague of mine at work sat down to discuss a decision we had to make. Before getting to business we indulged in some small talk that was not related to the issue at hand. My colleague was very enthusiastic about AI. Jokingly, I proposed that he ask ChatGPT to solve the issue that we were about to discuss. Mind you, we both knew the correct answer to the problem (a boring access control thing). ChatGPT proudly presented us with as many as four answers. All different. All equally
  • Better ban people who get code from Google searches, Reddit forums, and any other random forum out there. The conceit of this is incredible. No one writes completely unique code. Bits of code are all "stolen" from someone and reassembled into something new.

    Besides, how the hell are they going to police such a ridiculous decree?

  • Do compiler optimizations count as AI-generated code?

  • I use CoPilot to assist in coding. It does a decent job of picking up on boilerplate stuff and I appreciate it when it gives me a decent tip too.

    I do have VSCode set to alert me if CoPilot copies code from an OSS project. It's alerted me of it exactly once. It gave me a line of code from an open source project.

    That I wrote.

    And it was the one I was working on.
