NetBSD Bans AI-Generated Code (netbsd.org)
Seven Spirals writes: NetBSD committers are now banned from using any AI-generated code from ChatGPT, CoPilot, or other AI tools. Time will tell how this plays out with both their users and core team. "If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution," reads NetBSD's updated commit guidelines. "Check with the author(s) of the code, make sure that they were the sole author of the code and verify with them that they did not copy any other code. Code generated by a large language model or similar technology, such as GitHub/Microsoft's Copilot, OpenAI's ChatGPT, or Facebook/Meta's Code Llama, is presumed to be tainted code, and must not be committed without prior written approval by core."
what they're calling "AI" today can't write code (Score:1)
Re: what they're calling "AI" today can't write co (Score:3, Insightful)
It does alright copying code it has seen before, but without attribution. That's a pretty good reason to consider anything an LLM generates to be unoriginal and tainted.
Re: (Score:2)
So wait, would this also apply to anyone getting code from, say, Stack Exchange? Or from some other location where a human wrote it? What about auto-complete code? Most LLMs used for code at the moment are just glorified auto-complete with a pretty big DB behind them. Additionally, what's the degree of taint here? Would lifting a code example from a UNIX manual and adapting some lines of it to modern style be taint?
I think the entire point is that LLM is an "unknown" at the moment for how copyright applies to it
Re: what they're calling "AI" today can't write co (Score:4, Informative)
Re: (Score:2)
Whatever your own personal theory about how LLM code turns out in court, it's not worth them taking the untested legal risk at this point in history
That's literally what I said.
I think the entire point is that LLM is an "unknown" at the moment for how copyright applies to it
Re: (Score:2)
The guy just agrees with you, it starts with 'Yes', that was a big hint
Also, you phrase it like it's 'just' what you think, not 100% convinced, happy to be shown otherwise, great.
So this guy just helps you feel more comfortable I suppose, agreeing with you, saying it a bit differently, bringing some nuance... I don't see why you complain, really.
Re: (Score:3)
The guy just agrees with you....
slack_justyb: It's about legal issues.
aldousd666: It's not about legal issues per se, but rather it's about legal issues.
slack_justyb: That's what I said.
I hope that helps.
Re: (Score:2)
Honestly I think this whole conversation is wasted, we shouldn't have come down from the trees my opinion be known.
I have learned from this that I need to be a lot less ready to randomly press keys on my keyboard and sign it off as my opinion, when my opinion living solely within my head is good enough.
Re: (Score:3)
Can you write the function "String reverseWordsInString(String arg)" in Java 17 ?
Here is the result:
public class StringUtil {
    public static String reverseWordsInString(String arg) {
        if (arg == null || arg.isBlank()) {
            return arg;
        }
        String[] words = a
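The listing above is cut off in the source. As a minimal sketch of what a function with that signature might look like (an illustration only, not the output the commenter actually received), assuming words are separated by whitespace and that runs of whitespace may collapse to single spaces in the result:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reconstruction for illustration; only the class name, method
// signature, and null/blank guard come from the comment above.
final class StringUtil {
    private StringUtil() {}

    // Returns the words of arg in reverse order, joined by single spaces.
    // Null and blank inputs are returned unchanged.
    static String reverseWordsInString(String arg) {
        if (arg == null || arg.isBlank()) {
            return arg;
        }
        // strip() drops leading/trailing whitespace so split() yields no empty tokens.
        List<String> words = Arrays.asList(arg.strip().split("\\s+"));
        Collections.reverse(words);
        return String.join(" ", words);
    }
}
```

For example, StringUtil.reverseWordsInString("hello big world") gives "world big hello". Whether original spacing should be preserved is a design choice the truncated comment does not settle; this sketch normalizes it.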
Re: (Score:3)
As NetBSD say, this is presumed "tainted" until proven clean. As the person submitting it, you are the one required to prove it clean. Not them. It's your job, as the submitter.
I struggle to see where the difficulty is, unless you object to actually doing your job as a code-submitter.
Re: (Score:2)
Every possible combination of keywords has probably been written by someone. Variable names are probably the only difference. This is like melody. Every possible melody (in a single octave) has been created https://www.vice.com/en/articl... [vice.com]
Unless your AI is writing full programs, having it write basic boilerplate functions shouldn't taint anything. Otherwise I submit that all code is owned by whoever has the largest git repo.
Re: (Score:2)
I'd argue that every possible melody has been written by a person. That article shows someone doing it with an algorithm, but that is still them creating music using tools. A synth, a computer, a guitar, still them creating music.
Re: (Score:2)
Regardless of which, a significant part of your job as code-submitter remains to prove that the code y
Re: (Score:2)
Stop reframing the discussion to your own question.
Re: (Score:2)
That's "prove" in the mathematical sense - not the limp version of the word that lawyers use.
Re: (Score:2)
Asking me to prove some piece of code cannot be found anywhere is disingenuous at best. Proving something doesn't exist is impossible, and I suspect you know it. Nice trolling.
Why don't you provide an example of ChatGPT spouting more than a line of text verbatim from anywhere? If what you claim is true it should be trivial. And countless lawsuits would be cast by now.
What I suggest is for you to go play half an hour with ChatGPT. While severely limited in many aspects, it is clearly more than what you suspe
Re: (Score:2)
As in previous decades, the option exists of composing your own code, asserting that it is your own composition, making that declaration and submitting it as code for an Open Source project under those terms. If it's your composition, you're OK ; if you're a thieving liar (viz, someone who uses AI and claims it as their own work), then you've every chance of being proved wrong, in
Re: (Score:2)
You really don't get it. ChatGPT is free. If you want to test it, just do. You won't give details to anyone. Free emails are a dime a dozen these days. But if you have no interest in it, shut the fuck up and don't give your opinion as to the quality of its responses as you have never ever tried it. You don't know.
I'm not advocating its use. I'm not saying it's great. I was just answering a comment claiming it cannot write code. It can and it does.
How about you stick to posting about stuff you have at least
Re: (Score:2)
As the saying goes - and has for decades - if the product is free, then YOU are the product. And when they've got you addicted to it, the price is going to start going up. See also "enshittification" [wiktionary.org].
Re: (Score:2)
(from t [cnccookbook.com]
Not "banned", submittard. (Score:3, Informative)
The code you copy-pasted from the glorified databases of scraped examples on the web that is called "AI" is quite obviously admissible, it just needs to undergo some additional scrutiny. And there's nothing wrong with this approach.
Re: (Score:2)
This is almost certainly more about legal issues. Not all github code is licensed to copy, but if AI reproduces it too closely, that could have a catastrophic effect on its destination if the lawyers sniff around too closely.
Re: (Score:2)
Yes, as it is spelled out in the summary already.
Re: (Score:2)
I think you mean
if the lawyers do the job they're paid to do
A subtle difference, I grant, and I'm quite cynical about lawyers in general. But in this case, that is their job, and they're doing it.
Unlike people who post tainted code whose license they have chosen to not investigate and determine if it's acceptable to the place the code is being submitted to.
Actually, there is a problem (Score:2)
Envision the SCO lawsuit being redone today after LLM-generated code has entered the kernel.
If you remember what happened then, they found ludicrous examples of code copying that were easily disposed of. Imagine what they might find if LLM-generated code was rampant in the space?
Re: (Score:2)
Well, that's the risk of scraping shit from the Internet, when you could have used your documentation, properly licensed code and your brain.
Too broad (Score:3)
Re: (Score:3)
I use codeium every single day at work. It's intelligent auto complete. It knows what I'm likely to do next, uses the variable names I suggested, understands the purpose of the code (somewhat), writes my docstrings, and even 'refactors' functions when asked. If people want to argue that the resulting code is stolen, then all code is stolen and we might as well stop writing code.
Re: (Score:2)
That's just algorithmic, not a large language model.
The core issue is that AI generated code tends to ignore the licence of the source material, so it's hard to know if it just copied a large chunk that can't be incorporated under the BSD licence.
Re: (Score:3)
The core issue is that AI generated code tends to ignore the licence of the source material, so it's hard to know if it just copied a large chunk that can't be incorporated under the BSD licence.
So does the human mind. Did I just come up with Hello world by myself, or did I inadvertently "remember" it from code published elsewhere? That is the question. There are only so many ways of coming up with a solution to a problem. You will likely "invent" something that someone else has already "invented". You will likely "invent" something that you actually "remembered" seeing somewhere.
If you ask me to write a Hello World bit of code now, I won't be able to come up with something that isn't already in a
Re: (Score:2)
Microsoft Copilot often reproduces code verbatim, with the same variable names, the same comments, even the copyright and licence notice.
Re: (Score:2)
If you can reproduce verbatim code then you can reproduce the attribution. Not all LLMs work like that.
Re: (Score:2)
A line has to be drawn somewhere, otherwise you could spend years developing a work, only for it to be copied and sold before you get any credit, obviously. Another angle: why should you be paid for any of your work when you were simply taught everything you know by society? We're always both individuals and a collective. Also, OpenAI et al. are profiting from copying everything, so they as individual orgs profit from the collective. So it's a question of where to draw the lines.
Re: (Score:2)
So what I'm getting from your post is that AI, such as an LLM, is basically just massive copyright infringement and should therefore be banned. Is that correct?
Re: (Score:2)
I'm sure the lawyers had precisely this conversation and were banking on making the argument that sufficient transformation was happening that it wasn't infringement, but rather fair use.
I doubt it.
Re: Too broad (Score:2)
Re: (Score:3)
About your comment "too broad": an argument I read on LWN when Gentoo banned AI is that the ban is intended to streamline the rejection process when it is needed (low-quality code), while a good-faith actor can continue to use auto-complete without problems:
"If it's allowed [to use AI code generation] provided the quality is good enough, you [Gentoo moderators] could end up spending way too much time arguing with bad-faith actors about whether the contributions they submitted are good enough or not. Whereas bei
The abstract equivalent of 'Forever Chemicals'? (Score:5, Insightful)
It strikes me that LLM outputs are an environmentally persistent, often toxic part of the modern Web. Like micro-plastics and PFOAs it's showing up everywhere, including in unexpected places. While it's useful and beneficial in some ways, it's also clearly dangerous and damaging in other ways. And it's appearing in computer code, literature, music, visual arts, scientific papers, legal documents, probably government legislation, and all sorts of other places.
And like micro-plastics, once it's there, it's often difficult to detect, and it's going to be a bugger to get rid of. We knew better - or should have - yet we've let it contaminate just about everything anyway. I think the net result is going to be the kind of 'interesting' that most sane people don't welcome.
I know this analogy is far from exact, but I think it may be a useful way of looking at the consequences of letting this particular genie out of the bottle.
Re: (Score:2)
I figure either they get sued out of existence for copyright infringement or anything they generate cannot be copyrighted. Either one seems like a win to me, given our draconian copyright laws.
Isn't it difficult? (Score:2)
I wonder how difficult it can be to tell NI-generated code apart from AI-generated code.
I bet it can (and will) be pretty hard.
Any ideas?
Not a ban (Score:3)
It's not a ban, just a friendly reminder that the NetBSD project views matters of copyright seriously and responsibly, and an appeal to the developer community to adhere to the same standards.
Re: (Score:2)
It's not a ban, just a friendly reminder that the NetBSD project views matters of copyright seriously and responsibly, and an appeal to the developer community to adhere to the same standards.
About as seriously as all those developers stealing movies, music, and other software, right?
Re: (Score:2)
That's not stealing, it is copyright infringement. Also, not sure anyone really cares if you infringe on a copyright if you aren't redistributing stuff. Download a movie? Nearly impossible you'll get noticed or held accountable. Start redistributing those same files? Much different.
It's kind of like drug possession versus being a dealer. Without a traffic stop (essentially an illegal search..) vast majority of drug users are never getting busted but someone actively distributing looks significantly differen
Re:Not a ban (Score:4, Interesting)
I think it's effectively a ban at this point. After a few court cases are decided that may change. At the moment it's just playing safe...which is normal NetBSD activity.
HelloWorld (Score:2)
I guess even HelloWorld.c is tainted.
Re: (Score:2)
I'm not sure. Copyright isn't supposed to apply to functionally required elements. And I don't think "hello" can be copyrighted.
Flesh and Blood (Score:1)
Re: (Score:2)
It's only a matter of time before we "nail" merging AI and robotics to replace most field work. Sure, there will always be edge cases but we will probably shift to a more modular way of doing those edge cases specifically so they can be addressed by our future AI+robotics.
Going to be quite interesting to see how people learn to become experts when most of the junior stuff is automated away.
Meh (Score:3)
This is getting to be like those non-compete agreements, where supposedly a company can keep you from going elsewhere and using that Java or C or php special sauce that, er, they supposedly taught you? That they invented?
You know, oh so proprietary and secret stuff ...
Could Microsoft pull an Oracle on us? (Score:2)
What About Indemnified Code Assistants? (Score:3)
"If you commit code that was not written by yourself, double check that the license on that code permits import into the NetBSD source repository, and permits free distribution,"
Seems like the major concern is copyright, which is valid. That would seem to imply that code assistants that indemnify users against copyright issues with whatever is generated would be ok? e.g., IBM's Code Assistants?
(Full disclosure: I currently work for IBM)
But intellectual theft is artisan? (Score:1)
Heh (Score:1)
So they're banning Stack Overflow code as well? (Score:2)
Better ban people who get code from Google searches, Reddit forums, and any other random forum out there. The conceit of this is incredible. No one writes completely unique code. Bits of code are all "stolen" from someone and reassembled into something new.
Besides, how the hell are they going to police such a ridiculous decree?
compiler (Score:2)
Do compiler optimizations count as AI-generated code?
CoPilot has an alert (Score:2)
I use CoPilot to assist in coding. It does a decent job of picking up on boilerplate stuff and I appreciate it when it gives me a decent tip too.
I do have VSCode set to alert me if CoPilot copies code from an OSS project. It's alerted me of it exactly once. It gave me a line of code from an open source project.
That I wrote.
And it was the one I was working on.