Again, IP is an outdated concept in this day and age. In all honesty, there shouldn't even be a notion of fair use; any transformative work should be allowed. There is nothing about LLM training that isn't transformative, just as, well, grinding the meat from a steak into sausages transforms it.
I'm not even talking about big corporations with proprietary models; in fact, I oppose their models not being open source or open weight. I want more open models, not fewer, as that at least democratizes the value of LLMs. The worst case is copyright hawks enabling regulatory capture by big AI corps by pushing regulations about licensing content, which, of course, no open model company will be able to afford in the future. I find that outcome, where only a few corporations can tell you what to think via usage of their LLMs, infinitely worse than having more lax copyright laws.
Lastly, no one can tell me from first principles why LLM training is bad on the copyright side, other than that it just is, because copyright law dictates it so. Perhaps copyright law is what needs to be abolished, not LLMs.
"Transformative" has a specific meaning under the fair use doctrine. You can't just ROT13 or gzip someone's novel and call that transformative.
> Perhaps copyright law is what needs to be abolished, not LLMs.
Sure, now that it's inconvenient for some billionaires, who themselves have nothing to protect, because everything they offer is a service that users can only access over the network, and only while they have a subscription.
I'm talking about the concept of transformation, not the specific legal language, which, again, I said is not worth discussing, because the legal concept of intellectual property is not useful.
No, not just now, since forever. I suppose this is what people mean when they say Stallman was right all along. And just to be clear, I'm not a supporter of the current closed source AI companies; like I said, I want to see open models succeed.
As I asked above, it really does look like no one can explain why LLM training is bad, besides saying it's bad. Therefore I will continue to reject IP as a concept.
Obviously, since you reject IP, presumably you would be okay with copying and pasting code out of some GNU program into your own program, without attribution, and then, if you feel like it, releasing that program under the least restrictive terms possible (as close to the public domain as you could practically get away with).
So discussions revolving around doing so less directly, through training a model, just add distracting details that don't matter.
If everyone did that (because there were no rules against it), then fewer people would write programs under free licenses. Many such developers are volunteers, whose only payment is that the work product is theirs to license how they want.
Having that taken away from us is discouraging.
We haven't done anything to deserve such a "fuck you".