I asked ChatGPT-4 to write a synth using JUCE!

It’s not that clear, actually. When a use of music is “transformative”, the sample is considered “fair use”. Typically that means the person with more money wins, but whatever. Copyright laws are all garbage anyway.

1 Like

Is patented code being used in the training data? Probably. You can’t usually get away with patent infringement just by writing the code in your own style, or by claiming ignorance of the source (which would at least be true), so this is going to be a minefield in the future when patent holders start seeing their protected ideas and techniques cropping up in unexpected places.

3 Likes

Well, ChatGPT’s or Copilot’s knowledge of how to use the JUCE framework can only be based on code examples that are publicly available. Snippets in a forum (like this one) usually don’t compile, so it is safe to assume that almost everything ChatGPT or Copilot knows about JUCE comes from JUCE-based open-source code available on GitHub & Co.

But JUCE is dual-licensed (proprietary and GPL), and all publicly available JUCE code on GitHub or elsewhere necessarily uses the GPL license, because there is no other option for publishing JUCE code in public repos. GPL, of course, is a copyleft license, so any use of the sources, even in modified form, must be GPL-licensed again. And using the source code for building the language model is also a form of “using” the code.

If nearly all JUCE examples on which the model is based are GPL-licensed, I cannot see how the copyleft license should “magically” go away simply by mashing them up into a snippet shown to the user. In my view, all generated JUCE snippets should be GPL-licensed, because this applies to nearly all the sources they were extracted and mashed up from. ChatGPT or Copilot should not be a back door for whitewashing copyleft code so that it can be integrated into closed-source applications.

At the very least, each user should know that there is a risk in using the generated snippets, because there is usually no information about the primary sources involved, which in turn means that one cannot know which licenses (or even patents) are involved. In my view, ChatGPT and Copilot should offer separate models for each kind of license, trained only with conforming sources and made available under the respective license.

That only describes why it’s technically illegal to create or use those tools, not why it’s morally bad, and that’s why I have a problem with the argument. You should ask yourselves less what the GPL license says and more what its function is and why it was made an option for JUCE projects in the first place.

Making and maintaining a big framework like JUCE is expensive, so people have to pay for JUCE. Still, JUCE is very generous and lets us use the entire framework for free, but only if we agree to open-source our code, because those repositories are free advertising for JUCE and also great learning material for new JUCE developers, so JUCE benefits from them as well. The GPL license makes sure that people comply with the rules and don’t use the framework for closed-source applications without paying for it. That is its function.

Do any AI tools interfere with this function? When GitHub Copilot picks up some information from someone else’s code base and gives it to you, you can use that information in your own code base and finish your project with it. You might have one of the commercial JUCE licenses, so you don’t need GPL, but the code that you were given was GPL-licensed. Technically that’s illegal, but just because you use the code doesn’t mean that this code suddenly stops having a beneficial effect for JUCE. Quite the contrary: it has let you finish your project, so you will probably recommend JUCE because of the positive experience you had with it.

That’s why I’d say there is just no way to get around making entirely new rules for all this copyright stuff. At the moment it is just a mess and it doesn’t reflect reality anymore.

Some might say it’s stupid to talk about why a license was created instead of what the license says directly, but in my opinion it’s important to always question tradition, because it could be that we only believe something is the right way because we learned it is, not because it actually is.

1 Like

If you take away the license, i.e. that the author can decide the terms of how it is shared and used, you take away the incentive to share code at all. This is not a liberation, but the end of all free knowledge exchange.

2 Likes

There’s a huge number of publicly available JUCE projects which are MIT- or BSD-licensed. Just take a look at the awesome-juce list: somewhere between a quarter and a third of all projects there have such a license.

Yes! And this shouldn’t be too difficult to implement: one model is trained exclusively with code from MIT/BSD-licensed projects and the other exclusively with GPL-licensed ones. (The JUCE codebase itself can be used to train both, since the JUCE license will necessarily “match” the user’s distribution license.) A rough sketch of that kind of license-based filtering is shown below.
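Just to illustrate how simple the filtering side could be: the corpus is bucketed by SPDX license identifier before training. The repos.jsonl metadata file and its spdx_license and name fields are made up here for the example; a real crawler would have its own format.

```python
# Hypothetical sketch: split a crawled corpus into per-license training sets.
# The metadata format (one JSON record per repo with "spdx_license" and "name"
# fields) is an assumption for illustration, not any real crawler's output.
import json

PERMISSIVE = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0", "ISC"}
COPYLEFT = {"GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later"}

def split_corpus(metadata_path):
    """Yield (bucket, repo) pairs, dropping repos whose license is unknown."""
    with open(metadata_path) as f:
        for line in f:
            repo = json.loads(line)
            spdx = repo.get("spdx_license")
            if spdx in PERMISSIVE:
                yield "permissive", repo
            elif spdx in COPYLEFT:
                yield "gpl", repo
            # anything else (unknown or other licenses) is excluded from both sets

if __name__ == "__main__":
    buckets = {"permissive": [], "gpl": []}
    for bucket, repo in split_corpus("repos.jsonl"):
        buckets[bucket].append(repo["name"])
    print({k: len(v) for k, v in buckets.items()})
```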

The problem of attribution should also not be impossible to solve, if the algorithm can somehow list the most relevant sources.

It would be interesting to see whether such an option is ever offered and whether it actually works. Nevertheless, I think that at this stage it’s the patents which are the tricky part, as @Vorbonaut pointed out.

1 Like

I don’t trust this new AI stuff. These models are already politically and ideologically biased. Who can tell what kind of code they spit out?

3 Likes

Thanks for clarifying that; I just couldn’t imagine that escaping the GPL would be possible. But it’s true, the Projucer-generated “JuceLibraryCode” sources for a project are not actually checked in, so it’s irrelevant that this part of the code is GPL- (or dual-) licensed.

I don’t think that the problem of attribution and license tracking is really difficult to solve.

All modern large language models are built from multi-head attention networks. Each head is essentially a single-layer network with a softmax activation, so in effect it acts as a “winner-takes-all” network where a dominant attention vector is selected for each input token.

By simply labeling each attention vector with a unique id, you can compute a profile for each source document used for training (= the bag of all winning attention-vector labels).

You can store these profiles in a traditional information retrieval (IR) system using a TF/IDF metric.

In the same way, you can determine the bag of labels for the current user session.
These labels can be used for querying the IR system for the most relevant sources.

The TF/IDF metric ensures that specific attention vectors (tailored to particular kinds of sources) automatically get a higher weight than unspecific attention vectors that are active for a very high number of sources.
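Sketched very roughly, the retrieval side could look something like this. The per-document label bags would have to come from an instrumented model; the AttributionIndex class and the toy label ids are just made up for illustration.

```python
# Rough sketch of the proposed attribution scheme: treat each source document
# as a "bag" of winning attention-vector labels and rank documents against a
# user session with a TF-IDF score. The label bags are assumed to come from an
# instrumented model; this only shows the retrieval side.
import math
from collections import Counter

class AttributionIndex:
    def __init__(self):
        self.doc_profiles = {}      # doc_id -> Counter of attention-vector labels
        self.doc_freq = Counter()   # label -> number of documents containing it

    def add_document(self, doc_id, labels):
        profile = Counter(labels)
        self.doc_profiles[doc_id] = profile
        for label in profile:
            self.doc_freq[label] += 1

    def query(self, session_labels, top_k=5):
        """Rank training documents by TF-IDF similarity to the session's labels."""
        n_docs = len(self.doc_profiles)
        session = Counter(session_labels)
        scores = {}
        for doc_id, profile in self.doc_profiles.items():
            score = 0.0
            for label, tf_session in session.items():
                tf_doc = profile.get(label, 0)
                if tf_doc == 0:
                    continue
                # rare labels (specific attention vectors) get a higher IDF weight
                idf = math.log(n_docs / self.doc_freq[label])
                score += tf_session * tf_doc * idf
            if score > 0:
                scores[doc_id] = score
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage with made-up label ids:
index = AttributionIndex()
index.add_document("repo_a/filter.cpp", ["v17", "v17", "v42", "v99"])
index.add_document("repo_b/synth.cpp", ["v03", "v42", "v42", "v42"])
print(index.query(["v42", "v42", "v17"]))
```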

Alternatively, you can simply store the frequency of each license of a source document with each attention-vector label. As a result, you get a license-frequency profile for each attention vector. This could be used to disable all GPL-dominated attention vectors if someone asks for code snippets suitable for closed-source use. In this way, you could dynamically tailor the model to specific licensing constraints (a toy sketch follows below).
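As a toy sketch only: the label-to-license counts would again have to be collected during training, and the 0.5 threshold and LicenseProfile class are arbitrary examples, not a real implementation.

```python
# Toy sketch of the second idea: keep a license-frequency profile per
# attention-vector label and mask out labels dominated by GPL sources when the
# user asks for code intended for closed-source use. The counts are assumed to
# be collected during training; the 0.5 threshold is an arbitrary example.
from collections import Counter, defaultdict

class LicenseProfile:
    def __init__(self):
        # label -> Counter mapping license id to how many source docs used it
        self.label_licenses = defaultdict(Counter)

    def record(self, label, license_id):
        self.label_licenses[label][license_id] += 1

    def gpl_dominated(self, label, threshold=0.5):
        counts = self.label_licenses[label]
        total = sum(counts.values())
        if total == 0:
            return False
        gpl = sum(n for lic, n in counts.items() if lic.startswith("GPL"))
        return gpl / total > threshold

    def allowed_labels(self, labels, closed_source=False):
        """Return usable labels; drop GPL-dominated ones for closed-source requests."""
        if not closed_source:
            return list(labels)
        return [l for l in labels if not self.gpl_dominated(l)]

# Toy usage with made-up data:
profile = LicenseProfile()
profile.record("v17", "GPL-3.0-only")
profile.record("v17", "GPL-3.0-only")
profile.record("v17", "MIT")
profile.record("v42", "MIT")
print(profile.allowed_labels(["v17", "v42"], closed_source=True))  # ['v42']
```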

These are just two straightforward solutions that I came up with in 5 minutes.

I am sure the smart researchers at OpenAI can do this in a much better way!

But it seems they just don’t want to attribute sources and respect licenses, because they can generate better content with more training data, and that seems to be all that matters.

2 Likes

It’s as simple as this:

  • Without Copilot, you cannot use ANY of the GPL-licensed sources on GitHub in your closed-source project, because the GPL is a copyleft license and forbids closed-source reuse both in direct and in modified form.
  • If you pay for Copilot, however, you can use ALL of the GPL-licensed sources in your closed-source project, because Copilot will resynthesize the code and remove all those annoying license restrictions for you.

I cannot see how this constitutes “fair use”.

You only focus on the perspective of the makers of the JUCE framework. But what about the open-source developers who have intentionally chosen the GPL for their project? If you develop something and make it open-source under a GPL license, then you want to contribute something to the public, but you expressly do not permit this work or derived works to be integrated into closed-source applications. From this perspective, not allowing the generation of camouflaged code from GPL-licensed sources follows exactly the intent of the developers who have chosen that specific license.

5 Likes

Now that’s a good argument. Thanks. It’s finally getting tricky.

Open-source code has the property that people can look at it and learn from it, so in some way that already contradicts the license. I mean, does it matter which method of copying people use?

  • Ctrl+C, Ctrl+V
  • plain remembering of what has been seen
  • Copilot/GPT

Since neural networks work a lot like brains, they are actually more comparable to plain remembering than to copy-pasting. People usually compare them to copy-pasting only because a computer rather than an organic entity is doing it.

I totally agree with that.

Actually I just wanted to give some weight to the perspective of open-source programmers (and specifically those who deliberately use the GPL).

Personally I am not an open-source programmer and I think it makes sense to experiment with this new style of AI-assisted “pair programming”.

This is probably what most programming will look like in the near future anyway.
People who refuse to adopt this technique likely won’t be efficient enough anymore because they can’t keep up with the pace of those who use it.

Perhaps there should be specific variants of the GPL license that explicitly permit or explicitly forbid the use of the corresponding sources for training a language model that will drop licensing information.

In other words, there should be a way to opt out of having your public source code used in a language model (or at least in models that silently drop license restrictions).

Then every developer can decide which usage they want to allow.

This would finally clarify these “gray area” legal issues.

1 Like

The spirit of GPL is most definitely that the one who benefits from others’ work should also let others benefit from theirs. The whole point of using GPL instead of just sharing code with no license requirement is for the sharing to be “inherited”. It is paying forward with the explicit requirement that you do the same.
That is the reason GPL exists.

JUCE, like Qt, probably allows GPL for two reasons:
  1. to encourage devs to pay for a license if they intend to sell their work;
  2. to allow devs to practice for free until they reach the point of being able to pay for a license and sell their products.

3 Likes

Are you sure about that?
I have seen no restriction mentioned that developers who have a paid license may not share their code as they see fit. If you wrote it, you may distribute it under whatever conditions you wish. Is there something in the JUCE paid license terms prohibiting people from sharing their own code in any way?

Incidentally, I believe you said you live in Germany, right?
Interesting times.

You are right (see also the earlier post by @aamfs).
There is no such restriction, but the majority of open-source repos using JUCE are GPL-licensed.

I agree with you.
This terminology is a little insidious: “train” and “learn” and even “neuron” encourage equivocation. Those names were chosen as analogies, but they all imply personhood, as if the AI were doing something that its authors aren’t.
All AI so far is software, a computer program in the sense that it’s a list of instructions that run on a processor. It uses the same instruction set, the same AND and NAND and NOR gates, etc. AI isn’t doing anything computers can’t do, and the process of creating the program to run already has a name: “programming”. The “training” phase is part of the programming process, partially automated with complex algorithms that copy, process and transform large amounts of data, resulting in the finished program (with the “weights”, a.k.a. variables, adjusted).
The AI structure without those variables having been set is nothing; it will output noise. I would contend that the authors of all of that “training” data are in fact co-authors of the AI model. The values of the weights came as much from their work as they did from the AI devs’ work.

Yes, I’m from Germany, but what does that mean to you?

Just regarding talk of banning ChatGPT.
I’ve had some English students who work for the Competition Council (not in Germany) and have had conversations about Europe’s approach to monopolization; albeit not directly about AI, because it hasn’t yet made itself very felt. But I get the impression that the EU and UK are more of what Americans might call “authoritarian” regarding “the market”. Handing out enormous fines to Microsoft, Google, oil and power companies! Where I come from, that’s straight-up communism :slight_smile:

That a company might build its closed-source software from published but not completely free works by millions of authors, and then offer a service for money - not even the software itself - producing output similar to those millions of works, seems unlikely to be accepted unconditionally in the EU.

1 Like

The EU is very protective of people’s privacy (the near-ban of TikTok) and rights (Italy’s ChatGPT ban). Some decisions are a bit overkill, but I wouldn’t put any specific label, like communism, on it.

Neither would I, but people where I come from would. I once got harassed by an old-timer in a laundromat because I was wearing a t-shirt with a hammer and sickle on it saying CCCP. He said, “You Russian?” and I understood “rushing”. Amusing misunderstanding, but I felt like an outsider.
Other Americans might use a term like “interventionism”. Americans are probably fairly split, but policy always seems to favor letting corporations make their own decisions.

Wandered pretty far here. Apologies. I’m curious to see how the rest of Europe will handle this.

1 Like