When cloning the whole JUCE repo with its full history, we end up with 260 MB.
I had a look at the biggest files, below is the result.
I’m no git expert at all, but I guess it should somehow be possible to “fully” remove some of those files (AnimationAppExample.app…) from git to reduce the size a bit?
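For anyone who wants to repeat the "biggest files" check, here is a rough sketch of one way to do it. It builds a throwaway repo so it can run anywhere; `big.bin` and `small.txt` are made-up stand-ins, not files from JUCE. The sizes reported are uncompressed object sizes, so they only approximate what each file costs in the packed clone.

```shell
# Throwaway repo so this runs anywhere; in practice, cd into your own clone
# and skip straight to the pipeline at the bottom.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
REPO=$(mktemp -d)
cd "$REPO"
git init -q
head -c 200000 /dev/zero > big.bin   # hypothetical stand-in for a large binary
echo "small" > small.txt
git add big.bin small.txt
git commit -q -m "add files"

# Every object reachable from any ref -> keep blobs -> sort by size, largest first.
LARGEST=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort -rn \
  | head -n 10)
echo "$LARGEST"
```

Each output line is a size in bytes followed by the path the blob was first seen under.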
*unless you know how to take advantage of the collisions recently found in SHA-1 by Google. With those, you could theoretically manage to craft commits that have different content but still the same final checksum.
That would make for an excellent April Fool’s joke.
Maybe JUCE 5 could start as a fork made with --depth=1, so users can decide for themselves whether they want the original with almost 10 years of history or only the latest development.
But I am also only an average git user, so I don’t know if that would work…
The problem with doing a fork is that it’d be a new repo, with none of the existing github forks or stars. We’ll have a think about ways we might be able to do this without it messing up everyone’s repos.
That would be bad because it would discard all the previous history that has led to the current code.
It would be very hard to know what a certain patch changed, and why.
A better approach would be to still create a new fork, but with rewritten history, where all those huge files are simply not tracked but the commit history and the majority of changes can still be reconstructed if necessary.
Yes, you are absolutely right. My idea was more to have both in parallel, merging all commits from the juce5 repo into the original one, so you can work with either the full history or the new one. But I just realised that there is probably not enough benefit to justify maintaining two repos. Also, everybody can clone with --depth anyway…
For that, I think you could do it safely by taking advantage of the fact that you can actually create a new commit with no parent in the same repo.
So to say, that would be a new “Initial commit” for a new branch which is completely unrelated to the others. There, you can do what you want while still keeping the current develop and master intact.
When the result in this “purged” branch is satisfactory, you can move the develop and master branch labels to it (while perhaps keeping a “legacy” branch label on the old line of development for some time, just in case…)
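A minimal sketch of that parentless-commit idea, run against a throwaway repo. The branch names `purged` and `legacy` follow the wording above, `file.txt` is a made-up placeholder, and a real migration would obviously involve more than one commit on the new line.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/develop   # name the initial branch 'develop'
echo "old" > file.txt
git add file.txt
git commit -q -m "old history"

# Start a branch with no parent. The index and working tree are kept,
# so the next commit becomes a brand new root commit.
git checkout -q --orphan purged
git commit -q -m "new initial commit"

# Keep the old line under a 'legacy' label, then move 'develop' to the new root.
git branch legacy develop
git branch -f develop purged

ROOT=$(git rev-list --max-parents=0 develop)   # root commit of the new develop
echo "develop now starts at $ROOT"
```

After this, `develop` and `legacy` share no commits at all, which is why `git merge-base develop legacy` finds nothing.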
…but if by “shallow” you mean with truncated history, then I’d advise against it for the reasons mentioned above.
It would be better to have the “purged” branch be the same as the original, but with the references to the big files removed. This would be possible by fiddling with git’s history-rewriting commands, like git filter-branch.
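Something along these lines is what I have in mind, sketched on a throwaway repo. `big.bin` and `main.cpp` are hypothetical placeholders. Note that current git versions print a warning recommending other tools over filter-branch, and that the old objects only disappear from disk once the backup refs under `refs/original/` and the reflogs are gone and the repo has been garbage-collected.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer git
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/develop
echo "code" > main.cpp
head -c 200000 /dev/zero > big.bin       # hypothetical large binary
git add main.cpp big.bin
git commit -q -m "add code and binary"
echo "more code" >> main.cpp
git add main.cpp
git commit -q -m "change code"

# Rewrite every commit on every ref, dropping big.bin from each tree.
# --ignore-unmatch keeps commits that never contained the file from failing.
git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
  -- --all >/dev/null 2>&1

COMMITS=$(git rev-list --count develop)
FILES=$(git ls-tree -r --name-only develop)
TOUCHING=$(git log --format=%H develop -- big.bin)
echo "$COMMITS commits kept, files at tip: $FILES"
```

The commit count and the changes to `main.cpp` survive; only the binary is gone from the rewritten line.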
Ideally I think it’d be good for us to keep just e.g. 5 years history and have an archived copy of the full one for people who need to go through the ancient stuff. That’d shrink it considerably.
I’m still not entirely convinced that would be a good idea. However, since that won’t stop you from considering it, I’d suggest retaining the history since around commit aa6e9d38deca22d661218cabcbb745f6a0fea64b.
That is the commit that brought the modules structure into the main line. It dates from February 2012, which also seems to fit the 5-year timeframe that you have in mind.
That being said, also be aware that altering the history of the repo in a destructive way will break all projects that reference JUCE as a submodule, so I think the commits of the “legacy” line should be kept living in the main repo for quite some time before being dropped completely (and possibly moved to a different, “archive” repo).
However, unless I’m missing something clever, I think any process which modifies the history of a branch will change the hashes of the commits on it - which will, in turn, break submodules using it.
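To illustrate why, here is a small sketch on a throwaway repo: only the first commit's message is reworded, yet the tip commit gets a new hash too, because each commit records its parent's hash. The file name and commit messages are made up for the demo.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer git
REPO=$(mktemp -d)
cd "$REPO"
git init -q
echo "one" > file.txt
git add file.txt
git commit -q -m "first commit"
echo "two" >> file.txt
git add file.txt
git commit -q -m "second commit"

BEFORE=$(git rev-parse HEAD)
TREE_BEFORE=$(git rev-parse 'HEAD^{tree}')

# Reword only the root commit; the second commit's message and content are untouched.
git filter-branch -f \
  --msg-filter 'sed "s/^first commit/first commit, reworded/"' \
  -- --all >/dev/null 2>&1

AFTER=$(git rev-parse HEAD)
TREE_AFTER=$(git rev-parse 'HEAD^{tree}')
echo "tip before: $BEFORE"
echo "tip after:  $AFTER"
```

The tree hash at the tip stays the same while the commit hash changes, and a submodule pins the commit hash, so it would end up pointing at a commit that is no longer on the branch.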
Personally, I don’t have a problem with a full Juce clone being 260 MB.
I usually only clone it once on a machine anyway (and then sometimes copy the folder + switch branch if I need multiple simultaneous versions for some reason).
Shallow clones seem to be a good solution if full clones really can’t be used (I never had to use it though). “Shallow clones used to be somewhat impaired citizens of the Git world as some operations were barely supported. But recent versions (1.9 and above) have improved the situation greatly, and you can properly pull and push to repositories even from a shallow clone now.”
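For reference, a quick sketch of what a shallow clone gives you, using a throwaway source repo with three made-up commits. With a real remote you would just pass its URL to `git clone --depth=1`.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
SRC=$(mktemp -d)
cd "$SRC"
git init -q
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# file:// forces the normal transport; --depth is ignored for plain local paths.
DST=$(mktemp -d)
git clone -q --depth=1 "file://$SRC" "$DST/shallow"

FULL=$(git -C "$SRC" rev-list --count HEAD)
SHALLOW=$(git -C "$DST/shallow" rev-list --count HEAD)
echo "full: $FULL commits, shallow: $SHALLOW commit"
```

If the full history is needed later, `git fetch --unshallow` inside the shallow clone fetches the rest.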
I wonder if the newly-discovered SHA1 vulnerability would let us fake-up the hash for an early commit so that we could do a git filter-branch that keeps all subsequent hashes the same?
Mine above was just a joke. However, if this ever gets applied, it’s one of the very exceptional cases in which something that appeared as a bug turns out to be a useful feature.