Summary
In this post, I introduce a common use of Git Large File Storage (LFS) together with a tool called bfg.
Background
Say one day you find that an engineer on your team has accidentally committed a binary file (for example, a shared library) to the remote repository. They argue that the file must be accessible in the repository. You are looking for a better way to do it.
Conclusion
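Keeping large binary files alongside source code bloats the repository, so the first step is to remove the file from the branch's history. A plain revert is not enough, because the file would still live in the earlier commits. After that, we need another way to store large files.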
In short, we can use bfg and lfs to do the job. bfg (BFG Repo-Cleaner) is a faster alternative to git filter-branch for cleaning history, and lfs (Git Large File Storage) is a Git extension for storing large files.
1. Install bfg
Visit the official website of bfg and download the jar file. bfg runs on Java, so once you have a Java runtime you are all set.
2. Clean the Remote Branch
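We will use the --delete-files option, which removes every file with the given name from the history. It takes a file name, not a path. By default bfg leaves the latest commit untouched, so we first delete the file in a normal commit and then run bfg.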
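## delete the file in a normal commit first (bfg protects the latest commit)
cd /path/to/your/repo
rm lib/libxxx.so
git add .
git commit -m "rm shared lib"
## use bfg to strip the file from earlier commits
java -jar bfg.jar --delete-files libxxx.so /path/to/your/repo
## drop the old objects that are no longer referenced
git reflog expire --expire=now --all && git gc --prune=now --aggressive
## bfg rewrites commits, so their hashes change; force the remote to update
git push origin your-branch -f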
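To check that the cleanup worked, it can help to list the largest blobs still reachable in the history, before and after running bfg. The following is a minimal sketch using only standard Git plumbing commands; the function name list_largest_blobs is made up for this post and is not part of Git or bfg.

```shell
# List the largest blobs reachable from any ref, biggest first.
# Hypothetical helper; prints one "<size-in-bytes> <path>" line per blob.
# Usage: list_largest_blobs /path/to/your/repo [count]
list_largest_blobs() {
    repo="$1"
    count="${2:-10}"
    # rev-list emits "<sha> <path>" for every object; cat-file adds type and size.
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn |
        head -n "$count"
}
```

If lib/libxxx.so no longer shows up in the output of list_largest_blobs /path/to/your/repo after the cleanup, the file is gone from every commit that is still reachable.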
3. Install lfs
See the official Git LFS installation instructions for details. On Debian or Ubuntu:
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
4. Track Large Files
cd /path/to/your/repo
git lfs install
cd lib
git lfs track '*.so'
# this creates a .gitattributes file in the current directory (lib)
git add .gitattributes
# put libxxx.so back into lib/ first if you removed it in step 2
git add libxxx.so
git commit -m "add libs using lfs"
# from now on, the `.so` files in lib are tracked by lfs rather than stored directly in Git
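The track command works by writing a line such as `*.so filter=lfs diff=lfs merge=lfs -text` into .gitattributes. As a quick sanity check, you can ask Git which filter applies to a given path. This is a small sketch built on git check-attr; the helper name lfs_filter_for is an assumption made for illustration.

```shell
# Report which filter Git would apply to a path, based on .gitattributes.
# Hypothetical helper; prints "lfs" for LFS-tracked paths and
# "unspecified" when no filter attribute matches.
# Usage: lfs_filter_for /path/to/your/repo lib/libxxx.so
lfs_filter_for() {
    repo="$1"
    path="$2"
    # check-attr prints "<path>: filter: <value>"; keep only the value.
    git -C "$repo" check-attr filter -- "$path" | sed 's/.*: filter: //'
}
```

The path does not need to exist yet, so the check can be run before adding the library back.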
Make sure your co-workers have also installed lfs correctly. Then just push and pull as you normally would. After these steps, instead of storing large binary files, the repo holds small pointer files, and lfs stores the actual file contents on a separate server.
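To make the pointer idea concrete, here is a rough sketch of what such a pointer contains: a version line, the SHA-256 of the real file, and its size in bytes. The function lfs_pointer_for is a made-up helper that imitates the layout for illustration only, and it assumes sha256sum is available, as it is on Debian and Ubuntu. In practice lfs generates the pointer for you.

```shell
# Print text in the layout of an LFS pointer (spec v1) for a file.
# Hypothetical helper for illustration; lfs creates real pointers itself.
# Usage: lfs_pointer_for lib/libxxx.so
lfs_pointer_for() {
    file="$1"
    oid="$(sha256sum "$file" | cut -d' ' -f1)"   # content hash
    size="$(wc -c < "$file" | tr -d ' ')"        # size in bytes
    printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}
```

A pointer is only three short lines no matter how large the library is, which is why the repo stays small. If lfs is installed, git lfs pointer --file=lib/libxxx.so should print the real pointer for comparison.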