{"id":615,"date":"2019-10-18T20:13:57","date_gmt":"2019-10-19T03:13:57","guid":{"rendered":"http:\/\/35.243.195.209\/?p=615"},"modified":"2019-11-18T22:39:52","modified_gmt":"2019-11-19T06:39:52","slug":"git-large-file-storage","status":"publish","type":"post","link":"https:\/\/nanzhou.cc\/index.php\/2019\/10\/18\/git-large-file-storage\/","title":{"rendered":"Git Large File Storage"},"content":{"rendered":"<h2>Summary<\/h2>\n<p>In this post, I introduce a common workflow for storing large files in Git with Git LFS, together with the history-cleaning tool <a href=\"https:\/\/rtyley.github.io\/bfg-repo-cleaner\/\">bfg<\/a>. <\/p>\n<h2>Background<\/h2>\n<p>Say one day you find that an engineer on your team has accidentally committed a binary file (for example, a shared library) to the remote repository. The engineer argues that the file must be accessible in the repository, so you need a better way to keep it there. <\/p>\n<h2>Conclusion<\/h2>\n<p>Since it is bad practice to keep large binary files alongside code in Git history, the first thing to do is remove the file from every commit that contains it. After that, we need another mechanism to store large files. <\/p>\n<p>The conclusion is that we can use <code>bfg<\/code> and <code>lfs<\/code> to do the job. <code>bfg<\/code> is a simpler, faster alternative to <code>git filter-branch<\/code>, and <code>lfs<\/code> (Git Large File Storage) is a Git extension for versioning large files. <\/p>\n<h3>1. Install bfg<\/h3>\n<p>Visit <a href=\"https:\/\/rtyley.github.io\/bfg-repo-cleaner\/\">the official website of bfg<\/a> and download the jar file (it needs a Java runtime to run). Then you are all set with <code>bfg<\/code>. <\/p>\n<h3>2. 
Clean the Remote Branch<\/h3>\n<p>We will use the <code>--delete-files<\/code> option, which removes every file with the given name from the repository history.<\/p>\n<pre><code class=\"language-bash\">## bfg protects the latest commit by default, so delete the file there first\ncd \/path\/to\/your\/repo\ngit rm lib\/libxxx.so\ngit commit -m &quot;rm shared lib&quot;\n## use bfg to strip the file from all earlier commits\njava -jar bfg.jar --delete-files libxxx.so \/path\/to\/your\/repo\n## expire the reflog and garbage-collect so the old blobs are actually dropped\ngit reflog expire --expire=now --all &amp;&amp; git gc --prune=now --aggressive\n## rewriting history changes commit hashes, so force the remote to update\ngit push origin your-branch -f<\/code><\/pre>\n<h3>3. Install lfs<\/h3>\n<p>Visit <a href=\"https:\/\/github.com\/git-lfs\/git-lfs\/wiki\/Installation\">the installation wiki<\/a> for detailed instructions. On Debian or Ubuntu: <\/p>\n<pre><code class=\"language-bash\">curl -s https:\/\/packagecloud.io\/install\/repositories\/github\/git-lfs\/script.deb.sh | sudo bash\nsudo apt-get install git-lfs\ngit lfs install<\/code><\/pre>\n<h3>4. Track Large Files<\/h3>\n<pre><code class=\"language-bash\">cd \/path\/to\/your\/repo\ngit lfs install\ncd lib\ngit lfs track &#039;*.so&#039;\n# you will see a .gitattributes file in the current directory (lib)\ngit add .gitattributes\n# put libxxx.so back into lib first, since it was deleted in step 2\ngit add libxxx.so\ngit commit -m &quot;add libs using lfs&quot;\n# after this, the `.so` files in lib are tracked by lfs rather than stored directly in git<\/code><\/pre>\n<p>Make sure your co-workers have also installed <code>lfs<\/code> correctly. Then push and pull as you normally would. After these steps, the repository stores only small pointer files instead of the large binaries, and <code>lfs<\/code> stores the actual file contents on a separate server. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary In the post, I will introduce a common usage of large file storage in Git together with a tool bfg. Background Say one day you found that an engineer on your team had accidentally committed a binary file (for example a shared library) to the remote repository. 
He argued the file must be accessible in the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25,44],"tags":[],"class_list":["post-615","post","type-post","status-publish","format-standard","hentry","category-software-engineering","category-version-control"],"_links":{"self":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/615","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/comments?post=615"}],"version-history":[{"count":1,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/615\/revisions"}],"predecessor-version":[{"id":617,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/posts\/615\/revisions\/617"}],"wp:attachment":[{"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/media?parent=615"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/categories?post=615"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nanzhou.cc\/index.php\/wp-json\/wp\/v2\/tags?post=615"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}