Learning About Git Large File Storage (LFS)

That is where Git LFS comes in.

What is Git Large File Storage (LFS)?

Git LFS logo from Atlassian: https://wac-cdn.atlassian.com/dam/jcr:83e62b1a-36ee-4fc1-82de-7a06640f7c87/hero.svg?cdnVersion=340

Git LFS is a git extension, written in Go, that was created by developers at Atlassian and GitHub along with other open source contributors to work around the file size limits that Git hosting services place on repositories.

It does this by storing large files in a location separate from your repository and placing a small pointer file in your repository that points to where each large file is stored.

The easiest way for me to understand how this works was to forget about GitHub, Bitbucket, GitLab, and remote repositories for a moment.

Let’s just focus on the local computer, which is depicted as a monitor below with three sections (“Working Copy”, “Local Repository”, “LFS Cache”).

Git LFS structure image from Tower: https://www.git-tower.com/learn/content/01-git/01-ebook/en/02-desktop-gui/05-advanced-topics/07-git-lfs/03-setup-git-lfs.png

The local repository is a directory or folder you see on your computer, which has been initialized as a git repository using git init or cloned from a remote repository.

The working copy is the representation of the files or folders that are being edited in the local repository.

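The LFS cache is the separate local storage for large files. Git LFS places a tracked file there when you stage it with git add, and uploads it from there to the remote LFS store when you push.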

Keep these terms in mind — they will come in handy as we go through the steps on how to work with Git LFS in the next section.

Working with Git LFS

The great thing about Git LFS is that we can keep using the usual git commands and the workflow we all know and love.

The only changes are a few additional commands and another storage location to keep in mind.

Ok, now that we have some information about git and Git LFS, let’s walk through how to use it.

I will go through two possible scenarios, but first, download Git LFS via Homebrew (brew install git-lfs) or through their website (https://github.com/git-lfs/git-lfs/releases).

Scenario 1: Using Git LFS after getting an error message with the usual git commands.

Here I have a new repository in which I placed a large data file (1.9GB).

I wanted to make sure any changes to the data file are tracked and eventually backed up remotely.

First, I go through the usual git commands to stage the file (git add), save a copy of the changes on the local repository (git commit), and push the copy to the remote repository (git push).

This is the output I get:

Error message after going through the usual git commands; note that the error occurs because the file is too large.

How should I resolve this error? One option is to undo the changes using git reset and then either forget about saving the file, zip the file to compress it to a smaller size, or restart with Git LFS.

Another option would be to stay where you are and integrate Git LFS so you can continue the process, which is what we will focus on here.
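Since the push only fails once a too-large file reaches the host, it can save time to look for oversized files before committing. The sketch below assumes GitHub's limit for regular (non-LFS) files, which blocks files larger than 100 MB, and uses a throwaway directory with a hypothetical 150 MB file made of zeros; in practice, run the find command at the top of your own repository.

```shell
set -e
# Throwaway directory for the demo (hypothetical; use your own repo in practice)
demo=$(mktemp -d)
cd "$demo"

# Create a hypothetical 150 MB data file and a small one
dd if=/dev/zero of=big_data.csv bs=1048576 count=150 2>/dev/null
echo "id,value" > small.csv

# List files over 100 MB, skipping Git's own .git directory
find . -path ./.git -prune -o -type f -size +100M -print
```

Any file this prints is a candidate for Git LFS before it gets committed.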

Step 1: Once Git LFS is installed, enable it in the specific repository by running git lfs install.

Although Git LFS is now installed on your computer, we still need to tell it which repositories need its service.

A great analogy is a storage company.

Storage companies are available throughout the city, and we can choose one to store our items, but they do not automatically knock on the door and start storing them.

Instead, the first step is to start a relationship with the company by calling and setting up an agreement.

It is the same here.

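To enable Git LFS “services” in a specific repository, run git lfs install from inside that repository. Strictly speaking, this command also sets up the Git LFS filters in your user-level Git configuration (which only has to happen once per computer); when run inside a repository, it additionally installs the Git LFS hooks there.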

Run git lfs install to initialize your repository with Git LFS.

Successful initialization of Git LFS in the repository.

Step 2: Tell Git LFS which files to track with the command: git lfs track "*.file_extension".

Again, we need to tell Git LFS which files, or which types of files, we would like it to track, so that they are stored in a separate location instead of in git and we avoid getting the error message again.

To do so, run git lfs track "*.file_extension".

For example, if all csv files need to be tracked, run git lfs track "*.csv", or if all jpeg image files need to be tracked, run git lfs track "*.jpg".

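The asterisk (*) is a wildcard that matches any file name, so "*.csv" matches every file ending in .csv.

The quotes (" ") are necessary when running this command, and they must be plain straight quotes rather than curly ones.

Without them, your shell may expand the asterisk into the names of the matching files in the current folder before Git LFS ever sees the pattern, which causes problems later.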
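To see why the quotes matter, compare how the shell treats the same pattern with and without them. This sketch uses echo in a throwaway directory holding two hypothetical CSV files, so nothing is tracked; it only shows what a command would receive as its argument.

```shell
# Throwaway directory with two hypothetical CSV files
demo=$(mktemp -d)
cd "$demo"
touch sales.csv users.csv

# Unquoted: the shell expands the wildcard into the matching file names
echo *.csv     # sales.csv users.csv
# Quoted: the pattern itself is passed along, which is what git lfs track should receive
echo "*.csv"   # *.csv
```

With the unquoted form, git lfs track would receive the two existing file names rather than the pattern, so a CSV file added later would not be covered.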

Run git lfs track "*.csv" to tell Git LFS to start tracking all csv files.

It is also possible to tell Git LFS to track specific files, as in the image above.

Just as you receive a receipt when you place an order with a storage company to start storing an item, tracking a file with Git LFS creates a .gitattributes file.

If there is already a .gitattributes file, the new pattern is added to it as a new line.

Once a file is tracked by Git LFS, a .gitattributes file is created and the tracked file is listed in it.

Step 3: Git add, commit, & push your .gitattributes file to your repo.

Like the .gitignore file, .gitattributes is a configuration file that lives in your repository, and as Git LFS tracks new files, it updates .gitattributes automatically.

To make sure these changes are tracked, stage and commit the .gitattributes file each time it is updated; otherwise issues may occur later on.

Step 4: Now the real secret in this scenario is to use git lfs migrate to move your commits from git to Git LFS.

What lets us stay in the current state, without undoing our commits and restarting, is a nifty command that moves, or “migrates”, our commits from git to Git LFS.

To move our commits, we can run git lfs migrate import --include="*.file_extension".

To see which file types are in the commits and can be tracked by Git LFS, we can run git lfs migrate info.

By moving our commits over, we can continue to the next step: pushing our changes to GitHub.

More details are in the next step.

Run git lfs migrate info to obtain information on the types of files that are in the commits being moved over to Git LFS.

Run git lfs migrate import --include="*.csv" to move the commits of csv files to Git LFS.

Important Note: Moving commits rewrites the repository's history.

A tag can be added to keep the existing history from being overwritten, but doing so will stop this command from running.

Step 5: Lastly, run git push to push the changes to GitHub and the large files to Git LFS.

After migrating the commits to Git LFS, we have a local git repository that has been updated with a change (in this case, a new data file, represented by a pointer file that points to Git LFS) and a local Git LFS cache, which now stores the data file itself.

Next, we push the changes to GitHub.

The local git repo, containing only files within the size limit (i.e. source code and pointer files), will be stored on GitHub, which is the Git host shown in the image below, and the Git LFS cache will be stored in the Git LFS store in the cloud.

With git push, a copy of the git repo is pushed to GitHub and the Git LFS cache is pushed to the Git LFS store in the cloud.

Image source: https://www.atlassian.com/git/tutorials/git-lfs

Scenario 2: Using Git LFS from the beginning.

If it is known that there are large files in the repository, we can use Git LFS from the beginning by going through steps 1 to 3 listed above.

After going through these steps, return to the usual git commands (git add, git commit) to stage and save the changes in the local repo.

Then, complete step 5 listed above to push the changes to GitHub (or another Git host) and to the remote Git LFS store.

Conclusion

Seems pretty simple, right? Just remember the five steps above and we should be good to go.

Pulling down the changes from a remote repository is also straightforward.

It is the same set of git commands we typically use: git pull, or git fetch followed by git merge.

Using git pull to pull down the changes which include the data file saved in Git LFS.

When pulling down changes from the remote repo, the git changes are pulled as usual, and any objects stored in Git LFS are downloaded and replace their pointer files on the local computer.

Well, it was actually pretty confusing to learn at first.

Here are some notes on what I learned as I fumbled my way through this:

For those uncomfortable with git, this can add another layer of complexity.

This was my biggest challenge when learning about Git LFS.

Learning more about the git commands, git workflow, and how Git LFS fits in with git was key to learning the steps.

This video on Atlassian’s website gave me the “epiphany” I needed to put it all together.

Even with Git LFS, GitHub still limits the size of each file: 2GB on its Free and Pro plans, with somewhat higher limits on paid team plans.

For anything bigger, it is probably time to look into cloud storage.

Git LFS is an active open source project which is continuously being improved.

The project's GitHub repository keeps a running list of current issues.

There are still issues when trying to resolve merge conflicts.

It is best to communicate within the team before pushing any changes and merging.

Larger files can still be a bit slow when being pushed to the remote repo.

Overall, I really enjoyed learning about this resource and it was very helpful in making me more comfortable with using git.

Resources

I would not have been able to understand anything about this topic without the knowledge gleaned from the resources below.

I recommend checking them out.

Git LFS Website: https://git-lfs.github.com/
Atlassian Git LFS Tutorial: https://www.atlassian.com/git/tutorials/git-lfs
GitLab Git LFS Documentation: https://docs.gitlab.com/ee/workflow/lfs/manage_large_binaries_with_git_lfs.html
DZone, What is Git LFS: https://dzone.com/articles/learning-git-what-is-git-lfs
Oh Shit Git: https://ohshitgit.com/
Visualizing Git: https://git-school.github.io/visualizing-git/
