Use binary files in Git repositories more efficiently with Git LFS

Binary files are the file types that cannot be handled efficiently in Git repositories; in other words, exactly those files that cannot be opened meaningfully with a common text editor. Git LFS lets you manage binary files more efficiently. It is an extension for the Git client, but a corresponding counterpart is also required on the server.

But before looking at Git LFS, it helps to see how Git stores files internally, because that explains why a solution like Git LFS is needed.

Git internals #

Those who know how Git manages files internally can safely skip this section.

Git is a distributed version control system, so the complete history of the repository must also be available locally. To do this, Git saves every new or changed file in its entirety in the repository. For speed reasons, it does not save diffs but the complete files.

Specifically, this means that when a new file is added in a commit to a new, empty repository, Git compresses the file and simply puts it somewhere in the .git folder. For plain text files, which is definitely the case with software source code, this works quite well because very little storage space is used. If that text file is changed and the change is committed, Git again saves the whole file from scratch. When checking out the previous commit, Git just needs to decompress the older file and move it into the working directory, which usually happens very quickly.
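To make this a bit more tangible, here is a minimal Python sketch of how Git builds such an object from a file. It assumes a classic SHA-1 repository and only covers so-called loose objects (later packing of objects is ignored). The function name git_blob and the sample contents are made up for this illustration.

```python
import hashlib
import zlib
from typing import Tuple


def git_blob(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, compressed bytes) for a file's content."""
    # Git prefixes the content with a small header: "blob <size>\0".
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    # The object id is the SHA-1 hash of header plus content.
    object_id = hashlib.sha1(store).hexdigest()
    # What ends up on disk is the zlib-compressed header plus content.
    return object_id, zlib.compress(store)


v1 = b"hello\n"
v2 = b"hello world\n"  # a small edit still produces a complete new object

id1, packed1 = git_blob(v1)
id2, packed2 = git_blob(v2)

print(id1)  # ce013625030ba8dba906f756967f9e9ca394464a, same as `git hash-object`
print(id2)
print(id1 != id2)  # True: two versions, two independent objects
```

The point of the sketch is that the second version is not stored as a difference to the first one: each version is hashed and compressed as a whole.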

Now you should see why you shouldn't keep binary files in Git without LFS: Git also saves binary files in full in the repository with every change. Since binary files usually cannot be compressed as efficiently as text files, the compression is almost useless. A binary file that is 50 MB in size and has been changed 10 times (each change committed) already takes up around 500 MB in the repository. The more often you change binary files, the larger the repository becomes, which also takes up a lot of disk space.
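The effect can be reproduced roughly with a few lines of Python. This is only a sketch under assumptions: random bytes stand in for an already compressed binary format such as JPG, the repeated snippet stands in for source code, and all names are chosen for this example.

```python
import os
import zlib

# Repetitive, source-like text compresses very well.
text = b"def add(a, b):\n    return a + b\n" * 4096
# Random bytes behave like already compressed media: almost incompressible.
binary = os.urandom(len(text))


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


print(f"text:   {ratio(text):.3f}")
print(f"binary: {ratio(binary):.3f}")

# The arithmetic from the text: every committed version is stored in full.
file_size_mb = 50
committed_versions = 10
repo_growth_mb = file_size_mb * committed_versions
print(repo_growth_mb)  # 500
```

Real text files compress less dramatically than this artificial example, but the gap to binary data remains large.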

And this is exactly where Git LFS comes in.

Git LFS #

LFS stands for "Large File Storage" and is an extension of Git that affects both the client and the server. Not every Linux distribution ships a (current) Git LFS package in its package repositories; where it is missing, Git LFS can be installed via packagecloud.io.

After you have installed Git LFS, it can be used in your repositories. As mentioned before, the server on the other side also needs to support Git LFS. The good news is that the popular Git hosting services offer it, in particular GitHub, GitLab, Bitbucket and Gogs / Gitea.

The use of Git LFS must be configured once in each repository. This is very easy: run the command git lfs install inside the repository.

Not much has happened at this point, because you still have to configure which files Git LFS should handle. If you often deal with images, these could be JPG files, for example, which are tracked with the command git lfs track "*.jpg". Alternatively, individual file names can of course also be specified.

Git LFS stores this configuration in the .gitattributes file in the working directory, which you definitely have to commit to the repository. For the JPG example, the file contains the line *.jpg filter=lfs diff=lfs merge=lfs -text.
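To see which files such a configuration would cover, here is a small Python sketch. It assumes the tracked pattern is *.jpg, as in the JPG example, and it uses fnmatch as a simplification of Git's real pattern rules. The function names are invented for this illustration and are not part of Git LFS.

```python
from fnmatch import fnmatchcase
from typing import List

# The line that tracking "*.jpg" with Git LFS adds to .gitattributes.
GITATTRIBUTES = "*.jpg filter=lfs diff=lfs merge=lfs -text\n"


def lfs_patterns(gitattributes: str) -> List[str]:
    """Collect the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(filename: str, gitattributes: str) -> bool:
    # Simplification: fnmatch only approximates Git's matching rules.
    return any(fnmatchcase(filename, p) for p in lfs_patterns(gitattributes))


print(lfs_patterns(GITATTRIBUTES))
print(is_lfs_tracked("photo.jpg", GITATTRIBUTES))
print(is_lfs_tracked("main.py", GITATTRIBUTES))
```

Source files such as main.py are not matched and therefore continue to be stored by Git in the usual way.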

Then you can create a remote repository that supports Git LFS in order to version JPG files. The whole thing can be tested easily: take a (larger) picture, commit and push it, and then look at the size of the repository directory. Repeat this several times, modifying the image each time, to create several versions. If you do this, you can quickly see that the size of the repository does not increase significantly. My tests also showed that the local repository sometimes got bigger anyway, but not all versions of the binary file are downloaded with a fresh clone or when pulling.
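If you want to put numbers on this test, a small helper like the following can sum up the size of a directory. It is only a sketch: the name dir_size_bytes is made up, and the paths in the usage note assume that you run it from the root of your test repository.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symbolic links so nothing is counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


# Example usage inside a test repository:
#   dir_size_bytes(".git")      -> whole local repository
#   dir_size_bytes(".git/lfs")  -> local cache of LFS files
```

Comparing the two values after each commit shows where the growth comes from. Git LFS keeps downloaded and newly committed versions in a local cache under .git/lfs, which is a likely reason why a local repository can still get bigger.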

Git LFS works quite simply: you only have to specify the files or file types that should be tracked by Git LFS. Then you can commit and push as usual while Git LFS works in the background. The whole thing is not complicated.

Functionality #

So far we have discussed the problems of binary files in Git repositories and shown how to use Git LFS, but not how it works internally. Instead of storing the binary file itself in the repository, Git LFS only saves a small pointer that refers to the large file on the LFS server. These pointers are quite small because they do not contain the binary data. On a push, the binary files are uploaded to the LFS server and the rest of the repository to the normal Git server. When checking out a branch or a specific commit, the required versions are downloaded from the LFS server in the background and put in place of the pointers in the working directory. This means that only those binary files are downloaded that you actually need at that moment.
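Such a pointer is a tiny text file with three lines: the version of the pointer format, the SHA-256 hash of the real file, and its size in bytes. The following Python sketch builds and reads a pointer in that format. The function names and the 5 MiB dummy "image" are assumptions for this example; the real work is of course done by the Git LFS client itself.

```python
import hashlib

LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(data: bytes) -> str:
    """Build the pointer text that is committed instead of the file."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC_VERSION}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_pointer(text: str) -> dict:
    """Read the key/value lines of a pointer back into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    return {
        "version": fields["version"],
        "oid": fields["oid"].split(":", 1)[1],  # strip the "sha256:" prefix
        "size": int(fields["size"]),
    }


image = bytes(5 * 1024 * 1024)  # 5 MiB of zero bytes as a stand-in
pointer = make_pointer(image)

print(pointer)
print(len(pointer.encode("utf-8")), "bytes in the repository instead of", len(image))
```

However large the file gets, the pointer stays at roughly the same small size, which is why the Git repository itself hardly grows.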

By using Git LFS, you can manage binary files quite easily, but you lose the ability to access all revisions of a binary file offline. Since you are often online anyway, that shouldn't be too much of a problem in most cases. Nevertheless, you should be aware of when to use Git LFS and for which files or file types. Ideally, you still manage only pure source code in software projects. For large photos or other binary files, Git LFS can be a good alternative.