plusforum.in

04 - Git Version control System - How does it work?

Updated: Aug 29, 2023

At its core, Git is a map: a table of keys and values.

A value is any sequence of bytes. Git derives the key for a value by hashing it with the SHA-1 algorithm.

A SHA-1 code can be generated at the Git Bash prompt with the following command:

$ echo "string" | git hash-object --stdin
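
To see what this command computes, here is a minimal Python sketch of the same hash. It relies on the fact that Git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the content. The function name git_blob_hash is our own and is not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 code Git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # echo appends a newline, so the bytes that get hashed are b"string\n".
    print(git_blob_hash(b"string\n"))
```

The printed value should match what the git hash-object command above prints for the same string.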

Object Modelling

Every object in Git has its own SHA-1 value.

SHA-1 values are unique for practical purposes: it is extremely unlikely that two different values produce the same SHA-1 code.

Example:

$ echo "hello" | git hash-object --stdin -w

The -w option writes the value to the repository as an object, in addition to printing its SHA-1 code.

If we dig inside the .git/objects directory of the local Git repository, we can see the new object, as shown below.

[Image: git repository objects folder contents]

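In the image above, the 'objects' folder is the object database directory. The subdirectory 'ce' is named after the first two characters of the SHA-1 code, and the file whose name starts with '013' (the remaining characters of the code) is the blob data file.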
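
The layout of the object database can be sketched in Python as follows. This is an illustration of how a loose blob object ends up on disk (header plus content, compressed with zlib, stored under a directory named after the first two hash characters), not Git's actual implementation. The function name write_loose_blob and the temporary directory are assumptions made for the example.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_blob(objects_dir: Path, content: bytes) -> Path:
    """Store content as a loose blob object and return the file path."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(store).hexdigest()
    # First two hex characters name the directory, the rest name the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Loose objects are stored zlib-compressed.
    path.write_bytes(zlib.compress(store))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_loose_blob(Path(tmp) / "objects", b"hello\n")
        print(path.parent.name, path.name[:3])  # prints: ce 013
```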

$ git cat-file -t <ce013….>

This displays the object's type (here, blob).

[Image: git cat-file data type for a git database object]

$ git cat-file -p <ce013….>

This displays the object's contents, as shown in the image below.

[Image: git hash object contents]
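
What the -t and -p options report can be approximated in Python by decompressing a loose object and splitting its header from its body. This is a sketch for blob-style objects only. The function name parse_loose_object is hypothetical, and the sample bytes are built in memory instead of being read from a real repository.

```python
import zlib
from typing import Tuple


def parse_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    """Return (object type, body) for zlib-compressed loose object bytes."""
    raw = zlib.decompress(compressed)
    # The header "<type> <size>" ends at the first NUL byte.
    header, _, body = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


if __name__ == "__main__":
    sample = zlib.compress(b"blob 6\0hello\n")
    obj_type, body = parse_loose_object(sample)
    print(obj_type)  # roughly what "git cat-file -t" reports: blob
    print(body)      # roughly what "git cat-file -p" reports: b'hello\n'
```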


The object for each file is created in the .git directory at the moment the file is added to the staging area, as shown in the image below.

[Image: each file has a related database object in git repository]

If a file is changed and committed to the repository, a new entry is added to the objects folder for the new version of the file. In other words, Git creates a blob object (a snapshot) with its own SHA-1 hash for each version of a file and preserves it.

To restore a file to an earlier version, we can refer to that version's SHA-1 hash.

Every update to the Git repository therefore creates a new SHA-1 (snapshot) for each changed file version.
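
The idea that each version of a file is kept under its own hash can be sketched with a small in-memory map. The ObjectStore class below is a hypothetical stand-in for the .git/objects folder, not something Git provides, and the file contents are made up for illustration.

```python
import hashlib


class ObjectStore:
    """A toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # Same header-plus-content hashing that Git uses for blobs.
        key = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = ObjectStore()
    v1 = store.put(b"first draft\n")
    v2 = store.put(b"second draft\n")
    print(len(store))     # 2: both versions are preserved
    print(store.get(v1))  # the earlier version, looked up by its hash
```

Storing the same content a second time returns the same key and does not add a new entry, which is the behaviour the later section on blob hashes relies on.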

Relational objects in Git Database

Consider the example shown in the image below.

[Image: relational database object in git database]

If we look inside the directory '27' shown above and query the object's SHA-1 code with the git cat-file command, we get the output below.

[Image: git cat-file -p]

The commit object includes information about the committer and the author, and also the SHA-1 code of the tree that records which blobs belong to this commit.
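
The text of a commit object can be read into a simple structure as sketched below. The sample commit is made up for illustration: the author, committer, timestamps and message are hypothetical, and the parser ignores multi-line headers such as signatures. The function name parse_commit is our own.

```python
def parse_commit(text: str) -> dict:
    """Split commit object text into its header fields and message."""
    # Headers and message are separated by the first blank line.
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in headers.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several
        else:
            commit[key] = value
    return commit


if __name__ == "__main__":
    # Hypothetical commit text, in the shape "git cat-file -p" prints.
    sample = (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        "author A U Thor <author@example.com> 1693267200 +0000\n"
        "committer C O Mitter <committer@example.com> 1693267200 +0000\n"
        "\n"
        "Initial commit\n"
    )
    commit = parse_commit(sample)
    print(commit["tree"])
    print(commit["author"])
```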

If we query the tree's SHA-1 code, we get details about all the entries (blobs and subtrees) included in it.

For example,

[Image: git cat-file -p results on tree data object]

./ commit (Tree) --> (blob) file1 --> blob content --> (Tree) --> …….

Relation of Blob Hash Code with the file contents

If we create files with the same content, the blob object hash code for these files is the same. As shown in the image below, file1 and file2 have the same contents, so their SHA-1 hash code is also the same.

[Image: the hash objects are the same for files having the same contents]

So, a blob object's hash code is not tied to the file itself but to the contents of the file. This is what the snapshot concept means. A file's name and permission (mode) information is stored in the tree object that lists the file, while author information is stored in the commit object.
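
A short Python sketch makes the point concrete: grouping file names by the blob hash of their contents shows that files with identical contents share one hash. The file names and contents are hypothetical, and the helper names blob_hash and group_by_blob are our own.

```python
import hashlib
from collections import defaultdict


def blob_hash(content: bytes) -> str:
    """SHA-1 of a blob: depends on the content only, never on the name."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def group_by_blob(files: dict) -> dict:
    """Map each blob hash to the list of file names that share it."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[blob_hash(content)].append(name)
    return dict(groups)


if __name__ == "__main__":
    files = {
        "file1.txt": b"hello\n",
        "file2.txt": b"hello\n",  # same contents as file1.txt
        "file3.txt": b"world\n",
    }
    for sha, names in group_by_blob(files).items():
        print(sha, names)
```

Three files go in, but only two distinct hashes come out, so only two blob objects would need to be stored.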

Git stores file contents addressed by their hash code, so identical content is stored only once, no matter how many files or versions share it. This helps keep the Git repository database compact.

Tree type data object in Git repository

Git stores file contents in a manner similar to a UNIX file system, but a bit simplified. All contents are stored as tree and blob objects, with tree objects corresponding to UNIX directory entries and blob objects corresponding more or less to inodes or file contents. A single tree object contains one or more entries, each of which is the SHA-1 hash of a blob or subtree with its associated mode, type, and filename.
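
The way a tree object packs its entries can be sketched in Python as below. Each entry is stored as the mode, a space, the filename, a NUL byte, and the 20 raw bytes of the SHA-1. The entry type is not stored separately here; it follows from the mode (40000 marks a subtree). The function names and the example entries are assumptions for illustration, and the sorting rule is a simplified version of the one Git applies.

```python
import hashlib


def serialize_tree(entries) -> bytes:
    """entries: iterable of (mode, name, sha_hex) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Subtrees sort as if their name ended with "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, sha_hex in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\0"
        body += bytes.fromhex(sha_hex)  # 20 raw bytes, not 40 hex chars
    return body


def tree_hash(entries) -> str:
    body = serialize_tree(entries)
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


def parse_tree(body: bytes) -> list:
    """Inverse of serialize_tree: recover (mode, name, sha_hex) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        pos = nul + 21
    return entries


if __name__ == "__main__":
    entries = [
        ("40000", "src", "4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
        ("100644", "file1.txt", "ce013625030ba8dba906f756967f9e9ca394464a"),
    ]
    for mode, name, sha in parse_tree(serialize_tree(entries)):
        print(mode, name, sha)
```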

As shown in the image below, we can query the repository for the entries of a tree object, along with their object types, using the git cat-file command.

[Image: all object datatypes under a certain tree object in a branch]

The master^{tree} syntax specifies the tree object that is pointed to by the last commit on your master branch. Notice that the src and target subdirectories aren't blobs but pointers to other trees.

[Image: tree and blob object relation]

If we follow all the internal pointers, we get an object graph something like this:


[Image: git database object tree]
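
Following the pointers from a commit down to its blobs can be sketched in Python with a small in-memory object database. Everything in it is hypothetical: the short keys such as "commit1" stand in for real SHA-1 codes, and the file names and contents are made up. The function name list_files is our own.

```python
# Hypothetical object database: short keys stand in for SHA-1 codes.
OBJECTS = {
    "commit1": ("commit", {"tree": "tree_root"}),
    "tree_root": ("tree", [("file1.txt", "blob_a"), ("src", "tree_src")]),
    "tree_src": ("tree", [("main.txt", "blob_b")]),
    "blob_a": ("blob", b"hello\n"),
    "blob_b": ("blob", b"world\n"),
}


def list_files(objects: dict, commit_id: str) -> dict:
    """Walk commit -> tree -> subtrees and return {path: blob content}."""
    obj_type, commit = objects[commit_id]
    if obj_type != "commit":
        raise ValueError("not a commit: " + commit_id)
    result = {}

    def walk(tree_id: str, prefix: str) -> None:
        _, entries = objects[tree_id]
        for name, child_id in entries:
            child_type, payload = objects[child_id]
            if child_type == "tree":
                walk(child_id, prefix + name + "/")  # descend into subtree
            else:
                result[prefix + name] = payload      # reached a blob

    walk(commit["tree"], "")
    return result


if __name__ == "__main__":
    for path, content in list_files(OBJECTS, "commit1").items():
        print(path, content)
```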
