At its core, Git is a map: a table of keys and values.
The value is any sequence of bytes. Git converts it into a hash code (the key) using the SHA-1 algorithm.
A SHA-1 code can be generated with the following command at the Git Bash prompt:
$ echo "string" | git hash-object --stdin
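To make the key derivation concrete, here is a minimal Python sketch of what git hash-object computes for a blob, assuming the default SHA-1 object format. Git does not hash the bare bytes: it first prepends a short header (the word blob, the byte length, and a NUL byte). The function name git_blob_hash is hypothetical and used only for illustration.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 key Git would assign to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # echo appends a newline, so the value being hashed is b"string\n".
    print(git_blob_hash(b"string\n"))
```

Because the header is part of the input, the result differs from a plain SHA-1 of the same string.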
Object Modelling
Every object in Git has its own SHA-1 value.
SHA-1 values are unique for all practical purposes: it is extremely unlikely that two different values produce the same SHA-1 code.
Example:
$ echo "hello" | git hash-object --stdin -w ….. Besides printing the SHA-1, the -w option writes the value to the repository by creating an object.
If we dig inside the .git/objects directory of the local Git repository, we see an object as shown below.
In the above image, the 'objects' folder is the object database directory. The file starting with '013' is the blob data file; its name is the rest of the SHA-1 after the first two characters, which name the subdirectory.
$ git cat-file -t <ce013….> …. displays the object type (here, blob)
$ git cat-file -p <ce013….> …. displays the object contents, as shown in the image below.
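The storage layout described above can be sketched in Python, assuming Git's loose-object format: the header plus content is zlib-compressed and saved under a directory named after the first two hex digits of the SHA-1. The helpers write_blob, read_object and demo are hypothetical names, and the temporary directory is only a stand-in for .git/objects.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(objects_dir: str, data: bytes) -> str:
    """Store data as a loose blob object and return its SHA-1 key."""
    store = b"blob " + str(len(data)).encode("ascii") + b"\0" + data
    sha = hashlib.sha1(store).hexdigest()
    # First two hex digits name the directory, the remaining 38 the file.
    folder = os.path.join(objects_dir, sha[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, sha[2:]), "wb") as fh:
        fh.write(zlib.compress(store))
    return sha


def read_object(objects_dir: str, sha: str):
    """Return (type, content), roughly what cat-file -t and -p report."""
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "rb") as fh:
        store = zlib.decompress(fh.read())
    header, _, content = store.partition(b"\0")
    obj_type, _size = header.split(b" ")
    return obj_type.decode("ascii"), content


def demo():
    objects_dir = tempfile.mkdtemp()  # stand-in for .git/objects
    sha = write_blob(objects_dir, b"hello\n")
    return sha, read_object(objects_dir, sha)
```

Calling demo() stores the same bytes that echo "hello" produces and reads them back by key, which is the lookup that git cat-file performs.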
The object for each file is created in the .git directory at the moment the file is added to the staging area. This is shown in the image below.
If a file is changed and committed to the repository, a new entry is added to the objects folder for the new version of the file. In other words, Git creates a snapshot, a blob object with its own SHA-1 hash, for each version of the file and preserves it.
To restore a file to an earlier version, refer to that version's SHA-1 hash.
For every update recorded in the Git repository, Git creates a new SHA-1 (snapshot) for the changed file version.
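The idea that each version gets its own key, and that old versions stay reachable, can be illustrated with a toy in-memory map. This is only a sketch: ObjectStore is a hypothetical class, not Git's implementation, and it handles blobs only.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects (hypothetical class)."""

    def __init__(self):
        self.objects = {}

    def add(self, data: bytes) -> str:
        header = b"blob " + str(len(data)).encode("ascii") + b"\0"
        sha = hashlib.sha1(header + data).hexdigest()
        # A new version gets a new key, so earlier versions are untouched.
        self.objects[sha] = data
        return sha

    def get(self, sha: str) -> bytes:
        return self.objects[sha]


store = ObjectStore()
v1 = store.add(b"first draft\n")
v2 = store.add(b"second draft\n")
restored = store.get(v1)  # the earlier version is still reachable by its key
```

Nothing is overwritten when the file changes: the second version is simply a second entry in the map.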
Relational objects in Git Database
Take the example shown in the image below.
If we dig inside the directory '27' shown above and query the SHA-1 code with the git cat-file command, we get the output below.
The commit object records the committer, the author, and the SHA-1 of the tree that this commit points to.
If we query the tree SHA-1 code in turn, we get details of all the blobs and subtrees included in it.
For example,
./ commit (Tree) --> (blob) file1 --> blob content --> (Tree) --> …….
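The chain from commit to tree to blob can be sketched by building the three objects by hand, assuming Git's SHA-1 object format. The helper names are hypothetical, and the author line, timestamp and commit message are made-up illustration values. The parser is deliberately simple: it would not handle multiple parent lines or multi-line headers.

```python
import hashlib


def hash_object(obj_type: bytes, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' plus the body, as Git keys objects."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def build_commit(tree_sha: str, author: str, committer: str, message: str) -> bytes:
    """Serialize a root commit (no parent) that points at tree_sha."""
    lines = ["tree " + tree_sha, "author " + author,
             "committer " + committer, "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


def parse_commit_headers(body: bytes) -> dict:
    """Return the header lines before the blank line as a dict."""
    head, _, _message = body.decode("utf-8").partition("\n\n")
    return dict(line.split(" ", 1) for line in head.split("\n"))


# blob <- tree <- commit, each object referring to the previous one by key.
blob_sha = hash_object(b"blob", b"hello\n")
tree_body = b"100644 file1\0" + bytes.fromhex(blob_sha)
tree_sha = hash_object(b"tree", tree_body)
who = "A U Thor <author@example.com> 1700000000 +0000"  # made-up identity
commit_body = build_commit(tree_sha, who, who, "Add file1")
commit_sha = hash_object(b"commit", commit_body)
headers = parse_commit_headers(commit_body)
```

Here headers["tree"] leads from the commit to the tree, and the tree body carries the blob's key next to the name file1.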
Relation of Blob Hash Code with the file contents
If we create files with the same content, the blob object hash code for these files is the same. As shown in the image below, file1 and file2 have the same contents, so their SHA-1 hash code is also the same.
So a blob object's hash code is not about the file; it relates only to the contents of the file. This is what the snapshot concept means. The file name and file permission (mode) are stored in the tree object that references the blob, while the author is recorded in the commit object.
Git stores file contents keyed by their hash code, so identical content is stored only once. This helps keep the Git repository database lightweight compared with other version control systems.
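A small Python sketch of this point, assuming the SHA-1 blob format: two entries with different names and modes still share one blob key when their bytes match. The function blob_hash and the file contents are hypothetical illustration values.

```python
import hashlib


def blob_hash(data: bytes) -> str:
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


content = b"same text\n"
# (mode, name, blob key) triples, the kind of data a tree object records.
entries = [
    ("100644", "file1", blob_hash(content)),
    ("100755", "file2", blob_hash(content)),
]
# The name and mode never enter the blob hash, so only one blob is needed.
unique_blobs = {sha for _mode, _name, sha in entries}
```

The set ends up with a single key, so the object database would hold the content once no matter how many files share it.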
Tree type data object in Git repository
Git stores file contents in a manner similar to a UNIX file system, but a bit simplified. All contents are stored as tree and blob objects, with tree objects corresponding to UNIX directory entries and blob objects corresponding more or less to inodes or file contents. A single tree object contains one or more entries, each of which is the SHA-1 hash of a blob or subtree with its associated mode, type, and filename.
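As a rough illustration of a tree object's entries, the sketch below serializes and parses a tree body, assuming the SHA-1 format in which each entry is the mode, a space, the filename, a NUL byte, and the 20 raw bytes of the hash. The function names are hypothetical. Two simplifications are made: entries are sorted by plain name, and any mode other than 40000 is reported as a blob, which ignores submodule entries.

```python
import hashlib


def hash_object(obj_type: bytes, body: bytes) -> str:
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def serialize_tree(entries) -> bytes:
    """entries: iterable of (mode, name, hex_sha); returns the tree body."""
    body = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1]):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(sha)  # the hash is stored as 20 raw bytes
    return body


def parse_tree(body: bytes):
    """Return (mode, type, hex_sha, name) tuples for each tree entry."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        sha = body[nul + 1:nul + 21].hex()
        kind = "tree" if mode == "40000" else "blob"  # simplification
        entries.append((mode, kind, sha, name))
        pos = nul + 21
    return entries


blob_sha = hash_object(b"blob", b"hello\n")
subtree_sha = hash_object(b"tree", b"")  # an empty subtree, for illustration
tree_body = serialize_tree([("40000", "src", subtree_sha),
                            ("100644", "README", blob_sha)])
parsed = parse_tree(tree_body)
```

An entry whose type is tree holds the key of another tree object, which is how directories nest inside one another.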
As shown in the image below, one can query the repository for an object's type and contents with the git cat-file command.
The master^{tree} syntax specifies the tree object pointed to by the last commit on your master branch. Notice that the src and target subdirectories aren't blobs but pointers to other trees.
If we follow all the internal pointers, we get an object graph something like this: