Git - A Brief Introduction#
What is Git?#
Repository
A data structure containing a set of files along with some metadata.
Main Features#
Conceptually, Git is a tool to track, distribute, and manage changing content.
Tracking Changes#
Several techniques are combined to document the course of a repository thoroughly:
Unique identifiers for files and directories using SHA-1
Git uses a cryptographic hash function (SHA-1) to create a unique identifier for each file and directory in the repository. This hash is used to identify the file’s contents, and any changes to the file will generate a new hash.
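As a minimal sketch of how such an identifier is derived for a file: Git hashes the contents together with a short header (the word `blob`, the byte length, and a NUL byte). The helper name `git_blob_id` is an illustrative assumption and not part of Git; for the same bytes, the result should match what `git hash-object` reports.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a file with these contents."""
    # Git prepends the header "blob <size>\0" before hashing the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"hello world\n")
modified = git_blob_id(b"hello world!\n")

print(original)
print(modified)          # any change to the contents yields a different id
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
```

Because the identifier depends only on the contents, two files with identical bytes share one identifier, regardless of their names.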
Entire repository state captures (snapshots) at each commit
When you commit changes to a Git repository, Git creates a snapshot of the entire repository at that point in time. This snapshot includes the hashes of all the files and directories in the repository.
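The snapshot idea can be sketched with a toy content-addressed store: a snapshot is a mapping from paths to content hashes, so an unchanged file is referenced again rather than stored again. The names `ObjectStore` and `snapshot` are hypothetical, and the sketch uses one flat mapping instead of Git's nested directory objects.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Toy content-addressed store: each distinct content is kept once."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self.objects[key] = content  # identical content maps to the same key
        return key


def snapshot(store: ObjectStore, files: Dict[str, bytes]) -> Dict[str, str]:
    """Record the whole state as a mapping of path -> content hash."""
    return {path: store.put(data) for path, data in sorted(files.items())}


store = ObjectStore()
first = snapshot(store, {"README.md": b"intro\n", "main.py": b"print(1)\n"})
second = snapshot(store, {"README.md": b"intro\n", "main.py": b"print(2)\n"})

# Both snapshots list every file, but the unchanged README shares one object.
print(first["README.md"] == second["README.md"])  # True
print(len(store.objects))                         # 3
```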
Store differences between old and new file versions
Conceptually, every version of a file is stored as a complete object. To save space, Git compresses these objects and, when packing them, can store a version as the differences (or “deltas”) relative to a similar object. This allows Git to keep many versions of a file without storing the full contents each time.
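The delta idea can be sketched as follows. This is an illustration only: it uses Python's `difflib` on lines as a stand-in and does not reproduce Git's actual binary delta format. The function names `make_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def make_delta(old: List[str], new: List[str]) -> List[tuple]:
    """Describe `new` as copies from `old` plus literally inserted lines."""
    delta: List[tuple] = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))  # store only the new lines
        # "delete" needs no entry: the old lines are simply not copied
    return delta


def apply_delta(old: List[str], delta: List[tuple]) -> List[str]:
    """Rebuild the new version from the old version and the delta."""
    result: List[str] = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]
delta = make_delta(old, new)

print(delta)
print(apply_delta(old, delta) == new)  # True
```

Only the changed lines are stored literally; everything else is a reference into the old version.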
Store history as a graph of commits and relationships
Git stores the history of the repository as a directed acyclic graph: each commit is a node, and each edge points from a commit to its parent(s). This structure allows Git to traverse the repository’s history efficiently and determine how different commits relate to each other.
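A small sketch of such a commit graph, assuming a hypothetical history in which each commit id maps to the ids of its parents. Walking the parent edges answers questions such as "is this commit part of that commit's history?".

```python
from typing import Dict, List, Set

# Hypothetical history: commit id -> list of parent ids.
# "m" is a merge commit and therefore has two parents.
parents: Dict[str, List[str]] = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "m": ["c", "d"],
}


def ancestors(commit: str) -> Set[str]:
    """Return the commit itself plus everything reachable via parent edges."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_ancestor(older: str, newer: str) -> bool:
    """True if `older` is part of the history of `newer`."""
    return older in ancestors(newer)


print(sorted(ancestors("m")))                   # ['a', 'b', 'c', 'd', 'm']
print(is_ancestor("c", "d"))                    # False: two parallel versions
print(sorted(ancestors("c") & ancestors("d")))  # ['a', 'b']: shared history
```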
The cache of changes ready to be committed
Git uses an index (also known as the “staging area”) to keep track of the changes that are about to be committed. The index is a cache of the changes that are ready to be committed, and it’s used to optimize the commit process.
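The role of the index can be sketched with two plain dictionaries. This is a simplified model with hypothetical file names, not Git's on-disk index format: staging copies the current content of a file, and a commit records what was staged rather than what is currently in the working directory.

```python
from typing import Dict

working_tree: Dict[str, str] = {"notes.txt": "draft 1", "todo.txt": "buy milk"}
index: Dict[str, str] = {}  # staged content, keyed by path


def add(path: str) -> None:
    """Stage the current content of one file (roughly what `git add` does)."""
    index[path] = working_tree[path]


def commit() -> Dict[str, str]:
    """A commit records what is in the index, not the working tree."""
    return dict(index)


add("notes.txt")
working_tree["notes.txt"] = "draft 2"  # edited again after staging
committed = commit()

print(committed)  # {'notes.txt': 'draft 1'}
```

The later edit and the never-staged `todo.txt` are both absent from the commit.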
Git can:
generate a history of changes in a repository,
maintain several versions of a single repository,
verify the integrity of the history.[1]
[1] A state is identified by a hash of its content, and each state contains the hash of its preceding state(s), also called parent(s), so the latest hash uniquely identifies the entire history.
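The chaining of hashes can be sketched like this. The payload layout is a simplification chosen for illustration and is not Git's real commit format; `commit_id` and `build_history` are hypothetical names.

```python
import hashlib
from typing import List, Optional


def commit_id(content: str, parent: Optional[str]) -> str:
    """Hash the content together with the id of the preceding state."""
    payload = f"parent {parent}\n{content}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_history(contents: List[str]) -> List[str]:
    """Return the id of every state in a linear history, oldest first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for content in contents:
        parent = commit_id(content, parent)
        ids.append(parent)
    return ids


honest = build_history(["v1", "v2", "v3"])
tampered = build_history(["v1 (edited)", "v2", "v3"])

# Changing the oldest state changes its id and, through the parent
# references, the id of every later state as well.
print(honest[-1] == tampered[-1])  # False
```

Comparing only the latest identifier is therefore enough to detect a change anywhere in the history.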
Content Distribution#
What can be shared in Git?
The state of the repository at a given point in time
The relationships between snapshots, including parent-child relationships
Binary files containing compressed and delta-encoded data for a set of commits
Not shared are:
Local files that have not been added for tracking
A collection of changed files that should be added to the next commit
Git supports several protocols to send and receive content:
SSH: Secure protocol for connecting to remote repositories
URL Scheme
Used for SSH connections to remote repositories
ssh://user@host:port/path/to/repo.git
Authentication
Use a public/private key pair to authenticate with the remote
Provide username and password to authenticate with the remote
HTTPS: Widely used (secure) protocol for connecting to remote repositories
URL Scheme
Used for HTTPS connections to remote repositories
https://github.com/user/repo.git
Authentication
Provide username and password to authenticate with the remote
Provide a personal access token, usually obtained through the remote’s API or web interface
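Git: Native protocol for Git, optimized for read-only access
URL Scheme
Uses its own git:// scheme and provides no authentication
git://host:port/path/to/repo.git
Prefixed forms such as git+ssh:// or git+https:// are used by package managers such as pip to mark a Git source; they do not select the native protocol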
Local: Connecting to a repository on the same machine
URL Scheme
Similar to accessing local files.
file:///path/to/repo.git
../path/to/repo.git
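To make the URL forms above concrete, the following sketch splits a remote address into its parts. It relies on Python's generic URL parser rather than Git's own, the host `example.org` and port `2222` are made-up example values, and the helper name `describe_remote` is hypothetical.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlsplit


def describe_remote(url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a remote address into transport, user, host, port, and path."""
    parts = urlsplit(url)
    return {
        # An address without a scheme is treated as a plain local path.
        "transport": parts.scheme or "local path",
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


for remote in (
    "ssh://git@example.org:2222/path/to/repo.git",
    "https://github.com/user/repo.git",
    "file:///path/to/repo.git",
    "../path/to/repo.git",
):
    print(describe_remote(remote))
```

Only the SSH and HTTPS forms carry a host; the two local forms consist of a path alone.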
Git can:
propagate entire repositories (including history),[2]
deploy simple states of a repository,
share changes to a repository.
[2] This is what makes it a distributed, and thus more resilient, version control system.
Managing Changes#
In addition to distributing changes and states, Git also allows for precise control over content integration and/or recombination.
Synthesize different versions to a single state
Events in the history can be reordered or combined
Decide which version should have changes applied
Treat the difference between versions as applicable changes
Another feature of Git is its ability to act across repositories. Similar to changes within a repository, changes can also be propagated between repositories:
Exchange changes between two repositories
Apply changes from one repository to another
Treat the difference between repositories as applicable change
Git can:
consolidate changes from different sources,
identify conflicting changes and allow for transparent resolution,
maintain multiple versions concurrently,
exchange incremental changes between versions.
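How changes from two sources are consolidated, and how a conflict is recognized, can be sketched with a three-way comparison against the common starting point. This simplified model decides per whole file, whereas Git also combines changes within a file; the names `Snapshot` and `three_way_merge` are hypothetical.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> file content


def three_way_merge(
    base: Snapshot, ours: Snapshot, theirs: Snapshot
) -> Tuple[Snapshot, List[str]]:
    """Combine two versions that both descend from `base`."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            winner = o              # both sides agree
        elif o == b:
            winner = t              # only their side changed the file
        elif t == b:
            winner = o              # only our side changed the file
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if winner is not None:      # None means the file was deleted
            merged[path] = winner
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "ours"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "theirs"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'a.txt': '2', 'b.txt': '3'}
print(conflicts)  # ['c.txt']
```

Files changed on only one side are taken over automatically; a file changed differently on both sides is reported for manual resolution.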
Limitations#
Git’s design paradigm imposes certain constraints that are worth a closer look.
Automatic Synchronization#
No coupling between changes and distribution
Synchronization is not coupled to change events in a repository.
This means that a change, such as a commit, in the repository doesn’t automatically trigger synchronization across all copies (clones) of the repository.
Therefore:
Collaboration mediated by Git happens asynchronously, i.e., changes are propagated only when explicitly requested (e.g., by pushing or pulling).
Real-time simultaneous editing is not supported. If two users edit the same file in parallel, their changes must be merged afterwards; Git combines non-overlapping changes automatically, while overlapping changes must be resolved manually.
Synchronizing the state of a repository is part of the workflow.
Functional Consistency#
Git tracks only the content of files
Git tracks only the content of files, not the functionality of the code within them.
There is no guarantee that:
Git will identify breaking changes as a conflict;[3]
two functioning versions combined will not result in a dysfunctional version.
[3] This is rather obvious, if you think about it. But it is easily overlooked when Git reports no conflicts while merging changes from different sources.
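A constructed example of this effect, using made-up file names and contents: one version renames a function and updates its only caller, while another version adds a new file that still calls the old name. Each version works on its own, and the two change sets touch different files, so a content-based merge sees nothing to report.

```python
from typing import Dict


def runs_cleanly(files: Dict[str, str]) -> bool:
    """Execute the files in path order in one namespace; report success."""
    namespace: dict = {}
    try:
        for path in sorted(files):
            exec(files[path], namespace)
    except NameError:
        return False
    return True


base = {
    "a_lib.py": "def area(w, h):\n    return w * h\n",
    "b_app.py": "total = area(2, 3)\n",
}

# Version A renames the function and updates the existing caller.
version_a = {
    "a_lib.py": "def rectangle_area(w, h):\n    return w * h\n",
    "b_app.py": "total = rectangle_area(2, 3)\n",
}

# Version B keeps the library untouched and adds a new caller.
version_b = dict(base, **{"c_extra.py": "bonus = area(1, 1)\n"})

# The versions changed different files, so combining them raises no conflict:
# A's edits to the two existing files, plus B's new file.
merged = dict(version_a, **{"c_extra.py": version_b["c_extra.py"]})

print(runs_cleanly(version_a))  # True
print(runs_cleanly(version_b))  # True
print(runs_cleanly(merged))     # False: c_extra.py calls a removed name
```

Running tests after every merge is the usual safeguard, since the merge itself cannot detect this kind of breakage.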
Variety of Trackable Files#
Not all file types and sizes are well supported
Git tracks changes line-by-line, which can be problematic.
This leads to limited support for:
Binary files,
File types that include metadata (e.g. .docx),
Large files (Git’s size limit for a single file is 2 GB).[4]
[4] Git LFS (Large File Storage) is a solution to this problem. It allows you to store large binary files outside of the repository and track only their location and version.
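What "tracking only location and version" amounts to can be sketched by building the small text pointer that stands in for a large file. The sketch assumes the pointer layout of the Git LFS v1 specification (a version line, a SHA-256 object id, and the size in bytes); the helper name `lfs_pointer` is hypothetical.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the small text stand-in that is committed instead of the file."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


large_file = b"\x00" * 1_000_000  # stand-in for a large binary file
pointer = lfs_pointer(large_file)

print(pointer)
print(len(pointer), "bytes tracked instead of", len(large_file))
```

The repository then versions only this short pointer, while the actual bytes are stored elsewhere and looked up by their object id.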