Git - A Brief Introduction

What is Git?

Repository

A data structure containing a set of files along with some metadata.

Git is …
  • a distributed version control system (VCS),

  • software providing functionality to …

    • track

    • propagate

    • manage

    changes in a repository,

  • open source,

  • supported on all major operating systems.

Main Features

Conceptually, Git is a tool to track, distribute, and manage changing content.

Tracking Changes

Several techniques are combined to document the history of a repository thoroughly:

Unique identifiers for files and directories using SHA-1
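Git derives an object's identifier from the object's content. As a minimal sketch, the function below (`git_blob_id` is a name chosen for illustration) reproduces how Git computes the ID of file content, a so-called blob, in a repository using the default SHA-1 object format: it hashes a short header followed by the raw bytes. The result should match what `git hash-object` prints for the same content. Directories (trees) and commits are hashed the same way, but with a different header and a structured body, which this sketch omits.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file's content (a blob)."""
    # Git hashes the header "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because the ID depends only on the content, two identical files share one ID, and any change to a file produces a different one.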

Git provides version control functionality to:
  • generate a history of changes in a repository,

  • maintain several versions of a single repository,

  • verify the integrity of the history.[1]

[1] A state is identified by a hash of its content, and each state contains the hash of its preceding state(s), also called its parent(s). This leads to a unique identifier for the entire history.
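Chaining hashes through parent references can be illustrated with a deliberately simplified model. The `State` class and `verify` function below are hypothetical and do not follow Git's real commit format; they only show why a state's ID depends on its whole history, and why changing any earlier state makes the stored IDs inconsistent.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical, simplified history entry: some content plus its parents' IDs."""
    content: str
    parents: tuple[str, ...]

    @property
    def id(self) -> str:
        # The ID covers the content and the parent IDs, so it transitively
        # depends on every earlier state in the history.
        payload = "\n".join(self.parents) + "\0" + self.content
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def verify(history: dict[str, State], head: str) -> bool:
    """Check that every state reachable from `head` is stored under its own hash."""
    stack, seen = [head], set()
    while stack:
        state_id = stack.pop()
        if state_id in seen:
            continue
        seen.add(state_id)
        state = history.get(state_id)
        if state is None or state.id != state_id:
            return False  # missing or modified state
        stack.extend(state.parents)
    return True


root = State("v1", ())
child = State("v2", (root.id,))
history = {s.id: s for s in (root, child)}
print(verify(history, child.id))  # True

# Rewriting the stored root (keeping its old key) breaks verification.
history[root.id] = State("v1 (edited)", ())
print(verify(history, child.id))  # False
```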

Content Distribution

What can be shared in Git?

Snapshots

The state of the repository at a given point in time

Relationships

The relationships between snapshots, including parent-child relationships

Deltas

The differences between snapshots, used to transfer and store changes compactly

Packfiles

Binary files containing compressed and delta-encoded data for a set of commits
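Deltas and packfiles save space by storing one version as a set of instructions relative to another. The sketch below (`make_delta` and `apply_delta` are hypothetical helpers built on Python's `difflib`) describes a target as "copy" instructions that reuse bytes from a base and "insert" instructions that carry new bytes. Git's packfile deltas are also made of copy and insert instructions, but their binary encoding and the way Git chooses base objects differ from this illustration.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Describe `target` as instructions that copy from `base` or insert new bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # reuse a byte range from base
        elif j2 > j1:                                # "replace" or "insert"
            ops.append(("insert", target[j1:j2]))  # ship only the new bytes
        # "delete": nothing to emit, those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"line one\nline two\nline three\n"
target = b"line one\nline 2\nline three\n"
delta = make_delta(base, target)
assert apply_delta(base, delta) == target
```

Only the inserted bytes have to be stored or sent in full; everything else is a reference into the base.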

The following are not shared:

Working Directory

Local files and modifications that have not been committed, including files that are not tracked

Staging Area

A collection of changed files that should be added to the next commit
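The split between the working directory, the staging area, and committed snapshots can be modeled with a toy class. `ToyRepo` is hypothetical and keeps everything in dictionaries; it only illustrates that a commit records what was staged, so unstaged edits and untracked files never become part of the history that can be shared.

```python
class ToyRepo:
    """Hypothetical model of a repository's working directory, staging area, and commits."""

    def __init__(self):
        self.working = {}   # path -> content as it currently is on disk
        self.staged = {}    # path -> content recorded for the next commit
        self.commits = []   # committed snapshots; only these would be shared

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Like `git add`: copy the current working version into the staging area.
        self.staged[path] = self.working[path]

    def commit(self):
        # A commit snapshots the staging area, not the working directory.
        self.commits.append(dict(self.staged))


repo = ToyRepo()
repo.edit("notes.txt", "draft 1")
repo.add("notes.txt")
repo.edit("notes.txt", "draft 2")  # modified again, but not staged
repo.edit("scratch.txt", "tmp")    # never added, i.e. untracked
repo.commit()
print(repo.commits[-1])  # {'notes.txt': 'draft 1'}
```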

Git supports several protocols to send and receive content:

Supported protocols
SSH

Secure protocol for connecting to remote repositories

HTTP(S)

Widely used (secure) protocol for connecting to remote repositories

Git Protocol

Native protocol for Git, optimized for read-only access

Local File System

Connecting to a repository on the same machine
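Which protocol is used depends on the form of the remote URL. The hypothetical helper `transport_for` below is a simplified sketch of the rules listed under "GIT URLS" in the `git clone` documentation: an explicit scheme such as `ssh://` or `https://` selects the transport directly, and the scp-like form `user@host:path` means SSH only when no slash appears before the first colon. It ignores less common cases such as Windows drive letters, `ftp://`, and the `<transport>::<address>` syntax; the example URLs are placeholders.

```python
import re

SCHEMES = {
    "ssh": "SSH",
    "http": "HTTP(S)",
    "https": "HTTP(S)",
    "git": "Git protocol",
    "file": "local file system",
}


def transport_for(url: str) -> str:
    """Guess which transport Git would use for a remote URL (simplified)."""
    scheme = re.match(r"^([a-z][a-z0-9+.-]*)://", url)
    if scheme:
        return SCHEMES.get(scheme.group(1), "other")
    # scp-like syntax [user@]host:path, recognized only if no slash precedes the first colon
    colon = url.find(":")
    if colon > 0 and "/" not in url[:colon]:
        return "SSH"
    return "local file system"


for url in ("git@example.com:team/project.git",
            "https://example.com/team/project.git",
            "git://example.com/project.git",
            "/srv/git/project.git"):
    print(url, "->", transport_for(url))
```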

Git makes it easy to:
  • propagate entire repositories (including history),[2]

  • deploy single states of a repository,

  • share changes to a repository.

[2] This is what makes Git a distributed, and thus more resilient, version control system.
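Propagating a whole repository versus sharing only new changes can be illustrated on a toy commit graph in which each commit ID maps to its parent IDs. The functions below are hypothetical: `reachable` collects everything a clone would copy, and `commits_to_send` walks back from the remote head and stops at commits the receiver already has. The stopping rule relies on each state referencing its parents, so in a complete (non-shallow) repository, having a commit implies having its ancestors. Git's real fetch negotiation is more elaborate than this sketch.

```python
Graph = dict[str, list[str]]  # commit ID -> parent commit IDs


def reachable(graph: Graph, heads: list[str]) -> set[str]:
    """All commits reachable from `heads`, i.e. what a full clone would copy."""
    seen, stack = set(), list(heads)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def commits_to_send(remote: Graph, remote_head: str, local_has: set[str]) -> set[str]:
    """Commits the receiver is missing; the walk stops at commits it already has."""
    missing, stack = set(), [remote_head]
    while stack:
        commit = stack.pop()
        if commit in local_has or commit in missing:
            continue
        missing.add(commit)
        stack.extend(remote[commit])
    return missing


remote = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
print(sorted(reachable(remote, ["d"])))                   # ['a', 'b', 'c', 'd']
print(sorted(commits_to_send(remote, "d", {"a", "b"})))  # ['c', 'd']
```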

Managing Changes

In addition to distributing changes and states, Git also allows for precise control over how content is integrated and/or recombined.

Within a repository
Integrate

Synthesize different versions into a single state

Reorder & Combine

Events in the history can be reordered or combined

Selectively apply

Decide which version should have changes applied

Use relative changes

Treat the difference between versions as applicable changes
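Integrating two versions, and treating each version's difference from a common ancestor as a change to apply, can be sketched as a line-wise three-way merge. `merge3` below is a hypothetical, simplified helper: it assumes all three versions have the same number of lines, takes a line from whichever side changed it, and reports a conflict when both sides changed the same line differently. Git's merge machinery works on diff hunks and also handles inserted and deleted lines.

```python
def merge3(base: list[str], ours: list[str], theirs: list[str]):
    """Line-wise three-way merge; assumes all versions have equal line counts."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles equal line counts")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (unchanged or changed identically)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: record a conflict
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


merged, conflicts = merge3(["a", "b", "c"], ["A", "b", "c"], ["a", "b", "C"])
print(merged, conflicts)  # ['A', 'b', 'C'] []

_, conflicts = merge3(["a", "b", "c"], ["a", "X", "c"], ["a", "Y", "c"])
print(conflicts)  # [1]
```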

Another feature of Git is its ability to act across repositories. Similar to changes within a repository, changes can also be propagated between repositories:

Between repositories
Synchronize

Exchange changes between two repositories

Send/Receive changes

Apply changes from one repository to another

Use relative changes

Treat the difference between repositories as applicable changes
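Changes between repositories can also travel as plain-text patches, for example produced with `git diff` or `git format-patch` and applied with `git apply` or `git am`. As a sketch, Python's `difflib.unified_diff` emits the unified diff format these patches are based on (Git adds its own header lines such as `diff --git`, which are omitted here). The file name `geometry.py` and its contents are made up for illustration.

```python
import difflib

old = ["def area(r):\n", "    return 3.14 * r * r\n"]
new = ["import math\n", "\n", "def area(r):\n", "    return math.pi * r * r\n"]

# The patch describes only what changed, relative to the old version.
patch = "".join(
    difflib.unified_diff(old, new, fromfile="a/geometry.py", tofile="b/geometry.py")
)
print(patch)
```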

Git can manage changes in several ways:
  • consolidate changes from different sources,

  • identify conflicting changes and allow for transparent resolution,

  • maintain multiple versions concurrently,

  • exchange incremental changes between versions.
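Maintaining several versions at once is cheap because, in Git, a branch is a named reference to a commit rather than a copy of the files. The hypothetical `Branches` class below models only that idea: creating a branch copies a pointer, and committing advances just the branch that is currently checked out. The commit IDs are placeholder strings.

```python
class Branches:
    """Hypothetical sketch: branches as named, movable pointers to commit IDs."""

    def __init__(self, initial_commit: str):
        self.refs = {"main": initial_commit}
        self.current = "main"

    def create(self, name: str):
        # A new branch starts at the current commit; no files are copied.
        self.refs[name] = self.refs[self.current]

    def switch(self, name: str):
        self.current = name

    def commit(self, new_commit_id: str):
        # Committing moves only the branch that is currently checked out.
        self.refs[self.current] = new_commit_id


b = Branches("c0")
b.create("feature")
b.switch("feature")
b.commit("c1")
b.switch("main")
b.commit("c2")
print(b.refs)  # {'main': 'c2', 'feature': 'c1'}
```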

Limitations

Git’s design paradigm imposes certain constraints that are worth a closer look.

Automatic Synchronization

No coupling between changes and distribution

Synchronization is not coupled to change events in a repository.

This means that a change, such as a commit, in the repository doesn’t automatically trigger synchronization across all copies (clones) of the repository.

Therefore:

  • Collaboration mediated by Git happens asynchronously, i.e., changes are propagated only when explicitly requested (e.g., by pushing or pulling).

  • Simultaneous editing is not supported. If two users edit the same file at the same time, their changes have to be merged afterwards, and overlapping edits must be resolved manually.

  • Synchronizing the state of a repository is part of the workflow.

Functional Consistency

Git tracks only the content of files

Git tracks only the content of files, not the functionality of the code within them.

There is no guarantee that:

  • Git will identify breaking changes as a conflict;[3]

  • two functioning versions, once combined, will not result in a dysfunctional version.

[3] This is rather obvious if you think about it, but it is easy to overlook when Git reports no conflicts while merging changes from different sources.
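The following sketch shows how a merge without textual conflicts can still break working code. It uses a hypothetical, simplified line-wise three-way merge (equal line counts assumed) in which one side renames a function while the other adds a call to the old name. Each version runs on its own and the merge succeeds line by line, yet the merged result fails.

```python
def merge3(base, ours, theirs):
    """Naive line-wise three-way merge (equal line counts assumed); None on conflict."""
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)   # unchanged by "theirs", or both sides agree
        elif o == b:
            merged.append(t)   # only "theirs" changed this line
        else:
            return None        # both sides changed the same line differently
    return merged


base = ["def greet(): return 'hi'", "x = 1"]
ours = ["def welcome(): return 'hi'", "x = 1"]      # renames the function
theirs = ["def greet(): return 'hi'", "x = greet()"]  # starts calling it

merged = merge3(base, ours, theirs)
print(merged is not None)  # True: no textual conflict

for version in (ours, theirs):
    exec("\n".join(version), {})  # each version runs fine on its own

try:
    exec("\n".join(merged), {})
    broken = False
except NameError as err:
    broken = True
    print("merged version is broken:", err)
```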

Variety of Trackable Files

Not all file types and sizes are well supported

Git tracks changes line by line, which can be problematic.

This leads to limited support for:

  • Binary files,

  • File types that include metadata (e.g., .docx),

  • Large files (Git’s size limit for a single file is 2 GB).[4]

[4] Git LFS (Large File Storage) is a solution to this problem. It stores large files outside of the repository and tracks only small pointer files that record their location and version.
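Git LFS keeps a large file out of the repository by committing a small text pointer in its place. The hypothetical `lfs_pointer` function below builds such a pointer following the three-line layout of the Git LFS specification (`version`, `oid`, `size`). Uploading and downloading the actual content is handled by the Git LFS client and is not shown.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build the pointer text Git LFS commits in place of a large file."""
    oid = hashlib.sha256(data).hexdigest()  # content address of the real file
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


print(lfs_pointer(b"hello"))
```

Whatever the size of the original file, the pointer stays a few lines of text, so Git's line-based tracking only ever sees this small file.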