Submodules#
What are Submodules?#
A submodule is essentially a repository embedded within another repository. It allows you to include and manage external repositories within your main project. This is particularly useful for integrating third-party libraries or dependencies that are maintained separately.
Note
Once a submodule is initialized and updated, its content appears as a regular folder inside the parent repository.
Why are Submodules Useful?#
When working on a project, you may need to incorporate another project, such as a library developed by someone else or a tool you’re building for use in multiple projects. A common challenge in these situations is keeping the two projects separate while still being able to use one from within the other.
For example, imagine you’re conducting research and want to use a data analysis tool you previously developed for another project. You have two options: you could copy the code from the old project into your new one, or you could include it from a shared source. The problem with copying the code is that if you make any changes, it can be difficult to merge those changes back into the original tool later. Conversely, including it from a shared source may limit your ability to customize it, and ensuring that all collaborators have access to it can be challenging.
Git addresses this issue with a feature called submodules. Submodules allow you to keep a repository as a subfolder within another repository. This setup lets you include a specific version of an external repository in your main project. This is particularly useful for incorporating your own tools or third-party resources that are maintained separately, while keeping your changes separate.
Features of Submodules#
Separation: Submodules remain independent repositories, so their versioning history is separate from the parent repository. This allows for better modularity and organization of code, as each submodule can evolve independently.
Pinning: Submodules are pinned to specific commits, which ensures reproducibility by locking them to a particular state. When you clone the parent repository and update its submodules, you get exactly the submodule commit that the checked-out parent commit records.
Updates: Submodules can be updated independently or synchronized with the parent repository. You can choose to pull the latest changes from the submodule’s repository without affecting the parent repository, or you can update the submodule reference in the parent repository to point to a new commit.
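The pinning described above can be observed directly. The following is a minimal sketch, not a recipe from this course: the repository names (lib, parent) and the path vendor/lib are hypothetical, a local folder stands in for the external repository, and the protocol.file.allow override is only there because recent Git versions restrict submodule clones from local paths. It assumes Git 2.28 or newer for git init -b.

```shell
#!/bin/sh
# Sketch: the parent repository records a submodule as one exact commit.
# All names (lib, parent, vendor/lib) are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A stand-in for the external repository.
git init -q -b main lib
echo "v1" > lib/tool.txt
git -C lib add tool.txt
git -C lib commit -q -m "Initial version of the tool"

# The parent project embeds it as a submodule.
git init -q -b main parent
cd parent
# protocol.file.allow is needed only because this demo clones from a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add lib as a submodule"

# The parent's tree holds a "gitlink" entry (mode 160000) naming one commit.
pinned=$(git ls-tree HEAD vendor/lib)
lib_head=$(git -C "$work/lib" rev-parse HEAD)
echo "$pinned"
echo "lib HEAD: $lib_head"
```

The entry printed by git ls-tree names the same commit as the HEAD of lib, which is what makes a checkout of the parent reproducible.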
Benefits of Submodules for Reproducibility#
Pinning Submodules: By pinning submodules to specific commits, you ensure the same version of an external library, dataset (remember LFS!), or tool is always used, which is crucial for reproducibility in complex projects. This helps avoid issues that arise from changes in dependencies.
Independent Versioning: Each submodule has its own versioning history, allowing it to be maintained separately from the parent project. This means that updates or changes in a submodule do not directly impact the parent repository unless explicitly updated.
Flexibility: Submodules can be easily updated or switched to different versions without affecting the rest of the project. This flexibility allows developers to experiment with new features or fixes in a submodule while keeping the main project stable.
Use Cases for Submodules#
Third-party Libraries#
If you are developing a project that relies on third-party libraries, you can use submodules to include these libraries in your project without merging them directly into your project. This makes it easier to manage updates and changes. For example, you might use a submodule to include the code of a research paper you want to integrate into your analysis.
Separate Repositories#
For projects that consist of multiple repositories, submodules allow you to link these repositories together while maintaining separate version control for each. This allows for modular project management while still working on them cohesively. For example, this course is organized into several repositories, each containing different sections of the course material.
using-git-in-accademia
├── ci-cd-workflows
├── git-and-science
├── git-and-its-remotes
├── working-with-git
Essential Commands for Submodules#
Here’s a simple overview of the basic commands for working with submodules:
Command | Description |
---|---|
| Add a new submodule to the project. |
| Initialize & update, fetching the latest changes from remote repo. |
| Check the status of the submodules (in [path]). |
| Update the submodule to the latest commit on the tracked branch. |
| Remove a submodule from the parent repository. |
| Initialize submodules in a cloned repository. |
| Update the submodules to the commit specified in the parent repo. |
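One way to see these operations in sequence is the sketch below. It uses standard git submodule invocations that match each description; the exact commands and options intended by the table may differ. The repository names, the lib path, and the git_local helper are hypothetical, and a local folder stands in for the remote repository.

```shell
#!/bin/sh
# Sketch of the basic submodule life cycle with throwaway local repositories.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Demo-only helper: allow submodule clones/fetches from a local path.
git_local() { git -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)
cd "$work"
git init -q -b main lib
echo "v1" > lib/tool.txt
git -C lib add tool.txt
git -C lib commit -q -m "v1"

git init -q -b main parent
cd parent

# Add a new submodule to the project (tracking branch main).
git_local submodule --quiet add -b main "$work/lib" lib
git commit -q -m "Add lib submodule"

# Check the status of the submodules.
git submodule status

# The external repository moves on...
echo "v2" > "$work/lib/tool.txt"
git -C "$work/lib" commit -q -am "v2"

# ...so update the submodule to the latest commit on the tracked branch.
git_local submodule --quiet update --remote lib
git add lib
git commit -q -m "Bump lib to v2"

# In a fresh clone: initialize, then update to the commit the parent records.
git clone -q "$work/parent" "$work/clone"
cd "$work/clone"
git submodule --quiet init
git_local submodule --quiet update
clone_content=$(cat lib/tool.txt)

# Remove the submodule from the parent repository.
git submodule --quiet deinit -f lib
git rm -q -f lib
git commit -q -m "Remove lib submodule"
remaining=$(git ls-files lib)
```

With a real remote, the git_local wrapper is unnecessary and plain git can be used throughout.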
Working with Submodules#
Remember that a submodule usually does not track a branch, so before you start working in a submodule, check out the branch you want to work on!
Once the submodule is initialized, you can work inside the submodule folder as if it were an ordinary repository.
After committing changes in a submodule, stage the submodule path in the parent repository and commit. This updates the submodule commit that the parent repository records.
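The workflow above can be sketched as follows. This is an illustration under assumptions: the names lib, parent, and vendor/lib are hypothetical, the branch is assumed to be called main, and a local folder replaces the real remote (so the push step is only mentioned in a comment).

```shell
#!/bin/sh
# Sketch: edit inside a submodule, then record the new commit in the parent.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q -b main lib
echo "v1" > lib/tool.txt
git -C lib add tool.txt
git -C lib commit -q -m "v1"

git init -q -b main parent
cd parent
# protocol.file.allow is needed only because this demo clones from a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add lib submodule"

# 1. Inside the submodule, get onto a branch before editing.
#    (A no-op right after "submodule add", but required after
#    "git submodule update", which leaves a detached HEAD.)
cd vendor/lib
git checkout -q main
echo "v2" > tool.txt
git commit -q -am "Improve the tool"
new_commit=$(git rev-parse HEAD)
# With a real remote you would now run: git push origin main
cd ../..

# 2. The parent sees that the submodule moved to a new commit.
git status --short

# 3. Stage the submodule path and commit to record the new pin.
git add vendor/lib
git commit -q -m "Update lib to the improved version"
recorded=$(git rev-parse HEAD:vendor/lib)
```

After the last commit, the commit recorded by the parent for vendor/lib is the one just created inside the submodule.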
Gotchas for Submodules#
Submodules Do Not Update Automatically ⚠️: When you clone a repository that contains submodules, the submodules are not automatically initialized or checked out. Run git submodule update --init, or clone with the --recurse-submodules option, to ensure they are initialized and updated.
Repository Resides in the .git Folder of the Parent Repo 🔒: The repository data of each submodule is stored inside the parent repository’s .git folder (typically under .git/modules) rather than in a separate .git folder of its own; the submodule folder only contains a small .git file pointing there. This can lead to confusion. Be aware that providing access to the parent repository’s .git folder grants access to the history of all its submodules!
Submodule Commits Are Detached 🤔: Submodules are designed to be pinned to a specific commit and do not track a branch by default. When a submodule is checked out, it is usually in a “detached HEAD” state, meaning it is not on any branch. Commits made in this state are easy to lose, so check out or create a branch before making changes in the submodule.
Tip
You can set up a submodule to track a branch with the -b option:
git submodule add -b <bname> https://gitlab.com/...
Alternatively, navigate into the directory of an existing submodule (e.g., mySub) and run:
git checkout bname
git branch --set-upstream-to=origin/bname
cd ../ # you leave the submodule
git submodule set-branch --branch bname mySub # record the branch in .gitmodules (Git 2.22+)
git add .gitmodules mySub
git commit -m "Tracking branch bname in mySub"
Submodule URLs Can Change 🔗: If the URL of a submodule repository changes, you must update the .gitmodules file in the parent repository. Failing to do so can lead to broken links when trying to update or clone the submodule.
Cloning with Submodules Requires Extra Steps 🛠️: When cloning a repository with submodules, you need to use the --recurse-submodules option or run git submodule init and git submodule update afterward. Forgetting these steps can lead to missing submodule content.
Submodules Can Increase Complexity 🌀: Using submodules can add complexity to your project structure. If not managed properly, it can lead to confusion about which version of a submodule is being used and how it relates to the parent repository.
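Several of these gotchas can be reproduced in a throwaway setup. The sketch below is illustrative only: all repository names and paths are hypothetical, a local folder stands in for the remote, and git_local is a demo-only helper. It assumes Git 2.25 or newer for git submodule set-url, which updates .gitmodules and synchronizes the submodule's remote URL in one step.

```shell
#!/bin/sh
# Sketch: empty submodule after a plain clone, detached HEAD, and a URL change.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Demo-only helper: allow submodule clones from a local path.
git_local() { git -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)
cd "$work"
git init -q -b main lib
echo "v1" > lib/tool.txt
git -C lib add tool.txt
git -C lib commit -q -m "v1"

git init -q -b main parent
git_local -C parent submodule --quiet add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib submodule"

# Gotcha: a plain clone leaves the submodule folder empty.
git clone -q parent plain
plain_files=$(ls plain/vendor/lib | wc -l)

# --recurse-submodules initializes and updates the submodules in one step...
git_local clone -q --recurse-submodules parent full

# ...but the submodule ends up on a detached HEAD, not on a branch.
if git -C full/vendor/lib symbolic-ref -q HEAD >/dev/null; then
  head_state=branch
else
  head_state=detached
fi
echo "submodule HEAD is: $head_state"

# Gotcha: the submodule repository moves to a new URL.
mv lib lib-moved
cd parent
git submodule set-url vendor/lib "$work/lib-moved"
git commit -q -am "Point lib submodule at its new URL"
new_url=$(git config -f .gitmodules submodule.vendor/lib.url)
echo "new URL: $new_url"
```

Collaborators who already have a clone would typically pull the updated .gitmodules and then run git submodule sync so that their local configuration picks up the new URL.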
Handling Submodules in CI/CD Pipelines#
When using submodules in your GitHub Workflows or GitLab Pipelines, you need to ensure that the submodules are properly initialized and updated on the runner.
GitHub Workflow Example: Handling Submodules#
In GitHub Actions, you can use the actions/checkout action with the submodules option set to true to ensure submodules are cloned and updated as part of the workflow.
Example: GitHub Workflow (.github/workflows/ci.yml)
name: CI with Submodules
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository with submodules
        uses: actions/checkout@v4
        with:
          submodules: true # Initialize and update submodules
          fetch-depth: 0 # Ensure full history is fetched
      - name: Build Project
        run: |
          # Example build command
          make build
This ensures that the submodules are checked out and updated as part of your CI workflow.
GitLab Pipeline Example: Handling Submodules#
In GitLab CI, you can set the GIT_SUBMODULE_STRATEGY variable to ensure submodules are fetched during the CI pipeline.
Example: GitLab Pipeline (.gitlab-ci.yml)
variables:
  GIT_SUBMODULE_STRATEGY: recursive # Or 'normal'
  GIT_SUBMODULE_FORCE_HTTPS: "true" # Rewrite url to use HTTPS
  GIT_SUBMODULE_DEPTH: 0 # Fetch full history
stages:
  - build
build_project:
  stage: build
  script:
    - make build
Only the GIT_SUBMODULE_STRATEGY variable is required for the submodules to be fetched; the other two variables are optional.
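On a runner or CI system without built-in submodule support, the same result can usually be reached with plain Git commands in a job script. The following is a sketch under assumptions, not part of either platform's documentation: the repository names and the vendor/lib path are hypothetical, a local folder stands in for the remote, and git_local is a demo-only helper.

```shell
#!/bin/sh
# Sketch: manual submodule setup, as a CI job script might do it.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Demo-only helper: allow submodule clones from a local path.
git_local() { git -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)
cd "$work"
git init -q -b main lib
echo "v1" > lib/tool.txt
git -C lib add tool.txt
git -C lib commit -q -m "v1"

git init -q -b main parent
git_local -C parent submodule --quiet add "$work/lib" vendor/lib
git -C parent commit -q -m "Add lib submodule"

# What a CI job starts from: a checkout of the parent without submodule content.
git clone -q parent checkout
cd checkout

# Align locally configured submodule URLs with .gitmodules, then fetch and
# check out the pinned commits, including nested submodules.
git submodule --quiet sync --recursive
git_local submodule --quiet update --init --recursive

built_from=$(cat vendor/lib/tool.txt)
echo "building against lib version: $built_from"
```

In a real pipeline only the sync and update lines would appear in the job script, using plain git instead of the helper.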