Project Scaffolding
🏠 Project Root
The root folder of a project should contain only generic files and folders.
📖 README.md
The front page of a repository.
It should serve as the nexus for all metadata related to the project.
📦 pyproject.toml
The modern standard for defining build systems, dependencies, and tool settings (like ruff, pytest, or black).
📄 .env.example
A safe template for the .env file. Unlike the actual .env file, this template must be committed to version control.
🙈 .gitignore
Specifies intentionally untracked files that Git should ignore (e.g., .env, __pycache__, local data, build artifacts).
🔐 .env
Local environment variables and secrets (such as API keys, database passwords, and personal access tokens). This file should never be added to version control. It should be strictly ignored by Git to prevent leaking sensitive security credentials.
We recommend adding .env to .gitignore and providing a .env.example file with the repository to document exactly how the .env file should be structured.
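As a rough illustration of how the template and the local file relate, the sketch below compares the variable names in .env.example against those in .env and reports any that are missing locally. The helper names (read_env_keys, missing_env_keys) and the variable names API_KEY and DB_PASSWORD are hypothetical, and the parser assumes simple KEY=VALUE lines only.

```python
import tempfile
from pathlib import Path


def read_env_keys(path: Path) -> set[str]:
    """Return the variable names defined in a simple KEY=VALUE file."""
    keys = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, _value = line.partition("=")
        if sep:
            keys.add(name.strip())
    return keys


def missing_env_keys(example: Path, actual: Path) -> set[str]:
    """Names documented in the template but absent from the local file."""
    return read_env_keys(example) - read_env_keys(actual)


# Demo in a temporary directory so no real secrets are touched.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / ".env.example"
    actual = Path(tmp) / ".env"
    example.write_text(
        "# Copy to .env and fill in real values\nAPI_KEY=\nDB_PASSWORD=\n",
        encoding="utf-8",
    )
    actual.write_text("API_KEY=abc123\n", encoding="utf-8")
    print(sorted(missing_env_keys(example, actual)))  # ['DB_PASSWORD']
```

A check like this could run at application start-up so that a missing variable is reported early, with the template acting as the single documented list of required settings.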
⚙️ .gitlab-ci.yml
The CI/CD pipeline definition for GitLab. It declares stages, jobs, and rules that automatically run tests, build artifacts, and deploy your project whenever changes are pushed to the repository.
⚙️ .github/
The hidden configuration directory for GitHub-specific features: CI/CD workflows, issue templates, pull request guidelines, and more.
⚙️ workflows/
GitHub Actions workflow files (YAML) that define automated CI/CD pipelines. Each file describes a workflow triggered by events such as pushes, pull requests, or scheduled runs to test, build, and deploy your project.
🤖 AI_USAGE.md
Transparency document detailing how LLMs were used in the creation of code or documentation.
⚖️ LICENSE
The legal framework for a project. It explicitly defines how others can use, modify, and distribute the code and data. Even for internal or proprietary projects, having a clear license is essential.
📝 CITATION.cff
A plain-text YAML file in the Citation File Format (CFF) specifying how the project should be cited.
🤝 CONTRIBUTING.md
The onboarding guide for new developers. It outlines the step-by-step process for submitting bug reports, requesting features, and creating pull requests (PRs) that adhere to the project's standards.
🛡️ CODE_OF_CONDUCT.md
Establishes the community standards and expected behavior for everyone interacting with the project. It helps ensure a welcoming, inclusive, and professional environment for all contributors.
📊 Data
The root data directory. We use a strict data pipeline separating raw inputs from processed outputs.
📖 README.md
Information specific to the data related to this project.
📁 raw/
Immutable, original data. Do not edit these files. This folder may also hold pointers (such as URLs) to large external datasets that cannot be stored directly in Git.
📁 interim/
Intermediate data that has been cleaned or transformed, but is not yet ready for final analysis.
📁 final/
Final, processed datasets ready for modeling, publication, or deployment.
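The raw, interim, and final stages can be sketched as a one-way flow: raw files are only ever read, and each later stage writes into its own folder. This is a minimal illustration rather than a prescribed implementation; the file names (measurements.csv, summary.csv), the column names, and the functions clean and summarize are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def clean(raw_file: Path, interim_file: Path) -> None:
    """Read from raw/ (never written to) and store complete rows in interim/."""
    with raw_file.open(newline="", encoding="utf-8") as src:
        rows = [row for row in csv.DictReader(src) if all(row.values())]
    interim_file.parent.mkdir(parents=True, exist_ok=True)
    with interim_file.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=["sample", "value"])
        writer.writeheader()
        writer.writerows(rows)


def summarize(interim_file: Path, final_file: Path) -> None:
    """Aggregate the cleaned rows into an analysis-ready table in final/."""
    with interim_file.open(newline="", encoding="utf-8") as src:
        values = [float(row["value"]) for row in csv.DictReader(src)]
    mean = sum(values) / len(values) if values else float("nan")
    final_file.parent.mkdir(parents=True, exist_ok=True)
    with final_file.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["n", "mean"])
        writer.writerow([len(values), mean])


# Demo with a throwaway data/ tree.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    raw = data / "raw" / "measurements.csv"
    raw.parent.mkdir(parents=True)
    raw.write_text("sample,value\na,1.0\nb,\nc,3.0\n", encoding="utf-8")

    clean(raw, data / "interim" / "measurements.csv")
    summarize(data / "interim" / "measurements.csv", data / "final" / "summary.csv")
    print((data / "final" / "summary.csv").read_text(encoding="utf-8"))
```

Because every stage writes to a different folder, the interim and final data can be deleted and regenerated from raw/ at any time.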
🚀 Scripts
Executable scripts, job launchers, or data preprocessing entry points. These should generally be runnable from the command line.
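One common way to make a script runnable from the command line is a small argparse entry point, sketched below. The script purpose, the option names (--input, --output, --dry-run), and the default paths are assumptions made for illustration; a real script would call into the package code instead of printing.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface of this (hypothetical) script."""
    parser = argparse.ArgumentParser(description="Preprocess raw data (illustrative).")
    parser.add_argument("--input", default="data/raw", help="folder to read from")
    parser.add_argument("--output", default="data/interim", help="folder to write to")
    parser.add_argument("--dry-run", action="store_true", help="only report what would happen")
    return parser


def main(argv=None) -> str:
    """Parse arguments and run; argv=None means 'use sys.argv'."""
    args = build_parser().parse_args(argv)
    verb = "Would process" if args.dry_run else "Processing"
    message = f"{verb} {args.input} -> {args.output}"
    print(message)
    return message


if __name__ == "__main__":
    main()
```

Keeping the logic in main(argv) rather than at module level means the same function can be called from tests with an explicit argument list.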
📁 drafts/
A scratchpad folder for experimental scripts that are not yet production-ready or integrated into the main workflow.
📓 Notebooks
Jupyter notebooks for exploratory data analysis (EDA), prototyping, and interactive visualization.
📁 drafts/
A scratchpad folder for experimental notebooks that are not yet production-ready or integrated into the main workflow.
📦 Containers
A centralized location for all container definitions. Your *.def files (e.g., Apptainer/Singularity definition files) and Dockerfiles live here.
⚙️ Configuration
Centralized YAML/TOML files for parameters. This allows changing experiments without modifying code.
💻 Source Code (src/)
Location for all reusable and installable code.
📁 mypkg/
A package containing functions, classes, and business logic.
📈 results/
A dedicated directory for the final outputs of a project. If the project involves data analysis, research, or machine learning, this folder holds the generated artifacts such as figures, summary tables, trained models, and compiled reports.
🧪 Testing Suite
Ensures code reliability via automated testing. A well-structured test suite separates individual component tests from full workflow tests.
📁 unit/
Directory for unit tests. These tests are designed to verify that individual functions, classes, or methods operate correctly in strict isolation.
📁 integration/
Directory for integration tests. These tests verify that multiple modules, databases, or external services function together properly as a unified system.
📄 test_mypkg.py
A standard Python test file (often utilizing pytest). This file contains specific test cases and assertions designed to validate the functionality of the associated mypkg module.
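A unit test in the pytest style is simply a function whose name starts with test_ and whose body contains plain assert statements. The sketch below shows what such a file might contain; normalize is a hypothetical stand-in for a function that would normally be imported from mypkg, defined inline here so the example runs on its own.

```python
def normalize(values: list[float]) -> list[float]:
    """Hypothetical mypkg function: scale values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def test_normalize_scales_values():
    # Unit test: one function, checked in isolation.
    assert normalize([1.0, 1.0, 2.0]) == [0.25, 0.25, 0.5]


def test_normalize_rejects_zero_total():
    # The error path is tested as deliberately as the happy path.
    try:
        normalize([0.0, 0.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

When pytest is available, the second test is usually written with the pytest.raises context manager; the try/except form is used here only to avoid a third-party import.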
💡 Examples
Minimal, end-to-end usage demos to help new users get started quickly.
📖 Documentation
Contains extended documentation, such as the sources for an online documentation site (e.g., built with Sphinx or MkDocs).
⏱️ Benchmarks
Performance tracking scripts to measure execution time, memory usage, and scaling.
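A benchmark script of this kind might look like the following sketch, which uses only the standard library: time.perf_counter for execution time and tracemalloc for peak memory, measured across increasing input sizes to show scaling. The benchmark helper and the build_squares workload are hypothetical placeholders for real project code.

```python
import time
import tracemalloc


def benchmark(func, *args, repeats: int = 5) -> tuple[float, int]:
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)

    # Measure memory in a separate run so tracing overhead does not skew timing.
    tracemalloc.start()
    func(*args)
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


def build_squares(n: int) -> list[int]:
    """Placeholder workload: allocate a list of n squares."""
    return [i * i for i in range(n)]


# Scaling: repeat the measurement for growing input sizes.
for n in (1_000, 10_000, 100_000):
    seconds, peak = benchmark(build_squares, n)
    print(f"n={n:>7}  best={seconds:.6f}s  peak={peak / 1024:.1f} KiB")
```

Taking the best of several repeats reduces noise from other processes; the absolute numbers depend on the machine, so results are most useful when compared across commits on the same hardware.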