Separation of Concerns (SoC)
The separation of logic, parameters, environment, and data
my_data_science_project/
│
├── config/ # Centralized parameterizations
├── data/ # Input data
└── code/ # Code and scripts
.
.
Key Takeaways:
Foundational separation: Code is physically isolated from configuration parameters and datasets.
Portability: Hardcoded paths and parameters are explicitly eliminated from the source code.
Clarity: The core domains of the project are immediately identifiable to collaborators.
my_data_science_project/
│
├── config/ # Parameters (YAML/TOML)
├── .env # Environment/State (paths, secrets)
│
├── scripts/ # Logic (execution)
│
├── data/ # Data (inputs)
└── results/ # Data (outputs/deliverables)
.
.
Key Takeaways:
Environment isolation: Local paths and secrets are extracted to .env files.
Security: The .env file must never be committed to version control.
Dynamic loading: Paths are fetched programmatically at runtime rather than hardcoded:
import os
from dotenv import load_dotenv
# Load local .env variables
load_dotenv()
# Fetch environment-specific paths
output_dir = os.getenv("OUTPUT_DIR")
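Note that os.getenv returns None when the variable is unset, so it can help to apply a default before treating the value as a path. A minimal sketch, assuming the OUTPUT_DIR variable from the snippet above and a hypothetical fallback of results/. The resolve_output_dir helper is illustrative. It accepts any mapping so the behaviour can be checked without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(
    env: Optional[Mapping[str, str]] = None, default: str = "results"
) -> Path:
    """Return the directory named by OUTPUT_DIR, or a default if it is unset."""
    env = os.environ if env is None else env
    # An empty string is treated the same as a missing variable.
    value = env.get("OUTPUT_DIR") or default
    return Path(value)


# Plain dicts stand in for the process environment here.
configured = resolve_output_dir({"OUTPUT_DIR": "/tmp/run_01"})
fallback = resolve_output_dir({})
```

Called without arguments, the helper reads os.environ, which load_dotenv() has already populated from the local .env file.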
my_data_science_project/
│
├── data/
│ ├── raw/ # Immutable data dumps
│ ├── interim/ # Intermediary data
│ └── final/ # Cleaned, tidy data
│
├── src/ # Reusable logic (Python package)
├── scripts/ # Executable logic (batch scripts)
│
├── config/ # Parameters
├── .env # Environment state
└── results/ # Outputs
.
.
Key Takeaways:
Logic division: Reusable modules (src/) are strictly separated from executable routines (scripts/).
Data lineage: Data is divided into discrete stages to document processing steps and transformations.
Immutability: Raw data is strictly preserved and never overwritten by analytical scripts.
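The immutability rule can also be enforced in code rather than left to convention. The sketch below is one possible guard, not a prescribed mechanism: a hypothetical checked_output_path helper that refuses any write target located inside data/raw/. The helper name and the example file names are assumptions.

```python
from pathlib import Path


def checked_output_path(target: Path, raw_dir: Path) -> Path:
    """Return target unchanged, or raise if it lies inside the raw data directory."""
    target_resolved = target.resolve()
    raw_resolved = raw_dir.resolve()
    # target is "inside" raw_dir if it is raw_dir itself or has it as a parent.
    if target_resolved == raw_resolved or raw_resolved in target_resolved.parents:
        raise ValueError(f"Refusing to write into raw data directory: {target}")
    return target


raw = Path("data/raw")
# Writing to the interim stage is allowed and the path is returned as given.
ok = checked_output_path(Path("data/interim/cleaned.csv"), raw)
```

A script would wrap every output path in this check before writing, so an accidental overwrite of a raw dump fails loudly instead of silently corrupting the source data.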
my_data_science_project/
│
├── data/ # (raw, interim, final)
├── src/ # (reusable logic)
├── scripts/ # (executable logic)
├── results/ # (outputs)
├── config/ # (parameters)
│
├── docs/ # Documentation source files
├── pyproject.toml # Project metadata and dependencies
├── README.md # Project overview
├── LICENSE # Usage rights
├── .env.example # Environment template
└── .gitignore # Version control exclusions
Key Takeaways:
Self-description: Essential context and usage instructions are provided at the root level.
Dependency management: Required packages and metadata are defined centrally (e.g., pyproject.toml).
Safe onboarding: Templates (.env.example) are provided so collaborators can safely configure their local environments without sharing secrets.
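A template is most useful when a collaborator can check their local file against it. The sketch below compares the variable names in .env.example with those in a local .env. It uses a deliberately simplified parser (KEY=VALUE lines only, no quoting or export handling), and the API_TOKEN variable and helper names are hypothetical.

```python
def parse_env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env_text: str) -> set[str]:
    """Return names listed in the template but absent from the local file."""
    return parse_env_keys(example_text) - parse_env_keys(env_text)


# Inline strings stand in for the contents of .env.example and .env.
example = "# Copy to .env and fill in\nOUTPUT_DIR=\nAPI_TOKEN=\n"
local = "OUTPUT_DIR=/tmp/run_01\n"
print(missing_keys(example, local))  # {'API_TOKEN'}
```

Only variable names are compared, so the check never needs to read or print secret values.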
Quality and Control
my_data_science_project/
│
├── tests/ # Automated unit tests
├── benchmark/ # Performance tracking
│
├── .github/
│ └── workflows/ # GitHub CI/CD pipelines
│
├── .gitlab-ci.yml # GitLab CI/CD pipeline
.
.
Key Takeaways:
Automated Assurance: Code correctness and computational efficiency are systematically verified through dedicated tests/ and benchmark/ suites.
Continuous Integration (CI): Automated pipelines (.github/workflows/ or .gitlab-ci.yml) execute tests immediately whenever the codebase is modified.
Reliability: Bugs and performance regressions are caught prior to publication, ensuring that analytical results remain stable and reproducible.
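As an illustration of what a file under tests/ might contain, the sketch below pairs a small reusable function with a unit test for it. It assumes pytest as the test runner, which the layout above does not specify. The zscore function and the test name are hypothetical. In a real project the function would live in src/ and be imported by the test module.

```python
import math


def zscore(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("zscore is undefined for constant input")
    return [(v - mean) / std for v in values]


def test_zscore_has_zero_mean_and_unit_variance():
    # Checks a property of the output, not hardcoded numbers.
    result = zscore([1.0, 2.0, 3.0, 4.0])
    mean = sum(result) / len(result)
    variance = sum((v - mean) ** 2 for v in result) / len(result)
    assert math.isclose(mean, 0.0, abs_tol=1e-12)
    assert math.isclose(variance, 1.0, rel_tol=1e-12)
```

A CI pipeline would then run the test suite on every push, so a change that breaks the function is flagged before results are regenerated.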