Codebase Architecture
Separation of Concerns (SoC): A structural foundation that physically isolates five distinct operational domains (code, configuration, environment, infrastructure, and data).
src/ (Core Logic): Contains the generalized, reusable Python code. Functions do not depend on external state and execute only when imported and called.
scripts/ (Operational Execution): Procedural entry points (e.g. train.py).
They read environment variables, parse configs, and pass the resulting settings to src/ logic.
notebooks/ (Exploration): Reserved exclusively for exploratory data analysis (EDA) and prototyping.
Notebooks import from src/ but are never imported themselves.
containers/ (Infrastructure): Houses declarative blueprints (Dockerfile, Apptainer.def), keeping the repository root clean.
data/: Strictly version-controlled directory containing raw and processed datasets, isolated from execution logic.
This structure allows the exact same src/ logic to be executed interactively in notebooks/ or automatically via scripts/, within a reproducible environment defined in containers/.
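As a rough illustration of the split between src/ and scripts/, here is a minimal single-file sketch. It is not the project's actual code: the names TrainConfig, fit_mean, load_config, and the TRAIN_CONFIG_JSON environment variable are hypothetical, and the toy "training" routine is only a stand-in for real logic. In a real repository the first half would sit in a module under src/ and the second half in scripts/train.py.

```python
import json
import os
from dataclasses import dataclass
from typing import Sequence


# --- Would live under src/ (hypothetical module) ---------------------------
@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    epochs: int


def fit_mean(values: Sequence[float], config: TrainConfig) -> float:
    """Toy 'training' routine: gradient descent toward the mean of values.

    It reads no environment variables, touches no files, and keeps no
    global state, so a notebook or a script can call it the same way.
    """
    estimate = 0.0
    for _ in range(config.epochs):
        # Gradient of 0.5 * mean squared error with respect to the estimate.
        gradient = sum(estimate - v for v in values) / len(values)
        estimate -= config.learning_rate * gradient
    return estimate


# --- Would live in scripts/train.py -----------------------------------------
def load_config(raw_json: str) -> TrainConfig:
    """Parse a JSON config string into a typed config object."""
    fields = json.loads(raw_json)
    return TrainConfig(
        learning_rate=float(fields["learning_rate"]),
        epochs=int(fields["epochs"]),
    )


def main() -> float:
    # TRAIN_CONFIG_JSON is an assumed variable name, used only for this sketch.
    raw = os.environ.get(
        "TRAIN_CONFIG_JSON", '{"learning_rate": 0.5, "epochs": 50}'
    )
    config = load_config(raw)
    values = [1.0, 2.0, 3.0]  # stand-in for a dataset loaded from data/
    result = fit_mean(values, config)
    print(f"estimate={result:.4f}")
    return result


if __name__ == "__main__":
    main()
```

A notebook would skip main() entirely and call fit_mean directly with a hand-built TrainConfig. That is the design point: all environment and config handling stays in the entry point, so the core function behaves the same in both settings.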