Environments 2.0#
Virtualized Environments#
Containerized Environments#
Computational reproducibility frequently requires environmental control that extends beyond language-specific package managers. When software dependencies include complex system-level binaries, particular compiler toolchains, or pinned versions of core system libraries (e.g., glibc), language-bound environments (such as Python virtual environments) are structurally insufficient.
To achieve system-level reproducibility, containerization is utilized. Containers provide a mechanism to package an application alongside its entire required user-space operating system.
Principles of OS-Level Virtualization#
Containers operate via OS-level virtualization. This paradigm isolates the user space (that’s the segment of system memory where applications, libraries, and binaries are executed) while sharing a single Operating System kernel with the host machine.
Because containers execute directly on the host kernel, this architecture eliminates the overhead of hardware emulation and guest kernels, resulting in execution speeds nearly identical to native host processes.
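The shared-kernel claim can be verified directly: the kernel release reported from inside a container matches the host's, because there is no guest kernel. A minimal check, runnable both on the host and inside any container:

```python
import os

# The kernel release string comes from the (shared) host kernel,
# not from any user-space files inside a container image.
release = os.uname().release
print(release)
```

Running this via `docker run` and natively on the same machine prints the same release string, whereas a virtual machine would report its own guest kernel.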
While OS-level virtualization provides near-instant instantiation and minimal overhead, the shared kernel architecture implies a structural limitation. If a containerized application requires specific kernel modules or features (e.g., specific networking subsystems or eBPF capabilities) that are absent in the host’s kernel, the execution will fail. (eBPF, the extended Berkeley Packet Filter, is a technology that allows programs to run sandboxed within the Linux kernel without changing kernel source code or loading kernel modules; it is widely used for networking, security, and observability tasks.)
Extending Beyond Language Environments#
As previously established, a Python .venv isolates only Python-domain packages. It inherently relies on the host operating system to provide underlying C libraries, network protocols, and hardware drivers.
Containers encompass the entire system user space. A container manifest declares the base operating system (e.g., Ubuntu 22.04, Alpine Linux) and all subsequent system-level modifications. This ensures that non-Python dependencies (such as core system libraries, compiler toolchains, or standalone software tools) are strictly versioned and isolated alongside the Python interpreter. The container isolates the execution context from the host’s global variables, configuration files, and installed binaries.
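To see concretely what a `.venv` does not isolate, one can query the C library the running interpreter links against; the same virtual environment on a different host can report a different glibc. A minimal probe:

```python
import platform
import sys

# The interpreter version is pinned by the virtual environment, but the
# C library underneath it is whatever the host (or container) OS provides.
libc, libc_version = platform.libc_ver()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}, libc: {libc} {libc_version}")
```

Inside a container the same probe reports the libc of the declared base image, regardless of the host.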
Declarative Manifests and Runtime Execution#
A container is instantiated from an image, which is compiled from a declarative manifest (e.g., a Dockerfile for Podman/Docker, or a `.def` definition file for Apptainer). This manifest serves as the exact blueprint of the system state.
Beyond merely packaging software, container manifests define default execution behaviors. A manifest is not strictly a passive storage mechanism; it actively instructs the container runtime on how to execute the payload.
For example, execution commands can be structurally bound to the container so that it functions identically to an executable binary. This is achieved differently depending on the runtime utilized:
Podman (via Dockerfile)
The ENTRYPOINT directive is utilized to define the default executable:
ENTRYPOINT ["python", "/opt/pipeline/main.py"]
Apptainer
The %runscript block is utilized to pass arguments directly to the internal pipeline:
%runscript
exec python /opt/pipeline/main.py "$@"
By executing these containers directly, arguments are passed to the internal logic, abstracting the internal Python environment entirely from the end-user.
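What such an internal entry script might look like is sketched below; the script path `/opt/pipeline/main.py` comes from the examples above, but the flag names are hypothetical:

```python
import argparse

def main(argv):
    # Arguments supplied after the image name on `docker run` (or after the
    # .sif path for Apptainer) arrive here unchanged, exactly as if the
    # script had been invoked natively. Flag names are illustrative only.
    parser = argparse.ArgumentParser(description="Hypothetical pipeline entry point")
    parser.add_argument("--input", default="data/raw", help="input directory")
    parser.add_argument("--dry-run", action="store_true", help="validate without writing output")
    args = parser.parse_args(argv)
    print(f"input={args.input} dry_run={args.dry_run}")
    return args

main(["--input", "data/raw", "--dry-run"])
```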
Container Implementation#
Securing the Build (.dockerignore)#
Because the Dockerfile utilizes the COPY . /app directive to bring the repository context into the image, a .dockerignore file must be present at the repository root to prevent the .env file from being permanently baked into the container.
# .dockerignore
.env
.git/
.venv/
__pycache__/
containers/pipeline.Dockerfile#
# 1. Use an official, lightweight Python runtime
FROM python:3.13-slim
# 2. Install git so the build backend can determine the package version
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
# 3. Copy the pre-compiled uv binary from the official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# 4. Set the working directory inside the container
WORKDIR /app
# 5. Copy the project files into the container (respecting .dockerignore)
COPY . /app
# 6. Install the package and dependencies using uv
RUN uv pip install --system --no-cache --compile-bytecode .
# 7. Create a non-root user for security
RUN useradd -m appuser && chown -R appuser /app
USER appuser
# 8. Run the script when the container launches
CMD ["python", "scripts/drafts/hello.py"]
Building and Executing via Docker#
Execute the following commands from the repository root. The --env-file flag is utilized during the run command to inject the variables defined in the local .env file directly into the container’s isolated runtime.
# Build the image.
docker build -t my-pipeline-image -f containers/pipeline.Dockerfile .
# Execute the containerized script, dynamically injecting the .env file
docker run --rm --env-file .env my-pipeline-image
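Inside the container, the injected values are ordinary environment variables; the application reads them exactly as it would on the host. A minimal sketch (the variable name `API_TOKEN` is an assumption, not part of the project above):

```python
import os

# `--env-file .env` surfaces each KEY=VALUE pair as an ordinary environment
# variable inside the container. The variable name below is hypothetical.
token = os.environ.get("API_TOKEN", "")
print("API_TOKEN set:", bool(token))
```

Because the value is injected at runtime rather than baked into a layer, the same image can be reused across machines with different credentials.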
Apptainer Implementation#
Apptainer requires a structurally different blueprint (.def file).
Because Apptainer does not natively honor an ignore file (like .dockerignore) during the %files block, it is best practice to explicitly copy only the necessary directories, rather than the entire repository root (.), to ensure the .env file is not accidentally archived into the image.
containers/pipeline.def#
Bootstrap: docker
From: python:3.13-slim
%files
# Explicitly copy required directories to avoid capturing local .env files
pyproject.toml /app/
LICENSE /app/
README.md /app/
config /app/config
src /app/src
scripts/ /app/scripts/
%post
# 1. Install necessary system dependencies
apt-get update && apt-get install -y git curl
rm -rf /var/lib/apt/lists/*
# 2. Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | env UV_INSTALL_DIR="/bin" sh
# 3. Install the project in the system environment utilizing uv
cd /app
uv pip install --system --no-cache --compile-bytecode .
%runscript
# 4. Define the execution behavior
cd /app
exec python scripts/drafts/hello.py "$@"
Building and Executing via Apptainer#
Unlike Docker, Apptainer compiles the blueprint directly into a flat binary file (.sif).
During execution, Apptainer also supports the --env-file flag (in versions 1.1.0+), allowing for symmetric runtime injection.
Execute the following commands from the repository root:
# Build the .sif binary image.
apptainer build containers/pipeline.sif containers/pipeline.def
# Execute the containerized script, explicitly passing the environment file.
apptainer run --env-file .env containers/pipeline.sif
Apptainer Runtime Guide: run, exec, and shell#
In Apptainer, the interaction between the host and the container filesystem depends on the command used. While all commands inherit the %environment block, their treatment of the %runscript and the working directory differs.
Command Overview#
| Command | Primary Purpose | Executes `%runscript` | Default Working Directory |
|---|---|---|---|
| `run` | Execute default app logic | Yes | Host |
| `exec` | Run specific host tools | No | Host |
| `shell` | Interactive debugging | No | Host |
Behavior and %runscript#
- `run`: Triggers the code defined in the `%runscript` section of the `.def` file. If the script contains `cd /app`, the application will execute from that directory.
- `exec`: Bypasses the `%runscript` entirely. It runs the command provided in the arguments (e.g., `apptainer exec image.sif python script.py`). It ignores any `cd` commands or logic defined in the image’s entry point.
- `shell`: Opens a terminal (usually `bash`). Like `exec`, it ignores the `%runscript`.
Bind Mounts and Isolation#
By default, Apptainer facilitates a “transparent” bridge. This means the container behaves as if it is a part of the host system.
Default Bindings#
For all three commands (run, exec, shell), Apptainer automatically binds:
- `$HOME`: Your host home directory.
- `$PWD`: Your current host directory.
- `/tmp`: The host temporary directory.
Forcing Internal Paths#
Because $PWD is bound by default, apptainer exec will always start in your host path. To override this and use an internal directory like /app, you must use the --pwd flag:
apptainer exec --pwd /app image.sif pwd
Total Isolation (--contain)#
To prevent the host’s $HOME and $PWD from being visible inside the container, use the --contain (or -c) flag. This ensures the environment is reproducible and unaffected by host files.
apptainer run --contain image.sif
Summary of Usage#
- Use `run` when you want the container to act as a pre-configured executable (utilizing the internal `cd /app` logic).
- Use `exec` when you need to run a specific command or script that is not the container’s primary purpose.
- Use `shell` to enter the container and manually inspect files in `/app` or `/opt/venv`.
Image Layering and Union Filesystems#
Container images are not monolithic data blobs; they are structurally organized via layered filesystems, typically utilizing a Union Filesystem implementation (e.g., OverlayFS). This architecture dictates how images are constructed, stored, and executed.
Principles of Layered Construction#
When a container runtime processes a declarative manifest (e.g., a Dockerfile), each sequential directive (such as RUN, COPY, or ADD) generates a discrete filesystem diff.
This diff is committed as an immutable, read-only layer.
At runtime, the storage driver stacks these independent read-only layers into a single unified view. To permit application execution, a thin, ephemeral read-write layer is superimposed on top of the stack. Any modifications made by the application during runtime (e.g., writing log files) occur exclusively within this top ephemeral layer.
System-Level Resource Optimization#
This architecture provides two structural optimizations:
- Deduplication: If multiple distinct container images rely on the same base image (e.g., `ubuntu:22.04`), the host system stores the base read-only layers exactly once. Multiple running containers concurrently reference the same underlying files on disk, drastically reducing storage consumption.
- Build Caching: During the image build process, if a specific layer’s directive and inputs remain unchanged, the runtime reuses the cached layer rather than recompiling it.
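Build caching can be exploited deliberately by ordering directives from least to most frequently changed. A minimal sketch (assuming dependencies are pinned in a `requirements.txt`, which the project above does not necessarily use):

```
FROM python:3.13-slim
WORKDIR /app

# Dependencies change rarely: installing them first lets this expensive
# layer be served from cache on most rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Source code changes often: copying it last invalidates only this final layer.
COPY . .
```

With this ordering, editing a source file triggers only the final `COPY` layer to be rebuilt; touching `requirements.txt` invalidates the install layer as well.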
Container Runtimes#
While the underlying principle (Linux namespaces and cgroups) remains consistent, the implementations of container runtimes vary significantly, primarily regarding security and isolation policies.
Docker and Podman#
Docker is the historical standard for containerization. It relies on a background daemon running with high privileges (root). While standard in web microservices, Docker is widely banned on High-Performance Computing (HPC) clusters due to the inherent security risks of granting users access to a root-level daemon.
Podman functions as an open-source, daemonless alternative to Docker. It implements rootless containers by utilizing user namespaces, allowing unprivileged users to build and execute containers without root access. Podman is a drop-in replacement for Docker, parsing the same Dockerfile manifests and utilizing identical command-line syntax.
Apptainer (formerly Singularity)#
Apptainer is the industry standard container runtime for HPC and scientific computing environments. It diverges from Docker and Podman in several critical architectural aspects:
- Image Format: Instead of managing a layered filesystem cache via a daemon, Apptainer compiles the entire container into a single, immutable flat file (Singularity Image Format, `.sif`). This allows containers to be transferred, archived, and executed identically to standard binary files.
- Execution Privilege: Apptainer natively executes the container payload as the invoking user. No privilege escalation occurs. If `user_A` runs the container, the processes inside the container are owned by `user_A` on the host system.
Distribution via Container Registries#
Container images are distributed via OCI-compliant (Open Container Initiative) registries. This infrastructure eliminates the need to distribute source code and installation instructions, replacing them with a pre-compiled, verifiable system state.
Standard public registries (e.g., Docker Hub) are utilized for base OS images. For research software, modern version control platforms (GitHub and GitLab) provide integrated Container Registries natively attached to repositories. This allows Continuous Integration (CI) pipelines to automatically build a new container image upon every repository commit, tagged with the exact Git SHA. This provides a rigorous audit trail, linking a specific compiled execution environment directly to the version-controlled source code that defined it.
Hardware Limitations and Operational Pitfalls#
Researchers frequently encounter specific architectural limitations and operational pitfalls when adopting containers, especially when migrating from standard VM environments.
Hardware and State Limitations#
1. Hardware Architecture Binding
Like virtual environments, containers are bound to the CPU architecture upon which they are built.
A container image built on an ARM64 processor (e.g., Apple Silicon) will not natively execute on an x86_64 HPC cluster.
Cross-architecture execution requires instruction-set emulation (e.g., via QEMU), which incurs severe performance degradation.
Images must be explicitly cross-compiled or built natively on the target architecture.
2. Hardware Pass-through (GPUs)
By default, containers cannot access physical hardware accelerators on the host system.
To utilize GPUs for machine learning, the container runtime must be explicitly instructed to map host-level driver interfaces into the container namespace.
This necessitates specific runtime flags (e.g., docker run --gpus all or apptainer run --nv) and requires the host system to possess specialized toolkit integrations (e.g., the NVIDIA Container Toolkit).
3. State Persistence and Immutability
Container filesystems are ephemeral.
Any data written to the container’s internal filesystem during execution is destroyed when the container terminates.
Persistent data generation (e.g., saving model weights or processed datasets) requires the explicit configuration of bind mounts (or volumes), which map a directory on the host filesystem directly into the container’s isolated namespace.
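The exact flags differ by runtime; a hedged sketch, reusing the image names from the examples above (the host directory `results/` is an assumption):

```
# Docker/Podman: map ./results on the host to /app/results inside the container
docker run --rm -v "$(pwd)/results:/app/results" my-pipeline-image

# Apptainer: the equivalent bind flag
apptainer run --bind "$(pwd)/results:/app/results" containers/pipeline.sif
```

Anything the pipeline writes to `/app/results` then survives container termination, because the writes land on the host filesystem rather than in the ephemeral top layer.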
Operational Pitfalls#
Discrepancies in Filesystem Layers#
1. Persistent Image Bloat (The Deletion Fallacy)
Because intermediate container layers are strictly immutable, deleting a file in a subsequent layer does not reclaim disk space.
Instead, the union filesystem creates a “whiteout” marker in the upper layer, which hides the file from the unified view.
The actual file payload remains permanently archived in the lower read-only layer, increasing the total image transfer size.
Mitigation: Temporary files generated during installation (e.g., apt-get caches or source code tarballs) must be downloaded, utilized, and deleted within a single RUN directive (a single layer) to prevent storage bloat.
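The difference can be sketched as two Dockerfile variants (the installed package is illustrative):

```
# Anti-pattern: two layers. The apt cache is deleted in the second layer,
# but its payload still ships inside the first, immutable layer.
RUN apt-get update && apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*

# Correct: one layer. The cache is removed before the layer is committed,
# so it never enters the image at all.
RUN apt-get update && apt-get install -y build-essential \
    && rm -rf /var/lib/apt/lists/*
```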
2. Cryptographic and Credential Leaks
The deletion fallacy extends to security.
If SSH keys, API tokens, or proprietary datasets are copied into a container during a build step and subsequently removed in a later step, those secrets remain fully extractable.
Any user with access to the container registry can download the image, inspect the intermediate layer history, and extract the deleted credentials.
3. Architectural Discrepancy in Apptainer
While Docker and Podman maintain this layered architecture at rest and during runtime, Apptainer treats layers differently.
Apptainer utilizes the layered OCI cache during the build phase but ultimately “squashes” the final image into a flat, monolithic Singularity Image Format (.sif) file.
This design choice explicitly abandons host-level deduplication to optimize for High-Performance Computing (HPC) environments.
Parallel filesystems (e.g., Lustre, GPFS) exhibit severe performance degradation when managing thousands of small overlay files.
By squashing the layers into a single .sif binary, Apptainer reduces I/O metadata operations, ensuring the container loads efficiently across thousands of distributed compute nodes.
Discrepancies in Isolation Policies#
A major operational pitfall arises from the discrepancy in default isolation policies between Docker/Podman and Apptainer.
By default, Docker fully isolates the container filesystem from the host.
Conversely, to prioritize scientific workflows, Apptainer implicitly mounts the user’s home directory ($HOME), the current working directory ($PWD), and /tmp into the container at runtime.
While this facilitates immediate access to research data, it breaches environmental isolation.
If a user has a localized Python package installed in ~/.local/lib/python on the host, the Apptainer container will mount the host’s home directory, detect those host packages, and potentially override the container’s internal dependencies.
To enforce strict isolation identical to Docker, Apptainer must be explicitly invoked with the --containall flag.
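Where Python would pick up such host packages can be inspected from inside the container; a lighter mitigation than full containment is setting the standard `PYTHONNOUSERSITE=1` environment variable, which disables the user site directory entirely. A minimal probe:

```python
import site
import sys

# The user site directory typically lives under ~/.local; if Apptainer
# binds $HOME, packages installed there can shadow the container's own.
print("user site:", site.getusersitepackages())
print("user site enabled:", site.ENABLE_USER_SITE)
print("on sys.path:", site.getusersitepackages() in sys.path)
```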
Virtual Machines#
VM Environments#
Exact system states, custom kernels, and specialized drivers can be reliably versioned and replicated using Hardware Virtualization. Virtual Machines (VMs) provide a mechanism to emulate an entire physical computer system. Each VM operates a complete, independent guest operating system, ensuring absolute computational reproducibility and environmental control.
Principles of Hardware Virtualization#
Hardware virtualization is orchestrated by a hypervisor (or Virtual Machine Monitor). The hypervisor sits between the physical hardware (host) and the virtualized environments (guests), actively intercepting and allocating CPU, memory, and peripheral requests.
Hypervisors are structurally categorized into two types:
Type 1 (Bare-Metal): The hypervisor is installed directly on the physical hardware, replacing a traditional host operating system. Examples include KVM (Kernel-based Virtual Machine), VMware ESXi, and Xen. This architecture is the foundation of modern cloud computing and High-Performance Computing (HPC) virtualization.
Type 2 (Hosted): The hypervisor runs as a standard application atop an existing host operating system (e.g., Oracle VirtualBox, VMware Workstation). This is typically utilized for local desktop development rather than production deployments.
Because a full guest kernel must be booted and hardware calls must be translated by the hypervisor, VMs incur a computational overhead. However, this strict boundary provides absolute environmental determinism and enhanced security isolation.
Infrastructure as Code (IaC)#
The deployment of Virtual Machines in cloud environments is typically managed via Infrastructure as Code (IaC). IaC allows VMs and their supporting infrastructure (comprising subnets, routers, and storage volumes) to be built declaratively using configuration files.
In OpenStack environments, this is natively managed by the Heat orchestration engine, or via third-party tools such as Terraform (for infrastructure provisioning) and Ansible (for configuration management).
# Example Terraform block defining an OpenStack VM instance
resource "openstack_compute_instance_v2" "research_node" {
name = "data-processing-vm"
image_name = "Ubuntu 22.04 LTS"
flavor_name = "m1.large"
key_pair = "researcher-ssh-key"
security_groups = ["default", "allow-ssh"]
}
By version-controlling these declarative templates, the exact hardware allocation and network topology of a research environment can be reliably reproduced.
Virtual Disks, Storage, and State Management#
The filesystem of a Virtual Machine is managed fundamentally differently than the layered union filesystems utilized by containers. VM storage relies on virtual disk images and block storage abstractions.
Virtual Disk Images and Formats#
A VM’s primary filesystem is encapsulated within a single file known as a virtual disk image. When a VM boots, the hypervisor mounts this file, and the guest OS treats it as a physical hard drive.
The QCOW2 (QEMU Copy On Write) format is the standard for OpenStack and KVM environments. Unlike raw disk images, which immediately consume their fully allocated size on the host disk, QCOW2 images are dynamically expanding. Furthermore, QCOW2 inherently supports internal snapshotting, allowing the exact state of a VM to be frozen and reverted.
Ephemeral vs. Persistent Storage#
In cloud architectures like OpenStack, storage state is divided into two distinct paradigms:
1. Ephemeral Storage (Nova) When a standard VM is instantiated from an image, the boot disk is often ephemeral. It is directly tied to the lifecycle of the compute instance. If the VM is explicitly terminated or permanently crashes, the ephemeral disk and all data written to it are permanently destroyed. Ephemeral disks are typically utilized for the base OS installation and temporary scratch space.
2. Persistent Block Storage (Cinder) For research data, databases, and critical state preservation, persistent block storage is required. In OpenStack, the Cinder service provisions block storage volumes. These volumes are highly available, physically detached from the compute node, and attached to the VM over the network (typically via iSCSI or Ceph).
If the attached VM is destroyed, the Cinder volume persists independently. It can subsequently be detached and reattached to a newly provisioned VM, ensuring data lineage is maintained across compute lifecycles.
Snapshots and Image Capture#
To capture a reproducible state of a virtual machine (including its installed libraries, kernel modifications, and configurations) a snapshot is generated.
In OpenStack, snapshotting a VM creates a new image of the root disk and uploads it to the image registry. This snapshot can then be utilized as a base image to spawn identical clone VMs, facilitating rapid horizontal scaling for distributed computing tasks.
Cloud Orchestration, Tooling, and Security Models#
Managing hardware virtualization at scale requires a cloud orchestration platform. OpenStack serves as the standard open-source paradigm for institutional and private research clouds, modularizing compute, networking, and security into discrete APIs.
Core Orchestration Components#
An OpenStack environment is composed of interacting microservices, each managing a specific domain of hardware virtualization:
Nova (Compute): The primary engine that communicates with the underlying hypervisors (e.g., KVM) to provision, schedule, and terminate virtual machines across a cluster of physical nodes.
Glance (Image Service): The central registry for virtual disk images. It stores base operating system images (e.g., Rocky Linux, Ubuntu) and custom snapshots created by researchers. It serves a similar architectural purpose to a container registry (like Docker Hub), but distributes full OS payloads.
Neutron (Networking): Manages Software-Defined Networking (SDN). It provisions virtual switches, routers, subnets, and floating IP addresses, allowing isolated private networks to be dynamically constructed for specific research groups.
Virtualized Security Models#
Because VMs operate as full network citizens with complete operating systems, their security model is managed via network-level firewalls and cryptographic access, rather than Linux namespaces.
1. Security Groups Traffic to and from a VM is strictly governed by Security Groups. These are virtualized, stateful firewalls that are evaluated at the hypervisor level. By default, all ingress (incoming) traffic to a newly provisioned VM is implicitly denied. Explicit rules must be defined to permit access.
For example, to allow remote administration, a Security Group rule must be configured to permit ingress TCP traffic on Port 22 (SSH). To host a web dashboard (e.g., JupyterHub), Port 443 (HTTPS) must be explicitly opened.
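Such a rule can also be expressed declaratively, matching the Terraform style shown earlier. A hedged sketch using the Terraform OpenStack provider (the referenced security group resource is assumed to be defined elsewhere in the configuration):

```
# Permit ingress SSH (TCP port 22) from any IPv4 address
resource "openstack_networking_secgroup_rule_v2" "allow_ssh" {
  direction         = "ingress"
  ethertype         = "IPv4"
  protocol          = "tcp"
  port_range_min    = 22
  port_range_max    = 22
  remote_ip_prefix  = "0.0.0.0/0"
  security_group_id = openstack_networking_secgroup_v2.research_secgroup.id
}
```

Version-controlling firewall rules alongside the instance definition keeps the network posture reproducible rather than hand-configured.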
2. Keypair Authentication
Password authentication is universally disabled on standard cloud images to prevent brute-force attacks. Access to a VM is granted exclusively via asymmetric cryptography. During instantiation, a public SSH key (the Keypair) is injected into the VM’s ~/.ssh/authorized_keys file via a metadata service (often cloud-init). Only the holder of the corresponding private key can authenticate.
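The injection step is typically driven by a cloud-init user-data document supplied at instantiation. A minimal sketch (user name and key are placeholders):

```
#cloud-config
# Inject a public key for a dedicated user and apply updates on first boot
users:
  - name: researcher
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...example researcher@workstation
    sudo: ALL=(ALL) NOPASSWD:ALL
package_update: true
package_upgrade: true
```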
VM Limitations & Pitfalls#
While hardware virtualization provides ultimate flexibility and isolation, it introduces specific operational burdens that must be systematically managed to prevent systemic failures and reproducibility loss.
Operational Pitfalls#
Configuration Drift and “Snowflake” Servers
The most significant threat to VM reproducibility is configuration drift. Because a VM behaves exactly like a physical computer, users frequently SSH into the instance to manually install packages, tweak configuration files, or update dependencies. Over time, the VM becomes a “snowflake”: a unique, fragile environment whose exact state is undocumented and impossible to replicate.
Mitigation: VMs should be treated as Immutable Infrastructure. Manual SSH modifications should be strictly prohibited. Instead, all configuration should be automated using provisioning tools (e.g., Ansible, Puppet, or cloud-init). If a change is required, the provisioning script is updated, and an entirely new VM is instantiated to replace the old one.
The Maintenance Burden
Unlike containers, which rely on the host OS for security patches and kernel updates, each Virtual Machine is an independent entity. The owner of the VM assumes full responsibility for securing the guest operating system, managing firewall configurations (e.g., ufw or iptables), and applying critical security updates (e.g., apt-get upgrade). Unmaintained VMs rapidly become vulnerable vectors within institutional networks.
Hardware Limitations#
Resource Overhead and Inflexibility
Virtual Machines require hard reservations of host resources. If a VM is allocated 16 CPU cores and 64GB of RAM, those resources are fully locked by the hypervisor, even if the guest OS is currently sitting idle. Unlike containers, which can burst and share idle CPU cycles seamlessly across the host, VMs inherently lead to lower overall utilization density on compute clusters.
Hardware Pass-through Complexity
Assigning physical host hardware (such as NVIDIA GPUs or high-speed InfiniBand network interfaces) directly to a virtual machine is structurally complex. It requires technologies like PCIe Passthrough or SR-IOV (Single Root I/O Virtualization).
The hypervisor must detach the physical PCIe device from the host kernel and map it directly into the guest VM’s memory space. This breaks advanced virtualization features; a VM with a passthrough GPU typically cannot be live-migrated to another physical compute node without being completely shut down, thereby limiting high-availability configurations.