High Performance Computing (HPC)#
HPC aggregates the processing power of many individual computers, servers, or specialized nodes into a coordinated system that delivers performance far beyond single workstation capabilities. By employing parallel architectures, high‑speed interconnects, and optimized software stacks, HPC enables the execution of large‑scale simulations, data‑intensive analyses, and other computationally demanding workloads.
Understanding the physical separation of roles in a cluster is critical for effective usage.
Login Node#
Redundancy
Multiple login nodes for fault tolerance and load balancing are common. In that case, they share a filesystem (usually on network-attached storage).
The login node(s) are the only machines users connect to directly via SSH. They act as the “doorway” to the rest of the cluster.
Typical activities:
✅ Do: edit scripts, compile code, manage files, and submit jobs (e.g., sbatch, srun).
❌ Do not: run long-running or resource-intensive calculations here. Heavy workloads belong on compute nodes; crashing a login node can block access for all users.
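A typical login-node session is therefore limited to light tasks. A minimal sketch (file names are hypothetical, and the exact editor and compiler commands vary by cluster):

```shell
# Light work only -- all of this is appropriate on a login node.
nano run_sim.sh              # edit a job script (hypothetical file)
gcc -O2 -o sim sim.c         # compile a small program
sbatch run_sim.sh            # hand the heavy work to the scheduler
squeue -u "$USER"            # check your job's place in the queue
```

The actual simulation never runs here; `sbatch` only enqueues it, and the controller decides when and where it executes.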
Controller#
The controller, also known as the scheduler or workload manager, acts as a broker, distributing users' requests onto the compute nodes. Usually the controller is a dedicated machine (or machines), but it can also run on a login node.
Typically, users submit a “job script” to the scheduler software (commonly Slurm or PBS).
You cannot log into compute nodes directly. The controller acts as a broker:
You ask for resources (e.g., “I need 40 CPUs for 2 hours”).
The controller queues your request.
When resources are free, it allocates specific nodes to you and runs your script.
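The steps above can be sketched as a minimal Slurm job script. The program name and resource numbers are illustrative (PBS uses analogous #PBS directives):

```shell
#!/bin/bash
#SBATCH --job-name=my_sim        # hypothetical job name
#SBATCH --cpus-per-task=40       # "I need 40 CPUs..."
#SBATCH --time=02:00:00          # "...for 2 hours"
#SBATCH --output=my_sim_%j.out   # %j expands to the job ID

# Everything below runs on the allocated compute node, not the login node.
./sim --threads "$SLURM_CPUS_PER_TASK"   # hypothetical program
```

Submitting it with `sbatch my_sim.sh` places the request in the queue; when 40 CPUs are free for 2 hours, the controller runs the script on a compute node.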
Compute Nodes#
These are the “muscle” of the system. They are often stripped-down machines (headless) optimized purely for number crunching. They access your files via a shared network filesystem.
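Although you cannot SSH to a compute node directly, most schedulers can grant an interactive shell on one through the same brokering mechanism. A sketch with Slurm (resource numbers are illustrative):

```shell
# Ask the controller for an interactive allocation and open a shell
# on whichever compute node it assigns.
srun --cpus-per-task=4 --time=01:00:00 --pty bash
```

Because compute nodes see the same shared filesystem, your scripts and data are already in place when the shell opens.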