Refactoring#
Definition#
Trying to write perfectly DRY, well-structured code on your first attempt is a recipe for paralysis. When drafting a script, your primary focus should be on solving the problem. The path from messy working code to clean reusable code is called refactoring.
Refactoring is the disciplined practice of restructuring existing code to improve its design, readability, and maintainability without altering its external behavior.
It generally targets four dimensions:
Usability: making code easier to read and understand.
Reusability: structuring logic so it can be applied in different contexts.
Compatibility: ensuring the code integrates well with libraries and standards.
Performance: making code faster or less resource-hungry.
Refactoring in practice#
A systematic approach to refactoring a script:
Identify repetitive code: look for copy-pasted blocks or logic that appears multiple times. Extract it into a function.
Create clear definitions: give each function a descriptive name and a single responsibility.
Implement with rubber ducking: explain to your duck what data the function needs. Those become your arguments.
Aggregate into modules: group related functions into Python files (modules) and folders (packages).
A simple project structure might look like this:
my_project/
├── src/
│ └── my_analysis/
│ ├── __init__.py
│ ├── cleaning.py
│ └── regression.py
└── pyproject.toml
You can then import your logic cleanly:
from my_analysis.cleaning import remove_nulls
from my_analysis.regression import fit_model