Author: María Victoria Vivas Gutiérrez
Submission Date: June 2025
This project was developed as the final evaluation for the course Programming for Data Science at the Universitat Oberta de Catalunya (UOC).
It was carried out during my gap year, while preparing and applying for a Master's program, as a way to consolidate my skills in Python, modular programming, testing, and data analysis.
The main objective is to build a modular Python package to analyze the water volume of the La Baells Reservoir using real open data from the Agència Catalana de l’Aigua (ACA).
The project is structured into five modules, each covering different analysis tasks and functionalities: data loading, cleaning, transformations, smoothing, and drought period detection.
Clone the repository and install dependencies:
git clone https://github.com/victoriavivass/ACA_analysis.git
cd ACA_analysis
pip install -r requirements.txt
Run the complete analysis:
python main.py
Run the analysis up to a specific module (e.g. up to module 3):
python main.py -ex 3
ACA_analysis/
│── img/ # Generated plots and images
│── test/ # Unit tests
│ ├── test_modulo1.py
│ ├── test_modulo2.py
│ ├── test_modulo3.py
│ ├── test_modulo4.py
│ └── test_modulo5.py
│── modulo1.py # Data loading and exploration
│── modulo2.py # Data cleaning and filtering
│── modulo3.py # Temporal transformations & visualization
│── modulo4.py # Data smoothing & trends
│── modulo5.py # Drought period detection
│── main.py # Entry point for the analysis
│── requirements.txt # Project dependencies
│── README.md # Project documentation
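As a rough illustration of how main.py could handle the -ex option shown in the usage commands, here is a minimal sketch built on argparse. The run_module helper is a hypothetical placeholder for the real module entry points. The behaviour (run modules 1 through N, defaulting to all five) is an assumption based on the "up to module 3" wording above, not a copy of the actual implementation.

```python
import argparse


def run_module(number: int) -> str:
    """Hypothetical stand-in for calling modulo1.py ... modulo5.py."""
    return f"module {number} done"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="La Baells reservoir analysis")
    # Assumption: -ex N means "run modules 1 through N"; default runs all five.
    parser.add_argument("-ex", type=int, choices=range(1, 6), default=5,
                        help="last module to run (1-5)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Run the modules in order, stopping after the requested one.
    return [run_module(n) for n in range(1, args.ex + 1)]


# Equivalent to: python main.py -ex 3
print(main(["-ex", "3"]))
```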
- Fetch the dataset from the ACA API and load it into a pandas DataFrame.
- Display the first five rows for an initial preview.
- Show the general structure of columns.
- Print a summary of the DataFrame using df.info().
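The loading and exploration steps above can be sketched as follows. To keep the example self-contained it reads a small in-memory CSV instead of calling the ACA API. The column names and all values in the sample are made up for illustration and may not match the real dataset, and load_dataset is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up sample standing in for the ACA download (names and values are assumptions).
SAMPLE_CSV = """Dia,Estació,Nivell absolut (msnm),Percentatge volum embassat (%),Volum embassat (hm3)
01/01/2024,Embassament de la Baells (Cercs),620.1,35.2,38.5
02/01/2024,Embassament de la Baells (Cercs),620.0,35.1,38.4
03/01/2024,Embassament de la Baells (Cercs),619.9,35.0,38.3
01/01/2024,Embassament de Sau (Vilanova de Sau),400.5,10.2,16.8
02/01/2024,Embassament de Sau (Vilanova de Sau),400.4,10.1,16.7
03/01/2024,Embassament de Sau (Vilanova de Sau),400.3,10.0,16.6
"""


def load_dataset(source) -> pd.DataFrame:
    """Read CSV data from a path, URL, or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_dataset(io.StringIO(SAMPLE_CSV))

print(df.head())            # first five rows
print(df.columns.tolist())  # column structure
df.info()                   # dtypes, non-null counts, memory usage
```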
- Rename columns using a dictionary for clarity.
- Normalize reservoir names using regular expressions.
- Filter data specifically for the La Baells Reservoir.
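A minimal sketch of the cleaning steps above, under stated assumptions: the original column names, the short names in RENAME (other than dia, which this README mentions), and the shape of the reservoir labels are all illustrative. normalize_name is a hypothetical helper showing one way regular expressions could strip the prefix and the town in parentheses.

```python
import re

import pandas as pd

# Made-up rows; original column names and label format are assumptions.
raw = pd.DataFrame({
    "Dia": ["01/01/2024", "01/01/2024", "02/01/2024"],
    "Estació": [
        "Embassament de la Baells (Cercs)",
        "Embassament de Sau (Vilanova de Sau)",
        "Embassament de la Baells (Cercs)",
    ],
    "Percentatge volum embassat (%)": [35.2, 10.2, 35.1],
    "Volum embassat (hm3)": [38.5, 16.8, 38.4],
})

# Dictionary-based renaming; the short names are illustrative choices.
RENAME = {
    "Dia": "dia",
    "Estació": "estacio",
    "Percentatge volum embassat (%)": "percentatge",
    "Volum embassat (hm3)": "volum",
}


def normalize_name(name: str) -> str:
    """Drop the 'Embassament de ' prefix and a trailing '(town)' suffix."""
    name = re.sub(r"^Embassament de\s+", "", name)
    name = re.sub(r"\s*\(.*\)$", "", name)
    return name.strip()


df = raw.rename(columns=RENAME)
df["estacio"] = df["estacio"].map(normalize_name)

# Keep only the La Baells rows.
baells = df[df["estacio"].str.contains("Baells", case=False)].reset_index(drop=True)
print(baells)
```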
- Convert the dia column to datetime type.
- Sort the dataset chronologically for easier interpretation.
- Identify the earliest and most recent dates in the dataset.
- Create a dia_decimal column (decimal representation of dates).
- Plot reservoir water volume over time and save it under img/.
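The temporal transformations above could look roughly like this. The day-first date format, the volum column name, and the sample values are assumptions. The decimal-date formula (year plus the elapsed fraction of that year) is one common convention, and the project may define dia_decimal differently. Plotting is left out to keep the sketch short.

```python
import pandas as pd

# Made-up rows, deliberately out of order.
df = pd.DataFrame({
    "dia": ["02/07/2024", "01/01/2024", "31/12/2023"],
    "volum": [40.0, 38.5, 38.4],
})

# Assumption: dates arrive as day/month/year strings.
df["dia"] = pd.to_datetime(df["dia"], format="%d/%m/%Y")
df = df.sort_values("dia").reset_index(drop=True)

# Earliest and most recent dates in the dataset.
first_day, last_day = df["dia"].min(), df["dia"].max()


def to_decimal_year(ts: pd.Timestamp) -> float:
    """Year plus the fraction of the year elapsed at the start of that day."""
    days_in_year = 366 if ts.is_leap_year else 365
    return ts.year + (ts.dayofyear - 1) / days_in_year


df["dia_decimal"] = df["dia"].map(to_decimal_year)
print(first_day.date(), last_day.date())
print(df)
```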
- Apply savgol_filter (from scipy) to smooth fluctuations in reservoir volume.
- Compare original vs. smoothed signals in a plot.
- Save the generated plot in img/, including the student’s name.
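A small sketch of the smoothing step using scipy.signal.savgol_filter on a synthetic signal. The signal, window_length=31, and polyorder=3 are illustrative assumptions, not the values used in the project. The plot comparison is omitted here.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic "volume" series: a slow seasonal swing plus random noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
volume = 60 + 20 * np.sin(2 * np.pi * t) + rng.normal(0, 2, t.size)

# Fit a cubic polynomial over a sliding 31-sample window (illustrative settings).
smoothed = savgol_filter(volume, window_length=31, polyorder=3)

# Total point-to-point variation is a simple measure of roughness.
print("raw roughness:     ", np.abs(np.diff(volume)).sum())
print("smoothed roughness:", np.abs(np.diff(smoothed)).sum())
```

A longer window removes more of the short-term fluctuation but also flattens real peaks, so the window is usually tuned by inspecting the original vs. smoothed plot.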
- Implement the calcula_periodos() function to detect drought periods.
- Define droughts as periods when the storage percentage drops below 60%.
- Print detected periods as a list with start and end dates.
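One possible shape for the drought detection logic, as a sketch only. The signature and return format of the real calcula_periodos() are not shown in this README, so find_drought_periods is a hypothetical stand-in. It assumes the days are already sorted chronologically and that a period is a consecutive run of days below the threshold. The sample values are made up.

```python
from datetime import date


def find_drought_periods(days, percentages, threshold=60.0):
    """Return (start, end) date pairs for consecutive runs below the threshold."""
    periods = []
    start = None      # first day of the run currently open, if any
    previous = None   # last day seen, used to close a run
    for day, pct in zip(days, percentages):
        if pct < threshold:
            if start is None:
                start = day
        elif start is not None:
            periods.append((start, previous))
            start = None
        previous = day
    # Close a run that is still open at the end of the data.
    if start is not None:
        periods.append((start, previous))
    return periods


days = [date(2024, 1, d) for d in range(1, 7)]
percentages = [70.0, 55.0, 50.0, 65.0, 58.0, 59.0]

for start, end in find_drought_periods(days, percentages):
    print(start.isoformat(), "to", end.isoformat())
```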