Renew power hackathon

The project’s objective was to predict the rotor-bearing temperature of wind turbines from the yearly data. Following are the steps with a brief explanation that were followed in the submission notebook,

Imported training dataset
Created new features relevant to wind energy systems:
- Apparent power: hypotenuses to active and reactive power
- Power difference between active and average power: useful to detect surges
- Power from wind speed: P= 1/3 × air density × (wind velocity)^3
- Frictional factor: ∝ (generator speed)^2
- Difference between raw and convertor calculated power
- Temperature difference between inside and outside nacelle temperature
- Phase angle= tan^(-1)⁡(reactive power/active power)
Anomaly detection and removal was performed using PCA transformation:
- Divided df based on ‘turbine_id’ and separately saved the dependent/independent variables
- PCA transformation of independent variables with same number of features as original data
- Reconstructed original data using transformed features from the previous step
- Used z-score to eliminate data with an extreme reconstruction error
Visualized features separately for each turbine:
- Turbines behave differently and scaling all as same data will undermine values from the turbine with smaller values. Scaled each turbine independently and later concatenated all together.
- Many features depend on season/weather (different values in different months). As timestamps can’t be used to extract months, clustering was used to separate these different data points
Clustering based on climate/weather features to segregate data into two clusters (most likely hot and cold seasons):
- Scaled the data before clustering to avoid feature dominance
- k-mean and hdbscan clustering were used based on seasonal features from the visualization
- For each turbine, data were clustered into two labels, which were later used as a new feature for modelling
Other performed data transformations include:
- Skewness correction with PowerTransformer (yeo-johnson transformation)
- Concatenation of different turbines data into single df
- Outlier removal based on visualization wrt. target column
- OHE for turbine_id
- Feature selection:
  - Many columns were highly correlated (> 0.9). Dropped columns with multi-collinearity
  - Dropped columns which don’t impact the model by hit & trail
Split data into train-test with stratification on turbine_ids (train-test should have proportioned data of each turbine)
Model selection:
- Different models such as KNN, LinearRegression, RandomForest, XGBoost, Catboost were trained on train split of the data with default parameters
- A simplistic vanilla neural network was also trained with a fixed learning rate and SGD optimizer
- Models evaluated based on MAPE and R2 scores on test split of the data
- Neural network outperformed other models and was chosen for optimization
- Developed Tensorflow sequential neural network
  - Regularization techniques such as dropout and batch-normalization were employed to avoid overfitting
Defined learning rate schedule, optimizer and loss function:
For continuation of the learning process, decaying learning rate schedule was fed instead of a constant value
Different optimizers such as RMSProp, Adam and SGD were tried; Adam outperformed the others
Error functions such as MSE, MAE, MAPE and MSLE were used; Optimization on MAPE gave the best score for the competition
The model was trained on complete data with 600 batch size and 1000 epochs
Losses and evaluation metrics wrt. epochs were plotted for the inference
Trained model and weights saved as JSON and h5 files, respectively
Loaded saved model for prediction on the test dataset
Submission:
- Repeated the same transformations on the test df
- Prediction from the loaded model
- Saved the submission file
- Verified the predicted data

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Data		Data
Model		Model
Presentation		Presentation
Reading materials		Reading materials
.gitignore		.gitignore
Profile_report.html		Profile_report.html
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Renew power hackathon

About

Languages

shubhankarm01/Renew-power-hackathon

Folders and files

Latest commit

History

Repository files navigation

Renew power hackathon

About

Topics

Resources

Stars

Watchers

Forks

Languages