A linear regression ML model built on numpy and pandas. The challenge of this project is to implement the training algorithms from scratch, using only the matrix operations that numpy provides.
- numpy >= 1.25.0
- pandas >= 2.0.3
The predictor of this linear regressor maps an n-dimensional input x = (x_1, x_2, x_3, ..., x_n) to a real value ŷ through the function f: R^n -> R given by:
f(x) = w_0 + w_1 * x_1 + w_2 * x_2 + w_3 * x_3 + ... + w_n * x_n
where the w_i are the weights of the regressor and w_0 is the intercept (bias) term.
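As a minimal sketch of the formula above, the predictor can be written as a dot product in numpy. The function name `predict_one` and the convention of storing w_0 in the first position of the weight vector are assumptions made for illustration, not necessarily how this project stores its weights.

```python
import numpy as np

def predict_one(weights: np.ndarray, x: np.ndarray) -> float:
    """Evaluate f(x) = w_0 + w_1*x_1 + ... + w_n*x_n for a single instance."""
    # Assumption: weights[0] is the intercept w_0 and weights[1:] lines up
    # with the features x_1..x_n.
    return float(weights[0] + np.dot(weights[1:], x))

weights = np.array([1.0, 2.0, -3.0])  # w_0, w_1, w_2
x = np.array([4.0, 0.5])              # x_1, x_2
result = predict_one(weights, x)
print(result)  # 1 + 2*4 - 3*0.5 = 7.5
```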
This regressor implements the gradient descent algorithm to find the optimal weights w* = (w_0, w_1, w_2, ..., w_n). We start from an initial guess w^(0), then iteratively update it with the recursive formula below:
w^(k+1) = w^(k) - eta * grad L (w^(k))
where
- w^(k) is the weight vector after k updates (k is the iteration index, not the input dimension n)
- eta is the learning rate
- grad L is the gradient of the loss function
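The update rule above can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not this project's implementation: the loss is taken to be the mean squared error (the text does not name the loss), the initial guess w^(0) is the zero vector, and the function name `gradient_descent` is hypothetical.

```python
import numpy as np

def gradient_descent(X: np.ndarray, y: np.ndarray, eta: float, iterations: int) -> np.ndarray:
    """Fit weights (w_0, ..., w_n) by repeating w <- w - eta * grad L(w)."""
    # Prepend a column of ones so that w_0 acts as the intercept.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    # Assumption: start from the zero vector as the initial guess w^(0).
    w = np.zeros(Xb.shape[1])
    for _ in range(iterations):
        residual = Xb @ w - y
        # Gradient of the mean squared error L(w) = mean((Xb @ w - y)**2).
        grad = (2.0 / len(y)) * (Xb.T @ residual)
        w = w - eta * grad
    return w

# Toy data generated from y = 1 + 2*x, so the fit should approach (1, 2).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X[:, 0]
w = gradient_descent(X, y, eta=0.1, iterations=2000)
print(w)
```

The learning rate matters here: if eta is too large the sequence diverges, and if it is too small many more iterations are needed to get close to w*.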
First, load your dataset with pandas:
dataset = DatasetHandler.get_dataframe(<path>)
Split it into training and test datasets:
train_dataset, test_dataset = DatasetHandler.train_test_split(dataset, <train/test ratio>)
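For intuition, a ratio-based split of a pandas DataFrame might look like the sketch below. It is not the code behind `DatasetHandler.train_test_split`: the helper name `split_rows`, the shuffling step, the `seed` parameter, and the reading of the ratio as the fraction of rows that go to the training set are all assumptions.

```python
import pandas as pd

def split_rows(df: pd.DataFrame, train_ratio: float, seed: int = 0):
    """Shuffle the rows, then cut them into a training part and a test part."""
    # Assumption: rows are shuffled before splitting so both parts are
    # drawn from the same distribution.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    cut = int(len(shuffled) * train_ratio)
    return shuffled.iloc[:cut], shuffled.iloc[cut:]

df = pd.DataFrame({"x": range(10), "y": [2 * v for v in range(10)]})
train, test = split_rows(df, train_ratio=0.8)
print(len(train), len(test))  # 8 2
```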
Then, instantiate the linear regressor with the training and test datasets:
linear_reg = LinearRegressor(train_dataset, test_dataset)
Finally, train your model with your preferred method:
linear_reg.gradient_descendent_fit(<learning rate>, <iterations number>)
Now the model is trained, and you can infer the value of y for any given instance x:
linear_reg.predict(x)
You can also get the loss over the test dataset:
linear_reg.get_loss()
Done!