Beginning with GPs #2

Open: wants to merge 2 commits into base: master
54 changes: 54 additions & 0 deletions Concrete_article_data.py
@@ -0,0 +1,54 @@
# -*- coding: utf-8 -*-


import numpy as np
import pandas as pd
from sklearn import gaussian_process
Contributor

When using as, we generally use a short alias name for the module... For example, gp.

import matplotlib.pyplot as plt

if __name__ == '__main__':
Contributor

@jouvin jouvin Jun 5, 2016

One of the goals of this construct is to put everything in a function and avoid the potential issues with global code and especially global data... (the other goal is to make the file usable both as an imported module and as an application). Thus the code should be something like:

import sys
....

def main():
    """
    Description of main()
    """
    Your current code

if __name__ == '__main__':
    sys.exit(main())


df = pd.read_excel('C:/users/Nicolas/PycharmProjects/Concrete_Data.xls', header=0).dropna()
Contributor

Would be good to have this line in a try... except... block as reading a file is an operation that can generate exceptions...
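
A minimal sketch of that suggestion, assuming a hypothetical helper `load_data` that is not part of this PR. The reader function is passed in as a parameter (for this script it would be `pd.read_excel`) so that the sketch has no third-party dependency. Which exceptions the real reader raises is an assumption: `OSError` (missing or unreadable file) and `ValueError` (unparsable content) are caught here only as plausible candidates.

```python
import sys


def load_data(path, reader):
    """Hypothetical helper: call reader(path), report I/O failures clearly.

    Returns whatever the reader returns, or None if reading failed.
    """
    try:
        return reader(path)
    except (OSError, ValueError) as exc:
        # Assumed exception types; adjust to what the real reader raises.
        print("Could not read %s: %s" % (path, exc), file=sys.stderr)
        return None


# A path that does not exist triggers the except branch.
result = load_data("no_such_dir_xyz/Concrete_Data.xls", open)
print(result)
```

In the script, the call would then look something like `df = load_data(path, pd.read_excel)`, with a check for `None` before `.dropna()` is applied.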

theta_0=1
theta_L=1e-4 # setting hyper-parameters
theta_U=1e-3
gp = gaussian_process.GaussianProcess(theta0=theta_0, thetaL=theta_L, thetaU=theta_U)

X = np.array(df.iloc[:30,[0]]) + np.random.normal(size=(30,1), loc=0, scale=5) # adding some noise because Gp don't allow equality in input.
Contributor

Better to insert the comment as a one-liner before the code...
Also the file is -> the file to be

y = np.array(df.iloc[:30,[8]]) # last column of the file is the value of interest

"""print(X.shape)
Contributor

Commented code: remove it!

print(X)
print(y.shape)
plt.hist(X)
plt.hist(y.as_matrix())
plt.show()"""
Contributor

With docstrings, it is better to have the closing """ on a separate line.


""" Some trial to remove equality in X (got to try method .groupby of panda)

Xu = np.unique(X)
Contributor

Same remark. Makes the code difficult to read.

print(Xu)
yu=np.extract([X[i]==x1 for i,x1 in enumerate(Xu)] , np.array(df.iloc[:30,[8]]))
print(Xu.shape)
print(yu)
print(X,y)
unique_index = [np.where(X==x1) for x1 in Xu] # get corresponding indices for unique X values
print(Xu, y[unique_index])"""
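
The commented-out experiment above tries to drop repeated input values instead of jittering them with noise. One possible alternative, sketched here in plain Python, is to average the targets that share the same input. This is not the author's method: `average_duplicates` is a hypothetical helper, and the sample values are made up for illustration.

```python
from collections import defaultdict


def average_duplicates(xs, ys):
    """Collapse repeated x values, replacing their y values with the mean."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    # Sort the unique inputs so the output order is deterministic.
    unique_xs = sorted(groups)
    mean_ys = [sum(groups[x]) / len(groups[x]) for x in unique_xs]
    return unique_xs, mean_ys


# Illustrative values only: 540.0 appears twice and is merged into one point.
xs, ys = average_duplicates([540.0, 540.0, 332.5], [80.0, 62.0, 40.0])
print(xs, ys)
```

Averaging keeps the measured inputs exact, at the cost of discarding the spread among the duplicated observations.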

gp.fit(X, y) # fitting the feature 1 on Y with the GP
n = 1000 # number of points to predict (the more the smoother)
x = np.atleast_1d(np.linspace(X.min(), X.max(), n)).reshape((n, 1)) # n input values for prediction
y_pred = gp.predict(x, eval_MSE=False) # prediction on x with the GP
y_pred = np.array(y_pred).T
print(y_pred.shape)
Contributor

Is there a good reason for printing something without a text?

print("y_pred=",y_pred)
Contributor

More pythonic...

print ("y_pred=%f" % y_pred)
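
One caveat on the `%f` form: it expects a single number, whereas `y_pred` in this diff holds `n` predictions, so applying `%f` to the whole array would likely raise a `TypeError`. A small sketch of formatting each value separately, using a hypothetical helper `describe` and made-up numbers:

```python
def describe(name, values):
    """Return 'name=[v1, v2, ...]' with each value shown to 3 decimals."""
    return "%s=[%s]" % (name, ", ".join("%.3f" % v for v in values))


# Illustrative values only, standing in for a few predictions.
print(describe("y_pred", [35.2, 41.75]))
```

For a quick look at a whole array, `"y_pred=%s" % (y_pred,)` is another option, since `%s` accepts any object.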

print(y_pred[0])
Contributor

Same remark as above...

temp= np.linspace(X.min(), X.max(), n)
temp2= y_pred[0]
print(temp.shape, temp2.shape)
plt.plot(np.linspace(X.min(), X.max(), n), y_pred[0], 'b')
plt.scatter(X, y)
plt.title('theta_0=%s '%theta_0 + 'theta_L=%s '%theta_L + 'theta_U=%s '%theta_U)
plt.savefig('Image Gp/Concrete_feature1_GP_theta0=%s'%theta_0 + 'thetaL=%s'%theta_L + 'thetaU=%s'%theta_U + '.png')
plt.show()