Making things more compatible with tf.keras.Model, increase flexibility with rapid-prototyping of optimizers (#12)

* updating on tom_dev

* string issue

* logger name issue

* adding cifar imagenet transfer learning

* damping parameter

* damping issue

* updating transfer learning driver

* updating cifar10 driver

* adding seed to logger name

* updating

* wrong size

* updating

* architecture is incorrect

* another error in the weights

* unsetting visible devices

* cuda devices

* starting to add keras Model wrapper stuff to hessianlearn

* updating the preconditioner due to eager issues

* inferring dtype when its not passed in

* updating adam

* updating

* updating incg

* updating problem

* checkpointing work on multi input output keras Model compatibility

* weighted sum of losses has been implemented now

* checkpointing work on kerasModelWrapper that streamlines the nn training without old hessianlearn baggage

* updating with a working prototype of the kerasModelWrapper

* updating getting close to merging the PR

* updating

* updating

* getting close to merging

* modifying tf version for unit tests

* modifying tf version for unit tests

* updating the unit tests to suppress all of tensorflows nonsense

* updating the unit test

Co-authored-by: Tom OR <[email protected]>
tomoleary and tomoleary committed Dec 15, 2021
1 parent 021d1c1 commit 9f5c3bc
Showing 16 changed files with 1,022 additions and 48 deletions.
4 changes: 2 additions & 2 deletions .travis.yml
@@ -17,8 +17,8 @@

 language: python
 python:
-  - "3.5"
   - "3.6"
+  - "3.7"
 install:
   - sudo apt-get update
   - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
@@ -31,7 +31,7 @@ install:
   # # Useful for debugging any issues with conda
   # - conda info -a
   # Replace dep1 dep2 ... with your dependencies
-  - conda create -n hessianlearn2 python=$TRAVIS_PYTHON_VERSION tensorflow scipy
+  - conda create -n hessianlearn2 python=$TRAVIS_PYTHON_VERSION tensorflow=2.0.0 scipy
   - conda activate hessianlearn2
   # # - python setup.py install
 script:
60 changes: 51 additions & 9 deletions README.md
@@ -63,12 +63,15 @@ Set `HESSIANLEARN_PATH` environmental variable
Train a keras model

```python
import os,sys
import tensorflow as tf
sys.path.append( os.environ.get('HESSIANLEARN_PATH'))
from hessianlearn import *

# Define keras neural network model
neural_network = tf.keras.models.Model(...)
# Define loss function and compile model
neural_network.compile(loss = ...)

```

@@ -77,7 +80,9 @@ hessianlearn implements various training [`problem`](https://github.com/tomolear
 ```python
 # Instantiate the problem (this handles the loss function,
 # construction of hessian and gradient etc.)
-problem = RegressionProblem(neural_network,dtype = tf.float32)
+# KerasModelProblem extracts loss function and metrics from
+# a compiled keras model
+problem = KerasModelProblem(neural_network)
 # Instantiate the data object, this handles the train / validation split
 # as well as iterating during training
 data = Data({problem.x:x_data,problem.y_true:y_data},train_batch_size,\
@@ -94,6 +99,40 @@ HLModel = HessianlearnModel(problem,regularization,data)
HLModel.fit()
```

### Alternative Usage (More like Keras Interface)
The example above shows the original optimizer interface in hessianlearn. To better mimic the keras interface, and to make it easier for end users to rapidly prototype the optimizer used to fit data, an alternative interface was added in December 2021:

```python
import os,sys
import tensorflow as tf
sys.path.append( os.environ.get('HESSIANLEARN_PATH'))
from hessianlearn import *

# Define keras neural network model
neural_network = tf.keras.models.Model(...)
# Define loss function and compile model
neural_network.compile(loss = ...)
# Instantiate the keras model wrapper, which constructs the
# `problem` that handles the Hessian computational graph
# and variables
HLModel = KerasModelWrapper(neural_network)
# Then the end user can pass in an optimizer
# (e.g. custom end-user optimizer)
optimizer = LowRankSaddleFreeNewton # The class constructor, not an instance
optimizer_parameters = ParametersLowRankSaddleFreeNewton()
optimizer_parameters['hessian_low_rank'] = 40
HLModel.set_optimizer(optimizer,optimizer_parameters = optimizer_parameters)
# The data object still needs to key on to the specific computational
# graph variables that data will be passed in for.
# Note that data can naturally handle multiple input and output data,
# in which case problem.x, problem.y_true are lists corresponding to
# neural_network.inputs, neural_network.outputs
problem = HLModel.problem
data = Data({problem.x:x_data,problem.y_true:y_data},train_batch_size,\
validation_data_size = validation_data_size)
# And finally one can call fit!
HLModel.fit(data)
```
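
Why `set_optimizer` takes a class rather than an instance can be illustrated with a toy sketch. Everything in it (`ToyProblem`, `ToyOptimizer`, `ToyWrapper`, and the `parameters` keyword) is hypothetical and only mimics the pattern described in the comments above; it is not hessianlearn's implementation. The assumption is that the wrapper owns the `problem`, so it is the natural place to construct the optimizer once that problem exists.

```python
class ToyProblem:
    """Hypothetical stand-in for the object that builds gradients and Hessians."""
    def __init__(self, model_name):
        self.model_name = model_name


class ToyOptimizer:
    """Hypothetical optimizer that must be constructed with a problem."""
    def __init__(self, problem, parameters=None):
        self.problem = problem
        self.parameters = dict(parameters or {})


class ToyWrapper:
    """Hypothetical wrapper that builds the problem first, then the optimizer."""
    def __init__(self, model_name):
        self.problem = ToyProblem(model_name)
        self.optimizer = None

    def set_optimizer(self, optimizer_class, optimizer_parameters=None):
        # The caller passes the class; the wrapper supplies its own problem.
        self.optimizer = optimizer_class(self.problem, parameters=optimizer_parameters)


wrapper = ToyWrapper("toy_network")
wrapper.set_optimizer(ToyOptimizer, optimizer_parameters={"hessian_low_rank": 40})
print(wrapper.optimizer.parameters["hessian_low_rank"])
```

Passing the class keeps the caller from having to build a `problem` by hand just to construct an optimizer, which is the kind of prototyping convenience the section describes.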

## Examples

@@ -108,7 +147,7 @@ These publications motivate and use the hessianlearn library for stochastic nonc
 [**Inexact Newton Methods for Stochastic Nonconvex Optimization with Applications to Neural Network Training**](https://arxiv.org/abs/1905.06738).
 arXiv:1905.06738.
 ([Download](https://arxiv.org/pdf/1905.06738.pdf))<details><summary>BibTeX</summary><pre>
-@article{o2019inexact,
+@article{OLearyRoseberryAlgerGhattas2019,
 title={Inexact Newton methods for stochastic nonconvex optimization with applications to neural network training},
 author={O'Leary-Roseberry, Thomas and Alger, Nick and Ghattas, Omar},
 journal={arXiv preprint arXiv:1905.06738},
@@ -117,10 +156,10 @@ arXiv:1905.06738.
 }</pre></details>

 - \[2\] O'Leary-Roseberry, T., Alger, N., Ghattas O.,
-[**Low Rank Saddle Free Newton**](https://arxiv.org/abs/2002.02881).
+[**Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization**](https://arxiv.org/abs/2002.02881).
 arXiv:2002.02881.
 ([Download](https://arxiv.org/pdf/2002.02881.pdf))<details><summary>BibTeX</summary><pre>
-@article{o2020low,
+@article{OLearyRoseberryAlgerGhattas2020,
 title={Low Rank Saddle Free Newton: Algorithm and Analysis},
 author={O'Leary-Roseberry, Thomas and Alger, Nick and Ghattas, Omar},
 journal={arXiv preprint arXiv:2002.02881},
@@ -133,11 +172,14 @@ arXiv:2002.02881.
 [**Derivative-Informed Projected Neural Networks for High-Dimensional Parametric Maps Governed by PDEs**](https://arxiv.org/abs/2011.15110).
 arXiv:2011.15110.
 ([Download](https://arxiv.org/pdf/2011.15110.pdf))<details><summary>BibTeX</summary><pre>
-@article{o2020derivative,
-title={Derivative-Informed Projected Neural Networks for High-Dimensional Parametric Maps Governed by PDEs},
-author={O'Leary-Roseberry, Thomas and Villa, Umberto and Chen, Peng and Ghattas, Omar},
-journal={arXiv preprint arXiv:2011.15110},
-year={2020}
+@article{OLearyRoseberryVillaChenEtAl2022,
+title={Derivative-informed projected neural networks for high-dimensional parametric maps governed by {PDE}s},
+author={O’Leary-Roseberry, Thomas and Villa, Umberto and Chen, Peng and Ghattas, Omar},
+journal={Computer Methods in Applied Mechanics and Engineering},
+volume={388},
+pages={114199},
+year={2022},
+publisher={Elsevier}
 }
 }</pre></details>

@@ -114,7 +114,6 @@

pretrained_resnet50 = tf.keras.applications.resnet50.ResNet50(weights = 'imagenet',include_top=False,input_tensor=input_tensor)


for layer in pretrained_resnet50.layers[:143]:
    layer.trainable = False

@@ -0,0 +1,180 @@
# This file is part of the hessianlearn package
#
# hessianlearn is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or any later version.
#
# hessianlearn is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# If not, see <http://www.gnu.org/licenses/>.
#
# Author: Tom O'Leary-Roseberry
# Contact: [email protected]


import numpy as np
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ["KMP_WARNINGS"] = "FALSE"
# os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import pickle
import tensorflow as tf
import time, datetime
# if int(tf.__version__[0]) > 1:
# import tensorflow.compat.v1 as tf
# tf.disable_v2_behavior()


# Memory issue with GPUs
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)
# Load hessianlearn library
import sys
sys.path.append( os.environ.get('HESSIANLEARN_PATH', "../../"))
from hessianlearn import *

# Parse run specifications
from argparse import ArgumentParser

parser = ArgumentParser(add_help=True)
parser.add_argument("-optimizer", dest='optimizer',required=False, default = 'lrsfn', help="optimizer type",type=str)
parser.add_argument('-fixed_step',dest = 'fixed_step',\
required= False,default = 1,help='boolean for fixed step vs globalization',type = int)
parser.add_argument('-alpha',dest = 'alpha',required = False,default = 1e-4,help= 'learning rate alpha',type=float)
parser.add_argument('-hessian_low_rank',dest = 'hessian_low_rank',required= False,default = 40,help='low rank for sfn',type = int)
parser.add_argument('-record_spectrum',dest = 'record_spectrum',\
required= False,default = 0,help='boolean for recording spectrum',type = int)
# parser.add_argument('-weight_burn_in',dest = 'weight_burn_in',\
# required= False,default = 0,help='',type = int)

# parser.add_argument('-data_seed',dest = 'data_seed',\
# required= False,default = 0,help='',type = int)

parser.add_argument('-batch_size',dest = 'batch_size',required= False,default = 32,help='batch size',type = int)
parser.add_argument('-hess_batch_size',dest = 'hess_batch_size',required= False,default = 8,help='hess batch size',type = int)
parser.add_argument('-keras_epochs',dest = 'keras_epochs',required= False,default = 50,help='keras_epochs',type = int)
parser.add_argument("-keras_opt", dest='keras_opt',required=False, default = 'adam', help="optimizer type for keras",type=str)
parser.add_argument('-keras_alpha',dest = 'keras_alpha',required= False,default = 1e-3,help='keras learning rate',type = float)
parser.add_argument('-max_sweeps',dest = 'max_sweeps',required= False,default = 1,help='max sweeps',type = float)
parser.add_argument('-weights_file',dest = 'weights_file',required= False,default = 'None',help='weight file pickle',type = str)

args = parser.parse_args()

# tf.set_random_seed only exists in TF 1.x; fall back to the TF 2.x API
try:
    tf.set_random_seed(0)
except AttributeError:
    tf.random.set_seed(0)

# GPU Environment Details
gpu_available = tf.test.is_gpu_available()
built_with_cuda = tf.test.is_built_with_cuda()
print(80*'#')
print(('IS GPU AVAILABLE: '+str(gpu_available)).center(80))
print(('IS BUILT WITH CUDA: '+str(built_with_cuda)).center(80))
print(80*'#')

settings = {}
# Set run specifications
# Data specs
settings['batch_size'] = args.batch_size
settings['hess_batch_size'] = args.hess_batch_size


################################################################################
# Instantiate data
(x_train, y_train), (_x_test, _y_test) = tf.keras.datasets.cifar10.load_data()

# # Normalize the data
# x_train = x_train.astype('float32') / 255.
# x_test = x_test.astype('float32') / 255.

x_train = tf.keras.applications.resnet50.preprocess_input(x_train)
x_test_full = tf.keras.applications.resnet50.preprocess_input(_x_test)
x_val = x_test_full[:2000]
x_test = x_test_full[2000:]

y_train = tf.keras.utils.to_categorical(y_train)
y_test_full = tf.keras.utils.to_categorical(_y_test)
y_val = y_test_full[:2000]
y_test = y_test_full[2000:]

################################################################################
# Create the neural network in keras

# tf.keras.backend.set_floatx('float64')

resnet_input_shape = (200,200,3)
input_tensor = tf.keras.Input(shape = resnet_input_shape)

pretrained_resnet50 = tf.keras.applications.resnet50.ResNet50(weights = 'imagenet',include_top=False,input_tensor=input_tensor)

for layer in pretrained_resnet50.layers[:143]:
    layer.trainable = False

classifier = tf.keras.models.Sequential()
classifier.add(tf.keras.layers.Input(shape=(32,32,3)))
classifier.add(tf.keras.layers.Lambda(lambda image: tf.image.resize(image, resnet_input_shape[:2])))
classifier.add(pretrained_resnet50)
classifier.add(tf.keras.layers.Flatten())
classifier.add(tf.keras.layers.BatchNormalization())
classifier.add(tf.keras.layers.Dense(64, activation='relu'))
classifier.add(tf.keras.layers.Dropout(0.5))
classifier.add(tf.keras.layers.BatchNormalization())
classifier.add(tf.keras.layers.Dense(10, activation='softmax'))


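if args.keras_opt == 'adam':
    optimizer = tf.keras.optimizers.Adam(learning_rate = args.keras_alpha,epsilon = 1e-8)
elif args.keras_opt == 'sgd':
    optimizer = tf.keras.optimizers.SGD(learning_rate=args.keras_alpha)
else:
    # A bare `raise` has no active exception here, so raise an explicit error
    raise ValueError('Unsupported keras_opt: ' + args.keras_opt)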

classifier.compile(optimizer=optimizer,
loss=tf.keras.losses.CategoricalCrossentropy(from_logits = True),
metrics=['accuracy'])

loss_test_0, acc_test_0 = classifier.evaluate(x_test,y_test,verbose=2)
print('acc_test = ',acc_test_0)
loss_val_0, acc_val_0 = classifier.evaluate(x_val,y_val,verbose=2)
print('acc_val = ',acc_val_0)


# Compare strings with != ; `is not` tests identity, not equality
if args.weights_file != 'None':
    try:
        with open(args.weights_file, 'rb') as logger:
            best_weights = pickle.load(logger)['best_weights']
        for layer_name,weight in best_weights.items():
            classifier.get_layer(layer_name).set_weights(weight)
    except Exception:
        print('Issue loading best weights')

loss_test_final, acc_test_final = classifier.evaluate(x_test,y_test,verbose=2)
print('acc_test final = ',acc_test_final)
loss_val_final, acc_val_final = classifier.evaluate(x_val,y_val,verbose=2)
print('acc_val final = ',acc_val_final)

################################################################################
# Evaluate again on all the data.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# # Normalize the data
# x_train = x_train.astype('float32') / 255.
# x_test = x_test.astype('float32') / 255.

x_train = tf.keras.applications.resnet50.preprocess_input(x_train)
x_test = tf.keras.applications.resnet50.preprocess_input(x_test)

y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

loss_test_total, acc_test_total = classifier.evaluate(x_test,y_test,verbose=2)
print(80*'#')
print('After hessianlearn training'.center(80))
print('acc_test_total = ',acc_test_total)
7 changes: 4 additions & 3 deletions hessianlearn/algorithms/adam.py
@@ -89,14 +89,15 @@ def minimize(self,feed_dict = None):
 gradient = self.sess.run(self.grad,feed_dict = feed_dict)

 self.m = self.parameters['beta_1']*self.m + (1-self.parameters['beta_1'])*gradient
-# m_hat = [m/(1 - self.parameters['beta_1']**self.iter) for m in self.m]
+m_hat = self.m / (1.0 - self.parameters['beta_1']**self._iter)

 g_sq_vec = np.square(gradient)
 self.v = self.parameters['beta_2']*self.v + (1-self.parameters['beta_2'])*g_sq_vec
-v_root = np.sqrt(self.v)
+v_hat = self.v / (1.0 - self.parameters['beta_2']**self._iter)
+v_root = np.sqrt(v_hat)


-update = -alpha*self.m/(v_root +self.parameters['epsilon'])
+update = -alpha*m_hat/(v_root +self.parameters['epsilon'])
 self.p = update
 self._sweeps += [1,0]
 self.sess.run(self.problem._update_ops,feed_dict = {self.problem._update_placeholder:update})
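
The hunk above switches the Adam update to the bias-corrected moments `m_hat` and `v_hat`. The following is a minimal pure-Python sketch of one such step, not the library code: `adam_step` and its default values are hypothetical, and it assumes the iteration counter `t` starts at 1 (at `t = 0` both denominators would be zero).

```python
import math


def adam_step(m, v, gradient, t, alpha=1e-3, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One bias-corrected Adam step on plain Python lists; t is 1-based."""
    # Exponential moving averages of the gradient and of its square.
    m = [beta_1 * mi + (1.0 - beta_1) * gi for mi, gi in zip(m, gradient)]
    v = [beta_2 * vi + (1.0 - beta_2) * gi * gi for vi, gi in zip(v, gradient)]
    # Bias correction: both averages start at zero, so early values are too small.
    m_hat = [mi / (1.0 - beta_1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta_2 ** t) for vi in v]
    update = [-alpha * mh / (math.sqrt(vh) + epsilon) for mh, vh in zip(m_hat, v_hat)]
    return m, v, update


m, v, update = adam_step([0.0, 0.0], [0.0, 0.0], [0.5, -2.0], t=1)
print(update)
```

On the first step the corrected moments equal the gradient and its square, so each component of the update has magnitude close to `alpha` regardless of the gradient's scale. Without the correction, the first update would be scaled by `(1 - beta_1) / sqrt(1 - beta_2)` instead.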
7 changes: 3 additions & 4 deletions hessianlearn/algorithms/inexactNewtonCG.py
@@ -137,13 +137,12 @@ def minimize(self,feed_dict = None,hessian_feed_dict = None):
if not self.trust_region_initialized:
self.initialize_trust_region()
# Set trust region radius
self.cg_solver.set_trust_region_radius(self.trust_region.radius)
p,on_boundary = self.cg_solver.solve(-gradient,feed_dict)
self._sweeps += [1,2*self.cg_solver.iter]
self.p = p
self.cg_solver.set_trust_region_radius(self.trust_region.radius)
# Solve for candidate step
p, on_boundary = self.cg_solver.solve(-gradient,hessian_feed_dict)
pg = np.dot(p,gradient)
self._sweeps += [1,2*self.cg_solver.iter]
self.p = p
# Calculate predicted reduction
feed_dict[self.cg_solver.problem.dw] = p
Hp = self.sess.run(self.cg_solver.Aop,feed_dict)
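
The hunk is cut off right after `Hp` is computed, so the library's own expression for the predicted reduction is not visible here. In a standard trust-region method it is the decrease of the local quadratic model, `-(g.p + 0.5 * p.Hp)`. The sketch below assumes that textbook formula; `dot`, `predicted_reduction`, `hessian_apply` and the toy objective are hypothetical names used only for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predicted_reduction(p, gradient, hessian_apply):
    """Decrease predicted by the quadratic model m(p) = g.p + 0.5 * p.Hp."""
    Hp = hessian_apply(p)
    return -(dot(p, gradient) + 0.5 * dot(p, Hp))


# Toy quadratic objective f(w) = g.w + 0.5 * w.Hw with H = diag(2, 4), expanded at w = 0.
H_diag = [2.0, 4.0]
gradient = [2.0, 4.0]


def hessian_apply(vec):
    return [h * x for h, x in zip(H_diag, vec)]


def objective(w):
    return dot(gradient, w) + 0.5 * dot(w, hessian_apply(w))


p = [-1.0, -1.0]  # exact Newton step, -H^{-1} g
predicted = predicted_reduction(p, gradient, hessian_apply)
actual = objective([0.0, 0.0]) - objective(p)
rho = actual / predicted  # ratio typically used to accept the step and resize the radius
print(predicted, rho)
```

For a truly quadratic objective the model is exact, so the ratio of actual to predicted reduction is one; on a neural network loss the ratio measures how far the quadratic model can be trusted at the current radius.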
1 change: 0 additions & 1 deletion hessianlearn/algorithms/inexactNewtonMINRES.py
@@ -118,6 +118,5 @@ def minimize(self,feed_dict = None,hessian_feed_dict = None):






4 changes: 3 additions & 1 deletion hessianlearn/algorithms/lowRankSaddleFreeNewton.py
@@ -35,7 +35,7 @@


 def ParametersLowRankSaddleFreeNewton(parameters = {}):
-    parameters['alpha'] = [1e0, "Initial steplength, or learning rate"]
+    parameters['alpha'] = [1e-3, "Initial steplength, or learning rate"]
     parameters['rel_tolerance'] = [1e-3, "Relative convergence when sqrt(g,g)/sqrt(g_0,g_0) <= rel_tolerance"]
     parameters['abs_tolerance'] = [1e-4,"Absolute converge when sqrt(g,g) <= abs_tolerance"]
     parameters['default_damping'] = [1e-3, "Levenberg-Marquardt damping when no regularization is used"]
@@ -95,6 +95,8 @@ def __init__(self,problem,regularization = None,sess = None,parameters = Paramet

self._rq_std = 0.0

self.eigenvalues = None

@property
def rank(self):
return self._rank
12 changes: 12 additions & 0 deletions hessianlearn/algorithms/optimizer.py
@@ -88,6 +88,18 @@ def iter(self):
    def regularization(self):
        return self._regularization

    @property
    def set_sess(self):
        return self._set_sess


    def _set_sess(self,sess):
        r"""
        Sets the tf.Session()
        """
        self._sess = sess
        if 'H' in dir(self):
            self.H._sess = sess

    def minimize(self):
        r"""
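
The `set_sess` property in this hunk returns the bound method `_set_sess`, so `optimizer.set_sess(sess)` still reads as an ordinary method call. A minimal sketch of that idiom follows, using hypothetical stand-in classes (`ToyHessian`, `ToyOptimizer`) and a plain object in place of a `tf.Session`; it only illustrates the pattern and is not the hessianlearn class.

```python
class ToyHessian:
    """Hypothetical Hessian operator that keeps its own session reference."""
    def __init__(self):
        self._sess = None


class ToyOptimizer:
    def __init__(self, with_hessian=True):
        self._sess = None
        if with_hessian:
            self.H = ToyHessian()

    @property
    def set_sess(self):
        # Attribute access returns the bound method; the caller then invokes it.
        return self._set_sess

    def _set_sess(self, sess):
        self._sess = sess
        # hasattr plays the role of the `'H' in dir(self)` check in the hunk.
        if hasattr(self, 'H'):
            self.H._sess = sess


session = object()  # stand-in for a tf.Session
optimizer = ToyOptimizer()
optimizer.set_sess(session)

plain = ToyOptimizer(with_hessian=False)
plain.set_sess(session)  # works even when no Hessian operator is attached
```

Propagating the session to `H` keeps the optimizer and its Hessian operator evaluating in the same session after the wrapper swaps it.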