
Commit 710e5e8

Merge pull request #39 from DanRuta/dev
v3.2.0
2 parents 4541adf + 26bd527 commit 710e5e8

39 files changed: +3888 / -238 lines changed

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -1,3 +1,23 @@
# 3.2.0 - IMG data, validation, early stopping
---
#### Network
- Added weight+bias importing and exporting via images, using IMGArrays
- Added validation config to .train(), with interval config
- Added early stopping to validation, with threshold stopping condition
- Added early stopping patience condition
- Added early stopping divergence condition
- Breaking change: the "error" key in training callbacks has been changed to "trainingError"
- Breaking change: Removed the ability to use either 'expected' or 'output' as the data key. Now just 'expected'.

#### NetUtil
- Added splitData function
- Added normalize function

#### NetMath
- Added root mean squared error cost function
- Added momentum weight update function
- Breaking change: Renamed "vanilla update fn" to "vanilla sgd"

# 3.1.0 - Optimizations
---
#### ConvLayer

README.md

Lines changed: 136 additions & 10 deletions
@@ -19,7 +19,12 @@ https://ai.danruta.co.uk/webassembly - Performance comparison between JS and Web
---
There are two different versions of jsNet: WebAssembly, and JavaScript-only. There are demos included for loading both versions, in nodejs, as well as in the browser. The WebAssembly version is a little more complex to load, due to the NetWASM files which are generated by emscripten, containing the compiled code and the glue code to manage the WASM code. The ```NetWASM.js``` file lazy loads the ```NetWASM.wasm``` file from the given path.

The API has been kept the same as the JavaScript only version. Every single value has get/set bindings to the WebAssembly variables, meaning that apart from not being able to freely browse the values in dev tools (need to call them, to see them), you should notice no API difference between the two versions.
The API has been kept the same as the JavaScript only version. Every single value has get/set bindings to the WebAssembly variables, meaning that apart from not being able to freely browse the values in dev tools (you need to call them to see them), you should notice no API difference between the two versions. One thing to note is that when changing primitive WebAssembly array values, eg setting `net.layers[1].neurons[0].weights[0]` to 1, you need to set the entire, modified weights array, not the value at an index. For example, you would do this instead:
```javascript
const weights = net.layers[1].neurons[0].weights
weights[0] = 1
net.layers[1].neurons[0].weights = weights
```

Note that you need to serve files via a server (a basic server is included) to load WebAssembly into a browser.
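As a rough sketch of what loading the WebAssembly version in a browser page might look like (the file locations and the global path variable name below are assumptions, not the confirmed API - see the included demos for the exact loading code):
```javascript
// Sketch only - the global path variable name and file paths are assumptions,
// not the confirmed API. See the included browser demo for the real loading code.
window.jsNetWASMPath = "./dist/NetWASM.wasm" // the path NetWASM.js will lazy load the .wasm file from

const glue = document.createElement("script")
glue.src = "./dist/NetWASM.js" // emscripten glue code + compiled code loader
document.head.appendChild(glue)
```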

@@ -164,11 +169,12 @@ When building a convolutional network, make sure that the number of neurons in t
### Training
---

The data structure must be an object with key ```input``` having an array of numbers, and key ```expected``` or ```output``` holding the expected output of the network. For example, the following are both valid inputs for both training and testing.
The data structure must be an object with key ```input``` having an array of numbers, and key ```expected``` holding the expected output of the network. For example, the following is a valid input for training, validation and testing.
```javascript
{input: [1,0,0.2], expected: [1, 2]}
{input: [1,0,0.2], output: [1, 2]}
```
***Tip**: You can normalize data using the ```NetUtil.normalize()``` function (see the NetUtil section at the bottom)*

You train the network by passing a set of data. The network will log the error and epoch number to the console after each epoch, as well as the time elapsed and average epoch duration.
```javascript
const {training} = mnist.set(800, 200) // Get the training data from the mnist library, linked above
@@ -187,16 +193,70 @@ By default, this is ```1``` and represents how many times the data passed will b
net.train(training, {epochs: 5}) // This will run through the training data 5 times
```
###### Callback
You can also provide a callback in the options parameter, which will get called after each iteration (Maybe updating a graph?). The callback is passed how many iterations have passed, the error, the milliseconds elapsed and the input data for that iteration.
You can also provide a callback in the options parameter, which will get called after each iteration (maybe updating a graph?). The callback is passed how many iterations have passed, the milliseconds elapsed since training started, and either the validation error or the training error along with the input data for that iteration.
```javascript
const doSomeStuff = ({iterations, error, elapsed, input}) => ....
const doSomeStuff = ({iterations, trainingError, validationError, elapsed, input}) => ....
net.train(training, {callback: doSomeStuff})
```
###### Log
You can turn off the logging by passing log: false in the options parameter.
```javascript
net.train(training, {log: false})
```

###### Validation
You can specify an array of data to use as validation. This must have the same structure as the training/test data. The validation config contains three parts: data, interval, and early stopping (see below). The data key holds the validation data. The interval is an integer representing how many training iterations pass between validations of the entire validation set. By default, this is set to 1 epoch, aka the length of the given training data set.
```javascript
// Validate every 5 training iterations
net.train(training, {validation: {
    data: [...],
    interval: 5
}})
// Validate every 3 epochs
net.train(training, {validation: {
    data: [...],
    interval: training.length * 3
}})
```
**Tip**: You can use ```NetUtil.splitData(data)``` to split a large array of data into training, validation, and test arrays, with default or specified ratios. See the NetUtil section at the bottom.

###### Early stopping
When using validation data, you can specify an extra config object, `earlyStopping`, to configure stopping the training early, once a condition has been met, to counter overfitting. By default this is turned off, but once a type is specified via the `type` key, each of its options has a default value.

| Type | What it does | Available Configurations | Default value |
|:-------------:| :-----:| :-----:| :---: |
| threshold | Stops the training the first time the validation error reaches, or goes below, the specified threshold. A final backward pass is made, and weights updated, before stopping. | threshold | 0.01 |
| patience | This backs up the weights and biases of the network when the validation error reaches a new best low. After that, if the validation error is worse a certain number of times in a row, it stops the training and reverts the network weights and biases to the backed up values. The number of times in a row to tolerate is configured via the `patience` hyperparameter. | patience | 20 |
| divergence | This backs up the weights and biases of the network when the validation error reaches a new best low. After that, if the validation error is worse than that best value by at least the specified percentage, it stops the training and reverts the network weights and biases to the backed up values. The percentage is configured via the `percent` hyperparameter. A very jittery validation error is likely to stop the training very early, when using this condition. | percent | 30 |

Examples:
```javascript
// Threshold - Training stops once the validation error reaches 0.2 or below
net.train(training, {validation: {
    data: [...],
    earlyStopping: {
        type: "threshold",
        threshold: 0.2
    }
}})
// Patience - Training stops once the validation error is worse than the best found, 10 times in a row
net.train(training, {validation: {
    data: [...],
    earlyStopping: {
        type: "patience",
        patience: 10
    }
}})
// Divergence - Training stops once the validation error is worse than the best found, by 30%
net.train(training, {validation: {
    data: [...],
    earlyStopping: {
        type: "divergence",
        percent: 30
    }
}})
```

###### Mini Batch Size
You can use mini batch SGD training by specifying a mini batch size to use (changing it from the default, 1). You can set it to true, and it will default to how many classifications there are in the training data.
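As a quick sketch (the ```miniBatchSize``` option key is assumed here, as it is not shown in this excerpt):
```javascript
// Assumed option key: miniBatchSize
net.train(training, {miniBatchSize: 32})   // update weights every 32 samples
net.train(training, {miniBatchSize: true}) // default to the number of classifications in the training data
```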

@@ -235,20 +295,46 @@ net.train(training).then(() => net.test(test, {callback: doSomeStuff}))

### Exporting
---
Weights data is exported as a JSON object.
There are two ways you can manage your data. The built-in way is to use JSON for importing and exporting. If you provide my IMGArrays library (https://github.com/DanRuta/IMGArrays), you can alternatively use images, which are much quicker and easier to use in the browser.

To export weights data as JSON:
```javascript
const data = trainedNet.toJSON()
```

See the IMGArrays library documentation for more details and nodejs instructions, but its integration into jsNet is as follows:
```javascript
const canvas = trainedNet.toIMG(IMGArrays, opts)
IMGArrays.downloadImage(canvas)
```

### Importing
---
Only the weights are exported. You still need to build the net with the same structure and configs, eg activation function.
Only the weights are exported. You still need to build the net with the same structure and configs, eg activation function. Again, data can be imported as either JSON or an image (when using IMGArrays), as above.

When using JSON:
```javascript
const freshNetwork = new Network(...)
freshNetwork.fromJSON(data)
```
If using exported data from before version 2.0.0, just do a find-replace of "neurons" -> "weights" on the exported data and it will work with the new version.

When using IMGArrays:
```javascript
const freshNetwork = new Network(...)
freshNetwork.fromIMG(document.querySelector("img"), IMGArrays, opts)
```

As an example, you can use the image below to load data for the following jsNet configuration, to get a basic model trained on MNIST.
```javascript
const net = new Network({
    layers: [new FCLayer(784), new FCLayer(100), new FCLayer(10)]
})
net.fromIMG(document.querySelector("img"), IMGArrays)
```

<img width="100%" src="fc-784f-100f-10f.png">

### Trained usage
---
Once the network has been trained, tested and imported into your page, you can use it via the ```forward``` function.
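As a minimal sketch (the return format is not shown in this excerpt; ```forward``` is assumed here to return the output layer's activations):
```javascript
// Sketch - assumes forward() returns an array of output activations
const activations = net.forward([1, 0, 0.2])
// e.g. take the index of the highest activation as the predicted class
const predictedClass = activations.indexOf(Math.max(...activations))
```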
@@ -275,7 +361,7 @@ const net = new Network({
    l2: undefined,
    l1: undefined,
    layers: [ /* 3 FCLayers */ ]
    updateFn: "vanillaupdatefn",
    updateFn: "vanillasgd",
    weightsConfig: {
        distribution: "xavieruniform"
    }
@@ -289,7 +375,7 @@ You can check the framework version via Network.version (static).
| Attribute | What it does | Available Configurations | Default value |
|:-------------:| :-----:| :-----:| :---: |
| learningRate | The speed at which the net will learn. | Any number | 0.2 (see below for exceptions) |
| cost | Cost function to use when printing out the net error | crossEntropy, meanSquaredError | meansquarederror |
| cost | Cost function to use when printing out the net error | crossEntropy, meanSquaredError, rootMeanSquaredError | meansquarederror |
| channels | Specifies the number of channels in the input data. EG, 3 for RGB images. Used by convolutional networks. | Any number | undefined |
| conv | (See ConvLayer) An object where the optional keys filterSize, zeroPadding and stride set values for all Conv layers to default to | Object | {} |
| pool | (See PoolLayer) An object where the optional keys size and stride set values for all Pool layers to default to | Object | {} |
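For example, a minimal sketch selecting the newly added cost function via the `cost` key from the table above (the layer sizes are arbitrary):
```javascript
// Use the new root mean squared error cost when logging the net error
const net = new Network({
    layers: [new FCLayer(784), new FCLayer(100), new FCLayer(10)],
    cost: "rootMeanSquaredError"
})
```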
@@ -327,9 +413,10 @@ Learning rate is 0.2 by default, except when using the following configurations:
### Weight update functions
| Attribute | What it does | Available Configurations | Default value |
|:-------------:| :-----:| :-----:| :---: |
| updateFn | The function used for updating the weights/bias. The vanillaupdatefn option just sets the network to update the weights without any changes to learning rate. | vanillaupdatefn, gain, adagrad, RMSProp, adam , adadelta| vanillaupdatefn |
| updateFn | The function used for updating the weights/bias. The vanillasgd option just sets the network to update the weights without any changes to learning rate. | vanillasgd, gain, adagrad, RMSProp, adam, adadelta, momentum | vanillasgd |
| rmsDecay | The decay rate for RMSProp, when used | Any number | 0.99 |
| rho | Momentum for Adadelta, when used | Any number | 0.95 |
| momentum | Momentum for the (sgd) momentum update function. | Any number | 0.9 |

##### Examples
```javascript
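// Sketch using the option keys listed in the table above: SGD with momentum
net = new Network({
    updateFn: "momentum",
    momentum: 0.8 // defaults to 0.9
})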
@@ -543,6 +630,45 @@ net = new Network({
    learningRate: 0.05
})
```
### NetUtil
There is a NetUtil class included, containing some potentially useful functions.

### shuffle(data)
_array_ **data** - The data array to shuffle

This randomly shuffles an array _in place_ (the data is passed by reference, so the array you pass in will be changed).
##### Example
```javascript
const data = [1,2,3,4,5]
NetUtil.shuffle(data)
// data != [1,2,3,4,5]
```

### splitData(data), splitData(data, {training=0.7, validation=0.15, test=0.15})
_array_ **data** - The data array to split
_object_ **configs** - Override values for the split ratios. The values should add up to 1.

This is used for splitting a large array of data into the different parts needed for training.
##### Example
```javascript
const data = [1,2,3,4,5]
const {training, validation, test} = NetUtil.splitData(data)
// or
const {training, validation, test} = NetUtil.splitData(data, {training: 0.5, validation: 0.25, test: 0.25})
```

### normalize(data)
_array_ **data** - The data array to normalize

This normalizes an array of positive and/or negative numbers to a [0-1] range. The data is changed in place, similar to the shuffle function.
##### Example
```javascript
const data = [1,2,3,-5,0.4,2]
const {minValue, maxValue} = NetUtil.normalize(data)
// data == [0.75, 0.875, 1, 0, 0.675, 0.875]
// minValue == -5
// maxValue == 3
```

## Future plans
---

dev/cpp/ConvLayer.cpp

Lines changed: 13 additions & 3 deletions
@@ -208,13 +208,13 @@ void ConvLayer::applyDeltaWeights (void) {
+ net->l2 * filterWeights[f][c][r][v]
+ net->l1 * (filterWeights[f][c][r][v] > 0 ? 1 : -1)) / net->miniBatchSize;

filterWeights[f][c][r][v] = NetMath::vanillaupdatefn(netInstance, filterWeights[f][c][r][v], regularized);
filterWeights[f][c][r][v] = NetMath::vanillasgd(netInstance, filterWeights[f][c][r][v], regularized);

if (net->maxNorm) net->maxNormTotal += filterWeights[f][c][r][v] * filterWeights[f][c][r][v];
}
}
}
biases[f] = NetMath::vanillaupdatefn(netInstance, biases[f], deltaBiases[f]);
biases[f] = NetMath::vanillasgd(netInstance, biases[f], deltaBiases[f]);
}
break;
case 1: // gain
@@ -318,4 +318,14 @@ void ConvLayer::applyDeltaWeights (void) {
net->maxNormTotal = sqrt(net->maxNormTotal);
NetMath::maxNorm(netInstance);
}
}

// Snapshot the current biases and filter weights, so early stopping can restore them later
void ConvLayer::backUpValidation (void) {
    validationBiases = biases;
    validationFilterWeights = filterWeights;
}

// Restore the biases and filter weights saved at the last best validation error
void ConvLayer::restoreValidation (void) {
    biases = validationBiases;
    filterWeights = validationFilterWeights;
}

dev/cpp/FCLayer.cpp

Lines changed: 32 additions & 3 deletions
@@ -188,11 +188,11 @@ void FCLayer::applyDeltaWeights (void) {
+ net->l2 * weights[n][dw]
+ net->l1 * (weights[n][dw] > 0 ? 1 : -1)) / net->miniBatchSize;

weights[n][dw] = NetMath::vanillaupdatefn(netInstance, weights[n][dw], regularized);
weights[n][dw] = NetMath::vanillasgd(netInstance, weights[n][dw], regularized);

if (net->maxNorm) net->maxNormTotal += weights[n][dw] * weights[n][dw];
}
biases[n] = NetMath::vanillaupdatefn(netInstance, biases[n], deltaBiases[n]);
biases[n] = NetMath::vanillasgd(netInstance, biases[n], deltaBiases[n]);
}
break;
case 1: // gain
@@ -276,4 +276,33 @@ void FCLayer::applyDeltaWeights (void) {
net->maxNormTotal = sqrt(net->maxNormTotal);
NetMath::maxNorm(netInstance);
}
}

// Copy the current biases and weights into backup vectors, so early stopping can restore them later
void FCLayer::backUpValidation (void) {

    validationBiases = {};
    validationWeights = {};

    for (int n=0; n<neurons.size(); n++) {
        validationBiases.push_back(biases[n]);

        std::vector<double> neuron;

        for (int w=0; w<weights[n].size(); w++) {
            neuron.push_back(weights[n][w]);
        }

        validationWeights.push_back(neuron);
    }
}

// Write the backed up biases and weights back into the layer
void FCLayer::restoreValidation (void) {

    for (int n=0; n<neurons.size(); n++) {
        biases[n] = validationBiases[n];

        for (int w=0; w<weights[n].size(); w++) {
            weights[n][w] = validationWeights[n][w];
        }
    }
}

dev/cpp/Filter.cpp

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ void Filter::init (int netInstance, int channels, int filterSize) {
case 2: // adagrad
case 3: // rmsprop
case 5: // adadelta
case 6: // momentum
biasCache = 0;
weightsCache = NetUtil::createVolume<double>(channels, filterSize, filterSize, 0);
