Implementation of a neural network from first principles. This is a basic network with a single output node, intended for binary classification. The network accepts any number of input features, and both the number of hidden layers and the number of nodes in each hidden layer are configurable. Optimisation is implemented with mini-batch gradient descent. A sigmoid activation function is applied to the hidden layers as well as the output layer; since the task is binary classification, the sigmoid output of the final layer is interpreted as a probability.
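
The sketch below illustrates the architecture described above: NumPy only, a sigmoid on every layer, and mini-batch gradient descent. It is an illustration under stated assumptions, not the repository's actual code: the class name `SimpleNN` and the `fit`/`predict_proba` methods are hypothetical, and the output-layer gradient assumes a binary cross-entropy loss, which the description does not specify.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SimpleNN:
    def __init__(self, layer_sizes, seed=0):
        # layer_sizes, e.g. [2, 4, 1]: input features, hidden layer(s), one output node.
        rng = np.random.default_rng(seed)
        self.weights = [rng.normal(scale=0.5, size=(m, n))
                        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.biases = [np.zeros((1, n)) for n in layer_sizes[1:]]

    def _forward(self, X):
        # Keep every layer's activation for backpropagation;
        # sigmoid is applied to hidden layers and the output layer alike.
        activations = [X]
        for W, b in zip(self.weights, self.biases):
            activations.append(sigmoid(activations[-1] @ W + b))
        return activations

    def predict_proba(self, X):
        # Sigmoid output of the final layer, read as P(y = 1 | x).
        return self._forward(X)[-1]

    def fit(self, X, y, lr=1.0, epochs=2000, batch_size=32, seed=0):
        rng = np.random.default_rng(seed)
        y = y.reshape(-1, 1).astype(float)
        for _ in range(epochs):
            # Shuffle, then step through mini-batches.
            order = rng.permutation(len(X))
            for start in range(0, len(X), batch_size):
                batch = order[start:start + batch_size]
                acts = self._forward(X[batch])
                # Output delta assuming binary cross-entropy loss with a
                # sigmoid output: dL/dz simplifies to (prediction - target).
                delta = acts[-1] - y[batch]
                for l in range(len(self.weights) - 1, -1, -1):
                    grad_W = acts[l].T @ delta / len(batch)
                    grad_b = delta.mean(axis=0, keepdims=True)
                    if l > 0:
                        # Chain rule through the previous layer's sigmoid,
                        # using the pre-update weights.
                        delta = (delta @ self.weights[l].T) * acts[l] * (1.0 - acts[l])
                    self.weights[l] -= lr * grad_W
                    self.biases[l] -= lr * grad_b

if __name__ == "__main__":
    # Toy check on XOR, which needs at least one hidden layer.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    net = SimpleNN([2, 4, 1])
    net.fit(X, y, lr=1.0, epochs=5000, batch_size=2)
    print(net.predict_proba(X).round(3))
```

A prediction is classified as positive when `predict_proba` exceeds a chosen threshold (commonly 0.5), which is what it means to interpret the final sigmoid output in terms of probabilities.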