Welcome to Kur! You've found the future of deep learning!
- Install Kur easily with
pip install kur
. - Design, train, and evaluate models without ever needing to code.
- Describe your model with easily understandable concepts.
- Quickly explore better versions of your model with the power of the Jinja2 templating engine.
- Supports Theano, TensorFlow, and PyTorch, and supports multi-GPU out-of-the-box.
- COMING SOON: Share your models with the community, making it incredibly easy to collaborate on sophisticated models.
Go ahead and give it a whirl: Get the Code and then jump into the Examples! Then build your own model in our Tutorial. Remember to check out our homepage for complete documentation and the newest news.
Like us? Share!
Kur is a system for quickly building and applying state-of-the-art deep learning models to new and exciting problems. Kur was designed to appeal to the entire machine learning community, from novices to veterans. It uses specification files that are simple to read and author, meaning that you can get started building sophisticated models without ever needing to code. Even so, Kur exposes a friendly and extensible API to support advanced deep learning architectures or workflows. Excited? Jump straight into the Examples.
Kur is really easy to install! You can pick either one of these two options for installing Kur.
NOTE: Kur requires Python 3.4 or greater. Take a look at our installation guide for step-by-step instructions for installing Kur and setting up a virtual environment.
If you know what you are doing, then this is easy:
pip install kur
Just check it out and run the setup script:
git clone https://github.com/deepgram/kur
cd kur
pip install .
Quick Start: Or, if you already have Python 3 installed, then here's a few quick-start lines to get you training your first model:
Quick Start For Using pip:
pip install virtualenv # Make sure virtualenv is present
virtualenv -p $(which python3) ~/kur-env # Create a Python 3 environment for Kur
. ~/kur-env/bin/activate # Activate the Kur environment
pip install kur # Install Kur
kur --version # Check that everything works
git clone https://github.com/deepgram/kur # Get the examples
cd kur/examples # Change directories
kur train mnist.yml # Start training!
Quick Start For Using git:
pip install virtualenv # Make sure virtualenv is present
virtualenv -p $(which python3) ~/kur-env # Create a Python 3 environment for Kur
. ~/kur-env/bin/activate # Activate the Kur environment
git clone https://github.com/deepgram/kur # Check out the latest code
cd kur # Change directories
pip install . # Install Kur
kur --version # Check that everything works
cd examples # Change directories
kur train mnist.yml # Start training!
If everything has gone well, you shoud be able to use Kur:
kur --version
You'll typically be using Kur in commands like kur train model.yml
or kur
test model.yml
. You'll see these in the Examples, which is
where you should head to next!
If you run into any problems installing or using Kur, please check out our troubleshooting page for lots of useful help. And if you want more detailed installation instructions, with help on setting up your environment, before sure to see our installation page.
Let's look at some examples of how fun and easy Kur makes state-of-the-art deep learning.
Let's jump right in and see how awesome Kur is! The first example we'll look at is Yann LeCun's MNIST dataset. This is a dataset of 28x28 pixel images of individual handwritten digits between 0 and 9. The goal of our model will be to perform image recognition, tagging the image with the most likely digit it represents.
NOTE: As with most command line examples, lines preceded by $
are lines
that you are supposed to type (followed by the ENTER
key). Lines without an
initial $
are lines which are printed to the screen (you don't type them).
First, you need to Get the Code! If you installed via
pip
, you'll need to checkout the examples
directory from the
repository, like this:
git clone https://github.com/deepgram/kur
cd kur/examples
If you installed via git
, then you alreay have the examples
directory
locally, so just move into the example directory:
$ cd examples
Now let's train the MNIST model. This will download the data directly from the web, and then start training for 10 epochs.
$ kur train mnist.yml
Downloading: 100%|█████████████████████████████████| 9.91M/9.91M [03:44<00:00, 44.2Kbytes/s]
Downloading: 100%|█████████████████████████████████| 28.9K/28.9K [00:00<00:00, 66.1Kbytes/s]
Downloading: 100%|█████████████████████████████████| 1.65M/1.65M [00:31<00:00, 52.6Kbytes/s]
Downloading: 100%|█████████████████████████████████| 4.54K/4.54K [00:00<00:00, 19.8Kbytes/s]
Epoch 1/10, loss=1.524: 100%|███████████████████████| 480/480 [00:02<00:00, 254.97samples/s]
Validating, loss=0.829: 100%|█████████████████████| 3200/3200 [00:03<00:00, 889.91samples/s]
Epoch 2/10, loss=0.628: 100%|███████████████████████| 480/480 [00:02<00:00, 228.25samples/s]
Validating, loss=0.533: 100%|████████████████████| 3200/3200 [00:03<00:00, 1046.12samples/s]
Epoch 3/10, loss=0.547: 100%|███████████████████████| 480/480 [00:02<00:00, 185.77samples/s]
Validating, loss=0.491: 100%|████████████████████| 3200/3200 [00:03<00:00, 1030.57samples/s]
Epoch 4/10, loss=0.488: 100%|███████████████████████| 480/480 [00:02<00:00, 225.42samples/s]
Validating, loss=0.443: 100%|████████████████████| 3200/3200 [00:03<00:00, 1046.23samples/s]
Epoch 5/10, loss=0.464: 100%|███████████████████████| 480/480 [00:03<00:00, 115.17samples/s]
Validating, loss=0.403: 100%|█████████████████████| 3200/3200 [00:04<00:00, 799.46samples/s]
Epoch 6/10, loss=0.486: 100%|███████████████████████| 480/480 [00:03<00:00, 183.11samples/s]
Validating, loss=0.400: 100%|████████████████████| 3200/3200 [00:02<00:00, 1134.17samples/s]
Epoch 7/10, loss=0.369: 100%|███████████████████████| 480/480 [00:02<00:00, 214.10samples/s]
Validating, loss=0.366: 100%|█████████████████████| 3200/3200 [00:04<00:00, 735.61samples/s]
Epoch 8/10, loss=0.353: 100%|███████████████████████| 480/480 [00:03<00:00, 204.33samples/s]
Validating, loss=0.351: 100%|████████████████████| 3200/3200 [00:02<00:00, 1147.05samples/s]
Epoch 9/10, loss=0.399: 100%|███████████████████████| 480/480 [00:02<00:00, 219.17samples/s]
Validating, loss=0.343: 100%|████████████████████| 3200/3200 [00:02<00:00, 1149.07samples/s]
Epoch 10/10, loss=0.307: 100%|██████████████████████| 480/480 [00:02<00:00, 220.97samples/s]
Validating, loss=0.324: 100%|████████████████████| 3200/3200 [00:02<00:00, 1142.78samples/s]
What just happened? Kur downloaded the MNIST dataset from LeCun's website, and then trained a model for ten epochs. Awesome!
Now let's see how well our model actually performs:
$ kur evaluate mnist.yml
Evaluating: 100%|██████████████████████████████| 10000/10000 [00:06<00:00, 1537.74samples/s]
LABEL CORRECT TOTAL ACCURACY
0 969 980 98.9%
1 1118 1135 98.5%
2 910 1032 88.2%
3 926 1010 91.7%
4 923 982 94.0%
5 735 892 82.4%
6 871 958 90.9%
7 884 1028 86.0%
8 818 974 84.0%
9 868 1009 86.0%
ALL 9022 10000 90.2%
Wow! Across the board, we already have 90% accuracy for recognizing handwritten digits, and we only used 0.8% of the training set! That's how awesome Kur is.
Excited yet? Read on!
NOTE: Clever readers will notice that each training epoch only used 480 training samples. But MNIST provides 60,000 training samples total, so what gives? Simple: lots of us are running this code on consumer hardware; in fact, I'm running this example on my tiny ultrabook on an Intel Core m7 CPU. As you'll see in Under the Hood, I truncate the training process to only train on 10 batches of 32 samples each, just to make the training loop finish in a reasonable amount of time. It's not cheating: you still get 90% accuracy! But if you have awesome hardware, or just want to see how good your accuracy can get, then by all means read on and we'll show you how to modify that.
So what exactly is going on here? Let's take a look at the MNIST example specification file:
train:
data:
- mnist:
images:
url: "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
labels:
url: "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
model:
- input: images
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
include: mnist-defaults.yml
This is just plain, old YAML, a markup language meant to be easy for humans to interpret (for a good overview of YAML language features, look at the Ansible overview).
There's a section to put the data. That's this:
train:
data:
- mnist:
images:
url: "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
labels:
url: "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
And then there's a spot to define your model:
model:
- input: images
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
And there is an "include" part that just contains some default settings (advanced users might want to tweak these---don't worry, it's still simple):
include: mnist-defaults.yml
Very simple! Kur downloaded our data directly from LeCun's website for us, that's easy. But what goes into in a Kur model? Just a nice, gentle list of things you want your deep learning model to do. Let's break it down:
- We have an
input
calledimages
(yep, it's the sameimages
from ourtrain
section). - We pass the input to a
convolution
layer. - We add a regularized linear unit ("ReLU") activation.
- We collapse (
flatten
) the high-dimensional output of a convolution into a nice, flat, 1-dimensional shape appropriate for sending into the fully-connected layers. - We add a fully-connected (
dense
) layer with 10 outputs. - We add a softmax activation (appropriate for classification tasks like MNIST),
and mark it as producing labels (
name: labels
).
And that's it! It's pretty naïve: one convolution + activation + fully-connected + activation. But it works: we got 90% accuracy after only showing it a small subset of the training set.
But let's think about make it more complicated. What if we want two
convolutional layers instead? Easy! Just add another convolution
section to
the model. We'll also add in another non-linearity (ReLU activation) between
the two convolutions.
model:
- input: images
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
We can also add more dense (fully-connected) layers. You probably want them separated by activation layers, too. So if we add a 32-node fully-connected layer to our model, it now looks like this:
model:
- input: images
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- flatten:
- dense: 32
- activation: relu
- dense: 10
- activation: softmax
name: labels
Let's give it a try! Save your changes, a just run the same kur train
mnist.yml
and kur evaluate mnist.yml
commands from before.
NOTE: A more complex model will likely need more data. So be sure to look at the tip in More Advanced Things to train on more of the data set.
If you want to know more, the YAML specification that Kur uses is described in greater detail in our Using Kur page.
The one line in the mnist.yml
specification that we didn't cover is the
include: mnist-defaults.yml
line. This is just a convenient way for us to
separate out the default behavior of the MNIST example.
If you tweak this file, probably the big thing you want to remove is the
num_batches: 10
line, which is what limits training to just the first 10
batches every epoch. Just delete the line or comment it out, and Kur will train
on the whole dataset.
90% is pretty good! But can we do better? Absolutely! Let's see how.
We need to build a more expressive, deeper model. We will use more convolutional layers, with occassional pooling layers.
model:
- input: images
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- convolution:
kernels: 96
size: [3, 3]
- activation: relu
- pool: [3, 3]
- convolution:
kernels: 96
size: [3, 3]
- activation: relu
- flatten:
- dense: [64, 10]
- activation: softmax
name: labels
So we have three convolutions with a 3-by-3 pooling layer in the middle, and
two fully-connected layers. Try training this model: kur train mnist.yml
.
Then evaluate it to see how it does: kur eval mnist.yml
. We got better than
95% by training on only 0.8% of the training set.
What happens if we give it more data? Like we mentioned above, we can
adjust the amount of data we give Kur by twiddling the num_batches
entry in
the train
section of mnist-defaults.yml
. Let's try using 5% of the
dataset. To do this, we'll set num_batches: 94
(because 5% of 60,000 is
3000, and for the default batch size of 32, this comes out to about 94
batches). Now try training and evaluating again. We got almost 98%!
Don't stop now, let's train on the whole thing (just remove the num_batches
line altogether, or set num_batches: null
). Still training only 10 epochs,
we got 98.6%. Wow. Let's compare this to state of the art, which Yann LeCun
tracks on the MNIST website. It looks
like the best error rate also uses convolutions and achieved a 0.23% error rate
(so 99.77% accuracy). With just a couple tweaks, we are already only a percent
away from the world's best. Kur rocks.
Okay, MNIST was pretty cool, but Kur can do much, much more. Imagine if you wanted to have an arbitrary number of convolution layers. Imagine if each convolution should have a different number of kernels. Imagine if you truly want flexibility. You've come to the right place.
Kur uses an engine to determine how do variable substitution. Jinja2 is the default templating engine, and it is very powerful and extensible. Let's see how to use it!
Let's look at the CIFAR-10 dataset. This is a image classification dataset of small 32 by 32 pixel color (RGB) images, each with one of ten classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). You might decide to start with a very similar model to the MNIST example:
model:
- input: images
- convolution:
kernels: 64
size: [3, 3]
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
We will start with a simple modification: let's make the convolution size a variable, so we can easily change it later. We can do it like this:
settings:
cnn:
size: [3, 3]
model:
- input: images
- convolution:
kernels: 64
size: "{{ cnn.size }}"
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
Okay, what just happened? First, we added a settings:
section. This section
is the appropriate place to declare variables, settings, and hyperparameters
that will be used by the model (or for training, evaluation, etc.). We declared
a variable named cnn
with a nested size
variable. In Python, this would
be equivalent to a dictionary: {"cnn": {"size": [3, 3]}}
.
Then we used the variable in the model's convolution layer: size: "{{
cnn.size }}"
. This is standard Jinja2 grammar. The double-brackets indicate
that variable substitution should take place (without the brackets, we would
accidently assign size
to the literal string "cnn.size", which doesn't make
sense). The variable we grab is cnn.size
, corresponding to the variables we
added in the settings
section.
Cool! So we can use variables now. But how does that help us? It seems like we just made it more complicated. Well, let's imagine if we added another convolution layer. We already know how to add extra convolutions by just adding another convolution block (and usually you want another activation: relu layer, too). So this would look like:
settings:
cnn:
size: [3, 3]
model:
- input: images
- convolution:
kernels: 64
size: "{{ cnn.size }}"
- activation: relu
- convolution:
kernels: 64
size: "{{ cnn.size }}"
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
Ah! So now we can see why variablizing the convolution size was nice: if we want to play with a model that uses different size kernels, we only need to edit one line instead of two.
But there are still two problems we might encounter:
- What if we wanted to try out lots of models with different numbers of convolutions?
- What if we wanted to use different
size
orkernel
values in each convolution?
Kur can do it!
Let's address the first problem: what if we want to make the number of convolutions? Kur supports many "meta-layers" that it calls "operators." A very simple operator is the classic "for" loop. This allows us to add many convolution + activation layers at once. It looks like this:
settings:
cnn:
size: [3, 3]
model:
- input: images
- for:
range: 2
iterate:
- convolution:
kernels: 64
size: "{{ cnn.size }}"
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
This is equivalent to the version without the "for" loop. The for:
loop
tells us to do everything in the iterate:
section twice. (Why twice?
Because range: 2
.) And of course, we can variabilize the number of
iterations like this:
settings:
cnn:
size: [3, 3]
layers: 2
model:
- input: images
- for:
range: "{{ cnn.layers }}"
iterate:
- convolution:
kernels: 64
size: "{{ cnn.size }}"
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
Think about this for a minute. Does it make sense? It should. The model looks like this:
- An
input
layer of images. - A number of
convolution
andactivation
layers. How many?cnn.layers
, so 2. - The rest of the model is as expected: a dense operation followed by an activation.
So we solved the problem of allowing for a variable number of convolutions. But
what if each convolution should use a different number of kernels (or sizes,
etc.)? Well, Kur can happily handle this, too. In fact, the for:
loop
already does most of the work. Every for:
loop creates its own "local"
variable to let you know which iteration it is on. The default name for this
variable is index
. So if we want to use a different number of kernels for
each convolution, we can do this:
settings:
cnn:
size: [3, 3]
kernels: [64, 32]
layers: 2
model:
- input: images
- for:
range: "{{ cnn.layers }}"
iterate:
- convolution:
kernels: "{{ cnn.kernels[index] }}"
size: "{{ cnn.size }}"
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
Again, this is just Jinja2 substitution: we are asking for the index
-th
element of the cnn.kernels
list. Each iteration of the for:
loop
therefore grabs a different value for kernels:
. Cool, huh?
But we can do one better.
The annoying thing about our current model is that nothing forces the layers
value to be the same as the length of the kernels
variable. If you make
really long (like, length seventeen) but leave layers
at two, you probably
made a mistake. (Why did you put in seventeen layers but then only use the first
two in the loop?) What you really want is to make sure that layers
is set to
the length of the kernels
list. Or put another way, you want add as many
convolutions as you have kernels in the list.
Jinja2 supports a concept called "filters," which are basically functions that you can apply to objects. You can even define your own filters. But what we want right now is a way to get the length of a list. It's easy and it looks like this:
settings:
cnn:
size: [3, 3]
kernels: [64, 32]
model:
- input: images
- for:
range: "{{ cnn.kernels|length }}"
iterate:
- convolution:
kernels: "{{ cnn.kernels[index] }}"
size: "{{ cnn.size }}"
- activation: relu
- flatten:
- dense: 10
- activation: softmax
name: labels
You'll notice that the layers
variable is gone, and we have this funky
|length
thing in the "for" loop's range
. This is standard Jinja2: the
length
filter returns the length of a list. So now we are asking the "for"
loop to iterate as many times as we have another kernel size.
This is really cool if you think about it. You want to add another convolution
to the network? All you do is add it's size to the kernels
list. And
look! You're model is now more general, more reuseable. You could have used
the same model for MNIST! Or CIFAR! Or many different applications.
This is the heart of the Kur philosophy: you should describe your model once and simply. The specification describes* your model: a bunch of convolutions and then a fully-connected layer. You can specify the details (how many convolutions, their parameters, etc.) elsewhere. The model should stay elegant.
NOTE: Of course, it isn't always easy to write reusable models. And the learning curve can get in the way. When we say that models should be "simple," we don't mean that you don't need to think about it. We mean that it should be simple to use, simple to modify, and simple to share. A more general model is elegant: making changes to it is easy (you only modify the settings). And this makes it easier to reuse in new contexts or to share with the community. Simplicity is power.
Great, we now have a simple, but powerful and general model. Let's train it. As
before, you'll need to cd examples
first.
kur train cifar.yml
Again, evaluation is just as simple:
kur evaluate cifar.yml
The cifar.yml
specification file is more complicated than the MNIST one,
mostly to expose you to some more knobs you can tweak. For example, you'll see
these lines in the train
section:
provider:
batch_size: 32
num_batches: 2
As in the MNIST case, num_batches
tells Kur to only train on that many
batches of data each epoch (mostly so that if you don't have a nice GPU, the
example still finishes in a reasonable amount of time). The batch_size
value
indicates the number of training samples that should be used in each batch.
The train
section also has a log: cifar-log
line. This tells Kur to
save a log file to cifar-log
(in the current working directory). This log
contains lots of interesting information about current training loss, batch
loss, and the number of epochs. By default, they are binary-encoded files, but
you can load them using the Kur API (in Python 3):
from kur.loggers import BinaryLogger
data = BinaryLogger.load_column(LOG_PATH, STATISTIC)
where LOG_PATH
is the path to the log file (e.g., cifar-log
) and
STATISTIC
is one of the logged statistics. data
will be a Numpy array. To find available statistics, just list the
available files in the LOG_PATH
, like this:
$ ls cifar-log
training_loss_labels
training_loss_total
validation_loss_labels
validation_loss_total
For an example of using this log data, see our Tutorial.
Another difference from the MNIST examples is that there are more files
referring to weights in the CIFAR specification. For example, in the
validate
section there is:
weights: cifar.best.valid.w
This tells Kur to save the best models weights (corresponding to the lowest
loss on the validation set) to cifar.best.valid.w
. Similarly, in the
train
section there is this:
weights:
initial: cifar.best.valid.w
save_best: cifar.best.train.w
last: cifar.last.w
The initial
key tells Kur to try and load cifar.best.valid.w
(the best
weights with respect to the validation loss) at the beginning of training. If
this file doesn't exist, nothing happens. This means that if you run the
training cycle many times (with many calls to kur train cifar.yml
), you
always "restart" from the best model weights.
We are also saving the best weights (with respect to the training loss) to
cifar.best.train.w
. The most recent weights are saved to cifar.last.w
.
NOTE: The weights depend on the model architecture. Say you you train CIFAR
and produce cifar.best.valid.w
. Then you tweak the model in the
specification file. If you try to resume training (kur train cifar.yml
),
Kur will try to load cifar.best.valid.w
. But the weights many not fit the
new architecture! So, to be safe, you should always delete (or backup) your
weight files before trying to train a fresh, tweaked model. In a production
environment, you probably want to have different sub-directories for each
variation/tweak to the model so that you never run into this problem.
The CIFAR-10 example also explicitly specifies an optimizer in the train
section:
optimizer:
name: adam
learning_rate: 0.001
The optimizer function is set in the name
field and all other parameters
(such as learning_rate
) are defined in the other fields. You can safely
change the optimizer without breaking backwards-compatibility with older weight
files.