Image style transfer implemented with basic neural network components, following the UCAS course Intelligent Computing Systems.
Basic NN Components layers1
The partial derivative of
The loss function of softmax layer is defined as:
Considering batch processing:
The partial derivative of
Considering batch processing:
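As a minimal sketch of the batched softmax loss and its gradient, assuming the standard softmax cross-entropy formulation averaged over the batch (the function name `softmax_cross_entropy` and the argument layout are illustrative, not taken from the project code):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Return (mean loss, gradient w.r.t. logits) for a batch.

    logits: float array of shape (batch, classes)
    labels: int array of shape (batch,) holding class indices
    """
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    prob = exp / exp.sum(axis=1, keepdims=True)

    batch = logits.shape[0]
    # Cross-entropy of the true class, averaged over the batch.
    loss = -np.log(prob[np.arange(batch), labels]).mean()

    # Gradient of the mean loss: (prob - one_hot(labels)) / batch.
    grad = prob.copy()
    grad[np.arange(batch), labels] -= 1.0
    grad /= batch
    return loss, grad
```

With all-zero logits over three classes, every class gets probability 1/3, so the loss equals `log(3)` regardless of the labels.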
Demo1 MNIST Classification
    import os
    import struct
    import numpy as np

    MNIST_DIR = "../mnist_data"
    TRAIN_DATA = "train-images-idx3-ubyte"
    TRAIN_LABEL = "train-labels-idx1-ubyte"
    TEST_DATA = "t10k-images-idx3-ubyte"
    TEST_LABEL = "t10k-labels-idx1-ubyte"

    def load_mnist(file_dir, is_images=True):
        # Read the whole binary file into memory
        with open(file_dir, 'rb') as bin_file:
            bin_data = bin_file.read()
        # Parse the file header (big-endian 32-bit integers)
        if is_images:
            # Image file: magic number, image count, rows, columns
            fmt_header = '>iiii'
            magic, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, 0)
        else:
            # Label file: magic number, label count
            fmt_header = '>ii'
            magic, num_images = struct.unpack_from(fmt_header, bin_data, 0)
            num_rows, num_cols = 1, 1
        # The payload holds one unsigned byte per pixel (or per label)
        data_size = num_images * num_rows * num_cols
        mat_data = struct.unpack_from('>' + str(data_size) + 'B', bin_data, struct.calcsize(fmt_header))
        mat_data = np.reshape(mat_data, [num_images, num_rows * num_cols])
        print('Load images from %s, number: %d, data shape: %s' % (file_dir, num_images, str(mat_data.shape)))
        return mat_data

    # Assumes the four files are stored under MNIST_DIR
    train_images = load_mnist(os.path.join(MNIST_DIR, TRAIN_DATA), True)
    train_labels = load_mnist(os.path.join(MNIST_DIR, TRAIN_LABEL), False)
    test_images = load_mnist(os.path.join(MNIST_DIR, TEST_DATA), True)
    test_labels = load_mnist(os.path.join(MNIST_DIR, TEST_LABEL), False)
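To illustrate the header parsing without the real dataset, the sketch below builds a tiny in-memory image file and reads it back. The 2x3 "images" are synthetic, and `np.frombuffer` is used here as an alternative to `struct.unpack_from` for the payload; that is a choice made for this example, not what the loader above does.

```python
import struct
import numpy as np

# Build a synthetic IDX image blob: 2 images of 2x3 pixels.
# 2051 is the magic number of MNIST image files (label files use 2049).
header = struct.pack('>iiii', 2051, 2, 2, 3)
pixels = bytes(range(12))  # pixel values 0..11
blob = header + pixels

# Parse the header exactly as a loader would: four big-endian int32 values.
magic, num_images, num_rows, num_cols = struct.unpack_from('>iiii', blob, 0)
offset = struct.calcsize('>iiii')  # the header occupies 16 bytes

# Interpret the remaining bytes as uint8 and flatten each image to one row.
data = np.frombuffer(blob, dtype=np.uint8, offset=offset)
images = data.reshape(num_images, num_rows * num_cols)
print(magic, images.shape)  # 2051 (2, 6)
```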
Basic CNN Components layers2
In this section, we use VGG19 instead of VGG16.
Name | Type | Kernel Size | Stride | Padding Size | Cin | Cout | K (output height/width) |
--- | --- | --- | --- | --- | --- | --- | --- |
conv1_1 | Conv | 3 | 1 | 1 | 3 | 64 | 224 |
conv1_2 | Conv | 3 | 1 | 1 | 64 | 64 | 224 |
pool1 | MaxPool | 2 | 2 | - | 64 | 64 | 112 |
conv2_1 | Conv | 3 | 1 | 1 | 64 | 128 | 112 |
conv2_2 | Conv | 3 | 1 | 1 | 128 | 128 | 112 |
pool2 | MaxPool | 2 | 2 | - | 128 | 128 | 56 |
conv3_1 | Conv | 3 | 1 | 1 | 128 | 256 | 56 |
conv3_2 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
conv3_3 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
conv3_4 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
pool3 | MaxPool | 2 | 2 | - | 256 | 256 | 28 |
conv4_1 | Conv | 3 | 1 | 1 | 256 | 512 | 28 |
conv4_2 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
conv4_3 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
conv4_4 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
pool4 | MaxPool | 2 | 2 | - | 512 | 512 | 14 |
conv5_1 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
conv5_2 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
conv5_3 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
conv5_4 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
pool5 | MaxPool | 2 | 2 | - | 512 | 512 | 7 |
fc6 | FCL | - | - | - | 512*7*7 | 4096 | 1 |
fc7 | FCL | - | - | - | 4096 | 4096 | 1 |
fc8 | FCL | - | - | - | 4096 | 1000 | 1 |
softmax | Softmax | - | - | - | - | - | - |
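The spatial sizes in the last column can be checked with the usual output-size formula, `(size + 2 * pad - kernel) // stride + 1`. The sketch below walks the five VGG19 stages from the table, assuming a 224x224 input; `conv_out_size` is a helper name introduced only for this example.

```python
def conv_out_size(size, kernel, stride, pad):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = 224
sizes_after_pool = []
# Number of 3x3 convolutions in each of the five stages of VGG19.
for num_convs in (2, 2, 4, 4, 4):
    for _ in range(num_convs):
        size = conv_out_size(size, kernel=3, stride=1, pad=1)  # size unchanged
    size = conv_out_size(size, kernel=2, stride=2, pad=0)      # size halved
    sizes_after_pool.append(size)

print(sizes_after_pool)   # [112, 56, 28, 14, 7]
print(512 * size * size)  # 25088, the input width of fc6
```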
Convolution Kernel
The input feature map
To obtain expected output in each layer, after image padding:
$$\boldsymbol{X}_{pad}(n,c_{in},h,w)=\begin{cases} \boldsymbol{X}(n,c_{in},h-p,w-p) & p\le h<p+H_{in},\ p\le w<p+W_{in}\\ 0 & \text{otherwise} \end{cases}$$
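A minimal NumPy sketch of this zero padding, assuming zero-based indices and the `(N, C, H, W)` layout used in the formula; the function name `pad_input` is illustrative:

```python
import numpy as np

def pad_input(x, p):
    """Zero-pad the two spatial axes of an (N, C, H, W) array by p on each side."""
    n, c, h_in, w_in = x.shape
    x_pad = np.zeros((n, c, h_in + 2 * p, w_in + 2 * p), dtype=x.dtype)
    # Copy the original values into the centre; the border stays zero.
    x_pad[:, :, p:p + h_in, p:p + w_in] = x
    return x_pad
```

The same result can be obtained with `np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))`; the explicit version is shown because it mirrors the case distinction in the formula.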
Apply convolution operation to
$$\nabla_{\boldsymbol{W}(c_{in},k_h,k_w,c_{out})}L=\sum\limits_{n,h,w}\nabla_{\boldsymbol{Y}(n,c_{out},h,w)}L\,\boldsymbol{X}_{pad}(n,c_{in},hs+k_h,ws+k_w)$$
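The weight gradient above can be sketched in NumPy as follows. This is a naive, loop-based illustration under assumptions: a square kernel, the weight layout `(c_in, k_h, k_w, c_out)` from the formula, and no bias term. The names `conv_forward` and `conv_weight_grad` are hypothetical.

```python
import numpy as np

def conv_forward(x_pad, weight, stride):
    """Naive convolution: x_pad is (N, C_in, H, W), weight is (C_in, K, K, C_out)."""
    n, c_in, h_pad, w_pad = x_pad.shape
    _, k, _, c_out = weight.shape
    h_out = (h_pad - k) // stride + 1
    w_out = (w_pad - k) // stride + 1
    y = np.zeros((n, c_out, h_out, w_out))
    for h in range(h_out):
        for w in range(w_out):
            hs, ws = h * stride, w * stride
            patch = x_pad[:, :, hs:hs + k, ws:ws + k]  # (N, C_in, K, K)
            # Contract over c_in, k_h, k_w to get one (N, C_out) slice.
            y[:, :, h, w] = np.tensordot(patch, weight, axes=([1, 2, 3], [0, 1, 2]))
    return y

def conv_weight_grad(x_pad, grad_y, k, stride):
    """Gradient of the loss w.r.t. the weights, given grad_y of shape (N, C_out, H_out, W_out)."""
    n, c_in, _, _ = x_pad.shape
    _, c_out, h_out, w_out = grad_y.shape
    grad_w = np.zeros((c_in, k, k, c_out))
    for h in range(h_out):
        for w in range(w_out):
            hs, ws = h * stride, w * stride
            patch = x_pad[:, :, hs:hs + k, ws:ws + k]  # (N, C_in, K, K)
            # Sum over the batch axis: (N, C_in, K, K) x (N, C_out) -> (C_in, K, K, C_out).
            grad_w += np.tensordot(patch, grad_y[:, :, h, w], axes=([0], [0]))
    return grad_w
```

Because the output is linear in the weights, a finite-difference check on a single weight entry should agree with `conv_weight_grad` up to floating-point error.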
The input of max pooling
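A minimal sketch of the max pooling forward pass, assuming an `(N, C, H, W)` input, a square window of size `k`, and no padding; the function name `max_pool_forward` is illustrative:

```python
import numpy as np

def max_pool_forward(x, k, stride):
    """Take the maximum over each k x k window of an (N, C, H, W) array."""
    n, c, h_in, w_in = x.shape
    h_out = (h_in - k) // stride + 1
    w_out = (w_in - k) // stride + 1
    y = np.zeros((n, c, h_out, w_out), dtype=x.dtype)
    for h in range(h_out):
        for w in range(w_out):
            hs, ws = h * stride, w * stride
            # Reduce the two spatial axes of the window.
            y[:, :, h, w] = x[:, :, hs:hs + k, ws:ws + k].max(axis=(2, 3))
    return y

x = np.arange(16).reshape(1, 1, 4, 4)
print(max_pool_forward(x, k=2, stride=2)[0, 0])
# [[ 5  7]
#  [13 15]]
```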
A pretrained model can be obtained from vgg, so the official dataset is not needed. The code for loading test images is as follows:
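    import numpy as np
    import scipy.misc  # imread/imresize were removed from recent SciPy releases; an old SciPy (before 1.2) is required

    def load_image(image_dir):
        # image_mean is calculated separately and must be defined before this function is called
        input_image = scipy.misc.imread(image_dir)
        input_image = scipy.misc.imresize(input_image, [224, 224, 3])  # unify the input size
        input_image = np.array(input_image).astype(np.float32)  # convert to float32
        input_image -= image_mean  # subtract the separately calculated mean
        input_image = np.reshape(input_image, [1] + list(input_image.shape))  # [N=1, height=224, width=224, channel=3]
        input_image = np.transpose(input_image, [0, 3, 1, 2])  # [N=1, channel=3, height=224, width=224]
        return input_image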
Classification result: id=281. The class categories are listed here.
Demo3 Image Style Transfer (not real-time) layers3
To train a neural network, mini-batch stochastic gradient descent is commonly used to update the network parameters. In this experiment, the Adam algorithm is used instead because it converges faster.
Parameter updating:
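A minimal sketch of one Adam step, assuming the standard formulation with bias-corrected moment estimates. The function name `adam_step` and the default hyperparameters (`lr`, `beta1`, `beta2`, `eps`) are the commonly cited values, not settings taken from this project.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step counter."""
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: minimise f(x) = x^2, whose gradient is 2x.
x, m, v = np.array(1.0), 0.0, 0.0
for t in range(1, 101):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale.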
Note: Processing images is slow (about one hour per epoch). Model acceleration will be considered in future work.