Implement image style transfer using basic neural-network components, following the UCAS course Intelligent Computing Systems.
Basic NN Components (layers1)

The forward pass of the fully connected layer is

$$\boldsymbol{Y}=\boldsymbol{X}\boldsymbol{W}+\boldsymbol{b}$$

where $\boldsymbol{X}\in\mathbb{R}^{N\times d_{in}}$ is the input batch, $\boldsymbol{W}\in\mathbb{R}^{d_{in}\times d_{out}}$ is the weight matrix, $\boldsymbol{b}\in\mathbb{R}^{1\times d_{out}}$ is the bias, and $N$ is the batch size.

Define $L$ as the loss of the whole network and $\nabla_{\boldsymbol{Y}}L$ as the gradient propagated back from the next layer. The partial derivatives of $L$ with respect to the parameters and the input are

$$\nabla_{\boldsymbol{W}}L=\boldsymbol{X}^{\mathsf T}\nabla_{\boldsymbol{Y}}L,\qquad \nabla_{\boldsymbol{b}}L=\sum\limits_{n=1}^{N}\nabla_{\boldsymbol{Y}(n)}L,\qquad \nabla_{\boldsymbol{X}}L=\nabla_{\boldsymbol{Y}}L\,\boldsymbol{W}^{\mathsf T}$$

The loss function of the softmax layer is defined as the cross entropy

$$L=-\sum\limits_{k}y_k\ln\hat{y}_k,\qquad \hat{y}_k=\frac{e^{x_k}}{\sum_{j}e^{x_j}}$$

where $\boldsymbol{y}$ is the one-hot label and $\hat{\boldsymbol{y}}$ is the softmax output. Considering batch processing:

$$L=-\frac{1}{N}\sum\limits_{n=1}^{N}\sum\limits_{k}\boldsymbol{Y}(n,k)\ln\hat{\boldsymbol{Y}}(n,k)$$

The partial derivative of $L$ with respect to the softmax input is

$$\nabla_{\boldsymbol{x}}L=\hat{\boldsymbol{y}}-\boldsymbol{y}$$

Considering batch processing:

$$\nabla_{\boldsymbol{X}}L=\frac{1}{N}\left(\hat{\boldsymbol{Y}}-\boldsymbol{Y}\right)$$
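A minimal NumPy sketch of these two components is given below. The class and method names (FullyConnectedLayer, SoftmaxLossLayer, forward, backward, update) are illustrative and may differ from the actual layers1 implementation.

```python
import numpy as np

class FullyConnectedLayer:
    """Fully connected layer: Y = XW + b (illustrative sketch)."""
    def __init__(self, d_in, d_out, lr=0.01):
        self.W = 0.01 * np.random.randn(d_in, d_out)
        self.b = np.zeros((1, d_out))
        self.lr = lr

    def forward(self, X):
        self.X = X                           # cache input for the backward pass
        return X @ self.W + self.b

    def backward(self, dY):
        self.dW = self.X.T @ dY              # gradient w.r.t. weights
        self.db = dY.sum(axis=0, keepdims=True)
        return dY @ self.W.T                 # gradient w.r.t. input

    def update(self):
        self.W -= self.lr * self.dW
        self.b -= self.lr * self.db

class SoftmaxLossLayer:
    """Softmax + cross-entropy loss (illustrative sketch)."""
    def forward(self, X, labels):
        X = X - X.max(axis=1, keepdims=True)          # numerical stability
        self.prob = np.exp(X) / np.exp(X).sum(axis=1, keepdims=True)
        self.onehot = np.zeros_like(self.prob)
        self.onehot[np.arange(len(labels)), labels] = 1.0
        return -np.mean(np.sum(self.onehot * np.log(self.prob + 1e-12), axis=1))

    def backward(self):
        return (self.prob - self.onehot) / len(self.prob)
```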
Demo1 MNIST Classification
import os
import struct
import numpy as np

MNIST_DIR = "../mnist_data"
TRAIN_DATA = "train-images-idx3-ubyte"
TRAIN_LABEL = "train-labels-idx1-ubyte"
TEST_DATA = "t10k-images-idx3-ubyte"
TEST_LABEL = "t10k-labels-idx1-ubyte"

def load_mnist(file_dir, is_images=True):
    # Read binary data
    with open(file_dir, 'rb') as bin_file:
        bin_data = bin_file.read()
    # Analyze file header
    if is_images:
        # Image file: magic number, image count, rows, cols
        fmt_header = '>iiii'
        magic, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, 0)
    else:
        # Label file: magic number, label count
        fmt_header = '>ii'
        magic, num_images = struct.unpack_from(fmt_header, bin_data, 0)
        num_rows, num_cols = 1, 1
    data_size = num_images * num_rows * num_cols
    mat_data = struct.unpack_from('>' + str(data_size) + 'B', bin_data, struct.calcsize(fmt_header))
    mat_data = np.reshape(mat_data, [num_images, num_rows * num_cols])
    print('Load data from %s, number: %d, data shape: %s' % (file_dir, num_images, str(mat_data.shape)))
    return mat_data

train_images = load_mnist(os.path.join(MNIST_DIR, TRAIN_DATA), True)
train_labels = load_mnist(os.path.join(MNIST_DIR, TRAIN_LABEL), False)
test_images = load_mnist(os.path.join(MNIST_DIR, TEST_DATA), True)
test_labels = load_mnist(os.path.join(MNIST_DIR, TEST_LABEL), False)
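With the data loaded, Demo1 stacks fully connected layers, ReLU, and the softmax loss into a small MLP. The training loop below is a hedged sketch: the ReLULayer class, the network width, and the hyper-parameters (batch size 100, 10 epochs) are assumptions for illustration, not necessarily the settings used in the actual demo.

```python
import numpy as np

class ReLULayer:
    """Element-wise ReLU (illustrative sketch)."""
    def forward(self, X):
        self.X = X
        return np.maximum(0, X)
    def backward(self, dY):
        return dY * (self.X > 0)

# Assumed hyper-parameters for illustration only.
batch_size, epochs = 100, 10
fc1, relu1 = FullyConnectedLayer(784, 256), ReLULayer()
fc2, loss_layer = FullyConnectedLayer(256, 10), SoftmaxLossLayer()

x_train = train_images.astype(np.float32) / 255.0      # normalize pixels to [0, 1]
y_train = train_labels.ravel().astype(np.int64)

for epoch in range(epochs):
    idx = np.random.permutation(len(x_train))           # shuffle each epoch
    for start in range(0, len(x_train), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x_train[batch], y_train[batch]
        # forward pass
        h = relu1.forward(fc1.forward(xb))
        loss = loss_layer.forward(fc2.forward(h), yb)
        # backward pass
        dh = fc2.backward(loss_layer.backward())
        fc1.backward(relu1.backward(dh))
        # SGD update
        fc1.update(); fc2.update()
    print('epoch %d, last batch loss %.4f' % (epoch, loss))

# Evaluate on the test set
logits = fc2.forward(relu1.forward(fc1.forward(test_images.astype(np.float32) / 255.0)))
acc = np.mean(np.argmax(logits, axis=1) == test_labels.ravel())
print('test accuracy: %.4f' % acc)
```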
Basic CNN Components (layers2)

In this section, we use VGG19 instead of VGG16. The network configuration is listed below, where K denotes the height/width of each layer's output feature map; a construction sketch follows the table.
Name | Type | Kernel Size | Stride | Padding Size | Cin | Cout | K |
---|---|---|---|---|---|---|---|
conv1_1 | Conv | 3 | 1 | 1 | 3 | 64 | 224 |
conv1_2 | Conv | 3 | 1 | 1 | 64 | 64 | 224 |
pool1 | MaxPool | 2 | 2 | - | 64 | 64 | 112 |
conv2_1 | Conv | 3 | 1 | 1 | 64 | 128 | 112 |
conv2_2 | Conv | 3 | 1 | 1 | 128 | 128 | 112 |
pool2 | MaxPool | 2 | 2 | - | 128 | 128 | 56 |
conv3_1 | Conv | 3 | 1 | 1 | 128 | 256 | 56 |
conv3_2 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
conv3_3 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
conv3_4 | Conv | 3 | 1 | 1 | 256 | 256 | 56 |
pool3 | MaxPool | 2 | 2 | - | 256 | 256 | 28 |
conv4_1 | Conv | 3 | 1 | 1 | 256 | 512 | 28 |
conv4_2 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
conv4_3 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
conv4_4 | Conv | 3 | 1 | 1 | 512 | 512 | 28 |
pool4 | MaxPool | 2 | 2 | - | 512 | 512 | 14 |
conv5_1 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
conv5_2 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
conv5_3 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
conv5_4 | Conv | 3 | 1 | 1 | 512 | 512 | 14 |
pool5 | MaxPool | 2 | 2 | - | 512 | 512 | 7 |
fc6 | FCL | - | - | - | 512*7*7 | 4096 | 1 |
fc7 | FCL | - | - | - | 4096 | 4096 | 1 |
fc8 | FCL | - | - | - | 4096 | 1000 | 1 |
softmax | Softmax | - | - | - | - | - | - |
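The table maps directly onto the convolution, max-pooling, flatten, and fully connected components (in VGG, a ReLU activation, not listed in the table, follows each convolution). The sketch below builds that layer list; the ConvolutionalLayer, MaxPoolingLayer, and FlattenLayer class names and their constructor signatures are assumptions for illustration and may differ from the actual layers2 code.

```python
# Hypothetical constructors, assumed to follow the table columns:
#   ConvolutionalLayer(kernel_size, c_in, c_out, padding, stride)
#   MaxPoolingLayer(kernel_size, stride)
#   FullyConnectedLayer(d_in, d_out)
vgg19_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
             512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

def build_vgg19():
    layers, c_in = [], 3
    for v in vgg19_cfg:
        if v == 'M':
            layers.append(MaxPoolingLayer(2, 2))                  # halves H and W
        else:
            layers.append(ConvolutionalLayer(3, c_in, v, 1, 1))   # 3x3, pad 1, stride 1
            layers.append(ReLULayer())
            c_in = v
    layers.append(FlattenLayer())                                 # 512*7*7 -> 25088
    layers.append(FullyConnectedLayer(512 * 7 * 7, 4096))
    layers.append(FullyConnectedLayer(4096, 4096))
    layers.append(FullyConnectedLayer(4096, 1000))
    return layers
```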
Convolution Kernel
The input feature map is $\boldsymbol{X}\in\mathbb{R}^{N\times C_{in}\times H_{in}\times W_{in}}$. To obtain the expected output size in each layer, the input is first zero-padded with padding size $p$:

$$\boldsymbol{X}_{pad}(n,c_{in},h,w)=\begin{cases} \boldsymbol{X}(n,c_{in},h-p,w-p) & p\le h<p+H_{in},\ p\le w<p+W_{in} \\ 0 & \text{otherwise} \end{cases}$$
Applying the convolution with kernel weights $\boldsymbol{W}$, bias $\boldsymbol{b}$, and stride $s$ to $\boldsymbol{X}_{pad}$ gives

$$\boldsymbol{Y}(n,c_{out},h,w)=\sum\limits_{c_{in}}\sum\limits_{k_h}\sum\limits_{k_w}\boldsymbol{W}(c_{in},k_h,k_w,c_{out})\,\boldsymbol{X}_{pad}(n,c_{in},hs+k_h,ws+k_w)+\boldsymbol{b}(c_{out})$$
Define $L$ as the network loss and $\nabla_{\boldsymbol{Y}}L$ as the gradient passed back from the next layer. The gradient of $L$ with respect to the kernel weights is

$$\nabla_{\boldsymbol{W}(c_{in},k_h,k_w,c_{out})}L=\sum\limits_{n,h,w}\nabla_{\boldsymbol{Y}(n,c_{out},h,w)}L\;\boldsymbol{X}_{pad}(n,c_{in},hs+k_h,ws+k_w)$$
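A direct (unoptimized) NumPy translation of the forward formula is sketched below; the function name and argument order are illustrative only.

```python
import numpy as np

def conv_forward(X, W, b, stride=1, padding=1):
    """Naive convolution forward pass following the formula above.

    X: (N, C_in, H_in, W_in), W: (C_in, K, K, C_out), b: (C_out,)
    """
    N, C_in, H_in, W_in = X.shape
    _, K, _, C_out = W.shape
    # zero-pad the spatial dimensions
    X_pad = np.pad(X, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    H_out = (H_in + 2 * padding - K) // stride + 1
    W_out = (W_in + 2 * padding - K) // stride + 1
    Y = np.zeros((N, C_out, H_out, W_out))
    for h in range(H_out):
        for w in range(W_out):
            # window: (N, C_in, K, K) contracted with kernel (C_in, K, K, C_out)
            window = X_pad[:, :, h * stride:h * stride + K, w * stride:w * stride + K]
            Y[:, :, h, w] = np.tensordot(window, W, axes=([1, 2, 3], [0, 1, 2])) + b
    return Y
```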
The input of max pooling is divided into $k\times k$ windows with stride $s$, and the maximum value inside each window is taken as the output. During back-propagation, the gradient from the next layer is routed only to the position that produced the maximum; all other positions receive zero gradient.
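A matching NumPy sketch of the pooling forward pass (again with an illustrative function name):

```python
import numpy as np

def max_pool_forward(X, kernel=2, stride=2):
    """Naive max-pooling forward pass: the maximum of each kernel x kernel window."""
    N, C, H, W = X.shape
    H_out, W_out = (H - kernel) // stride + 1, (W - kernel) // stride + 1
    Y = np.zeros((N, C, H_out, W_out))
    for h in range(H_out):
        for w in range(W_out):
            window = X[:, :, h * stride:h * stride + kernel, w * stride:w * stride + kernel]
            Y[:, :, h, w] = window.max(axis=(2, 3))
    return Y
```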
Demo2 Image Classification

The pretrained model can be acquired from vgg, so the official training dataset is unnecessary. The code for loading a test picture is as follows:
import numpy as np
import scipy.misc   # note: scipy.misc.imread/imresize require an older SciPy with Pillow installed

def load_image(image_dir):
    input_image = scipy.misc.imread(image_dir)
    input_image = scipy.misc.imresize(input_image, [224, 224, 3])         # unify the input size
    input_image = np.array(input_image).astype(np.float32)                # convert to float
    input_image -= image_mean                                             # per-channel mean, calculated separately
    input_image = np.reshape(input_image, [1] + list(input_image.shape))  # [N=1, height=224, width=224, channel=3]
    input_image = np.transpose(input_image, [0, 3, 1, 2])                 # [N=1, channel=3, height=224, width=224]
    return input_image
The classification result is id=281; the corresponding class category can be looked up here.
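Putting the pieces together, classification is just a forward pass through the layer list. The snippet below is a hedged usage sketch: build_vgg19, the weight-loading step, and the image path are illustrative assumptions, not the repo's exact API.

```python
import numpy as np

layers = build_vgg19()
# ... load the pretrained VGG19 parameters into each layer here ...
x = load_image('../data/cat.jpg')       # hypothetical test image path
for layer in layers:
    x = layer.forward(x)                # final x has shape (1, 1000)
print('predicted class id:', int(np.argmax(x)))
```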
Demo3 Image Style Transfer (not real-time) (layers3)
Suppose $\vec{p}$ is the content image, $\vec{x}$ is the generated image, and $P^l$, $F^l$ are their feature maps at layer $l$ of VGG19. The content loss is defined as

$$L_{content}(\vec{p},\vec{x},l)=\frac{1}{2}\sum\limits_{i,j}\left(F^l_{ij}-P^l_{ij}\right)^2$$

Suppose $\vec{a}$ is the style image. Style is represented by the Gram matrix $G^l_{ij}=\sum_k F^l_{ik}F^l_{jk}$; with $A^l$ and $G^l$ denoting the Gram matrices of the style image and the generated image at layer $l$, the style loss is

$$L_{style}(\vec{a},\vec{x})=\sum\limits_{l}\frac{w_l}{4N_l^2M_l^2}\sum\limits_{i,j}\left(G^l_{ij}-A^l_{ij}\right)^2$$

where $N_l$ is the number of feature maps at layer $l$, $M_l$ is their spatial size, and $w_l$ is the weight of each style layer. The total loss to minimize is $L_{total}=\alpha L_{content}+\beta L_{style}$.
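A short NumPy sketch of the Gram matrix and the two losses (the function names and the flattened feature shape are illustrative):

```python
import numpy as np

def gram_matrix(F):
    """F: feature map flattened to (N_l, M_l) = (channels, height*width)."""
    return F @ F.T

def content_loss(F, P):
    """Squared-error content loss between generated and content features."""
    return 0.5 * np.sum((F - P) ** 2)

def style_layer_loss(F, A_gram):
    """Style-loss contribution of one layer, given the style image's Gram matrix."""
    N_l, M_l = F.shape
    G = gram_matrix(F)
    return np.sum((G - A_gram) ** 2) / (4.0 * N_l ** 2 * M_l ** 2)
```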
To train a neural network, mini-batch stochastic gradient descent is normally used to update the parameters. In this experiment, the Adam algorithm is used instead of plain stochastic gradient descent because it converges faster.
Parameter updating with Adam:

$$m_t=\beta_1 m_{t-1}+(1-\beta_1)\nabla_\theta L,\qquad v_t=\beta_2 v_{t-1}+(1-\beta_2)(\nabla_\theta L)^2$$

$$\hat{m}_t=\frac{m_t}{1-\beta_1^t},\qquad \hat{v}_t=\frac{v_t}{1-\beta_2^t},\qquad \theta_t=\theta_{t-1}-\frac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}$$
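An illustrative NumPy implementation of this update step (the AdamOptimizer class name and default hyper-parameters are assumptions):

```python
import numpy as np

class AdamOptimizer:
    """Adam update for a single parameter array (illustrative sketch)."""
    def __init__(self, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = 0.0
        self.t = 0

    def step(self, theta, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad          # first moment
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2     # second moment
        m_hat = self.m / (1 - self.beta1 ** self.t)                     # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# In style transfer, the "parameter" being optimized is the generated image itself.
```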
Note: processing the images takes a long time (about one hour per epoch). Model acceleration will be considered in the future.