TensorFlow Study Notes (5): Convolutional Neural Networks (CNN)

文明世界拼图 2020-01-19

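In the results of the previous post (http://blog.csdn.net/piaoxuezhong/article/details/78916872), the softmax classifier was more accurate than the two-layer neural network. The cs231n instructor made the same point: a neural network only shows a clear advantage once it reaches a certain depth and complexity. This post implements a convolutional neural network (CNN) in TensorFlow and tests how well it performs.

The test data is again the MNIST dataset; see the previous post for details.

The full program comes first, followed by a detailed walkthrough: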

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Note: this script uses TensorFlow 1.x APIs (tf.placeholder, tf.Session).
import tensorflow as tf
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Learning rate, number of training steps, batch size
learning_rate = 0.001
num_steps = 500
batch_size = 128
display_step = 50
# Network Parameters
num_input = 784 # MNIST image size (28*28)
num_classes = 10 # number of MNIST classes (digits 0-9)
dropout = 0.75 # dropout keep probability (unused: the training loop feeds keep_prob=0.8 directly)
# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32) # dropout (keep probability)
# Conv2D followed by a ReLU activation
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)
def maxpool2d(x, k=2):
    # Max pooling
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')
# Create model
def conv_net(x, weights, biases, dropout):
    # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
    # -1 means this dimension is inferred from the total size and the other
    # dimensions; here it is the number of images in the batch
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    # First convolution
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Pooling (downsampling)
    conv1 = maxpool2d(conv1, k=2)
    # Second convolution
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Pooling (downsampling)
    conv2 = maxpool2d(conv2, k=2)
    # Fully connected layer: first reshape the output of conv2
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Dropout (the `dropout` argument is the keep probability)
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output layer: class scores
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
# Weights and biases; the entries of the two dictionaries correspond to each other
weights = {
    # Convolution 1: 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # Convolution 2: 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # Fully connected layer: 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # Output layer: 1024 inputs, 10 outputs
    'out': tf.Variable(tf.random_normal([1024, num_classes]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}
# Build the network
logits = conv_net(X, weights, biases, keep_prob)
prediction = tf.nn.softmax(logits)
# Define the loss function and the optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
# Model evaluation
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
        if step % display_step == 0 or step == 1:
            # Compute the loss and accuracy on the current batch
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y,
                                                                 keep_prob: 1.0})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
    print("Optimization Finished!")
    # Compute the accuracy on the first 256 test images
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                      Y: mnist.test.labels[:256],
                                      keep_prob: 1.0}))

Output:

Step 1, Minibatch Loss= 62928.7383, Training Accuracy= 0.141
Step 50, Minibatch Loss= 3887.7532, Training Accuracy= 0.773
Step 100, Minibatch Loss= 2378.7659, Training Accuracy= 0.867
Step 150, Minibatch Loss= 932.5720, Training Accuracy= 0.938
Step 200, Minibatch Loss= 1575.1481, Training Accuracy= 0.922
Step 250, Minibatch Loss= 1262.4065, Training Accuracy= 0.945
Step 300, Minibatch Loss= 810.5175, Training Accuracy= 0.930
Step 350, Minibatch Loss= 195.0972, Training Accuracy= 0.961
Step 400, Minibatch Loss= 1277.8181, Training Accuracy= 0.922
Step 450, Minibatch Loss= 301.4168, Training Accuracy= 0.961
Step 500, Minibatch Loss= 362.0416, Training Accuracy= 0.961
Optimization Finished!
Testing Accuracy: 0.984375

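What the program does overall: the CNN applies two rounds of convolution and pooling to the input images, followed by a fully connected layer and finally an output layer.
The input has shape [N, 784], where N is the number of images fed in at once (batch processing).
To match the input format the convolution function expects, reshape turns the input into [N, 28, 28, 1];
conv2d then performs the convolution, producing [N, 28, 28, 32]; pooling reduces this to [N, 14, 14, 32];
the second convolution and pooling bring the shape to [N, 7, 7, 64];
the fully connected layer first reshapes its input to [N, 7*7*64] and outputs [N, 1024];
finally the output layer produces [N, 10]: ten class scores for each of the N input images, from which the predicted label is taken.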
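
The shape progression above can be checked by hand. The sketch below is plain Python with no TensorFlow dependency; it assumes padding='SAME', for which the output length along each spatial axis is ceil(input / stride). The helper names same_padding_out and trace_shapes are made up for this illustration and are not part of the original program.

```python
import math

def same_padding_out(size, stride):
    """Output length along one spatial axis when padding='SAME'."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28):
    """Return (layer name, per-image shape) pairs for the network in this post."""
    shapes = [("input", (height, width, 1))]
    layers = [
        ("conv1", 1, 32),  # 5x5 convolution, stride 1, 32 output channels
        ("pool1", 2, 32),  # 2x2 max pooling, stride 2
        ("conv2", 1, 64),  # 5x5 convolution, stride 1, 64 output channels
        ("pool2", 2, 64),  # 2x2 max pooling, stride 2
    ]
    for name, stride, channels in layers:
        height = same_padding_out(height, stride)
        width = same_padding_out(width, stride)
        shapes.append((name, (height, width, channels)))
    h, w, c = shapes[-1][1]
    shapes.append(("flatten", (h * w * c,)))  # input size of the dense layer
    return shapes

for name, shape in trace_shapes():
    print(name, shape)
```

The last entry, 7 * 7 * 64 = 3136, is the first dimension of wd1 in the weights dictionary.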

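Implementation details:

The program defines two dictionaries, weights and biases. In weights, the keys wc1 and wc2 hold the kernels of the two convolutional layers, while wd1 and out hold the weight matrices of the fully connected layer and the output layer; biases holds the matching bias vectors.

def conv2d(x, W, b, strides=1):... defines the convolution operation;

def maxpool2d(x, k=2):... defines the pooling operation;

def conv_net(x, weights, biases, dropout):... defines the structure of the convolutional model: two rounds of convolution and pooling on the input images, then a fully connected layer, and finally an output layer. See the step-by-step comments in the code for details.

In the statement conv1 = conv2d(x, weights['wc1'], biases['bc1']), the convolutional layer computes 32 feature maps from each 5 * 5 patch. Its weight tensor wc1 has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the third is the number of input channels, and the last is the number of output channels. A bias is added to each output channel.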
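
To get a feel for how large each entry of the two dictionaries is, the following sketch counts the trainable parameters. The shapes are copied from the program above; the helper names num_elements and count_params are hypothetical and exist only for this illustration.

```python
# Shapes copied from the weights and biases dictionaries in the program.
weight_shapes = {
    "wc1": (5, 5, 1, 32),
    "wc2": (5, 5, 32, 64),
    "wd1": (7 * 7 * 64, 1024),
    "out": (1024, 10),
}
bias_shapes = {
    "bc1": (32,),
    "bc2": (64,),
    "bd1": (1024,),
    "out": (10,),
}

def num_elements(shape):
    """Multiply all dimensions of a shape together."""
    count = 1
    for dim in shape:
        count *= dim
    return count

def count_params(shapes):
    """Map each variable name to its number of scalar parameters."""
    return {name: num_elements(shape) for name, shape in shapes.items()}

weight_counts = count_params(weight_shapes)
bias_counts = count_params(bias_shapes)
total = sum(weight_counts.values()) + sum(bias_counts.values())

print(weight_counts)  # {'wc1': 800, 'wc2': 51200, 'wd1': 3211264, 'out': 10240}
print(total)          # 3274634
```

Almost all of the parameters sit in wd1, the fully connected layer, rather than in the convolution kernels.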

fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1)

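These lines implement the fully connected layer. By this point the image has been reduced to 7 * 7, and a fully connected layer with 1024 neurons is added: the output of the last pooling layer is reshaped into a one-dimensional vector per image, multiplied by the weights wd1, offset by the bias bd1, and passed through the ReLU activation.

The remaining operations are mostly similar to those in the earlier posts. The two main functions are described here for reference: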
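
The flatten, multiply, add-bias, ReLU sequence can be illustrated on a toy example in plain Python. This is only a sketch of the arithmetic for a single image: the 2 x 2 x 2 feature map stands in for the real 7 x 7 x 64 one, the weight values are arbitrary, and the helper names flatten and dense_relu are invented for this illustration.

```python
def flatten(feature_map):
    """Flatten a [height][width][channels] nested list into one vector."""
    return [value for row in feature_map for pixel in row for value in pixel]

def dense_relu(vector, weights, bias):
    """Compute relu(vector @ weights + bias) for a single example."""
    outputs = []
    for j in range(len(bias)):
        z = sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        outputs.append(max(z, 0.0))  # ReLU clips negative values to zero
    return outputs

# Toy feature map: height 2, width 2, 2 channels.
feature_map = [[[1.0, -1.0], [0.5, 2.0]],
               [[0.0, 1.0], [-2.0, 3.0]]]
flat = flatten(feature_map)                         # length 2 * 2 * 2 = 8

# Arbitrary weights of shape [8, 3] and a bias of shape [3].
weights = [[0.5, -1.0, 0.0] for _ in range(len(flat))]
bias = [0.0, 5.0, -1.0]

print(len(flat))                        # 8
print(dense_relu(flat, weights, bias))  # [2.25, 0.5, 0.0]
```

The third output is zero because its pre-activation value, -1.0, is negative and ReLU clips it.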

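(1) tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

input: the input images to convolve. Must be a 4-D Tensor of shape [batch, in_height, in_width, in_channels], that is [number of images in a batch, image height, image width, number of image channels], with a floating-point type such as float32 or float64;
filter: the convolution kernel of the CNN. Must be a 4-D Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is [kernel height, kernel width, number of input channels, number of kernels]. Its third dimension, in_channels, must equal the fourth dimension of input;
strides: the stride of the convolution along each dimension of the input, a 1-D vector of length 4;
padding: a string, either "SAME" or "VALID", which selects the padding scheme and therefore the size of the output;
use_cudnn_on_gpu: a bool indicating whether to use cuDNN acceleration; defaults to True;
Returns a Tensor, the feature map, of shape [batch, height, width, channels].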
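
The difference between "SAME" and "VALID" is easiest to see on a small example. The sketch below is a naive pure-Python version of the sliding-window arithmetic for one single-channel image and one kernel. It does not flip the kernel (a cross-correlation, which is also what tf.nn.conv2d computes), and it zero-pads for "SAME" with any odd extra padding placed on the bottom and right. The function name conv2d_single is hypothetical, and this is not how TensorFlow implements the operation internally.

```python
import math

def conv2d_single(image, kernel, stride=1, padding="SAME"):
    """Naive 2-D cross-correlation of one single-channel image with one kernel."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    if padding == "SAME":
        out_h = math.ceil(in_h / stride)
        out_w = math.ceil(in_w / stride)
        pad_h = max((out_h - 1) * stride + k_h - in_h, 0)
        pad_w = max((out_w - 1) * stride + k_w - in_w, 0)
        pad_top, pad_left = pad_h // 2, pad_w // 2
    elif padding == "VALID":
        out_h = (in_h - k_h) // stride + 1
        out_w = (in_w - k_w) // stride + 1
        pad_top = pad_left = 0
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")

    def pixel(r, c):
        # Positions outside the image count as zero padding.
        if 0 <= r < in_h and 0 <= c < in_w:
            return image[r][c]
        return 0

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    r = i * stride + di - pad_top
                    c = j * stride + dj - pad_left
                    total += pixel(r, c) * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]  # 3x3 box filter: sums each neighbourhood

same_out = conv2d_single(image, kernel, padding="SAME")
valid_out = conv2d_single(image, kernel, padding="VALID")
print(len(same_out), len(same_out[0]))    # 4 4
print(len(valid_out), len(valid_out[0]))  # 2 2
print(valid_out)                          # [[54, 63], [90, 99]]
```

With stride 1, "SAME" keeps the 4 x 4 size while "VALID" shrinks it to 2 x 2, which is why the 28 x 28 MNIST images stay 28 x 28 after each convolution in this post.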

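(2) tf.nn.max_pool(value, ksize, strides, padding, name=None)
value: the input to pool. A pooling layer usually follows a convolutional layer, so the input is typically a feature map of shape [batch, height, width, channels];
ksize: the size of the pooling window, usually a 4-element vector [1, height, width, 1]; the batch and channels entries are set to 1 because no pooling is done over those dimensions;
strides: as with convolution, the step of the sliding window along each dimension, usually [1, stride, stride, 1];
padding: as with convolution, either 'VALID' or 'SAME';
Returns a Tensor of the same type, with shape [batch, height, width, channels].

Judging from the output, this CNN reaches an accuracy of 0.984375 (measured on the first 256 test images), better than the earlier softmax classifier and the simple neural network.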
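
As a minimal illustration of what ksize=[1, 2, 2, 1] with strides=[1, 2, 2, 1] does to one channel of a feature map, here is a pure-Python sketch. It assumes the height and width are even (true for the 28 x 28 and 14 x 14 maps in this post), in which case 'SAME' and 'VALID' give the same result. The function name max_pool_2x2 is made up for this example.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 over one channel (even height and width)."""
    in_h, in_w = len(feature_map), len(feature_map[0])
    if in_h % 2 or in_w % 2:
        raise ValueError("this sketch only handles even height and width")
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, in_w, 2)
        ]
        for r in range(0, in_h, 2)
    ]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 0, 3, 4]]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each output value is the maximum of one non-overlapping 2 x 2 window, so the height and width are halved while the number of channels is unchanged.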

References:

https://www./get_started/mnist/pros

http://www./tfdoc/tutorials/mnist_pros.html

http://www./tensorflow-learning-notes-2.html

http://lib.csdn.net/article/aimachinelearning/61475