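In the results from http://blog.csdn.net/piaoxuezhong/article/details/78916872, the softmax classifier was more accurate than the two-layer neural network. The instructor of the cs231n course made the same point: a neural network only shows a clear advantage once it reaches a certain depth and complexity. This post implements a convolutional neural network (CNN) in TensorFlow to see how it performs.
The test data is again the MNIST dataset; see the previous post for details.
The code comes first, followed by a detailed walkthrough: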
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Training parameters. Their definitions are missing from the posted listing:
# num_steps and display_step are inferred from the output below, batch_size
# from the logged accuracies (multiples of 1/128); learning_rate is an assumed value.
learning_rate = 0.001
num_steps = 500
batch_size = 128
display_step = 50

num_input = 784    # MNIST image size: 28*28
num_classes = 10   # number of MNIST classes (digits 0-9)
dropout = 0.75     # dropout keep probability (defined but unused: training feeds 0.8 directly)

X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)  # dropout keep probability

def conv2d(x, W, b, strides=1):
    # Convolution, then bias add and ReLU. The last two steps are missing
    # from the posted listing and are restored here in their usual form.
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # Max pooling with a k x k window and stride k (padding value assumed; it is cut off in the post)
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')

def conv_net(x, weights, biases, dropout):
    # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
    # -1 means this dimension is inferred from the others; here it is the batch size
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    # First convolution and pooling
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)
    # Second convolution and pooling
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)
    # Fully connected layer: flatten, multiply, add bias, ReLU, dropout
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output layer: class scores (logits)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

# weights and biases; the two dictionaries correspond layer by layer
weights = {
    # Convolution 1: 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # Convolution 2: 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # Fully connected layer: 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # Output layer: 1024 inputs, 10 outputs
    'out': tf.Variable(tf.random_normal([1024, num_classes]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}

logits = conv_net(X, weights, biases, keep_prob)
prediction = tf.nn.softmax(logits)

loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
        if step % display_step == 0 or step == 1:
            # Evaluate on the current batch with dropout disabled
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y,
                                                                 keep_prob: 1.0})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
    print("Optimization Finished!")
    # Accuracy on the first 256 MNIST test images
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                        Y: mnist.test.labels[:256],
                                        keep_prob: 1.0}))
Output:
Step 1, Minibatch Loss= 62928.7383, Training Accuracy= 0.141
Step 50, Minibatch Loss= 3887.7532, Training Accuracy= 0.773
Step 100, Minibatch Loss= 2378.7659, Training Accuracy= 0.867
Step 150, Minibatch Loss= 932.5720, Training Accuracy= 0.938
Step 200, Minibatch Loss= 1575.1481, Training Accuracy= 0.922
Step 250, Minibatch Loss= 1262.4065, Training Accuracy= 0.945
Step 300, Minibatch Loss= 810.5175, Training Accuracy= 0.930
Step 350, Minibatch Loss= 195.0972, Training Accuracy= 0.961
Step 400, Minibatch Loss= 1277.8181, Training Accuracy= 0.922
Step 450, Minibatch Loss= 301.4168, Training Accuracy= 0.961
Step 500, Minibatch Loss= 362.0416, Training Accuracy= 0.961
Testing Accuracy: 0.984375
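What the program does overall: the CNN applies two rounds of convolution and pooling to the input images, followed by a fully connected layer and finally an output layer.
The input has shape [N, 784], where N is the number of images fed in at once, i.e. the batch size.
To match the input format the convolution function expects, reshape turns the input into [N, 28, 28, 1];
conv2d then applies the first convolution, producing [N, 28, 28, 32]; pooling reduces this to [N, 14, 14, 32];
the second convolution and pooling bring the shape to [N, 7, 7, 64];
the fully connected layer first flattens its input to [N, 7*7*64] and outputs [N, 1024];
finally the output layer produces [N, 10]: ten class scores for each of the N input images, from which the predicted label is taken with argmax.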
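The shape changes listed above can be checked with plain arithmetic. The following is a minimal sketch in standard-library Python, not TensorFlow code; the helper names conv_same, pool_same and trace_shapes are made up for this illustration, and it assumes stride-1 convolutions and 2x2 pooling, both with padding='SAME'.

```python
import math

def conv_same(shape, out_channels, stride=1):
    """Shape after a convolution with padding='SAME'."""
    n, h, w, _ = shape
    return (n, math.ceil(h / stride), math.ceil(w / stride), out_channels)

def pool_same(shape, k=2):
    """Shape after k x k max pooling with stride k and padding='SAME'."""
    n, h, w, c = shape
    return (n, math.ceil(h / k), math.ceil(w / k), c)

def trace_shapes(n):
    """Return (stage, shape) pairs for a batch of n MNIST images."""
    stages = [("input", (n, 784))]
    x = (n, 28, 28, 1)
    stages.append(("reshape", x))
    x = conv_same(x, 32)
    stages.append(("conv1", x))
    x = pool_same(x)
    stages.append(("pool1", x))
    x = conv_same(x, 64)
    stages.append(("conv2", x))
    x = pool_same(x)
    stages.append(("pool2", x))
    stages.append(("flatten", (n, x[1] * x[2] * x[3])))
    stages.append(("fc1", (n, 1024)))
    stages.append(("out", (n, 10)))
    return stages

if __name__ == "__main__":
    for name, shape in trace_shapes(128):
        print(name, shape)
```

The flattened size 7*7*64 = 3136 is the first dimension of wd1, which is why the reshape in conv_net can read it from the weight tensor.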
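How the pieces are implemented:
The program defines two dictionaries, weights and biases. In weights, the keys wc1, wc2, wd1 and out hold the two convolution kernels, the fully connected layer weights and the output layer weights, respectively;
def conv2d(x, W, b, strides=1): ... defines the convolution operation;
def maxpool2d(x, k=2): ... defines the pooling operation;
def conv_net(x, weights, biases, dropout): ... defines the model structure: two rounds of convolution and pooling on the input image, then a fully connected layer, then the output layer. See the comments in the code for the individual steps: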
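In the statement conv1 = conv2d(x, weights['wc1'], biases['bc1']), the convolution layer computes 32 feature maps from each 5 * 5 patch. Its weight tensor wc1 has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the third is the number of input channels and the last is the number of output channels. A bias is added to each output channel.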
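To make the weight shapes concrete, the sketch below counts the trainable parameters per layer from the shapes in the weights and biases dictionaries. It is plain Python for illustration only; the names weight_shapes, bias_shapes, size and layer_params are hypothetical and do not appear in the program.

```python
# Shapes copied from the weights and biases dictionaries in the listing.
weight_shapes = {
    "wc1": [5, 5, 1, 32],
    "wc2": [5, 5, 32, 64],
    "wd1": [7 * 7 * 64, 1024],
    "out": [1024, 10],
}
bias_shapes = {
    "wc1": [32],
    "wc2": [64],
    "wd1": [1024],
    "out": [10],
}

def size(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

def layer_params():
    """Weights plus biases for each layer, keyed by the weight name."""
    return {name: size(weight_shapes[name]) + size(bias_shapes[name])
            for name in weight_shapes}

if __name__ == "__main__":
    counts = layer_params()
    for name, count in counts.items():
        print(name, count)
    print("total", sum(counts.values()))
```

Almost all of the parameters sit in the fully connected layer wd1; the two convolution layers are small by comparison because each kernel is shared across every image position.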
fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1)
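These lines form the fully connected layer. At this point the image has been reduced to 7 * 7. A fully connected layer with 1024 neurons is added: the output of the last pooling layer is flattened into a one-dimensional vector per image, multiplied by the weights wd1, the bias bd1 is added, and the result is passed through the ReLU activation.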
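The flatten, multiply, add-bias and ReLU steps can be shown on a toy input. This is a minimal sketch in plain Python with nested lists, handling a single image rather than a batch; flatten and dense_relu are hypothetical helper names, and the 2x2x1 feature map and 4x2 weight matrix are invented stand-ins for the real 7x7x64 input and [3136, 1024] weights.

```python
def flatten(feature_map):
    """Flatten one [height][width][channels] nested list in row-major order."""
    return [v for row in feature_map for pixel in row for v in pixel]

def dense_relu(x, weights, bias):
    """Compute relu(x . weights + bias) for one input vector x.

    weights is a list of len(x) rows, each with len(bias) columns.
    """
    out = []
    for j in range(len(bias)):
        s = sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        out.append(max(0.0, s))  # ReLU: negative sums become 0
    return out

if __name__ == "__main__":
    feature_map = [[[1.0], [2.0]],
                   [[3.0], [4.0]]]          # toy 2 x 2 x 1 feature map
    weights = [[1.0, -1.0]] * 4             # toy 4 x 2 weight matrix
    bias = [0.5, 0.5]
    flat = flatten(feature_map)
    print(flat)
    print(dense_relu(flat, weights, bias))
```

The second output unit has a negative pre-activation, so ReLU clamps it to zero, while the first passes through unchanged.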
The remaining operations are much the same as in the earlier posts. Descriptions of the two main functions follow:
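(1) tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
input: the image to convolve. Must be a 4-D Tensor of shape [batch, in_height, in_width, in_channels], i.e. [number of images in a batch, image height, image width, number of image channels], with type float32 or float64;
filter: the convolution kernel of the CNN. Must be a 4-D Tensor of shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of image channels, number of kernels]; its third dimension in_channels must equal the fourth dimension of input;
strides: the step of the convolution along each dimension of the input, a 1-D vector of length 4;
padding: a string, either "SAME" or "VALID", which selects how the borders are handled: "SAME" zero-pads the input so that, with stride 1, the output keeps the input's height and width, while "VALID" uses no padding;
use_cudnn_on_gpu: a bool, whether to use cuDNN acceleration; defaults to true;
Returns a Tensor, the feature map, of shape [batch, height, width, channels].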
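The effect of the padding argument is easiest to see on a small example. The sketch below is a naive single-image, single-channel, single-kernel convolution in plain Python, written for illustration and not taken from TensorFlow; output_size and conv2d_single are hypothetical names, and the padding arithmetic follows the commonly documented "SAME"/"VALID" rules (output size ceil(in / stride) for "SAME", ceil((in - kernel + 1) / stride) for "VALID").

```python
import math

def output_size(in_size, kernel, stride, padding):
    """Output length along one spatial dimension."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def conv2d_single(image, kernel, stride=1, padding="SAME"):
    """Slide one kernel over one single-channel image (lists of rows)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = output_size(h, kh, stride, padding)
    out_w = output_size(w, kw, stride, padding)
    if padding == "SAME":
        pad_h = max((out_h - 1) * stride + kh - h, 0)
        pad_w = max((out_w - 1) * stride + kw - w, 0)
    else:
        pad_h = pad_w = 0
    top, left = pad_h // 2, pad_w // 2   # zero padding before the image
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    y = oy * stride + ky - top
                    x = ox * stride + kx - left
                    # Positions outside the image count as zeros.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out

if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ones = [[1, 1, 1]] * 3
    print(conv2d_single(image, ones, padding="SAME"))
    print(conv2d_single(image, ones, padding="VALID"))
    print(output_size(28, 5, 1, "SAME"), output_size(28, 5, 1, "VALID"))
```

With the 5x5 kernels and stride 1 used in this post, "SAME" keeps the 28x28 size, whereas "VALID" would shrink it to 24x24.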
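(2) tf.nn.max_pool(value, ksize, strides, padding, name=None)
value: the input to pool. A pooling layer usually follows a convolution layer, so the input is normally a feature map of shape [batch, height, width, channels];
ksize: the size of the pooling window, usually a 4-D vector [1, height, width, 1]; pooling is not applied across batch or channels, so those two dimensions are set to 1;
strides: as with convolution, the step of the window along each dimension, usually [1, stride, stride, 1];
padding: as with convolution, either 'VALID' or 'SAME';
Returns a Tensor of the same type, with shape [batch, height, width, channels].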
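As an illustration of what the pooling window does, here is a minimal plain-Python sketch of max pooling over one single-channel feature map. The function name max_pool_single is hypothetical, and the edge handling assumes 'SAME'-style behaviour in which padded positions are simply ignored when taking the maximum.

```python
import math

def max_pool_single(feature_map, k=2, stride=2):
    """k x k max pooling over one single-channel map (list of rows)."""
    h, w = len(feature_map), len(feature_map[0])
    out_h = math.ceil(h / stride)
    out_w = math.ceil(w / stride)
    pad_h = max((out_h - 1) * stride + k - h, 0)
    pad_w = max((out_w - 1) * stride + k - w, 0)
    top, left = pad_h // 2, pad_w // 2
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            window = []
            for ky in range(k):
                for kx in range(k):
                    y = oy * stride + ky - top
                    x = ox * stride + kx - left
                    # Skip positions that fall in the padding.
                    if 0 <= y < h and 0 <= x < w:
                        window.append(feature_map[y][x])
            row.append(max(window))
        out.append(row)
    return out

if __name__ == "__main__":
    fm = [[1, 3, 2, 4],
          [5, 6, 7, 8],
          [9, 2, 1, 0],
          [3, 4, 5, 6]]
    print(max_pool_single(fm))   # each 2 x 2 block collapses to its maximum
```

With k=2 and stride 2, as in maxpool2d, each spatial dimension is halved, which is how 28 becomes 14 and then 7 in this network.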
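Judging from the output, this CNN reaches an accuracy of 0.984375 on the first 256 test images (252 of 256 correct), better than the earlier softmax classifier and the simple neural network.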