Networks Using Blocks (VGG)

While AlexNet offered empirical evidence that deep CNNs can achieve good results, it did not provide a general template to guide subsequent researchers in designing new networks. In the following sections, we introduce several heuristic concepts commonly used to design deep networks. Researchers have moved from thinking in terms of individual neurons, to whole layers, and now to blocks: repeating patterns of layers. The idea of using blocks first emerged from the Visual Geometry Group (VGG) at Oxford University, in their eponymously-named VGG network. It is easy to implement these repeated structures in code with any modern deep learning framework by using loops and subroutines.

VGG Blocks

The basic building block of classic CNNs is a sequence of the following: (i) a convolutional layer with padding to maintain the resolution, (ii) a nonlinearity such as a ReLU, and (iii) a pooling layer such as a maximum pooling layer. One VGG block consists of a sequence of convolutional layers, followed by a maximum pooling layer for spatial downsampling. In the original VGG paper (Simonyan & Zisserman, 2014), the authors employed convolutions with 3 × 3 kernels with padding of 1 (keeping height and width) and 2 × 2 maximum pooling with a stride of 2 (halving the resolution after each block). In the code below, we define a function called vgg_block to implement one VGG block.
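Before writing any code, it helps to verify the shape arithmetic behind these choices; the following is a small illustrative calculation, not part of the original listing:

# Output size of a convolution or pooling layer: floor((H + 2*padding - kernel) / stride) + 1
H = 224
after_conv = (H + 2 * 1 - 3) // 1 + 1           # 3 x 3 kernel, padding 1, stride 1 -> 224 (resolution kept)
after_pool = (after_conv + 2 * 0 - 2) // 2 + 1  # 2 x 2 pool, stride 2 -> 112 (resolution halved)
print(after_conv, after_pool)                   # 224 112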

VGG Network

The original VGG network has five convolutional blocks: the first two contain one convolutional layer each, and the latter three contain two convolutional layers each. The first block has 64 output channels, and each subsequent block doubles the number of output channels until that number reaches 512. Since this network uses 8 convolutional layers and 3 fully-connected layers, it is often called VGG-11.
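As a quick sanity check on the name, we can count the layers from the block specification that reappears in the code below (an illustrative sketch):

# (num_convs, num_channels) per block, matching the conv_arch defined later
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
num_convs = sum(pair[0] for pair in conv_arch)  # 1 + 1 + 2 + 2 + 2 = 8
print(num_convs + 3)                            # 8 conv + 3 dense layers = 11, hence "VGG-11"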

Now let's get started with the code, written in Python using the MXNet framework.

# Uninstall the CPU version of MXNet so that we can install the GPU build
# Note: to run on the CPU instead, skip these two lines and simply pip install mxnet
!pip uninstall mxnet
!pip install -U mxnet-cu101==1.7.0
!pip install d2l
from mxnet import np, npx, init
from mxnet.gluon import nn
from d2l import mxnet as d2l
npx.set_np()
# Check which devices are available
npx.cpu(), npx.gpu(), npx.num_gpus()
(cpu(0), gpu(0), 1)
def vgg_block(num_convs, num_channels):
  blk = nn.Sequential()
  for _ in range(num_convs):
    # 3 x 3 convolutions with padding 1 keep the height and width unchanged
    blk.add(nn.Conv2D(num_channels, kernel_size=3, padding=1, activation='relu'))
  # 2 x 2 max pooling with stride 2 halves the resolution
  blk.add(nn.MaxPool2D(pool_size=2, strides=2))
  return blk
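To see what a single block does, we can run a dummy tensor through one instance; a quick check that relies only on the imports above and is not part of the original listing:

# A block with two 3 x 3 convolutions and 128 output channels
blk = vgg_block(2, 128)
blk.initialize()
Y = blk(np.random.uniform(size=(1, 64, 56, 56)))
print(Y.shape)  # (1, 128, 28, 28): channels set by the block, resolution halved

Next, we assemble the full network from these blocks: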
def vgg(conv_arch):
  net = nn.Sequential()
  # The convolutional part: one VGG block per (num_convs, num_channels) pair
  for (num_convs, num_channels) in conv_arch:
    net.add(vgg_block(num_convs, num_channels))
  # The fully connected part (Gluon's Dense layers flatten their input by default)
  net.add(nn.Dense(4096, activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(4096, activation='relu'),
          nn.Dropout(0.5),
          nn.Dense(10))
  return net
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
net = vgg(conv_arch)

Next, we construct a single-channel data example with a height and width of 224 to observe the output shape of each layer.

net.initialize(ctx=d2l.try_gpu())
X = np.random.uniform(size=(1, 1, 224, 224))
X = X.as_in_ctx(d2l.try_gpu())
for blk in net:
  X = blk(X)
  print(blk.name, 'output shape:\t', X.shape)
sequential1 output shape:	 (1, 64, 112, 112)
sequential2 output shape:	 (1, 128, 56, 56)
sequential3 output shape:	 (1, 256, 28, 28)
sequential4 output shape:	 (1, 512, 14, 14)
sequential5 output shape:	 (1, 512, 7, 7)
dense0 output shape:	 (1, 4096)
dropout0 output shape:	 (1, 4096)
dense1 output shape:	 (1, 4096)
dropout1 output shape:	 (1, 4096)
dense2 output shape:	 (1, 10)

Since VGG-11 is computationally more demanding than AlexNet, we construct a network with a smaller number of channels. This is more than sufficient for training on Fashion-MNIST.

ratio = 4
# Divide the channel count of every block by 4; the number of conv layers is unchanged
small_vgg_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
net = vgg(small_vgg_arch)
lr, num_epochs, batch_size = 0.005, 10, 128
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
Number of Epochs::  10%|█         | 1/10 [01:43<15:32, 103.59s/it]
loss 2.116, train acc 0.282, test acc 0.625
Number of Epochs::  20%|██        | 2/10 [03:24<13:41, 102.70s/it]
loss 0.830, train acc 0.687, test acc 0.797
Number of Epochs::  30%|███       | 3/10 [05:04<11:54, 102.08s/it]
loss 0.618, train acc 0.771, test acc 0.810
Number of Epochs::  40%|████      | 4/10 [06:45<10:09, 101.60s/it]
loss 0.532, train acc 0.805, test acc 0.847
Number of Epochs::  50%|█████     | 5/10 [08:25<08:26, 101.30s/it]
loss 0.482, train acc 0.824, test acc 0.854
Number of Epochs::  60%|██████    | 6/10 [10:06<06:44, 101.08s/it]
loss 0.444, train acc 0.837, test acc 0.861
Number of Epochs::  70%|███████   | 7/10 [11:46<05:02, 100.90s/it]
loss 0.424, train acc 0.845, test acc 0.870
Number of Epochs::  80%|████████  | 8/10 [13:27<03:21, 100.83s/it]
loss 0.403, train acc 0.854, test acc 0.874
Number of Epochs::  90%|█████████ | 9/10 [15:08<01:40, 100.80s/it]
loss 0.385, train acc 0.860, test acc 0.878
Number of Epochs:: 100%|██████████| 10/10 [16:48<00:00, 100.89s/it]
loss 0.369, train acc 0.866, test acc 0.882

Accuracy on the test data

import mxnet as mx

for x, labels in test_iter:
  x = x.as_in_ctx(d2l.try_gpu())
  labels = labels.as_in_ctx(d2l.try_gpu())
  predicts = net(x)
  # Score only the first test mini-batch (note the break below)
  ce = mx.metric.Accuracy()
  ce.update(mx.nd.array(labels.asnumpy()), mx.nd.array(predicts.asnumpy()))
  print(ce.get())
  break
('accuracy', 0.8828125)
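Note that the loop above scores only the first mini-batch because of the break. To evaluate the entire test set, the d2l module imported earlier provides a helper; a minimal sketch, assuming d2l's evaluate_accuracy_gpu as shipped with the book's library:

# Accuracy over the whole test set rather than a single mini-batch
print(d2l.evaluate_accuracy_gpu(net, test_iter))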

Accuracy on the training data

for x, labels in train_iter:
  x = x.as_in_ctx(d2l.try_gpu())
  labels = labels.as_in_ctx(d2l.try_gpu())
  predicts = net(x)
  # Score only the first training mini-batch (note the break below)
  ce = mx.metric.Accuracy()
  ce.update(mx.nd.array(labels.asnumpy()), mx.nd.array(predicts.asnumpy()))
  print(ce.get())
  break
('accuracy', 0.890625)
Finally, we compute the macro-averaged F1 score for the same mini-batch with scikit-learn:

from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, roc_auc_score
# Ground-truth labels and predicted classes from the last mini-batch above
y_true = labels.asnumpy()
y_pred = np.argmax(predicts, axis=1).asnumpy()
f1_score(y_true, y_pred, average='macro')
0.8913966953276781
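Since confusion_matrix and accuracy_score are already imported above, we can inspect the same mini-batch a little further; an illustrative extra that is not part of the original post:

# Plain accuracy plus the per-class confusion structure for the same mini-batch
print(accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))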
