Computer Vision I
Machine Problem 6
CS 415
Nikolaos Agadakos
Figure 1: An FMNIST Sample
1 Overview
In this exercise we will explore the use of Deep Neural Networks (DNNs) and their practice
in Computer Vision. DNNs, having undergone massive exploration in the past decade, have
proven to be powerful tools in the discipline of Computer Vision, especially in the task of
image classification. In this problem set we will train and evaluate a DNN to classify images
from a widely available and popular dataset, Fashion MNIST. We will go through the basics
of data handling, network training, evaluation, and testing, while briefly covering the subjects
of overfitting and network specificity. Detailed instructions and descriptions for each topic
are provided below.
1.1 Dataset Description
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000
examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image,
associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct
drop-in replacement for the original MNIST dataset for benchmarking machine learning
algorithms. It shares the same image size and structure of training and testing splits and
is often seen as an alternative to the classic MNIST, providing a slightly more challenging
classification task. A sample of the dataset can be seen in Figure 1.

Class Label Item Description

1 T-shirt/top
2 Trouser
3 Pullover
4 Dress
5 Coat
6 Sandal
7 Shirt
8 Sneaker
9 Bag
10 Ankle Boot
Table 1: Fashion MNIST classes and their associated labels.
1.1.1 Content
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels.
Each pixel has a single pixel-value associated with it, indicating the lightness or darkness
of that pixel, with higher numbers meaning darker. This pixel-value is an integer between
0 and 255. The training and test data sets have 785 columns. The first column contains
the class label (see Table 1) and identifies the article of clothing; the remaining columns
contain the pixel-values of the associated image.
Each training and test example is assigned one of the labels in Table 1. In short: each
row is a separate image, column 1 is the class label, and the remaining 784 columns hold
the pixel values, each an integer between 0 and 255 giving the darkness of that pixel.
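As a quick illustration of this layout, the sketch below uses NumPy on a randomly generated stand-in row (the label value 4 is just an example) to recover the class label and the 28×28 image from a single 785-column row:

```python
import numpy as np

# Hypothetical row following the layout described above:
# column 0 holds the class label, columns 1..784 hold the pixel values.
row = np.random.randint(0, 256, size=785)
row[0] = 4  # e.g. label 4, "Dress" in Table 1

label = row[0]
pixels = row[1:].reshape(28, 28)  # recover the 2D 28x28 grayscale image

print(label, pixels.shape)  # 4 (28, 28)
```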
1.2 Problem Statement
The objective for this problem is to train a network that can classify an unknown sample,
drawn from the test set of the available data, into one of the 10 classes defined in Table 1.
We will use the labels in column one for identification, and we will evaluate the performance
of the implemented network with the popular metrics of Accuracy, Precision, and Recall. We
will gradually build an increasingly powerful network and observe the ramifications of
increasing network depth on performance!
1.3 DNN Framework: Requirements and Installation
This document will cover the installation and use of PyTorch [Pas+17] for the purposes of
this assignment, although you are not limited to this specific framework for solving the tasks
in section 2.
PyTorch is a rapidly growing deep learning framework developed by Facebook. It is very
popular in the academic community for its descriptive documentation, unrestricted use,
intuitive design, and Pythonic feel! It is very commonly used for prototyping and rapid
evaluation.

1 Numpy
2 Pip or Conda
3 PyTorch
4 PIL (imaging, optional)
Table 2: Software requirements for this MP
PyTorch offers a wide range of ready-made tools for managing, training, and deploying
networks that work seamlessly together, and a very active, broad community that contributes
regularly and provides answers and support for the myriad problems arising during
development; tutorials and resources are widely available. With frameworks such as PyTorch
you do not need to implement the essential algorithms for DNN training, such as Stochastic
Gradient Descent (SGD) and Backpropagation; they are already implemented for you. A
developer's task is mostly to manage the data, define the network architecture, and
handle/monitor the training process.
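As a minimal sketch of that division of labor, the toy loop below drives PyTorch's built-in SGD and autograd on a placeholder one-layer model with fake data; the model, learning rate, and batch size are illustrative, not part of the assignment's architecture:

```python
import torch
import torch.nn as nn

# The framework supplies SGD and backpropagation; we only define a
# (toy) model and drive the training step.
model = nn.Linear(784, 10)                                  # placeholder architecture
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # SGD, ready-made
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 784)                # fake batch of 8 flattened images
y = torch.randint(0, 10, (8,))         # fake labels

optimizer.zero_grad()                  # clear any stale gradients
loss = criterion(model(x), y)          # forward pass
loss.backward()                        # backpropagation, done for us
optimizer.step()                       # one SGD parameter update
```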
1.3.1 Requirements
PyTorch can run on macOS, Linux, and Windows. The official site contains information
on how to install the framework in each environment. A GPU is not required, although it
will significantly increase the speed of training and testing: from a few tens of minutes
down to a handful of minutes. As a student, you can gain access to basic GPU resources
through Amazon Web Services (AWS). Amazon provides certain free credits to all students,
per year, to use for cloud computing as part of its dedication to education and training,
and UIC students get an even larger amount due to a direct collaboration! Instructions on
how to create an account can be found on the official AWS site.
1.3.2 Data Downloading
PyTorch, among its many strengths, comes equipped with a wide assortment of tools
specifically for Computer Vision! One of those tools is its data-downloading machinery. To
get the data required for this assignment, we only need to import it with the following
commands:
Algorithm 1: Dataloading and Preprocessing
1  # Define a transformation to normalize the data.
2  trans = tTrans.Compose([tTrans.ToTensor(),
3                          tTrans.Normalize((mean,), (std,))])
4  # Load the dataset.
5  mnistTrainset = tdata.FashionMNIST(root='./data', train=True,
6                                     download=True, transform=trans)
7  mnistTestset = tdata.FashionMNIST(root='./data', train=False,
8                                    download=True, transform=trans)
9
10 # Once we have a dataset, torch.utils has a very nice library for iterating
11 # over that dataset, with shuffle AND batch loading implemented. Very useful
12 # in larger datasets; generally batch sizes are taken in the 8, 16, 32, or
13 # maybe 64 range. The why comes from optimization theory.
14 trainLoader = torch.utils.data.DataLoader(mnistTrainset, batch_size=batch,
15                                           **comArgs)
16 testLoader = torch.utils.data.DataLoader(mnistTestset, batch_size=10*batch,
17                                          **comArgs)
18 # End of DataLoading -------------------
In the above snippet we define a transform object on line 2, telling the dataset object
created on line 5 to normalize the data to the 0-1 value range. This is a standard practice
in machine learning, as it can make training on data with different features easier. Several
features might have widely different value ranges, which can lead to numerical problems
such as exploding or vanishing gradients, overflowing numbers, and domination of
smaller-valued features by larger ones (think of weight in pounds as a feature in contrast
to height in meters…). Lines 14 and 16 create loader objects which PyTorch niftily uses
to do all the data splitting into batches, shuffle the batches into a random order, and act
as iterators so we can easily access our data in a for loop! That is all we need to do to
begin working on our data!
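To illustrate the iteration pattern the loaders enable, here is a self-contained sketch that builds a DataLoader over a fake stand-in dataset (so nothing needs to be downloaded; all sizes are illustrative) and walks it in a for loop:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 100 fake 28x28 single-channel "images" with labels.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# The loader splits the data into shuffled batches and acts as an iterator.
for batchIdx, (data, target) in enumerate(loader):
    pass  # data: (batch, 1, 28, 28), target: (batch,)

print(batchIdx + 1)  # 4 batches: ceil(100 / 32)
```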
2 Tasks
Use the provided template file to get an idea of how a basic PyTorch training file looks.
There is extensive commentary within the file; make sure to read it carefully. You can use
that file to gradually build your network to tackle the tasks presented below. Also, make
sure to read the online documentation for any PyTorch toolkit or library used, to get a
deeper understanding of how the framework works.
2.1 Towards a Deeper architecture
We will explore how building a gradually deeper architecture impacts performance! In general, for all the tasks, the final layer, hereafter referred to as “classification layer”, will be
a dense layer 1 that accepts as input the output of the last hidden layer and is output is
the number of classes required for our problem. Finally, for our purposes performance is
defined as accuracy and per-epoch train and test time, for all tasks, except the final one,
where you also need to report per class Precision and recall. Note that figuring out the
dimensions at each layer’s input and output is part of the assignment.
1A dense layer is a fully-connected layer.
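One possible way to compute these metrics from predictions is via a confusion matrix, as sketched below; the helper name per_class_metrics is ours, not part of any library, and labels are assumed to be 0-indexed:

```python
import numpy as np

# Per-class precision and recall from true vs. predicted labels,
# computed through a confusion matrix (rows: true, cols: predicted).
def per_class_metrics(yTrue, yPred, numClasses):
    cm = np.zeros((numClasses, numClasses), dtype=int)
    for t, p in zip(yTrue, yPred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # TP / predicted-as-class
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # TP / actually-in-class
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall

# Tiny worked example with two classes.
acc, prec, rec = per_class_metrics([0, 0, 1, 1], [0, 1, 1, 1], 2)
print(acc)  # 0.75
```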
2.1.1 Basic network
Build a network with a single hidden dense layer. You are free to choose the output dimension
of this layer. Note that you will need to flatten the 2D input to 1D before you pass it through
the layer. Report performance.
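One possible sketch of such a network follows; the hidden size of 128 is a free choice, and the class name is ours:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Single hidden dense layer plus the classification layer.
class BasicNet(nn.Module):
    def __init__(self, hiddenDim=128):
        super().__init__()
        self.hidden = nn.Linear(28 * 28, hiddenDim)  # flattened 28x28 input
        self.classify = nn.Linear(hiddenDim, 10)     # classification layer

    def forward(self, x):
        x = x.view(x.size(0), -1)        # flatten the 2D image to 1D
        x = F.relu(self.hidden(x))
        return self.classify(x)

out = BasicNet()(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```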
2.1.2 Expanded Basic network
Build a network with 2 hidden dense layers. Report performance. Build a network with 3
hidden dense layers. Report performance.
2.2 Block Networks
A common practise in DNN’s is to create compound layer or block layers. These are defined
as a combination of basic layers foe example a block layer can be a linear layer, followed by
a convolutional layer and then a drop-off layer. Many authors label these compound layers
so as to discern them and for easy of writing. Block layers are usually repeated multiple
times and data are “fed” through them. The actual determination of what layers should be
bundled together and what the parameters are is specific to the problem at hand. for our
purposes we will consider the following block:
1. Convolutional Layer
2. Downsampling Layer
3. Max-Pool Layer
The block layer itself produces a 2D output. The simplest way to classify is to “unroll” or
“flatten” the 2D output to 1D and perform a final round of feature learning in 1D space and
then classify. To do this we need the following “block-linear” layer:
1. Dense Layer (Flatten 2D to 1D)
2. Dense Layer (Compression of long 1D flattened layer to smaller dimension)
Finally, recall that to classify we need a dense layer placed at the very end of our
network (our output layer), and that its output size is the number of classes we need to
be able to distinguish. Hopefully, the network can learn weights that will enable it to
assign the proper class to each test sample!
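Under one possible reading of the descriptions above, a block, a block-linear layer, and the final classification layer could be sketched as follows; all channel counts, kernel sizes, and hidden dimensions are illustrative choices, not prescribed by the assignment:

```python
import torch
import torch.nn as nn

# One interpretation of the block: a convolution, a strided-convolution
# downsampling step, then max-pooling.
def block(inCh, outCh):
    return nn.Sequential(
        nn.Conv2d(inCh, outCh, kernel_size=3, padding=1),             # conv layer
        nn.Conv2d(outCh, outCh, kernel_size=3, stride=2, padding=1),  # downsampling
        nn.MaxPool2d(kernel_size=2),                                  # max-pool
    )

# Block-linear layer: flatten the 2D feature maps, then compress.
blockLinear = nn.Sequential(
    nn.Flatten(),                    # "unroll" 2D output to 1D
    nn.Linear(16 * 7 * 7, 256),      # dense layer on the flattened features
    nn.Linear(256, 64),              # compress to a smaller dimension
)
classifier = nn.Linear(64, 10)       # final classification layer

# 28x28 -> 14x14 (stride-2 conv) -> 7x7 (max-pool), then flatten and classify.
net = nn.Sequential(block(1, 16), blockLinear, classifier)
x = torch.randn(4, 1, 28, 28)
print(net(x).shape)  # torch.Size([4, 10])
```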
2.2.1 Intuition behind layers
On a intuitive level, remark on the reason behind how such a block configuration would be
useful to use in image classification tasks, i.e how would you expect a Convolutional layer
could learn features important for classifying an image? How could downsampling help us?
What is the benefit of max-pooling?
2.2.2 Simple Block Network
Build a network with a single block layer followed by a single dense layer. Report performance.
2.2.3 Extended Block Network
Build a network with two block layers followed by a single dense layer. Report performance.
2.2.4 Full Network
Build a network with two block layers followed by a block-linear layer. Report performance
(recall that here we need overall accuracy, per class Precision and Recall).
2.2.5 Bonus: Robust Full Network
Build a network with two block layers followed by a block-linear layer. Add a dropout
layer in your block layers, right after the downsampling layer. Report performance (recall
that here we need overall accuracy and per-class Precision and Recall). Explain briefly
why such a layer would help in the training or testing of our network. What do you observe
in comparison to the network version with no dropout?
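One way such a block could look with the dropout inserted right after the downsampling step; the dropout rate of 0.25 and all channel and kernel sizes are illustrative:

```python
import torch
import torch.nn as nn

# Block with 2D dropout after the downsampling layer: during training it
# randomly zeroes whole feature maps, discouraging co-adaptation.
def robustBlock(inCh, outCh, pDrop=0.25):
    return nn.Sequential(
        nn.Conv2d(inCh, outCh, kernel_size=3, padding=1),             # conv layer
        nn.Conv2d(outCh, outCh, kernel_size=3, stride=2, padding=1),  # downsampling
        nn.Dropout2d(pDrop),                                          # dropout
        nn.MaxPool2d(kernel_size=2),                                  # max-pool
    )

x = torch.randn(4, 1, 28, 28)
print(robustBlock(1, 16)(x).shape)  # torch.Size([4, 16, 7, 7])
```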

References
[Pas+17] Adam Paszke et al. "Automatic Differentiation in PyTorch". In: NIPS Autodiff
Workshop. 2017.
FMNIST sample (Figure 1): https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna content/uploads/2019/02/Plot- of- a- Subset- of- Images

