Applying ANN | Digit and Fashion MNIST

Ben Roshan · Published in Analytics Vidhya · Oct 27, 2020 · 11 min read

Introduction

Deep learning has opened a new era of computational power for processing datasets. It has also paved the way for processing unstructured data such as images, audio and video, which normal machine learning models take long hours to train on. With the help of neural networks and backpropagation we can minimize the loss in our predictions and become more accurate.

What is Deep Learning ?

Deep learning is a part of machine learning where the model learns with the help of deep neural networks, which resemble the human brain. Complex data problems can be solved with the help of deep learning.

Image Classification

If you give a human an apple, he or she easily identifies it as a red fruit and names it ‘Apple’. Internally the person’s brain captures the image of the apple and compares it with the historical images (training data) the person has seen before, along with someone having told them it was an apple (the label). With this data and label the brain has been trained to classify any fruit shown to it. The same data can be fed to a trained neural network, and it will output the name of the image.

Acknowledgements

  1. Keras Documentation
  2. A Gentle Introduction to ANN- Naresh Bhat
  3. Comprehensive Guide to ANN with Keras- Prashant Banerjee

Problem statement

Here we work on two datasets. The first, MNIST (Modified National Institute of Standards and Technology database), is a large database of handwritten digits that is commonly used for training various image processing systems. The other is Fashion MNIST, a similarly large database of clothing apparel. Our task is to build an efficient deep learning model to identify handwritten digits and clothing accessories.

Objectives of Project

The objectives of the project are:

  1. Exploratory Data Analysis of MNIST
  2. Data preprocessing
  3. Building Deep learning model(ANN)
  4. Evaluation of model

Note: This is my first deep learning project and I have used the Fashion MNIST and digit MNIST datasets for practice.

Importing Libraries
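The import statements are not listed explicitly in the article; the following is a minimal sketch of what the rest of the notebook appears to rely on (TensorFlow's bundled Keras and the scikit-learn wrapper are assumptions on my part):

#Core libraries (sketch of assumed imports)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Keras / scikit-learn pieces used later in the article
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix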

Loading both datasets

Let’s welcome our MNIST datasets
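The loading code is not shown in the article; a plausible sketch is below. The Kaggle CSV paths for Fashion MNIST are assumptions, while the digit data comes from the built-in Keras loader (matching the X/y arrays used later):

#Fashion MNIST as dataframes (assumed Kaggle CSV paths)
fashion_train = pd.read_csv("../input/fashionmnist/fashion-mnist_train.csv")
fashion_test = pd.read_csv("../input/fashionmnist/fashion-mnist_test.csv")

#Digit MNIST ships with Keras, already split into train and test arrays
(X_train_digit, y_train_digit), (X_test_digit, y_test_digit) = mnist.load_data()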

What does the image dataset look like?

You might expect the train and test datasets to contain only images, but a computer doesn’t understand images, only numbers. Let’s see what the data looks like:

fashion_train.head()

In the Fashion MNIST dataset, the label number is not a digit but the id of a clothing accessory. We can reconstruct the image from the pixel values given in the record. Each pixel value varies between 0 and 255: the lowest intensity (0) is black, the highest (255) is white, and there are many shades in between.

Train test split-Fashion MNIST

We need to split our Fashion MNIST dataset into input and label data. The digit MNIST data has already been loaded as X/y train and test arrays, so there is no need to split it.

X_train_fashion = fashion_train.drop('label',axis = 1)
y_train_fashion = fashion_train['label']
X_test_fashion = fashion_test.drop('label',axis = 1)
y_test_fashion = fashion_test['label']

Exploratory Data Analysis

Visualizing the numbers

Let’s take a look at the images in each dataset. In the fashion dataset the data comes as a dataframe, and in that format we can’t view the images, so we first reshape each row into a 28×28 array to recover the individual images.
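The reshaping and plotting code is not reproduced in the article; a sketch of what it might look like follows. The class-name list follows the official Fashion MNIST label order, and the variable names fash_names and x_train_reshape are my own (the latter matches the name used in a later snippet):

#Official Fashion MNIST class names, in label order
fash_names = ['T-shirt/top','Trouser','Pullover','Dress','Coat',
              'Sandal','Shirt','Sneaker','Bag','Ankle boot']

#Reshape each row of 784 pixel columns back into a 28x28 image
x_train_reshape = X_train_fashion.values.reshape(-1, 28, 28)

#Visualizing the clothing items
plt.figure(figsize=(10,10))
for i in range(15):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(x_train_reshape[i], cmap='gray')
    plt.xlabel(fash_names[y_train_fashion[i]])
plt.show()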

Here we can see the different types of clothing accessories for both men and women in Fashion MNIST.

#Names of numbers in the dataset in order
col_names = ['Zero','One','Two','Three','Four','Five','Six','Seven','Eight','Nine']

#Visualizing the digits
plt.figure(figsize=(10,10))
for i in range(15):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(X_train_digit[i], cmap='gray')
    plt.xlabel(col_names[y_train_digit[i]])
plt.show()

Here we can see the handwritten digits from the MNIST dataset. Notice that all the handwritten samples differ from one another, which makes prediction challenging for a computer, but a neural network handles it with ease.

Pixel intensity of images

We know that pixel values range between 0 and 255, with 0 being the lowest intensity (black) and 255 the highest (white). Let’s check out the intensity of each pixel with the help of a handy function taken from Naresh Bhat’s notebook.
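The helper itself is not reproduced in the article; below is a minimal sketch of a visualize_input-style function, based on the widely shared version, with the colour styling simplified (the original notebook may style low-intensity pixels differently, e.g. in green):

#Sketch: show an image and annotate every pixel with its intensity value
def visualize_input(img, ax):
    ax.imshow(img, cmap='gray')
    width, height = img.shape
    thresh = img.max() / 2.5
    for x in range(width):
        for y in range(height):
            ax.annotate(str(round(img[x][y], 2)), xy=(y, x),
                        horizontalalignment='center',
                        verticalalignment='center',
                        color='white' if img[x][y] < thresh else 'black')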

#Visualizing for digit MNIST
fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(111)
visualize_input(X_train_digit[1], ax)
plt.show()

We took the ‘0’ image; you can see that the highest-intensity pixels (around 220–255) appear in bright colours, while the rest (shown in green) have 0 intensity.

#Visualizing for Fashion MNIST
fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(111)
visualize_input(x_train_reshape[1], ax)
plt.show()

We took the shoe image; again the highest-intensity pixels (around 220–255) appear in bright colours, while the rest (shown in green) have 0 intensity. Here there are also duller pixels inside the object, and those have been captured as well.

Count of labels in MNIST

#Setting plot size
sns.set(rc={'figure.figsize':(11.7,8.27)})

#Getting dataframe data
mnist = pd.read_csv("../input/digit-recognizer/train.csv")

#Countplot
ax = sns.countplot(x="label", data=mnist,
                   facecolor=(0, 0, 0, 0),
                   linewidth=5,
                   edgecolor=sns.color_palette("dark", 3),
                   order=mnist['label'].value_counts().index)

Insights:

  • The labels of the MNIST dataset are well balanced.
  • The most frequent label is ‘1’, followed by ‘7’ and ‘3’.

There’s no need to analyse Fashion MNIST because that data has exactly 6,000 records for each label.

Data Processing

Reshaping Digit MNIST

The digit MNIST images are loaded as 2D arrays, which can’t be fed to a dense neural network since it expects 1D input, so we flatten them with the help of the reshape function. Let’s confirm by checking the dimensions of the training data.

Here the array has 3 dimensions, where the first corresponds to the records and the remaining two to each 2D image. Let’s check the shape.
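A quick sketch of that check (the printed values reflect the standard Keras MNIST split):

#Checking dimensionality and shape of the digit training data
print(X_train_digit.ndim)   # 3
print(X_train_digit.shape)  # (60000, 28, 28)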

From the train shape we can see that we have (60000, 28, 28). Here 60000 is the number of records and 28×28 is the size of each 2D image. Each image can therefore be represented as 784 (28×28) values, which is 1-dimensional. By converting to 1D we can feed the data to the neural network for training. Now, using the reshape function:

X_train_digit = X_train_digit.reshape(60000, 784)
X_test_digit = X_test_digit.reshape(10000, 784)

Encoding the labels

Our datasets have 10 classes each. Now let’s encode the class labels with the help of the to_categorical() function from the Keras utils library. If the label is ‘5’, a one is placed at index 5 of the vector, and so on for all the class labels. Let’s see it visually.
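The encoding step is not shown in the article; a sketch of how it might be done for both datasets:

#One-hot encoding the labels of both datasets
y_train_digit = to_categorical(y_train_digit, num_classes=10)
y_test_digit = to_categorical(y_test_digit, num_classes=10)
y_train_fashion = to_categorical(y_train_fashion, num_classes=10)
y_test_fashion = to_categorical(y_test_fashion, num_classes=10)

#For example, the label 2 becomes [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
print(to_categorical([2], num_classes=10))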

From the result we can see that for the number 2 the position at index 2 of the vector is activated, and that is how it is encoded. This is the same idea as one-hot encoding.

Building Deep Learning model- Artificial Neural Network

An Artificial Neural Network resembles the brain’s neural network, with densely connected neurons between the input and output layers. The hidden layers are where the internal processing happens in an ANN. The network’s objective is to minimize the loss (the gap between actual and predicted values) using a learning method called backpropagation, in which the weights in each connecting layer are updated over many epochs so that the loss keeps decreasing.

First let’s build and compile an ANN model without any hyperparameter tuning, and then apply hyperparameter tuning to see how the model accuracy is affected.

In this stage we follow 3 steps

  1. Defining the model
  2. Compile the model with loss function
  3. Fitting the model to our data

Defining the model

To define the model we use the Sequential() function, which gives us the base neural network onto which we add the dense layers and decide the number of neurons.

  • We use the relu activation function for the hidden layers and sigmoid for the output layer
  • Since we didn’t normalize the dataset beforehand, we use the BatchNormalization() layer to normalize inside the network
  • We also add a dropout layer after each hidden layer to reduce the chances of overfitting (a sketch of such a model follows this list)
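The model definition itself is not reproduced in the article; the sketch below follows the description above, but the hidden-layer widths (128 and 64) and the dropout rate are assumptions:

#Sketch of the base ANN: relu hidden layers, batch normalization, dropout, sigmoid output
model = Sequential()
model.add(Dense(128, input_dim=784, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(10, activation='sigmoid'))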

Compiling the model

The base model of the neural network is ready. It’s time to connect the brain of the neural network: in this part we tell the network how to learn by specifying the loss function, the optimizer and the metrics to use.

  • Optimizer: Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on the training data. It doesn’t use a constant learning rate like SGD; it adapts the learning rate as training progresses.
  • Loss function: Categorical crossentropy is a loss function used in multi-class classification tasks, i.e. tasks where an example can belong to only one out of many possible categories and the model must decide which one. Formally, it quantifies the difference between two probability distributions.
  • Metrics: Accuracy is the ratio of the number of correct predictions to the total number of input samples.

#Compiling the model
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=['accuracy'])

Fitting the model

Now it’s time to train our neural network. Since we didn’t use any hyperparameter tuning here, I’m simply choosing batch size and epoch values myself that performed well in this case.

model.fit(X_train_digit, y_train_digit, batch_size=100, epochs=30)
Last 5 epochs of training (digit MNIST)

From the training results we can see that after just 30 epochs the accuracy has reached approximately 95% and the loss is 0.19, which is very good. We can expect even better accuracy with hyperparameter tuning.
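The corresponding fit for the fashion data is not shown in the article; a sketch, assuming a second model (here called model_fash) defined and compiled exactly like the digit one:

#Fitting the fashion model (model_fash is assumed to mirror the digit model)
model_fash.fit(X_train_fashion, y_train_fashion, batch_size=100, epochs=30)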

Last 5 epochs of training (Fashion MNIST)

For the Fashion MNIST dataset we got an accuracy of 83% and a loss of around 0.5, which is decent but could be improved with hyperparameter tuning or a CNN. Let’s evaluate both models on test data.

Evaluation of Model

Test accuracy

Now it’s time to check how our models perform on unseen data. Keras provides the evaluate() function for evaluating a trained model; let’s use it to get the test accuracy on both datasets.
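A sketch of that evaluation (model_fash is the assumed fashion counterpart of model, as above):

#Evaluating both trained models on their test sets
digit_loss, digit_acc = model.evaluate(X_test_digit, y_test_digit)
fash_loss, fash_acc = model_fash.evaluate(X_test_fashion, y_test_fashion)
print('Digit MNIST test accuracy:', digit_acc)
print('Fashion MNIST test accuracy:', fash_acc)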

From the results we can see that the digit MNIST model performed better on test data (97%) than the Fashion MNIST model (87%). But the situation may change after hyperparameter tuning. Let’s display the confusion matrices.

Confusion Matrix

Let’s see how many labels were classified correctly and how many were misclassified.
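The snippet below uses y_predict_fash and y_test_fash_eval, which are not defined in the article; they could be obtained along these lines (argmax converts predicted probabilities and one-hot vectors back to class ids):

#Sketch: integer predictions and true labels for the Fashion MNIST confusion matrix
y_predict_fash = np.argmax(model_fash.predict(X_test_fashion), axis=1)
y_test_fash_eval = np.argmax(y_test_fashion, axis=1)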

#Confusion matrix for Fashion MNIST
con_mat=confusion_matrix(y_test_fash_eval,y_predict_fash)
plt.style.use('seaborn-deep')
plt.figure(figsize=(10,10))
sns.heatmap(con_mat,annot=True,annot_kws={'size': 15},linewidths=0.5,fmt="d",cmap="gray")
plt.title('True or False predicted Fashion MNIST\n',fontweight='bold',fontsize=15)
plt.show()

From the results we can see that both matrices show positive results: most of the labels are classified correctly, and very few are misclassified (the numbers outside the diagonal).

Notice that the digit MNIST model has done impeccable work compared to the Fashion MNIST one. In Fashion MNIST, label 6 (shirt) was misclassified more often than the rest.

Let’s now try hyperparameter tuning and see whether it gets improved.

Hyperparameter tuning in ANN

Hyperparameters are the variables that determine the network structure (e.g. the number of hidden units) and the variables that determine how the network is trained (e.g. the learning rate). Hyperparameters are set before training (before optimizing the weights and biases). There are a lot of parameters we could tune:

  1. Number of layers
  2. Number of neurons in each layers
  3. Batch Size
  4. Epochs
  5. Optimizer
  6. Loss function
  7. Activation

Note: I’m not covering hyperparameter tuning for all of these parameters, since it would require heavy computation and a good workstation to run smoothly.

Digit MNIST- Hyperparameter tuning

Now, let’s set up hyperparameter tuning for the digit MNIST model, and later repeat the same for Fashion MNIST.
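The grid-search setup for the digit model is not reproduced in the article; the sketch below mirrors the Fashion MNIST setup shown later. The helper and wrapper names (create_model_digit, model_digit) and the exact parameter grid are assumptions; note it has no dropout layers, which the author comments on further down.

#Sketch: build function and grid search for the digit model
def create_model_digit(layers, activation):
    model = Sequential()
    for i, nodes in enumerate(layers):
        if i == 0:
            model.add(Dense(nodes, input_dim=X_train_digit.shape[1]))
        else:
            model.add(Dense(nodes))
        model.add(Activation(activation))
    model.add(Dense(units=10, kernel_initializer='glorot_uniform', activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model_digit = KerasClassifier(build_fn=create_model_digit, verbose=0)

layers = [(20,), (40, 20), (45, 30, 15)]
activations = ['sigmoid', 'relu', 'softmax']
param_grid = dict(layers=layers, activation=activations, batch_size=[128, 256], epochs=[30])
grid = GridSearchCV(estimator=model_digit, param_grid=param_grid, cv=5)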

#Fitting the params with the training data to figure out the best params and accuracy score
grid_result = grid.fit(X_train_digit, y_train_digit)

print(grid_result.best_score_,grid_result.best_params_)

We got a best score of 91%, with the best parameters being the sigmoid activation function, a batch size of 256, and hidden layers of 40 and 20 neurons.

Evaluation of model after Hyperparameter tuning

We have got a pretty good result. Let’s evaluate the tuned model on our test data.

#Predicting from the params we got from grid search cv
pred_y = grid.predict(X_test_digit)

y_test_digit=np.argmax(y_test_digit, axis=1)

#Confusion matrix
con_mat=confusion_matrix(y_test_digit,pred_y)
plt.style.use('seaborn-deep')
plt.figure(figsize=(10,10))
sns.heatmap(con_mat,annot=True,annot_kws={'size': 15},linewidths=0.5,fmt="d",cmap="gray")
plt.title('True or False predicted digit MNIST\n',fontweight='bold',fontsize=15)
plt.show()
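The 91% test figure mentioned below could be computed along these lines (a sketch; accuracy_score is from scikit-learn):

#Sketch: test accuracy of the tuned digit model
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test_digit, pred_y))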

We got 91% test accuracy, which is quite poor compared to the model without hyperparameter tuning. It could be because I haven’t used dropout layers here.

Fashion MNIST- Hyperparameter tuning

It’s time to tune our Fashion MNIST model.

def create_model_fash(layers, activation):
    model = Sequential()
    for i, nodes in enumerate(layers):
        if i == 0:
            model.add(Dense(nodes, input_dim=X_train_fashion.shape[1]))
            model.add(Activation(activation))
            model.add(Dropout(0.3))
        else:
            model.add(Dense(nodes))
            model.add(Activation(activation))
            model.add(Dropout(0.3))

    model.add(Dense(units=10, kernel_initializer='glorot_uniform', activation='softmax'))

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

#Using Keras classifier to apply the function
model4 = KerasClassifier(build_fn=create_model_fash, verbose=0)

#Tuning the layers, activation function and batch sizes
layers = [(20,), (40, 20), (45, 30, 15)]
activations = ['sigmoid', 'relu', 'softmax']
param_grid = dict(layers=layers, activation=activations, batch_size=[128, 256], epochs=[30])

#Using GridSearchCV to fit the param dictionary
grid = GridSearchCV(estimator=model4, param_grid=param_grid, cv=5)
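The fitting step for this grid is not shown in the article; it presumably mirrors the digit model, along these lines:

#Sketch: fitting the fashion grid search and reading off the best result
grid_result = grid.fit(X_train_fashion, y_train_fashion)
print(grid_result.best_score_, grid_result.best_params_)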

We got only 78% accuracy, which is poor compared to the model without hyperparameter tuning. Only one hidden layer was chosen, together with the sigmoid activation.

#Predicting from the params we got from grid search cv
pred_y = grid.predict(X_test_fashion)

y_test_fashion=np.argmax(y_test_fashion, axis=1)

#Confusion matrix
con_mat=confusion_matrix(y_test_fashion,pred_y)
plt.style.use('seaborn-deep')
plt.figure(figsize=(10,10))
sns.heatmap(con_mat,annot=True,annot_kws={'size': 15},linewidths=0.5,fmt="d",cmap="gray")
plt.title('True or False predicted fashion MNIST\n',fontweight='bold',fontsize=15)
plt.show()

We got 78% test accuracy from the Fashion MNIST model after tuning. You can notice that products with labels 6 and 4 have been misclassified a lot. You can try adding more options to the parameter dictionary to get better results than this.

Conclusion

We have arrived at the conclusion, where I would like to recap what we did in this project. We took two popular MNIST datasets and preprocessed them. We then built an Artificial Neural Network and fed the data into it. We also performed hyperparameter tuning on the ANN and compared the results. Before signing off, I would like to point out what can be done further:

  1. More parameters can be tuned
  2. Hyperparameter tuning can also be done on selecting the number of neurons
  3. Can perform techniques such as early stopping and batch normalization

I thank everyone for reading the whole article. If you have any critical feedback or suggestions about my work, please drop them in the comments.

Find my other articles here
