Why CNN is preferred for Image Data

Why CNN is preferred for Image Data

Avinash R - July 22, 2023

ยท

11 min read

Index

1. Problem Statement
2. Dataset Description
3. Artificial Neural Network Implementation
3.1. Import Libraries & Dependencies
3.2. Import Dataset
3.3. Data Preprocess
3.4. Normalize the Data
3.5. Train Test Split
3.6. Model Training
3.7. Model Evaluation
3.8. Testing
3.9. Save Model & Weights
4. Artificial Neural Network Implementation with Feature extraction
5. Convolutional Neural Network Implementation
6. Results Comparisons
7. Author

1. Problem Statement

In this article, we will be using Artificial Neural Network (ANN), Convolutional Neural Networks (CNN) and ANN with Feature Extraction to perform binary classification of Homer and Bart Images to compare the result and time taken by each method to practically prove why CNN is preferred for image data.

2. Dataset Description

We'll use the image dataset of Homer and Bart in this article. The dataset can be taken from Kaggle. The Simpsons - Bart and Homer Data The dataset consists of Homer and Bart images.

3. Artificial Neural Network Implementation

The implementation code is available in my GitHub account. Repository Link: Homer Vs Bart ANN. We'll go through step by step explanation of the code.

3.1 Import Libraries & Dependencies

Initially, let's import the necessary libraries.

import os
import cv2
import seaborn as sns
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import save_model

3.2 Import Dataset

directory = '../input/homerbart1/homer_bart_1'  #path of extracted dataset (current location)
files = [os.path.join(directory, f) for f in sorted(os.listdir(directory))] #contains paths of each images

directory variable specifies the path of the dataset images folder;

files variable contains the file names of the images;

3.3 Data Preprocessing

height, width = 128, 128   #since all images are of different shapes reshape them to this shape
images =[]  #pixel values of all images
classes = []    #class of all images

Each images are of a different size, so we'll reshape them to equal size - (128, 128). images variable consists of image pixel values;

classes variable denotes the class of each image;

for image_path in files:
    try:    #exception handling for the irrelevant file DSstore that is in the directory (shortly can delete the file)
        image = cv2.imread(image_path)  #read image one by one
        (H, W) = image.shape[:2]    #since this is RGB image need only first two values are height and weight
    except:
        continue

    image = cv2.resize(image, (width, height))  #resize the image to 128,128,3
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) #convert to gray scale
    image = image.ravel()       #convert matrix to vector to feed to input layer
    images.append(image)    #store all vector representation of images

    image_name = os.path.basename(os.path.normpath(image_path)) #extract only filename

    if(image_name.startswith('b')): #encoding categorical data 
        class_name = 0  #bart - 0
    else:
        class_name = 1  #homer -1

    classes.append(class_name)  #store all class_names

The above code snippet reads images from the path and performs preprocessing like resizing, converting to gray scale, flatten the array. ravel() is used to flatten the image from 2D array to 1D array. After such preprocessing, the pixel values are appended to the list denoted by the variable image.

Also, labels of each file are extracted based on file names. The images starting with the file name ' b' (denoting 'Bart' images) are labeled as class 0 while the images starting with file name 'h' (denoting 'Homer' images) are labeled as class 1. The classes are appended to the variable classes.

Now, we'll convert images and classes variables to numpy representation for further processing compatibility and for ease of use.

X = np.asarray(images)  #or can use X = np.array(images)
Y = np.asarray(classes) #or can use Y = np.array(classes)

3.4 Normalize the data

scaler = MinMaxScaler()
X = scaler.fit_transform(X)                 #or just divide X by 255

Pixel values are then scaled using a min-max scaler for normalization.

3.5 Train Test Split

Let's split the entire data into training data (80%) and testing data (20%). The model is trained using the training data and the trained model is evaluated using testing data.

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state = 1)    #20% data as test data

3.6 Train Model

Now comes the model training, blindly let's specify the number of neurons in the hidden layer as the average of input and output layer units. The input layer gets images of 128x128 values each as a single dimension vector (since we have flattened the 2d array using the ravel() method). Therefore, there are 16384 neurons in the input layer. Also, since we are dealing with binary classification, the output layer consists of 1 neuron.

Model Configurations:

  • Input Layer Units: 16384

  • Output Layer Units: 1

  • Number of Hidden Layers: 3

  • Activation Function (Hidden Layers): relu

  • Activation Function (Output Layer): sigmoid

  • Optimizer: Adam

  • Loss: Binary Cross-Entropy

  • Metric: Accuracy

  • Epochs: 50

network1 = tf.keras.models.Sequential()
network1.add(tf.keras.layers.Dense(input_shape=(X_train.shape), units = 8193, activation='relu'))
network1.add(tf.keras.layers.Dense(units = 8193, activation='relu'))    #2nd layer
network1.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))  #since binary classification sigmoid activation

Then comes model compilation,

network1.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

Training for 50 epochs...

history = network1.fit(X_train, Y_train, epochs = 50)

3.7 Model Evaluation

Let's visualize the loss degradation concerning epochs.

print(history.history.keys())
plt.plot(history.history['loss']);

3.8 Testing

Let's test our model performance with the test data set.

predictions = network1.predict(X_test)

#convert predictions (probabilities) to classes
predictions = (predictions > 0.5) #threshold = 0.5

#test evaluation
print('Accuracy :', accuracy_score(predictions, Y_test)*100)
cm = confusion_matrix(Y_test, predictions)

Accuracy: 53.70370370370371

Confusion Matrix,

The classification report of the model on the test data is,

print(classification_report(Y_test, predictions))

         precision    recall  f1-score   support

           0       0.64      0.25      0.36        28
           1       0.51      0.85      0.64        26

    accuracy                           0.54        54
   macro avg       0.57      0.55      0.50        54
weighted avg       0.58      0.54      0.49        54

3.9 Save Model & Weights

Let's save the model in .json format and the trained weights in .hdf5 format.

#save json model structure
model_json = network1.to_json()   #convert model to json format
with open('network1.json', 'w') as json_file:
  json_file.write(model_json) #write the json format of model to network1.json file

#save model weights
network1_saved = tf.keras.models.save_model(network1, 'weights1.hdf5')

This is the end of Artificial Neural Networks for binary image classification. The results will be revisited in the upcoming sections for comparison purposes.

4. Artificial Neural Network Implementation with Feature extraction

Let's move directly to the core concept of feature extraction since the overall flow is covered in the previous section. Refer to the GitHub code for full code - ANN + Feature Extraction

The feature extraction considered is based on the colors. Let's take some sample images of Homer and Bart to distinguish them only based on colors. Note: The features are extracted only based on colors and not based on any other factors like shape, size etc...

The following code snippet performs the feature extraction of the classes based on colors. Z in (X, Y, Z) is the channel of the image. Since we are dealing with BGR,

Z=0=> Blue;
Z=1=>Green;
Z=2=> Red.

Therefore, we extract each channel's intensity and compare them with a set of pre-determined colors which are helpful to distinguish Homer and Bart. Here, we have considered the colors of the mouth, pants, and shoes of Homer as the features and colored them yellow to visualize the result. Similarly, we have considered the colors of t-shirts, shorts, and sneakers of Bart as features and colored them with green color to visualize the result.

#go through each pixel one by one to extract colors of each pixel
  for height in range(0,H): #each pixel in height
    for width in range(0, W): #each pixel in width
      blue = image.item(height, width, 0)     #extract value from height,width (location) and 0th channel in BGR which is blue channel
      green = image.item(height, width, 1)     #extract value from height,width (location) and 1st channel in BGR which is green channel
      red = image.item(height, width, 2)     #extract value from height,width (location) and 2nd channel in BGR which is red channel

      #Homer - brown(mouth)
      if(blue>=95 and blue<=140 and green >=160 and green <=185 and red >= 175 and red <= 200):
        image[height,width] = [0,255,255]   #color the pixel with yellow color
        mouth+=1  #incrementing that one pixel has the color of the feature

      #Homer - blue (pants)
      if(blue>=150 and blue<=180 and green >=98 and green <=120 and red >= 0 and red <= 90):
        image[height,width] = [0,255,255]   #color the pixel with yellow color
        pants+=1

      #Homer - gray(shoes)
      if(height > (H/2)):   #since shoes are present only at bottom only considering lower part of image 
        if(blue>=25 and blue<=45 and green >=25 and green <=45 and red >= 25 and red <= 45):
          image[height,width] = [0,255,255]   #color the pixel with yellow color
          shoes+=1

      #Bart - orange (tshirt)
      if(blue>=11 and blue<=22 and green >=85 and green <=105 and red >= 240 and red <= 255):
        image[height,width] = [0,255,0]   #color the pixel with green color
        tshirt+=1

      #Bart - blue (shorts)
      if(blue>=125 and blue<=170 and green >=0 and green <=12 and red >= 0 and red <= 20):
        image[height,width] = [0,255,0]   #color the pixel with green color
        shorts+=1

      #Bart - blue (sneakers)
      if(height > (H/2)):   #since sneakers are present only at bottom only considering lower part of image 
        if(blue>=125 and blue<=170 and green >=0 and green <=12 and red >= 0 and red <= 20):
          image[height,width] = [0,255,0]   #color the pixel with green color
          sneakers+=1

The below code snippet normalizes the feature area with a total number of pixels. Eg: Consider we have 100x200 pixels of an image (20000 pixels) and 2000 of the pixels are detected to have a mouth feature so (2000 / 20000) x 100 = 10 i.e,10% of the image we have pixels to detect the mouth feature

  mouth = round((mouth / (H*W))*100,9)    
  pants = round((pants / (H*W))*100,9)    
  shoes = round((shoes / (H*W))*100,9)    
  tshirt = round((tshirt / (H*W))*100,9)  
  shorts = round((shorts / (H*W))*100,9)  
  sneakers = round((sneakers / (H*W))*100,9)   

#appending features in a image to image_features
  image_features.append(mouth)
  image_features.append(pants)
  image_features.append(shoes)
  image_features.append(tshirt)
  image_features.append(shorts)
  image_features.append(sneakers)

#appending class names to image_features
  image_features.append(class_name)

#appending image_features to features (which contains all features of all images)
  features.append(image_features)

Now, after extracting the features, let's store them as a comma-separated value format.

with open('features.csv', 'w') as file:
  for l in export:
    file.write(l)
file.closed

After saving the extracted features as a CSV file, let's build our ANN model to classify the images.

Loading data and extracting input (X) and label (Y) and train-test-split is given below,

dataset = pd.read_csv('../input/featuresbart-homer/features.csv')
#Extract X and Y
X = dataset.iloc[:,:-1].values
Y = dataset.iloc[:,-1].values

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 1)

Now comes the model building, Since we have only considered 6 features (i.e., mouth, pants, shoes of Homer and t-shirt, shorts, sneakers of Bart), the input layer only required 6 neurons. Similar model configurations are used here as well.

network1 = tf.keras.models.Sequential()
network1.add(tf.keras.layers.Dense(input_shape = (6,), units = 4, activation='relu'))
network1.add(tf.keras.layers.Dense(units = 4, activation='relu'))
network1.add(tf.keras.layers.Dense(units = 4, activation='relu'))
network1.add(tf.keras.layers.Dense(units = 1, activation='sigmoid'))

The results of the training are as follows,

Accuracy: 87.03703703703704

Classification Report:

 precision    recall  f1-score   support

           0       0.96      0.79      0.86        28
           1       0.81      0.96      0.88        26

    accuracy                           0.87        54
   macro avg       0.88      0.87      0.87        54
weighted avg       0.88      0.87      0.87        54

Confusion Matrix:

This is the end of Artificial Neural Network + Feature Extraction for binary image classification. We can see that accuracy has risen!!!.... Let's now dive into Convolutional Neural Networks to see how will be the results.

5. Convolutional Neural Network Implementation

Only core concepts are discussed here to avoid redundancy. Check out this GitHub link for the entire code: Homer Vs Bart - CNN

Let's use the ImageDataGenerator to load our images. The augmentation implemented are scaling, rotation, horizontal flips and zooming.

train_generator = ImageDataGenerator(rescale = 1./255,  #generate train images
                                     rotation_range=7,
                                     horizontal_flip=True,
                                     zoom_range=0.2)

train_dataset = train_generator.flow_from_directory('../input/homerbart2/homer_bart_2/training_set',
                                                    target_size=(64, 64),
                                                    batch_size = 8,
                                                    class_mode='categorical',
                                                    shuffle=True)
test_generator = ImageDataGenerator(rescale = 1./255)

test_dataset = test_generator.flow_from_directory('../input/homerbart2/homer_bart_2/test_set', 
                                                  target_size=(64,64),
                                                  batch_size=1,
                                                  class_mode='categorical',
                                                  shuffle=False)

Let's now build our CNN model for the same classification task,

model = Sequential()

model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', input_shape=(64,64,3)))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(2,2))

model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(2,2))

model.add(Flatten())    #1152 outputs
#(1152 + 2)/2 = 577
model.add(Dense(units=577, activation='relu'))
model.add(Dense(units=577, activation='relu'))
model.add(Dense(units=2, activation='softmax'))

Model Summary,

model.summary()
tf.keras.utils.plot_model(model, to_file='Model.png', show_shapes=True)

Visualizing the training results,

plt.plot(model_history.history['loss'])
plt.plot(model_history.history['accuracy'])
plt.grid(True)
plt.legend(['Loss','Accuracy'])
plt.title('Training')
plt.savefig('Training.jpg')

Evaluation of the test data set.

predictions = model.predict(test_dataset)

#since we used categorical model(not binary) we get probabilities for an image to be in each class
#so we consider that the image belogs to the class with higher probability
#print(predictions)
predictions = np.argmax(predictions, axis=1)

#print(test_dataset.classes) #expected classes
#print('\n',predictions) #predicted classes

print('Accuracy :',accuracy_score(predictions,test_dataset.classes)*100)    #accuracy
cm = confusion_matrix(test_dataset.classes, predictions)
sns.heatmap(cm, annot=True)
print(classification_report(test_dataset.classes, predictions))

Accuracy: 92.5925925925926

Classification Report:

 precision    recall  f1-score   support

           0       0.88      1.00      0.93        28
           1       1.00      0.85      0.92        26

    accuracy                           0.93        54
   macro avg       0.94      0.92      0.93        54
weighted avg       0.94      0.93      0.93        54

Confusion Matrix

This is the end of the Convolutional Neural Network for binary image classification. We can see that accuracy has risen a lot!!!.... Let's now compare the results and come to a conclusion.

6. Results Comparison

To summarize, this blog considered the Homer - Bart image dataset for the classification task. The dataset is classified by using Artificial Neural Networks, Artificial Neural Networks with Feature Extraction and Convolutional Neural Networks. We have seen (so far) that accuracy has risen. This section compares factors other than accuracy scores for image classification and concludes.

Implementation / Factors

Training Time

Weight Storage

Accuracy

ANN

Very High

Around 25 GB

53.70370370370371

ANN+Feature Extraction

Lesser than ANN

Few KBs

87.03703703703704

CNN

Very Less

Few KBs

92.5925925925926

From the table above, it can be seen that in terms of Training Time, Space occupied by trained weights and accuracy, Convolutional Neural Networks have the bigger hand. Therefore, this proves why CNNs are considered to be the best while dealing with image datasets.

7. Author

You can find the code on my Github: Homer-Bart-Classification-ANN-Feature-Extraction-CNN-Transfer-Learning


About the Author :

Hiii, I'm Avinash, pursuing a Bachelor of Engineering in Computer Science and Engineering from Mepco Schlenk Engineering College, Sivakasi. I'm an AI enthusiast and Open-Source contributor.

Connect me through :

Feel free to correct me !! :)
Thank you folks for reading. Happy Learning !!! ๐Ÿ˜Š

Did you find this article valuable?

Support Avinash by becoming a sponsor. Any amount is appreciated!

ย