Neural Networks: Image Classification with TensorFlow

Neural Networks

The classic recognition problem goes something like this. Suppose that you have a bunch of small, square, 28×28-pixel grayscale images of handwritten single digits between 0 and 9. The task is to classify a new image of a handwritten digit into one of 10 classes, each class representing one digit from 0 to 9.

We will use a neural network. A neuron is a function that takes in the outputs of all the neurons in the previous layer and spits out a single number, called its activation. The network starts with one input neuron for each of the 28×28 = 784 pixels of the input image. The last layer has ten neurons, each one representing one of the digits; the activation of each of these neurons represents how strongly the network believes that the input image corresponds to that digit. There are also some layers in between, called the hidden layers.

We assign a weight to each of the connections between neurons. Each weight wi is multiplied by the corresponding input value ai and the products are summed: w1a1 + w2a2 + … + wnan. We then pass this weighted sum through an “activation” function, such as the sigmoid, σ(x) = 1/(1 + e^-x), which squashes the result into the range (0, 1), or ReLU (Rectified Linear Unit), relu(x) = max(0, x). The neuron’s output is therefore σ(w1a1 + w2a2 + … + wnan).

Neurons also have biases, which control how easy it is for a neuron to fire: σ(w1a1 + w2a2 + … + wnan + b), where a large negative bias b acts as a threshold that the weighted sum has to overcome. In other words, the bias shifts the weighted sum of the inputs before the activation function is applied. Written in matrix form, for a whole layer at once: a(1) = σ(Wa(0) + b). Data starts at the input layer and is transformed as it passes through subsequent layers.
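To make the notation concrete, here is a minimal NumPy sketch of one layer’s forward pass. The 784-pixel input and the 16-neuron layer are arbitrary choices for illustration, not part of the model built later:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
a0 = rng.random(784)                       # Activations of the input layer, one per pixel.
W = rng.standard_normal((16, 784)) * 0.01  # Weights connecting the 784 inputs to 16 neurons.
b = np.zeros(16)                           # One bias per neuron.
a1 = sigmoid(W @ a0 + b)                   # a(1) = σ(W a(0) + b); every value lies between 0 and 1.
print(a1.shape)                            # (16,)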

So learning is about “tuning” or “finding” the right combination of weights and biases. We have a cost function: Cost Function(weights, biases) = (total number of images incorrectly identified) ÷ (total number of images). In practice, the formula is the mean squared error, MSE = (1/n) Σ (Ŷi − Yi)².
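As a toy illustration of the MSE (the two vectors below are made-up numbers, not real network outputs):

import numpy as np

y_true = np.array([1.0, 0.0, 0.0])     # Desired activations of the output layer.
y_pred = np.array([0.7, 0.2, 0.1])     # Activations the network actually produced.
mse = np.mean((y_pred - y_true) ** 2)  # (1/n) * Σ (Ŷi − Yi)²
print(mse)                             # 0.0466...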

Now, we have a clear objective: find weights and biases which minimize the cost function. Easier said than done: how do we find them? One way would be to just pick them at random, run our neural network over the entire training data, calculate the cost function, and hope that on one of the tries we get lucky and find a value of the cost function low enough to consider it good. However, there is a more mathematical and sophisticated approach to this problem: the ‘Gradient Descent’ algorithm.

The first time it runs, it picks random weights and biases, but in subsequent iterations it doesn’t choose them randomly; it adjusts them in a calculated manner to reduce the cost function step by step (it takes the cost function’s derivative with respect to each weight and bias and moves in the direction in which the cost decreases). It keeps iterating until it reaches a minimum of the cost function. At this point, our neural network has been ‘trained’ and is ready to go: it can start “predicting” or classifying pictures that it hasn’t seen before.
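Here is a deliberately tiny, one-dimensional sketch of the idea, assuming a toy cost function cost(w) = (w − 3)², whose derivative is 2(w − 3). A real network does exactly the same thing, but over millions of weights and biases, using backpropagation to compute the derivatives:

import numpy as np

def d_cost(w):                 # Derivative of the toy cost function (w - 3)^2.
    return 2.0 * (w - 3.0)

w = np.random.default_rng(0).random()  # Start from a random guess.
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * d_cost(w)     # Step in the direction where the cost decreases.
print(round(w, 4))                     # ≈ 3.0, the minimum of the toy cost function.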

A network may have three types of layers: input layers that take raw input from the training data, hidden layers that take input from previous layers and pass output to other layers, and output layers that make a prediction. All hidden layers typically use the same activation function. We will use Rectified Linear Activation (ReLU) as our activation function and a standard feed-forward neural network.

Feedforward neural networks are artificial neural networks where information only travels forward in the network (no loops), first through the input nodes, then through the hidden nodes (if present), and finally through the output nodes.

Image Classification

We are going to use TensorFlow, an end-to-end open-source platform for machine learning. This entry is based on the article, Basic classification: Classify images of clothing, included in the TensorFlow documentation.

We will use TensorFlow and tf.keras, a high-level API to build and train models in TensorFlow.

import tensorflow as tf

# Numpy is the core library for scientific computing in Python.
import numpy as np
import matplotlib.pyplot as plt

Import and Preprocess the Data

We are going to use the Fashion MNIST dataset. It contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at a very low resolution (28 by 28 pixels). Each image is associated with a label from 10 classes: 0 (T-shirt/top), 1 (Trouser), … 9 (Ankle boot).

class BasicClasification:
    def __init__(self):
        self.fashion_mnist = tf.keras.datasets.fashion_mnist # Import and load the Fashion MNIST database directly from TensorFlow.
        (self.train_images, self.train_labels), (self.test_images, 
                                                 self.test_labels) = self.fashion_mnist.load_data()
        [... to be continued...]

It returns four NumPy arrays: train_images and train_labels are the data the model is going to use to learn (our training data). The model is going to be tested against the test_images and test_labels NumPy arrays. Let’s explore the dataset before going deeper into the modelling and training.

user@pc:~$ python
Python 3.9.5 (default, May 11 2021, 08:20:37) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf 
>>> fashion_mnist = tf.keras.datasets.fashion_mnist 
>>> (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data() 
>>> print(train_images.shape) 
(60000, 28, 28) # There are 60,000 images in the training set. Each image is made up of 28 x 28 pixels (784). 
>>> print(test_images.shape) 
(10000, 28, 28) # There are 10,000 images in the test set. 
>>> train_images[0, 7, 22] # We are having a look at a single pixel of an image.
122
>>> import matplotlib.pyplot as plt # Let's display the first image in the training set. We can observe that all pixels are between 0 (black) and 255 (white). 
>>> plt.figure() 
>>> plt.imshow(train_images[0]) 
>>> plt.colorbar() # Colorbars are a visualization of the mapping from scalar values to colors. 
>>> plt.grid(False) 
>>> plt.show()

>>> print(train_labels)
[9 0 0 ... 3 0 5] # The labels are integers ranging from 0 to 9. Each integer represents a specific article of clothing.
        [... to be continued...]
        self.class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress',
                            'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'] # We create a property in our class to map the label values (integers ranging from 0 to 9) to their names. These names are only used for displaying results; the neural network never sees them.
        self.train_images = self.train_images / 255.0 # Our data needs to be preprocessed before creating the model and training the network, so we scale all grayscale pixel values (0-255) to the range 0 to 1 by dividing them by 255.
        self.test_images = self.test_images / 255.0 # The method display_training is used to display images from the training set and verify that the data is in the correct format before feeding it to the model.
        [... to be continued 2...]
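As a quick, standalone sanity check of this preprocessing step (run outside the class; the variable names mirror the interactive session above):

import tensorflow as tf

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()
train_images = train_images / 255.0
print(train_images.min(), train_images.max())  # 0.0 1.0
print(train_images[0, 7, 22])                  # The pixel that was 122 is now 122/255 ≈ 0.478.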

Creating a model

We are going to use a Keras Sequential model, that is, a linear stack of layers; ours has three layers. It represents a feed-forward neural network.

        [... to be continued 2...]
        self.model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax')
        ])
        [... to be continued 3...]

The first layer in this network is the input layer. We use a flatten layer, tf.keras.layers.Flatten, with an input shape of (28,28). It transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels), so that each pixel will be associated with one neuron.
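Conceptually, Flatten is just a reshape; a rough NumPy equivalent of what the layer does to a single image would be:

import numpy as np

image = np.zeros((28, 28))   # One 28x28 input image.
flat = image.reshape(-1)     # What Flatten does: lay out the 784 pixels in a single row.
print(flat.shape)            # (784,)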

After the pixels are flattened, the network consists of a sequence of two tf.keras.layers.Dense layers. These are densely connected, or fully connected, neural layers. The first Dense layer is the only hidden layer. It has 128 nodes (or neurons) and uses ReLU as its activation function.

ReLU is an activation function defined as relu(x) = max(0, x), that is, 0 if x < 0 and x otherwise. An activation function outputs a small value for small inputs and a larger value once its inputs exceed a threshold; if the inputs are large enough, the activation function “fires”, otherwise it does little or nothing. In other words, an activation function acts like a gate that checks whether an incoming value is greater than a critical number. Activation functions are what add non-linearity to neural networks.
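A one-line NumPy version of ReLU, just to illustrate its behaviour:

import numpy as np

def relu(x):
    return np.maximum(0, x)  # 0 for negative inputs, the input itself otherwise.

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0. 0. 0. 1.5 3.]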

The last, or output, layer has ten neurons. The softmax activation function is used on this layer: it takes the layer’s ten raw scores (logits, which can be read as unnormalized log-probabilities), exponentiates them, and rescales them so that they all lie between 0 and 1 and sum to 1. Each neuron in this layer therefore represents the probability that a given image belongs to one of the ten classes.
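For illustration, a small NumPy implementation of softmax (the three scores are made up):

import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # Subtracting the max keeps the exponentials numerically stable.
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())  # ≈ [0.659 0.242 0.099], and the probabilities sum to 1.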

Compile the model

Before the model is ready for training, it still needs a few more settings, which are defined in the compile step: the optimizer (how the model updates its parameters based on the data it sees and its loss), the loss function (which measures how wrong the model is during training and is what the model tries to minimize), and the metrics used to monitor the training and testing steps (here, accuracy, the fraction of images that are correctly classified).

        [... to be continued 3...]
        self.model.compile(optimizer='adam',
                           loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), # The output layer already applies softmax, so the model outputs probabilities, not logits.
                           metrics=['accuracy'])
        self.isTrained = False # This flag indicates that the model has not been trained yet; it is the last instruction of the class constructor.

Training the model

Training the model is as simple as a line of code. We need to feed the training data into the model so it can learn to associate all the images with their respective labels.

Training a model is an iterative process: in each pass over the training data (called an epoch), the model makes a guess about the output, calculates the error in its guess (the loss), computes the derivatives of the error with respect to its parameters (backpropagation), and then adjusts those parameters (the model’s weights and biases) according to the gradient of the loss function.

    def training(self):
        self.model.fit(self.train_images, self.train_labels, epochs=9)
        self.isTrained = True

[…] As the model trains and iterates, the loss and accuracy metrics are displayed in the terminal.
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2488 - accuracy: 0.9076

It reaches an accuracy of 90.76% on the training data.

Evaluating the model

Next, we test or evaluate the model, that is, we check how well the model generalizes, i.e., how it performs on the test dataset. We use another built-in method from Keras, model.evaluate.

    def evaluate_model(self):
        if self.isTrained == False: # This method requires that the model has been trained.
            self.training()
        test_loss, test_acc = self.model.evaluate(self.test_images, self.test_labels, verbose=2)
        print('\nTest accuracy:', test_acc)

Test accuracy: 0.8852999806404114
The accuracy on the test set (88.53%) is a little lower than the accuracy on the training set (90.76%). This gap between training accuracy and test accuracy represents a small amount of overfitting.

Overfitting occurs when a model fits exactly against its training data. When the model memorizes the noise and fits too closely or exactly to our training set, the model becomes “overfitted,” and it is unable to generalize well to new data.

Making predictions

With the model already trained and evaluated, it is time to use it to make predictions on images. We are going to create a “predict” method in our class for that very purpose (Yes, I know, in the future people will write songs about my creativity :)).

    def predict(self, image):
        if self.isTrained == False: # This method requires that the model has been trained.
            self.training()

        prediction = self.model.predict(np.array([self.test_images[image]]))
        predicted_class = self.class_names[np.argmax(prediction)] # np.argmax returns the index of the highest value in the given NumPy array.
        correct_label = self.test_labels[image]
        self.show_image( # We will display the image with its predicted and actual class.
            self.test_images[image], self.class_names[correct_label], predicted_class)

    # It displays an image and its predicted and actual class.
    def show_image(self, img, label, guess):
        plt.figure()
        plt.imshow(img)
        plt.title("Expected: " + label)
        plt.xlabel("Guess: " + guess)
        plt.colorbar()
        plt.grid(False)
        plt.show()
    # It displays images from the training set and displays the class name below each image, too.
    def display_training(self, begin, end):
        plt.figure(figsize=(10, 10))
        for i in range(begin, end):
            plt.subplot(5, 5, i+1)
            plt.xticks([])
            plt.yticks([])
            plt.grid(False)
            plt.imshow(self.train_images[i], cmap=plt.cm.binary) # It displays an image. 
            plt.xlabel(self.class_names[self.train_labels[i]]) # It displays the class name below each image.

        plt.show()

def main():
    myBasicClasification = BasicClasification()
    myBasicClasification.display_training(0, 24)
    myBasicClasification.evaluate_model()
    for i in range(10):
        myBasicClasification.predict(i)

if __name__ == '__main__':
    main()

Classify Handwritten Digits with Tensorflow

We will use TensorFlow and tf.keras, a high-level API to build and train models in TensorFlow.

import tensorflow as tf
# Numpy is the core library for scientific computing in Python.
import numpy as np
import matplotlib.pyplot as plt

Import and Preprocess the Data

We are going to use the MNIST dataset. This is a dataset of 60,000 28x28 grayscale images of the ten digits from 0 to 9, along with a test set of 10,000 images.

class BasicClasification:
    def __init__(self):
        self.mnist = tf.keras.datasets.mnist # Import and load the MNIST dataset.
        (self.train_images, self.train_labels), (self.test_images,
                                                 self.test_labels) = self.mnist.load_data()
        # It returns four NumPy arrays: train_images and train_labels are the data the model is going to use to learn (our training data). The model is going to be tested against the test_images and test_labels NumPy arrays.
        self.train_images = tf.keras.utils.normalize(self.train_images, axis=1) # Our data needs to be preprocessed before creating the model and training the network. tf.keras.utils.normalize applies L2 normalization along the given axis (it does not simply divide by 255), which also keeps the input values small; dividing by 255.0, as in the previous example, would work as well.
        self.test_images = tf.keras.utils.normalize(self.test_images, axis=1) # The method display_training is used to display images from the training set and verify that the data is in the correct format before feeding it to the model.
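Note that tf.keras.utils.normalize is a different operation from dividing by 255: it rescales slices of each image to unit L2 norm rather than mapping pixel values into [0, 1]. Either option keeps the inputs small; a short standalone sketch of both (run outside the class):

import tensorflow as tf

(train_images, _), _ = tf.keras.datasets.mnist.load_data()
scaled = train_images / 255.0                                    # Option 1: pixel values in [0, 1].
l2_normalized = tf.keras.utils.normalize(train_images, axis=1)   # Option 2: slices rescaled to unit L2 norm.
print(scaled.max(), l2_normalized.max())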

Creating a model

We are going to use a Keras Sequential model, that is, a linear stack of layers; this time it has four layers. It represents a feed-forward neural network.

Feedforward neural networks are artificial neural networks where information only travels forward in the network (no loops), first through the input nodes, then through the hidden nodes (if present), and finally through the output nodes.

        self.model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10,  activation='softmax')
        ])

The first layer in this network is the input layer. We use a flatten layer, tf.keras.layers.Flatten, with an input shape of (28,28). It transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels), so that each pixel will be associated with one neuron.

After the pixels are flattened, the network consists of a sequence of three tf.keras.layers.Dense layers. These are densely connected, or fully connected, neural layers. The first two Dense layers are hidden layers. They have 128 nodes (or neurons) each and use ReLU as their activation function.

ReLU is an activation function defined as relu(x) = max(0, x), that is, 0 if x < 0 and x otherwise. An activation function outputs a small value for small inputs and a larger value once its inputs exceed a threshold; if the inputs are large enough, the activation function “fires”, otherwise it does little or nothing. In other words, an activation function acts like a gate that checks whether an incoming value is greater than a critical number. Activation functions are what add non-linearity to neural networks.

The last, or output, layer has ten neurons. The softmax activation function is used on this layer: it takes the layer’s ten raw scores (logits, which can be read as unnormalized log-probabilities), exponentiates them, and rescales them so that they all lie between 0 and 1 and sum to 1. Each neuron in this layer therefore represents the probability that a given image belongs to one of the ten classes: 0-9, the ten digits of our decimal system.

Compile the model

Before the model is ready for training, it still needs a few more settings, which are defined in the compile step: the optimizer (how the model updates its parameters based on the data it sees and its loss), the loss function (which measures how wrong the model is during training and is what the model tries to minimize), and the metrics used to monitor the training and testing steps (here, accuracy, the fraction of images that are correctly classified).

        self.model.compile(optimizer='adam',
                           loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), # The output layer already applies softmax, so the model outputs probabilities, not logits.
                           metrics=['accuracy'])
        self.isTrained = False # This flag indicates that the model has not been trained yet; it is the last instruction of the class constructor.

Train the model

Training the model is as simple as a line of code. We need to feed the training data into the model so it can learn to associate all the images with their respective labels.

Training a model is an iterative process: in each pass over the training data (called an epoch), the model makes a guess about the output, calculates the error in its guess (the loss), computes the derivatives of the error with respect to its parameters (backpropagation), and then adjusts those parameters (the model’s weights and biases) according to the gradient of the loss function.

    def training(self):
        self.model.fit(self.train_images, self.train_labels, epochs=3)
        self.isTrained = True

[…] As the model trains and iterates, the loss and accuracy metrics are displayed in the terminal.
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0715 - accuracy: 0.9776
It reaches an accuracy of 97.76% on the training data.

Evaluating the model

Next, we test or evaluate the model, that is, we check how well the model generalizes, i.e., how it performs on the test dataset. We use another built-in method from Keras, model.evaluate.

    def evaluate_model(self):
        if self.isTrained == False: # This method requires that the model has been trained.
            self.training()
        test_loss, test_acc = self.model.evaluate(self.test_images, self.test_labels, verbose=2)
        print('\nTest accuracy:', test_acc)

Test accuracy: 0.9700000286102295
The accuracy on the test set (97.00%) is slightly lower than the accuracy on the training set (97.76%). This small gap between training accuracy and test accuracy represents a slight amount of overfitting.

Overfitting occurs when a model fits exactly against its training data. When the model memorizes the noise and fits too closely or exactly to our training set, the model becomes “overfitted,” and it is unable to generalize well to new data.

Making predictions

With the model already trained and evaluated, it is time to use it to make predictions on images. We are going to create a “predict” method in our class for that very purpose.

    def predict(self, image):
        if self.isTrained == False: # This method requires that the model has been trained.
            self.training()

        predictions = self.model.predict(self.test_images) # Predictions for every image in the test set; predictions[image] is the row we are interested in.
        print(f"Label: {self.test_labels[image]}")
        print(f"Prediction: {np.argmax(predictions[image])}") # np.argmax returns the index of the highest value in the given NumPy array.
        self.show_image(self.test_images[image], self.test_labels[image], np.argmax(predictions[image])) # We will display the image with its predicted and actual digit.

    # It displays an image and its predicted and actual class/digit.
    def show_image(self, img, label, guess):
        plt.figure()
        plt.imshow(img)
        plt.title("Expected: " + str(label))
        plt.xlabel("Guess: " + str(guess))
        plt.colorbar()
        plt.grid(False)
        plt.show()
    # It displays images from the training set and displays the class label (digit) below each image, too.
    def display_training(self, begin, end):
        plt.figure(figsize=(10, 10))
        for i in range(begin, end):
            plt.subplot(5, 5, i+1)
            plt.xticks([])
            plt.yticks([])
            plt.grid(False)
            plt.imshow(self.train_images[i], cmap=plt.cm.binary) # It displays an image.
            plt.xlabel(str(self.train_labels[i])) # It displays the class label (digit) below each image.

        plt.show()

def main():
    myBasicClasification = BasicClasification()
    myBasicClasification.display_training(0, 24)
    myBasicClasification.evaluate_model()
    for i in range(10):
        myBasicClasification.predict(i)

if __name__ == '__main__':
    main()