Using TensorFlow 2 to Recognize the Arabic Handwritten Characters Dataset

@Author: Runsen

In this tutorial, we will use TensorFlow (Keras API) to implement a deep learning model for a multi-class classification task: recognizing a dataset of Arabic handwritten characters.

Data set download address: https://www.kaggle.com/mloey1/ahcd1

Data set introduction

The dataset consists of 16,800 characters written by 60 participants aged 19 to 40; 90% of the participants were right-handed.

Each participant wrote each character (from "alef" to "yeh") ten times on two forms, as shown in Figures 7(a) and 7(b). The forms were scanned at a resolution of 300 dpi, and Matlab 2016a was used to automatically segment each character block by its coordinates. The database is divided into two groups: a training set (13,440 characters; 480 images per class) and a test set (3,360 characters; 120 images per class). The data labels range from 1 to 28.
Here, all the data is provided as CSV files containing the image pixel values and their corresponding labels; no image files are provided.
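The per-class counts quoted above follow directly from the split sizes (a trivial sanity check, assuming 28 balanced classes):

```python
n_classes = 28
print(13440 // n_classes, 3360 // n_classes)  # 480 120
```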

Import module

import numpy as np
import pandas as pd
# Allow display() on DataFrames
from IPython.display import display
# Libraries needed to read and process images
import csv
from PIL import Image
from scipy.ndimage import rotate

Read data

# Training data: images
letters_training_images_file_path = "../input/ahcd1/csvTrainImages 13440x1024.csv"
# Training data: labels
letters_training_labels_file_path = "../input/ahcd1/csvTrainLabel 13440x1.csv"
# Test data: images and labels
letters_testing_images_file_path = "../input/ahcd1/csvTestImages 3360x1024.csv"
letters_testing_labels_file_path = "../input/ahcd1/csvTestLabel 3360x1.csv"

# Load the data
training_letters_images = pd.read_csv(letters_training_images_file_path, header=None)
training_letters_labels = pd.read_csv(letters_training_labels_file_path, header=None)
testing_letters_images = pd.read_csv(letters_testing_images_file_path, header=None)
testing_letters_labels = pd.read_csv(letters_testing_labels_file_path, header=None)

print("%d 32x32-pixel training Arabic letter images." % training_letters_images.shape[0])
print("%d 32x32-pixel test Arabic letter images." % testing_letters_images.shape[0])
training_letters_images.head()

13440 32x32-pixel training Arabic letter images.
3360 32x32-pixel test Arabic letter images.

View the unique values of the training labels

np.unique(training_letters_labels)
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], dtype=int32)

Next, we need to convert the CSV pixel values back into an image so that we can display it.

def convert_values_to_image(image_values, display=False):
    image_array = np.asarray(image_values)
    image_array = image_array.reshape(32,32).astype('uint8')
    # The original dataset is mirrored, so flip it with np.flip and then rotate it to get a correctly oriented image.
    image_array = np.flip(image_array, 0)
    image_array = rotate(image_array, -90)
    new_image = Image.fromarray(image_array)
    if display:
        new_image.show()
    return new_image
convert_values_to_image(training_letters_images.loc[0], True)


This is the letter f.

Next we preprocess the data, mainly image normalization: we rescale each image by dividing every pixel by 255, mapping the values into [0, 1].

training_letters_images_scaled = training_letters_images.values.astype('float32')/255
training_letters_labels = training_letters_labels.values.astype('int32')
testing_letters_images_scaled = testing_letters_images.values.astype('float32')/255
testing_letters_labels = testing_letters_labels.values.astype('int32')
print("Training images of letters after scaling")
print(training_letters_images_scaled.shape)
training_letters_images_scaled[0:5]

The output is as follows

Training images of letters after scaling
(13440, 1024)

From the label CSV file we can see that this is a multi-class classification problem. The next step is to encode the class labels; it is convenient to convert the class vector into a binary matrix.

The labels 1 to 28 are shifted to classes 0 to 27, so the letters from "alef" to "yeh" get class numbers 0 to 27. to_categorical converts the class vector into a binary (0/1) matrix representation.

Here we use Keras's one-hot encoding to encode these class values.

One-hot encoding converts an integer into a binary vector that contains a single "1", with every remaining element "0".
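As a quick illustration of what to_categorical produces (a toy example using the TensorFlow 2 import path; the dataset code below makes the same call on the full label arrays):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

labels = np.array([1, 3, 28])                 # raw labels run from 1 to 28
encoded = to_categorical(labels - 1, num_classes=28)
print(encoded.shape)                          # (3, 28)
print(encoded[0][:4])                         # a single 1 at index 0, the rest 0
```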

from keras.utils import to_categorical

# one hot encoding
number_of_classes = 28

training_letters_labels_encoded = to_categorical(training_letters_labels-1, num_classes=number_of_classes)
testing_letters_labels_encoded = to_categorical(testing_letters_labels-1, num_classes=number_of_classes)
print(training_letters_images_scaled.shape)
# (13440, 1024)

Next, reshape the input images to 32x32x1: with TensorFlow as the backend, a Keras CNN expects a 4D array of shape (nb_samples, rows, columns, channels).

Here nb_samples is the total number of images (samples), and rows, columns, and channels are the dimensions of each image. Since our images are 32x32-pixel grayscale images, channels is 1.

# reshape input letter images to 32x32x1
training_letters_images_scaled = training_letters_images_scaled.reshape([-1, 32, 32, 1])
testing_letters_images_scaled = testing_letters_images_scaled.reshape([-1, 32, 32, 1])

print(training_letters_images_scaled.shape, training_letters_labels_encoded.shape, testing_letters_images_scaled.shape, testing_letters_labels_encoded.shape)
# (13440, 32, 32, 1) (13440, 28) (3360, 32, 32, 1) (3360, 28)
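The reshape semantics can be illustrated on a toy array (the values here are placeholders):

```python
import numpy as np

flat = np.zeros((5, 1024), dtype='float32')  # 5 flattened 32x32 images
imgs = flat.reshape(-1, 32, 32, 1)           # -1 lets NumPy infer the sample count
print(imgs.shape)  # (5, 32, 32, 1)
```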

Design model structure

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization, Dropout, Dense

def create_model(optimizer='adam', kernel_initializer='he_normal', activation='relu'):
    # create model
    model = Sequential()
    model.add(Conv2D(filters=16, kernel_size=3, padding='same', input_shape=(32, 32, 1), kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(Conv2D(filters=32, kernel_size=3, padding='same', kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(Conv2D(filters=64, kernel_size=3, padding='same', kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))

    model.add(Conv2D(filters=128, kernel_size=3, padding='same', kernel_initializer=kernel_initializer, activation=activation))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))
    model.add(GlobalAveragePooling2D())

    #Fully connected final layer
    model.add(Dense(28, activation='softmax'))

    # Compile model
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer=optimizer)
    return model

Model structure

  • The first hidden layer is a convolutional layer with 16 feature maps of size 3×3 and a relu activation function. It also serves as the input layer and expects images with the shape described above.
  • The second layer is batch normalization, which normalizes the inputs to the next layer and mitigates shifts in the feature distribution between the training and test data.
  • The third layer is a MaxPooling layer. Max pooling downsamples the input so the model learns more position-invariant features, which reduces overfitting; it also reduces the number of parameters to learn and shortens training time.
  • The next layer is a dropout regularization layer, configured to randomly exclude 20% of the neurons in the layer to reduce overfitting.
  • The next hidden convolutional layer has 32 filters of size 3×3 with relu activation, to capture more features from the image.
  • The remaining hidden convolutional layers have 64 and 128 filters of size 3×3 with relu activation.
  • This convolution, batch normalization, MaxPooling, and dropout block is thus repeated three more times, and a GlobalAveragePooling2D layer follows.
  • The last layer is the output layer with one unit per output class; it uses the softmax activation function because we have multiple classes, and each neuron gives the probability of its class.
  • Categorical cross-entropy is used as the loss function because this is a multi-class classification problem, and accuracy is the metric used to track the network's performance.
model = create_model(optimizer='Adam', kernel_initializer='uniform', activation='relu')
model.summary()
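To see what GlobalAveragePooling2D receives, we can trace the spatial size through the four blocks (each 'same'-padded convolution preserves height and width; each 2×2 max-pool halves them):

```python
size = 32
for filters in [16, 32, 64, 128]:
    size //= 2  # 'same' conv keeps HxW; the 2x2 max-pool halves it
    print("block({:>3} filters) -> {}x{}x{}".format(filters, size, size, filters))
# GlobalAveragePooling2D then averages the final 2x2x128 map into a 128-dim vector.
```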


Keras supports plotting models via the keras.utils.vis_utils module, which provides utility functions for drawing Keras models with graphviz.

import pydot
from keras.utils import plot_model

plot_model(model, to_file="model.png", show_shapes=True)
from IPython.display import Image as IPythonImage
display(IPythonImage('model.png'))

Next, train the model with batch_size=20 for 15 epochs.

from keras.callbacks import ModelCheckpoint  

# Use a checkpoint to save the best model weights for later use.
checkpointer = ModelCheckpoint(filepath='weights.hdf5', verbose=1, save_best_only=True)
history = model.fit(training_letters_images_scaled, training_letters_labels_encoded,
                    validation_data=(testing_letters_images_scaled, testing_letters_labels_encoded),
                    epochs=15, batch_size=20, verbose=1, callbacks=[checkpointer])

The training results are as follows:


Finally, plot the loss and accuracy curves over the training epochs.

import matplotlib.pyplot as plt

def plot_loss_accuracy(history):
    # Loss 
    plt.figure(figsize=[8,6])
    plt.plot(history.history['loss'],'r',linewidth=3.0)
    plt.plot(history.history['val_loss'],'b',linewidth=3.0)
    plt.legend(['Training loss', 'Validation Loss'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Loss',fontsize=16)
    plt.title('Loss Curves',fontsize=16)

    # Accuracy 
    plt.figure(figsize=[8,6])
    plt.plot(history.history['accuracy'],'r',linewidth=3.0)
    plt.plot(history.history['val_accuracy'],'b',linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Accuracy',fontsize=16)
    plt.title('Accuracy Curves',fontsize=16) 

plot_loss_accuracy(history)


Load the model with the best validation loss

# Load the model with the best validation loss
model.load_weights('weights.hdf5')
metrics = model.evaluate(testing_letters_images_scaled, testing_letters_labels_encoded, verbose=1)
print("Test Accuracy: {}".format(metrics[1]))
print("Test Loss: {}".format(metrics[0]))

The output is as follows:

3360/3360 [==============================] - 0s 87us/step
Test Accuracy: 0.9678571224212646
Test Loss: 0.11759862171020359

Print the per-class classification report.

from sklearn.metrics import classification_report

def get_predicted_classes(model, data, labels=None):
    image_predictions = model.predict(data)
    predicted_classes = np.argmax(image_predictions, axis=1)
    true_classes = np.argmax(labels, axis=1)
    return predicted_classes, true_classes, image_predictions

def get_classification_report(y_true, y_pred):
    print(classification_report(y_true, y_pred))


y_pred, y_true, image_predictions = get_predicted_classes(model, testing_letters_images_scaled, testing_letters_labels_encoded)
get_classification_report(y_true, y_pred)
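scikit-learn's confusion_matrix complements the per-class report by showing which letters get confused with which. A minimal sketch with synthetic stand-in labels (substitute the y_true and y_pred arrays computed above):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins for the y_true / y_pred arrays from the model.
y_true_demo = np.array([0, 1, 2, 2, 1, 0])
y_pred_demo = np.array([0, 2, 2, 2, 1, 0])

cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)  # entry [i, j] counts samples of true class i predicted as class j
```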

The output is as follows:


              precision    recall  f1-score   support

           0       1.00      0.98      0.99       120
           1       1.00      0.98      0.99       120
           2       0.80      0.98      0.88       120
           3       0.98      0.88      0.93       120
           4       0.99      0.97      0.98       120
           5       0.92      0.99      0.96       120
           6       0.94      0.97      0.95       120
           7       0.94      0.95      0.95       120
           8       0.96      0.88      0.92       120
           9       0.90      1.00      0.94       120
          10       0.94      0.90      0.92       120
          11       0.98      1.00      0.99       120
          12       0.99      0.98      0.99       120
          13       0.96      0.97      0.97       120
          14       1.00      0.93      0.97       120
          15       0.94      0.99      0.97       120
          16       1.00      0.93      0.96       120
          17       0.97      0.97      0.97       120
          18       1.00      0.93      0.96       120
          19       0.92      0.95      0.93       120
          20       0.97      0.93      0.94       120
          21       0.99      0.96      0.97       120
          22       0.99      0.98      0.99       120
          23       0.98      0.99      0.99       120
          24       0.95      0.88      0.91       120
          25       0.94      0.98      0.96       120
          26       0.95      0.97      0.96       120
          27       0.98      0.99      0.99       120

    accuracy                           0.96      3360
   macro avg       0.96      0.96      0.96      3360
weighted avg       0.96      0.96      0.96      3360

Finally, plot a few random samples together with their predicted and true labels.

fig = plt.figure(0, figsize=(18, 18))
indices = np.random.randint(0, training_letters_images_scaled.shape[0], size=49)
y_pred = np.argmax(model.predict(training_letters_images_scaled), axis=1)

for i, idx in enumerate(indices):
    plt.subplot(7, 7, i + 1)

    image_array = training_letters_images_scaled[idx][:, :, 0]
    image_array = np.flip(image_array, 0)
    image_array = rotate(image_array, -90)

    plt.imshow(image_array, cmap='gray')
    plt.title("Pred: {} - Label: {}".format(y_pred[idx], training_letters_labels[idx][0] - 1))
    plt.xticks([])
    plt.yticks([])
plt.show()