Both MLP and CNN can be used for image classification. However, an MLP takes a vector as input, whereas a CNN takes a tensor as input. A CNN understands the spatial relations between nearby pixels of an image better, so for complicated images a CNN will usually outperform an MLP. An RNN, in contrast, is powerful for modeling sequence data such as time series or natural language. In this section, we are going to learn the differences between MLP and CNN, as well as RNN.
Here we use MNIST digit recognition as an example to illustrate the difference in the model-building processes of MLP and CNN. For more details, please visit the site.
The dataset can be loaded from the keras package or from a local file; here we load it from keras.
After that, we check the structure of the data and pre-process it so that it meets the input requirements of the MLP or CNN.
# library(keras)
# mnist <- dataset_mnist()
# x_train <- mnist$train$x
# y_train <- mnist$train$y
# x_test <- mnist$test$x
# y_test <- mnist$test$y
# dim(x_train)  # 60000 x 28 x 28
The x data is a 3-d array (images, width, height) of grayscale values. We convert the 3-d array into a 2-d matrix by reshaping width and height into a single dimension (each 28x28 image is flattened into a vector of length 784). This 2-d representation is used as the input for a fully connected neural network (MLP), which typically expects a 2-d matrix where each row is a training sample. We also convert the grayscale values from integers ranging from 0 to 255 into floating point values ranging between 0 and 1.
# # for MLP algorithm
# # reshape
# x_train_m <- array_reshape(x_train, dim = c(nrow(x_train), 784))
# x_test_m <- array_reshape(x_test, dim = c(nrow(x_test), 784))
# # rescale
# x_train_m <- x_train_m / 255
# x_test_m <- x_test_m / 255
For the CNN, we convert x_train into a 4-d tensor with dimensions (nrow(x_train), 28, 28, 1), where nrow(x_train) is the number of training samples. This retains the original 2-d spatial structure of the images (28x28) but adds an extra dimension at the end, corresponding to the number of color channels (1 for grayscale images, 3 for RGB images). It is common practice to use this 4-d shape so that the input is compatible with CNNs, which expect 4-d input tensors.
# # for CNN algorithm
# x_train_c <- array_reshape(x_train, dim = c(nrow(x_train), 28, 28, 1))
# x_test_c <- array_reshape(x_test, dim = c(nrow(x_test), 28, 28, 1))
# # rescale
# x_train_c <- x_train_c / 255
# x_test_c <- x_test_c / 255
The y data is an integer vector with values ranging from 0 to 9, for both MLP and CNN. We one-hot encode it into a binary class matrix using the Keras to_categorical() function.
# y_train <- to_categorical(y_train, 10)
# y_test <- to_categorical(y_test, 10)
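To make the encoding concrete, a quick check on a few labels shows how each digit maps to a row of a binary matrix:
# to_categorical(c(0, 1, 2), 10)
# # a 3 x 10 matrix: row 1 has a 1 in column 1 (digit 0),
# # row 2 has a 1 in column 2 (digit 1), and row 3 has a 1 in column 3 (digit 2)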
Each of the three neural networks, MLP, CNN, and RNN, is best suited to a particular type of problem. MLP is good for simple image classification, CNN is a good choice for complicated image classification, and RNN is designed for sequence processing such as time-series data.
An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP uses a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activations distinguish an MLP from a linear perceptron.
A CNN uses a custom matrix (filter) that convolves over the image to create a feature map, and it consists of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer. It takes matrices as well as vectors as inputs. Its layers are sparsely (partially) connected rather than fully connected: not every node connects to every node in the adjacent layers.
The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the Sequential model, a linear stack of layers. For MLP, the first layer specifies the shape of the input data (a length 784 numeric vector). The final layer outputs a length 10 numeric vector using a softmax activation function.
# # for MLP algorithm
# model <- keras_model_sequential()
# model %>%
# layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
# layer_dropout(rate = 0.4) %>%
# layer_dense(units = 128, activation = 'relu') %>%
# layer_dropout(rate = 0.3) %>%
# layer_dense(units = 10, activation = 'softmax')
# summary(model)
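As a quick sanity check on the summary() output, the parameter count of the first dense layer can be computed by hand: each of its 256 units has 784 weights plus one bias.
# 784 * 256 + 256  # 200960 parameters in the first dense layer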
For CNN, the input is a 4-d tensor, so input_shape specifies the per-sample shape c(28, 28, 1). Its model structure is as follows.
# # for CNN algorithm
# model <- keras_model_sequential()
# model %>%
# layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = 'same', input_shape = c(28, 28, 1)) %>%
# layer_activation('relu') %>%
# # layer_max_pooling_2d(pool_size=c(2, 2), strides=c(2, 2)) %>%
# layer_conv_2d(filters = 16, kernel_size = c(2, 2), dilation_rate = 1, activation = 'softplus', padding = 'same') %>%
# layer_max_pooling_2d(pool_size=c(2, 2)) %>%
# layer_flatten() %>%
# layer_dense(1000, activation = 'relu') %>%
# layer_dropout(0.5) %>%
# layer_dense(10, activation = 'softmax')
# summary(model)
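For comparison, RNN was mentioned above but is not demonstrated on MNIST in this section. A minimal sketch of a Keras sequence model might look like the following; the sequence length (100 time steps), feature count (1), and layer sizes here are illustrative assumptions, not part of the MNIST example.
# # a minimal RNN sketch (illustrative only)
# model_rnn <- keras_model_sequential()
# model_rnn %>%
# layer_simple_rnn(units = 32, input_shape = c(100, 1)) %>%  # 100 time steps, 1 feature per step
# layer_dense(units = 1)  # e.g., predict the next value of a time series
# summary(model_rnn)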
Next, we compile the model with an appropriate loss function, optimizer, and metrics, and use the fit() function to train the model for 30 epochs using batches of 128 images. For MLP, we feed x_train_m to fit the model.
# # for MLP by feeding x_train_m
# model %>% compile(
# loss = 'categorical_crossentropy',
# optimizer = optimizer_rmsprop(),
# metrics = c('accuracy')
# )
#
# history <- model %>% fit(
# x_train_m, y_train,
# epochs = 30, batch_size = 128,
# validation_split = 0.2
# )
#
# plot(history)
For CNN, we feed x_train_c to fit the model, here training for 10 epochs.
# # for CNN by feeding x_train_c
# model %>% compile(
# loss = 'categorical_crossentropy',
# optimizer = optimizer_rmsprop(),
# metrics = c('accuracy')
# )
#
# history <- model %>% fit(
# x_train_c, y_train,
# epochs = 10, batch_size = 128,
# validation_split = 0.2
# )
#
# plot(history)
Finally, we evaluate the model’s performance on the test data and generate predictions for new data.
# model %>% evaluate(x_test_m, y_test)
# model %>% predict(x_test_m)
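Note that predict() returns a matrix of class probabilities, with one row per test image and one column per digit. To turn these into predicted labels, one simple option (a sketch assuming the fitted MLP model above) is to take the column with the highest probability:
# prob <- model %>% predict(x_test_m)
# pred_class <- apply(prob, 1, which.max) - 1  # columns 1..10 correspond to digits 0..9
# head(pred_class)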
Often the training data are too large to fit into memory, so we can use a generator to read and preprocess the data and feed them into the model batch by batch during training. A generator in R is defined as a function within a function (a closure), a pattern that is well documented online.
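As a minimal illustration of this function-within-a-function (closure) pattern, unrelated to MNIST, the inner function below remembers its state between calls and returns the next value each time it is called:
# make_counter <- function() {
#   i <- 0
#   function() {
#     i <<- i + 1
#     i
#   }
# }
# gen <- make_counter()
# gen()  # returns 1
# gen()  # returns 2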
We can define a sampling generator, which serves as a pre-processing pipeline linking the raw data to the expected input format by sampling, rescaling, reshaping, and one-hot encoding.
# library(keras)
# np = reticulate::import('numpy') # converting data to numpy type
#
# # Load the MNIST dataset
# mnist <- dataset_mnist()
# x_train <- mnist$train$x
# y_train <- mnist$train$y
#
# # Sample size and indices
# sample_size <- 100 # Adjust as needed
# sample_indices <- sample(1:nrow(x_train), sample_size, replace = FALSE)
#
# # Create a data generator with preprocessing
# mnist_generator <- function(batch_size, sample_indices, x_data, y_data) {
# function() {
# batch_indices <- sample(sample_indices, batch_size, replace = TRUE)
# x_batch <- x_data[batch_indices,, , drop = FALSE] / 255
# y_batch <- to_categorical(as.numeric(y_data[batch_indices]), 10)
#
# # Convert to numpy arrays
# x_np <- np$array(x_batch)
# y_np <- np$array(y_batch)
#
# list(x = x_np, y = y_np)
# }
# }
#
# # Define the MLP model
# model <- keras_model_sequential() %>%
# layer_flatten(input_shape = c(28, 28)) %>%
# layer_dense(units = 128, activation = 'relu') %>%
# layer_dense(units = 64, activation = 'relu') %>%
# layer_dense(units = 10, activation = 'softmax')
#
# model %>% compile(
# optimizer = 'adam',
# loss = 'categorical_crossentropy',
# metrics = c('accuracy')
# )
#
# # Define the batch size and number of epochs
# batch_size <- 32
# epochs <- 10 # Adjust as needed
#
# # Train the model using the generator
# history <- model %>% fit_generator(
# generator = mnist_generator(batch_size, sample_indices, x_train, y_train),
# steps_per_epoch = ceiling(sample_size / batch_size),
# epochs = epochs
# )
#
# # Print training history
# print(history)
Callbacks are used to control the training process, for example by saving the model, stopping training early, or reducing the learning rate. Here is an example that trains the same MLP with callbacks for model checkpointing, early stopping, and reducing the learning rate when the validation loss plateaus.
# Load necessary libraries
# library(keras)
# library(reticulate)
# np = reticulate::import('numpy')
# # Load the MNIST dataset
# mnist <- dataset_mnist()
# x_train <- mnist$train$x
# y_train <- mnist$train$y
# x_test <- mnist$test$x
# y_test <- mnist$test$y
#
# # Convert data to numpy arrays
# x_train_np <- np$array(x_train / 255)
# y_train_np <- np$array(to_categorical(y_train, 10))
# x_test_np <- np$array(x_test / 255)
# y_test_np <- np$array(to_categorical(y_test, 10))
#
# # Function to define a simple model
# define_model <- function() {
# model <- keras_model_sequential() %>%
# layer_flatten(input_shape = c(28, 28)) %>%
# layer_dense(units = 128, activation = 'relu') %>%
# layer_dense(units = 64, activation = 'relu') %>%
# layer_dense(units = 10, activation = 'softmax')
#
# model %>% compile(
# optimizer = 'adam',
# loss = 'categorical_crossentropy',
# metrics = c('accuracy')
# )
#
# return(model)
# }
#
# # Create the model
# model <- define_model()
#
# # Train the model using the fit function
# history <- model %>% fit(
# x = x_train_np,
# y = y_train_np,
# epochs = 20,
# batch_size = 32,
# validation_data = list(x_test_np, y_test_np),
# callbacks = list(
# callback_model_checkpoint("model_checkpoint.h5", save_best_only = TRUE),
# callback_early_stopping(monitor = "val_loss", patience = 3),
# callback_reduce_lr_on_plateau(monitor = "val_loss", factor = 0.5, patience = 2)
# )
# )
#
# # Print training history
# print(history)
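After training, the best model saved by the checkpoint callback can be reloaded and evaluated; a short sketch using the file name from the chunk above:
# # reload the best model saved by callback_model_checkpoint()
# best_model <- load_model_hdf5("model_checkpoint.h5")
# best_model %>% evaluate(x_test_np, y_test_np)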
There are a number of tools available for visualizing the training of Keras models, including: 1) A plot method for the Keras training history returned from fit(); 2) Real time visualization of training metrics within the RStudio IDE; 3) Integration with the TensorBoard visualization tool included with TensorFlow.
Keras’s fit() method returns an R object containing the training history, including the values of the metrics at the end of each epoch. You can plot the training metrics by epoch using the plot() method. Here we compile and fit a model with the “accuracy” metric.
# library(keras) # loading the package for data and modelling
# library(abind) # for operating on multidimensional arrays, which are often used to represent images
# mnist <- dataset_mnist("/home/tank/Desktop/ecodatasci/images/mnist.npz")
# x_train <- mnist$train$x
# y_train <- mnist$train$y
# x_test <- mnist$test$x
# y_test <- mnist$test$y
# # one-hot encode the labels once for both models
# y_train <- to_categorical(y_train, 10)
# y_test <- to_categorical(y_test, 10)
# # for MLP algorithm
# # reshape
# x_train_a <- array_reshape(x_train, dim = c(nrow(x_train), 784))
# x_test_a <- array_reshape(x_test, dim = c(nrow(x_test), 784))
# # rescale
# x_train_a <- x_train_a / 255
# x_test_a <- x_test_a / 255
# model <- keras_model_sequential()
# model %>%
# layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
# layer_dropout(rate = 0.4) %>%
# layer_dense(units = 128, activation = 'relu') %>%
# layer_dropout(rate = 0.3) %>%
# layer_dense(units = 10, activation = 'softmax')
# summary(model)
# # for MLP by feeding x_train_a
# model %>% compile(
# loss = 'categorical_crossentropy',
# optimizer = optimizer_rmsprop(),
# metrics = c('accuracy')
# )
# history <- model %>% fit(
# x_train_a, y_train,
# epochs = 30, batch_size = 128,
# validation_split = 0.2
# )
#
# plot(history)
# model %>% evaluate(x_test_a, y_test)
# model %>% predict(x_test_a)
# # for CNN algorithm
# x_train_c <- array_reshape(x_train, dim = c(nrow(x_train), 28, 28, 1))
# x_test_c <- array_reshape(x_test, dim = c(nrow(x_test), 28, 28, 1))
# # rescale
# x_train_c <- x_train_c / 255
# x_test_c <- x_test_c / 255
# model <- keras_model_sequential()
# model %>%
# layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = 'same', input_shape = c(28, 28, 1)) %>%
# layer_activation('relu') %>%
# # layer_max_pooling_2d(pool_size=c(2, 2), strides=c(2, 2)) %>%
# layer_conv_2d(filters = 16, kernel_size = c(2, 2), dilation_rate = 1, activation = 'softplus', padding = 'same') %>%
# layer_max_pooling_2d(pool_size=c(2, 2)) %>%
# layer_flatten() %>%
# layer_dense(1000, activation = 'relu') %>%
# layer_dropout(0.5) %>%
# layer_dense(10, activation = 'softmax')
# summary(model)
#
# model %>% compile(
# loss = 'categorical_crossentropy',
# optimizer = optimizer_rmsprop(),
# metrics = c('accuracy')
# )
#
# history <- model %>% fit(
# x_train_c, y_train,
# epochs = 10, batch_size = 128,
# validation_split = 0.2
# )
#
# plot(history)
TensorBoard provides a UI for comparing different models and for visualizing the model structure. To launch TensorBoard from R, call the tensorboard() function with the log directory, as in the chunk below. Beyond training metrics, TensorBoard offers a wide variety of other visualizations, including the underlying TensorFlow graph, gradient histograms, model weights, and more. TensorBoard also enables you to compare metrics across multiple training runs.
# # launch TensorBoard
# tensorboard("logs/run_a")
#
# # fit the model with the TensorBoard callback
# history <- model %>% fit(
# x_train, y_train,
# batch_size = batch_size,
# epochs = epochs,
# verbose = 1,
# callbacks = callback_tensorboard("logs/run_a"),
# validation_split = 0.2
# )
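To compare metrics across runs, each run can write its logs to a separate directory, and the directories can then be passed together to tensorboard(). A sketch assuming a hypothetical second run logged to "logs/run_b":
# # second run, logged to a different directory
# history_b <- model %>% fit(
# x_train, y_train,
# batch_size = batch_size,
# epochs = epochs,
# verbose = 1,
# callbacks = callback_tensorboard("logs/run_b"),
# validation_split = 0.2
# )
#
# # launch TensorBoard with both runs for side-by-side comparison
# tensorboard(c("logs/run_a", "logs/run_b"))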