How to Evaluate Pixel Scaling Methods for Image Classification With Convolutional Neural Networks

Image data must be prepared before it can be used as the basis for modeling in image classification tasks.

One aspect of preparing image data is scaling pixel values, such as normalizing the values to the range 0-1, centering, standardization, and more.

How do you choose a good, or even best, pixel scaling method for your image classification or computer vision modeling task?

In this tutorial, you will discover how to choose a pixel scaling method for image classification with deep learning methods.

After completing this tutorial, you will know:

Let’s get started.

Photo by Andres Alvarado, some rights reserved.

This tutorial is divided into 6 parts; they are:

Given a new image classification task, what pixel scaling methods should be used?

There are many ways to answer this question; for example:

Instead, I recommend using experimentation in order to discover what works best for your specific dataset.

This can be achieved using the following process:

The experimental approach will use a non-optimized model and perhaps a subset of training data, both of which may add noise to the decision you must make.

Therefore, you are looking for a signal that one data preparation scheme for your images is clearly better than the others; if this is not the case for your dataset, then the simplest (least computationally complex) technique should be used, such as pixel normalization.

A clear signal of a superior pixel scaling method may be seen in one of two ways:

Now that we have a procedure for choosing a pixel scaling method for image data, let’s look at an example.

We will use the MNIST image classification task fit with a CNN and evaluate a range of standard pixel scaling methods.

The MNIST problem, or MNIST for short, is an image classification problem comprised of 70,000 images of handwritten digits.

The goal of the problem is to classify a given image of a handwritten digit as an integer from 0 to 9.

As such, it is a multiclass image classification problem.

It is a standard dataset for evaluating machine learning and deep learning algorithms.

Best results for the dataset are about 99.79% accurate, or an error rate of about 0.21% (e.g. less than 1%).

This dataset is provided as part of the Keras library and can be automatically downloaded (if needed) and loaded into memory by a call to the library's MNIST load_data() function.

The function returns two tuples: one for the training inputs and outputs and one for the test inputs and outputs.

For example:

We can load the MNIST dataset and summarize it.

The complete example is listed below.
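A minimal sketch of such an example follows. It assumes Keras is available through the tensorflow package (the import path is an assumption about your installation), and the helper names load_dataset() and summarize() are illustrative rather than taken from the original listing.

```python
import numpy as np


def load_dataset():
    # Imported lazily so summarize() can be reused without Keras installed.
    # Assumption: Keras is installed as part of the tensorflow package.
    from tensorflow.keras.datasets import mnist
    (train_x, train_y), (test_x, test_y) = mnist.load_data()
    return train_x, train_y, test_x, test_y


def summarize(name, images):
    """Print and return the shape and basic pixel statistics of one split."""
    stats = {
        "shape": images.shape,
        "min": float(images.min()),
        "max": float(images.max()),
        "mean": float(images.mean()),
        "std": float(images.std()),
    }
    print("%s: shape=%s min=%.3f max=%.3f mean=%.3f std=%.3f" % (
        name, stats["shape"], stats["min"], stats["max"],
        stats["mean"], stats["std"]))
    return stats
```

To use it, call `train_x, train_y, test_x, test_y = load_dataset()` and then `summarize("Train", train_x)` and `summarize("Test", test_x)`.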

Running the example first loads the dataset into memory.

Then the shape of the training and test datasets is reported.

We can see that all images are 28 by 28 pixels with a single channel for grayscale images.

There are 60,000 images for the training dataset and 10,000 for the test dataset.

We can also see that pixel values are integer values between 0 and 255 and that the mean and standard deviation of the pixel values are similar between the two datasets.

The dataset is relatively small; we will use the entire train and test datasets.

Now that we are familiar with MNIST and how to load the dataset, let’s review some pixel scaling methods.

We will use a convolutional neural network model to evaluate the different pixel scaling methods.

A CNN is expected to perform very well on this problem, although the model chosen for this experiment does not have to perform well or best for the problem.

Instead, it must be skillful (better than random) and must allow the impact of different data preparation schemes to be differentiated in terms of speed of learning and/or model performance.

As such, the model must have sufficient capacity to learn the problem.

We will demonstrate the baseline model on the MNIST problem.

First, the dataset must be loaded and the shape of the train and test dataset expanded to add a channel dimension, set to one as we only have a single black and white channel.

Next, we will normalize the pixel values for this example and one hot encode the target values, required for multiclass classification.
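These preparation steps can be sketched with NumPy alone. The function names below are hypothetical, and the one hot encoding is written by hand here; the Keras to_categorical() utility could be used for the same purpose.

```python
import numpy as np


def add_channel_dim(images):
    # (n, height, width) -> (n, height, width, 1) for a single grayscale channel
    return images.reshape(images.shape + (1,))


def normalize(images):
    # Rescale integer pixel values from 0-255 to floats in 0-1
    return images.astype("float32") / 255.0


def one_hot(labels, num_classes=10):
    # Each integer class label becomes a row with a single 1.0
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded
```

For MNIST, `normalize(add_channel_dim(train_x))` would have the shape (60000, 28, 28, 1) and `one_hot(train_y)` the shape (60000, 10).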

The model is defined as a convolutional layer followed by a max pooling layer; this combination is repeated, then the feature maps are flattened, interpreted by a fully connected layer, and followed by an output layer.

The ReLU activation function is used for hidden layers and the softmax activation function is used for the output layer.

Enough filter maps and nodes are specified to provide sufficient capacity to learn the problem.

The Adam variation of stochastic gradient descent is used to find the model weights.

The categorical cross entropy loss function is used, required for multi-class classification, and classification accuracy is monitored during training.
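A model along these lines might be defined as in the sketch below. The text does not state the number of filters, kernel sizes, pool sizes, or nodes, so the values used here (32 and 64 filters, 3x3 kernels, 2x2 pooling, 64 nodes) are assumptions, as is the use of Keras via the tensorflow package.

```python
from tensorflow.keras import layers, models


def define_model(input_shape=(28, 28, 1), num_classes=10):
    # Layer sizes are illustrative assumptions, not values from the text
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Adam optimizer, categorical cross entropy loss, accuracy as the metric
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The compiled model can then be trained with `model.fit(train_x, train_y, epochs=5, batch_size=128)` and scored with `model.evaluate(test_x, test_y)`, which returns the loss followed by the accuracy.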

The model is fit for five training epochs and a large batch size of 128 images is used.

Once fit, the model is evaluated on the test dataset.

The complete example is listed below and will easily run on the CPU in about a minute.

Running the example shows that the model is capable of learning the problem well and quickly.

In fact, the performance of the model on the test dataset on this run is 99%, or a 1% error rate.

This is not state of the art (by design), but is not terribly far from state of the art either.

Neural network models often cannot be trained on raw pixel values, such as pixel values in the range of 0 to 255.

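The reason is that the network computes a weighted sum of its inputs, and for the network to be both stable and train effectively, weights should be kept small. Large input values produce large weighted sums and large error gradients, which can make learning slow or unstable.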

Instead, the pixel values must be scaled prior to training.
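The effect of input scale on a weighted sum can be seen in a small NumPy sketch. The pixel values and weights below are randomly generated for illustration and are not taken from MNIST or from a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

# One flattened 28x28 "image" of random raw pixel values in 0-255 (illustrative)
pixels = rng.integers(0, 256, size=784).astype("float64")
# Small random weights, as a freshly initialized node might have (illustrative)
weights = rng.normal(0.0, 0.05, size=784)

# The same weights applied to raw and to normalized inputs
raw_sum = float(np.dot(weights, pixels))
scaled_sum = float(np.dot(weights, pixels / 255.0))

print("weighted sum on raw pixels:    %.3f" % raw_sum)
print("weighted sum on scaled pixels: %.3f" % scaled_sum)
```

With identical weights, the activation on raw pixels is 255 times larger in magnitude than on normalized pixels, so the weights would have to be correspondingly smaller to produce the same output.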

There are perhaps three main approaches to scaling pixel values; they are:

Traditionally, sigmoid activation functions were used and inputs that sum to 0 (zero mean) were preferred.

This may or may not still be the case with the wide adoption of ReLU and similar activation functions.

Further, in centering and standardization, the mean or mean and standard deviation can be calculated across a channel, an image, a mini-batch, or the entire training dataset.

This may add additional variations on a chosen scaling method that may be evaluated.
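As a rough illustration of these variations, the sketch below centers a batch of images using a mean calculated at three different scopes. It assumes images are stored as an array with the shape (samples, height, width, channels); the function names are hypothetical.

```python
import numpy as np


def center_per_dataset(images):
    # One mean over every pixel in the array (whole dataset or one mini-batch)
    images = images.astype("float64")
    return images - images.mean()


def center_per_image(images):
    # One mean per image, calculated over its height, width, and channels
    images = images.astype("float64")
    return images - images.mean(axis=(1, 2, 3), keepdims=True)


def center_per_channel(images):
    # One mean per channel, calculated over all samples, rows, and columns
    images = images.astype("float64")
    return images - images.mean(axis=(0, 1, 2), keepdims=True)
```

The same axis choices apply to the standard deviation when standardizing. For single-channel data such as MNIST, the per-dataset and per-channel versions give the same result.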

Normalization is often the default approach as we can assume pixel values are always in the range 0-255, making the procedure very simple and efficient to implement.

Centering is often promoted as the preferred approach as it was used in many popular papers, although the mean can be calculated per image (global) or channel (local) and across the batch of images or the entire training dataset, and often the procedure described in a paper does not specify exactly which variation was used.

We will experiment with the three approaches listed above, namely normalization, centering, and standardization.

The mean for centering and the mean and standard deviation for standardization will be calculated across the entire training dataset.

Other variations you could explore include:

The example below implements the three chosen pixel scaling methods and demonstrates their effect on the MNIST dataset.
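A minimal sketch of the three data preparation functions is given here, with statistics calculated on the training set only and then applied to both sets. The function names are illustrative, and rescaling to 0-1 before centering is an assumption of this sketch rather than something the text specifies.

```python
import numpy as np


def prep_normalize(train, test):
    # Rescale pixel values from 0-255 to 0-1
    return train.astype("float32") / 255.0, test.astype("float32") / 255.0


def prep_center(train, test):
    # Assumption: rescale to 0-1 first, then subtract the training set mean
    train_norm, test_norm = prep_normalize(train, test)
    mean = train_norm.mean()
    return train_norm - mean, test_norm - mean


def prep_standardize(train, test):
    # Subtract the training set mean and divide by its standard deviation
    train_f, test_f = train.astype("float32"), test.astype("float32")
    mean, std = train_f.mean(), train_f.std()
    return (train_f - mean) / std, (test_f - mean) / std


def report(name, train, test):
    # Summary statistics used to confirm that a scheme behaves as expected
    for label, data in (("Train", train), ("Test", test)):
        print("%s %s: min=%.3f max=%.3f mean=%.3f std=%.3f" % (
            name, label, data.min(), data.max(), data.mean(), data.std()))
```

Each function takes the raw train and test images and returns scaled copies, for example `report("normalization", *prep_normalize(train_x, test_x))`.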

Running the example first normalizes the dataset and reports the min, max, mean, and standard deviation for the train and test dataset.

This is then repeated for the centering and standardization data preparation schemes.

The results provide evidence that the scaling procedures are indeed implemented correctly.

Now that we have defined the dataset, the model, and the data preparation schemes to evaluate, we are ready to define and run the experiment.

Each model takes about one minute to run on the CPU, so we don’t want the experiment to take too long.

We will evaluate each of the three data preparation schemes and each scheme will be evaluated 10 times, meaning that about 30 minutes will be required to complete the experiment on modern hardware.

We can define a function to load the dataset afresh when needed.

We can also define a function to define and compile our model ready to fit on the problem.

We already have functions for preparing the pixel data for the train and test datasets.

Finally, we can define a function called repeated_evaluation() that takes the name of the data preparation function to call to prepare the data and will load the dataset and repeatedly define the model, prepare the dataset, fit, and evaluate the model.

It will return a list of accuracy scores that can be used to summarize the performance of the model under the chosen data preparation scheme.
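One possible shape for this function is sketched below. To keep the sketch self-contained, the dataset loader and model factory are passed in as arguments, which is an assumption of this sketch; the text describes a function that takes only the data preparation function. The loader is assumed to return reshaped images and one hot encoded targets, and the model is assumed to be compiled with accuracy as its only metric.

```python
def repeated_evaluation(datapre_func, load_dataset, define_model,
                        n_repeats=10, epochs=5, batch_size=128):
    # load_dataset() is assumed to return (train_x, train_y, test_x, test_y)
    train_x, train_y, test_x, test_y = load_dataset()
    scores = []
    for i in range(n_repeats):
        # A fresh model each repeat so that runs are independent
        model = define_model()
        # Apply the chosen pixel scaling scheme to the raw images
        prep_train_x, prep_test_x = datapre_func(train_x, test_x)
        model.fit(prep_train_x, train_y,
                  epochs=epochs, batch_size=batch_size, verbose=0)
        # evaluate() is assumed to return the loss followed by the accuracy
        _, acc = model.evaluate(prep_test_x, test_y, verbose=0)
        scores.append(acc)
        print("> %d: %.3f" % (i + 1, acc * 100.0))
    return scores
```

Passing the loader and model factory as arguments also makes it easy to try the procedure on a subset of the data or with a smaller model.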

The repeated_evaluation() function can then be called for each of the three data preparation schemes and the mean and standard deviation of model performance under the scheme can be reported.

We can also create a box and whisker plot to summarize and compare the distribution of accuracy scores for each scheme.
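Summarizing and plotting the scores might look like the following sketch. The function names and the output filename are hypothetical, and matplotlib is assumed to be installed for the plotting step.

```python
import numpy as np


def summarize_scores(name, scores):
    # Mean and standard deviation of accuracy across the repeats
    mean, std = float(np.mean(scores)), float(np.std(scores))
    print("%s: %.3f%% (%.3f)" % (name, mean * 100.0, std * 100.0))
    return mean, std


def plot_scores(all_scores, names, filename="scaling_boxplot.png"):
    # Imported lazily; the Agg backend writes to a file without a display
    import matplotlib
    matplotlib.use("Agg")
    from matplotlib import pyplot
    fig, ax = pyplot.subplots()
    # One box per data preparation scheme, in the order given
    ax.boxplot(all_scores)
    ax.set_xticklabels(names)
    ax.set_ylabel("Test accuracy")
    fig.savefig(filename)
    pyplot.close(fig)
```

Here `all_scores` is a list containing one list of accuracy scores per scheme, in the same order as `names`.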

Tying all of this together, the complete example of running the experiment to compare pixel scaling methods on the MNIST dataset is listed below.

Running the example may take about 30 minutes on the CPU and your results may vary given the stochastic nature of the training algorithm.

The accuracy is reported for each repeated evaluation of the model and the mean and standard deviation of accuracy scores are reported at the end of each run.

Box and Whisker Plot of CNN Performance on MNIST With Different Pixel Scaling Methods

For brevity, we will only look at model performance in the comparison of data preparation schemes.

An extension to this study would also look at the speed of learning under each pixel scaling method.

The results of the experiments show that there is little or no difference (at the chosen precision) between pixel normalization and standardization with the chosen model on the MNIST dataset.

From these results, I would use normalization over standardization on this dataset with this model because of the good results and because of the simplicity of normalization as compared to standardization.

These are useful results in that they show that the default heuristic to center pixel values prior to modeling would not be good advice for this dataset.

Sadly, the box and whisker plot does not make a comparison between the spread of accuracy scores easy as some terrible outlier scores for the centering scaling method squash the distributions.

This section lists some ideas for extending the tutorial that you may wish to explore.

If you explore any of these extensions, I’d love to know.

Post your findings in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

In this tutorial, you discovered how to choose a pixel scaling method for image classification with deep learning methods.

Specifically, you learned:

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
