Getting started with AWS Deep Learning by training Convolutional Neural Network to predict Images on CIFAR10 dataset

Introduction

Deep learning is a subset of machine learning, and machine learning is a subset of AI, which is an umbrella term for any computer program that does something smart.

In other words, all machine learning is AI, but not all AI is machine learning, and so forth.

You can think of deep learning, machine learning and artificial intelligence as a set of Russian dolls nested within each other, beginning with the smallest and working out.

In one of the previous posts, we met AWS Deep Learning and deployed a Multi-Layer Perceptron neural network model using an AWS EC2 Deep Learning AMI instance.

In this post, we may continue to use the same instance or deploy a new one.

Meet CIFAR

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset.

They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 classes, with 6000 images per class.

There are 50,000 training images and 10,000 test images. The dataset is divided into five training batches and one test batch, each with 10,000 images.

The test batch contains exactly 1000 randomly-selected images from each class.

The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another.

Between them, the training batches contain exactly 5000 images from each class.
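As a quick sanity check of these numbers, here is a minimal sketch that loads one training batch, assuming the "CIFAR-10 python version" archive from the official dataset page has been extracted into the working directory (the path is an assumption):

```python
import pickle
import numpy as np

# load one of the five pickled training batches
with open('cifar-10-batches-py/data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='bytes')

data = batch[b'data']                # shape (10000, 3072): flattened 32x32x3 images
labels = np.array(batch[b'labels'])  # 10,000 labels in the range 0-9

print(data.shape)           # (10000, 3072)
print(np.bincount(labels))  # per-class counts; roughly, but not exactly, balanced
```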

The following are general ways to work with image datasets:

- Classification
- Localization
- Segmentation
- Scene classification
- Scene parsing, which segments and parses an image into regions associated with semantic categories, such as sky, road, person, and bed.

In this post, we will use a Residual Network (ResNet) for image recognition.

Connecting to EC2 and Launching Jupyter Notebook

We should download our private key and use it for SSH access to the instance:

```
chmod 400 p2_amy_key.pem
ssh -i p2_amy_key.pem -L 8888:localhost:8888 ec2-user@<ip>
```

It is important to adjust the instance's security group to allow inbound SSH connections. When we connect, the instance greets us with the banner:

```
   __|  __|_  )
   _|  (     /   Deep Learning AMI for Amazon Linux
  ___|___|___|
```

Then, we will start Jupyter Notebook:

```
cd src
nohup jupyter notebook --no-browser --port=8888 &
```

We should run the following command to get the URL for the notebook:

```
tail nohup.out
```

Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://localhost:8888/?token=xxxx

Then, we will create a new notebook with Python 3.

Creating a Convolutional Neural Network Model

We will paste the following code into our notebook:

```python
import os, sys
import argparse
import logging
import mxnet as mx
import random
from mxnet.io import DataBatch, DataIter
import numpy as np
import time
import subprocess
import errno
```

Then, we will define the hyperparameters for the data and add them as arguments to a parser:

```python
def add_data_args(parser):
    data = parser.add_argument_group('Data', 'the input images')
    data.add_argument('--data-train', type=str, help='the training data')
    data.add_argument('--data-val', type=str, help='the validation data')
    data.add_argument('--rgb-mean', type=str, default='123.68,116.779,103.939',
                      help='a tuple of size 3 for the mean rgb')
    data.add_argument('--pad-size', type=int, default=0,
                      help='padding the input image')
    data.add_argument('--image-shape', type=str,
                      help='the image shape fed into the network, e.g. (3,224,224)')
    data.add_argument('--num-classes', type=int, help='the number of classes')
    data.add_argument('--num-examples', type=int,
                      help='the number of training examples')
    data.add_argument('--data-nthreads', type=int, default=4,
                      help='number of threads for data decoding')
    return data
```
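As a quick illustration of how this argument group behaves (the values here are hypothetical, not part of the original script), it can be exercised on its own:

```python
parser = argparse.ArgumentParser()
add_data_args(parser)
args = parser.parse_args(['--data-train', 'cifar10_train.rec',
                          '--image-shape', '3,28,28',
                          '--num-classes', '10'])
print(args.rgb_mean)     # '123.68,116.779,103.939' (the default)
print(args.image_shape)  # '3,28,28'
```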

Next, define the image data iterator:

```python
def get_rec_iter(args, kv=None):
    image_shape = tuple([int(l) for l in args.image_shape.split(',')])
    dtype = np.float32
    if kv:
        (rank, nworker) = (kv.rank, kv.num_workers)
    else:
        (rank, nworker) = (0, 1)
    rgb_mean = [float(i) for i in args.rgb_mean.split(',')]
    train = mx.io.ImageRecordIter(
        path_imgrec        = args.data_train,
        label_width        = 1,
        mean_r             = rgb_mean[0],
        mean_g             = rgb_mean[1],
        mean_b             = rgb_mean[2],
        data_name          = 'data',
        label_name         = 'softmax_label',
        data_shape         = image_shape,
        batch_size         = args.batch_size,
        pad                = args.pad_size,
        fill_value         = 127,
        preprocess_threads = args.data_nthreads,
        shuffle            = True,
        num_parts          = nworker,
        part_index         = rank)
    if args.data_val is None:
        return (train, None)
    val = mx.io.ImageRecordIter(
        path_imgrec        = args.data_val,
        label_width        = 1,
        mean_r             = rgb_mean[0],
        mean_g             = rgb_mean[1],
        mean_b             = rgb_mean[2],
        data_name          = 'data',
        label_name         = 'softmax_label',
        batch_size         = args.batch_size,
        data_shape         = image_shape,
        preprocess_threads = args.data_nthreads,
        num_parts          = nworker,
        part_index         = rank)
    return (train, val)
```
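To sanity-check the iterator, here is a minimal sketch; it assumes args has been populated as in the final script below and that the .rec files exist locally:

```python
# hypothetical quick check, not part of the original post
train, val = get_rec_iter(args)
batch = next(iter(train))
print(batch.data[0].shape)   # (128, 3, 28, 28) with the defaults used below
print(batch.label[0].shape)  # (128,)
```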

We should save the model after each epoch. To do this, we will use a utility function, _save_model, which returns the checkpoint callback used during training.

Run the code:

```python
def _save_model(args, rank=0):
    if args.model_prefix is None:
        return None
    return mx.callback.do_checkpoint(args.model_prefix if rank == 0
                                     else "%s-%d" % (args.model_prefix, rank))
```
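do_checkpoint returns a callback that writes `<prefix>-symbol.json` and `<prefix>-NNNN.params` files at the end of each epoch, and these can later be reloaded. A minimal sketch (the prefix and epoch number are illustrative):

```python
# reload the symbol and weights saved after epoch 3 under prefix 'model'
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 3)
```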

To train the model, we will call the fit function. The add_fit_args function adds the hyperparameters needed by fit.

For each parameter, a help message is provided in the code:

```python
def add_fit_args(parser):
    """
    parser : argparse.ArgumentParser
    return a parser added with args required by fit
    """
    train = parser.add_argument_group('Training', 'model training')
    train.add_argument('--network', type=str,
                       help='the neural network to use')
    train.add_argument('--num-layers', type=int,
                       help='number of layers in the neural network, '
                            'required by some networks such as resnet')
    train.add_argument('--gpus', type=str,
                       help='list of gpus to run, e.g. 0 or 0,2,5. empty means using cpu')
    train.add_argument('--kv-store', type=str, default='device',
                       help='key-value store type')
    train.add_argument('--num-epochs', type=int, default=100,
                       help='max num of epochs')
    train.add_argument('--lr', type=float, default=0.1,
                       help='initial learning rate')
    train.add_argument('--optimizer', type=str, default='sgd',
                       help='the optimizer type')
    train.add_argument('--mom', type=float, default=0.9,
                       help='momentum for sgd')
    train.add_argument('--wd', type=float, default=0.0001,
                       help='weight decay for sgd')
    train.add_argument('--batch-size', type=int, default=128,
                       help='the batch size')
    train.add_argument('--disp-batches', type=int, default=40,
                       help='show progress for every n batches')
    train.add_argument('--model-prefix', type=str, help='model prefix')
    parser.add_argument('--monitor', dest='monitor', type=int, default=0,
                        help='log network parameters every N iters if larger than 0')
    return train

def fit(args, network, data_loader, **kwargs):
    """
    train a model
    args : the parsed argparse arguments
    network : the symbol definition of the neural network
    data_loader : function that returns the train and val data iterators
    """
    # kvstore
    kv = mx.kvstore.create(args.kv_store)
    print("args kvstore is %s" % (kv))
    # logging
    head = '%(asctime)-15s Node[' + str(kv.rank) + '] %(message)s'
    logging.basicConfig(level=logging.DEBUG, format=head)
    logging.info('start with arguments %s', args)
    # data iterators
    (train, val) = data_loader(args, kv)
    # save model after each epoch
    checkpoint = _save_model(args, kv.rank)
    # devices for training
    devs = mx.gpu(0)
    # create model
    model = mx.mod.Module(context=devs, symbol=network)
    optimizer_params = {
        'learning_rate': args.lr,
        'momentum': args.mom,
        'wd': args.wd}
    monitor = mx.mon.Monitor(args.monitor, pattern=".*") if args.monitor > 0 else None
    if args.network == 'alexnet':
        # AlexNet will not converge using Xavier
        initializer = mx.init.Normal()
    else:
        initializer = mx.init.Xavier(rnd_type='gaussian',
                                     factor_type="in", magnitude=2)
    # evaluation metrics
    eval_metrics = ['accuracy']
    # callbacks that run after each batch
    batch_end_callbacks = [mx.callback.Speedometer(args.batch_size, args.disp_batches)]
    # run training
    model.fit(train,
              num_epoch          = args.num_epochs,
              eval_data          = val,
              eval_metric        = eval_metrics,
              kvstore            = kv,
              optimizer          = args.optimizer,
              optimizer_params   = optimizer_params,
              initializer        = initializer,
              batch_end_callback = batch_end_callbacks,
              epoch_end_callback = checkpoint,
              allow_missing      = True,
              monitor            = monitor)
```

The network used for this model is loaded with the import_module function.

You will use a Residual Neural Network (ResNet) for this model.
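The script below expects the RecordIO files cifar10_train.rec and cifar10_val.rec in the working directory. One way to obtain CIFAR-10 in RecordIO format, as a hedged sketch (get_cifar10 downloads MXNet's prepackaged archive; the extracted paths and the copy step are assumptions made to match the filenames used below):

```python
import shutil
from mxnet.test_utils import get_cifar10

# downloads and extracts data/cifar/train.rec and data/cifar/test.rec
get_cifar10()
shutil.copy('data/cifar/train.rec', './cifar10_train.rec')
shutil.copy('data/cifar/test.rec', './cifar10_val.rec')
```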

The code for the ResNet run is:

```python
# reset the root logger so we log both to a file and to the console
logger = logging.getLogger()
if logger.handlers:
    logger.handlers[0].close()
    logger.handlers = []
fhandler = logging.FileHandler(filename='lab2.log', mode='w')
console = logging.StreamHandler()
# tell the handlers to use this format
formatter = logging.Formatter('%(asctime)s - %(message)s')
fhandler.setFormatter(formatter)
console.setFormatter(formatter)
# add the handlers to the root logger
logger.addHandler(fhandler)
logger.addHandler(console)
console.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)

if __name__ == '__main__':
    # training and validation data in RecordIO format
    (train_fname, val_fname) = ('./cifar10_train.rec', './cifar10_val.rec')
    # parse args
    parser = argparse.ArgumentParser(
        description="train cifar10",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    add_fit_args(parser)
    add_data_args(parser)
    parser.set_defaults(
        # network
        network      = 'resnet',     # network name
        num_layers   = 50,           # number of layers in the network
        # data
        data_train   = train_fname,  # training dataset
        data_val     = val_fname,    # validation dataset
        num_classes  = 10,           # number of classes
        num_examples = 50000,
        image_shape  = '3,28,28',
        pad_size     = 4,
        # train
        batch_size   = 128,
        num_epochs   = 5,
        lr           = .01
    )
    args = parser.parse_args('--model-prefix model'.split())
    # load network
    from importlib import import_module
    net = import_module('symbols.' + args.network)
    sym = net.get_symbol(**vars(args))
    print(sym)
    model_prefix = 'mx_resnet'
    checkpoint = mx.callback.do_checkpoint(model_prefix)
    # train
    fit(args, sym, get_rec_iter, epoch_end_callback=checkpoint)
```

The fit function will train your model on the CIFAR-10 dataset.
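Once training completes, a saved checkpoint can be reloaded and evaluated against the validation iterator. A minimal sketch, assuming the 5-epoch run above finished and args is still in scope (the epoch number is illustrative):

```python
# hypothetical evaluation of the checkpoint saved under --model-prefix 'model'
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 5)
mod = mx.mod.Module(symbol=sym, context=mx.gpu(0),
                    data_names=['data'], label_names=['softmax_label'])
_, val = get_rec_iter(args)
mod.bind(data_shapes=val.provide_data,
         label_shapes=val.provide_label, for_training=False)
mod.set_params(arg_params, aux_params)
print(mod.score(val, ['accuracy']))  # list of (metric, value) pairs
```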

During training, the training and validation accuracy are printed for each epoch.

This accuracy is also logged into a file.

We can then plot the training and validation accuracy against the epoch number.
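A minimal plotting sketch, assuming the per-epoch accuracies are scraped from lab2.log (the exact log wording, "Epoch[N] Train-accuracy=..." and "Epoch[N] Validation-accuracy=...", is an assumption based on MXNet's usual messages):

```python
import re
import matplotlib.pyplot as plt

train_acc, val_acc = [], []
with open('lab2.log') as f:
    for line in f:
        m = re.search(r'Epoch\[(\d+)\] Train-accuracy=([\d.]+)', line)
        if m:
            train_acc.append(float(m.group(2)))
        m = re.search(r'Epoch\[(\d+)\] Validation-accuracy=([\d.]+)', line)
        if m:
            val_acc.append(float(m.group(2)))

plt.plot(range(1, len(train_acc) + 1), train_acc, label='train accuracy')
plt.plot(range(1, len(val_acc) + 1), val_acc, label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```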
