Chapter 1: Intro to AWS SageMaker

Georgios Drakos · Apr 13

SageMaker is a platform for developing and deploying ML models.

It promises to ease the process of training and deploying models to production at scale by providing three very useful capabilities:

Build — Ability to spin up a notebook instance and preprocess the data.

Train — Run the specified algorithm.

Deploy — Deploy the model as a REST endpoint.

Table of Contents:

- Benefits of Cloud Computing
- SageMaker Simplified Workflow
- SageMaker Overview
- Automatic Model Tuning
- Auto Scaling
- Bonus Code
- Conclusion

Photo by Thomas Jensen on Unsplash

Benefits of Cloud Computing

In your job you will encounter people who simply love cloud computing and believe in its power, while others believe it is nothing new, just a clever marketing tool.

The reality is that it simplifies a lot of tasks for the people responsible for developing and maintaining a traditional IT data center, e.g. restoring a database from its backup.

Below I list five advantages of cloud computing:

1. Trade Capital Expense for Variable Expense

Heavy upfront capital investments are required for traditional on-premise data centers.

With cloud, you pay as you go, depending on the size of the resources and the time you consume them, and you receive a detailed billing report which can help you optimise costs.

2. Benefit from Massive Economies of Scale

Shared infrastructure used by hundreds of people results in better utilisation of data centers and lower pay-as-you-go prices.

3. Stop Guessing About Capacity

Eliminate guessing about your infrastructure capacity needs. With cloud you can scale up or down with only a few clicks (increased speed and agility).

4. Stop Spending Money Running and Maintaining Data Centers

Avoid keeping two physically separate, redundant data centers just to survive an infrastructure failure.

Companies can focus on projects that will differentiate them and not on infrastructure issues.

5. Go Global in Minutes

Easily deploy applications in multiple regions around the world.

Keep resources closer to customers for lower latency.

SageMaker Simplified Workflow

The typical workflow for creating ML models involves many steps.

In this context, SageMaker aims to simplify this process.

In fact, by using SageMaker’s built-in algorithms, we can deploy our models with a simple line of code.

It is worth mentioning that the whole process of training, evaluating and deploying the model is done using Jupyter notebooks.
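As an illustration of how short this workflow can be, here is a minimal sketch using the SageMaker Python SDK. The role ARN, algorithm image URI and S3 paths are placeholders you would replace, and the hyperparameters assume the built-in XGBoost algorithm:

```python
# Minimal sketch: train a built-in algorithm on data in S3, then deploy it.
# role_arn, image_uri and the S3 paths are placeholders.

def train_and_deploy(role_arn, image_uri, s3_train_path, s3_output_path):
    """Train a built-in algorithm and deploy it as a real-time endpoint."""
    # Imported inside the function so the sketch loads without the SDK installed.
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    estimator = Estimator(
        image_uri=image_uri,          # ECR path of a built-in algorithm, e.g. XGBoost
        role=role_arn,                # IAM role that SageMaker assumes for training
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=s3_output_path,   # model artifacts are saved to this S3 prefix
    )
    estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)

    # Launches managed ML compute instances and runs the training job.
    estimator.fit({"train": TrainingInput(s3_train_path, content_type="text/csv")})

    # A single call stands up a managed endpoint serving the trained model.
    return estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

The final `deploy(...)` call is the "simple line of code" mentioned above: it provisions the hosting instances and exposes the model behind an HTTPS endpoint.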

Image Credits: SageMaker website.

Available Machine Learning Libraries

For training, SageMaker offers many of the most popular built-in ML algorithms.

Some of them include K-Means, PCA, sequence models, Linear Learner and XGBoost.

Plus, Amazon promises outstanding performance on these implementations.

Details of the algorithms can be found at the link below:

https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html

Moreover, if you want to train a model using a third-party library like Keras, SageMaker also has you covered.

Indeed, it supports the most popular ML frameworks.

Some of them include TensorFlow, MXNet and PyTorch. In addition, you can also conda/pip install any desired Python library.

Finally, all the data has to be stored in an S3 bucket.

SageMaker Overview

SageMaker can be described with a few key points:

- Managed Jupyter notebook environment with pre-installed Python packages
- Distributed training (SageMaker-managed training infrastructure; automatic model tuning when using AWS algorithms)
- Deployment for real-time prediction (high performance and availability)

It is worth mentioning that during the training phase, Amazon launches ML compute instances and uses the training code and dataset to carry out the training process.

Then, it saves the final model artifacts and other output in a specified S3 bucket.

Note that we can take advantage of parallel training.

This can be done via instance parallelism.

When training finishes, it also creates a SageMaker model.

This model can be deployed to an endpoint with options regarding the number and type of instances at which to deploy the model.
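Once such an endpoint exists, any client with AWS credentials can call it. Here is a hedged sketch using boto3; the endpoint name and the CSV payload format are assumptions that depend on how the model was trained:

```python
def invoke_endpoint(endpoint_name, csv_row, region="us-east-1"):
    """Send a single CSV record to a deployed endpoint and return the raw prediction."""
    import boto3  # imported here so the sketch loads without boto3 installed
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # must match what the model's container expects
        Body=csv_row,
    )
    return response["Body"].read().decode("utf-8")
```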

Advice: As I have already mentioned, SageMaker offers a variety of popular ML estimators and also provides the possibility to take a pre-trained model and deploy it.

However, it is far easier to use its built-in implementations.

The reason is that to deploy third-party models using the SageMaker’s APIs, one needs to deal with managing containers.

Automatic Model Tuning

Personally, I believe that automatic model tuning is one of the best features of SageMaker.

Data Scientists spend a significant proportion of their time tuning ML models utilising significant computational resources.

The reason is that the available techniques rely on brute-force methods like grid search or random search.

Using Automatic Model Tuning, we can select a subset of possible optimisers, say Adam and/or SGD, and a few values for the learning rate.

Then, the engine will take care of the possible combinations and focus on the set of parameters that yields the best results.

Also, this process scales.

We can choose the number of jobs to run in parallel along with the maximum number of jobs to run.

After that, Auto Tuning will do the work.
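A sketch of what this setup can look like with the SageMaker Python SDK is shown below. The objective metric name, the value ranges and the job counts are illustrative assumptions, not values from the original article:

```python
def make_tuner(estimator):
    """Wrap an estimator in a tuning job searching over optimiser and learning rate."""
    # Imported inside the function so the sketch loads without the SDK installed.
    from sagemaker.tuner import (
        CategoricalParameter,
        ContinuousParameter,
        HyperparameterTuner,
    )

    ranges = {
        "optimizer": CategoricalParameter(["adam", "sgd"]),  # subset of optimisers
        "learning_rate": ContinuousParameter(0.0001, 0.1),   # search interval
    }
    return HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:accuracy",  # illustrative metric name
        hyperparameter_ranges=ranges,
        max_jobs=20,          # maximum number of training jobs overall
        max_parallel_jobs=4,  # jobs allowed to run at the same time
    )
```

Calling `tuner.fit(...)` on the returned object then launches the search, focusing later jobs on the regions of the parameter space that performed best.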

This feature is also supported for third-party libraries, and it comes at no extra charge.

Another significant feature is that you can either train a new model using the Amazon Cloud or use it to serve a pre-existing model.

In other words, you can take advantage of the serving part of SageMaker to deploy models that were trained outside it.

Bonus Code

Functions which can be used to save a CSV (or any other format) file from the Jupyter folder to an S3 bucket, and similarly to download it from the S3 bucket back to the Jupyter folder.
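A minimal sketch of such helpers using boto3 is shown below; the bucket name and prefix are placeholders you would replace with your own:

```python
import os

def s3_key(prefix, filename):
    """Join an optional prefix and a file name into an S3 object key."""
    return f"{prefix.rstrip('/')}/{filename}" if prefix else filename

def upload_to_s3(local_path, bucket, prefix=""):
    """Upload a file from the notebook's folder to an S3 bucket; return the key used."""
    import boto3  # imported here so the helpers load without boto3 installed
    key = s3_key(prefix, os.path.basename(local_path))
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key

def download_from_s3(bucket, key, local_path):
    """Download an object from an S3 bucket back into the notebook's folder."""
    import boto3
    boto3.client("s3").download_file(bucket, key, local_path)
```

For example, `upload_to_s3("train.csv", "my-bucket", "data")` would place the file at `s3://my-bucket/data/train.csv`.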

Auto Scaling

As stated on the AWS website:

"AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost."

In short, AWS Auto Scaling makes it easier to build scaling plans for various resources across many services.

These services include Amazon EC2, Spot Fleets, Amazon ECS tasks, and more.

The idea is to adjust the number of running instances in response to changes in the workload.

It is important to note that Auto Scaling might fail in some situations.

More specifically, when your application experiences short spikes in traffic, Auto Scaling may not help at all.

We know that for new (EC2) instances, Amazon needs some time to set up and configure the machine before it is able to process requests.

This setup time might take from 5 to 7 minutes.

If your application has small spikes (let’s say 2 to 4 minutes) in the number of incoming requests, by the time the EC2 instance setup time finishes, the need for more computing power might be over.

To address this situation, Amazon implements a simple policy to scale new instances.

Basically, after a scaling decision takes place, a cooldown period has to elapse before another scaling activity occurs.

In other words, each action to issue a new instance is interleaved by a fixed (configurable) amount of time.

This mechanism aims to ease the overhead to launch a new machine.
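For a SageMaker endpoint, this kind of policy (including the configurable cooldowns) can be attached through the Application Auto Scaling API. The sketch below uses illustrative capacities, target value and cooldown durations:

```python
def attach_autoscaling(endpoint_name, variant_name="AllTraffic", region="us-east-1"):
    """Attach a target-tracking scaling policy with cooldowns to an endpoint variant."""
    import boto3  # imported here so the sketch loads without boto3 installed
    client = boto3.client("application-autoscaling", region_name=region)
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

    # Declare how far the variant's instance count may scale in or out.
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    # Track invocations per instance; cooldowns space out successive scaling actions.
    client.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 100.0,  # illustrative: invocations per instance per minute
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleOutCooldown": 300,  # seconds to wait after adding instances
            "ScaleInCooldown": 300,   # seconds to wait after removing instances
        },
    )
```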

Conclusion

To sum up, I have listed below some of the best features of SageMaker:

- Ability to spin up a Jupyter notebook instance and preprocess the data.
- Auto-scaling and automatic hyperparameter tuning take away many of the boring tasks of ML.
- Useful built-in algorithms whose deployment is very straightforward.
- Support for third-party ML libraries.

Thanks for reading, and I am looking forward to hearing your questions :) Stay tuned and happy coding.

P.S. If you want to learn more about the world of machine learning/coding, you can also follow me on Instagram, email me directly or find me on LinkedIn.

I’d love to hear from you.

Resources:

https://docs.aws.amazon.com/sagemaker/
