System & Language Agnostic Hyperparameter Optimization at Scale

Maybe.

But imagine a microservice that can optimize any ML model because it is invariant to language, infrastructure, and result/model storage requirements.

At Capital One, I am proud to have helped build a cloud-based, system- and language-agnostic hyperparameter optimization framework that has helped us achieve state-of-the-art results.

Before getting into the how, it is important to understand the purpose as well as the considerations that went into its development within the Capital One development and deployment environments on AWS.

What is Hyperparameter Optimization?

During the training phase of an ML model such as an image object classifier, we iteratively optimize the model's parameters (weights and biases) to meet an expectation.

Such parameters are called model parameters, which are separate from a different class of parameters called hyperparameters, the focus of this article.

In the most basic sense:

Hyperparameters — those which alter the manner in which a model learns to do its task (i.e. optimization method, batch normalization, etc.) or the model's learning capabilities (the structure of the model itself, i.e. the number of hidden layers, size of the layers, etc.).

Model parameters — those which are iteratively learned while training the model (model weights and biases).
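To make the distinction concrete in code, here is a minimal scikit-learn sketch (not part of the framework described later in this article): the arguments passed to the estimator's constructor are hyperparameters, while the weights and biases it learns during `fit` are model parameters.

```
# Minimal illustration of hyperparameters vs. model parameters.
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen before training; they shape the architecture
# and the learning process.
clf = MLPClassifier(hidden_layer_sizes=(32, 16),  # model structure
                    solver="adam",                # optimization method
                    batch_size=32,
                    max_iter=500)

# Model parameters: the weights and biases learned during training.
clf.fit(X, y)
print([w.shape for w in clf.coefs_])       # learned weight matrices
print([b.shape for b in clf.intercepts_])  # learned bias vectors
```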

For example, if I want to teach an animal a new trick, some of the hyperparameters could be: animal type/breed/age; how I train the animal; treats used during training; number of training sessions; etc., because these affect either the animal's brain structure or its learning process. The process of teaching the animal, however, is what optimizes the animal's brain: the model parameters.

Figure 1: Neural network with labeled hyperparameters and model parameters. Model parameter weights, w_ij, and biases, b_i, are contained within the hidden layers of the neural network. The hyperparameters (number of hidden layers, layer size, and node connection patterns) control the architecture of this neural network.

Ultimately, the purpose of hyperparameter optimization is to tune the training methodology and model architecture until the model achieves its best possible performance given available training, development, and test data.

Importance of Scaling

Depending on the optimization technique driving the hyperparameter tuning, multiple sets of hyperparameters can be selected concurrently.

Each set of hyperparameters for a given model can be trained independently, making model training an extremely parallelizable process.

For instance, each hyperparameter set from grid or random search (a non-learning optimizer) is determined independently of the others, whereas Bayesian optimization (a learning optimizer) requires the previous results to determine the next best hyperparameter set.

Hence, non-learning optimizers can be easily parallelized while learning optimizers may require a few tricks.
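As a rough sketch of that difference (using stand-in functions rather than the framework's internals), random-search trials can be farmed out to a process pool in one shot, whereas a learning optimizer's loop must consume the running history before proposing the next trial:

```
# Sketch: why non-learning optimizers parallelize trivially.
import random
from concurrent.futures import ProcessPoolExecutor

def sample_hparams():
    return {"learning_rate": 10 ** random.uniform(-4, -1),
            "batch_size": random.choice([32, 64, 128])}

def train_and_score(hparams):
    # Stand-in for a real training run that returns a validation score.
    return random.random()

if __name__ == "__main__":
    # Non-learning optimizer (random search): every trial can run at once.
    candidates = [sample_hparams() for _ in range(8)]
    with ProcessPoolExecutor() as pool:
        results = list(zip(candidates, pool.map(train_and_score, candidates)))

    # Learning optimizer (e.g. Bayesian optimization): each new proposal is
    # conditioned on the history of (hyperparameters, score) pairs, so trials
    # run sequentially unless tricks such as batched suggestions are used.
    history = []
    for _ in range(8):
        hparams = sample_hparams()  # a real learning optimizer would use `history` here
        history.append((hparams, train_and_score(hparams)))
```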

Importance of Agnosticism

The most advanced techniques for model development and optimization change quite rapidly.

Therefore, the ideal framework must be modular so that the newest capabilities can easily be swapped in.

For hyperparameter optimization, this requires three primary agnostic components:

1. Programming language
2. Optimization library
3. Model management service

Additionally, for portability or security, there are the following three secondary agnostic requirements:

1. Infrastructure
2. Deployability/containerization
3. Configurable/controllable via RESTful APIs

Hyperparameter Optimization at Capital One

Local Development on a Laptop

Today's ML capabilities make it fairly straightforward for the average user to download the latest data science packages (i.e. scikit-learn, etc.) and begin developing models locally.

However, it is fairly unlikely for the model to perform optimally, and local hyperparameter optimization may not be scalable depending on the size of the model being optimized, the size of the hyperparameter space being evaluated, the optimization technique being applied, and/or the model training data.

These limitations derive from the local bounds on memory and CPU speed/parallelization of the hyperparameter tasks.

Manually Controlled Cloud Optimization

Capital One is heavily invested in cloud infrastructure, which means spinning up VMs or containers to parallelize the hyperparameter tasks is relatively easy.

These can be controlled entirely within the cloud or locally via remote access.

However, managing results and deploying new hyperparameter searches to each container/VM requires an extensive DevOps background.

Additionally, security limitations may not allow such communication to occur.

Automated Cloud Optimization

Figure 2: Communication diagram of the hyperparameter optimization framework.

The hyperparameter microservice is constantly determining new hyperparameter sets from the optimization service, sending individual hyperparameter sets to each parameter testing node, and sending the results to the model management service.
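Conceptually, the microservice's control loop looks something like the following sketch. The client objects and method names here (optimizer, node_pool, model_mgmt) are placeholders invented for illustration, not the actual internal APIs:

```
# Schematic of the Figure 2 control loop; `optimizer`, `node_pool`, and
# `model_mgmt` are hypothetical client objects used only for illustration.
def run_optimization(optimizer, node_pool, model_mgmt, stop_requested):
    while not stop_requested():
        hyperparameters = optimizer.next_set()     # ask the optimizer service for a suggestion
        if hyperparameters is None:                # optimizer has exhausted the search space
            break
        node = node_pool.acquire()                 # a parameter testing VM/container
        result = node.train(hyperparameters)       # clones the repo and runs the user's command
        optimizer.report(hyperparameters, result)  # lets a learning optimizer adapt
        model_mgmt.store(hyperparameters, result)  # persist results and the trained model
        node_pool.release(node)
```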

Here is where things get exciting.

Instead of managing the optimization ourselves, why not let a service handle it? I was fortunate enough to build such a product, and it has only two requirements to begin training:

1. A GitHub repo with your model
2. A short JSON configuration script

Below is an example script sent to this hyperparameter tuning microservice:

```
{
  "code_github": "https://github.com/.../optimizer_example",
  "command": "python model_script.py",  # This could be running java, c++, etc.
  "dir": "path/in/repo/to/script",
  "install_script": "install.sh",  # pre-installation script to set up the environment
  "primary_result_key": "accuracy",  # what we are optimizing
  "backend": "aws",  # service we are deploying our infrastructure within
  "resources": {
    "instance_count": 10,
    "instance_type": "p3.8xlarge"
  },
  "optimizer": {
    # In here we specify the optimizer service type and any optimizer setup specifics
  },
  "search": {
    # In here, we add the configuration sent to the optimizer service
    # For instance, with SVM we could search kernel and penalty
  },
  "model_storage": {
    # where we store our model after running a hyperparameter set
    "type": "s3",
    "bucket": "<YOUR_S3_BUCKET_S3://>",
    "model_dir": "saved_model",
    "clean_model_dir_flag": true
  }
}
```

Once the script is received by the hyperparameter microservice, it will spin up parameter nodes testing each set of parameters it gets from an optimizer service until it receives a stop command or no more search parameters from the optimizer.
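For context, the repository's entry point (the `command` above, `python model_script.py`) just needs to train one model for one hyperparameter set and expose its result and saved model. The article does not specify the exact hand-off, so the sketch below assumes hyperparameters arrive as command-line arguments and results are written to a local JSON file; it mirrors the SVM kernel/penalty search mentioned in the `search` comment:

```
# Hypothetical sketch of a `model_script.py`; the argument and result
# hand-off conventions here are assumptions, not the service's actual contract.
import argparse
import json
import os

import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--kernel", default="rbf")       # searched hyperparameter
    parser.add_argument("--penalty", type=float, default=1.0)  # searched hyperparameter
    args = parser.parse_args()

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = SVC(kernel=args.kernel, C=args.penalty).fit(X_tr, y_tr)

    # "primary_result_key" from the config: the quantity being optimized.
    results = {"accuracy": model.score(X_te, y_te)}

    os.makedirs("saved_model", exist_ok=True)            # "model_dir" in the config
    joblib.dump(model, os.path.join("saved_model", "model.joblib"))
    with open("results.json", "w") as f:                 # assumed result hand-off
        json.dump(results, f)

if __name__ == "__main__":
    main()
```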

As hyperparameter sets are being evaluated, completed training results are simultaneously sent to the optimizer and model management service, which can store both model and results for later retrieval.

Figure 3: Front-end of the hyperparameter optimization microservice.

Once the configuration script is sent to the microservice, the user can submit the configuration for optimization and track its completion status.
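Because the microservice is driven by RESTful APIs, submission and status tracking can also be scripted. The endpoint paths and response fields below are purely illustrative assumptions; only the payload corresponds to the configuration shown earlier:

```
# Hypothetical example of driving the microservice over its RESTful API.
import json
import time

import requests

BASE_URL = "https://hyperparameter-service.example.internal"  # placeholder URL

with open("optimizer_config.json") as f:   # the JSON configuration shown above
    config = json.load(f)

job = requests.post(f"{BASE_URL}/jobs", json=config).json()   # submit the search

while True:                                                    # poll completion status
    status = requests.get(f"{BASE_URL}/jobs/{job['id']}").json()
    if status.get("state") in {"completed", "stopped", "failed"}:
        break
    time.sleep(30)
```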

This hyperparameter optimization framework makes it easy to:

1. Submit models to be tuned using the engineer's desired optimization techniques
2. Store the trained models
3. Store the associated hyperparameters, results, and source of the dataset

While the first two points are necessary/useful, it is the model training history which is critical for AutoML, which I'll address next.

Importance and Application to AutoML

Ultimately, the goal is to allow the services to build models with limited human effort.

Simply point to data and allow the service to generate the model automatically (AutoML).

However, without a knowledge set of which model architectures were successful or unsuccessful, new models would have to be built from scratch, without the knowledge gained from building prior models.

Instead, an AutoML service can be built upon a catalog of previously attempted architectures and results (as mentioned above) to predict what architectures may work best based on previous searches.

In other words, this allows a model trained on the knowledge set of previous architectures and attempts to predict our next steps in an architecture search, based on all previous history rather than just the history accumulated while training the current model.
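One simple way to picture this (a sketch of the idea, not the actual AutoML service) is to fit a surrogate model on the catalog of past hyperparameter/result records and use it to rank candidate architectures before running them; the column names and numbers below are invented for illustration:

```
# Sketch: rank candidate architectures using a surrogate model fit on the
# catalog of previous searches. All values below are toy, illustrative data.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Catalog of previous searches, e.g. exported from the model management service.
history = pd.DataFrame([
    {"hidden_layers": 2, "layer_size": 64,  "batch_size": 32,  "accuracy": 0.81},
    {"hidden_layers": 3, "layer_size": 128, "batch_size": 64,  "accuracy": 0.88},
    {"hidden_layers": 4, "layer_size": 256, "batch_size": 128, "accuracy": 0.84},
])

surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(history.drop(columns="accuracy"), history["accuracy"])

# Score unseen candidate architectures and try the most promising ones first.
candidates = pd.DataFrame([
    {"hidden_layers": 3, "layer_size": 256, "batch_size": 64},
    {"hidden_layers": 5, "layer_size": 64,  "batch_size": 32},
])
candidates["predicted_accuracy"] = surrogate.predict(candidates)
print(candidates.sort_values("predicted_accuracy", ascending=False))
```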

Current and Future Applications

So far, this microservice has mostly been applied to the tuning of deep learning models.

One use case includes the models mentioned by my colleague in his post, "Why You Don't Necessarily Need Data for Data Science." These models, especially GANs, can be brittle in nature due to the dependencies between key/value pairs in data.

Utilizing this service increased the success of both structured and unstructured synthetic data generation by as much as 30%, and also reduced manual model manipulation time from weeks or months to less than a day.

Figure 4: Example 2D mapping of accuracy in relation to batch size and number of time steps for an RNN when utilizing the hyperparameter microservice. In this case, a larger batch size increased accuracy; however, increasing the number of time steps decreased accuracy.

As I mentioned, the hope is to map the hyperparameter space for a given model and use that information to predict ideal architectures for future models.

Building such a predictor requires a plethora of data, but with the simplicity of challenger model generation and the storage of each hyperparameter iteration's results, model, and metadata, this possibility is becoming reality.

Data Innovation team at Capital One’s University of Illinois, Urbana-Champaign lab.

Thanks for the edits: Anh Truong, Mark Watson, Reza Farivar and Austin Walters.

These opinions are those of the author.

Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by any of the companies mentioned.

All trademarks and other intellectual property used or displayed are the ownership of their respective owners.

This article is © 2019 Capital One.
