Neural Networks with push button, AI for all

Neural Architecture Search with NASBench from Google Research

Can we design network architectures automatically, instead of relying on expert experience and knowledge?

Sharmistha Chatterjee, Jun 25

Motivation

Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes restrictions on researchers who do not have access to large-scale computation.
Neural architecture search (NAS) was introduced to automate the design of artificial neural networks (ANN). A NAS method has three components:
The search space defines the type(s) of ANN that can be designed and optimized.
The search strategy defines the approach used to explore the search space.
The performance estimation strategy evaluates the performance of a possible ANN from its design (without constructing and training it).
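To make the three components concrete, here is a minimal sketch of a random search over a tiny, made-up search space. Everything in it is an assumption for illustration: the three-layer space, the function names, and especially the proxy scores, which stand in for a real performance estimate and are not measured accuracies.

```python
import random

# Search space (hypothetical): one operation per layer, three layers.
OPS = ["conv3x3", "conv1x1", "maxpool3x3"]
NUM_LAYERS = 3


def sample_architecture(rng):
    """Search strategy: plain random search draws each layer's op uniformly."""
    return tuple(rng.choice(OPS) for _ in range(NUM_LAYERS))


def estimate_performance(arch):
    """Performance estimation: an invented proxy score, NOT a trained accuracy."""
    proxy = {"conv3x3": 0.30, "conv1x1": 0.25, "maxpool3x3": 0.20}
    return sum(proxy[op] for op in arch)


def random_search(num_trials, seed=0):
    """Keep the best-scoring architecture seen over num_trials samples."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search(num_trials=20)
print(best_arch, round(best_score, 2))
```

Swapping `sample_architecture` for an evolutionary or reinforcement-learning strategy changes only the search strategy; the search space and the estimator stay the same.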
Figure: Jeff Dean’s slide showing that neural architecture search can try 20 different models to find the most accurate.

This blog is structured as follows:
State NAS and its benefits and applications in different domains.
Summarize the Google research paper NASBench-101 with a simple code example.
Cite an example of efficient neural network search with PyTorch.
Neural Architecture Search (NAS) Applications

Designing discriminative deep learning models such as image classification, object detection, and semantic segmentation.
Searching generative models, or specifically, auto-encoder based universal style transfer, which lacks systematic exploration.
Combining abstract representations of different input types (i.e., RGB and optical flow) and resolutions for video based CNN architectures. This helps to allow different types or sources of information to interact with each other, by enhancing a population of overly-connected architectures with their connection weight learning.
Evolution of NASBench-101

In the field of image classification, research has produced numerous ways of combining neural network layers into unique architectures, such as Inception modules, residual connections, or dense connections.
This led researchers to investigate neural architecture search (NAS), which treats the discovery of new architectures as an optimization problem, not only in image classification but also in other domains such as sequence modeling.
Because training times are long and different methods use different training procedures and search spaces, results are hard to compare. NASBench-101, the first architecture dataset for NAS, was introduced by Google Research to address this by:
Exploiting graph isomorphisms to identify 423k unique convolutional architectures.
Listing their evaluation metrics including run time and accuracy.
Easily evaluating the quality of a diverse range of models in milliseconds by querying the precomputed dataset.
NASBench is a tabular dataset that maps convolutional neural network architectures to their trained and evaluated performance on CIFAR-10.
All networks share the same network “skeleton”; the only thing that changes is the cell, in which neural network operations are linked in an arbitrary graph-like structure.
The three figures below show, from left to right:
Directed acyclic graphs with up to 7 vertices and 9 edges, where the operation at each vertex is one of “3×3 convolution”, “1×1 convolution”, or “3×3 max-pooling”.
An Inception-like cell within the dataset.
A high-level overview of the interior filter counts of each module.
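A cell like this can be written as an upper-triangular adjacency matrix plus a list of operations, one per vertex. The sketch below checks a cell against the NAS-Bench-101 limits of at most 7 vertices and 9 edges. The function `is_valid_cell` is a hypothetical helper written for this post, not part of the `nasbench` library, and it skips checks the real library performs (for example, pruning vertices that are not connected to the input or output).

```python
MAX_VERTICES = 7
MAX_EDGES = 9
ALLOWED_OPS = {"conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3"}


def is_valid_cell(matrix, ops):
    """Return True if (matrix, ops) describes a cell within the size limits."""
    n = len(matrix)
    if n < 2 or n > MAX_VERTICES or len(ops) != n:
        return False
    if any(len(row) != n for row in matrix):
        return False
    # A strictly upper-triangular matrix guarantees the graph is acyclic.
    for i in range(n):
        for j in range(i + 1):
            if matrix[i][j] != 0:
                return False
    if sum(sum(row) for row in matrix) > MAX_EDGES:
        return False
    # First vertex is the input, last vertex is the output.
    if ops[0] != "input" or ops[-1] != "output":
        return False
    return all(op in ALLOWED_OPS for op in ops[1:-1])


# Inception-like cell: 7 vertices, 9 edges.
inception_matrix = [
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
inception_ops = [
    "input", "conv1x1-bn-relu", "conv3x3-bn-relu", "conv3x3-bn-relu",
    "conv3x3-bn-relu", "maxpool3x3", "output",
]
print(is_valid_cell(inception_matrix, inception_ops))
```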
Architecture

Architectural choices in NASBench strongly affect the performance of the network.
For each chosen layer of the network, the algorithm samples the architecture on that latency-prediction model.
It is found that replacing a 3 × 3 convolution with a 1 × 1 convolution or 3 × 3 max-pooling operation generally leads to a drop in absolute final validation accuracy by 1.16% and 1.99%, respectively.
This is also reflected in the relative change in training time, which decreases by 14.11% and 9.84%.
Even though 3 × 3 max-pooling is parameter-free, it appears to be on average 5.04% more expensive in training time than 1 × 1 convolution and also has an average absolute validation accuracy 0.81% lower.
Selection of the right network depends on evaluating the accuracy in terms of depth vs width of the neural network.
The training time of networks increases as networks get deeper and wider with one exception: width 1 networks are the most expensive.
This is because all width 1 networks are simple feed-forward networks with no branching, and thus the activation maps are never split via their channel dimension.
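As a rough illustration of what "depth" means for a cell, the sketch below measures it as the number of edges on the longest path from the input vertex to the output vertex. That definition is an assumption made for this example, and `cell_depth` is a hypothetical helper, not a `nasbench` function. Width is harder to compute and is left out.

```python
def cell_depth(matrix):
    """Longest input-to-output path length (in edges) of an upper-triangular DAG.

    Returns None if the output vertex cannot be reached from the input.
    """
    n = len(matrix)
    # longest[v] is the longest path from vertex 0 to v, or None if unreachable.
    longest = [None] * n
    longest[0] = 0
    # Vertices are already in topological order because edges only go i -> j, i < j.
    for j in range(1, n):
        for i in range(j):
            if matrix[i][j] and longest[i] is not None:
                candidate = longest[i] + 1
                if longest[j] is None or candidate > longest[j]:
                    longest[j] = candidate
    return longest[n - 1]


# A pure chain with no branching: input -> conv -> conv -> output.
chain = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

# Inception-like cell: four parallel branches, one of them two convolutions deep.
inception = [
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

print(cell_depth(chain), cell_depth(inception))
```

Both cells have the same depth here, but the chain has no branching while the Inception-like cell spreads its work across parallel branches.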
NAS aims to find the best neural network architectures alongside HPO (hyper-parameter optimization), which involves finding a robust set of training hyper-parameters, here by performing a coarse grid search.
HPO tunes numerical training parameters (e.g., learning rate) as well as categorical choices (e.g., optimizer type) to optimize the training process.
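A coarse grid search over one numerical and one categorical hyper-parameter can be sketched as follows. The grid values and the scoring function are invented for illustration; in practice `toy_validation_accuracy` would be replaced by actually training a network with each configuration.

```python
import itertools

# Hypothetical coarse grid: a few widely spaced values per hyper-parameter.
LEARNING_RATES = [0.01, 0.1, 1.0]
OPTIMIZERS = ["sgd", "momentum", "rmsprop"]


def toy_validation_accuracy(learning_rate, optimizer):
    """Stand-in for a training run; the numbers are made up."""
    base = {"sgd": 0.88, "momentum": 0.91, "rmsprop": 0.93}[optimizer]
    penalty = 0.05 * abs(learning_rate - 0.1)
    return base - penalty


def coarse_grid_search():
    """Evaluate every combination on the grid and return the best one."""
    grid = itertools.product(LEARNING_RATES, OPTIMIZERS)
    return max(grid, key=lambda config: toy_validation_accuracy(*config))


best_lr, best_optimizer = coarse_grid_search()
print(best_lr, best_optimizer)
```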
NASBench uses a meta-architecture as part of its design to determine how many cells are used and how they are connected to build the actual model.
The figure below is an example of how hand-crafted architectural elements built into NAS, such as skip connections, allow complex, multi-branch networks to be built.
Figure: Left: two different cells, e.g., a normal cell (top) and a reduction cell (bottom). Right: an architecture built by stacking the cells sequentially.
The cells can also be combined in a more complex manner, such as in multi-branch spaces, by simply replacing layers with cells.
Source: https://www.ml4aad.org/wp-content/uploads/2018/07/automl_book_draft_neural_architecture_search.pdf

Locality

NASBench exhibits locality, a property by which architectures that are “close by” tend to have similar performance metrics.
The “closeness” is defined in terms of edit-distance: the smallest number of changes required to turn one architecture into another; one change entails flipping the operation at a vertex or the presence/absence of an edge.
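For two cells with the same number of vertices, listed in the same order, this edit distance can be sketched as a count of differing operations plus differing edges. This is a simplification and an assumption of the example: it ignores graph isomorphism, so it can overestimate the true smallest number of changes. The function name `edit_distance` is hypothetical.

```python
def edit_distance(matrix_a, ops_a, matrix_b, ops_b):
    """Count op flips plus edge flips between two cells with aligned vertices."""
    n = len(ops_a)
    if len(ops_b) != n or len(matrix_a) != n or len(matrix_b) != n:
        raise ValueError("cells must have the same number of vertices")
    op_changes = sum(a != b for a, b in zip(ops_a, ops_b))
    # Only the upper triangle holds edges in a DAG adjacency matrix.
    edge_changes = sum(
        matrix_a[i][j] != matrix_b[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return op_changes + edge_changes


cell_a_matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
cell_a_ops = ["input", "conv3x3-bn-relu", "conv1x1-bn-relu", "output"]

# Same cell with one operation flipped and one skip edge (input -> output) added.
cell_b_matrix = [
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
cell_b_ops = ["input", "conv3x3-bn-relu", "maxpool3x3", "output"]

print(edit_distance(cell_a_matrix, cell_a_ops, cell_b_matrix, cell_b_ops))
```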
Locality is also measured by the random-walk autocorrelation (RWA), defined as the autocorrelation of the accuracies of points visited as we perform a long walk of random changes through the space.
The RWA shows high correlations for lower distances, indicating locality.
The correlations become indistinguishable from noise beyond a distance of about 6.
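The RWA computation itself is short. The sketch below runs it on a toy landscape rather than on NASBench: the accuracy function is invented, and the walk only flips operations (no edge changes), so the numbers it prints say nothing about the real dataset. It only shows the mechanics of walking and correlating.

```python
import random

OPS = ["conv3x3", "conv1x1", "maxpool3x3"]
# Invented per-operation contributions to a toy "accuracy".
OP_SCORE = {"conv3x3": 0.03, "conv1x1": 0.02, "maxpool3x3": 0.01}


def toy_accuracy(arch):
    return 0.80 + sum(OP_SCORE[op] for op in arch)


def random_walk_accuracies(num_steps, num_vertices=6, seed=0):
    """Walk through the space by flipping one vertex's operation per step."""
    rng = random.Random(seed)
    arch = [rng.choice(OPS) for _ in range(num_vertices)]
    accuracies = [toy_accuracy(arch)]
    for _ in range(num_steps):
        position = rng.randrange(num_vertices)
        arch[position] = rng.choice([op for op in OPS if op != arch[position]])
        accuracies.append(toy_accuracy(arch))
    return accuracies


def autocorrelation(values, lag):
    """Autocorrelation of a sequence with itself shifted by `lag` steps."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered)
    covariance = sum(
        centered[t] * centered[t + lag] for t in range(len(values) - lag)
    )
    return covariance / variance


walk = random_walk_accuracies(num_steps=5000)
for lag in (1, 2, 4, 8):
    print(lag, round(autocorrelation(walk, lag), 3))
```

On a landscape with locality, the correlation is high at small lags and decays as the lag (the distance walked) grows.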
Metrics for Evaluation

The structure and connectivity of a neural network can typically be specified by a variable-length string, so a recurrent neural network (RNN), the controller, can be used to generate such a string.
Training the network specified by the string — the “child network” — on the real data will result in an accuracy on a validation set.
Using this accuracy as the reward signal, we can compute the policy gradient to update the controller.
As a result, in the next iteration, the controller will give higher probabilities to architectures that receive high accuracies.
This helps the controller to learn and improve its search over time, resulting in an architecture that runs as quickly as possible.
The improvement can be measured by training accuracy, validation accuracy, testing accuracy, training time in seconds and number of trainable model parameters.
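The policy-gradient loop can be sketched without any deep learning library. The example below makes several simplifying assumptions: the controller is a table of independent softmax logits rather than an RNN, the "child network" is never trained, and the reward is an invented function that simply favours 3×3 convolutions. It shows the update rule (REINFORCE with a moving-average baseline), not the setup used in any particular paper.

```python
import math
import random

OPS = ["conv3x3", "conv1x1", "maxpool3x3"]
NUM_POSITIONS = 4


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(arch):
    """Invented stand-in for the child network's validation accuracy."""
    return sum(op == "conv3x3" for op in arch) / len(arch)


def train_controller(num_iterations=2000, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(OPS) for _ in range(NUM_POSITIONS)]
    baseline = 0.0
    for _ in range(num_iterations):
        probs = [softmax(row) for row in logits]
        # Sample one architecture: one operation index per position.
        choices = [rng.choices(range(len(OPS)), weights=p)[0] for p in probs]
        reward = toy_reward([OPS[c] for c in choices])
        advantage = reward - baseline
        # REINFORCE: d log p(choice) / d logit_k = 1[k == choice] - p_k.
        for pos, choice in enumerate(choices):
            for k in range(len(OPS)):
                grad_log_prob = (1.0 if k == choice else 0.0) - probs[pos][k]
                logits[pos][k] += learning_rate * advantage * grad_log_prob
        # The moving-average baseline reduces the variance of the update.
        baseline = 0.9 * baseline + 0.1 * reward
    return [softmax(row) for row in logits]


final_probs = train_controller()
for pos, p in enumerate(final_probs):
    print(pos, [round(x, 3) for x in p])
```

Over the iterations, the controller shifts probability toward the operations that earned higher rewards, which is the behaviour described above.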
Implementation

NASBench, by Google Research

from absl import app
from nasbench import api

# Replace this string with the path to the downloaded nasbench.tfrecord before
# executing.
NASBENCH_TFRECORD = '././nasbench_full.tfrecord'

INPUT = 'input'
OUTPUT = 'output'
CONV1X1 = 'conv1x1-bn-relu'
CONV3X3 = 'conv3x3-bn-relu'
MAXPOOL3X3 = 'maxpool3x3'


def execute_nasbench(argv):
    del argv  # Unused

    # Load the data from file (this will take some time)
    nasbench = api.NASBench(NASBENCH_TFRECORD)

    # Create an Inception-like module (5×5 convolution replaced with two 3×3
    # convolutions).
    model_spec = api.ModelSpec(
        # Adjacency matrix of the module
        matrix=[[0, 1, 1, 1, 0, 1, 0],    # input layer
                [0, 0, 0, 0, 0, 0, 1],    # 1×1 conv
                [0, 0, 0, 0, 0, 0, 1],    # 3×3 conv
                [0, 0, 0, 0, 1, 0, 0],    # 5×5 conv (replaced by two 3×3's)
                [0, 0, 0, 0, 0, 0, 1],    # 5×5 conv (replaced by two 3×3's)
                [0, 0, 0, 0, 0, 0, 1],    # 3×3 max-pool
                [0, 0, 0, 0, 0, 0, 0]],   # output layer
        # Operations at the vertices of the module, matches order of matrix
        ops=[INPUT, CONV1X1, CONV3X3, CONV3X3, CONV3X3, MAXPOOL3X3, OUTPUT])

    # Query this model from dataset, returns a dictionary containing the metrics
    # associated with this model.
    print('Querying an Inception-like model.')
    data = nasbench.query(model_spec)
    print(data)
    print(nasbench.get_budget_counters())  # prints (total time, total epochs)

    # Get all metrics (all epoch lengths, all repeats) associated with this
    # model_spec. This should be used for dataset analysis and NOT for
    # benchmarking algorithms (does not increment budget counters).
    print('\nGetting all metrics for the same Inception-like model.')
    fixed_metrics, computed_metrics = nasbench.get_metrics_from_spec(model_spec)
    print(fixed_metrics)
    for epochs in nasbench.valid_epochs:
        for repeat_index in range(len(computed_metrics[epochs])):
            data_point = computed_metrics[epochs][repeat_index]
            print('Epochs trained %d, repeat number: %d' % (epochs, repeat_index + 1))
            print(data_point)

    # Iterate through unique models in the dataset. Models are uniquely
    # identified by a hash.
    print('\nIterating over unique models in the dataset.')
    for unique_hash in nasbench.hash_iterator():
        fixed_metrics, computed_metrics = nasbench.get_metrics_from_hash(
            unique_hash)
        print(fixed_metrics)

        # For demo purposes, break here instead of iterating through whole set.
        break


# Entry point so the script can be run directly.
if __name__ == '__main__':
    app.run(execute_nasbench)

Sample Response:

Loaded dataset in 381 seconds
Getting all metrics for the same Inception-like model.
{'module_adjacency': array([[0, 1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'], 'trainable_parameters': 2694282}
Epochs trained 108, repeat number: 1
{'halfway_training_time': 577.0859985351562, 'final_validation_accuracy': 0.9376001358032227, 'halfway_validation_accuracy': 0.825120210647583, 'final_train_accuracy': 1.0, 'halfway_test_accuracy': 0.8106971383094788, 'halfway_train_accuracy': 0.875901460647583, 'final_training_time': 1155.85302734375, 'final_test_accuracy': 0.9311898946762085}
...
Iterating over unique models in the dataset.
{'module_adjacency': array([[0, 1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1, 0], [0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0]], dtype=int8), 'module_operations': ['input', 'conv1x1-bn-relu', 'conv3x3-bn-relu', 'conv3x3-bn-relu', 'maxpool3x3', 'conv1x1-bn-relu', 'output'], 'trainable_parameters': 22421642}

Efficient Neural Network Search using PyTorch

The commands below are run from a checkout of the ENAS-pytorch repository (https://github.com/carpedm20/ENAS-pytorch).

conda install graphviz
pip install -r requirements.txt

# To train ENAS to discover a recurrent cell for RNN
python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001
python main.py --network_type rnn --dataset wikitext

# To train ENAS to discover CNN architecture
python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1

# To generate gif image of generated samples:
python generate_gif.py --model_name=ptb_2018-02-15_11-20-02 --output=sample.gif

Conclusion

MIT researchers developed a NAS algorithm that can directly learn specialized neural networks for target hardware platforms, when run on a massive dataset, in only 200 GPU hours.
NAS uses Network embedding to encode an existing network to a trainable embedding vector.
The embedding helps the controller network to generate transformations of the target network.
A multi-objective reward function considers network accuracy, computational resources, and training time; the networks that estimate these quantities are pre-trained or co-trained with the controller network via policy gradient.
The resulting network is evaluated by both an accuracy network and a training time network.
The results are combined by a reward engine that passes its output back to the controller network.
NAS-Bench-101, developed by Google Research, is a benchmark for neural architecture search with the following characteristics:
It is inexpensive to evaluate, which helps to rigorously compare and benchmark a range of architecture optimization algorithms.
It makes it possible to analyze the properties of an exhaustively evaluated set of convolutional neural architectures.
In style transfer, NAS yields better style-transferred images with details preserved, using a tiny number of operators/parameters, with a 500x inference-time speed-up.

References

https://en.wikipedia.org/wiki/Neural_architecture_search
https://arxiv.org/pdf/1902.09635.pdf
http://news.mit.edu/2019/convolutional-neural-network-automation-0321
https://github.com/carpedm20/ENAS-pytorch
https://cs.jhu.edu/~cxliu/slides/pnas-talk-eccv.pdf