Explained: GPipe — Training Giant Neural Nets using Pipeline Parallelism

Parallelism allows training a single machine learning model on multiple hardware devices. Some ML architectures, especially small models, lend themselves to parallelism and can be divided quite easily between hardware devices, but in large models synchronization costs lead to degraded performance, limiting the model sizes that can be trained in practice. A new paper from Google Brain, GPipe, presents a novel model parallelism technique that allows training large models on multiple hardware devices with an almost linear speedup (the paper reports 3.5x processing power on 4x hardware). The GPipe library, which will be open sourced, automatically analyzes the structure of a TensorFlow neural network model and distributes the training data and model across multiple hardware devices, while applying a unique backpropagation optimization technique. GPipe makes it possible to train models with significantly more parameters, allowing for better training results.

To demonstrate the effectiveness of the technique, the Google team created a larger version of AmoebaNet, the 2017 ImageNet winner, with larger images (480×480) as input, and achieved state-of-the-art (SOTA) results on ImageNet, CIFAR-10, CIFAR-100, and additional computer vision benchmarks.

Background

Parallelism in machine learning is commonly divided into two categories:

Model Parallelism: the machine learning model is divided across K hardware devices, with each device holding a part of the model. Standard model parallelism allows training larger neural networks, but it suffers from a large performance hit because devices are constantly waiting for each other and only one can perform calculations at a given time.

Data Parallelism: the machine learning model is replicated across K hardware devices and a mini-batch of training samples is divided into K micro-batches; each device computes gradients on its own micro-batch, and the gradients are then synchronized across devices. This communication overhead encourages very large mini-batches, but those are often the wrong choice for training a network and can produce inferior results in production.

How GPipe works

GPipe uses both model and data parallelism, a combination commonly known as ‘pipelining’.
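To make the pipelining idea concrete, here is a minimal toy simulation of a pipelined forward schedule: the model is split into K partitions (one per device) and the mini-batch into M micro-batches, and each clock tick shows which partition is processing which micro-batch. This is an illustration only, not the GPipe API; the values of K and M and the printed labels are arbitrary assumptions.

```python
# Toy sketch of a pipelined forward schedule (not the GPipe API).
K = 4  # number of devices / model partitions (assumed)
M = 4  # number of micro-batches per mini-batch (assumed)

# At clock tick t, device k works on micro-batch (t - k), if that index is valid.
for t in range(K + M - 1):
    status = []
    for k in range(K):
        m = t - k
        if 0 <= m < M:
            status.append(f"device {k}: forward(micro-batch {m})")
        else:
            status.append(f"device {k}: idle (pipeline bubble)")
    print(f"clock {t}: " + " | ".join(status))
```

Only the first and last K - 1 ticks leave some devices idle (the pipeline "bubble"), so the relative overhead shrinks as the number of micro-batches grows, which is what lets this kind of pipelining approach near-linear scaling across devices.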
