Distributed TensorFlow using Horovod

On the other, to the computational speed of the process. The accuracy is independent of the platform, and it is the performance metric used to compare multiple models, whereas the computational speed depends on the platform on which the model is deployed. In this post we will measure it with metrics such as:

Speedup: ratio of the solution time of the sequential algorithm to that of its parallel counterpart.
Efficiency: ratio of the speedup to the number of CPUs/GPUs or nodes.
Scalability: the efficiency as a function of an increasing number of CPUs/GPUs or nodes.

These metrics are highly dependent on the cluster configuration, the type of network used, and how efficiently the framework uses the libraries and manages resources.

1.3 Types of parallelism

To distribute the training step there are two principal implementations, and the needs of the application determine which one will perform better, or even whether a mix of both approaches can increase performance. For example, different layers of a Deep Learning model may be trained in parallel on different GPUs; this training procedure is commonly known as Model parallelism. Another approach is Data parallelism, where we use the same model in every execution unit but train it in each computing device with different training samples.

Data parallelism

In this mode, the training data is divided into multiple subsets, and each of them runs on the same replicated model on a different node (worker node). The nodes need to synchronize the model parameters (or their gradients) at the end of each batch computation to ensure they are training a consistent model, just as if the algorithm ran on a single processor. This is necessary because each device independently computes the errors between its predictions for its training samples and the labeled outputs (the correct values for those samples); therefore, each device must send all of its changes to the models on all the other devices.

One interesting
property of this setting is that it scales with the amount of data available and speeds up the rate at which the entire dataset contributes to the optimization. It also requires less communication between nodes, as it benefits from a high amount of computation per weight. On the other hand, the model has to fit entirely on each node, and this approach is mainly used to speed up the computation of convolutional neural networks with large datasets.

Model parallelism

In this case (also known as Network parallelism), the model is segmented into different parts that can run concurrently, and each part runs on the same data on a different node. The scalability of this method depends on the degree of task parallelization of the algorithm, and it is more complex to implement than data parallelism. It may decrease the communication needs, as workers only need to synchronize the shared parameters (usually once for each forward- or backward-propagation step), and it works well for GPUs in a single server that share a high-speed bus. It can be used with larger models, since hardware constraints per node are no longer a limitation. In general, however, parallelizing the algorithm in this way is more complex to implement than running the same model on different nodes with subsets of the data.

In this post, we will focus on the Data Parallelism approach.

2. Concurrency in data parallelism training

In distributed environments there may be multiple instances of Stochastic Gradient Descent (SGD) running independently, so the overall algorithm must be adapted to account for issues such as model consistency and parameter distribution.

2.1 Synchronous versus asynchronous distributed training

Stochastic Gradient Descent (SGD) is an iterative algorithm that involves multiple rounds of training, where the results of each round are incorporated into the model in preparation for the next round. The rounds can be run on multiple devices either synchronously or asynchronously. Each SGD iteration runs on a mini-batch of training samples.
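The synchronous, data-parallel variant of this loop can be illustrated with a small pure-Python simulation (no TensorFlow or Horovod here; the model, data shards, and hyperparameters are all illustrative). Each simulated worker computes the gradient on a mini-batch drawn from its own data shard, the gradients are averaged (which is what an allreduce does in practice), and every replica applies the identical update, keeping the models consistent:

```python
import random

# Toy model y = w * x, trained with synchronous data-parallel SGD.
# Each "worker" owns a shard of the data and draws its own mini-batches.
random.seed(0)
data = [(x / 100, 3 * (x / 100)) for x in range(100)]  # y = 3x, noiseless
shards = [data[0::2], data[1::2]]                       # 2 simulated workers

def batch_gradient(w, batch):
    # d/dw of the mean squared error over one mini-batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w, lr, batch_size = 0.0, 0.5, 8
for step in range(200):
    # 1) Every worker computes a gradient on its own mini-batch.
    grads = [batch_gradient(w, random.sample(shard, batch_size))
             for shard in shards]
    # 2) The gradients are averaged across workers (the allreduce step).
    avg_grad = sum(grads) / len(grads)
    # 3) All replicas apply the same update, so they stay in sync.
    w -= lr * avg_grad

print(round(w, 2))  # approaches 3.0
```

Because every replica sees the same averaged gradient, all copies of `w` remain identical after each step, which is exactly the consistency property described above; in a real cluster, Horovod performs the averaging step over the network instead of in a local list.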
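Returning to the performance metrics defined at the start of this post, speedup and efficiency are simple ratios; a minimal sketch (the function names and timings are illustrative, not from any library):

```python
def speedup(t_sequential, t_parallel):
    # Speedup: sequential solution time over the parallel solution time.
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_devices):
    # Efficiency: speedup divided by the number of CPUs/GPUs or nodes.
    return speedup(t_sequential, t_parallel) / n_devices

# Hypothetical run: 120 s on 1 GPU versus 40 s on 4 GPUs.
print(speedup(120, 40))        # 3.0
print(efficiency(120, 40, 4))  # 0.75
```

Scalability is then just these numbers plotted as the device count grows: ideal (linear) scaling keeps efficiency at 1.0, while communication overhead usually drags it below that.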
