cashgift.blogg.se

Efficient processing of deep neural networks

Scaling up the capacity of deep neural networks is known to be an effective approach to improving model quality on several machine learning tasks. In many cases, however, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other machine learning tasks. To address the need for efficient and task-independent model parallelism, we introduce TensorPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, TensorPipe provides the flexibility to scale a variety of networks to gigantic sizes efficiently. Moreover, TensorPipe utilizes a novel batch-splitting pipelining algorithm, resulting in an almost linear speedup when a model is partitioned across multiple accelerators. (Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V.)

A systematic memory footprint analysis reveals that feature maps are the major memory consumers in the DNN training process. IPUs, by contrast, have a much smaller memory footprint than GPUs, small enough to fit on the processing chip even for large networks.
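The batch-splitting idea described above can be sketched in plain Python. This is an illustrative fill-and-drain schedule, not TensorPipe's actual API: the function name `pipeline_schedule` and its parameters are assumptions made for the example. It shows why splitting a mini-batch into micro-batches yields the "almost linear" speedup the text mentions.

```python
def pipeline_schedule(num_microbatches: int, num_stages: int):
    """Return, for each clock tick, the list of (stage, microbatch)
    pairs that run concurrently under a fill-and-drain pipeline.

    Stage s processes micro-batch m at tick t = s + m, so the whole
    forward pass finishes in num_microbatches + num_stages - 1 ticks
    instead of num_microbatches * num_stages sequential steps.
    """
    ticks = []
    for t in range(num_microbatches + num_stages - 1):
        active = []
        for stage in range(num_stages):
            mb = t - stage  # micro-batch this stage works on at tick t
            if 0 <= mb < num_microbatches:
                active.append((stage, mb))
        ticks.append(active)
    return ticks

# 4 micro-batches over 3 pipeline stages: 6 ticks instead of 12 steps.
schedule = pipeline_schedule(num_microbatches=4, num_stages=3)
print(len(schedule))  # 6
```

As the number of micro-batches M grows relative to the number of stages K, the M + K - 1 pipelined ticks approach M, so throughput approaches a K-fold speedup over running the stages sequentially.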
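To make the feature-map claim concrete, here is a rough back-of-the-envelope estimate of activation memory. The layer shapes below are hypothetical (they are not taken from the text); the point is that every layer's output feature map must be kept alive for the backward pass, so activations, not weights, dominate training memory.

```python
def feature_map_bytes(n: int, c: int, h: int, w: int,
                      bytes_per_elem: int = 4) -> int:
    """Memory held by one layer's output feature map (fp32 by default)."""
    return n * c * h * w * bytes_per_elem

# Hypothetical early conv layers, batch size 32 (N, C, H, W):
layers = [(32, 64, 112, 112), (32, 128, 56, 56), (32, 256, 28, 28)]
total = sum(feature_map_bytes(*shape) for shape in layers)
print(f"{total / 2**20:.1f} MiB")  # 171.5 MiB for just three layers
```

Even this toy three-layer slice holds well over a hundred mebibytes of activations for a single mini-batch, which is why splitting the batch into micro-batches (and recomputing or freeing feature maps per stage) matters so much for fitting large models on an accelerator.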