Apple Reveals a Multi-Mode Planar Engine for a Neural Processor that could be used in A-Series & screamingly fast M1 Processors
Back in 2017, Apple introduced the A11 which included their first dedicated neural network hardware that Apple calls a "Neural Engine." At the time, Apple's neural network hardware was able to perform up to 600 billion operations per second and used for Face ID, Animoji and other machine learning tasks. The neural engine allows Apple to implement neural network and machine learning in a more energy-efficient manner than using either the main CPU or the GPU. Today Apple's Neural Engine has advanced to their new M1 processor that delivers 15X faster machine learning performance of the Neural Engine, according to Apple.
Apple revealed back in Q4 that the "M1 features their latest Neural Engine. Its 16‑core design is capable of executing a massive 11 trillion operations per second. In fact, with a powerful 8‑core GPU, machine learning accelerators and the Neural Engine, the entire M1 chip is designed to excel at machine learning." There's an excellent chance that today's patent covers technology built into the M1 processor to help it achieve its breakthrough performance. While the patent was published today, it was filed in Q4 2019 before the M1 surfaced.
Today, the U.S. Patent Office published a patent application from Apple titled "Multi-Mode Planar Engine for Neural Processor." Apple's invention relates to a circuit for performing operations related to neural networks, and more specifically to a neural processor that include a plurality of neural engine circuits and one or more multi-mode planar engine circuits.
An artificial neural network (ANN) is a computing system or model that uses a collection of connected nodes to process input data. The ANN is typically organized into layers where different layers perform different types of transformation on their input. Extensions or variants of ANN such as convolution neural network (CNN), recurrent neural networks (RNN) and deep belief networks (DBN) have come to receive much attention. These computing systems or models often involve extensive computing operations including multiplication and accumulation. For example, CNN is a class of machine learning technique that primarily uses convolution between input data and kernel data, which can be decomposed into multiplication and accumulation operations.
Depending on the types of input data and operations to be performed, these machine learning systems or models can be configured differently. Such varying configuration would include, for example, pre-processing operations, the number of channels in input data, kernel data to be used, non-linear function to be applied to convolution result, and applying of various post-processing operations. Using a central processing unit (CPU) and its main memory to instantiate and execute machine learning systems or models of various configuration is relatively easy because such systems or models can be instantiated with mere updates to code. However, relying solely on the CPU for various operations of these machine learning systems or models would consume significant bandwidth of a central processing unit (CPU) as well as increase the overall power consumption.
Apple's invention specifically relates to a neural processor that includes a plurality of neural engine circuits and a planar engine circuit operable in multiple modes and coupled to the plurality of neural engine circuits.
At least one of the neural engine circuits performs a convolution operation of first input data with one or more kernels to generate a first output. The planar engine circuit generates a second output from a second input data that corresponds to the first output or corresponds to a version of input data of the neural processor.
The input data of the neural processor may be data received from a source external to the neural processor, or outputs of the neural engine circuits or planar engine circuit in a previous cycle. In a pooling mode, the planar engine circuit reduces the spatial size of a version of second input data. In an elementwise mode, the planar engine circuit performs an elementwise operation on the second input data. In a reduction mode, the planar engine circuit reduces the rank of a tensor.
Apple's patent FIG. 3 below is a block diagram illustrating a neural processor circuit; FIGS. 6A, 6B, and 6C are conceptual diagrams respectively illustrating a pooling operation, an elementwise operation, and a reduction operation.
To review the deeper details, review Apple's patent application 20210103803.
Considering that this is a patent application, the timing of such a product to market is unknown at this time.