Apple's Machine Learning Journal Issue 7 Published today covers On-Device Deep Neural Network for Face Detection
On October 18th Patently Apple posted a report titled "Apple Publishes another Issue in their Machine Learning Journal in a PR War with Google & Others for Next-Gen Developers." This morning Apple's Computer Vision Machine Learning Team published volume 1, issue 7 titled: An On-device Deep Neural Network for Face Detection.
The team notes that "Apple started using deep learning for face detection in iOS 10. With the release of the Vision framework, developers can now use this technology and many other computer vision algorithms in their apps. We faced significant challenges in developing the framework so that we could preserve user privacy and run efficiently on-device. This article discusses these challenges and describes the face detection algorithm.
Apple first released face detection in a public API in the Core Image framework through the CIDetector class. This API was also used internally by Apple apps, such as Photos. The earliest release of CIDetector used a method based on the Viola-Jones detection algorithm. We based subsequent improvements to CIDetector on advances in traditional computer vision.
With the advent of deep learning, and its application to computer vision problems, the state-of-the-art in face detection accuracy took an enormous leap forward. We had to completely rethink our approach so that we could take advantage of this paradigm shift. Compared to traditional computer vision, the learned models in deep learning require orders of magnitude more memory, much more disk storage, and more computational resources.
As capable as today's mobile phones are, the typical high-end mobile phone was not a viable platform for deep-learning vision models. Most of the industry got around this problem by providing deep-learning solutions through a cloud-based API. In a cloud-based solution, images are sent to a server for analysis using deep learning inference to detect faces. Cloud-based services typically use powerful desktop-class GPUs with large amounts of memory available. Very large network models, and potentially ensembles of large models, can run on the server side, allowing clients (which could be mobile phones) to take advantage of large deep learning architectures that would be impractical to run locally.
Apple's iCloud Photo Library is a cloud-based solution for photo and video storage. However, due to Apple's strong commitment to user privacy, we couldn't use iCloud servers for computer vision computations. Every photo and video sent to iCloud Photo Library is encrypted on the device before it is sent to cloud storage, and can only be decrypted by devices that are registered with the iCloud account. Therefore, to bring deep learning based computer vision solutions to our customers, we had to address directly the challenges of getting deep learning algorithms running on iPhone.
We faced several challenges. The deep-learning models need to be shipped as part of the operating system, taking up valuable NAND storage space. They also need to be loaded into RAM and require significant computational time on the GPU and/or CPU. Unlike cloud-based services, whose resources can be dedicated solely to a vision problem, on-device computation must take place while sharing these system resources with other running applications. Finally, the computation must be efficient enough to process a large Photos library in a reasonably short amount of time, but without significant power usage or thermal increase.
The rest of this article discusses our algorithmic approach to deep-learning-based face detection, and how we successfully met the challenges to achieve state-of-the-art accuracy. We discuss:
how we fully leverage our GPU and CPU (using BNNS and Metal)
memory optimizations for network inference, and image loading and caching
how we implemented the network in a way that did not interfere with the multitude of other simultaneous tasks expected of iPhone.
Figure 1. A revised DCN architecture for face detection
Figure 2. Face detection workflow
Apple's journal continues here with topics 1) Moving From Viola-Jones to Deep Learning, (2) Optimizing the Image Pipeline (3) Optimizing for On-device Performance and (4) Using Vision Framework.
About Making Comments on our Site: Patently Apple reserves the right to post, dismiss or edit any comments. Those using abusive language or negative behavior will result in being blacklisted on Disqus.