Apple has filed a patent for Vision-Based Hand Gesture Customization using advanced Machine Learning

Cover graphic: custom gestures

In the last few months, a series of patents has shown that Apple is continually working on advancing and perfecting hand gestures (01, 02 and 03). Today the U.S. Patent Office published yet another patent application from Apple, this one relating to vision-based hand gesture customization.

Vision-Based Hand Gesture Customization

Apple notes that machine learning has seen a significant rise in popularity in recent years due to the availability of training data and advances in more powerful and efficient computing hardware. Machine learning may utilize models that are executed to provide predictions in particular applications, such as hand gesture recognition.

Hand gesture recognition can facilitate seamless and intuitive communication between humans and machines, with applications ranging from virtual reality to gaming and smart home control.

However, automatic recognition of hand gestures has presented challenges in supporting human-computer interaction applications across diverse domains. A need has arisen to move beyond the mere identification of predefined gestures and allow users to define and personalize their own gestures. This customization can yield numerous advantages, including enhanced memorability, increased efficiency, and broader inclusivity for individuals with specific needs. Effectively enabling customization may demand an efficient and user-friendly data collection procedure while also addressing the challenge of learning from limited samples, referred to as Few-Shot Learning (FSL).

FSL is a demanding task in which models must effectively combine prior knowledge with minimal new information while avoiding overfitting. Various algorithms have been explored to address the challenges of FSL in gesture recognition, encompassing strategies such as transfer learning, fine-tuning, and augmenting few-shot data through various techniques. Nevertheless, the suitability of these strategies can be limited, particularly when the source gestures on which the model was initially trained diverge significantly from the target gestures, which involve a novel set of classes.
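To picture what learning from limited samples means in practice, here is a minimal sketch in the style of prototypical networks, one common few-shot learning approach (not necessarily the method Apple describes): each gesture class is summarized by the average of a few embedded support examples, and a new sample is assigned to the nearest prototype. All names, dimensions, and values below are hypothetical.

```python
import numpy as np

def prototypes(support_embeddings: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Average the few support embeddings of each gesture class into a prototype."""
    return {label: embs.mean(axis=0) for label, embs in support_embeddings.items()}

def classify(query: np.ndarray, protos: dict[str, np.ndarray]) -> str:
    """Assign a query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda label: np.linalg.norm(query - protos[label]))

# Hypothetical 64-dim embeddings produced by a pre-trained gesture encoder.
rng = np.random.default_rng(0)
support = {
    "thumbs_up": rng.normal(0.0, 1.0, size=(3, 64)),  # three demonstrations
    "pinch":     rng.normal(2.0, 1.0, size=(3, 64)),  # three demonstrations
}
protos = prototypes(support)
query = rng.normal(2.0, 1.0, size=64)                 # an unseen sample
print(classify(query, protos))                        # likely "pinch"
```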

Furthermore, different types of data necessitate distinct augmentation approaches. For example, augmentation techniques suitable for images may not be appropriate for time-series sensor data. Generative modeling has encountered challenges of its own, such as data hallucination, rendering it less reliable for data synthesis. Alternatively, aspects of meta-learning can address the challenges of FSL by enhancing a model's capacity to learn effectively.

Embodiments of the subject technology address the challenges of FSL in gesture recognition by introducing a comprehensive framework for gesture customization based on meta-learning.

In contrast to other techniques that may support only limited types of gestures, embodiments of the subject technology utilize one or more imaging sensors, such as RGB cameras, and accommodate a wide spectrum of gestures, encompassing static, dynamic, single-handed, and two-handed gestures.

The subject technology enables customization from a single demonstration (e.g., a gesture captured over a sequence of frames) and incorporates graph transformers, transfer learning, and meta-learning techniques. In this regard, few-shot learning is facilitated through the use of a pre-trained graph transformer deep neural network, bolstered by the integration of both meta-learning and meta-augmentation techniques.
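As a rough sketch of how such a pipeline could be wired together, the snippet below freezes a pre-trained keypoint encoder (a plain transformer standing in for the graph transformer described in the filing) and fine-tunes only a small classifier head on a single recorded demonstration of a new, user-defined gesture. Every class count, dimension, and hyperparameter here is an assumption for illustration, not Apple's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 21 hand keypoints, each an (x, y, z) coordinate.
NUM_KEYPOINTS, COORD_DIM, EMBED_DIM = 21, 3, 64

class KeypointEncoder(nn.Module):
    """Stand-in for a pre-trained graph/transformer encoder over hand keypoints."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(COORD_DIM, EMBED_DIM)
        layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, keypoints):             # keypoints: (batch, 21, 3)
        tokens = self.encoder(self.embed(keypoints))
        return tokens.mean(dim=1)              # pooled (batch, EMBED_DIM) gesture embedding

encoder = KeypointEncoder()                    # in practice: load pre-trained weights
for p in encoder.parameters():
    p.requires_grad = False                    # transfer learning: keep the backbone frozen

# Classifier head extended with one extra, user-defined gesture class.
num_known, num_custom = 10, 1
head = nn.Linear(EMBED_DIM, num_known + num_custom)

# Single demonstration of the custom gesture (a handful of frames from one recording).
demo_frames = torch.randn(8, NUM_KEYPOINTS, COORD_DIM)
demo_labels = torch.full((8,), num_known, dtype=torch.long)   # index of the new class

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(20):                            # brief adaptation on the one demonstration
    logits = head(encoder(demo_frames))
    loss = nn.functional.cross_entropy(logits, demo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The meta-learning and meta-augmentation stages mentioned in the filing would come earlier, training the backbone so that this kind of brief adaptation from one demonstration generalizes; that step is omitted from the sketch.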

Implementations of the subject technology improve the ability of a given electronic device to provide sensor-based, machine-learning-generated feedback to a user (e.g., a user of the given electronic device).

Patent graphics 2 & 3: custom gestures

Apple notes that in one or more implementations, the ML model #620 can learn distinct mappings for each gesture, emphasizing the unique spatial characteristics of each gesture. In this regard, the ML model can attend to different keypoint indices (e.g., 810 of FIG. 8C above) of a hand based on the gesture.
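The idea of attending to different keypoint indices depending on the gesture can be pictured as a learned weighting over the hand's landmarks. Below is a toy illustration with hypothetical scores, assuming a 21-landmark hand model: a pinch-like gesture concentrates weight on the thumb and index fingertips, while a wave spreads it across the whole hand.

```python
import numpy as np

NUM_KEYPOINTS = 21  # e.g., wrist plus four joints per finger

def keypoint_attention(scores: np.ndarray) -> np.ndarray:
    """Turn per-keypoint relevance scores into normalized attention weights."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Hypothetical learned scores: "pinch" emphasizes thumb tip (4) and index tip (8),
# while "wave" spreads attention roughly evenly across the hand.
pinch_scores = np.zeros(NUM_KEYPOINTS)
pinch_scores[[4, 8]] = 3.0
wave_scores = np.ones(NUM_KEYPOINTS)

print(keypoint_attention(pinch_scores).round(2))  # weight concentrated on indices 4 and 8
print(keypoint_attention(wave_scores).round(2))   # roughly uniform weights
```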

For full details, review Apple's patent application 20250078577.

Some of Apple's Inventors

  • Cori Park: ML Prototyping Engineer
  • Dr. Gierad Laput: Computer Scientist and Senior Engineering Manager
  • Abdelkareem Bedri: ML Research Manager
  • Runchang (Richard) Kang: Senior Research Engineer
