Apple researchers have hit on a new multi-modal method of quickly training large language models (LLMs) that can enable more flexible and powerful machine-learning and "AI" type systems.
A research paper that the company posted to the research site arxiv.org earlier this week revealed that Apple used what it calls a "careful mix" of image-caption, interleaved image-text, and text-only data to train LLMs. The mix of visual and language data allowed the models to handle tasks like intelligently captioning images or inferring natural-language meanings.
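The paper's exact recipe is not reproduced here, but the general idea of training on a weighted mix of data sources can be sketched in a few lines of Python. In this minimal sketch, the source names and the 40/40/20 weights are hypothetical placeholders, not the ratios Apple reported. Each training example is simply assigned a source in proportion to its weight.

```python
import random
from typing import Dict, List

# Hypothetical mixture weights for illustration only; these are NOT the
# ratios reported in Apple's paper.
MIXTURE_WEIGHTS: Dict[str, float] = {
    "image_caption": 0.4,
    "interleaved_image_text": 0.4,
    "text_only": 0.2,
}


def sample_batch_sources(weights: Dict[str, float], batch_size: int, seed: int = 0) -> List[str]:
    """Assign a data source to each example in a batch, in proportion to its weight."""
    rng = random.Random(seed)  # seeded so the sampled mix is reproducible
    names = list(weights)
    # random.choices samples with replacement using relative weights.
    return rng.choices(names, weights=[weights[n] for n in names], k=batch_size)


if __name__ == "__main__":
    batch = sample_batch_sources(MIXTURE_WEIGHTS, batch_size=10_000)
    for name in MIXTURE_WEIGHTS:
        print(f"{name}: {batch.count(name) / len(batch):.3f}")
```

Over a large batch, each source's share lands close to its weight, so changing the weights is how a "careful mix" would be tuned in a setup like this.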
As part of the research, the team determined that the choice of image encoder and the resolution of the images it processes have a big impact on performance, greater than that of the design of the vision-language connector.
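One way to see why image resolution can matter so much is to count how many visual tokens an encoder hands to the language model. The sketch below assumes a ViT-style encoder that cuts each image into non-overlapping 14-by-14-pixel patches, an illustrative choice rather than a detail taken from the article. Under that assumption, the token count grows with the square of the image side.

```python
def vit_patch_tokens(height: int, width: int, patch: int = 14) -> int:
    """Count the patch tokens a ViT-style encoder produces for one image.

    Assumes non-overlapping square patches and ignores any extra
    class token the encoder might add.
    """
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    return (height // patch) * (width // patch)


for side in (224, 336, 448):
    print(side, vit_patch_tokens(side, side))
# 224 256
# 336 576
# 448 1024
```

Doubling the side from 224 to 448 pixels quadruples the number of tokens, giving the model more visual detail to work with at a higher compute cost.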
In one instance, the researchers found that the 30-billion-parameter version of their MM1 model showed strong in-context learning abilities. This means it can perform multi-step reasoning over multiple images using few-shot "chain-of-thought" prompting.
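Few-shot chain-of-thought prompting means showing a model a handful of worked examples, each with its step-by-step reasoning, before posing a new question. The sketch below assembles such a prompt as an interleaved sequence of image and text segments. The `Image`, `Text`, `Example`, and `build_cot_prompt` names, the file names, and the "Let's think step by step" cue are hypothetical illustrations, not MM1's actual input format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Image:
    path: str  # hypothetical placeholder; a real model would receive pixel data


@dataclass
class Text:
    content: str


Segment = Union[Image, Text]


@dataclass
class Example:
    images: List[str]
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(shots: List[Example], query_images: List[str], query: str) -> List[Segment]:
    """Interleave images and text into a few-shot chain-of-thought prompt."""
    prompt: List[Segment] = []
    for ex in shots:
        # Each worked example: its images first, then question, reasoning, answer.
        prompt.extend(Image(p) for p in ex.images)
        prompt.append(Text(
            f"Q: {ex.question}\nA: Let's think step by step. "
            f"{ex.reasoning} So the answer is {ex.answer}.\n"
        ))
    # The new question ends on the reasoning cue so the model continues
    # by writing out its own steps before answering.
    prompt.extend(Image(p) for p in query_images)
    prompt.append(Text(f"Q: {query}\nA: Let's think step by step."))
    return prompt


shot = Example(
    images=["fruit_1.jpg", "fruit_2.jpg"],
    question="How many apples are there across both photos?",
    reasoning="The first photo shows 3 apples and the second shows 2, and 3 + 2 = 5.",
    answer="5",
)
prompt = build_cot_prompt([shot], ["pets_1.jpg", "pets_2.jpg"], "How many cats are there across both photos?")
for segment in prompt:
    print(segment)
```

A real system would replace each image placeholder with encoded visual tokens. The point here is only the ordering: worked examples with their reasoning first, then the new images and an open question for the model to reason through.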
According to VentureBeat, Apple is continuing its tradition of being a "fast follower" rather than a "first mover" when it comes to groundbreaking technologies. CEO Tim Cook recently acknowledged that the company was spending $1 billion per year on incorporating "AI" into its existing technologies.
Cook said the company would be sharing "details of our ongoing work in AI later this year." Apple is expected to make some announcements about its advances at WWDC this June.
The company is both catching up to rivals in the use of AI-related technologies and developing methods that would preserve user privacy while augmenting its existing machine-learning abilities.
That focus on privacy and security has not been a feature of existing "chatbot" style services, and it makes the challenge harder for Apple.
Apple's work on multimodal training of neural networks has produced state-of-the-art performance, including the ability to perform multi-step reasoning. This suggests that the company has found a path to rapidly advance its machine-learning abilities and to give its models more advanced "intelligence" capabilities.