Project Astra is the latest AI prototype from DeepMind, Google’s AI division focused on artificial general intelligence (AGI). Presented at Google I/O 2024, the Project Astra demo showed groundbreaking technology for the future of AI assistants. Although the video showing the prototype’s abilities was short, it was impressive and garnered a positive response from developers.
The demo consists of two continuous takes, showing that Project Astra’s responses aren’t cherry-picked and that the prototype can handle a range of tasks and questions. One take is on a Google Pixel phone, and the other is on a prototype glasses device. Project Astra takes in a constant stream of audio and video input, interprets its environment in real time, and interacts with the user in a conversational manner.
What does Project Astra do?
Project Astra is an AI-powered universal assistant that enhances users’ interactions with their phones or other devices. Project Astra goes beyond the capabilities of current AI assistant models. Its heavily multimodal-based input takes in speech and video. It continuously encodes the video frames, combines them with the speech, and orders them in a timeline of events. Caching this data provides efficient recall and greater context in a human-like conversational flow.
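Google hasn’t published Astra’s architecture, but the pipeline described above (encode video frames, merge them with speech, order them on a timeline, cache them for recall) can be sketched with ordinary data structures. Every class and field name below is a hypothetical illustration, not Astra’s actual code:

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """One multimodal observation: an encoded video frame, a speech
    transcript, or both, stamped with the time it was captured."""
    timestamp: float
    frame_embedding: list[float] | None = None   # hypothetical encoded frame
    transcript: str | None = None                # hypothetical speech-to-text


class EventTimeline:
    """Caches recent events in arrival order so later questions can be
    answered with context from earlier in the session."""

    def __init__(self, max_events: int = 1000):
        self._events: deque[Event] = deque(maxlen=max_events)

    def add(self, event: Event) -> None:
        self._events.append(event)

    def recent(self, seconds: float) -> list[Event]:
        """Return events captured within the last `seconds` seconds."""
        cutoff = time.time() - seconds
        return [e for e in self._events if e.timestamp >= cutoff]


# Example: interleave a spoken question with an encoded camera frame.
timeline = EventTimeline()
timeline.add(Event(time.time(), frame_embedding=[0.12, 0.87, 0.05]))
timeline.add(Event(time.time(), transcript="Tell me something that makes a sound"))
print(len(timeline.recent(seconds=60)))  # events cached in the last minute
```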
The aim is for Astra to comprehend the context of real-world environments while responding to user commands, rather than treating each question in isolation. Remembering what else is around you and what you’ve already asked about creates a natural feeling of interaction. For it to feel natural, latency must be low, and although there was a noticeable delay in the demo, the responses were still quick and intelligent.
This can be impressive when holding your phone’s camera to show Astra something. Imagine what it could do in an AR wearable like Google Glass. By remembering what you’ve seen, Astra can find your lost keys when you’re scrambling to get out the door. Gathering and storing visual data, combined with the power of real-time multimodal analysis, looks like the next stage of AI.
Processing in multiple dimensions with multimodal AI
One of Project Astra’s most impressive breakthroughs is its ability to handle multimodal inputs seamlessly. Current AI assistants typically rely on one type of input at a time. Astra integrates data from visual and auditory sources simultaneously and contextualizes it with your surrounding environment. This could eliminate the need to describe things in more detail than you would to a human because Astra knows what you’re looking at and sees what you see.
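Project Astra has no public API yet, but Google’s existing Gemini API already accepts mixed image and text input, which gives a rough feel for this kind of prompting. A minimal sketch, assuming the google-generativeai Python package, an API key in the GOOGLE_API_KEY environment variable, and a local photo named desk.jpg (all illustrative):

```python
import os

import google.generativeai as genai
from PIL import Image

# Illustrative only: Project Astra itself has no public API. This uses the
# existing Gemini API to show how a camera frame plus a short spoken-style
# question can be sent as a single multimodal prompt.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

frame = Image.open("desk.jpg")  # hypothetical camera frame
response = model.generate_content([frame, "What part of this speaker is the tweeter?"])
print(response.text)
```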
Astra’s visual recognition capabilities stand out in the demo video, but speech and camera footage aren’t its only inputs; on-screen annotations count, too. The video opens with the user asking Astra, “Tell me something that makes a sound,” as they scan an office work environment with their phone’s camera. When a monitor speaker comes into view, Astra recognizes it. Bringing the phone’s camera closer to the speaker, the user draws an arrow on the screen pointing to one of the speaker’s two circles and asks what that part is called. Astra correctly identifies it as the tweeter, which produces high-frequency sounds.
Astra’s memory capabilities are more than input recall
As the user walks past the office desk, a pair of glasses is visible on it. They point the camera out the window and ask what neighborhood they’re in, and from that limited view, Astra recognizes where they are. Then we see Astra’s visual recall in action when the user asks where they left their glasses. Remembering something it saw earlier but was never asked about, Astra says the glasses are on the office desk and adds that they’re near a red apple to make them easier to find.
Because Astra is still in the prototype phase and on-device memory is limited, its recall is short-term and likely session-based. When persistent memory becomes possible and more integrated into AI assistants, these memory functions could look back through previous sessions. This likely cloud-based feature could lead to highly personalized AI experiences, where Astra learns about your ongoing projects, personal preferences, and personality.
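What session-scoped recall might look like in code is speculative, but the glasses example boils down to storing labeled observations with timestamps and searching them later. A minimal sketch, with every name hypothetical:

```python
import time
from dataclasses import dataclass


@dataclass
class Observation:
    timestamp: float
    label: str      # e.g., an object recognized in a frame
    location: str   # e.g., a short description of the surroundings


class SessionMemory:
    """Short-term, session-scoped memory: observations are kept only for the
    current conversation and searched by object name on request."""

    def __init__(self):
        self._log: list[Observation] = []

    def remember(self, label: str, location: str) -> None:
        self._log.append(Observation(time.time(), label, location))

    def last_seen(self, label: str) -> Observation | None:
        matches = [o for o in self._log if o.label == label]
        return matches[-1] if matches else None


memory = SessionMemory()
memory.remember("glasses", "on the office desk, near a red apple")
hit = memory.last_seen("glasses")
print(hit.location if hit else "I haven't seen them this session.")
```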
More of Project Astra in action
The demo showed off Astra’s versatility across a multitude of real-world assistance tasks, and the examples were creative and well thought out. Pointing the camera at a cup of colored pencils and asking Astra for an alliteration about them showcased its language abilities. Unlike many AI responses to natural language prompts for creative output, the alliteration wasn’t half bad.
Asking Astra what a section of the developer’s code, displayed on a computer monitor in the office, does elicited a correct response. The user then switched to the prototype glasses device and looked at a diagram on a whiteboard that appeared to describe a network load balancing (NLB) system. They drew an arrow between the drawing of the servers and the database and asked what could be added to make the system faster. Based only on the visual input of a hand-drawn diagram, Astra’s suggestion that adding a cache could improve speed was impressive.
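Google hasn’t said how Astra reasoned about the diagram, but the advice itself is standard systems design: a cache between the servers and the database avoids repeating slow queries. A minimal read-through cache sketch in Python, with the database call simulated:

```python
import time


def query_database(key: str) -> str:
    """Simulated slow database lookup standing in for the whiteboard's database box."""
    time.sleep(0.5)  # pretend this round trip is expensive
    return f"value-for-{key}"


class ReadThroughCache:
    """Serve repeated reads from memory; only cache misses go to the database."""

    def __init__(self):
        self._store: dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._store:
            self._store[key] = query_database(key)  # slow path, taken once per key
        return self._store[key]


cache = ReadThroughCache()

start = time.time()
cache.get("user:42")  # first read goes to the database
print(f"cold read: {time.time() - start:.2f}s")

start = time.time()
cache.get("user:42")  # second read is served from memory
print(f"warm read: {time.time() - start:.2f}s")
```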
Injecting a little humor, the demo moved on to a simple drawing of two cats’ faces, one with red X’s for eyes. Holding a small cardboard box marked with a question mark next to it, the user asked Astra, “What does this remind you of?” The response was Schrödinger’s cat, a thought experiment devised by the Austrian physicist Erwin Schrödinger. The experiment illustrates a quantum paradox in which a cat can be considered both dead and alive at once because its fate hinges on a random event that may or may not have occurred.
The demo ended with a tiger plushie held up next to a real golden retriever. Astra was asked for a band name that suited the two of them. The answer was Golden Stripes, which, like the alliteration from earlier, was a good response. Project Astra’s multimodal nature widens the range of prompts it can usefully respond to.
Cloud-based processing is currently powering Astra’s intelligence
The keynote shows that Project Astra runs on Google’s highly optimized Tensor Processing Units (TPUs) in the cloud, not on the device itself. Google has a lead in hardware when it comes to processing large language models (LLMs), and fully trained AI models typically need far less hardware to run than to train, so Google seems to be hinting that Astra will eventually run on mobile devices.
That wouldn’t be surprising, as Google’s mobile SoC TPUs are powerful, and each generation has been more than an incremental improvement. However, we don’t know enough about the direction of this early prototype. Astra could introduce latency issues after a public release if it relies on the cloud and constant internet connectivity.
The future of AI assistants
While Project Astra is still in its early stages and artificial intelligence development is moving at breakneck speed, Google seems to be the first to demonstrate a genuinely useful AI assistant. With its ability to process many information sources in real time, it could become an everyday tool for mobile users. The technology could be extended to smart homes, educational settings, and creative projects.
Looking ahead, Google plans to incorporate elements of Astra into its Gemini app, potentially giving us a hands-on opportunity. This shift toward natural, responsive interaction with artificial intelligence, along with real-world context awareness, is a welcome change. Google Gemini has grown by leaps and bounds since its early days as Bard, and with technology as innovative as Project Astra, we should soon see some of its features on our Android devices.