OpenAI and Google lean in to AI personal assistants. Is this AI’s killer app?

Hello and welcome to Eye on AI.

The big news in AI this week is the dueling product announcements from OpenAI and Google.

OpenAI has consistently tried to steal the news cycle from rivals by jumping out in front of their big product reveals, and this week was no different. The AI startup had built expectations around yesterday’s announcement so high—with rampant speculation that it would debut GPT-5 or a generative AI search engine—that CEO Sam Altman took to the social media platform X on Friday to disabuse people of those ideas, while still trying to build excitement for Monday’s event.

What the company did announce was a souped-up version of GPT-4 called GPT-4o—the “o” stands for omni—that is designed to act as a personal assistant on a phone or tablet, with improved voice interaction, the ability to interpret and reason about pictures from a device’s camera, more capable language translation, and much faster response times. The assistant, which has a female voice by default, appears to be explicitly modeled on the digital assistant in Spike Jonze’s 2013 movie Her.
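
The camera capability is already exposed through OpenAI’s API as still-image input. Here is a minimal sketch of sending a photo to the gpt-4o model via the chat completions endpoint; the file name and prompt are hypothetical, chosen for illustration.

```python
# Sketch: ask GPT-4o to reason about a photo, roughly as the assistant does
# with a device camera (the API takes still images, not a live feed).
# Requires: pip install openai (and OPENAI_API_KEY set in the environment)
import base64
from openai import OpenAI

client = OpenAI()

with open("whiteboard.jpg", "rb") as f:  # hypothetical local photo
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What equation is on this whiteboard, and is it correct?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```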

OpenAI may have misplayed the expectations game a bit: compared to the hype it drummed up, many viewers of its livestream event seemed underwhelmed by the announcement. (To counter this, Altman and OpenAI also published blog posts and short videos showcasing a variety of use cases for the new model.)

The technological innovations behind GPT-4o are impressive. The model is natively multimodal—trained to take in voice and produce voice directly, for example—as opposed to transcribing the user’s voice into text, feeding that text to GPT-4 as a prompt, and then running the resulting output through a text-to-speech model to produce a voice response. This speeds up the entire cycle. OpenAI has also impressively shrunk the number of tokens—segments of data that the model processes (in the case of English text, a token is usually equal to about three-quarters of a word)—that the model needs to represent a given input, especially in non-English languages. These changes make the model considerably faster and cheaper to run than GPT-4 Turbo, OpenAI's previous best model, which in turn has enabled OpenAI to make GPT-4o available for free to all ChatGPT users, as well as to offer enterprise customers and developers use of the model through OpenAI’s API for half the cost of GPT-4 Turbo.
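
To make that token shrinkage concrete, here is a minimal sketch using OpenAI’s open-source tiktoken library to compare GPT-4 Turbo’s encoding (cl100k_base) against GPT-4o’s new one (o200k_base); the sample sentences are illustrative, and exact counts will vary with the text.

```python
# Sketch: compare token counts under GPT-4 Turbo's encoding (cl100k_base)
# and GPT-4o's new encoding (o200k_base). Requires: pip install tiktoken>=0.7
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o

samples = {
    "English": "A universal interpreter in your pocket could be transformative.",
    "Hindi": "आपकी जेब में एक सार्वभौमिक दुभाषिया परिवर्तनकारी हो सकता है।",
}

for language, text in samples.items():
    before = len(old_enc.encode(text))
    after = len(new_enc.encode(text))
    print(f"{language}: {before} tokens (cl100k) -> {after} tokens (o200k)")
```

The savings tend to be modest for English and much larger for non-Latin scripts, which is where OpenAI reported the biggest gains.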

Then today, at Google’s I/O developer conference, the search giant announced a raft of new AI features and upcoming product releases, from the integration of generative AI capsule answers into its main search engine and a way to query the photos saved in Google Photos, to improvements to its Gemini chatbot. As my colleague Sharon Goldman, who is at I/O, relays, Google’s version of the AI personal assistant is being developed under what it’s calling “Project Astra,” with capabilities the company said will come to Google products, like the Gemini app, later this year. Demo videos, which the company emphasized were shot live in a single take, showed someone using a smartphone camera to give the AI a view of their surroundings. While OpenAI’s GPT-4o can currently only process still images, Astra can handle video. Google also unveiled improvements to its already very capable Gemini 1.5 Pro model so that it can hold more natural-sounding, longer dialogues, with better understanding of audio and images, more logical reasoning and planning, and better computer code generation.
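
For readers who want to poke at the Gemini 1.5 Pro capabilities described above, here is a minimal sketch assuming Google’s google-generativeai Python SDK and the gemini-1.5-pro-latest model alias; the file name and prompt are hypothetical, and where the Astra demos used live video, this uses a still image.

```python
# Sketch: ask Gemini 1.5 Pro to reason about an image, in the spirit of the
# Astra demos. Requires: pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use a real key

model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed model alias
image = Image.open("desk_photo.jpg")  # hypothetical local image

response = model.generate_content(
    ["Describe what is on this desk and suggest where I left my glasses.",
     image]
)
print(response.text)
```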

This is the sort of AI software Google teased in December with a canned demonstration that reporters panned as misleading about the Gemini model’s video processing capabilities. Well, now Google is saying it has these capabilities for real. The company also announced a doubling of the context window—how much data its models can process at once—for Gemini 1.5 Pro, to 2 million tokens. That means the model can take in many books’ worth of text or the video equivalent of a feature film. Larger context windows don’t just allow the models to process more information; they also tend to reduce a model’s tendency to hallucinate (i.e., produce plausible but inaccurate outputs). Google also teased a future AI “agent” model that will be able to perform actions for users—such as booking movie tickets and flights—not simply generate text.
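
As a rough back-of-envelope check on what a 2-million-token window holds, assuming the common rule of thumb of about three-quarters of an English word per token, a 100,000-word novel, and Google’s own rough figure of about one hour of video per million tokens:

```python
# Back-of-envelope: what fits in a 2-million-token context window?
# Assumptions: ~0.75 English words per token; ~100,000 words per novel;
# ~1 hour of video per 1M tokens (Google's rough figure for Gemini 1.5).
CONTEXT_TOKENS = 2_000_000

words = CONTEXT_TOKENS * 0.75             # ~1.5 million words
novels = words / 100_000                  # ~15 full-length novels
video_hours = CONTEXT_TOKENS / 1_000_000  # ~2 hours, about one feature film

print(f"~{words:,.0f} words, ~{novels:.0f} novels, ~{video_hours:.0f}h of video")
```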

There are a few things to say about these announcements from OpenAI and Google. One is that they clearly put Apple and Amazon on the back foot: both companies need to upgrade Siri and Alexa to match these new rival capabilities or those products will be in trouble. We know both are working on it, and Amazon has Anthropic’s powerful Claude AI models to draw on. Apple is by all accounts much further behind on its generative AI efforts—which is why there are reports it has been negotiating with OpenAI to license its technology in the near term. My colleague David Meyer has more on this in today’s Data Sheet newsletter.

More broadly, are these new personal assistants AI’s killer app? I think the jury is very much still out—and the answer depends entirely on what comes next. Most of the use cases OpenAI has showcased so far, such as tutoring your kids or telling bedtime stories, seem fun and somewhat helpful, especially to parents. But it’s unclear whether they are the sort of thing that will make such assistants ubiquitous, must-have products. The one exception might be translation—the ability to have a universal interpreter in your pocket wherever in the world you go could be transformative. But almost none of the use cases OpenAI or Google highlighted for the new assistants were about helping people in their jobs. That may change when these assistants gain more “agentic” properties—and when they can actually learn our personal preferences and then complete tasks to our liking. We could all use a personal assistant that can actually do things for us in our daily lives—do our online grocery shopping, fill out insurance forms, book our vacations, and so on. That really is likely to be a killer app.
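
As a flavor of that interpreter use case, here is a minimal text-only sketch against OpenAI’s chat completions API with the gpt-4o model; the system prompt, language pair, and helper name are illustrative, and the real product runs a low-latency voice loop natively on audio rather than on text like this.

```python
# Sketch: a text-only stand-in for the "universal interpreter" use case.
# Requires: pip install openai (and OPENAI_API_KEY set in the environment)
from openai import OpenAI

client = OpenAI()

def interpret(utterance: str, source: str = "English",
              target: str = "Italian") -> str:
    """Translate one conversational turn between two languages."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"You are a live interpreter. Translate the user's "
                        f"{source} into natural spoken {target}. Reply with "
                        f"the translation only."},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

print(interpret("Where is the nearest train station?"))
```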

How quickly those agents are coming is unclear. Google says it's working on them but has not put a timeline on a product release. On Monday, OpenAI continued to tease exciting future announcements “coming soon”—possibly next week when its partner Microsoft holds its Build developer conference—but what they are is still a secret.

In the meantime, the question, as with so much of the generative AI revolution, is whether the benefits are worth the costs—to the companies, to consumers, and to society. While OpenAI has clearly made technological breakthroughs that reduced GPT-4o’s costs enough to make the model available at no charge, it is still costing the company something to run. Altman recently said he wasn’t worried about OpenAI’s burn rate—“$500 million a year or $5 billion or $50 billion a year, I don’t care,” he said—but at some point his investors will care. And his business customers probably care too. (The price of GPT-4o for enterprise developers through OpenAI’s API is half what GPT-4 Turbo goes for, which may indicate the startup’s own costs are similarly about half. Still, the model isn’t cheap, so it’s unclear whether the use cases businesses can address with the new model will justify the price tag.)
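
For a sense of scale, here is a quick cost sketch applied to a hypothetical workload, assuming the launch list prices of $5 per million input tokens and $15 per million output tokens for GPT-4o versus $10 and $30 for GPT-4 Turbo; verify against OpenAI’s current pricing page before relying on these numbers.

```python
# Back-of-envelope API cost comparison for a hypothetical monthly workload.
# Assumed launch list prices in USD per 1M tokens; figures may have changed.
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
}

# Hypothetical monthly workload: 200M input tokens, 50M output tokens.
input_tokens, output_tokens = 200_000_000, 50_000_000

for model, p in PRICES.items():
    cost = (input_tokens / 1e6) * p["input"] \
         + (output_tokens / 1e6) * p["output"]
    print(f"{model}: ${cost:,.0f}/month")
# gpt-4-turbo: $3,500/month; gpt-4o: $1,750/month -- half the bill.
```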

While OpenAI is offering GPT-4o to consumers for free, users are essentially paying with their personal data, including their voice and, depending on how they use the model, images of their face or of their family and friends. So there are definitely data privacy implications.

There may also be big societal costs that we aren’t aware of or anticipating. For instance, because OpenAI has said very little about how big a model GPT-4o is and how it was trained, we have little idea what its lifetime carbon footprint and water usage are likely to be. The electricity and water consumption of running AI models in the cloud is a growing concern as adoption of the technology takes off. Will our glorious AI future be worth the damage to the planet? We don’t really know, because the benefits are still uncertain and tech companies are being less than transparent about the environmental bill.

We also don’t know how these AI personal assistants might subtly influence our thoughts and behaviors. People tend to be more influenced by voice-based interactions than by reading text. Can we trust that the tech companies making these personal assistants will show us information that is in our best interest? Or will what they tell us be shaped by the commercial partnerships those companies have struck?

Last week, AdWeek reported on an OpenAI pitch deck it had obtained that revealed details of partnership agreements the company was offering media companies, including priority placement and “better brand expression” in chatbot conversations. (OpenAI told AdWeek the documents were outdated.) While the publishers OpenAI has been talking to so far all have reputations for high journalistic standards and quality content, the idea of allowing partners and advertisers to pay to be featured more prominently in chatbot responses raises the specter of personal assistants that will subtly steer us to buy products, or even to hold certain political views, because that is what the tech companies are being paid to do. (Or, in some countries, it is easy to imagine governments mandating that personal assistants express only certain “politically correct” views.)

In the movie Her, Theodore (played by Joaquin Phoenix) falls madly in love with his AI assistant Samantha (voiced by Scarlett Johansson), and his obsession with the chatbot leads him to neglect real human relationships. When the chatbot is temporarily unavailable due to a systems upgrade, he is distraught. Versions of this have already happened in real life for some people, who have formed romantic bonds with chatbots from Replika and character.ai. And we don’t have good research yet on whether AI chatbots are a cure for loneliness—as some tech companies claim—or a crutch that substitutes for and ultimately impedes real human connection. My guess, judging from our experience with social media, is the latter.

Either way, I guess we are about to find out. With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

This story was originally featured on Fortune.com