Google unveils a more capable Gemini Pro AI as it seeks an edge against OpenAI and other rivals in tech’s hottest battlefield

Toby Melville—WPA Pool/Getty Images

Another week, another Gemini model.

Alphabet-owned Google has been furiously pushing out new Gemini-branded AI models, which can power business applications like chatbots and coding assistants, as it tries to outdo rivals OpenAI and Microsoft in a generative AI battle that keeps getting more intense.

Earlier this week, OpenAI announced it was adding a persistent memory to ChatGPT, which means the chatbot will remember facts about the user's preferences and past dialogues and apply them to future responses. Now Google is countering by releasing a yet more capable and compact Gemini model.

And while a Google DeepMind researcher who worked on the models said the company is sharing information on the safety of these new AI systems with government regulatory bodies in the U.S. and U.K., it isn’t exactly waiting for their green light before releasing them to the public.

While this information sharing is voluntary, the U.K. government has often implied that its AI Safety Institute will help keep dangerous models from posing a risk to the public, a role it cannot fulfil if tech companies keep pushing out models far faster than it can evaluate them. In the U.S., a similarly named AI Safety Institute is only charged with issuing standards for companies to use in evaluating their own models, not undertaking the evaluations themselves.

Only last week, Alphabet put its most powerful model, Gemini Ultra 1.0, into wide release, charging users $20 a month for access to a better AI assistant that can advise them on how to change a tire, help them design a birthday card, or analyze financial statements for them.

Today it is announcing a more limited “research release” of a new version of its Gemini Pro model, Gemini 1.5 Pro, that delivers performance similar to that of Ultra 1.0 but in a much smaller model. Smaller models use less computing power to train and run, which also makes them less costly to use.

The 1.5 Pro is also built using a “mixture of experts” design, which means that rather than being a single giant neural network, it is actually an assemblage of several smaller ones, each specialized for a particular task. This too makes the model cheaper to train and to run.
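To make the "mixture of experts" idea concrete, here is a minimal sketch in plain Python. It is not Google's architecture, which has not been published in this detail: the router weights, the toy experts, and the `top_k` setting are all hypothetical. It only illustrates the general technique, in which a small router scores the experts for each input and only the best-scoring few are actually run, which is where the savings in computation come from.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    """Hypothetical stand-in for a small neural network: scales its input."""
    return lambda x: [scale * v for v in x]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    router_weights holds one row of weights per expert (an assumption of
    this sketch); an expert's score is the dot product of its row with x.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Pick the indices of the top_k experts; the rest are never executed.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for gate, index in zip(gates, chosen):
        expert_output = experts[index](x)
        output = [o + gate * v for o, v in zip(output, expert_output)]
    return output, chosen

if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    router_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
    output, chosen = moe_forward([1.0, 0.0], router_weights, experts)
    print("experts used:", chosen)
    print("output:", output)
```

In this toy run only two of the four experts do any work for the given input. Production systems apply the same routing inside the layers of a much larger network.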

Google charges customers of the current Gemini 1.0 Pro model $.0025 per image the model generates, $.002 per second of audio it outputs, and $.00025 per 1,000 characters of text it produces. The company has not said what it plans to charge for the new 1.5 Pro version.
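As a rough illustration of how those Gemini 1.0 Pro rates add up, the sketch below totals a hypothetical workload. The three rates are the ones quoted above; the function name, its parameters, and the sample workload are assumptions made for the example, and real bills may include charges not covered here.

```python
from decimal import Decimal

# Rates quoted in the article for Gemini 1.0 Pro output.
IMAGE_RATE = Decimal("0.0025")               # dollars per generated image
AUDIO_RATE_PER_SECOND = Decimal("0.002")     # dollars per second of audio
TEXT_RATE_PER_1K_CHARS = Decimal("0.00025")  # dollars per 1,000 characters

def estimate_cost(images=0, audio_seconds=0, text_characters=0):
    """Estimate the output cost in dollars for a hypothetical workload."""
    image_cost = IMAGE_RATE * Decimal(str(images))
    audio_cost = AUDIO_RATE_PER_SECOND * Decimal(str(audio_seconds))
    text_cost = TEXT_RATE_PER_1K_CHARS * Decimal(str(text_characters)) / 1000
    return image_cost + audio_cost + text_cost

if __name__ == "__main__":
    # Hypothetical job: 10 images, one minute of audio, 100,000 characters.
    total = estimate_cost(images=10, audio_seconds=60, text_characters=100_000)
    print(f"Estimated cost: ${total:.2f}")
```

For that sample job the three components are 2.5 cents, 12 cents, and 2.5 cents, or 17 cents in total. `Decimal` is used so that the tiny per-unit rates do not pick up floating-point rounding noise.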

Like Google's other Gemini models, 1.5 Pro is multi-modal, meaning it has been trained on text, images, audio, and video. It can process inputs or provide outputs in any of these forms.

But, in addition to being smaller, the new Pro does something that even the larger Ultra 1.0 can’t do. It can ingest and analyze far more data than any other AI model on the market, including its bigger, older cousin.

The new Gemini 1.5 Pro can take in about seven books’ worth of text, or a full hour of video, or 11 hours of audio. This makes it easier to ask the AI system questions that involve searching for an answer amid a lot of data, such as trying to find a particular clip in a long video, or trying to answer a complex question about some portion of the federal criminal code.

The new model can do this because its “context window,” or the maximum length of a prompt, can be as long as 1 million tokens. A token is a chunk of data that is a bit shorter than a word, roughly 0.7 words on average. So one million tokens is about 700,000 words. The next closest publicly available large language model, Anthropic’s Claude 2.1, has a context window of 200,000 tokens.
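The token arithmetic can be checked with a few lines of Python. The ratio of 0.7 words per token is simply the one implied by the article's own figures (one million tokens, about 700,000 words); real tokenizers vary by language and content, so treat it as a rough assumption. The function names are hypothetical.

```python
import math

# Implied by the article: 1,000,000 tokens is about 700,000 words.
WORDS_PER_TOKEN = 0.7

def tokens_to_words(tokens):
    """Approximate number of words that fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

def words_to_tokens(words):
    """Approximate number of tokens needed for a given number of words."""
    return math.ceil(words / WORDS_PER_TOKEN)

def fits_in_context(word_count, context_tokens):
    """True if a text of word_count words should fit in the context window."""
    return words_to_tokens(word_count) <= context_tokens

if __name__ == "__main__":
    for window in (1_000_000, 200_000):
        print(f"{window:,} tokens is roughly {tokens_to_words(window):,} words")
    print("500,000 words fit in 1M tokens:", fits_in_context(500_000, 1_000_000))
    print("500,000 words fit in 200K tokens:", fits_in_context(500_000, 200_000))
```

By this estimate, the 200,000-token window cited above holds about 140,000 words, one-fifth of what a 1-million-token window can take in.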

For now, the new 1.5 Pro is being aimed at corporate customers and AI researchers. It's being made available to users with access to the Gemini API through Google’s AI Studio sandbox, and to select Google Cloud customers who are being invited to a “private preview” of the model through Google Cloud’s Vertex AI platform.

Google is desperate to convince big businesses to start building their generative AI applications on top of its AI models. It is hoping this will help it grow its Cloud Computing business, which has consistently been in third place behind Microsoft Azure and Amazon's AWS. But Google's new AI features have given it the best opportunity it has had in years to gain market share, particularly from AWS, which has been much slower than its rivals in offering cutting-edge generative AI models.

Last month, Alphabet reported that its cloud revenue grew 25% year over year in the last quarter, a figure that is below the 30% cloud revenue growth Microsoft reported. But Alphabet's cloud sales were expanding at a rate almost double that reported by AWS. Amazon has sought to rectify its genAI laggard status through a partnership with AI startup Anthropic, although it's unclear if that alliance will keep it from losing ground to Microsoft Azure and Google Cloud.

Oriol Vinyals, a vice president of research at Google DeepMind who helped develop the latest Gemini model, showed reporters a video demonstration that highlighted how the new model could exhibit a sophisticated understanding of both video and language. When asked about the significance of a piece of paper in an old Buster Keaton silent film, the model not only answered correctly that the paper was a pawn ticket and explained its importance in the film’s plot, it could also cite the correct scene in the film where it was featured. It could also pull out examples of astronauts joking in transcripts of the Apollo 11 mission.

Vinyals also showed how a person could use a simple sketch to ask the model to find scenes or instances in the transcript that matched the sketch.

But, he noted, the model is still fallible. Like all LLM-based AI models, it remains prone to “hallucinations,” in which the model simply invents information. He said the 1.5 Pro’s hallucination rate was “no better or worse” than Google’s earlier Gemini models, but he did not disclose a specific error rate.

In response to journalists' questions, Vinyals also implied that the demonstration videos he had just played to show off the capabilities of the Gemini 1.5 Pro may have depicted examples Google had cherry picked from among other similar attempts that were less successful.

Many journalists and technologists criticized Google for editing a video demonstration that accompanied the December unveiling of its Gemini models. The edits made the models seem more capable of understanding scenes in live video, and speedier in answering questions, than they actually are.

The new 1.5 Pro also does not have a persistent long-term memory, unlike OpenAI’s ChatGPT. This means that while the 1.5 Pro can find information within a fairly large dataset, it cannot remember that information for future sessions. For instance, Vinyals said that a 1.5 Pro user could give the model an entire dictionary for an obscure language and then ask it to translate from that language. But if the user came back a month later, the model wouldn’t instantly know how to do the same translation. The user would have to feed it the dictionary again.

The U.K. government’s newly created AI Safety Institute is supposed to be conducting independent evaluations of the most powerful models that AI companies develop. In addition, AI companies including Google DeepMind have agreed to share information about their own internal safety testing with both the U.K. and U.S. governments. Vinyals said that Google is complying with the promises it made to these governments at the international AI Safety Summit this past fall, but he did not specify whether the U.K. AI Safety Institute has evaluated 1.5 Pro or any of the Gemini models.

Last week, The Financial Times reported that many leading AI companies have been frustrated with the time it is taking the AI Safety Institute to conduct its evaluations of their models.

This story was originally featured on Fortune.com