Google has launched its flagship large language model (LLM) and GPT-4 competitor, Gemini.
Gemini, which was first announced at Google I/O in May, is now available to the public and is ultimately intended to be integrated across virtually every Google product. Google is stressing Gemini’s “multimodal” qualities, meaning it can process and leverage different types of data: not just text, which is what most generative AI users will be familiar with to date, but also images, code, audio and video.
Demis Hassabis, CEO and Co-Founder of Google DeepMind, said in a blog post celebrating the launch:
“Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.”
Reports last month suggested that Gemini had been delayed until Q1 2024, so its launch in December, as originally planned, comes as something of a surprise.
Google has also optimized Gemini for three sizes (Ultra, Pro and Nano), which the tech giant says gives it flexibility across use cases, making it “able to efficiently run on everything from data centers to mobile devices”. Ultra is Google’s largest and most capable model for highly complex tasks, Pro is its best model for scaling across a wide range of tasks, and Nano is best suited to on-device tasks.
Google also stressed that Gemini Ultra surpasses “current state-of-the-art results on 30 of the 32 widely-used academic benchmarks” used in LLM research and development.
“Introducing Gemini 1.0, our most capable and general AI model yet,” Google CEO Sundar Pichai added in a post on X on December 6, 2023. “Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes – Ultra, Pro, and Nano. Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks.”
Additionally, Google says that Gemini Ultra is the first model to outperform human experts on the Massive Multitask Language Understanding (MMLU) benchmark, which draws on 57 subjects, including math, physics, history, law, medicine and ethics, to test both knowledge and problem-solving ability.
Not missing a trick, Google’s announcement blog compares Gemini Ultra’s MMLU score (and other metrics) against OpenAI’s GPT-4, with its 90.0 percent beating GPT-4’s 86.4 percent.
Gemini 1.0 is now rolling out across a range of Google products and platforms, including Bard and Google’s Pixel 8 Pro smartphone.
So Why Was This a Little Unexpected?
Only three weeks ago, The Information reported that Google representatives had informed some of the tech giant’s cloud customers and partners that the AI model shouldn’t be expected until Q1 of 2024.
The Information’s report suggested that one factor in the delay was uncertainty over whether Gemini could equal or surpass OpenAI’s most advanced LLM, GPT-4. Judging by Google’s own benchmarks, those fears appear to have since been allayed.
The Information’s sources said the delay also stemmed from Google wanting to strengthen its own consumer offerings with the new AI-powered technology before giving external software developers access to it. According to the report, Google was approaching Gemini’s release with caution, including over its use in Bard, Google’s answer to ChatGPT, which had been running on a less sophisticated LLM than Gemini.
So Gemini Is Now Being Used in Google Bard?
In what Google is describing as “Bard’s biggest upgrade yet”, Bard now uses a specifically tuned version of Gemini Pro in English for more advanced reasoning, planning and understanding.
Users can try Bard with Gemini Pro today, though only for text-based prompts for now; support for other modalities such as images and video is due soon. The upgrade is initially available in English in more than 170 countries and territories, with more languages and locations (Google namechecked Europe specifically) to follow in the “near future”.
Google says Gemini Pro in Bard is “far more capable at things like understanding, summarizing, reasoning, coding and planning” than GPT-3.5, which currently underpins the free version of OpenAI’s ChatGPT.
Early next year, Google says it will also release Bard Advanced, which will give users first access to its most advanced models and features, beginning with Gemini Ultra.
What Does This Mean for the AI Race?
A lot, most likely.
Given last month’s turmoil at OpenAI (in which CEO Sam Altman was fired and rehired within four days, in a plot-twist-strewn saga that will almost certainly become an HBO or Netflix drama within the next five years), you might have expected OpenAI and its largest investor, Microsoft, to feel secure focusing on the company’s governance issues without worrying too much about product competition until the new year.
Gemini, and Google’s confident tables of comparison with GPT-4, have drawn the battle lines of the AI arms race for 2024. GPT-4 and Gemini. OpenAI and DeepMind. Microsoft and Google.
Inevitably, however, comparison tables and computational claims mean little to the average user; success in the AI race will likely hinge on tangible, demonstrable use cases. How will Gemini and GPT-4, in whatever product form they’re delivered, meaningfully improve people’s lives and businesses’ operations and bottom lines?
If you thought AI defined 2023, it’s likely you’ve seen nothing yet. Next year, its impact could be significantly more seismic.