Google has launched its flagship large language model (LLM) and GPT-4 competitor, Gemini.
Gemini, which was first announced at Google I/O in June, is now generally available to the public and is intended long-term to be integrated across virtually every Google product. Google is stressing Geminiβs βmultimodalβ qualities, which means it can process and leverage different versions of data β not just text, which the average generative AI user will be most familiar with to date, but also images, code, audio and video.
Demis Hassabis, CEO and Co-Founder of Google DeepMind, said in a blog post celebrating the launch:
Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.β
Reports last month suggested that Gemini had been delayed until Q1 2024, so Geminiβs launch during its initially planned December date is something of a surprise.
Google has also optimized Gemini in three sizes β Ultra, Pro and Nano, which the tech giant says enables flexibility across use cases, meaning it is βable to efficiently run on everything from data centers to mobile devicesβ. Ultra is Googleβs largest and most capable model for highly complex tasks, Pro is its most appropriate model for scaling across a wide range of tasks, and Nano is the model best for on-device tasks.
Google also stressed that its Ultra Gemini version surpasses βcurrent state-of-the-art results on 30 of the 32 widely-used academic benchmarksβ used in LLM research and development.
βIntroducing Gemini 1.0, our most capable and general AI model yet,β added Google CEO Sundar Pichai on X. βBuilt natively to be multimodal, itβs the first step in our Gemini-era of models. Gemini is optimized in three sizes β Ultra, Pro, and Nano. Gemini Ultraβs performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks.β
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, itβs the first step in our Gemini-era of models. Gemini is optimized in three sizes β Ultra, Pro, and Nano
Gemini Ultraβs performance exceeds current state-of-the-art results onβ¦ pic.twitter.com/pzIw6iCPPN
β Sundar Pichai (@sundarpichai) December 6, 2023
Additionally, Google says that Gemini Ultra is the first LLM to outperform human experts on massive multitask language understanding (MMLU). This framework uses a combination of 57 subjects, including maths, physics, history, law, medicine and ethics for benchmarking knowledge and problem-solving capabilities.
Not missing a trick, Googleβs announcement blog compares Geminiβs MMLU (and other metrics) against OpenAIβs GPT-4, with its 90.0 percent MMLU beating GPT-4βs 86.4 percent.
Gemini 1.0 is now rolling out across a range of Google products and platforms, including Bard and Googleβs Pixel 8 Pro device.
So Why Was This a Little Unexpected?
Only three weeks ago, The Information reported that Google representatives had informed some of the tech giantβs cloud customers and partners that the AI model shouldnβt be expected until Q1 of 2024.
The Informationβs report suggested that a factor in the delay was the uncertainty of whether Gemini could equal or surpass OpenAIβs most advanced LLM in GPT-4. Those fears, clearly, have since been allayed by the latest iteration of the product.
The Informationβs sources said that Geminiβs delay was also based on wanting to reaffirm its consumer offerings with the new AI-powered technology before providing external software developers access to it. According to the report, Google was approaching Geminiβs release with caution, including around using Gemini in Bard, its answer to ChatGPT and a less sophisticated LLM than Gemini.
So Gemini is Now Being Used in Google Bard?
In what Google is describing as βBardβs biggest upgrade yetβ, whatβs available now is Bard Pro will leverage a specifically tuned version of Gemini Pro in English for advanced reasoning, planning, and understanding.
Users can try out Bard with Gemini Pro today for text-based prompts for now, with support for other modalities like images and video scheduled to come soon. The solution is available in English in more than 170 countries and territories to begin with, with more languages and locations, with Google namechecking Europe specifically, in the βnear futureβ.
Google says Gemini Pro in Bard is βfar more capable at things like understanding, summarizing, reasoning, coding and planningβ than GPT-3.5, which currently underpins the free version of OpenAIβs ChatGPT.
Early next year, Google says it will also release Bard Advanced, which gives users the first access to its most advanced models and feature sets, beginning with Gemini Ultra.
What Does This Mean for the AI Race?
A lot, most likely.
Given the turmoil at OpenAI last month β in which CEO Sam Altman was fired and rehired within four days in a plot twist-strewn saga that will almost certainly be an HBO or Netflix drama within the next five years βyouβd have imagined the AI business and its largest investor, Microsoft, would have felt secure in focusing on its governance issues and not having to worry too much about product competition until the new year.
Gemini, and Googleβs confident tables of comparison with GPT-4, have drawn the battlelines of the AI arms race for 2024. GPT-4 and Gemini. OpenAI and DeepMind. Microsoft and Google.
Inevitably, however, comparison tables and computational claims mean little to the average user β success in the AI race will likely hinge on tangible, evidential use cases. How will Gemini and GPT-4, in whatever product iteration theyβre delivered, meaningfully improve peopleβs lives and businessesβs operational practices and financial bottom lines?
If you thought AI defined 2023, itβs likely youβve seen nothing yet. Next year, its impact could be significantly more seismic.