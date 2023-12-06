Google Gemini logo.

On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini “exceeds current state-of-the-art results on 30 of 32 widely used academic benchmarks used in large language model (LLM) research and development.” It is the follow-up to PaLM 2, an earlier AI model that Google hoped would match GPT-4 in capability.

A specially tuned English version of its mid-tier Gemini model is now available in more than 170 countries as part of the Google Bard chatbot – though not in the EU or UK, due to potential regulatory issues.

Like GPT-4, Gemini can handle multiple types (or “modes”) of input, making it multimodal. This means it can process text, code, images, and even audio. The goal is to create a type of artificial intelligence that can accurately solve problems, give advice, and answer questions in a variety of fields, from the mundane to the scientific. Google says it will power a new era in computing, and it hopes to tightly integrate the technology into its products.

“Gemini 1.0’s sophisticated multimodal reasoning capabilities can help understand complex written and visual information,” Google writes. “Its remarkable ability to extract insights from hundreds of thousands of documents by reading, filtering and understanding information will help deliver new breakthroughs at digital speed in many fields from science to finance.”

Google says Gemini will be available in three sizes: Gemini Ultra (“for highly complex tasks”), Gemini Pro (“for completing a wide variety of tasks”), and Gemini Nano (“for tasks on the device,” such as Google’s Pixel 8 Pro smartphone). Each is likely distinguished in complexity by parameter count. More parameters mean a larger neural network, which is generally capable of performing more complex tasks but requires more computational power to run. This means that Nano, the smallest, is designed to run locally on consumer devices, while Ultra can run only on data center hardware.
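To make the link between parameter count and hardware concrete, here is a rough back-of-the-envelope sketch of how much memory is needed just to hold a model's weights. The parameter counts below are hypothetical round numbers chosen for illustration, not figures from Google, and the helper `weight_memory_gib` is our own invention. The estimate also ignores activations, caches, and any compression a real deployment might use.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to store the weights alone, in GiB.

    bytes_per_param=2 assumes 16-bit weights; this is an assumption,
    not a statement about how any Gemini model is stored.
    """
    return num_params * bytes_per_param / 2**30


# Hypothetical sizes for illustration only; NOT Gemini's actual parameter counts.
hypothetical_models = {
    "small model (2 billion parameters)": 2_000_000_000,
    "large model (200 billion parameters)": 200_000_000_000,
}

for name, params in hypothetical_models.items():
    print(f"{name}: ~{weight_memory_gib(params):.1f} GiB of weights at 16-bit precision")
```

Under these assumptions, the small model's weights fit in a few gibibytes, which is within reach of a high-end phone, while the large model's weights alone would need hundreds of gibibytes, which is data center territory.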

“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” Google CEO Sundar Pichai wrote in a statement. “This new era of models represents one of the largest science and engineering efforts we have ever undertaken as a company. I’m really excited about what’s coming next and what opportunities Gemini will bring to people everywhere.”

Although Gemini will come in three sizes, only the mid-tier model is currently available for public use. As mentioned above, Google Bard now runs a specially tuned version of Gemini Pro. From our informal testing so far, Gemini Pro appears to perform much better than the previous version of Bard, which was based on Google’s PaLM 2 language model.

Google also claims that Gemini is more scalable and efficient than its previous AI models when running on Google’s custom Tensor Processing Units (TPUs). “On TPU,” Google says, “Gemini runs significantly faster than earlier, smaller and less capable models.”

And it’s reportedly very good at coding. Google trained a special coding-focused version of Gemini called AlphaCode 2, which, according to Google, “excels at solving competitive programming problems that involve complex mathematics and theoretical computer science beyond coding.” Gemini is also excellent at inflating Google’s PR language – if the models were any less capable and revolutionary, would the marketing copy be any less breathless? That seems doubtful.

