Gemini: Google's Largest and Most Capable AI Model Beats GPT-4
State-of-the-art performance: Gemini is built from the ground up for multimodality, reasoning seamlessly across text, images, video, audio, and code.
Three different sizes of Gemini:
- Gemini Ultra — our largest and most capable model for highly complex tasks.
- Gemini Pro — our best model for scaling across a wide range of tasks.
- Gemini Nano — our most efficient model for on-device tasks.
With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU, which uses a combination of 57 subjects such as math, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving abilities.
Showcase of Gemini
The Gemini Ultra model is prompted with one example of interleaved image and text, where the user provides two colors (blue and yellow) and image suggestions for creating a cute blue cat or a blue dog with yellow ears from yarn. The model is then given two new colors (pink and green) and asked for two ideas of what to create using these colors. The model successfully generates an interleaved sequence of images and text with suggestions to create a cute green avocado with a pink seed or a green bunny with pink ears from yarn.
Verifying a student's solution to a physics problem. The model correctly recognizes all of the handwritten content and verifies the reasoning. On top of understanding the text in the image, it needs to understand the problem setup and correctly follow instructions to generate LaTeX.
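A showcase like this can be approximated through the public Gemini API. Below is a minimal sketch, assuming the google-generativeai Python SDK and a hypothetical image file physics_problem.jpg containing the handwritten solution; the prompt wording is illustrative, not the one used in the report:

```python
import PIL.Image
import google.generativeai as genai

# Configure the SDK with an API key from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Hypothetical scan of the student's handwritten solution.
image = PIL.Image.open("physics_problem.jpg")

# gemini-pro-vision accepts mixed text-and-image prompts.
model = genai.GenerativeModel("gemini-pro-vision")
response = model.generate_content([
    "Read this handwritten solution, check each reasoning step, "
    "and typeset the corrected derivation in LaTeX.",
    image,
])
print(response.text)
```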
Consider a cooking scenario about making an omelet, where we prompt the model with a sequence of audio and images. Table 13 shows a turn-by-turn interaction with the model, providing pictures and verbally asking questions about the next steps for cooking an omelet. The model's responses are reasonably accurate and show that it processes fine-grained image details to evaluate when the omelet is fully cooked. See the demo on the website.
Explanation of humor in a meme. The model shows the ability not only to describe what is happening in the image but also to explain what it means, even though the cultural context is not mentioned explicitly in the image or the prompt. Source: Hwang and Shwartz (2023).
Common-sense reasoning in images. The model is able to understand the relationships represented in the graphs and reason about them in a multilingual setting. Source: image created by an author from the Gemini team.
Multimodal reasoning capabilities applied to code generation. Gemini Ultra needs to perform an inverse graphics task to infer the code that would have generated the plots, perform additional mathematical transformations, and generate relevant code.
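An inverse-graphics prompt of this kind could be sketched against the same API. This is only an illustration under the same assumptions as above (google-generativeai SDK; plots.png is a hypothetical screenshot of the target charts):

```python
import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Hypothetical image of the plots to reverse-engineer.
plot = PIL.Image.open("plots.png")
model = genai.GenerativeModel("gemini-pro-vision")

response = model.generate_content([
    "Infer the matplotlib code that could have produced these subplots, "
    "then modify it so every curve is rescaled to the range [0, 1].",
    plot,
])
# The reply should contain a runnable matplotlib script.
print(response.text)
```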
Frequently Asked Questions about Google Gemini
What is Gemini?
Gemini is Google’s largest and most capable AI model, created to bring enormous benefits to people and society by accelerating human progress and improving lives.
What makes Gemini different from other AI models?
Gemini possesses state-of-the-art performance across many leading benchmarks and is optimized for different sizes: Ultra, Pro, and Nano. It was built from the ground up to be multimodal, meaning it can understand, operate across, and combine different types of information including text, code, audio, image, and video.
How was Gemini created?
Gemini was the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research.
What are the significant features of Gemini?
Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, like Python, Java, C++, and Go. Because it was built multimodal from the ground up, it can understand and reason about all kinds of inputs far better than earlier multimodal models.
What are the safety measures implemented in Gemini?
Gemini has undergone the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity. Also, Google has worked with a diverse group of external experts and partners to stress-test our models across a range of issues.
How can I use Gemini for programming?
Starting December 13, developers and enterprises can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
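As a starting point, a text-only call through the Gemini API might look like the sketch below, assuming the google-generativeai Python SDK and an API key created in Google AI Studio; the prompt is only an example:

```python
import google.generativeai as genai

# API key generated in Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# "gemini-pro" is the text model served through the Gemini API.
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content(
    "Write a Go function that reverses a string, and explain it briefly."
)
print(response.text)
```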
Will Gemini be integrated into Google’s products?
Yes, Gemini will be integrated into a range of Google’s products such as Google Search, Ads, Chrome, and Duet AI, among others.
What is the future plan for Gemini?
Google plans to continue innovating and responsibly advance the capabilities of Gemini, with a focus on advances in planning and memory, and increasing the context window for processing more information.
Tweets Related to Google Gemini
Let's go hands-on with #GeminiAI.
— Google (@Google) December 6, 2023
Our newest AI model can reason across different types of inputs and outputs — like images and text. See Gemini's multimodal reasoning capabilities in action ↓ pic.twitter.com/tikHjGJ5Xj
Gemini + Flutter 🤯
— Erick Ghaumez (@rxlabz) December 6, 2023
pic.twitter.com/kXfAGUfBde
We believe in making AI helpful for everyone. That’s why we’re launching Gemini, our most capable model that’s inspired by the way people understand and interact with the world. #GeminiAI pic.twitter.com/gNG9ha9xMO
— Google (@Google) December 6, 2023
[⚡️ Breaking: Google unveils its secret weapon "Gemini", surpassing GPT-4 on nearly every benchmark]
— Chaen | Daily essential AI news ⚡️ (@masahirochaen) December 6, 2023
Google's Pixel phones and Bard will evolve even further.
If the free Bard gets image and file analysis plus image generation, users could well flow over from ChatGPT.
Also looking forward to the public release of the Gemini API.
AI development will accelerate as well.
↓ Update overview pic.twitter.com/2QlQgcbpCv
#TeamPixel, we come bearing gifts! 🎁 #Pixel8 Pro is now running Gemini Nano that powers AI features like Summarize in Recorder 📝 & Smart Reply in Gboard. 💬
— Made by Google (@madebygoogle) December 6, 2023
But that’s not all! Learn how a new #FeatureDrop makes your Pixel (even older ones) feel new again: https://t.co/E3xkAYBYoz pic.twitter.com/MZtMN48DV9
Official Google Gemini AI model demo video (Chinese translation)
— 小互 (@xiaohuggg) December 6, 2023
This video gives you a comprehensive view of Gemini AI's capabilities!
Judging by these tests, it really is powerful: the demo runs all-around tests covering normal conversation, visual understanding, logical reasoning, language translation, image generation, and more. pic.twitter.com/JwU4X1HwAd