Last week, Google unveiled its latest model, Gemini 1.0 Ultra, a significant step toward making Google products more helpful, beginning with Gemini Advanced. Developers and Cloud customers can now start building with 1.0 Ultra through the Gemini API in AI Studio and Vertex AI.
The teams at Google are continuously pushing the boundaries of their latest models, with safety as a top priority. Progress has been swift, leading to the introduction of the next generation: Gemini 1.5. The new model shows significant improvements across multiple dimensions, and 1.5 Pro achieves quality comparable to 1.0 Ultra while using less compute.
One of the key advancements in this new generation is a breakthrough in long-context understanding. Google has substantially increased the amount of information its models can process, consistently handling up to 1 million tokens, the longest context window of any large-scale foundation model to date.
The ability to process longer context windows opens up entirely new capabilities and helps developers build much more useful models and applications. Google is offering a limited preview of this experimental feature to developers and enterprise customers. Demis Hassabis provides further insights on capabilities, safety, and availability below.
Introduction of Gemini 1.5
In a period of exciting AI advancement, Google has introduced Gemini 1.5, representing a significant step forward in enhancing AI's helpfulness for billions of people worldwide. Since the launch of Gemini 1.0, Google has been rigorously testing, refining, and expanding its capabilities.
Performance Enhancements
Gemini 1.5 delivers dramatically improved performance and represents a step change in approach. The new model builds on research and engineering innovations across nearly every part of Google's foundation model development and infrastructure, including a new Mixture-of-Experts (MoE) architecture that makes Gemini 1.5 more efficient to train and serve.
Long-Context Understanding
The initial Gemini 1.5 model released for early testing is Gemini 1.5 Pro. This mid-size multimodal model is optimized for scalability across a wide range of tasks and performs at a level similar to 1.0 Ultra, Google’s largest model to date. Additionally, it introduces a breakthrough experimental feature in long-context understanding.
Efficient Architecture
Built on leading research into Transformer and MoE architectures, Gemini 1.5 is highly efficient. Whereas a traditional Transformer operates as one large neural network, an MoE model is divided into smaller "expert" neural networks. Depending on the input it receives, the model learns to activate only the most relevant expert pathways in its network, which significantly enhances efficiency. Google has been an early adopter and pioneer of the MoE technique for deep learning through various research projects.
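To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer in plain numpy. The router weights, expert count, and top_k value are illustrative assumptions for a toy example, not details of Gemini's actual architecture.

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Route an input through the top-k most relevant experts.

    x:       (d_model,) input activation vector
    gate_w:  (d_model, n_experts) learned router weights (toy values here)
    experts: list of callables, each a small feed-forward "expert"
    """
    logits = x @ gate_w                   # one router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts
    # Only the chosen experts run; the rest are skipped entirely,
    # which is where the efficiency gain comes from.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy experts: random linear maps standing in for feed-forward blocks.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): W @ x for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
print(moe_layer(rng.normal(size=d), gate_w, experts))
```

The key design point is that the router's selection is sparse: compute grows with the number of experts that are activated per input, not with the total number of experts in the model.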
Greater Context for Better Capabilities
The ability to process longer context windows opens up new possibilities, enabling developers to create, discover, and build using AI in ways previously unattainable. Google looks forward to enabling people, developers, and enterprises to leverage these advancements for their benefit.
Complex Reasoning Abilities
In performance evaluations, Gemini 1.5 Pro outperforms its predecessor, 1.0 Pro, on 87% of the benchmarks used for developing large language models (LLMs). Compared to 1.0 Ultra, it performs at a broadly similar level.
Multimodal Understanding and Reasoning
Gemini 1.5 Pro demonstrates impressive performance across various tasks and modalities. It excels in analyzing, classifying, and summarizing large amounts of content within a given prompt. For instance, when provided with the 402-page transcripts from Apollo 11’s mission to the moon, the model can efficiently reason about conversations, events, and details found across the document.
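As a sketch of what querying a long document might look like, the snippet below uses the google-generativeai Python SDK. The model identifier "gemini-1.5-pro-latest" and the local transcript file are assumptions; check AI Studio for the model name available to your account.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # API key from AI Studio (assumed)

# Assumed preview model name; confirm the identifier in AI Studio.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Hypothetical local copy of the mission transcript.
with open("apollo11_transcript.txt") as f:
    transcript = f.read()

# The entire transcript is passed in the prompt, relying on the long context window.
response = model.generate_content(
    [transcript, "List three humorous exchanges between the crew and Houston."]
)
print(response.text)
```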
Problem-Solving with Longer Code
Gemini 1.5 Pro also excels at problem-solving over longer blocks of code. Given a prompt containing more than 100,000 lines of code, the model can reason through examples, suggest helpful modifications, and explain how different parts of the code work.
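One practical question is how to get a whole codebase into a single prompt. Below is a minimal sketch in plain Python; the file extensions and the rough 3.5-characters-per-token budget are assumptions for illustration.

```python
from pathlib import Path

def pack_codebase(root, exts=(".py", ".js", ".go"), max_chars=3_500_000):
    """Concatenate source files into one prompt-sized string.

    max_chars is a rough stand-in for the token budget: at ~3.5 characters
    per token (an assumption), ~3.5M characters approximates a 1M-token window.
    """
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        text = path.read_text(errors="ignore")
        header = f"\n--- {path} ---\n"     # label each file so the model can cite it
        if used + len(header) + len(text) > max_chars:
            break
        parts.append(header + text)
        used += len(header) + len(text)
    return "".join(parts)

prompt = pack_codebase("my_project") + "\nExplain how the request router works."
```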
Enhanced Performance
Gemini 1.5 Pro maintains high performance levels even as its context window increases. In the Needle In A Haystack (NIAH) evaluation, where a specific fact or statement is intentionally embedded within a long block of text, 1.5 Pro successfully identifies the embedded text 99% of the time, even in blocks of data as long as 1 million tokens.
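The NIAH setup is simple to reproduce in spirit: plant a known fact at a random depth in filler text, ask the model to retrieve it, and score the answer. Here is a minimal harness sketch; ask_model is a placeholder for whatever model call is under evaluation.

```python
import random

def make_haystack(filler_sentences, needle, n_sentences, seed=0):
    """Embed a known 'needle' fact at a random depth in long filler text."""
    rng = random.Random(seed)
    doc = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    depth = rng.randrange(n_sentences)
    doc.insert(depth, needle)
    return " ".join(doc), depth / n_sentences  # also report relative depth

def score(answer, expected):
    """Crude pass/fail: does the model's answer contain the needle's payload?"""
    return expected.lower() in answer.lower()

needle = "The magic number for the evaluation is 7421."
haystack, depth = make_haystack(
    ["The sky was clear over the valley."], needle, n_sentences=50_000
)
prompt = haystack + "\n\nWhat is the magic number mentioned in the text?"
# `ask_model` is hypothetical; substitute the model call being evaluated.
# print(score(ask_model(prompt), "7421"))
```

Published NIAH results typically sweep both context length and needle depth, reporting retrieval accuracy across the resulting grid.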
Ethics and Safety Testing
Google ensures that its models undergo extensive ethics and safety testing in line with its AI Principles and robust safety policies. Research findings are integrated into governance processes, model development, and evaluations to continually improve its AI systems.
Responsible Deployment
Since the introduction of 1.0 Ultra in December, Google has refined the model to make it safer for a broader release. Extensive evaluations have been conducted across various areas, including content safety and representational harms, with ongoing efforts to expand testing. Additionally, Google is developing new tests to account for the unique long-context capabilities of 1.5 Pro.
Availability for Testing and Experimentation
Starting today, a limited preview of 1.5 Pro is available to developers and enterprise customers via AI Studio and Vertex AI. More details are available on the Google for Developers blog and Google Cloud blog.
Gemini 1.5 Pro will initially ship with a standard 128,000-token context window. Pricing tiers are planned that start at the standard 128,000-token window and scale up to 1 million tokens as the model improves.
Early testers can try the 1 million token context window at no cost during the testing period, though they should expect longer latency with this experimental feature. Significant speed improvements are also on the way.
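To check whether a prompt fits within a given tier before sending it, the SDK's count_tokens call can report usage. A minimal sketch, assuming the google-generativeai package and the preview model name used above; the prompt file is hypothetical.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed preview name

with open("big_prompt.txt") as f:   # hypothetical long prompt
    prompt = f.read()

usage = model.count_tokens(prompt)
print(usage.total_tokens)
if usage.total_tokens <= 128_000:
    print("Fits in the standard context window.")
else:
    print("Needs the experimental 1M-token window.")
```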
Developers interested in testing Gemini 1.5 Pro can sign up now in AI Studio, while enterprise customers can reach out to their Vertex AI account team.
Discover what Gemini can do and start exploring its capabilities today.