Google’s Gemini 3.1 has arrived, and it’s pushing AI research agents to new heights. The new model’s scores on key AI benchmarks show major jumps over its predecessor, Gemini 3 Pro, and the size of those gains is drawing attention from researchers and developers alike.
On the ARC-AGI-2 benchmark, which tests abstract reasoning, Gemini 3.1 scored 77.1%. That’s more than double Gemini 3 Pro’s 31.1%. The model also hit the highest score ever recorded on the GPQA Diamond test, which measures graduate-level science knowledge.
Gemini 3.1 also made big strides in agentic tasks, jobs where an AI works on its own to complete multi-step goals. On the APEX-Agents benchmark, it reached 33.5%, compared to 18.4% for Gemini 3 Pro, an 82% relative improvement. For web research tasks measured by BrowseComp, it scored 85.9% versus 59.2% for the older model.
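The 82% figure follows directly from the two raw scores; a quick sketch of the arithmetic, using the numbers quoted above:

```python
# Relative improvement of Gemini 3.1 over Gemini 3 Pro on APEX-Agents,
# using the scores quoted in this article.
old_score = 18.4   # Gemini 3 Pro (%)
new_score = 33.5   # Gemini 3.1 (%)

relative_gain = (new_score - old_score) / old_score * 100
print(f"{relative_gain:.0f}% relative improvement")  # → 82% relative improvement
```

The same calculation applied to the BrowseComp scores (85.9 vs. 59.2) gives a roughly 45% relative improvement.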
One of the model’s standout features is its ability to reduce hallucinations. That’s when an AI confidently produces incorrect answers instead of admitting it doesn’t know something. Gemini 3.1 cut its hallucination rate by 38 percentage points compared to Gemini 3 Pro Preview. It’s now much better at saying “I don’t know” rather than guessing incorrectly.
The model also performed strongly in coding tasks. It hit 80.6% on SWE-Bench Verified, a benchmark for fixing real software bugs, and posted a 2887 Elo rating on LiveCodeBench Pro, which measures competitive programming skill.
For businesses, Google launched Deep Research Max integration. It turns the model into a research tool that can gather data from multiple sources, check facts, and produce detailed, cited reports. Companies in finance, life sciences, and market research are among those expected to benefit. With the global AI market projected to grow from $391 billion in 2025 to $1.81 trillion by 2030, tools like this are arriving at a critical moment for enterprise adoption.
The tool works through a single API call and can blend proprietary data with open web sources.
Gemini 3.1’s 1 million token context window also helps it handle huge amounts of information at once. That makes it useful for tasks like literature reviews and hypothesis generation in research settings. The model also supports multimodal input, allowing it to process text, images, audio, and video within the same workflow.
Developers can access the model directly through Google AI Studio using the gemini-3.1-pro-preview identifier, giving teams a straightforward path to integrate these research capabilities into their own applications.
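As a rough illustration of that integration path, the sketch below builds a request body for the Gemini API’s generateContent REST method using the model identifier named above. The prompt text and helper name are placeholders of my choosing, and actually sending the request would additionally require an API key.

```python
import json

# Model identifier quoted in this article; the endpoint follows the
# public Gemini API's generateContent REST pattern.
MODEL_ID = "gemini-3.1-pro-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)

def build_request(prompt: str) -> str:
    """Serialize a minimal generateContent request body."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return json.dumps(body)

# The JSON payload you would POST to ENDPOINT (authenticated with an API key).
payload = build_request("Summarize the key findings of this report.")
```

Teams using Google’s official client libraries would get the same request shape without hand-building JSON; the point here is simply that a single model identifier and a single call are all the integration surface involved.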