Gemini 1.5 Pro was released this week with the following promise:
"The model delivers dramatically enhanced performance, with a breakthrough in long-context understanding across modalities."
It was designed as a mid-size multimodal model that matches the performance of 1.0 Ultra (Google's largest model) while using less compute than the prized heifer. 1.5 Pro is built on a Transformer and Mixture-of-Experts (MoE) architecture. Instead of one traditional monolithic neural network, an MoE model is split into smaller "expert" networks. For any given input, only the most relevant expert pathways are activated, which makes training and inference more efficient.
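To make the "only the relevant experts are active" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration, not Gemini's actual implementation: the names `moe_forward`, `make_expert`, and `gate_weights` are hypothetical, each "expert" is just a scaling function standing in for a small sub-network, and the router is a simple dot product followed by a softmax.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a stand-in for a small feed-forward sub-network."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Router: one score per expert (dot product of x with that expert's gate vector).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the top_k experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    # Weighted sum of the chosen experts' outputs, with renormalized weights.
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        out = [o + weight * y for o, y in zip(out, experts[i](x))]
    return out, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[3.0, 0.0], [2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
    output, active = moe_forward([1.0, 0.0], gate_weights, experts, top_k=2)
    print("active experts:", active)
    print("output:", output)
```

With four experts and `top_k=2`, only half of the expert functions run for a given input, which is where the compute savings in this sketch come from.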
The defining feature of 1.5 Pro is its context window:
| Metric | Value |
|---|---|
| Standard context window | 128,000 tokens |
| Max context window (preview) | 1 million tokens |
| Tested in research up to | 10 million tokens |
What 1 million tokens can hold:
- 1 hour of video
- 11 hours of audio
- 30,000+ lines of code
- 700,000+ words
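As a rough back-of-the-envelope aid, the figures above can be turned into a small estimator. This is only a sketch: the words-to-tokens ratio is taken from the "1 million tokens holds 700,000+ words" figure, so it is conservative and will differ from what a real tokenizer reports. The tier labels and the function names `estimate_tokens` and `smallest_tier` are hypothetical.

```python
# Ratio implied by the figures above: 1,000,000 tokens hold at least 700,000 words.
TOKENS_IN_EXAMPLE = 1_000_000
WORDS_IN_EXAMPLE = 700_000

# Context window sizes from the table; the tier labels are made up for this sketch.
CONTEXT_TIERS = [
    ("standard", 128_000),
    ("preview", 1_000_000),
    ("research", 10_000_000),
]


def estimate_tokens(word_count):
    """Estimate token count from a word count, rounding up."""
    # Integer ceiling division avoids float rounding at tier boundaries.
    return -((-word_count * TOKENS_IN_EXAMPLE) // WORDS_IN_EXAMPLE)


def smallest_tier(word_count):
    """Return the label of the smallest context window that fits, or None."""
    tokens = estimate_tokens(word_count)
    for name, limit in CONTEXT_TIERS:
        if tokens <= limit:
            return name
    return None


if __name__ == "__main__":
    for words in (50_000, 700_000, 5_000_000, 10_000_000):
        print(words, "words ->", estimate_tokens(words), "tokens ->", smallest_tier(words))
```

For real inputs, a token count from the model's own tokenizer is the number to trust; this estimate is only useful for quick sizing.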