A close look at apple's analysis of reasoning in LLMs through problem complexity reveals limitations in current benchmark design and model interpretability.
Anthropic launched Haiku, Sonnet, and Opus under the Claude 3 family, offering powerful capabilities across tasks.
Gemini 1.5 Pro featured industry-leading long-context capabilities, enabling advanced reasoning over vast documents.
The first Gemini model from Google DeepMind, merging strengths from AlphaCode, Pathways, and large-scale training.
An optimized and cost-efficient variant of GPT-4 powering ChatGPT with custom GPTs, tools, and longer context.
An improved Claude model with stronger reasoning, fewer hallucinations, and increased openness for public use.
Anthropic’s first Claude model, focused on safety through Constitutional AI, emphasizing harmlessness and transparency.
A multimodal leap forward for OpenAI, capable of reasoning over images and text with more nuanced capabilities.
The 175B parameter model that revolutionized natural language interfaces and powered the first wave of AI API tools.
GPT-2 demonstrated surprisingly coherent text generation, sparking debate over AI safety and open-sourcing.
OpenAI's first generative pre-trained transformer model, laying the foundation for large language models.