Begins w/ AI

Claude 2/3/3.5 Benchmarks & Reviews

This page compiles assessments and evaluations of Claude models. The benchmarks measure the models’ capabilities across diverse NLP tasks, including textual entailment, question answering, summarization, and dialogue. I hope this page gives you a comprehensive overview of the Claude models’ language proficiencies and how they compare to other state-of-the-art AI systems.

Claude in the history of Large Language Models

Benchmarks & Reviews

Here are the scores of Claude 3.5 Sonnet:

Here are the scores of Claude 3 Models:

Here are the scores of Claude 2 in all the popular tests:

  • 76.5% (Claude 2 score on Bar exam multiple choice)
  • 73.0% (Claude 1.3 score on Bar exam multiple choice)
  • 90th percentile (Claude 2 GRE reading/writing score compared to grad school applicants)
  • median (Claude 2 GRE quantitative reasoning score compared to grad school applicants)
  • 71.2% (Claude 2 score on Codex HumanEval)
  • 56.0% (Previous Claude score on Codex HumanEval)
  • 88.0% (Claude 2 score on GSM8k math problems)
  • 85.2% (Previous Claude score on GSM8k math problems)
  • 2x better (Claude 2 vs Claude 1.3 at giving harmless responses)
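
To make the Claude 2 improvements easier to compare, here is a minimal Python sketch that computes the gain in percentage points for the three benchmarks above that list both an older and a newer score. The dictionary layout and the `point_gains` helper are my own illustration, not part of any official tooling; the numbers are copied from the list.

```python
# Scores copied from the list above (percent correct).
# "previous" is Claude 1.3 for the Bar exam and "Previous Claude" for the other two.
scores = {
    "Bar exam multiple choice": {"previous": 73.0, "claude_2": 76.5},
    "Codex HumanEval": {"previous": 56.0, "claude_2": 71.2},
    "GSM8k math problems": {"previous": 85.2, "claude_2": 88.0},
}


def point_gains(table):
    """Return the gain in percentage points per benchmark (hypothetical helper)."""
    return {
        name: round(entry["claude_2"] - entry["previous"], 1)
        for name, entry in table.items()
    }


gains = point_gains(scores)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

By this simple difference, the largest jump is on Codex HumanEval (15.2 points), with smaller gains on the Bar exam (3.5 points) and GSM8k (2.8 points).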

Reviews from various sources:

  • How Good is the Claude 2 AI at Working With PDFs? – Let’s Find Out – page
  • Model Card and Evaluations for Claude Models – PDF
  • Claude 3.5 Sonnet Model Card Addendum – PDF
  • Claude 3 Model Card – PDF
  • ARB: Advanced Reasoning Benchmark for Large Language Models – PDF
  • LLM hallucinations graded – Google Sheet
  • Llama 2 vs Claude 2 vs GPT-4 – video
  • After using Claude 2 by Anthropic for 12 hours straight, here’s what I found – Reddit Discussion
  • How strong is Claude 2? – video
  • What to Know About Claude 2, Anthropic’s Rival to ChatGPT – page

Got a question or a recommendation? Please send me a message at [email protected].



© 2025 Begins w/ AI, Made with ❤️ by Brady
