
ChartMuseum Leaderboard

NeurIPS 2025
ChartMuseum Overview

ChartMuseum is a chart question answering benchmark designed to evaluate the reasoning capabilities of large vision-language models (LVLMs) on real-world chart images. We categorize the questions into four types:

  • Textual reasoning questions can be solved almost exclusively with textual reasoning.
  • Visual reasoning questions are most easily answered from the visual aspects of the chart.
  • Text/Visual reasoning questions can be answered with either primarily textual or primarily visual reasoning.
  • Synthesis reasoning questions require both textual and visual reasoning.
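The leaderboard reports one score per question type plus an Overall score. The Overall column is not the plain mean of the four type scores (for GPT-5-mini (high), that mean is about 69.5, versus a reported 63.3), which suggests a pooled or weighted average. The sketch below illustrates that idea under stated assumptions: each result is a hypothetical (question_type, is_correct) pair, `accuracy_by_category` is an illustrative helper rather than the benchmark's own scoring code, and Overall is computed over all questions pooled together.

```python
from collections import defaultdict

# Type labels mirror the leaderboard columns; the scoring logic is an assumption.
CATEGORIES = ("Visual", "Synthesis", "Visual/Text", "Text")


def accuracy_by_category(results):
    """Return per-type accuracy and pooled overall accuracy, in percent.

    `results` is an iterable of hypothetical (question_type, is_correct) pairs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qtype, is_correct in results:
        if qtype not in CATEGORIES:
            raise ValueError(f"unknown question type: {qtype}")
        total[qtype] += 1
        correct[qtype] += int(is_correct)
    # Per-type accuracy, skipping types with no questions.
    scores = {c: 100.0 * correct[c] / total[c] for c in CATEGORIES if total[c]}
    # Overall is pooled over all questions (a micro-average), so types with
    # more questions weigh more than in a plain mean of the type scores.
    n = sum(total.values())
    scores["Overall"] = 100.0 * sum(correct.values()) / n if n else 0.0
    return scores


if __name__ == "__main__":
    toy = [("Visual", True), ("Visual", False), ("Visual", False),
           ("Text", True), ("Synthesis", True), ("Visual/Text", False)]
    for name, acc in accuracy_by_category(toy).items():
        print(f"{name:12s} {acc:5.1f}")
```

With these made-up records, the pooled Overall is 50.0, below the 58.3 plain mean of the four type scores, because the weakest type (Visual, 33.3) also has the most questions.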

Humans achieve 93% overall accuracy on ChartMuseum and 98.2% on the visual reasoning questions.

Model Comparison

Accuracy (%) of each model by question type, sorted by overall accuracy.

Size is the parameter count ("-" where not listed).

Model                      Size  Visual  Synthesis  Visual/Text  Text  Overall
GPT-5-mini (high)          -       52.6       62.4         73.5  89.4     63.3
Gemini-2.5-Pro             -       53.3       64.7         70.1  87.8     63.0
GPT-5 (high)               -       53.7       64.7         68.4  88.6     62.9
o4-mini (high)             -       51.2       66.2         68.4  86.2     61.5
o3 (high)                  -       50.4       63.2         69.7  85.4     60.9
Claude-3.7-Sonnet          -       50.6       55.6         69.2  88.6     60.3
Claude-4.1-Opus            -       50.4       54.1         66.2  87.0     59.1
Claude-4-Sonnet            -       41.0       52.6         62.4  82.1     52.6
Qwen3-VL-30B-A3B-Thinking  30B     38.8       47.4         57.7  82.1     49.7
GPT-4.1                    -       37.1       53.4         54.3  78.9     48.4
Qwen3-VL-8B-Thinking       8B      32.4       45.9         53.0  76.4     44.4
GLM-4.5V                   108B    32.7       33.8         47.4  67.5     40.6
Qwen3-VL-30B-A3B-Instruct  30B     31.4       34.6         45.3  73.2     40.2
Qwen3-VL-8B-Instruct       8B      27.8       41.4         50.0  70.0     40.0
Qwen2.5-VL-72B             72B     30.4       35.3         42.3  68.3     38.5
Qwen3-VL-4B-Instruct       4B      25.3       39.9         45.7  71.5     37.7
Qwen3-VL-4B-Thinking       4B      26.1       34.6         43.2  65.9     36.1
Qwen3-VL-2B-Thinking       2B      20.6       27.8         35.0  62.6     30.1
Qwen2.5-VL-7B              7B      19.4       24.8         36.3  41.5     26.8
Qwen3-VL-2B-Instruct       2B      17.7       18.8         27.8  38.2     22.7
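To make the gap between humans and models concrete, the sketch below copies a few leaderboard rows and compares the best scores with the human accuracies quoted in the overview (93% overall, 98.2% on visual reasoning). Only the first three rows and the last row are copied; because the best Overall score (GPT-5-mini (high), 63.3) and the best Visual score (GPT-5 (high), 53.7) are among them, the result matches the full table. The `LEADERBOARD` dictionary layout and the `best` helper are illustrative, not part of the leaderboard itself.

```python
# A few rows copied from the leaderboard (accuracy, %).
# Column order: Visual, Synthesis, Visual/Text, Text, Overall.
LEADERBOARD = {
    "GPT-5-mini (high)": (52.6, 62.4, 73.5, 89.4, 63.3),
    "Gemini-2.5-Pro": (53.3, 64.7, 70.1, 87.8, 63.0),
    "GPT-5 (high)": (53.7, 64.7, 68.4, 88.6, 62.9),
    "Qwen3-VL-2B-Instruct": (17.7, 18.8, 27.8, 38.2, 22.7),
}
HUMAN_OVERALL = 93.0  # human overall accuracy from the overview
HUMAN_VISUAL = 98.2   # human accuracy on visual reasoning questions

# Column indices into each score tuple.
VISUAL, OVERALL = 0, 4


def best(column):
    """Return (model, score) for the highest score in the given column."""
    model = max(LEADERBOARD, key=lambda m: LEADERBOARD[m][column])
    return model, LEADERBOARD[model][column]


top_model, top_score = best(OVERALL)
print(f"Best overall: {top_model} at {top_score:.1f}%, "
      f"{HUMAN_OVERALL - top_score:.1f} points below humans")

vis_model, vis_score = best(VISUAL)
print(f"Best visual: {vis_model} at {vis_score:.1f}%, "
      f"{HUMAN_VISUAL - vis_score:.1f} points below humans")
```

It reports a 29.7-point gap overall and a 44.5-point gap on visual reasoning questions, so the visual category is where the best models trail humans most.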