VISTA

Visual Language Understanding

Last updated: April 4, 2025

Performance Comparison

| Rank | Model | Score |
|------|-------|-------|
| 1 | | 54.65 ±1.46 |
| 2 | | 48.23 ±0.70 |
| 2 | | 47.32 ±1.78 |
| 3 | | 45.50 ±1.20 |
| 3 | | 45.25 ±0.40 |
| 5 | | 43.25 ±1.26 |
| 6 | | 43.02 ±1.14 |
| 6 | | 42.11 ±1.39 |
| 6 | | 41.25 ±0.85 |
| 8 | | 39.95 ±0.80 |
| 9 | | 39.85 ±0.71 |
| 10 | Claude 3.5 Sonnet (October 2024) | 38.72 ±0.51 |
| 12 | Claude 3.5 Sonnet (June 2024) | 38.37 ±0.70 |
| 12 | | 38.33 ±0.55 |
| 12 | ChatGPT-4o-latest (November 2024) | 37.99 ±0.48 |
| 12 | Gemini 1.5 Pro | 37.07 ±1.34 |
| 17 | GPT-4o (August 2024) | 34.94 ±0.23 |
| 17 | Gemini 1.5 Flash 002 | 34.03 ±1.41 |
| 18 | Pixtral Large (November 2024) | 33.89 ±0.69 |
| 18 | | 32.69 ±1.40 |
| 21 | Qwen2-VL-72B-Instruct | 28.56 ±1.37 |
| 21 | Claude 3 Opus | 27.82 ±0.55 |
| 21 | | 26.79 ±0.65 |
| 23 | Nova Pro | 26.27 ±0.61 |
| 23 | Pixtral 12B (September 2024) | 25.97 ±0.74 |
| 23 | Nova Lite | 25.50 ±0.77 |
| 25 | Llama 3.2 90B Vision Instruct | 24.61 ±0.80 |
| 28 | Llama 3.2 11B Vision-Instruct | 20.47 ±0.15 |
| 29 | Phi 3.5 Vision-Instruct | 15.18 ±0.81 |
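
Several entries above share a rank while their scores differ slightly, so it can help to compare the ± ranges directly. The following is a minimal sketch, assuming the value after "±" is a symmetric margin around the score; the page does not state how the margin or the ranks are computed, and the names `Entry`, `interval`, and `intervals_overlap` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    score: float
    margin: float  # value after "±"; ASSUMED to be a symmetric margin


def interval(e: Entry) -> tuple[float, float]:
    """Return the (low, high) range implied by score ± margin."""
    return (e.score - e.margin, e.score + e.margin)


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """True if the two score ± margin ranges share at least one point."""
    a_lo, a_hi = interval(a)
    b_lo, b_hi = interval(b)
    return a_lo <= b_hi and b_lo <= a_hi


# Values copied from the leaderboard rows above.
sonnet_oct = Entry("Claude 3.5 Sonnet (October 2024)", 38.72, 0.51)
sonnet_jun = Entry("Claude 3.5 Sonnet (June 2024)", 38.37, 0.70)
phi = Entry("Phi 3.5 Vision-Instruct", 15.18, 0.81)

print(intervals_overlap(sonnet_oct, sonnet_jun))  # ranges overlap
print(intervals_overlap(sonnet_oct, phi))         # ranges do not overlap
```

Overlapping ranges suggest that the gap between two entries may not be meaningful under this assumption. This is only a reading aid and not necessarily the method used to assign the ranks shown.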