VISTA

Visual Language Understanding

Last updated: April 4, 2025

Performance Comparison

Rank  Score        Model
1     54.65±1.46
1     50.03±4.45
2     51.34±0.63
3     50.49±0.13
4     49.70±0.58
5     48.23±0.70
5     47.32±1.78
6     45.89±1.14
6     45.50±1.20
6     45.36±0.72
6     45.25±0.40
11    43.25±1.26
12    43.02±1.14
12    42.11±1.39
12    41.25±0.85
14    39.95±0.80
15    39.85±0.71
16    38.72±0.51   Claude 3.5 Sonnet (October 2024)
18    38.37±0.70   Claude 3.5 Sonnet (June 2024)
18    38.33±0.55
18    37.99±0.48   ChatGPT-4o-latest (November 2024)
18    37.07±1.34   Gemini 1.5 Pro
23    34.94±0.23   GPT-4o (August 2024)
23    34.03±1.41   Gemini 1.5 Flash 002
24    33.89±0.69   Pixtral Large (November 2024)
24    32.69±1.40
27    28.56±1.37   Qwen2-VL-72B-Instruct
27    27.82±0.55   Claude 3 Opus
27    26.79±0.65
29    26.27±0.61   Nova Pro
29    25.97±0.74   Pixtral 12B (September 2024)
29    25.50±0.77   Nova Lite
31    24.61±0.80   Llama 3.2 90B Vision Instruct
34    20.47±0.15   Llama 3.2 11B Vision-Instruct
35    15.18±0.81   Phi 3.5 Vision-Instruct