The Future is Multilingual: A New Evaluation Benchmark | Scale AI