Smoothing Out LLM Variance for Reliable Enterprise Evals | Scale AI