Guangze Luo

April 20, 2026

Research

HiL-Bench: Your Agent is Smart. It Just Won't Ask for Help.

Frontier agents ace tasks with complete specs, then crash to 4% when key details are missing. They never ask for help. HiL-Bench is the first benchmark that tests whether they know when to ask.

Copyright © 2026 Scale AI, Inc. All rights reserved
