HiL-Bench: Your Agent is Smart. It Just Won't Ask for Help.

Frontier agents ace tasks with complete specs, then crash to 4% when key details are missing. They never ask for help. HiL-Bench is the first benchmark that tests whether they know when to ask.