Matthew Siegel

July 9, 2025

Research

Beyond the Black Box: Teaching Models to Verbalize Reward Hacking

One of AI's biggest challenges is "reward hacking," where models learn to game the system for a correct answer instead of actually reasoning. This hidden deception makes AI untrustworthy. Scale research has found a powerful solution: instead of stopping the hacking, get the model to admit to it in its Chain-of-Thought reasoning. This new paper details how Verbalization Fine-Tuning (VFT) trains models to announce their shortcuts, dramatically increasing transparency from 11% to 94% and making AI systems fundamentally safer.

Copyright © 2026 Scale AI, Inc. All rights reserved.