Planning In Natural Language Improves LLM Search For Code Generation
Evan Wang 1,2, Federico Cassano 3,4,†, Catherine Wu †, Yunfeng Bai 1, Will Song 1, Vaskar Nath 1, Ziwen Han 1, Sean Hendryx 1, Summer Yue 1, Hugh Zhang 1
1 Scale AI, 2 California Institute of Technology, 3 Northeastern University, 4 Cursor AI
† Work conducted while at Scale AI
This study identifies a key limitation in scaling inference compute for LLMs: generated outputs lack diversity, so repeated sampling keeps producing similar candidates and solves problems inefficiently. To address this, the authors propose PLANSEARCH, a novel search algorithm that generates diverse observations about a problem in natural language and then constructs plans for solving it from these observations. PLANSEARCH substantially increases the diversity of candidate solutions compared to baseline methods and achieves strong results on the HumanEval+, MBPP+, and LiveCodeBench coding benchmarks, including a state-of-the-art pass@200 of 77% on LiveCodeBench with Claude 3.5 Sonnet. The study also shows that the performance gain a search algorithm delivers can be predicted by measuring the diversity of the ideas it generates.
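The observation-then-plan pipeline described above can be sketched roughly as follows. This is a minimal, simplified illustration under stated assumptions, not the authors' implementation: the `llm` callable, the prompt wording, and the `max_subset_size` parameter are hypothetical, and the full method has more stages than shown here. The key idea it shows is that each subset of observations seeds a different plan, which is what pushes the candidate programs apart.

```python
from itertools import combinations
from typing import Callable, List

# Hypothetical LLM interface (an assumption, not a real library API):
# takes a prompt string and returns a list of text completions.
LLM = Callable[[str], List[str]]


def plan_search(problem: str, llm: LLM, max_subset_size: int = 2) -> List[str]:
    """Simplified PLANSEARCH-style search: observations -> plans -> code."""
    # Step 1: ask for natural-language observations about the problem.
    observations = llm(f"List useful observations about this problem:\n{problem}")

    programs: List[str] = []
    # Step 2: each subset of observations seeds a different plan, which is
    # what pushes the candidate solutions apart from one another.
    for size in range(1, max_subset_size + 1):
        for subset in combinations(observations, size):
            hints = "\n".join(subset)
            # Step 3: turn the chosen observations into a plan in plain English.
            plan = llm(
                f"Problem:\n{problem}\nObservations:\n{hints}\n"
                "Write a step-by-step plan for solving the problem."
            )[0]
            # Step 4: translate the plan into a candidate program.
            code = llm(f"Problem:\n{problem}\nPlan:\n{plan}\nImplement this plan in Python.")[0]
            programs.append(code)
    return programs


# Toy stand-in model (hypothetical) so the sketch runs without any API access.
def fake_llm(prompt: str) -> List[str]:
    if prompt.startswith("List useful observations"):
        return ["observation A", "observation B", "observation C"]
    return [f"<completion for a {len(prompt)}-character prompt>"]


candidates = plan_search("Return the sum of two integers.", fake_llm)
print(len(candidates))  # 3 single observations + 3 pairs = 6 candidates
```

Because the number of observation subsets grows combinatorially, a cap such as `max_subset_size` controls how many distinct plans, and therefore how much inference compute, the search spends per problem.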
The results show that both PLANSEARCH and IDEASEARCH significantly outperform basic repeated sampling across all models and benchmarks, with PLANSEARCH achieving the highest overall scores. Interestingly, IDEASEARCH performs slightly better than the chain-of-thought baseline, which the authors speculate is because it splits the solution sketch and the code into two separate model responses. Pass@k results vary across models, and differences in the diversity of generated ideas help explain this variation. The measured diversity score correlates strongly with the improvement obtained from scaling inference compute, making it a useful predictor of pass@k gains. One trade-off is that PLANSEARCH can slightly reduce pass@1 as a side effect of increasing idea diversity; at larger k, however, the added diversity raises the chance that at least one generated solution is correct, so pass@k improves.
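To make the pass@1 versus pass@k trade-off concrete, the sketch below assumes the standard unbiased pass@k estimator introduced with Codex (Chen et al., 2021), pass@k = 1 - C(n - c, k) / C(n, k) for n samples of which c are correct, and adds a simple pairwise diversity score in the spirit of the one described above. The per-problem correct counts are made-up numbers for illustration, and the `same_first_word` judge is a hypothetical stand-in for whatever judge decides whether two solutions share an idea.

```python
import math
from itertools import combinations
from typing import Callable, Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given n samples of which c passed. Requires k <= n."""
    # math.comb returns 0 when n - c < k, so the result is then exactly 1.0.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the correct count for each problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


def diversity_score(solutions: Sequence[str], same_idea: Callable[[str, str], bool]) -> float:
    """Fraction of solution pairs that the judge says use different ideas."""
    pairs = list(combinations(solutions, 2))
    if not pairs:
        return 0.0
    return sum(not same_idea(a, b) for a, b in pairs) / len(pairs)


def same_first_word(a: str, b: str) -> bool:
    """Toy judge (hypothetical): solutions share an idea if their first word matches."""
    return a.split()[0] == b.split()[0]


# Hypothetical correct counts out of n = 10 samples on four problems.
low_diversity = [10, 10, 0, 0]   # same idea every time: all right or all wrong
high_diversity = [6, 6, 3, 3]    # ideas vary, so every problem gets some hits

for name, counts in [("low", low_diversity), ("high", high_diversity)]:
    print(name, round(mean_pass_at_k(counts, 10, 1), 2), mean_pass_at_k(counts, 10, 10))
# low 0.5 0.5
# high 0.45 1.0

print(round(diversity_score(["dp v1", "dp v2", "greedy v1", "bfs v1"], same_first_word), 3))  # 0.833
```

In this toy example the high-diversity method has a lower pass@1 (0.45 versus 0.5) but solves every problem at k = 10, which mirrors the trade-off described above.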
BibTeX Citation
@misc{wang2024planningnaturallanguageimproves,
title={Planning In Natural Language Improves LLM Search For Code Generation},
author={Evan Wang and Federico Cassano and Catherine Wu and Yunfeng Bai and Will Song and Vaskar Nath and Ziwen Han and Sean Hendryx and Summer Yue and Hugh Zhang},
year={2024},
eprint={2409.03733},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2409.03733},
}