RLHF for Large Language Models

Learn more about reinforcement learning from human feedback (RLHF)


Why is ChatGPT so good?

OpenAI applied reinforcement learning from human feedback (RLHF) to improve ChatGPT. Learn what role RLHF plays in improving large language models and how to implement it.

Read more →

How much better is OpenAI’s newest GPT-3 model?

We evaluate davinci-003 across a range of classification, summarization, and generation tasks. We show where davinci-003 significantly outperforms the prior version and where it still has room to improve.

Read more →

Meet Claude: Anthropic’s rival to ChatGPT

A new LLM from Anthropic called Claude is competitive with ChatGPT and offers great promise. We evaluate both models head to head and give our thoughts on how they compare.

Read more →

How to label 1M data points per week

How do you maintain label quality at scale without having annotators check each other’s work? Take a deep dive into how we solved this problem while working with OpenAI on fine-tuning their GPT-2 model.

Read more →

