
Submit Your Toughest Questions for Humanity's Last Exam

September 16, 2024

Scale AI and CAIS are excited to announce the launch of Humanity's Last Exam, a project aimed at measuring how close we are to achieving expert-level AI systems. The goal is to build the world's most difficult public AI benchmark by gathering questions from experts across all fields. People who submit successful questions will be invited as co-authors on the dataset paper and will have a chance to win money from a $500,000 prize pool.

Scale & CAIS

Scale’s Safety, Evaluations, and Alignment Lab (SEAL) is dedicated to researching robust evaluation methods for frontier models and enhancing the transparency of LLM progress. As part of this mission, SEAL periodically publishes leaderboards based on expert-driven private evaluations, helping the AI community gain deep insights into leading models.

The Center for AI Safety (CAIS) is a San Francisco-based research and field-building nonprofit. CAIS’s mission is to reduce societal-scale risks associated with AI by conducting safety research, building the field of AI safety researchers, and advocating for safety standards.

Why Participate?

AI is developing at a rapid pace. Just a few years ago, AI systems performed no better than random chance on MMLU, the AI community’s most-downloaded benchmark (developed by CAIS). Yet just last week, OpenAI’s newest model scored near the ceiling on all of the most popular benchmarks, including MMLU, and received top scores on a variety of highly competitive STEM olympiads. As existing tests become too easy, we lose the ability to distinguish between AI systems that can ace undergraduate exams and those that can genuinely contribute to frontier research and problem solving.

Because future AI systems will eventually surpass any static benchmark that can be created, pushing the boundary of benchmarks and evaluation is paramount. To track how far AI systems are from expert-level capabilities, Scale and CAIS are developing Humanity’s Last Exam, which aims to be the world’s most difficult AI test.

Your Role

We're assembling the largest, broadest coalition of experts in history to design questions that test how far AIs are from the human intelligence frontier. If there is a question that would genuinely impress you if an AI could solve it, we’d like to hear it from you!

  • If one or more of your questions is accepted, you will be offered optional co-authorship of the resulting paper. We have already received questions from researchers at MIT, UC Berkeley, Stanford, and other institutions. The more of your questions are accepted, the higher your name will appear in the author list.

  • The top 50 questions will earn $5,000 each.

  • The next 500 questions will earn $500 each.

Prizes may be awarded based on question quality or on novelty relative to other submitted questions. People who submitted questions before this announcement are also eligible for these prizes. A small set of questions will be kept private to detect whether an AI is memorizing answers to public questions, but prizes and co-authorship can still be awarded to people whose questions are kept in the private set.

Submission Guidelines

  • Challenge Level: Questions should be difficult for non-experts and not easily answerable via a quick online search. Avoid trick questions. Frontier AI systems are very good at answering even master's-level questions. Question writers are strongly encouraged to have 5+ years of experience in a technical industry job (e.g., SpaceX, Boston Dynamics, Siemens) or to be PhD students or above in academic training. While preparing Humanity’s Last Exam, we found that questions written by undergraduates tend to be too easy for the models. As a rule of thumb, if a randomly selected undergraduate can understand what is being asked, the question is likely too easy for the frontier LLMs of today and tomorrow.

  • Objectivity: Answers should be accepted by other experts in the field and free from personal taste, ambiguity, or subjectivity. Provide all necessary context and definitions within the question. Use standard, unambiguous jargon and notation.

  • Originality: Questions must be your own work and not copied from others.

  • Confidentiality: Questions and answers should not be publicly available. You may use questions from past exams you've given if they're not accessible to the public.

  • Weaponization Restrictions: Do not submit questions related to chemical, biological, radiological, or nuclear weapons, cyberweapons, or virology.

Terms and Conditions apply.

Deadline: November 1, 2024

For a detailed list of instructions and example questions, please visit agi.safe.ai/submit.

Be part of a project that could redefine the future of artificial intelligence. Your expertise could help create the ultimate challenge for AI systems worldwide!

This prize pool is sponsored by Scale AI. We cannot give awards to teams on US terrorist lists or those subject to sanctions. The sponsor may confirm the legality of sending prize money to winners who reside outside the United States. Winners will be emailed before the paper is published, and Scale AI and Center for AI Safety organizers will judge the question submissions. Only authors of awarded questions are winners. All decisions of the judges are final. Winners are responsible for the legality of accepting the prize in their country. All taxes are the responsibility of the winners. Employees or current contractors of Scale AI and the contest organizers are not eligible to win prizes. Entrants must be over the age of 18. By entering the contest, entrants agree to the Terms & Conditions. Entrants agree that Scale AI and the Center for AI Safety shall not be liable to entrants for any type of damages that arise out of or are related to the contest and/or the prizes.
By submitting an entry, entrant represents and warrants that, consistent with the terms of the Terms and Conditions: (a) the entry is entrant’s original work; (b) entrant owns any copyright applicable to the entry; (c) the entry does not violate, in whole or in part, any existing copyright, trademark, patent or any other intellectual property right of any other person, organization or entity; (d) entrant has confirmed that entrant is unaware of any contractual obligations entrant has which may be inconsistent with these Terms and Conditions and the rights entrant is required to have in the entry, including but not limited to any prohibitions, obligations or limitations arising from any current or former employment arrangement entrant may have; (e) entrant is not disclosing the confidential, trade secret or proprietary information of any other person or entity, including any obligation entrant may have arising in connection with any current or former employment, without authorization or a license; and (f) entrant has full power and all legal rights to submit an entry in full compliance with these Terms and Conditions.

