Overview
The ML Alignment & Theory Scholars (MATS) Program, together with researchers from Anthropic and University College London (UCL), needed high-quality data to test scalable oversight mechanisms for ensuring AI safety and alignment.
The project demanded a highly skilled AI training workforce to judge debates between language models, adaptable task guidelines that could keep pace with the iterative nature of research, and direct collaboration channels for quick feedback between researchers and workers.
Pareto.AI assisted the consortium of researchers in collecting high-quality data from human-judged debates between LLMs.
Our advantage: Why they chose Pareto.AI
The researchers ran into significant problems finding the right partner for this project. Crowd work providers were hesitant to take on the project's complexity, particularly its need for iterative weekly changes, customized workflows, and direct communication with labelers throughout.
Some notable providers in this space demonstrated a lack of responsiveness, taking over a month to reply to initial inquiries and showing reluctance to facilitate direct communication between the researchers and the workers. Their disinterest in supporting a project of this size and complexity revealed a gap between the researchers' needs and what most data labeling vendors could offer.
Existing crowd work platforms were not built to handle projects of this nature: they are designed primarily for high-volume, repeatable tasks, not for work that requires nuanced understanding, expert judgment, or the ability to iterate rapidly as guidelines evolve.
Additionally, most existing systems minimize direct contact with workers to simplify management and reduce overhead, which can lead to misunderstandings, generic feedback, and slower resolution of issues. For a project like this, that lack of direct engagement and adaptability would have hindered its success.
Solution
Pareto sourced, onboarded, and trained 20 experts in less than a month through referral-based sourcing, leveraging our extensive network of highly skilled workers.
We secured and retained top-tier talent by committing upfront to guaranteed working hours for a carefully vetted group of expert workers over an extended period. We also implemented a thorough testing and qualification process, which included a week of feedback.
The data collection project involved judging debates between LLMs, where the goal was to choose the correct answer to the question under debate. Pareto oversaw the data collection process from start to finish, providing daily updates on labelers' debate judgements and ensuring adherence to quality standards.
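To make the workflow concrete, here is a minimal sketch of what a single debate-judgement record might look like. The `DebateJudgement` class, its field names, and the example values are illustrative assumptions, not the researchers' actual schema.

```python
# Minimal sketch (hypothetical schema, not the researchers' actual one) of how a
# single human debate judgement might be recorded during data collection.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DebateJudgement:
    question: str          # the question the two debaters argue over
    answer_a: str          # answer defended by debater A
    answer_b: str          # answer defended by debater B
    transcript: list[str]  # alternating debate turns shown to the judge
    judge_id: str          # vetted expert worker who judged the debate
    chosen_answer: str     # "A" or "B" -- the judge's verdict
    confidence: float      # judge's self-reported confidence, 0.0-1.0
    judged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example record of the kind that could be aggregated into daily quality updates.
judgement = DebateJudgement(
    question="Which city does the passage describe?",
    answer_a="Lisbon",
    answer_b="Porto",
    transcript=["Debater A: ...", "Debater B: ...", "Debater A: ...", "Debater B: ..."],
    judge_id="expert-007",
    chosen_answer="A",
    confidence=0.8,
)
```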
Bugs and high latency were identified as issues that could hurt workers' performance. To address this, we helped the researchers implement a robust error recovery system and thoroughly tested the platform before rolling it out to workers. This was possible because our system is intentionally designed to accommodate quick pivots and adjustments.
To facilitate collaboration between workers and requesters, Pareto established direct communication channels, enabling real-time interaction between researchers and workers.
By prioritizing open dialogue and immediate feedback, Pareto's model fostered a more collaborative and adaptive environment than traditional platforms can easily provide for projects like this.
Results
As a result of Pareto's high-quality, comprehensive data, researchers from MATS, Anthropic, and UCL were able to derive crucial insights:
- As language models become more capable, debate between LLMs enables scalable oversight by non-expert human evaluators.
- Language models optimized for "judge approval" become more truthful in the course of debate. In other words, debating with more persuasive LLMs leads to more truthful answers.
- Debate with language models leads to higher judge accuracy than consultancy, in which a single model argues for one assigned answer (see the simplified sketch after this list).
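For readers unfamiliar with these protocols, the sketch below illustrates the difference under simplified assumptions: in debate, two models argue for opposing answers before a judge, while in consultancy a single model argues only for the answer it was assigned. The function names and stub callables are hypothetical and are not drawn from the researchers' codebase.

```python
# Simplified illustration of the two oversight protocols compared in the study.
# These are assumptions for exposition, not the paper's implementation.

def run_debate(question, answer_a, answer_b, debater, judge, rounds=3):
    """Two copies of `debater` argue for opposing answers; the judge then picks one."""
    transcript = []
    for _ in range(rounds):
        # Each round, one debater defends answer_a and the other defends answer_b.
        transcript.append(debater(question, defend=answer_a, transcript=transcript))
        transcript.append(debater(question, defend=answer_b, transcript=transcript))
    # The (human) judge sees only the question, candidate answers, and transcript.
    return judge(question, [answer_a, answer_b], transcript)

def run_consultancy(question, answer_a, answer_b, consultant, judge, assigned, rounds=3):
    """A single consultant argues only for the answer it was assigned."""
    transcript = []
    for _ in range(rounds):
        transcript.append(consultant(question, defend=assigned, transcript=transcript))
    return judge(question, [answer_a, answer_b], transcript)

# Toy usage with stubs standing in for real LLM debaters and a human judge.
stub_debater = lambda question, defend, transcript: f"Argument for {defend}"
stub_judge = lambda question, answers, transcript: answers[0]
print(run_debate("Which answer does the passage support?", "A", "B", stub_debater, stub_judge))
```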
Powerful insights like these allow MATS, Anthropic, and UCL researchers to better understand where they need to focus their efforts next. Furthermore, these results pave the way for future research on adversarial oversight methods and protocols that enable non-experts to elicit truth from sophisticated language models.
You can read MATS' research paper on the viability of aligning AI models with debate in the absence of ground truth here.