Perspectives from the frontier
Updates, insights and stories from the people building Pareto.

LLM Metacognition: Shared and Shallow?
Across 19 frontier models, metacognitive confidence on question-and-answer tasks tracks a shared difficulty heuristic, with only a weak relationship to actual performance.

The bar exam was not designed for this
AI models pass the bar. Credentials weren't built for that, and the methodology to fix them is already being built in post-training.

Confidence needs calibration
Frontier labs spend billions on reasoning and accuracy. Almost nobody trains models to know when to say, "I'm not sure."

You can't prompt your way to safety
AI models are giving medical and mental health advice to millions of people. Can you prevent harmful advice by adding safety instructions to the prompt? The UK's AI Safety Institute (AISI) recently tested this.

90% of human expertise is not verifiable
RLVR's verification crisis exposes a fundamental gap in how AI measures expert judgment across professional domains.

Advancing AI alignment through human-judged LLM debates
Learn how Pareto helped MATS obtain high-quality data for their research.

A Community-Driven Vision for a New Knowledge Resource for AI
Insights from 50+ researchers at an AAAI workshop toward an open engineering framework for knowledge modules in AI.

Annotation fatigue: Why human data quality declines over time
Learn how prolonged annotation tasks lead to fatigue, reduced data quality, and slower output, and discover research-backed strategies Pareto AI uses to keep annotators engaged.

The micro-decisions made by AI trainers that define data quality
Discover how micro-decisions by AI trainers shape data quality, safety, and alignment in LLMs.

The false dichotomy of "synthetic data vs. human data"
We provide actionable strategies for how AI companies can effectively combine synthetic and human data to enhance model performance.