Who summarizes the summarizer?

Introducing DPR — measuring what the synthesizer does with objections

February 24, 2026 · ThoughtProof · pot-sdk v0.1.4

Everyone in multi-agent AI focuses on two things: which model generates, and which model criticizes. Nobody talks about the synthesizer — the model that reads all the outputs and produces the final answer.

That's a problem. Because in a 3-rotation deep analysis we ran this week with identical inputs, rotating only the synthesizer produced a 47-point confidence spread: from 45% to 92%, same question, same generators, same critic.

SAME INPUT, ROTATED SYNTHESIZER
Run 1 (Anthropic synthesizes)  92%
Run 2 (xAI synthesizes)        68%
Run 3 (Moonshot synthesizes)   45%
Δ = 47 points. MDI = 0.75. Same question throughout.

The synthesizer doesn't just combine outputs — it decides which arguments survive. A critic can raise five objections. The synthesizer can quietly discard four of them and produce a confident-sounding summary. To any downstream consumer, that looks like consensus. It isn't.

We call this false consensus. And we built two metrics to detect it.

SAS and DPR: auditing the synthesis layer

SAS — Synthesis Audit Score

Measures how evenly the synthesis covers all generator proposals. A bias flag fires when SAS falls below 0.5 and a single generator dominates more than 60% of the coverage. Dominance isn't the problem. Undocumented dominance is.
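The bias-flag rule can be sketched in a few lines. Note that the `SasResult` shape below is illustrative, not the actual pot-sdk type:

```typescript
// Sketch of the SAS bias-flag rule: low coverage evenness AND one
// dominant generator. Field names here are assumptions for the sketch.
interface SasResult {
  score: number;         // evenness of coverage across generators, 0.0-1.0
  dominantShare: number; // largest single generator's share of coverage
}

function sasBiasFlag(sas: SasResult): boolean {
  // Both conditions must hold: SAS < 0.5 and >60% dominance.
  return sas.score < 0.5 && sas.dominantShare > 0.6;
}
```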

DPR — Dissent Preservation Rate

Measures what fraction of critic objections actually appeared in the synthesis. A high DPR means the synthesis engaged with the objections. A low DPR means it didn't — regardless of what the confidence score says.

The formula is simple:

DPR = objections preserved in synthesis / total objections raised
false_consensus = DPR < 0.4 AND SAS warned AND ≥2 objections detected

The false_consensus flag only fires when all three conditions are true together — which prevents false positives when the critic simply agrees with the generators.
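The formula and the three-way guard can be sketched directly. The function signatures below are illustrative, not the actual pot-sdk internals:

```typescript
// Sketch of the DPR formula and false_consensus rule described above.
function computeDprScore(preserved: number, total: number): number {
  // No objections raised means there was no dissent to lose,
  // so DPR is 1.0 by definition.
  if (total === 0) return 1.0;
  return preserved / total;
}

function falseConsensus(
  preserved: number,
  total: number,
  sasWarned: boolean
): boolean {
  // All three conditions must hold together:
  // DPR < 0.4, a SAS warning, and at least 2 objections detected.
  return computeDprScore(preserved, total) < 0.4 && sasWarned && total >= 2;
}
```

The `total >= 2` guard is what prevents a single, possibly incidental objection from triggering the flag on its own.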

What the benchmarks show

We ran DPR against 8 adversarial test cases in pot-benchmarks v2.0.0, covering scenarios from complete objection suppression to perfect preservation:

False consensus detected: 2/8 cases

Both flagged cases: synthesizer ignored all critic objections despite a SAS warning. DPR = 0.0 in both. The synthesis read as confident and coherent — that was the tell.

DPR = 1.0 when critic agreed

When no objections were raised (critic affirmed the generators), DPR correctly returns 1.0 — no dissent to preserve means no dissent was lost.

DPR also handles markdown bullet-point critiques natively: a critic that writes "- This claim is unsupported" is treated the same as one that raises the objection in prose. This matters because most critic outputs in practice use list formatting.
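To make the bullet handling concrete, here is a hypothetical sketch of objection extraction from a critique. pot-sdk's actual parser may differ; this only illustrates why stripping list markers matters:

```typescript
// Hypothetical objection extractor: strips markdown list markers so
// bulleted and prose objections are counted the same way.
function extractObjections(critique: string): string[] {
  return critique
    .split("\n")
    .map((line) => line.trim())
    // Strip "-", "*", and "1."-style list markers before counting.
    .map((line) => line.replace(/^(?:[-*]|\d+\.)\s+/, ""))
    .filter((line) => line.length > 0);
}
```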

The uncomfortable implication

Most multi-agent pipelines assume that running multiple models improves reliability. That's true for the generation and critique layers. But if a single model synthesizes all of that, you've introduced a single point of epistemic failure at exactly the step that produces your output.

Put differently: if you're running multi-agent pipelines without auditing the synthesizer, you might be doing single-agent reasoning with extra API costs.

The 47-point spread isn't a bug. It's the synthesizer's prior, expressed as a confidence score. SAS and DPR don't eliminate that — they make it visible.

Bias profiles we observed

Across three deep runs, the synthesizer rotation revealed consistent patterns: the Anthropic synthesis was the most confident (92%), the xAI synthesis sat in the middle (68%), and the Moonshot synthesis was the most conservative (45%).

None of these is wrong. All three are useful — but only when the synthesizer documents which perspective it weighted and why. That documentation is what DPR measures.

Using it

DPR and SAS ship in pot-sdk v0.1.4:

npm install pot-sdk

import { computeDPR } from 'pot-sdk';

const result = computeDPR(critiqueText, synthesisText, sasWarning);
// result.score           → 0.0–1.0
// result.false_consensus → boolean
// result.total_objections
// result.preserved

In pot-cli, DPR runs automatically on every ask and deep command and is stored in block.metadata.dpr. The CLI displays 🟢 / 🟡 / 🔴 based on score thresholds.
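A traffic-light mapping like the CLI's can be sketched as follows. The 0.7 and 0.4 cutoffs below are assumptions for the sketch, not pot-cli's documented thresholds:

```typescript
// Illustrative traffic-light display for a DPR score.
// Thresholds (0.7, 0.4) are assumed, not taken from pot-cli.
function dprLight(score: number): "🟢" | "🟡" | "🔴" {
  if (score >= 0.7) return "🟢"; // most objections preserved
  if (score >= 0.4) return "🟡"; // partial preservation
  return "🔴";                   // dissent largely dropped
}
```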

The goal of multi-model verification isn't to produce a louder consensus. It's to surface where genuine disagreement exists — and then be honest about how that disagreement was resolved. SAS and DPR are the instrumentation for that honesty.

A synthesis that documents "I weighted the conservative estimate because two of three critics flagged the aggressive projection as speculative" is more trustworthy than one that returns 87% confidence without explaining why the 45% estimate was discarded.

Try it

npm install -g pot-cli
pot deep "Your question" --runs 3 --lang en

pot-sdk · pot-benchmarks · npm · Protocol Specification