Blog

Technical deep-dives on AI verification, security audits, and trust infrastructure.

⚖️ Three AI Models Said "Block." Only the Process Got It Right.

A real-world case study from a live ERC-8183 co-evaluator exercise. Two independent evaluators, two on-chain settlements, and a verification pipeline that caught its own mistakes before they became permanent.

Case Study ERC-8183 RV AHM · June 4, 2026 · 10 min

📊 We Calibrated 10 AI Models with Rasch. Here's What Broke.

150 sentinel cases, 10 models, 1,500 evaluations. Rasch psychometric calibration reveals which AI judges to trust, and which broke under production load. SERV models outperform frontier models; Gemini and nano were flagged as misfits.

Research Rasch Calibration · May 2026
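
For readers unfamiliar with Rasch calibration, the standard dichotomous Rasch model is shown below as a reference point. Reading $\theta_n$ as the ability of model $n$ as a judge and $b_i$ as the difficulty of sentinel case $i$ is an assumption about how the post applies the model; the teaser itself does not spell out the mapping.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Under this model, a judge whose responses deviate systematically from the predicted probabilities shows poor fit, which is the usual sense in which a respondent is called a "misfit".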

📊 A 120-case PLV benchmark — and why reliability is the number that matters

SERV Reasoning vs. our prior production cascade across 120 plan-level verification cases. 107× performance per dollar, 0 false ALLOWs, 0 API failures. Why the reliability number matters more than the cost headline for banking and MRM — and why this changes what PLV can be deployed as.

Benchmark PLV SERV Reasoning Banking MRM · May 11, 2026 · 7 min

🏦 We Audited AI Chatbots on Banking Regulation — They Get the Risk Right but the Rules Wrong

Three chatbots. 15 questions a bank risk officer would ask. Every answer verified against the actual regulation. Identifying the risk correctly is not the same as giving a regulation-compliant answer.

Banking MRM Verification EU AI Act PLV · May 3, 2026 · 12 min

🏗️ Two Layers, One Stack: Why Agent Verification Has to Happen Twice

A pattern keeps showing up in the agent-verification literature: a single verification stage is not enough. You can verify what an agent plans to do before it acts. You can verify what an agent actually produced after it acts. These are different problems.

Standards Verification ERC-8004 ERC-8210 · May 2, 2026 · 8 min

🏥 The Missing Verification Layer in AI Prior Authorization

AI prior authorization has workflow, decision, and governance layers. What it still lacks is a verification layer. The next bottleneck is no longer throughput. It is defensibility.

Healthcare Prior Auth Verification · April 14, 2026 · 7 min

🧭 Why Step-Level Verification Breaks on Compressed Plans

Good agent plans often look incomplete when verifiers assume the same granularity as the reference trace. ThoughtProof v2 shows why segment-aware support is the structural fix.

ThoughtProof v2 Plan-Level Verification Architecture · April 18, 2026 · 7 min

🔬 The Reasoning Gap: Why Cryptographic Proofs Aren't Enough for Safe AI Agents

A new Oxford/NYU paper on ERC-8004 trust architectures identifies the gap between cryptographic integrity and decision alignment. Proofs guarantee integrity, not alignment. Here's the missing layer.

Research ERC-8004 Trust Architecture · March 21, 2026 · 6 min

🛡️ We Audit AI Agents. Can Someone Hack Our Verifiers?

We found a theoretical attack vector in our own multi-model verification pipeline. Then we shipped the fix — in the same session. Three layers of injection defense, inspired by Anthropic's Sectioning pattern.

Security Verification Prompt Injection · March 8, 2026 · 8 min

⛓️ ERC-8004 Needs a Verification Engine. Here's Ours.

Ethereum's Trustless Agents standard defines identity, reputation, and validation registries for AI agents. The Validation Registry is still a design space. Here's how epistemic verification fills it.

ERC-8004 Ethereum Trust Validation · March 7, 2026 · 10 min

🔥 Agents of Chaos: What 30 Researchers Found — and What They Missed

Harvard, MIT, and Stanford red-teamed AI agents for two weeks. They found 11 failure categories. We found the same patterns in our own audits — and we're building the fix.

Security Multi-Agent Red Teaming Trust · March 7, 2026 · 11 min

🇪🇺 EU AI Act Compliance with Epistemic Verification

Article-by-article mapping: how multi-model verification satisfies Art. 9, 13, 14, and 43 — with code examples and an honest assessment of limitations.

EU AI Act Compliance High-Risk AI · March 5, 2026 · 12 min

🛡️ The 4 Layers of AI Agent Security

Discovery → Orchestration → Verification → Trust. Most teams stop at Layer 0. The real risk lives in Layer 2.

Security Framework · March 3, 2026 · 8 min

🔍 From CVEs to Semgrep Rules: Building AI Agent Security Scanners

How we turned 20+ real security findings into automated detection rules — and what the false positive journey teaches about AI security tooling.

Security Semgrep Tooling · March 3, 2026 · 10 min

🔬 Auditing MCP Servers: A Methodology

How to systematically audit Model Context Protocol servers for prompt injection, authority bypass, and trust boundary violations.

MCP Security Methodology · March 2, 2026 · 9 min

📦 Epistemic Blocks: The Atomic Unit of AI Verification

Claim → Perspectives → Synthesis → Hash. Everything else builds on this: receipts, trust scores, cross-agent verification.

Architecture Core Concepts · February 28, 2026 · 7 min
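
The Claim → Perspectives → Synthesis → Hash sequence can be illustrated with a minimal sketch. Everything in it is hypothetical: the `EpistemicBlock` class, its field names, and the choice of SHA-256 over canonical JSON are assumptions made for illustration, not the pot-sdk API or the format described in the post.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EpistemicBlock:
    """Hypothetical container: one claim, several model perspectives, one synthesis."""

    claim: str
    perspectives: tuple[str, ...]  # one entry per model's assessment
    synthesis: str

    def digest(self) -> str:
        # Serialize to canonical JSON (sorted keys, fixed separators) so that
        # blocks with identical content always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


block = EpistemicBlock(
    claim="The transfer stays within the agent's spending limit.",
    perspectives=("model A: supported", "model B: not supported"),
    synthesis="Disputed; dissent from model B preserved.",
)
print(block.digest())
```

Hashing a canonical serialization means any change to the claim, a perspective, or the synthesis yields a different digest, which is the property that receipts or provenance trails built on top of a block would need.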

⚖️ Dissent Preservation Ratio: Why Disagreement Matters

When models disagree, most systems pick the majority. We preserve the dissent — because the minority opinion is often right.

Metrics Architecture · February 26, 2026 · 6 min

🔄 Orchestration vs. Verification: Why They're Different Problems

Perplexity, LangChain, and CrewAI solve orchestration (Layer 1). ThoughtProof solves verification (Layer 2). Here's why the distinction matters.

Architecture Positioning · February 25, 2026 · 5 min

🤖 Grok vs. PoT Pipeline: Single Model vs. Multi-Model

We asked Grok to verify its own output, then ran the same claim through the full pipeline. The results are instructive.

Benchmarks · February 24, 2026 · 5 min

🪞 Can Grok Audit Itself? A Self-Verification Experiment

Spoiler: No. But the failure mode is fascinating and reveals exactly why multi-model verification exists.

Experiments Benchmarks · February 23, 2026 · 6 min

🔗 AI Supply Chain Auditing with Epistemic Blocks

When AI agents call other agents, how do you verify the chain? Epistemic blocks create a verifiable provenance trail.

Architecture Trust · February 22, 2026 · 7 min

⚡ Verification Latency: How Fast Can Multi-Model Consensus Be?

Parallel execution, smart routing, and when to skip verification entirely. Latency numbers from production.

Performance Architecture · February 21, 2026 · 5 min

📦 pot-sdk v0.1: First Public Release

The first npm package for epistemic verification. What's in it, how to use it, and what's coming next.

Release SDK · February 20, 2026 · 4 min