Technical deep-dives on AI verification, security audits, and trust infrastructure.
A real-world case study from a live ERC-8183 co-evaluator exercise. Two independent evaluators, two on-chain settlements, and a verification pipeline that caught its own mistakes before they became permanent.
150 sentinel cases, 10 models, 1,500 evaluations. Rasch psychometric calibration reveals which AI judges to trust, and which broke under production load. SERV models outperform frontier models; Gemini and nano flagged as misfits.
SERV Reasoning vs. our prior production cascade across 120 plan-level verification (PLV) cases. 107× performance per dollar, 0 false ALLOWs, 0 API failures. Why the reliability number matters more than the cost headline for banking and model risk management (MRM), and why this changes what PLV can be deployed as.
Three chatbots. 15 questions a bank risk officer would ask. Every answer verified against the actual regulation. Correct risk identification ≠ regulatorily compliant answer.
A pattern keeps showing up in the agent-verification literature: a single verification stage is not enough. You can verify what an agent plans to do before it acts. You can verify what an agent actually produced after it acts. These are different problems.
AI prior authorization has workflow, decision, and governance layers. What it still lacks is a verification layer. The next bottleneck is no longer throughput. It is defensibility.
Good agent plans often look incomplete when verifiers assume the same granularity as the reference trace. ThoughtProof v2 shows why segment-aware support is the structural fix.
A new Oxford/NYU paper on ERC-8004 trust architectures identifies the gap between cryptographic integrity and decision alignment. Proofs guarantee integrity, not alignment. Here's the missing layer.
We found a theoretical attack vector in our own multi-model verification pipeline. Then we shipped the fix — in the same session. Three layers of injection defense, inspired by Anthropic's Sectioning pattern.
Ethereum's Trustless Agents standard defines identity, reputation, and validation registries for AI agents. The Validation Registry is still a design space. Here's how epistemic verification fills it.
Harvard, MIT, and Stanford red-teamed AI agents for two weeks. They found 11 failure categories. We found the same patterns in our own audits — and we're building the fix.
Article-by-article mapping: how multi-model verification satisfies Art. 9, 13, 14, and 43 — with code examples and an honest assessment of limitations.
Discovery → Orchestration → Verification → Trust. Most teams stop at Layer 0. The real risk lives in Layer 2.
How we turned 20+ real security findings into automated detection rules — and what the false positive journey teaches about AI security tooling.
How to systematically audit Model Context Protocol servers for prompt injection, authority bypass, and trust boundary violations.
Claim → Perspectives → Synthesis → Hash. Everything else builds on this: receipts, trust scores, cross-agent verification.
When models disagree, most systems pick the majority. We preserve the dissent — because the minority opinion is often right.
Perplexity, LangChain, and CrewAI solve orchestration (Layer 1). ThoughtProof solves verification (Layer 2). Here's why the distinction matters.
We asked Grok to verify its own output, then ran the same claim through the full pipeline. The results are instructive.
Spoiler: No. But the failure mode is fascinating and reveals exactly why multi-model verification exists.
When AI agents call other agents, how do you verify the chain? Epistemic blocks create a verifiable provenance trail.
Parallel execution, smart routing, and when to skip verification entirely. Latency numbers from production.
The first npm package for epistemic verification. What's in it, how to use it, and what's coming next.