Outcome Evolution
Outcome Evolution Independent Insights · Est. 2026

Defining Share of Algorithmic Trust (SAT)

The first reader of your evidence is no longer a human. It is a model that decides whether you appear in the answer at all. Share of Algorithmic Trust (SAT) measures whose authority the LLM trusts when your subject comes up. We tested in pharma because the stakes are highest. The implications reach much further.

Richard Armitage | May 2026 | Protocol v1.3 | First-Party Research ● Open Methodology
The Pilot Study at a Glance

96 prompts. 4 frontier models. Some uncomfortable findings.

I started with pharma because the stakes are highest. Six common, well-established generic drugs. Four frontier models. Ninety-six structured probes asking the questions a clinician, a patient, or their AI assistant might actually ask. What came back was consistent enough to be uncomfortable.

Clinical Probes Executed

96: 4 probes × 6 molecules × 4 architectures

LLM Architectures Audited

4: GPT-5.5 · Gemini 3.1 · Claude Opus 4.7 · Grok 4.2

Avg. Independent Share

45%: Guidelines & registries dominate the retrieval layer

Molecules in Failure Mode

5/6: Only Atorvastatin achieved healthy contestability
The Metric Defined

What SAT actually measures

Take any structured query about your brand, product, or evidence. Of all the sources an LLM surfaces in its answer, whose authority is it actually relying on? That is what SAT measures. Not whether the answer is correct. Whose version of the truth the model chose to trust.

Manufacturer-Owned Evidence

~25%

Content directly published by the manufacturer (e.g., brand.com, corporate site, official prescribing information).

Independent Sources

~45%

Peer-reviewed journals, medical societies, regulatory bodies, independent clinical guidelines.

Skeptical / Misattributed

~30%

Content that casts doubt on product efficacy/safety, or evidence attributed incorrectly by the LLM.
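
This page does not spell out how the three shares are computed. One plausible reading, consistent with the shares summing to roughly 100%, is a simple proportion of classified citations. The symbols below are assumptions introduced for illustration, not the author's published notation:

```latex
\mathrm{SAT}_c = \frac{n_c}{\sum_{k \in \{\mathrm{Mfr},\,\mathrm{Ind},\,\mathrm{Skp}\}} n_k},
\qquad c \in \{\mathrm{Mfr},\,\mathrm{Ind},\,\mathrm{Skp}\}
```

Here n_c is the number of recorded citations classified into category c. With hypothetical counts of 5 manufacturer-owned, 9 independent, and 6 skeptical or misattributed citations (20 in total), the shares work out to 25%, 45%, and 30%.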

The Goal State — Healthy Contestability

Your evidence is present, correctly attributed, and sitting alongside independent peer-reviewed sources in a balanced answer. The model finds you. It cites you accurately. And it doesn’t cite you so exclusively that an expert stops trusting the answer. Most brands are nowhere near this. Most don’t know where they stand.

Diagnostic Framework

The four failure modes

Three structural failure states and one cross-cutting symptom. Each has a different root cause, a different accountable function, and a different remediation path.

Buried Evidence

Key information is present on owned channels but is not surfaced by the LLM in response to relevant clinical queries.

Symptoms: Low Mfr-Owned SAT, high Independent or Skeptical share.
Why: Poor SEO, poor content structure for retrieval, technical debt.
Owner: Digital Marketing, Content Ops, IT.

Over-Saturated

LLM responses are dominated by manufacturer-owned content, often to the exclusion of balanced, independent evidence.

Symptoms: Unnaturally high Mfr-Owned SAT.
Why: Aggressive content strategy, lack of competitive/independent voices.
Owner: Brand Marketing, Medical Affairs.

Absent

Critical evidence or data points are simply missing from the manufacturer's digital footprint entirely.

Symptoms: Zero Mfr-Owned SAT for key probes.
Why: Content gaps, unaddressed scientific queries, embargoed data.
Owner: Medical Affairs, R&D Communications.

Misattributed Evidence

The LLM incorrectly attributes manufacturer-owned evidence to an independent source, or vice versa.

Symptoms: High Skeptical SAT, incorrect source attribution.
Why: Lack of clear branding, ambiguous citations, data poisoning.
Owner: Regulatory, Legal, Medical Affairs.
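
As a rough illustration of how the four modes might be told apart in practice, the sketch below treats Absent, Buried Evidence, and Over-Saturated as mutually exclusive structural states and reports misattribution as a separate flag, in line with the "cross-cutting symptom" framing. The cut-off values, the function name `diagnose`, and its parameters are all hypothetical. The source publishes no thresholds, so read this as a starting point under those assumptions, not as the protocol's rule set.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen only for illustration; the source publishes none.
LOW_MFR_SHARE = 20.0   # below this, owned evidence is treated as under-surfaced
HIGH_MFR_SHARE = 60.0  # above this, owned evidence is treated as crowding out others


@dataclass
class Diagnosis:
    structural_state: str  # "Absent", "Buried Evidence", "Over-Saturated" or "Healthy"
    misattributed: bool    # cross-cutting symptom, reported alongside the state


def diagnose(mfr_share, evidence_published, misattributed_citations=0):
    """Map audit observations to a failure mode using the assumed cut-offs above.

    mfr_share: Mfr-Owned share in percent.
    evidence_published: whether the key evidence exists on owned channels at all.
    misattributed_citations: count of citations credited to the wrong source type.
    """
    if not evidence_published:
        state = "Absent"            # nothing for the model to find
    elif mfr_share > HIGH_MFR_SHARE:
        state = "Over-Saturated"    # owned content dominates the answer
    elif mfr_share < LOW_MFR_SHARE:
        state = "Buried Evidence"   # published, but not surfaced
    else:
        state = "Healthy"
    return Diagnosis(state, misattributed_citations > 0)


print(diagnose(mfr_share=14.0, evidence_published=True))
print(diagnose(mfr_share=45.0, evidence_published=True, misattributed_citations=2))
```

Separating "is the evidence published at all" from "how often is it cited" matters because a low manufacturer share alone cannot distinguish Buried Evidence from Absent. That distinction comes from inspecting the owned channels, not from the model's answers.
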
The Standard Protocol

How to run a SAT audit

A web browser, a set of standardised probes, and thirty to sixty minutes per brand. The audit is designed to be repeatable, classifiable, and fast enough to run across a portfolio without a research team.

Step 01: Define Clinical Query & Molecule

Define the specific clinical query in the style a healthcare professional (HCP) would use (e.g., "efficacy of [molecule] in [condition]") and the target pharmaceutical molecule for the audit.

Step 02: Prepare LLM Probes (×4)

Formulate four distinct but related prompts, one per probe type (Authority Chain, Clinical Nuance, Contested Evidence, Basic Accuracy), to assess the breadth and depth of each LLM architecture's retrieval for the given clinical query.

Step 03: Execute Probes & Record Citations

Run each probe across all selected LLM architectures. For every generated answer, meticulously record all cited sources, noting their nature (Mfr-Owned, Independent, Skeptical).

Step 04: Classify & Quantify Share of Trust

Classify each recorded citation and quantify its "Share of Algorithmic Trust" based on the formula. Identify instances of misattribution or omitted critical evidence.

Step 05: Diagnose Failure Mode & Remediate

Based on the quantified SAT, diagnose the specific failure mode (Buried, Over-Saturated, Absent, Misattributed) and recommend targeted remediation strategies to improve information posture.
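
Steps 03 and 04 amount to keeping a citation log and tallying it by category. The sketch below shows one way to do that, assuming each category's share is simply its percentage of all recorded citations. The `Citation` fields, the `sat_shares` function, and the sample log entries are illustrative placeholders, not real audit data.

```python
from collections import Counter
from dataclasses import dataclass

CATEGORIES = ("Mfr-Owned", "Independent", "Skeptical")


@dataclass(frozen=True)
class Citation:
    """One source cited in one model answer (field names are illustrative)."""
    molecule: str
    model: str
    probe: str
    source: str
    category: str  # one of CATEGORIES, assigned by the auditor in Step 04


def sat_shares(citations):
    """Return each category's percentage share of all recorded citations."""
    counts = Counter(c.category for c in citations)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unclassified categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


# Hypothetical audit log: placeholder sources, not real findings.
log = [
    Citation("Atorvastatin", "Model A", "Authority Chain", "brand site", "Mfr-Owned"),
    Citation("Atorvastatin", "Model A", "Authority Chain", "journal article", "Independent"),
    Citation("Atorvastatin", "Model B", "Basic Accuracy", "society guideline", "Independent"),
    Citation("Atorvastatin", "Model B", "Contested Evidence", "critical blog", "Skeptical"),
]
shares = sat_shares(log)
print(shares)
```

Keeping the model and probe on every record means the same log can later be sliced per architecture or per probe type without re-running the audit.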

Authority Chain

"Provide all cited sources and their specific URLs for the efficacy of Atorvastatin in reducing cardiovascular risk in Type 2 Diabetes patients."

What you're looking for: A clear, explicit listing of sources (ideally with URLs) and whether they are manufacturer-owned (e.g., brand site, prescribing info) or independent (e.g., journal, medical society).

Clinical Nuance

"Summarize the latest clinical trial data for the use of Omeprazole in GERD, focusing on any nuanced patient populations or contraindications."

What you're looking for: Detailed understanding of specific conditions, patient groups, and safety warnings. Missing nuances or oversimplifications can indicate a buried or absent evidence failure mode.

Contested Evidence

"Discuss any known controversies or conflicting evidence regarding the long-term safety profile of Lisinopril for hypertension management."

What you're looking for: Balanced presentation of both supporting and challenging evidence. Avoidance or dismissal of controversies may indicate an over-saturated or absent evidence state.

Basic Accuracy

"What is the standard dosage for Metformin in adults with newly diagnosed Type 2 Diabetes, according to official guidelines?"

What you're looking for: Direct and accurate factual recall from authoritative sources. Inaccuracies or vague responses suggest a fundamental gap or misattribution.
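
For a portfolio audit, the run list can be generated from the three dimensions instead of being tracked by hand. The sketch below enumerates every molecule, probe type, and model combination from the pilot. The names come from this page, while the dictionary layout is an assumption made for illustration.

```python
from itertools import product

MOLECULES = ["Omeprazole", "Lisinopril", "Atorvastatin",
             "Amoxicillin", "Metformin", "Ibuprofen"]
PROBE_TYPES = ["Authority Chain", "Clinical Nuance",
               "Contested Evidence", "Basic Accuracy"]
MODELS = ["GPT-5.5", "Gemini 3.1", "Claude Opus 4.7", "Grok 4.2"]

# One run per (molecule, probe type, model) combination.
runs = [
    {"molecule": molecule, "probe_type": probe_type, "model": model}
    for molecule, probe_type, model in product(MOLECULES, PROBE_TYPES, MODELS)
]

# Runs needed to audit a single brand across all probes and models.
per_brand = [r for r in runs if r["molecule"] == "Metformin"]

print(len(runs))       # 96
print(len(per_brand))  # 16
```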

2026 Generics Audit Results

Benchmark results across six molecules

Pilot study executed May 2026. Six of the most commonly prescribed generic molecules in the world. Four frontier models. Ninety-six structured clinical probes. The results were consistent enough to be a finding in themselves. Independent sources dominate the retrieval layer, manufacturer evidence is systematically underrepresented, and in five out of six cases the brand is operating in a failure mode it almost certainly doesn’t know it’s in.

Molecule     | Mfr-Owned | Independent | Skeptical | Failure Mode
Omeprazole   | 14%       | 38%         | 48%       | Buried Evidence
Lisinopril   | 22%       | 52%         | 26%       | Misattributed
Atorvastatin | 45%       | 45%         | 10%       | ✓ Healthy
Amoxicillin  | 8%        | 82%         | 10%       | Absent
Metformin    | 12%       | 33%         | 55%       | Buried Evidence
Ibuprofen    | 65%       | 20%         | 15%       | Over-Saturated
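
The headline figures can be checked directly against the per-molecule results. The short script below uses only the numbers reported in the table. The variable names are illustrative.

```python
from statistics import mean

# (Mfr-Owned %, Independent %, Skeptical %, failure mode) as reported in the table.
RESULTS = {
    "Omeprazole":   (14, 38, 48, "Buried Evidence"),
    "Lisinopril":   (22, 52, 26, "Misattributed"),
    "Atorvastatin": (45, 45, 10, "Healthy"),
    "Amoxicillin":  (8, 82, 10, "Absent"),
    "Metformin":    (12, 33, 55, "Buried Evidence"),
    "Ibuprofen":    (65, 20, 15, "Over-Saturated"),
}

# Each molecule's three shares should account for all of its citations.
rows_sum_to_100 = all(m + i + s == 100 for m, i, s, _ in RESULTS.values())

avg_mfr = mean(r[0] for r in RESULTS.values())
avg_independent = mean(r[1] for r in RESULTS.values())
avg_skeptical = mean(r[2] for r in RESULTS.values())
in_failure = sum(1 for r in RESULTS.values() if r[3] != "Healthy")

print(rows_sum_to_100)                # True
print(avg_independent)                # 45
print(round(avg_mfr, 1))              # 27.7
print(round(avg_skeptical, 1))        # 27.3
print(f"{in_failure}/{len(RESULTS)}") # 5/6
```

The 45% average independent share and the 5/6 failure count both follow from the table. If the approximate 25% and 30% figures quoted in the metric definition are also portfolio averages, the table puts them closer to 27.7% and 27.3%.
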
What to do about it

Three things every evidence-based business needs to do now

The findings are specific to pharma. The implications are not. Any business whose credibility depends on being cited in a clinical answer, a technical recommendation, or a procurement decision is potentially exposed to the same structural problem. The retrieval layer doesn’t care how good your evidence is if it can’t find it, read it, or trust who it came from.

  1. Address your Data Silos

    If your best evidence lives behind a login, a PDF, or a gated portal, it does not exist in the model’s answer. Content built for human consumption in closed systems is invisible to the retrieval layer. For pharma, this means working within medical, legal, and regulatory (MLR) compliance to liberate approved clinical data into open, structured, machine-readable formats. For everyone else, the constraint is usually organisational inertia. The fix is the same either way.

  2. Publish for the Model, then the Human

    Clarity, structured data, and unambiguous attribution are no longer just good content practice. They are the conditions for being cited at all. Your clinical evidence, technical claims, and authoritative content need to be semantically rich enough for an LLM to parse, trust, and reproduce accurately.

  3. Audit and Deprecate Legacy Web Estates

    Stale pages, superseded claims, and legacy branding from acquisitions and rebrands are actively working against you. The model doesn’t know your corporate history. It just finds the page. Run a SAT audit, identify what’s poisoning your information posture, and remove it.
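
One concrete form that "structured data and unambiguous attribution" could take is schema.org markup embedded in an evidence page. The sketch below builds a JSON-LD block in Python. Choosing schema.org is an assumption made for illustration, every value is a placeholder, and nothing on this page establishes whether any given model actually uses such markup.

```python
import json

# All values are placeholders; the schema.org vocabulary is an assumed choice.
evidence_page = {
    "@context": "https://schema.org",
    "@type": "MedicalScholarlyArticle",
    "headline": "Example: summary of approved clinical data for [molecule]",
    "datePublished": "2026-05-01",
    # Stating author and publisher explicitly makes ownership unambiguous.
    "author": {"@type": "Organization", "name": "Example Pharma Ltd"},
    "publisher": {"@type": "Organization", "name": "Example Pharma Ltd"},
    # Pointing at the underlying independent evidence supports attribution.
    "citation": ["Example: peer-reviewed trial report the summary is based on"],
}

json_ld = json.dumps(evidence_page, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

The printed block would sit in the page's HTML alongside the human-readable content, so that authorship and provenance are declared in a machine-readable form and not left to be inferred from layout.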

A note on truth

SAT does not measure whether the model’s answer is clinically correct. It measures whose authority the model relied on to construct it. Those are different questions. And in evidence-based fields, the gap between them is where the risk lives.

For future iterations of this study, or if you are thinking of running it yourself, I recommend establishing a clinical baseline using a purpose-built medical LLM — OpenEvidence or a comparable tool trained specifically on peer-reviewed literature — as the reference standard. Not to define truth, but to give the retrieval layer something to be measured against. A frontier model citing the right evidence for the wrong reasons is still a problem. A medical LLM gives you a more defensible benchmark for what good looks like.