PhaseFolio Validation Study

Back-Test Results: RA Drug Cohort

Sixteen historical rheumatoid arthritis (RA) drugs were evaluated with PhaseFolio's rNPV engine, using indication-specific transition rates computed from 679 curated clinical trials. The pairwise AUC of 0.625 passes the 0.60 target, and the phase-controlled AUC of 0.65 (target 0.55) indicates that the discriminative signal holds within decision phase.

2026-05-29 · 16 drugs · 679 enriched trials · 10,000 MC iterations per drug

Pairwise AUC

0.625

40/64 pairs · target ≥0.60

PASS

Phase-Controlled AUC

0.65

target ≥0.55

PASS

Risk Flag Sensitivity

87.5%

7/8 failures flagged · target ≥70%

PASS

Best-Threshold Accuracy

62.5%

at ≥30% PoS · 66.7% precision

BEST CUT

Key finding: Indication-specific transition rates computed from 679 curated RA clinical trials let the model rank eventual successes above failures: pairwise AUC of 0.625 (passes the 0.60 target) and phase-controlled AUC of 0.65. A central driver is correcting the NDA/BLA success rate from the 91% "given-an-NDA-was-filed" benchmark to a computed rate of ~42% drawn from actual FDA approval outcomes, which captures the full attrition an investor faces at the decision point.

Data Foundation

How We Built the Dataset

Raw ClinicalTrials.gov data lacks the drug-level structure needed for transition rate computation. We built a 9-phase enrichment pipeline to transform 1,304 raw RA trials into 679 curated records with CMO-grade intelligence.

Ingest Raw CT.gov Data

192,411 interventional studies ingested via the ClinicalTrials.gov API. Linked condition mappings (420K rows) and intervention data (424K rows) are stored in Supabase.

Filter for Rheumatoid Arthritis

1,304 unique RA trials identified by condition-text matching, covering Phase 1 through Phase 4 and spanning the 1990s to the present.

Cross-Reference 4 Data Sources

Each trial was enriched by an AI agent that cross-references four sources: ClinicalTrials.gov (structured fields), FDA Drugs@FDA (regulatory data and approval dates), PubMed (published efficacy), and web search (press releases, analyst reports). A confidence score is computed per trial.

Drug-Class Knowledge Mapping

Pharmacology domain knowledge applied per drug class to populate drug_class, mechanism_of_action, molecular_target, modality, route, and dosing. Batched by class: Anti-TNF first (~180 trials), then JAK (~120), IL-6, Anti-CD20, and so on. 32 drug classes identified and consolidated.

Outcome & Efficacy Extraction

Published pivotal trial results mapped: ACR20/50/70 response rates, p-values, comparator results. Terminated trials mapped via CT.gov’s why_stopped field. Strict anti-hallucination rules: only include numbers with high confidence, always cite study name and timepoint.

Verification & Bias Checks

Random-sample spot checks, drug-class distribution sanity checks, and FDA date cross-referencing. Completion rates in the raw (1,304) and enriched (679) datasets agree within 0.5pp at every phase, which argues against survivorship bias from enrichment.

679

Enriched Trials

71

Distinct Drugs

32

Drug Classes

45

Columns Per Trial

Outcome Summary Coverage: 100%

Drug Class / MoA / Target: 99.9%

FDA Regulatory Linkage: 73%

Quantitative Efficacy Data: 55%

Data integrity verified: We compared completion-to-termination ratios between the raw CT.gov data (1,304 RA trials) and the enriched dataset (679 trials). The rates agree within 0.5pp at every phase, indicating that the enrichment process did not selectively retain successful trials. The 625 excluded trials were dropped because they lacked drug-level metadata (non-drug interventions, unmappable entries), not because of their outcomes.
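A minimal sketch of how such a per-phase comparison could be expressed. The counts below are hypothetical placeholders, not the study's data, and the function names are illustrative; only the 0.5pp tolerance comes from the text above.

```python
# Hypothetical per-phase counts as (completed, terminated). NOT the study's data.
RAW = {"Phase 1": (180, 20), "Phase 2": (340, 60), "Phase 3": (410, 90)}
ENRICHED = {"Phase 1": (90, 10), "Phase 2": (171, 30), "Phase 3": (204, 45)}


def completion_rate(completed: int, terminated: int) -> float:
    """Share of trials that completed rather than terminated."""
    return completed / (completed + terminated)


def max_gap_pp(raw: dict, enriched: dict) -> float:
    """Largest per-phase difference in completion rate, in percentage points."""
    return max(
        abs(completion_rate(*raw[phase]) - completion_rate(*enriched[phase])) * 100
        for phase in raw
    )


def passes_bias_check(raw: dict, enriched: dict, tolerance_pp: float = 0.5) -> bool:
    """True when every phase agrees within the tolerance (0.5pp in the text)."""
    return max_gap_pp(raw, enriched) <= tolerance_pp
```

If enrichment had preferentially kept completed trials, the enriched completion rate would drift above the raw rate at one or more phases and the check would fail.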

Methodology

How the Back-Test Works

Each drug is evaluated using only information available before its real-world decision point. No future data leaks into the model.

Curate 679 RA Trials

Deep Dive agent enriches raw CT.gov data with FDA, PubMed, and web sources. 71 distinct drugs, 45 structured columns.

Compute Drug-Level Transition Rates

Time-gated: only data before decision date. Drug-level counting (not trial-level). 3-tier fallback: drug-class (n≥5) → RA-overall → BIO/QLS benchmark.
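The time gate, drug-level counting, and 3-tier fallback described in this step could look roughly like the sketch below. The record layout, the field and function names, and the choice to apply the same n≥5 threshold to the RA-overall tier are all assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

MIN_DRUGS = 5  # the n>=5 threshold named in the text


@dataclass(frozen=True)
class DrugPhaseRecord:  # hypothetical record layout
    drug: str
    drug_class: str
    phase: str
    outcome_date: date  # when the phase outcome became known
    advanced: bool      # True if the drug moved on to the next stage


def _rate(records: Iterable[DrugPhaseRecord], phase: str, cutoff: date,
          drug_class: Optional[str] = None) -> Optional[float]:
    """Drug-level rate: each drug counts once per phase, however many trials it ran."""
    by_drug: Dict[str, bool] = {}
    for r in records:
        if r.phase != phase or r.outcome_date >= cutoff:
            continue  # time gate: ignore anything not known before the decision date
        if drug_class is not None and r.drug_class != drug_class:
            continue
        by_drug[r.drug] = by_drug.get(r.drug, False) or r.advanced
    if len(by_drug) < MIN_DRUGS:
        return None  # too few drugs to trust this tier
    return sum(by_drug.values()) / len(by_drug)


def transition_rate(records, phase: str, drug_class: str, cutoff: date,
                    benchmark: Dict[str, float]) -> Tuple[float, str]:
    """Return (rate, tier used): drug-class, then RA-overall, then static benchmark."""
    records = list(records)
    class_rate = _rate(records, phase, cutoff, drug_class)
    if class_rate is not None:
        return class_rate, "drug-class"
    overall_rate = _rate(records, phase, cutoff)
    if overall_rate is not None:
        return overall_rate, "RA-overall"
    return benchmark[phase], "benchmark"
```

Returning the tier alongside the rate makes it easy to audit how often a given back-test actually relied on the static benchmark.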

Reconstruct Decision Point

For each drug, identify what was known at its go/no-go moment. Costs, competitive landscape, target validation history.

Apply Multipliers

Target validation (0-2+ prior class approvals), competitive density, risk flags. All via logistic adjustment to keep PoS bounded.
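The source does not spell out the exact form of the logistic adjustment. One common reading, sketched here as an assumption, is to apply each multiplier to the odds (equivalently, add its logarithm in log-odds space) and map the result back to a probability, so the adjusted PoS can never leave the (0, 1) range. The function names are illustrative.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def adjust_pos(base_pos: float, multipliers: Sequence[float]) -> float:
    """Scale the odds by each multiplier (assumed form), keeping PoS bounded."""
    z = logit(base_pos) + sum(math.log(m) for m in multipliers)
    return sigmoid(z)


# A naive product 0.90 * 1.5 = 1.35 would exceed 1; the odds-based form cannot.
boosted = adjust_pos(0.90, [1.5])   # odds 9 -> 13.5, i.e. about 0.93
# A 0.60 multiplier (as in the zero-prior-approvals case) lowers PoS smoothly.
penalised = adjust_pos(0.40, [0.60])
```

Under this form a multiplier of 1.0 leaves the base rate unchanged, and stacked multipliers combine by simple addition in log-odds space.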

Run rNPV Engine + Monte Carlo

10,000 iterations per drug with Bernoulli stage gates. Same production engine used by PhaseFolio customers.
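As a rough illustration of Bernoulli stage gates, the sketch below draws one pass/fail outcome per stage in each iteration and stops at the first failure. It is not the production engine: it omits discounting and timing, and the stage probabilities, costs, and payoff used in the example are hypothetical.

```python
import random
from typing import Sequence, Tuple


def simulate_stage_gates(stage_pos: Sequence[float], stage_costs: Sequence[float],
                         payoff: float, iterations: int = 10_000,
                         seed: int = 0) -> Tuple[float, float]:
    """Return (mean value, share of iterations with a negative value)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    values = []
    for _ in range(iterations):
        value = 0.0
        for p, cost in zip(stage_pos, stage_costs):
            value -= cost               # pay to run the stage
            if rng.random() >= p:       # Bernoulli gate: the stage fails
                break
        else:
            value += payoff             # every gate passed
        values.append(value)
    mean_value = sum(values) / iterations
    p_negative = sum(v < 0 for v in values) / iterations
    return mean_value, p_negative


# Hypothetical inputs: three stages, costs and payoff in arbitrary units.
mean_value, p_negative = simulate_stage_gates(
    stage_pos=[0.6, 0.5, 0.4], stage_costs=[10, 30, 100], payoff=1000
)
```

When the payoff exceeds the total cost, the share of negative outcomes converges on one minus the cumulative PoS, which matches the relationship between the 7.9% PoS and 92.1% P(Negative) reported for Atacicept.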

Score Against Actual Outcomes

Pairwise AUC, phase-controlled AUC, go/no-go threshold sweep, risk flag sensitivity.

Results

Predicted Cumulative PoS by Drug

The model's predicted cumulative probability of success (PoS) for each drug, sorted in descending order within each outcome group. All values were computed prospectively (no hindsight).

Approved (8 drugs)

Drug	Brand	Mechanism	Predicted PoS
Adalimumab	Humira	Anti-TNF	57.8%
Etanercept	Enbrel	Anti-TNF	44.4%
Rituximab	Rituxan	Anti-CD20	36.3%
Sarilumab	Kevzara	IL-6	31.6%
Abatacept	Orencia	T-cell	25.7%
Tofacitinib	Xeljanz	JAK	25.0%
Baricitinib	Olumiant	JAK	24.4%
Upadacitinib	Rinvoq	JAK	13.4%

Failed (8 drugs)

Drug	Mechanism	Predicted PoS
Ocrelizumab	Anti-CD20	39.5%
Filgotinib	JAK	39.3%
Peficitinib	JAK	27.3%
Fostamatinib	SYK	26.6%
Tabalumab	Anti-BAFF	25.1%
Decernotinib	JAK	13.7%
Vobarilizumab	IL-6	11.7%
Atacicept	BAFF/APRIL	7.9%

Mean PoS (approved): 32.3% · Mean PoS (failed): 23.9% · Separation: +8.4pp (target 10pp — below threshold at n=16)
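The headline metrics can be recomputed directly from the per-drug values listed above. The sketch below does so; the function names are illustrative, and treating a tie as half a win is a standard convention assumed here (no ties occur in this cohort).

```python
# Predicted cumulative PoS (%) as listed in this report.
APPROVED = {"Adalimumab": 57.8, "Etanercept": 44.4, "Rituximab": 36.3,
            "Sarilumab": 31.6, "Abatacept": 25.7, "Tofacitinib": 25.0,
            "Baricitinib": 24.4, "Upadacitinib": 13.4}
FAILED = {"Ocrelizumab": 39.5, "Filgotinib": 39.3, "Peficitinib": 27.3,
          "Fostamatinib": 26.6, "Tabalumab": 25.1, "Decernotinib": 13.7,
          "Vobarilizumab": 11.7, "Atacicept": 7.9}


def pairwise_auc(pos, neg):
    """Share of (approved, failed) pairs where the approved drug scores higher."""
    wins = sum(1.0 if a > f else 0.5 if a == f else 0.0 for a in pos for f in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(pos, neg, cut):
    """Accuracy and precision of a 'go if PoS >= cut' rule."""
    tp = sum(a >= cut for a in pos)   # approved drugs called 'go'
    fp = sum(f >= cut for f in neg)   # failed drugs called 'go'
    tn = len(neg) - fp                # failed drugs called 'no-go'
    accuracy = (tp + tn) / (len(pos) + len(neg))
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, precision


approved, failed = list(APPROVED.values()), list(FAILED.values())
auc = pairwise_auc(approved, failed)                         # 40 of 64 pairs
accuracy, precision = threshold_metrics(approved, failed, 30.0)
separation_pp = sum(approved) / len(approved) - sum(failed) / len(failed)
```

Run against the listed values, this reproduces the reported pairwise AUC of 0.625, the 62.5% accuracy and 66.7% precision at the ≥30% cut, and the +8.4pp separation. The phase-controlled AUC and risk-flag sensitivity need per-drug phase and flag data that this section does not list, so they are not recomputed here.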

Cohort

16-Drug RA Back-Test Cohort

Drug	Brand	Mechanism	Outcome
Adalimumab	Humira	Anti-TNF	Approved
Etanercept	Enbrel	Anti-TNF	Approved
Tofacitinib	Xeljanz	JAK	Approved
Upadacitinib	Rinvoq	JAK	Approved
Baricitinib	Olumiant	JAK	Approved
Abatacept	Orencia	T-cell	Approved
Sarilumab	Kevzara	IL-6	Approved
Rituximab	Rituxan	Anti-CD20	Approved
Tabalumab	—	Anti-BAFF	Failed
Fostamatinib	—	SYK	Failed
Filgotinib	—	JAK	Failed
Peficitinib	—	JAK	Failed
Atacicept	—	BAFF/APRIL	Failed
Ocrelizumab	—	Anti-CD20	Failed
Decernotinib	—	JAK	Failed
Vobarilizumab	—	IL-6	Failed

Case Studies

Deep Dives

Strongest No-Go Signal

Atacicept

BAFF/APRIL inhibitor · Merck Serono · Decision: January 2008

PhaseFolio assigned the lowest cumulative PoS in the cohort (7.9%) with three risk flags: FIRST_IN_CLASS_RISK, NOVEL_MODALITY, and LIMITED_TRIAL_DATA. The target validation multiplier applied 0.60x for zero prior approvals. Monte Carlo showed 92.1% probability of negative outcome.

Actual outcome: Phase 2 terminated due to severe immunoglobulin reduction and fatal infections.

7.9%

Predicted PoS

$6M

rNPV

92.1%

P(Negative)

Methodology

Computed Transition Rates

NDA/BLA transition rate correction · 679 enriched trials

Static BIO/QLS benchmarks assign 91% NDA/BLA success — but that measures "given an NDA was filed, did it succeed?" Indication-specific rates computed from the enriched-trials corpus (used when n≥5 drugs at a phase, with BIO/QLS 2021 immunology benchmarks as fallback) instead answer the real investment question: "given a drug reached Phase 3, did it ultimately get FDA approval?" The computed rate of ~42% captures the full attrition the static benchmark hides.

This reframing of the NDA/BLA question is a central source of discriminative signal — pairwise AUC 0.625, phase-controlled AUC 0.65.
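The difference between the two figures is one of conditioning. As a back-of-envelope illustration only (it combines a static benchmark with a rate computed on a different population, and assumes approval always requires a filing), the two rates can be related by the chain rule:

```latex
\begin{align*}
P(\text{approval} \mid \text{NDA/BLA filed}) &\approx 0.91 && \text{(static BIO/QLS benchmark)} \\
P(\text{approval} \mid \text{reached Phase 3}) &\approx 0.42 && \text{(computed from the enriched trials)} \\
P(\text{approval} \mid \text{reached Phase 3}) &= P(\text{filed} \mid \text{reached Phase 3}) \cdot P(\text{approval} \mid \text{filed}) \\
\Rightarrow\; P(\text{filed} \mid \text{reached Phase 3}) &\approx 0.42 / 0.91 \approx 0.46
\end{align*}
```

On this reading, somewhat more than half of the drugs entering Phase 3 would never reach a filing, and that is attrition a filed-only benchmark cannot see.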

91%

BIO/QLS NDA

~42%

Computed NDA

0.625

Pairwise AUC

Limitations

This validation uses 16 drugs — sufficient for proof of concept, but not statistically powered for calibration. Discrimination passes: pairwise AUC 0.625 (target 0.60) and phase-controlled AUC 0.65 (target 0.55) confirm the model ranks eventual successes above failures beyond structural phase bias. Calibration and separation are weak at this sample size: the separation gap is +8.4pp (below the 10pp target) and the false-confidence rate at the 25% PoS cut is 50% (above the 20% target). Cross-indication validation (oncology) and larger cohorts are the planned next steps. The computed indication-specific transition rates described here are a research approach; current production uses static BIO/QLS 2021 base rates. See the full research report for detailed methodology.