PhaseFolio Validation Study

Back-Test Results: RA Drug Cohort

Sixteen historical rheumatoid arthritis (RA) drugs were evaluated against PhaseFolio's rNPV engine using indication-specific transition rates computed from 679 curated clinical trials. A pairwise AUC of 0.625 passes the 0.60 target, and a phase-controlled AUC of 0.65 confirms that the discriminative signal holds within the same decision phase.

2026-05-29 · 16 drugs · 679 enriched trials · 10,000 MC iterations per drug
Pairwise AUC: 0.625 (40/64 pairs · target ≥0.60) · PASS
Phase-Controlled AUC: 0.65 (target ≥0.55) · PASS
Risk Flag Sensitivity: 87.5% (7/8 failures flagged · target ≥70%) · PASS
Best-Threshold Accuracy: 62.5% (at ≥30% PoS · 66.7% precision) · BEST CUT

Key finding: Indication-specific transition rates computed from 679 curated RA clinical trials let the model rank eventual successes above failures: pairwise AUC of 0.625 (passes the 0.60 target) and phase-controlled AUC of 0.65. A central driver is correcting the NDA/BLA success rate from the 91% "given-an-NDA-was-filed" benchmark to a computed rate of ~42% drawn from actual FDA approval outcomes, which captures the full attrition an investor faces at the decision point.

How We Built the Dataset

Raw ClinicalTrials.gov data lacks the drug-level structure needed for transition rate computation. We built a 9-phase enrichment pipeline to transform 1,304 raw RA trials into 679 curated records with CMO-grade intelligence.

1. Ingest Raw CT.gov Data: 192,411 interventional studies ingested via the ClinicalTrials.gov API. Linked condition mappings (420K rows) and intervention data (424K rows) stored in Supabase.
2. Filter for Rheumatoid Arthritis: 1,304 unique RA trials identified by condition-text matching across Phase 1 through Phase 4, spanning the 1990s to the present.
3. Cross-Reference 4 Data Sources: Each trial enriched by an AI agent cross-referencing ClinicalTrials.gov (structured fields), FDA Drugs@FDA (regulatory data and approval dates), PubMed (published efficacy), and web search (press releases, analyst reports). A confidence score is computed per trial.
4. Drug-Class Knowledge Mapping: Pharmacology domain knowledge applied per drug class to fill drug_class, mechanism_of_action, molecular_target, modality, route, and dosing. Batched by class: Anti-TNF first (~180 trials), then JAK (~120), IL-6, Anti-CD20, etc. 32 drug classes identified and consolidated.
5. Outcome & Efficacy Extraction: Published pivotal trial results mapped (ACR20/50/70 response rates, p-values, comparator results). Terminated trials mapped via CT.gov's why_stopped field. Strict anti-hallucination rules: include only numbers held with high confidence, and always cite the study name and timepoint.
6. Verification & Bias Checks: Random-sample spot checks, drug-class distribution sanity checks, FDA date cross-referencing. Completion rates in the raw (1,304) and enriched (679) datasets agree within 0.5pp at every phase, so no survivorship bias was detected.
679 Enriched Trials · 71 Distinct Drugs · 32 Drug Classes · 45 Columns Per Trial
Outcome Summary Coverage: 100%
Drug Class / MoA / Target: 99.9%
FDA Regulatory Linkage: 73%
Quantitative Efficacy Data: 55%

Data integrity verified: We compared completion-to-termination ratios between the raw CT.gov data (1,304 RA trials) and the enriched dataset (679 trials). The rates agree within 0.5pp at every phase, indicating that enrichment did not selectively retain successful trials. The 625 excluded trials were dropped because they lacked drug-level metadata (non-drug interventions, unmappable entries), not because of their outcomes.
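The integrity check can be sketched as a per-phase comparison. This is a minimal illustration, assuming each trial is reduced to a (phase, status) pair and that "completion rate" means completed / (completed + terminated); the status strings, function names, and toy data are hypothetical, not PhaseFolio's code.

```python
from collections import Counter

def completion_rates(trials):
    """Map each phase to completed / (completed + terminated), ignoring other statuses."""
    completed, ended = Counter(), Counter()
    for phase, status in trials:
        if status in ("COMPLETED", "TERMINATED"):
            ended[phase] += 1
            if status == "COMPLETED":
                completed[phase] += 1
    return {phase: completed[phase] / ended[phase] for phase in ended}

def max_gap_pp(raw, enriched):
    """Largest per-phase difference in completion rate, in percentage points."""
    r, e = completion_rates(raw), completion_rates(enriched)
    return max(abs(r[p] - e[p]) * 100 for p in r.keys() & e.keys())

# Hypothetical toy data: both datasets complete 80% of their Phase 3 trials.
raw = [("PHASE3", "COMPLETED")] * 8 + [("PHASE3", "TERMINATED")] * 2
enriched = [("PHASE3", "COMPLETED")] * 4 + [("PHASE3", "TERMINATED")] * 1
print(max_gap_pp(raw, enriched) <= 0.5)  # True: the subset kept the same completion rate
```

A gap above the 0.5pp tolerance at any phase would suggest the enrichment step retained trials selectively.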

How the Back-Test Works

Each drug is evaluated using only information available before its real-world decision point. No future data leaks into the model.

1. Curate 679 RA Trials: Deep Dive agent enriches raw CT.gov data with FDA, PubMed, and web sources. 71 distinct drugs, 45 structured columns.
2. Compute Drug-Level Transition Rates: Time-gated, using only data from before the decision date. Drug-level counting (not trial-level). 3-tier fallback: drug-class (n≥5) → RA-overall → BIO/QLS benchmark.
3. Reconstruct Decision Point: For each drug, identify what was known at its go/no-go moment: costs, competitive landscape, target validation history.
4. Apply Multipliers: Target validation (0, 1, or 2+ prior class approvals), competitive density, risk flags. All applied via logistic adjustment to keep PoS bounded.
5. Run rNPV Engine + Monte Carlo: 10,000 iterations per drug with Bernoulli stage gates. Same production engine used by PhaseFolio customers.
6. Score Against Actual Outcomes: Pairwise AUC, phase-controlled AUC, go/no-go threshold sweep, risk flag sensitivity.
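The Bernoulli stage gates in step 5 can be illustrated with a small Monte Carlo. This is a minimal sketch, not the production engine: it assumes one stage per year, each stage's cost paid at its start, a single launch value realized only if every gate passes, and a flat 10% discount rate. All stage probabilities, costs, and function names are hypothetical.

```python
import random

def simulate_path(stages, launch_value, rate, rng):
    """One Monte Carlo path: pay each stage's cost, then draw its Bernoulli gate."""
    npv = 0.0
    for year, (p_success, cost) in enumerate(stages):
        npv -= cost / (1 + rate) ** year      # cost paid at the start of the stage
        if rng.random() >= p_success:         # gate failed: program stops, costs are sunk
            return npv
    # Every gate passed: add launch value, discounted to the year after the last stage
    return npv + launch_value / (1 + rate) ** len(stages)

def run_monte_carlo(stages, launch_value, rate=0.10, iterations=10_000, seed=0):
    """Return (mean NPV across paths, i.e. the simulated rNPV, and P(NPV < 0))."""
    rng = random.Random(seed)
    paths = [simulate_path(stages, launch_value, rate, rng) for _ in range(iterations)]
    mean_npv = sum(paths) / iterations
    p_negative = sum(v < 0 for v in paths) / iterations
    return mean_npv, p_negative

# Hypothetical program: three one-year stages as (probability of passing, cost in $M).
stages = [(0.5, 10.0), (0.6, 30.0), (0.8, 5.0)]
mean_npv, p_negative = run_monte_carlo(stages, launch_value=500.0)
print(f"rNPV ≈ ${mean_npv:.0f}M, P(negative) ≈ {p_negative:.1%}")
```

Here the cumulative PoS is 0.5 × 0.6 × 0.8 = 24%, and because only a fully successful path is profitable, P(negative) lands near 76%, roughly 1 − PoS. The Atacicept figures (7.9% PoS, 92.1% P(negative)) are consistent with the same relationship.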

Predicted Cumulative PoS by Drug

Values are the model's predicted cumulative probability of success (PoS) for each drug, sorted within each group. All values were computed using only information available at each drug's decision point (no hindsight).

Approved (8 drugs)
Adalimumab (Humira) · Anti-TNF: 57.8%
Etanercept (Enbrel) · Anti-TNF: 44.4%
Rituximab (Rituxan) · Anti-CD20: 36.3%
Sarilumab (Kevzara) · IL-6: 31.6%
Abatacept (Orencia) · T-cell: 25.7%
Tofacitinib (Xeljanz) · JAK: 25.0%
Baricitinib (Olumiant) · JAK: 24.4%
Upadacitinib (Rinvoq) · JAK: 13.4%
Failed (8 drugs)
Ocrelizumab · Anti-CD20: 39.5%
Filgotinib · JAK: 39.3%
Peficitinib · JAK: 27.3%
Fostamatinib · SYK: 26.6%
Tabalumab · Anti-BAFF: 25.1%
Decernotinib · JAK: 13.7%
Vobarilizumab · IL-6: 11.7%
Atacicept · BAFF/APRIL: 7.9%
Mean PoS (approved): 32.3% · Mean PoS (failed): 23.9% · Separation: +8.4pp (target ≥10pp; below threshold at n=16)
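The headline metrics can be recomputed directly from the per-drug values above. This sketch assumes pairwise AUC is the share of approved-vs-failed pairs in which the approved drug has the higher PoS (the usual Mann-Whitney form, ties counted as half) and that "≥30% PoS" marks a go decision. The phase-controlled AUC is not reproduced because its stratification is not detailed here.

```python
# Predicted cumulative PoS (%) per drug, as listed above.
approved = {"Adalimumab": 57.8, "Etanercept": 44.4, "Rituximab": 36.3, "Sarilumab": 31.6,
            "Abatacept": 25.7, "Tofacitinib": 25.0, "Baricitinib": 24.4, "Upadacitinib": 13.4}
failed = {"Ocrelizumab": 39.5, "Filgotinib": 39.3, "Peficitinib": 27.3, "Fostamatinib": 26.6,
          "Tabalumab": 25.1, "Decernotinib": 13.7, "Vobarilizumab": 11.7, "Atacicept": 7.9}

def pairwise_auc(pos, neg):
    """Share of (approved, failed) pairs ranked correctly; ties count as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(pos, neg, cut):
    """Accuracy and precision when 'go' means predicted PoS >= cut."""
    tp = sum(p >= cut for p in pos)   # approved drugs called go
    fp = sum(n >= cut for n in neg)   # failed drugs called go
    accuracy = (tp + len(neg) - fp) / (len(pos) + len(neg))
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, precision

pos, neg = list(approved.values()), list(failed.values())
auc = pairwise_auc(pos, neg)
accuracy, precision = threshold_metrics(pos, neg, 30.0)
separation = sum(pos) / len(pos) - sum(neg) / len(neg)
print(f"AUC {auc:.3f} · accuracy {accuracy:.1%} · precision {precision:.1%} · separation {separation:+.1f}pp")
# AUC 0.625 · accuracy 62.5% · precision 66.7% · separation +8.4pp
```

The 40 correctly ordered pairs out of 64, the 10/16 correct calls at the 30% cut, and the 4/6 precision all match the summary figures.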

16-Drug RA Back-Test Cohort

Drug | Brand | Mechanism | Outcome
Adalimumab | Humira | Anti-TNF | Approved
Etanercept | Enbrel | Anti-TNF | Approved
Tofacitinib | Xeljanz | JAK | Approved
Upadacitinib | Rinvoq | JAK | Approved
Baricitinib | Olumiant | JAK | Approved
Abatacept | Orencia | T-cell | Approved
Sarilumab | Kevzara | IL-6 | Approved
Rituximab | Rituxan | Anti-CD20 | Approved
Tabalumab | – | Anti-BAFF | Failed
Fostamatinib | – | SYK | Failed
Filgotinib | – | JAK | Failed
Peficitinib | – | JAK | Failed
Atacicept | – | BAFF/APRIL | Failed
Ocrelizumab | – | Anti-CD20 | Failed
Decernotinib | – | JAK | Failed
Vobarilizumab | – | IL-6 | Failed

Deep Dives

Strongest No-Go Signal

Atacicept

BAFF/APRIL inhibitor · Merck Serono · Decision: January 2008
PhaseFolio assigned the lowest cumulative PoS in the cohort (7.9%), with three risk flags: FIRST_IN_CLASS_RISK, NOVEL_MODALITY, and LIMITED_TRIAL_DATA. A 0.60x target validation multiplier was applied for zero prior class approvals. Monte Carlo simulation showed a 92.1% probability of a negative outcome.
Actual outcome: Phase 2 terminated due to severe immunoglobulin reduction and fatal infections.
Predicted PoS: 7.9% · rNPV: $6M · P(Negative): 92.1%
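The back-test says multipliers such as the 0.60x target-validation penalty are applied "via logistic adjustment to keep PoS bounded" but does not give the formula. One common reading, assumed here and possibly different from PhaseFolio's, scales the odds of success by each multiplier, which is the same as adding log(m) on the logit scale. Function names and inputs are illustrative only.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def adjust_pos(base_pos, multipliers):
    """Scale the odds of success by each multiplier (add log(m) on the logit scale).

    Unlike multiplying the probability directly, the result always stays inside (0, 1).
    """
    if not 0 < base_pos < 1:
        raise ValueError("base_pos must be strictly between 0 and 1")
    return sigmoid(logit(base_pos) + sum(math.log(m) for m in multipliers))

print(round(adjust_pos(0.50, [0.60]), 3))  # 0.375: a 0.60x penalty applied to even odds
print(round(adjust_pos(0.90, [3.0]), 3))   # 0.964: a 3x boost stays below 1 (0.90 * 3 would be 2.7)
```

Stacking several multipliers simply adds their logs, so the order in which target validation, competitive density, and risk flags are applied does not matter under this reading.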
Methodology

Computed Transition Rates

NDA/BLA transition rate correction · 679 enriched trials
Static BIO/QLS benchmarks assign 91% NDA/BLA success, but that figure answers "given an NDA was filed, did it succeed?" Indication-specific rates computed from the enriched-trials corpus (used when n≥5 drugs are available at a phase, with BIO/QLS 2021 immunology benchmarks as the fallback) instead answer the real investment question: "given a drug reached Phase 3, did it ultimately get FDA approval?" The computed rate of ~42% captures the full attrition that the static benchmark hides.
This reframing of the NDA/BLA question is a central source of discriminative signal — pairwise AUC 0.625, phase-controlled AUC 0.65.
BIO/QLS NDA: 91% · Computed NDA: ~42% · Pairwise AUC: 0.625
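The reframed rate, combined with the time gate, drug-level counting, and 3-tier fallback described in the back-test steps, can be sketched as follows. Assumptions: each record is one Phase 3 outcome for a drug, "ultimately approved" is collapsed to the drug level, and the n≥5 threshold applies to both the drug-class and RA-overall tiers. The record fields, function names, toy data, and the 0.5 benchmark placeholder are hypothetical; the actual BIO/QLS fallback value would be supplied instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MIN_DRUGS = 5  # a tier is used only when it covers at least this many distinct drugs

@dataclass(frozen=True)
class Phase3Outcome:
    drug: str
    drug_class: str
    resolved_on: Optional[date]  # date approval or failure became known; None if pending
    approved: bool

def _drug_level_rate(records):
    """Count each drug once: a drug counts as approved if any of its records is an approval."""
    by_drug = {}
    for r in records:
        by_drug[r.drug] = by_drug.get(r.drug, False) or r.approved
    n = len(by_drug)
    rate = sum(by_drug.values()) / n if n else None
    return rate, n

def phase3_to_approval(records, drug_class, decision_date, benchmark):
    """Return (rate, source) for 'reached Phase 3 -> FDA approval' with a 3-tier fallback."""
    # Time gate: only outcomes already known before the decision date are visible
    known = [r for r in records if r.resolved_on is not None and r.resolved_on < decision_date]
    tiers = [("drug-class", [r for r in known if r.drug_class == drug_class]),
             ("RA-overall", known)]
    for source, subset in tiers:
        rate, n = _drug_level_rate(subset)
        if n >= MIN_DRUGS:
            return rate, source
    return benchmark, "benchmark"

# Hypothetical toy records (not real drugs or dates)
records = [
    Phase3Outcome("drug-a", "JAK", date(2010, 1, 1), True),
    Phase3Outcome("drug-a", "JAK", date(2011, 1, 1), False),  # same drug again: counted once
    Phase3Outcome("drug-b", "JAK", date(2012, 1, 1), False),
    Phase3Outcome("drug-c", "JAK", date(2012, 6, 1), False),
    Phase3Outcome("drug-d", "JAK", date(2013, 1, 1), True),
    Phase3Outcome("drug-e", "JAK", date(2013, 6, 1), False),
    Phase3Outcome("drug-f", "JAK", date(2020, 1, 1), True),   # resolved after the decision: hidden
    Phase3Outcome("drug-g", "IL-6", date(2011, 1, 1), True),
]
print(phase3_to_approval(records, "JAK", date(2015, 1, 1), benchmark=0.5))  # (0.4, 'drug-class')
```

With a 2011 decision date, only two drugs are visible, so both corpus tiers fall below n=5 and the function returns the benchmark instead.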

This validation uses 16 drugs — sufficient for proof of concept, but not statistically powered for calibration. Discrimination passes: pairwise AUC 0.625 (target 0.60) and phase-controlled AUC 0.65 (target 0.55) confirm the model ranks eventual successes above failures beyond structural phase bias. Calibration and separation are weak at this sample size: the separation gap is +8.4pp (below the 10pp target) and the false-confidence rate at the 25% PoS cut is 50% (above the 20% target). Cross-indication validation (oncology) and larger cohorts are the planned next steps. The computed indication-specific transition rates described here are a research approach; current production uses static BIO/QLS 2021 base rates. See the full research report for detailed methodology.