
Probability of Success Calibration

You are viewing a frozen historical snapshot

This is methodology@2026-05-28 as it was on 2026-05-28: the immutable record under which any signed export stamped methodology@2026-05-28 was computed. It is intentionally the citation prose, not the current presentation, and it will never change. The methodology has since advanced (current is methodology@2026-07-28-v2).

PhaseFolio derives stage-transition probabilities from observed clinical outcomes rather than expert opinion, following Thomas et al. (2021) and Wong et al. (2019). The benchmark matrix is three-dimensional (11 indications × 8 modalities × 3 biomarker strategies = 264 cells); evidence-based multipliers are applied through a log-odds (logit) transformation to keep results bounded and to reflect diminishing returns at high baselines. A multiplier may score the engine only if a held-out cohort containing both approvals and failures can validate it; signals that cannot be validated are demoted to non-scored risk flags. Engine 2.6.0 (shipped 2026-05-28) adds a drug-specific clinical signal layer with two signals. Biomarker quality is scored for oncology solid tumor programs at Phase II/III. Phase 1 objective response rate is extracted and surfaced as a flag only, because Phase 0 cohort validation found that it double-counts biomarker quality at 50% cohort coverage.

1. Three-dimensional benchmark matrix

Rather than applying a single set of industry-average transition rates, PhaseFolio stratifies PoS by three independent classification axes that are known to materially affect clinical outcomes: therapeutic area, drug modality, and biomarker strategy.

Therapeutic Area. 11 indications from oncology (solid and hematologic) through cardiovascular, neurology, metabolic, and rare disease. Oncology solid tumor has the lowest overall LoA (~2.5%); rare disease the highest (~9.4%). Source: Thomas et al. 2021.
Drug Modality. 8 modalities including small molecule, monoclonal antibody, bispecific, ADC, cell therapy, gene therapy, and peptide. Modality affects safety profiles and regulatory pathways. Source: Citeline 2024.
Biomarker Strategy. 3 levels: none, enrichment (biomarker-selected population), and companion diagnostic (required for prescribing). Biomarker use raises Phase II and III success rates by up to 4×. Source: Parker et al. 2015.

The full matrix contains 11 × 8 × 3 = 264 unique indication–modality–biomarker combinations, each specifying five stage-transition probabilities (Preclinical → Phase I → Phase II → Phase III → NDA/BLA → Approval). Values are derived from meta-analysis of 12,728+ clinical-stage transitions (BIO/QLS/Informa 2021) with modality-specific adjustments from Citeline 2024 pipeline data.

Table 1 — Baseline stage-transition probabilities by therapeutic area (small molecule, no biomarker strategy; Overall LoA is the product of all five transition rates):

Oncology (Solid): Preclinical 5.0%, Phase I 40.0%, Phase II 24.0%, Phase III 55.0%, NDA/BLA 90.0%, Overall LoA 2.4%
Oncology (Hematologic): Preclinical 7.0%, Phase I 72.0%, Phase II 42.0%, Phase III 63.0%, NDA/BLA 90.0%, Overall LoA 12.0%
Rare Disease: Preclinical 8.0%, Phase I 56.0%, Phase II 38.0%, Phase III 64.0%, NDA/BLA 93.0%, Overall LoA 9.4%
Neurology: Preclinical 4.0%, Phase I 46.0%, Phase II 20.0%, Phase III 47.0%, NDA/BLA 88.0%, Overall LoA 1.5%
Immunology: Preclinical 6.0%, Phase I 49.0%, Phase II 30.0%, Phase III 58.0%, NDA/BLA 91.0%, Overall LoA 4.6%
Infectious Disease: Preclinical 7.0%, Phase I 52.0%, Phase II 36.0%, Phase III 62.0%, NDA/BLA 92.0%, Overall LoA 7.4%
Cardiovascular: Preclinical 5.0%, Phase I 48.0%, Phase II 28.0%, Phase III 55.0%, NDA/BLA 90.0%, Overall LoA 3.3%
Metabolic: Preclinical 6.0%, Phase I 50.0%, Phase II 32.0%, Phase III 58.0%, NDA/BLA 91.0%, Overall LoA 5.1%
Respiratory: Preclinical 5.0%, Phase I 47.0%, Phase II 26.0%, Phase III 54.0%, NDA/BLA 90.0%, Overall LoA 2.9%
Dermatology: Preclinical 6.0%, Phase I 50.0%, Phase II 34.0%, Phase III 60.0%, NDA/BLA 91.0%, Overall LoA 5.6%
Ophthalmology: Preclinical 6.0%, Phase I 50.0%, Phase II 30.0%, Phase III 58.0%, NDA/BLA 91.0%, Overall LoA 4.7%

Source: BIO/QLS/Informa 2021; Wong et al. 2019.

2. Multiplier adjustments

Several evidence-based factors are known to shift clinical success probabilities relative to the population base rate. PhaseFolio applies these multipliers via a log-odds (logit) transformation — the mathematically correct method when a multiplier is a true odds ratio. The cited sources, however, report effect sizes in different forms: relative success ratios (Minikel 2024), phase success-rate comparisons (Parker 2015), relative approval rates (Mullard 2016), and a cumulative pipeline advantage (Tufts NEWDIGS 2023).

The engine currently routes all of these through the OR-style logit path as a deliberately conservative approximation: it under-credits favorable modifiers at higher baselines, never saturates to 1.0, and avoids stacked-modifier overshoot. Per-modifier estimand declarations (_source_estimand and _applied_as) are recorded in the machine-readable source and explained in the model card. Each multiplier is applied only to the clinical phases where the underlying evidence was measured.

Table 2 — Evidence-based multipliers (source estimand separated from applied path; favorable multipliers >1 increase PoS; unfavorable <1 decrease):

Genetic Validation: 2.6× (RR estimand, OR-applied), Phase II/III, Minikel et al. 2024 Nature
Companion Diagnostic: 2.0× (RR, OR-applied), Phase II/III, Parker et al. 2015 ASCO
Orphan Designation: 1.5× (RR, OR-applied), Phase II/III, Mullard 2016 Nat. Rev. Drug Disc.
Biomarker Enrichment: 1.5× (RR, OR-applied), Phase II/III, Parker et al. 2015; BIO 2021
First-in-Class: 0.85× (RR, OR-applied), Phase II/III, BIO/QLS 2021
CAR-T / TCR Therapy: 1.73× per stage (RR cumulative 3×, OR-applied), Phase I/II, Tufts NEWDIGS 2023
Gene Therapy (Orphan): 1.41× per stage (RR cumulative 2×, OR-applied), Phase I/II, Tufts NEWDIGS 2023
Biomarker Quality — Genomic Validated (oncology solid only): 1.35× (RR, OR-applied), Phase II/III, Schwaederle et al. 2016 — see Section 6
Biomarker Quality — Protein Only (oncology solid only): 0.85× (RR, OR-applied), Phase II/III, Schwaederle et al. 2016 — see Section 6

CAR-T and gene-therapy-orphan per-stage values are sqrt(cumulative) splits of the source's whole-pipeline advantage. Biomarker Quality is the drug-specific multiplier added in engine 2.6.0 (2026-05-28) and applies only when the engine is given a per-drug biomarker classification; see Section 6 for the bucket definitions and the cohort validation.
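The square-root split can be checked numerically. The sketch below assumes the cumulative advantage is spread evenly (geometrically) over the two adjusted stages, Phase I and Phase II, so that the two per-stage factors multiply back to the cumulative figure. The function name `per_stage_factor` is illustrative and is not taken from the engine.

```python
def per_stage_factor(cumulative: float, n_stages: int = 2) -> float:
    """Geometric split: the factor that, applied n_stages times, equals `cumulative`."""
    if cumulative <= 0 or n_stages < 1:
        raise ValueError("cumulative must be > 0 and n_stages >= 1")
    return cumulative ** (1.0 / n_stages)

# Cumulative advantages quoted in Table 2 (Tufts NEWDIGS 2023).
car_t = per_stage_factor(3.0)        # CAR-T / TCR therapy
gene_orphan = per_stage_factor(2.0)  # gene therapy (orphan)

print(f"{car_t:.2f} {gene_orphan:.2f}")  # 1.73 1.41
```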

A critical design decision: preclinical PoS is never adjusted by any multiplier. Preclinical attrition is dominated by toxicology, pharmacokinetics, and formulation failures (Sun et al. 2025) — factors orthogonal to the clinical efficacy signals that these multipliers capture. Similarly, NDA/BLA approval rates reflect regulatory filing quality rather than drug-specific clinical attributes, and are therefore held constant.
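A minimal sketch of this phase scoping, using the rare disease row of Table 1 and the genetic validation factor of Table 2. The function names and the dictionary layout are assumptions made for illustration, not the engine's actual data model; the adjustment itself is the logit method described in Section 3.

```python
UNADJUSTED_STAGES = {"Preclinical", "NDA/BLA"}  # held constant per the design decision above

def apply_odds_ratio(pos: float, factor: float) -> float:
    """Apply a multiplier in odds space and map back to a probability."""
    odds = pos / (1.0 - pos)
    adjusted_odds = odds * factor
    return adjusted_odds / (1.0 + adjusted_odds)

def adjust_stages(base_rates, multipliers):
    """base_rates: stage -> PoS; multipliers: list of (factor, set of stages it applies to)."""
    adjusted = dict(base_rates)
    for factor, stages in multipliers:
        for stage in stages:
            if stage in UNADJUSTED_STAGES:
                continue  # skipped even if a multiplier is misconfigured to target it
            adjusted[stage] = apply_odds_ratio(adjusted[stage], factor)
    return adjusted

# Rare disease row of Table 1 (small molecule, no biomarker strategy).
rare_disease = {"Preclinical": 0.08, "Phase I": 0.56, "Phase II": 0.38,
                "Phase III": 0.64, "NDA/BLA": 0.93}
# Genetic validation from Table 2: 2.6x, measured at Phase II/III.
genetic_validation = (2.6, {"Phase II", "Phase III"})

adjusted = adjust_stages(rare_disease, [genetic_validation])
for stage, pos in adjusted.items():
    print(f"{stage}: {rare_disease[stage]:.1%} -> {pos:.1%}")
```

Only the Phase II and Phase III entries move; the other three stages come back unchanged.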

3. Logistic transformation method

Applying odds ratios to bounded probabilities requires care. Naive multiplication (PoS × OR, capped at 1.0) produces mathematically unsound results: a drug with 50% base PoS and a 2.6× genetic-validation multiplier would yield 130%, capped to 100% — falsely claiming certainty. With multiple favorable multipliers stacking, this problem cascades rapidly.

PhaseFolio instead applies multipliers in log-odds (logit) space — the standard biostatistical transformation for adjusting bounded probabilities by a multiplicative factor. The three-step transformation:

Equation 1 — Logistic odds-ratio adjustment. Step 1: odds = PoS_base / (1 − PoS_base). Step 2: odds_adj = odds × OR. Step 3: PoS_adj = odds_adj / (1 + odds_adj).

This approach has three desirable mathematical properties:

Bounded output. The result is always in (0, 1) — it can never reach 0% or 100%, regardless of how many multipliers are stacked.
Diminishing returns. A 2.6× OR applied to a 24% base PoS yields 45.1% (+21.1pp). Applied to a 70% base, it yields 85.8% (+15.8pp). The higher the base, the harder it is to push higher — matching clinical reality.
Composability. Multiple ORs applied sequentially produce the same result regardless of order, because multiplying odds (equivalently, adding log-odds) is commutative.
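The three properties can be checked with a few lines of Python. This is a sketch of Equation 1, not the engine's code; `adjust_pos` and `stack` are illustrative names, and the factors are taken from Table 2.

```python
from itertools import permutations

def adjust_pos(pos_base: float, odds_ratio: float) -> float:
    """Equation 1: probability -> odds, scale by the OR, odds -> probability."""
    odds = pos_base / (1.0 - pos_base)
    odds_adj = odds * odds_ratio
    return odds_adj / (1.0 + odds_adj)

def stack(pos_base, odds_ratios):
    """Apply several ORs one after another."""
    pos = pos_base
    for odds_ratio in odds_ratios:
        pos = adjust_pos(pos, odds_ratio)
    return pos

# Diminishing returns: the same 2.6x factor at a low and a high baseline.
low, high = adjust_pos(0.24, 2.6), adjust_pos(0.70, 2.6)
print(f"{low:.1%} (+{(low - 0.24) * 100:.1f}pp), {high:.1%} (+{(high - 0.70) * 100:.1f}pp)")

# Composability: every ordering of three Table 2 factors gives the same PoS.
results = [stack(0.24, order) for order in permutations([2.6, 1.5, 0.85])]
order_independent = all(abs(r - results[0]) < 1e-12 for r in results)

# Bounded output: ten stacked 2.6x factors still stay strictly below 1.
stacked = stack(0.70, [2.6] * 10)
print(order_independent, 0.0 < stacked < 1.0)
```

The bound is a mathematical property; in floating-point arithmetic an extreme enough stack can still round to exactly 1.0, so a production implementation may prefer to sum in log-odds space.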

4. Worked example: PoS derivation

Consider a rare disease small molecule with genetic validation and orphan designation. We derive the Phase II PoS step-by-step.

Base rate (from benchmark matrix): Phase II PoS = 38.0%.

Apply genetic validation (factor 2.6, applied as OR): odds = 0.38 / (1 − 0.38) = 0.6129; odds × 2.6 = 1.5935; PoS = 1.5935 / (1 + 1.5935) = 61.4%. Factor source: Minikel et al. 2024 (RR); applied as OR.

Apply orphan designation (factor 1.5, applied as OR): odds = 1.5935 (carried forward unrounded from the previous step); odds × 1.5 = 2.3903; PoS = 2.3903 / (1 + 2.3903) = 70.5%. Factor source: Mullard 2016 (RR); applied as OR.

Adjusted Phase II PoS: 70.5%.

Note: naive multiplication would yield min(1.0, 0.38 × 2.6 × 1.5) = 100%, which is clearly incorrect. The logistic method produces 70.5%, reflecting appropriate diminishing returns.
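The derivation can be replayed in code, carrying full floating-point precision between steps. The sketch below uses the additive form of the same method: take the logit of the base rate, add the log of each factor, and map back. The function names are illustrative, not engine identifiers.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def stack_in_log_odds(pos_base: float, factors) -> float:
    """Add log(factor) for each multiplier to the base log-odds, then map back."""
    return inv_logit(logit(pos_base) + sum(math.log(f) for f in factors))

base = 0.38           # rare disease Phase II base rate (Table 1)
factors = [2.6, 1.5]  # genetic validation, orphan designation (Table 2)

logit_pos = stack_in_log_odds(base, factors)
naive_pos = min(1.0, base * math.prod(factors))  # capped naive multiplication

print(f"logit: {logit_pos:.3f}, naive: {naive_pos:.3f}")
```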

5. Guarding against overfitting: the multiplier governance gate

Every multiplier in Table 2 adds a degree of freedom, and a stack of adjustable factors can manufacture the appearance of rigor while quietly encoding the author's priors — the central failure mode of any heuristic valuation model. PhaseFolio constrains this in three ways: two structural, one evidentiary.

Structural bounds (Sections 2–3). Multipliers are applied only in log-odds space, so stacked factors can never saturate to 0% or 100% and never overshoot; each factor is applied only to the clinical phases where its source measured the effect; and preclinical and NDA/BLA rates are never adjusted at all. These bounds cap how far the knobs can move any result, regardless of how many fire at once.

The evidentiary gate — a multiplier may only score the engine if a held-out cohort can validate it. A candidate factor earns the right to change a probability only when a backtest cohort containing both approvals and failures shows it discriminates between them: it must fire on known successes as well as known failures, so a skeptic can confirm it tracks outcomes rather than merely labelling the failures after the fact. A signal that fires only on the failures in a cohort, with no approved counterexample, cannot be validated by that cohort — it is demoted to a non-scored, display-only risk flag rather than allowed to move the number.
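The both-outcomes precondition is simple to state in code. The sketch below checks only that necessary condition (the signal fires on at least one approval and at least one failure); showing that the signal actually discriminates is a separate backtest step. The record layout, function names, and cohort rows are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CohortRow:
    drug: str
    approved: bool       # known outcome in the backtest cohort
    signal_fired: bool   # did the candidate signal fire for this drug?

def fires_on_both_outcomes(cohort) -> bool:
    fired_on_approval = any(r.signal_fired and r.approved for r in cohort)
    fired_on_failure = any(r.signal_fired and not r.approved for r in cohort)
    return fired_on_approval and fired_on_failure

def disposition(cohort) -> str:
    """A signal that this cohort cannot validate is demoted to a display-only flag."""
    return "eligible_for_validation" if fires_on_both_outcomes(cohort) else "flag_only"

# Hypothetical cohorts, for illustration only.
fires_only_on_failures = [
    CohortRow("drug-A", approved=True, signal_fired=False),
    CohortRow("drug-B", approved=False, signal_fired=True),
    CohortRow("drug-C", approved=False, signal_fired=True),
]
fires_on_both = [
    CohortRow("drug-D", approved=True, signal_fired=True),
    CohortRow("drug-E", approved=False, signal_fired=True),
    CohortRow("drug-F", approved=True, signal_fired=False),
]

print(disposition(fires_only_on_failures), disposition(fires_on_both))
```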

Worked proof. In the antimicrobial backtest, three candidate antibacterial multipliers were pre-registered on evidence dated before each drug's decision. A pre-publication ablation showed that two of them — a hepatotoxicity mechanism-class prior and a sustained-clinical-response endpoint-fragility prior — fired only on that cohort's failures, with no approved counterexample, so they were demoted to flags. Only the third, single-asset sponsor fragility (which fires on three approvals as well as the failures), was allowed to score. We publish the full ablation rather than the most flattering configuration: scoring only the validatable factor yields a pairwise AUC of 0.629, whereas the unvalidatable pair would have produced a higher but uncheckable 0.797. We report the lower, defensible number. See the backtest methodology and the full antimicrobial Sprint-1 forensics.

This gate governs every future candidate multiplier: no factor scores the engine until a cohort with both outcomes can independently confirm it. Until then it may inform a risk flag, but it does not move the valuation.

6. Drug-specific clinical signals — biomarker quality (scored)

Through engine 2.5.x, every multiplier in Table 2 keyed on attributes of the program (modality, biomarker strategy, orphan designation, indication-level genetic validation). Engine 2.6.0 (shipped 2026-05-28) introduces a second class of signal — drug-specific clinical attributes extracted from each asset's underlying evidence (pivotal-program publications, FDA labels, sponsor disclosures, registry records). The first such signal to clear the multiplier-governance gate of Section 5 is biomarker quality.

Definition. Biomarker quality refines the biomarker-strategy axis of Table 1 by asking what kind of biomarker a program is built on, not merely whether one is present. Three scoring buckets ship in engine 2.6.0:

genomic_validated — a sequenced, mechanism-anchored DNA or RNA alteration (EGFR exon 19 deletion, BRAF V600E, ALK fusion, MSI-H, and similar) used as the enrichment criterion. Multiplier: 1.35× (RR estimand, OR-applied).
protein_only — a protein-expression biomarker without an underlying genomic anchor (HER2 IHC alone, PD-L1 tumor-proportion score, serum-protein enrichment). Multiplier: 0.85× (RR, OR-applied).
unknown — biomarker quality not extracted or not applicable. Multiplier: 1.00× (no adjustment).

The signal applies only to oncology solid tumor programs at Phase II and Phase III. Preclinical, Phase I, and NDA/BLA are not adjusted (the Section 2 design principle). Hematologic oncology, immunology, neurology, and every other indication carry no biomarker-quality multiplier.
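A sketch of how the bucket lookup and its scope restriction might look. The multiplier values and the scope rule come from this section; the indication key `"oncology_solid"`, the phase labels, and the function names are assumptions for illustration.

```python
BIOMARKER_QUALITY = {
    "genomic_validated": 1.35,
    "protein_only": 0.85,
    "unknown": 1.00,
}
SCORED_PHASES = {"Phase II", "Phase III"}

def biomarker_quality_factor(indication: str, phase: str, bucket: str) -> float:
    """Return the multiplier, or 1.0 (no adjustment) outside the scored scope."""
    if indication != "oncology_solid" or phase not in SCORED_PHASES:
        return 1.0
    return BIOMARKER_QUALITY.get(bucket, 1.0)  # unrecognized buckets behave like "unknown"

def apply_odds_ratio(pos: float, factor: float) -> float:
    odds = pos / (1.0 - pos)
    return (odds * factor) / (1.0 + odds * factor)

# Oncology (Solid) Phase II base rate from Table 1.
base_phase2 = 0.24
for bucket in BIOMARKER_QUALITY:
    factor = biomarker_quality_factor("oncology_solid", "Phase II", bucket)
    print(f"{bucket}: {apply_odds_ratio(base_phase2, factor):.1%}")
```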

Source and estimand. The 1.35× / 0.85× values are reported as relative response/success ratios in Schwaederle et al. 2016 — a meta-analysis of phase II precision-medicine trials across 13,203 patients in 346 studies — and applied through the OR-style logit path of Sections 2 and 3, the same conservative dispatch used by every other multiplier in Table 2.

Cohort validation. A 43-drug oncology-solid-tumor cohort (50% of the 85-drug Phase 0 cohort universe, with approvals drawn from FDA approvals 2018–2024 and failures from public discontinuations) scored biomarker_quality against the engine's baseline. The signal cleared the multiplier-governance gate of Section 5:

Cohort N: 43 oncology solid tumor; bucket-level minimum N ≥ 6 with ≥ 3 sponsors per bucket (the antimicrobial Sprint-1 precedent).
Pairwise AUC: baseline 0.618 → biomarker_quality alone 0.670 (+5.2pp). Stable across the v2, v3, and v4 extractor rounds at 28, 35, and 43 drugs respectively.
The signal fires on both approvals and failures — the both-outcomes gate.
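This document does not spell out how pairwise AUC is computed. A common reading, assumed here, is the concordance statistic: the share of (approval, failure) pairs in which the approved drug received the higher score, with ties counted as one half. The scores below are hypothetical and are not the Phase 0 cohort data.

```python
def pairwise_auc(approved_scores, failed_scores) -> float:
    """Share of (approval, failure) pairs ranked correctly; ties count 0.5."""
    if not approved_scores or not failed_scores:
        # Mirrors the both-outcomes requirement: no pairs, no AUC.
        raise ValueError("need at least one approval and one failure")
    wins = 0.0
    for a in approved_scores:
        for f in failed_scores:
            if a > f:
                wins += 1.0
            elif a == f:
                wins += 0.5
    return wins / (len(approved_scores) * len(failed_scores))

# Hypothetical PoS scores for a toy cohort.
approved = [0.42, 0.35, 0.30, 0.24]
failed = [0.30, 0.24, 0.21]

auc = pairwise_auc(approved, failed)
print(f"{auc:.3f}")
```

An AUC of 0.5 means the score ranks approvals no better than chance; the lift reported above is the difference between two such values computed on the same cohort.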

Honest disclosure: the literature anchor is conservative. The cohort fit yielded a genomic_validated odds ratio of approximately 5.59, markedly higher than the 1.35× we ship. The Schwaederle (2016) anchor is now roughly a decade old, and the gap suggests it understates how well modern targeted oncology discriminates. We ship the lower literature anchor anyway, on the same discipline as the antimicrobial backtest (Section 5): we publish the defensible, externally citable value rather than the higher cohort-derived value until the cohort is large enough to justify a recalibration. Closing this gap is on the Phase 2 roadmap. The cohort-derived OR is documented here, in the public model card, and in the backend/app/computation/pos_benchmarks.json annotations; nothing is hidden.

Human-review gate. Every extracted drug-specific signal carries a reviewed_at timestamp. Unreviewed signals do not score the engine: the engine treats reviewed_at IS NULL as inert. A human reviewer (CMO-grade by methodology design) must approve, reject, or amend each signal before it can influence rNPV. Reviewer identity, decision, and timestamp are stamped on every signed export under engine 2.6.0. A 10% deterministic second-pass audit (minimum 10 rows per cohort, seeded for reproducibility) checks reviewer drift; disagreement above 5% triggers a re-extract with a stricter prompt and a re-review.
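The gate and the audit draw can be sketched as follows. Only the rules come from the text above (NULL `reviewed_at` is inert, 10% sample with a minimum of 10 rows, seeded draw, 5% disagreement trigger); the function names, the row identifiers, and the seed value are hypothetical. The sketch models only the unreviewed case and does not model reject or amend decisions.

```python
import math
import random
from datetime import datetime, timezone
from typing import Optional

def effective_factor(factor: float, reviewed_at: Optional[datetime]) -> float:
    """An unreviewed signal is inert: it contributes a neutral factor of 1.0."""
    return 1.0 if reviewed_at is None else factor

def audit_sample(row_ids, seed: int, fraction: float = 0.10, minimum: int = 10):
    """Seeded second-pass sample: 10% of rows, at least `minimum`, at most all rows."""
    ordered = sorted(row_ids)  # fixed order so the draw depends only on the seed
    size = min(len(ordered), max(minimum, math.ceil(fraction * len(ordered))))
    return random.Random(seed).sample(ordered, size)

def needs_reextract(disagreements: int, audited: int, threshold: float = 0.05) -> bool:
    """True when the second-pass disagreement rate exceeds the threshold."""
    return audited > 0 and disagreements / audited > threshold

rows = [f"drug-{i:02d}" for i in range(43)]  # hypothetical 43-row cohort
sample = audit_sample(rows, seed=2026)
print(len(sample), needs_reextract(disagreements=1, audited=len(sample)))
```

For a 43-row cohort the 10% rule alone would select 5 rows, so the 10-row minimum is what determines the sample size.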

7. Phase 1 objective response rate — extracted but not scored

Phase 1 objective response rate (ORR) is the second drug-specific signal PhaseFolio extracts under engine 2.6.0. It is captured, surfaced in the dossier, and stamped on signed exports — but it does not score the engine in 2.6.0. This section documents why, in the same place the engine documents its scoring multipliers.

What is extracted. For each oncology-solid-tumor program with Phase 1 efficacy data, the extractor captures: ORR (percent); the modality class the program belongs to (antibody_targeted, small_molecule_targeted, immune_checkpoint, cytotoxic); the source type (FDA label, pivotal paper, registry, sponsor disclosure); a verbatim source excerpt; and a citable URL. Modality-specific high/low thresholds are pre-registered:

antibody_targeted: high ≥ 40%, low < 15%
small_molecule_targeted: high ≥ 50%, low < 20%
immune_checkpoint: high ≥ 30%, low < 10%
cytotoxic: high ≥ 45%, low < 25%

The reported ORR is BICR-adjusted (investigator-assessed ORR is systematically higher than blinded-independent-central-review ORR — Zhang et al. 2022 documents the typical ≈ 8pp gap) and further discounted for Phase 1 winner's-curse (Vreman et al. 2020). The adjusted value classifies into high / mid / low buckets per modality.
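The bucket assignment follows directly from the pre-registered thresholds. The sketch below takes an already-adjusted ORR as input, since the size of the winner's-curse discount is not given here; the function name and the dictionary layout are illustrative.

```python
# Modality class -> (high threshold, low threshold), in percent.
ORR_THRESHOLDS_PCT = {
    "antibody_targeted": (40.0, 15.0),
    "small_molecule_targeted": (50.0, 20.0),
    "immune_checkpoint": (30.0, 10.0),
    "cytotoxic": (45.0, 25.0),
}

def orr_bucket(modality_class: str, adjusted_orr_pct: float) -> str:
    """Classify an adjusted ORR: high if >= high threshold, low if < low threshold, else mid."""
    high, low = ORR_THRESHOLDS_PCT[modality_class]
    if adjusted_orr_pct >= high:
        return "high"
    if adjusted_orr_pct < low:
        return "low"
    return "mid"

print(orr_bucket("antibody_targeted", 40.0))  # high
print(orr_bucket("immune_checkpoint", 9.9))   # low
print(orr_bucket("cytotoxic", 30.0))          # mid
```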

Why it does not score in engine 2.6.0. The Phase 0 validation backtest scaled from 28 to 43 drugs (50% cohort coverage) and found:

biomarker_quality alone: pairwise AUC +5.2pp over baseline — stable across extractor rounds.
phase1_orr alone: marginally validatable. Small positive lift at lower sample size, not robust at 43 drugs.
biomarker_quality + phase1_orr combined: at 28 drugs the combined AUC showed +8.8pp lift; at 43 drugs (50% cohort coverage) the combined AUC fell below baseline (-0.3pp).

The combined +8.8pp at 28 drugs was a small-sample artifact. At higher coverage the two signals double-count correlated information: a high Phase 1 ORR is strongly conditioned on the biomarker that already scores the program, so adding phase1_orr on top of biomarker_quality re-weights the same evidence twice. The combined score actively degraded discrimination.

Governance decision. Section 5's multiplier-governance gate requires both-outcome cohort validation. phase1_orr alone is marginal; phase1_orr combined with biomarker_quality is actively harmful. The decision for engine 2.6.0:

Ship biomarker_quality only as the scoring drug-specific multiplier.
Continue to extract, display, and stamp phase1_orr on signed exports — the data is captured for diligence transparency and for the engine 2.7.0 recalibration cycle.
Defer scoring phase1_orr to engine 2.7.0 pending either (a) a larger cohort that supports a recalibrated independent lift, or (b) a conditional-multiplier framework that applies phase1_orr only when biomarker_quality is unknown, avoiding the double-count.

This is the multiplier-governance gate working as designed. A signal that looked promising at 28 drugs failed the both-outcomes validation at 43 drugs and was demoted to flag-only. We disclose the demotion in the same place the engine documents its scored multipliers; we do not publish only the larger combined number.

8. Non-scored risk flags and provisional disclosure

PhaseFolio extracts additional drug-specific attributes that inform diligence but do not score the engine. Each is rendered in the dossier as a flag (informational, positive, or warning) with a citable source; none move the rNPV value.

Sponsor prior approvals (count). 0 → SPONSOR_PRIOR_APPROVALS_NONE; 1–3 → SPONSOR_PRIOR_APPROVALS_SOME; ≥ 4 → SPONSOR_PRIOR_APPROVALS_HIGH (positive).
Grade 3+ adverse-event rate (modality-thresholded). Antibody-targeted / immune-checkpoint ≥ 30%, small-molecule-targeted ≥ 35%, cytotoxic ≥ 50% triggers G3_PLUS_AE_ELEVATED (warning); below threshold → SAFETY_PROFILE_CLEAN (positive).
Trial randomization. Yes → TRIAL_RANDOMIZED; no → TRIAL_SINGLE_ARM.
Primary endpoint type. Surrogate vs clinical (PRIMARY_ENDPOINT_SURROGATE / PRIMARY_ENDPOINT_CLINICAL).
Sample-size target. Below 60 patients → SAMPLE_SIZE_UNDERPOWERED (warning); at or above → SAMPLE_SIZE_ADEQUATE.
Indication-specific surrogacy R². Below 0.40 → SURROGACY_R2_LOW (warning); above — no flag.
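The thresholds above map onto small, independent rules. This sketch covers five of the six attributes; the flag names and cut-offs are taken from the list, while the function names and the modality keys are assumptions. None of these functions returns a number that feeds the rNPV calculation.

```python
from typing import Optional

G3_THRESHOLDS_PCT = {
    "antibody_targeted": 30.0,
    "immune_checkpoint": 30.0,
    "small_molecule_targeted": 35.0,
    "cytotoxic": 50.0,
}

def sponsor_flag(prior_approvals: int) -> str:
    if prior_approvals == 0:
        return "SPONSOR_PRIOR_APPROVALS_NONE"
    if prior_approvals <= 3:
        return "SPONSOR_PRIOR_APPROVALS_SOME"
    return "SPONSOR_PRIOR_APPROVALS_HIGH"

def safety_flag(modality_class: str, g3_plus_rate_pct: float) -> str:
    threshold = G3_THRESHOLDS_PCT[modality_class]
    return "G3_PLUS_AE_ELEVATED" if g3_plus_rate_pct >= threshold else "SAFETY_PROFILE_CLEAN"

def randomization_flag(randomized: bool) -> str:
    return "TRIAL_RANDOMIZED" if randomized else "TRIAL_SINGLE_ARM"

def sample_size_flag(target_n: int) -> str:
    return "SAMPLE_SIZE_UNDERPOWERED" if target_n < 60 else "SAMPLE_SIZE_ADEQUATE"

def surrogacy_flag(r_squared: float) -> Optional[str]:
    # Assumption: a value of exactly 0.40 raises no flag; the text only
    # specifies "below 0.40" and "above".
    return "SURROGACY_R2_LOW" if r_squared < 0.40 else None

print(sponsor_flag(2), safety_flag("cytotoxic", 42.0), sample_size_flag(48))
```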

These signals are non-scored by the same governance discipline as the demoted antimicrobial multipliers in Section 5 and as phase1_orr in Section 7: each would need an independent both-outcomes cohort validation to earn the right to move the number. They are useful diligence anchors — and they appear in the signed export so a reader can verify which fired — but the rNPV math is unchanged whether they are present or absent.

8.1 Authorship, AI assistance, and outside review

The drug-specific clinical signal layer was developed by PhaseFolio's non-MD founder using Anthropic Claude Opus 4.7 as the extraction engine, with an independent adversarial subagent reviewing 28 of the extractions for self-consistency. That review flagged 14 extractions for human spot-audit: two certain errors (corrected before ship), five likely, and seven soft. The Phase 0 GO recommendation and the underlying validation data were prepared for two outside reviewers as of the methodology@2026-05-28 ship date and are being routed to them as part of this release:

A HEOR / governance reviewer — was the multiplier-governance gate of Section 5 applied honestly to phase1_orr, given the small-sample artifact at 28 drugs and the demotion at 43?
An oncology clinical reviewer — are the five validatable buckets (across biomarker_quality and phase1_orr) and the 85-drug cohort itself defensible to a clinical eye?

This methodology version (methodology@2026-05-28) is published as provisional pending outside review (one-week feedback window from 2026-05-28). Substrate doctrine holds that no version is ever retroactively changed or invalidated; older versions remain valid forever and continue to verify at /verify. If outside review identifies a correction, it therefore ships as a subsequent methodology version, not as an edit to this one. The methodology@2026-05-28 stamp is durable; any export issued under it remains forever-resolvable.

8.2 What's deliberately out of scope for Phase 1

Five constructs evaluated in the Phase 0 research are deliberately deferred to Phase 2 and beyond, in the spirit of "ship the disciplined subset, defer the rest":

Hierarchical Bayesian PoS model. Deferred to engine 4.x or 5.x, when per-indication cohort N reaches roughly 500. The current logistic-OR transform is the right method at current cohort sizes.
Phase 2 readout-quality multiplier. Phase 2 ORR and randomized-vs-single-arm carry independent signal, but the AMR-Sprint-1 precedent requires a per-bucket cohort that current data does not yet support.
Mechanism-class hepatotoxicity prior. The antimicrobial Sprint-1 ablation already demoted this to a flag for that cohort; it remains a flag here.
Sustained-clinical-response endpoint-fragility prior. Same disposition.
Real-world-evidence calibration. Out of scope until a registry-grade RWE source clears the same governance gate as the literature anchors.

The roadmap is published openly: a signal becoming scored later is the normal path, not a methodology break. A signal staying non-scored is also a defensible disposition — transparency about what does not yet earn its way into the math is the point of this section.

Key facts

Benchmark matrix size	11 indications × 8 modalities × 3 biomarker = 264 cells
Source N (stage transitions)	12,728+ (BIO/QLS/Informa 2021)
Scored multipliers (eng 2.6.0)	8 evidence-based factors (Table 2)
Transformation method	Log-odds (logit) — bounded, diminishing-returns, commutative
Preclinical adjustment	Never adjusted (orthogonal failure modes)
Multiplier governance	Scores only if a both-outcomes held-out cohort validates it; else demoted to a non-scored flag
Drug-specific signals (eng 2.6.0)	biomarker_quality scored (oncology solid, Phase II/III); phase1_orr + 9 others flag-only
Phase 0 cohort validation	43 oncology-solid drugs (50% cohort coverage); biomarker_quality alone +5.2pp AUC vs baseline
Human-review gate	Unreviewed signals are inert (reviewed_at IS NULL); 10% deterministic second-pass audit

References

01. Thomas, D.W., Burns, J., Audette, J., Carroll, A., Dow-Hygelund, C., & Hay, M. (2021). Clinical Development Success Rates and Contributing Factors 2011–2020. BIO, QLS Advisors, Informa Pharma Intelligence.

02. Wong, C.H., Siah, K.W., & Lo, A.W. (2019). Estimation of clinical trial success rates and related parameters. Biostatistics, 20(2), 273–286.

03. Citeline (2024). Pharma Intelligence Global Clinical Trials Database. Modality-specific pipeline data used to calibrate transition rates for bispecifics, ADCs, cell therapy, and gene therapy.

04. Minikel, E.V., Painter, J.L., Dong, C.C., & Nelson, M.R. (2024). Refining the impact of genetic evidence on clinical success. Nature, 629, 624–629.

05. Sun, J., Wei, Q., & Zhou, Y. (2025). Dynamic success rates of drug clinical trials. Nature Communications, 16, 1629.

06. Mullard, A. (2016). Parsing clinical success rates. Nature Reviews Drug Discovery, 15, 447.

07. Parker, J.L., Zhang, Z.Y., & Buckstein, R. (2015). Clinical trial risk in hematology and oncology: the effect of biomarker use. ASCO Annual Meeting Abstracts.

08. Tufts Center for the Study of Drug Development / NEWDIGS (2023). Cell and Gene Therapy Success Rates.

09. Schwaederle, M., Zhao, M., Lee, J.J., Lazar, V., Leyland-Jones, B., Schilsky, R.L., Mendelsohn, J., & Kurzrock, R. (2016). Association of biomarker-based treatment strategies with response rates and progression-free survival in refractory malignant neoplasms: a meta-analysis. JAMA Oncology, 2(11), 1452–1459. Companion meta-analysis: Impact of precision medicine in diverse cancers (J Clin Oncol 2015); pooled n = 13,203 patients across 346 phase II studies; basis for the biomarker_quality multiplier (genomic_validated 1.35×, protein_only 0.85×).

10. Haslam, A., Lythgoe, M.P., Greenstreet Akman, E., & Prasad, V. (2023). Characteristics of phase 1 trial participants and accuracy of response rates as predictors of later-stage outcomes in oncology. Basis for the phase1_orr modality-conditional thresholds used in the engine 2.6.0 flag (currently non-scored per Section 7).

11. Vreman, R.A., Bouvy, J.C., Bloem, L.T., Hövels, A.M., Mantel-Teeuwisse, A.K., Leufkens, H.G.M., & Goettsch, W.G. (2020). Weighing of evidence by health technology assessment bodies: retrospective study of reimbursement recommendations for conditionally approved drugs. Source for the Phase 1 winner's-curse discount applied in the phase1_orr extractor (Section 7).

12. Zhang, J., Pilar, M.R., Wang, X., Liu, J., Pang, H., Brzezniak, C.E., Park, S.S., Subramanian, J., Liu, S.V., & Doroshow, D.B. (2022). Subgroup analyses by investigator-assessed vs blinded-independent-central-review-assessed objective response rate in cancer trials. Source for the BICR adjustment in the phase1_orr extractor (Section 7); investigator-assessed ORR is systematically higher by ~8pp.

Frozen snapshot · methodology version: methodology@2026-05-28 · Last updated: 2026-05-28