How PhaseFolio's predictions hold up against reality. Back-tests evaluate the production rNPV engine against held-out historical cohorts whose outcomes are now known — using only information available before each drug’s decision point. Evaluations publish the per-signal verdicts behind the engine, validated and not-predictive alike.
3 published cohorts · 111 drugs. The production engine scored against held-out historical drugs whose real-world fate is now known, with indication-specific FDA approval as the success criterion.
Browse back-tests →
2 published signal verdicts: what earns the right to score the engine and what was deliberately kept non-scored after a held-out test. The negative results are published alongside the positive ones.
Browse evaluations →
The same production engine and the same scoring discipline run across every cohort, and every scoring factor is held to one rule: a multiplier may move a probability only if a held-out cohort containing both approvals and failures can validate it; otherwise it is demoted to a non-scored flag. The methodology page carries the cross-cohort comparison and the full discrimination-versus-calibration framing.
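As a rough illustration of that one rule, the sketch below shows how a validated multiplier might move a probability while an unvalidated one is only surfaced as a flag. It is not the production rNPV engine: the names `Signal`, `cohort_can_validate`, and `apply_signals`, the multiplicative update, and the clamping to [0, 1] are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str
    multiplier: float  # hypothetical factor applied to a base probability
    validated: bool    # assumed True only if a held-out test supported it


def cohort_can_validate(outcomes: list[bool]) -> bool:
    """A cohort can test a signal only if it has both approvals and failures."""
    return any(outcomes) and not all(outcomes)


def apply_signals(base_prob: float, signals: list[Signal]) -> tuple[float, list[str]]:
    """Apply validated multipliers; demote the rest to non-scored flags."""
    prob = base_prob
    flags: list[str] = []
    for signal in signals:
        if signal.validated:
            # Validated signals may move the probability (clamped to [0, 1]).
            prob = min(1.0, max(0.0, prob * signal.multiplier))
        else:
            # Unvalidated signals are reported but never change the score.
            flags.append(signal.name)
    return prob, flags


if __name__ == "__main__":
    signals = [
        Signal("validated_signal", 1.5, validated=True),
        Signal("unvalidated_signal", 2.0, validated=False),
    ]
    print(apply_signals(0.2, signals))
```

In this sketch a cohort made up entirely of approvals, or entirely of failures, fails `cohort_can_validate`, so any signal tested only on such a cohort would stay a flag.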