To do that, we built Explority, the first AI system that (1) connects drug discovery outcomes with the preceding scientific publications at scale and (2) learns patterns in this data to forecast the likelihood of success for new early-stage therapy ideas.
From a technical perspective, Explority consists of two tightly connected parts:
1. A multistage algorithm that structures biomedical literature and links it to downstream drug discovery outcomes
2. Large language models (LLMs) trained as classifiers on this structured data to identify patterns associated with success or failure

In the first step, we used OpenAI’s models as part of our structuring algorithm. To illustrate the scale: our token usage amounted to roughly one-tenth of McKinsey’s total token usage.
Step 1: “Pharmaceutical archeology”
Structuring scientific papers and linking them to outcomes
A few manual studies previously attempted to link successful therapies with the first articles that inspired them (Spector et al., Eder et al.). However, such analyses were feasible only for tens of therapies.
Our goal was to structure all research and outcomes at the scale of millions of papers across 5,846 rare diseases. Over multiple iterations, we developed an algorithm that achieves this, creating a “periodic table of therapies” for each disease:
| Stage | Approach | Results |
|---|---|---|
| 1.1) Sourcing all articles mentioning each disease | Search algorithms | 15 million publications |
| 1.2) Identifying papers relevant to drug discovery | General-purpose LLMs | 1 million publications |
| 1.3) Grouping papers one by one by target, mechanism, and compound, starting with the earliest publication | Custom RAG-like system with specialized embeddings, feature normalization, and LLM-as-a-judge validation | 245,479 unique ideas |
| 1.4) Linking ideas to positive outcomes | Same approach as stage 1.3 | 10,708 orphan designations and 1,288 approvals |
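Stage 1.3 processes papers one at a time, oldest first, and either attaches each paper to an existing idea or opens a new one. The post does not publish its embeddings, thresholds, or judge prompts, so the following is only a minimal sketch of that incremental grouping pattern. It assumes precomputed paper embeddings, cosine similarity, a hypothetical threshold of 0.85, and a hypothetical `judge` callback standing in for the LLM-as-a-judge check.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Paper:
    paper_id: str
    year: int
    embedding: List[float]  # assumption: precomputed by some embedding model


@dataclass
class Idea:
    centroid: List[float]
    paper_ids: List[str] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_papers(
    papers: List[Paper],
    threshold: float = 0.85,  # hypothetical similarity cutoff
    judge: Callable[[Paper, Idea], bool] = lambda paper, idea: True,  # hypothetical stand-in for an LLM judge
) -> List[Idea]:
    ideas: List[Idea] = []
    # Oldest first, so each idea is seeded by its earliest publication.
    for paper in sorted(papers, key=lambda p: p.year):
        best: Optional[Idea] = None
        best_sim = -1.0
        for idea in ideas:
            sim = cosine(paper.embedding, idea.centroid)
            if sim > best_sim:
                best, best_sim = idea, sim
        if best is not None and best_sim >= threshold and judge(paper, best):
            best.paper_ids.append(paper.paper_id)
            n = len(best.paper_ids)
            # Running mean: the centroid stays the average of all member embeddings.
            best.centroid = [c + (e - c) / n for c, e in zip(best.centroid, paper.embedding)]
        else:
            # No sufficiently similar idea (or the judge rejected the match): start a new one.
            ideas.append(Idea(centroid=list(paper.embedding), paper_ids=[paper.paper_id]))
    return ideas
```

The linear scan over ideas keeps the sketch readable; at the scale in the table, a system would more plausibly use an approximate nearest-neighbour index for the lookup.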
Our structuring algorithm was independently validated using 3,000 article groupings created by medical advisors from big pharma, achieving 94.6% accuracy in grouping similar articles.
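The post does not define how the 94.6% accuracy was computed. One common way to score agreement between two groupings of the same articles is pairwise agreement (the Rand index): over every pair of articles, count how often both groupings make the same call, either "same group" or "different groups". The sketch below assumes that definition; group labels are arbitrary strings, so only co-membership matters.

```python
from itertools import combinations
from typing import Dict


def pairwise_agreement(predicted: Dict[str, str], expert: Dict[str, str]) -> float:
    """Rand index: fraction of article pairs on which both groupings agree."""
    ids = sorted(set(predicted) & set(expert))  # only articles labeled by both
    pairs = list(combinations(ids, 2))
    if not pairs:
        raise ValueError("need at least two articles present in both groupings")
    agree = sum(
        # True when both say "same group" or both say "different groups".
        (predicted[a] == predicted[b]) == (expert[a] == expert[b])
        for a, b in pairs
    )
    return agree / len(pairs)
```

Because the labels themselves are never compared across groupings, renaming every group in one of them leaves the score unchanged.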
Step 2: Training LLMs to forecast likelihood of approval
Learning patterns in article sequences that preceded successful and unsuccessful therapies.
Regular LLMs are trained to predict the next word (token) in a sentence. Explority’s models are trained differently: as classifiers.
This means that our LLMs have modified final layers, so instead of generating text, they compute a single score between 0 and 1. The score captures what the model has learned from sequences of scientific publications that preceded successful therapies (labeled 1 in training) and unsuccessful therapies (labeled 0).
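In general terms, this change amounts to replacing the next-token output layer with a one-unit head followed by a sigmoid, trained with binary cross-entropy against the 0/1 outcome labels. The sketch below shows only such a head in plain Python. The `hidden` vectors are hypothetical stand-ins for the pooled final hidden state an LLM would produce after reading a sequence of publications; the learning rate and toy data are illustrative, not Explority's settings.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class ScoreHead:
    """Maps a pooled hidden state to a single score in (0, 1)."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.gauss(0.0, 0.01) for _ in range(dim)]
        self.b = 0.0

    def score(self, hidden: List[float]) -> float:
        z = sum(wi * hi for wi, hi in zip(self.w, hidden)) + self.b
        return sigmoid(z)

    def train_step(self, hidden: List[float], label: int, lr: float = 0.1) -> float:
        """One SGD step on binary cross-entropy; returns the loss before the update."""
        p = self.score(hidden)
        eps = 1e-12
        loss = -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
        grad = p - label  # derivative of BCE with respect to the pre-sigmoid logit
        self.w = [wi - lr * grad * hi for wi, hi in zip(self.w, hidden)]
        self.b -= lr * grad
        return loss
```

In a real model the gradient would also flow back into the LLM's layers; here only the head is trained, which is enough to show how a 0/1 label turns into a single score.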
In simple terms, Explority AI combines:
- General scientific knowledge, inherited from pre-trained LLMs
- Outcome-driven training, based on real drug discovery results
This training approach allows the model to go beyond human-level understanding of what distinguishes successful from unsuccessful therapies, similar to how AlphaFold learned patterns in protein folding.
Validation: testing Explority against real-world outcomes
To rigorously test Explority’s predictive ability, we conducted a validation study that recreated the results our model would have produced if it had been used in 2019:
- LLMs were trained only on literature and outcomes published up to 2019
- The models were then applied to 83,501 new drug candidates mentioned in articles at that time
- New orphan drug designations from 2019 to 2025 were used as positive outcomes for validation
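The setup above is a time-based backtest: the model may only see what was published before the cutoff, and labels come from what happened afterwards. A minimal sketch of building the evaluation set under those rules follows. The exact cutoff date, the hypothetical `CandidateIdea` record, and the rule of excluding ideas whose designation was already known at the cutoff are assumptions for illustration. The post does not say how training negatives were chosen, so only the evaluation side is shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CandidateIdea:  # hypothetical record, not Explority's schema
    idea_id: str
    paper_dates: Tuple[date, ...]             # publication dates of the idea's papers
    designation_date: Optional[date] = None   # orphan designation, if any


def build_eval_set(
    ideas: List[CandidateIdea], cutoff: date, horizon_end: date
) -> List[Tuple[str, Tuple[date, ...], int]]:
    """Return (idea_id, visible_paper_dates, label) rows for a backtest."""
    rows = []
    for idea in ideas:
        # Only literature published before the cutoff is visible to the model.
        visible = tuple(d for d in idea.paper_dates if d < cutoff)
        if not visible:
            continue  # the idea did not exist in the literature yet
        d = idea.designation_date
        if d is not None and d < cutoff:
            continue  # outcome already known at the cutoff, so not a new candidate
        label = int(d is not None and cutoff <= d <= horizon_end)
        rows.append((idea.idea_id, visible, label))
    return rows
```

Filtering papers by date, not just ideas, matters: an idea first seen before the cutoff can still accumulate later papers that would leak the outcome.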
As a result, Explority AI sourced 50.7% of the therapies that would go on to reach approval, while the success rate among the candidates it selected exceeded the industry average of 2.5% for drugs entering preclinical trials (Takebe et al., Go Bio). This makes it the first AI proven to surpass the historical success rates of the pharmaceutical industry.
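The 50.7% figure reads as recall (the share of eventual successes the model flagged) and the comparison with 2.5% as precision (the share of flagged candidates that succeeded). The post does not state its decision threshold or exact metric definitions, so this sketch assumes a fixed score cutoff; the numbers in the usage check are toy values, not results from the study.

```python
from typing import List, Tuple

INDUSTRY_BASE_RATE = 0.025  # 2.5% preclinical success rate cited in the text


def backtest_metrics(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float, float]:
    """Return (recall, precision, lift over the base rate) at a fixed score threshold."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    selected = [s >= threshold for s in scores]
    true_pos = sum(1 for sel, y in zip(selected, labels) if sel and y == 1)
    positives = sum(labels)
    n_selected = sum(selected)
    recall = true_pos / positives if positives else 0.0       # share of successes flagged
    precision = true_pos / n_selected if n_selected else 0.0  # success rate among flagged
    return recall, precision, precision / INDUSTRY_BASE_RATE
```

For example, `backtest_metrics([0.9, 0.8, 0.3, 0.2, 0.7], [1, 0, 1, 0, 0], threshold=0.5)` flags three candidates, one of which succeeds, giving recall 0.5 and precision 1/3. Raising the threshold typically trades recall for precision, which is why both numbers are needed to compare against an industry base rate.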
Download the full report to explore the methodology and results in more detail.
Partner with Explority to turn information into impact. Whether you're planning your next partnership, selecting your next R&D idea, or just have questions, drop us a message. Let’s explore how we can work together to solve rare diseases.

