Explore how our LLM training works

1/28/26

How Explority AI is trained

To solve pharma’s low early-stage success rate.

Over decades of drug development, humanity has accumulated a vast amount of data on what has and has not worked in preclinical and clinical trials. Yet, as humans, we can only interpret a tiny fraction of that knowledge. But what if we could incorporate reasoning over all of this knowledge into every new decision to improve the success rates of early-stage drug discovery?


To do that, we built Explority, the first AI system that (1) connects drug discovery outcomes with the preceding scientific publications at scale and (2) learns patterns in this data to forecast the likelihood of success for new early-stage therapy ideas.


From a technical perspective, Explority consists of two tightly connected parts:

1. A multistage algorithm that structures biomedical literature and links it to downstream drug discovery outcomes

2. Large language models (LLMs) trained as classifiers on this structured data to identify patterns associated with success or failure


In the first step, we used OpenAI’s models as part of our structuring algorithm. To give a sense of the scale, this step alone consumed roughly one tenth as many tokens as McKinsey’s total token usage.


Step 1: “Pharmaceutical archeology”

Structuring scientific papers and linking them to outcomes

A few manual studies previously attempted to link successful therapies with the first articles that inspired them (Spector et al., Eder et al.). However, such analyses were only feasible on a scale of tens of therapies.

Our goal was to structure all research and outcomes at the scale of millions of papers across 5,846 rare diseases. Through multiple iterations, we were able to develop an algorithm that effectively achieves this, creating a “periodic table of therapies” for each disease:


| Stage | Approach | Results |
|---|---|---|
| 1.1) Sourcing all articles mentioning each disease | Search algorithms | 15 million publications |
| 1.2) Identifying papers relevant to drug discovery | General-purpose LLMs | 1 million publications |
| 1.3) Grouping papers one by one by target, mechanism, and compound, starting with the earliest publication | Custom RAG-like system with specialized embeddings, feature normalization, and LLM-as-a-judge validation | 245,479 unique ideas |
| 1.4) Linking ideas to positive outcomes | Same approach as 1.3 | 10,708 orphan designations and 1,288 approvals |

Our structuring algorithm was independently validated using 3,000 article groupings created by medical advisors from big pharma, achieving 94.6% accuracy in grouping similar articles.
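Stage 1.3 can be pictured as an incremental grouping pass. The sketch below is a simplified illustration, not Explority's actual algorithm. It assumes each paper has already been reduced to a numeric embedding, compares new papers only against the earliest paper of each existing idea, and uses a hypothetical cosine-similarity threshold of 0.8. The names `Paper`, `Idea`, and `group_papers` are invented for this example, and the feature normalization and LLM-as-a-judge validation steps are left out.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Optional


@dataclass
class Paper:
    title: str
    year: int
    embedding: List[float]  # assumed output of some embedding model


@dataclass
class Idea:
    papers: List[Paper] = field(default_factory=list)

    @property
    def anchor(self) -> Paper:
        # The earliest paper in the group represents the idea.
        return self.papers[0]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_papers(papers: List[Paper], threshold: float = 0.8) -> List[Idea]:
    """Assign papers to ideas one by one, earliest publication first."""
    ideas: List[Idea] = []
    for paper in sorted(papers, key=lambda p: p.year):
        best: Optional[Idea] = None
        best_sim = threshold  # hypothetical cut-off for "same idea"
        for idea in ideas:
            sim = cosine(idea.anchor.embedding, paper.embedding)
            if sim >= best_sim:
                best, best_sim = idea, sim
        if best is not None:
            best.papers.append(paper)           # joins an existing idea
        else:
            ideas.append(Idea(papers=[paper]))  # starts a new idea
    return ideas


# Toy data with made-up two-dimensional embeddings.
toy = [
    Paper("Follow-up on inhibitor A", 2012, [0.9, 0.1]),
    Paper("First report of inhibitor A", 2005, [1.0, 0.0]),
    Paper("Gene therapy B", 2010, [0.0, 1.0]),
]
ideas = group_papers(toy)
```

With this toy input, the two inhibitor A papers land in one idea, ordered by year, and the gene therapy paper starts a second idea.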



Step 2: Training LLMs to forecast likelihood of approval

Learning patterns in article sequences that preceded successful and unsuccessful therapies.

Regular LLMs are trained to predict the next word (token) in a sentence. Explority’s models are trained differently – as classifiers.

This means that our LLMs have modified final layers, so instead of generating text, they compute a single score between 0 and 1. This score embeds all the reasoning across sequences of scientific publications that preceded successful therapies (marked as 1 in training) and unsuccessful therapies (labeled as 0).
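To make the idea of a classification head concrete, here is a minimal sketch in plain Python. It assumes the language model has already produced a single pooled hidden-state vector for a sequence of publications. The sigmoid output and the binary cross-entropy loss are common choices for this kind of setup and are assumptions here, since the text above only states that the final layers are modified to return a score between 0 and 1. All numbers are made up.

```python
from math import exp, log
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def classify(hidden: List[float], weights: List[float], bias: float) -> float:
    """Map a pooled hidden-state vector to a score in (0, 1)."""
    logit = sum(h * w for h, w in zip(hidden, weights)) + bias
    return sigmoid(logit)


def binary_cross_entropy(score: float, label: int) -> float:
    """Loss for one example; label is 1 (successful) or 0 (unsuccessful)."""
    eps = 1e-12  # guards against log(0)
    return -(label * log(score + eps) + (1 - label) * log(1.0 - score + eps))


# Hypothetical pooled representation and head parameters.
hidden = [0.5, -1.0, 2.0]
weights = [0.2, 0.1, 0.4]
bias = -0.3

score = classify(hidden, weights, bias)
loss_if_success = binary_cross_entropy(score, 1)
loss_if_failure = binary_cross_entropy(score, 0)
```

During training, the loss is lower when the score agrees with the recorded outcome, which is what pushes the score toward 1 for sequences that preceded successful therapies and toward 0 otherwise.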

In simple terms, Explority AI combines:

  • General scientific knowledge, inherited from pre-trained LLMs

  • Outcome-driven training, based on real drug discovery results


This training approach allows the model to move beyond a human-level understanding of the patterns behind successful and unsuccessful therapies, similar to how AlphaFold learned patterns in protein folding.



Validation: testing Explority against real-world outcomes

To rigorously test Explority’s predictive ability, we conducted a validation study that recreated the results our model would have produced if it had been used in 2019:

  • LLMs were trained only on literature and outcomes published up to 2019

  • The models were then applied to 83,501 new drug candidates mentioned in articles at that time

  • New orphan drug designations from 2019 to 2025 were used as positive outcomes for validation
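The setup above is a temporal backtest. A minimal sketch of the labelling logic follows, under several assumptions: the `Candidate` fields and the `build_backtest` helper are hypothetical, the cutoff year is treated as inclusive for training, and designations after the cutoff and up to 2025 count as positives for evaluation. The toy candidates are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    first_published: int             # year of the earliest article
    designation_year: Optional[int]  # year of orphan designation, if any


def label_as_of(c: Candidate, year: int) -> int:
    """1 if a designation was already granted by the given year, else 0."""
    return int(c.designation_year is not None and c.designation_year <= year)


def build_backtest(
    candidates: List[Candidate], cutoff: int = 2019, horizon: int = 2025
) -> Tuple[Dict[str, int], Dict[str, int]]:
    # Only candidates already described in the literature at the cutoff are visible.
    visible = [c for c in candidates if c.first_published <= cutoff]
    # Outcomes the model is allowed to see during training.
    known_at_cutoff = {c.name: label_as_of(c, cutoff) for c in visible}
    # Candidates still undecided at the cutoff, labelled with later outcomes.
    to_forecast = {
        c.name: label_as_of(c, horizon)
        for c in visible
        if label_as_of(c, cutoff) == 0
    }
    return known_at_cutoff, to_forecast


toy = [
    Candidate("A", 2010, 2015),  # designated before the cutoff
    Candidate("B", 2017, 2022),  # designated after the cutoff
    Candidate("C", 2018, None),  # never designated
    Candidate("D", 2021, 2024),  # first published after the cutoff
]
known_at_cutoff, to_forecast = build_backtest(toy)
```

The point of the split is that nothing published or decided after the cutoff reaches the training side, so the evaluation labels behave like a genuine forecast target.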


As a result, Explority AI sourced 50.7% of therapies that would reach approval, while exceeding the average 2.5% success rate for drugs entering preclinical trials (Takebe et al., Go Bio). This makes it the first AI proven to surpass the historical success rates of the pharmaceutical industry.
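One way to read figures like these is to separate three quantities: the share of eventual positives that a shortlist captures (recall), the success rate inside the shortlist (precision), and the success rate across all candidates (base rate). The sketch below shows how each is computed. It uses invented toy scores and labels and a hypothetical `shortlist_metrics` helper, and it does not reproduce the study's numbers or claim to match its exact metric definitions.

```python
from typing import Dict


def shortlist_metrics(
    scores: Dict[str, float], labels: Dict[str, int], top_k: int
) -> Dict[str, float]:
    """Compare a top-k shortlist against the overall success rate."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    shortlist = ranked[:top_k]
    hits = sum(labels[name] for name in shortlist)
    positives = sum(labels.values())
    return {
        "recall": hits / positives if positives else 0.0,
        "precision": hits / len(shortlist) if shortlist else 0.0,
        "base_rate": positives / len(labels) if labels else 0.0,
    }


# Toy example: five candidates, two of which eventually succeed.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "e": 0.1}
labels = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 0}
metrics = shortlist_metrics(scores, labels, top_k=2)
```

In this toy case the shortlist captures one of the two eventual successes, and half of the shortlisted candidates succeed, compared with two in five across all candidates.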


Download the full report to explore the methodology and results in more detail.


Ready to solve rare diseases?


Partner with Explority to turn information into impact. Whether you're planning your next partnership, selecting your next R&D idea, or just have questions, drop us a message. Let’s explore how we can work together to solve rare diseases.

From insights to impact.

Let's accelerate therapies for rare diseases affecting over 350 million people worldwide.


228 Park Ave S,
New York, USA.

At Explority, we build first-of-its-kind AI to bring clarity to the earliest and riskiest stages of pharmaceutical research by forecasting which therapies are most likely to succeed. The Explority AI web and mobile applications are properties of Explority AI Inc., a company registered in the United States (File No. 10320493).
For all questions: support@explority.ai

Copyright © 2026 Explority AI Inc.
