The time has come for biotech companies to embrace machine learning for clinical trials, but they should start with compiling the data. That means grappling with real-world patient records.
For startup companies developing new drugs, clinical trials are full of risks. Getting drugs approved through clinical trials is a slow, expensive process that requires resources and infrastructure or, for early-stage biotech companies, interactions with external vendor services. Some vendors help specifically with recruitment for clinical trials, a task that is difficult for big and small pharma alike, and recruiting for rare-disease trials can be even more challenging. It does not have to be this way: advances in machine learning (ML) can help. We have seen how artificial intelligence (AI) and ML can accelerate drug development workflows. It is time that clinical trials start to benefit from AI as well.
Clinical trials have become much more complicated in overall design over the last 25 years, with multiple countries and regulatory bodies participating, new trial formats, and a broader set of data sources such as real-world data and data from devices¹. These complexities can lead to delays from trial recruitment and regulatory hurdles, which increases costs. There is room to make this process faster and smoother, but no one can assuredly predict whether changing protocols will be better and safer, both between patients and companies (privacy) and between different government agencies (regulatory workflows). Instigating any sort of coordinated large-scale change is difficult. Biotech, with its traditionally flexible structure, willingness to try new things, and strong footing in the AI/ML space, is uniquely poised to take on some of these challenges. Recently, Highlander Health announced that it will be investing in companies hoping to bridge clinical evidence generation and personalized healthcare by reducing the cost and complexity of clinical trials, which would support biotech companies that want to test possible improvements.
There are precedents for how to succeed here. In clinical trials, electronic health records (EHRs) help identify eligible participants and follow up on clinical outcomes. Back in 2012, Flatiron Health, which was later acquired by Roche for $1.9 billion², noticed that patient data gathering required transferring manual records into databases. Moreover, clinical and genomic data are generally collected separately, by providers and diagnostic labs; therefore, it can be difficult to combine these data without releasing private patient information. Flatiron created software to improve these workflows in oncology and aggregated EHRs from oncology practices across the United States into a massive dataset used to train AI and ML models. Pharma companies use model predictions from Flatiron's database to identify patients who would be suitable for clinical trial cohorts for more than 22 tumor types, in part through quick prescreening of patients for clinical trials.
In oncology, efforts to use real-world data have already borne fruit, for a few reasons. Oncology is a field with known disease progression, and treatment outcomes are measurable. Such data lend themselves nicely to ML, and predictions can be validated, as in Flatiron’s platform, where a network of clinical experts examines the recommendations³. Also, real-world data can be de-identified, kept secure and up to date, and can represent the general population and patient diversity.
In personalized medicine or rare disease, progress has been slower. Patient histories are more complicated, as are their symptoms, and EHR data are still of poor quality and not easily shared. There is an opportunity here, one that Highlander Health plans to tackle. Highlander Health aims to create a learning healthcare system, taking data from clinical research and everyday patient care and applying them continually to improve patient treatment and get medicines to patients faster. Highlander Health is set up to have two arms: a private equity firm that is investing in life sciences, healthcare and technology companies (more details to come) and the Highlander Institute, which is a philanthropic arm making targeted grants. Biotech companies with ideas to facilitate data collection from EHRs for use in a variety of contexts would find financial and moral support from this initiative.
For new companies, building AI and ML into their ecosystem from inception could deliver quick wins. For instance, large language models could curate EHR data from manually written notes, patients could be better matched to clinical trials, and clinical imaging and diagnostics could be incorporated into trial results.
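To make the patient-matching idea concrete, here is a minimal sketch of rule-based prescreening against structured eligibility criteria. It is an illustration only, not how any vendor's platform works: the `Patient` and `Criteria` fields, the patient IDs and the thresholds are all hypothetical, and real trial protocols include many criteria that cannot be reduced to a few structured fields.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All fields are hypothetical, de-identified stand-ins for EHR data.
    patient_id: str
    age: int
    tumor_type: str
    ecog: int  # performance status score, 0 (fully active) to 5


@dataclass
class Criteria:
    # Hypothetical structured eligibility criteria for one trial.
    min_age: int
    max_age: int
    tumor_types: frozenset
    max_ecog: int


def prescreen(patients, criteria):
    """Return the IDs of patients who meet every structured criterion."""
    return [
        p.patient_id
        for p in patients
        if criteria.min_age <= p.age <= criteria.max_age
        and p.tumor_type in criteria.tumor_types
        and p.ecog <= criteria.max_ecog
    ]


# Made-up records, for illustration only.
cohort = [
    Patient("P001", 54, "NSCLC", 1),
    Patient("P002", 17, "NSCLC", 0),     # below the minimum age
    Patient("P003", 66, "melanoma", 1),  # tumor type not in the trial
    Patient("P004", 71, "NSCLC", 3),     # performance status too high
]
trial = Criteria(min_age=18, max_age=75,
                 tumor_types=frozenset({"NSCLC"}), max_ecog=2)

print(prescreen(cohort, trial))  # ['P001']
```

A filter like this only narrows the candidate list; anyone it flags would still need review by clinical staff before enrollment.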
In the long term, AI can improve clinical study design itself, but this would come after EHRs are integrated more seamlessly into workflows. The ability to use real-world data acquired during routine clinical practice and to integrate them into clinical trials data and design would change how studies are designed in the first place. For example, optimizing synthetic control arms, which are external controls generated using external patient-level data, could reduce the number of patients given placebos⁴ while also requiring lower enrollment numbers. Down the line, there is potential for models to create ‘digital twins’, or virtual replicas of patients that could predict specific treatment outcomes. The US Food and Drug Administration is showing an increased interest in the use of real-world data, which means that now is the time for companies to start testing new practices.
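One simple way to picture a synthetic control arm is as a matching problem: for each patient who receives the treatment, find the most similar patient in an external real-world dataset. The sketch below uses greedy one-to-one nearest-neighbour matching on two covariates. It is a deliberate simplification (in practice, statistical approaches such as propensity-score methods are common), and the function names, covariates, scaling and records are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Distance between two (age, biomarker) tuples.

    Age is divided by 10 so that a decade counts roughly as much as one
    biomarker unit; this scaling is an arbitrary illustrative choice.
    """
    return math.hypot((a[0] - b[0]) / 10, a[1] - b[1])


def build_external_control(trial_patients, external_patients):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Both arguments map a patient ID to an (age, biomarker) tuple.
    Returns a dict mapping each trial patient ID to its matched external ID.
    """
    available = dict(external_patients)  # copy so the input is not mutated
    matches = {}
    for pid, covariates in trial_patients.items():
        if not available:
            break  # no external patients left to match
        best = min(available, key=lambda eid: distance(covariates, available[eid]))
        matches[pid] = best
        del available[best]  # each external patient is used at most once
    return matches


# Made-up records, for illustration only.
treated = {"T1": (60, 1.2), "T2": (45, 0.4)}
external = {"E1": (62, 1.0), "E2": (44, 0.5), "E3": (80, 2.5)}

print(build_external_control(treated, external))  # {'T1': 'E1', 'T2': 'E2'}
```

Greedy matching depends on the order in which treated patients are processed, and two covariates are far too few for a real study; the point is only that every matched external record can stand in for a patient who would otherwise be enrolled in a placebo arm.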
In drug discovery, it is the data underlying the models that are crucial. AI needs the right training data, and without knowing the underlying biological mechanisms, it is impossible to predict how a drug will act in a patient. Even with little to show so far, biotech has embraced AI for drug discovery, putting its trust in algorithms that work on paper. Whether they work in patients cannot be known without bigger and better training datasets. In clinical trials, the opposite is currently true: trials generate vast amounts of data, but biotech companies have not yet generated the models to use these data effectively. Of course, clinical data must be used cautiously, and trust must be established, but the platform (and now the funding, thanks to Highlander Health) is there for biotech to take advantage of.