Using artificial intelligence to find the best tumor models for cancer drug screening

Altuna Akalin
Sep 9, 2021
Photo by Louis Reed on Unsplash

Tumor models are necessary tools not only for discovering new drugs with antitumor activity but also for assessing their effectiveness. Optimizing the selection of tumor models for drug screening is crucial for developing next-generation therapies for precision oncology. If the insights from these initial screens were more transferable to primary tumors, more effective drugs could be developed. Beyond improved efficacy, a lot of money could be saved in preclinical and clinical trials through more precise trial design and a lower drug attrition rate.

Cell lines are widely used in vitro tumor models for drug discovery in basic cancer research. However, they are not perfect models of a tumor. Although cell lines are initially derived from patient samples, they can evolve in different directions in culture. They also cannot model the tumor microenvironment, which affects drug response and overall survival. In addition, not all cell lines are equal: some are more similar to actual tumors than others in terms of their molecular features. Despite these difficulties, cell lines are one of the first steps in screening the efficacy of drugs in development. Because they are derived from a patient's cancerous tissue, they still carry some of the genomic defects that occur in patients, even though they diverge from the original tissue over time in culture. This makes them a good starting point for drug efficacy screening. Drugs are usually screened either on a specific set of cell lines that carry the target gene or defect of interest, or in broader screens that cover a variety of genomic defects.

How can insights from cell lines be translated to patient-derived xenografts (PDX)?

PDXs are tumor models in which tissue or cells from a patient's tumor are implanted into an immunodeficient or humanized mouse. PDXs recapitulate drug response in primary tumors better than cell lines do. They are generally the next step after a drug has proven to be a promising candidate in cell line models.

If you have tested a limited set of cell lines, which PDX models would reproduce the drug response you see in those cell lines? To answer this question, you need a way to measure the distance between the molecular features of cell lines and PDX models. Your approach has to transfer the information learned from cell lines to the PDX models, and potentially also to primary tumors. One important challenge for such a transfer is that there are fundamental technical and biological differences between cell lines and PDX models. Yet some of these differences can be irrelevant to the models' drug response, so they should be removed or discounted.
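To make "distance between molecular features" concrete, here is a minimal Python sketch, assuming each model is summarized as a vector of expression values for the same genes in the same order. Using 1 minus Pearson correlation as the distance, and the helper names `pearson_distance` and `nearest_pdx`, are illustrative assumptions rather than anything prescribed here; the data are made up.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation between two equal-length feature vectors (0 = same shape)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)

def nearest_pdx(cell_lines, pdx_models):
    """Map each cell line to the PDX model with the smallest correlation distance."""
    return {
        cl_name: min(pdx_models, key=lambda p: pearson_distance(cl_vec, pdx_models[p]))
        for cl_name, cl_vec in cell_lines.items()
    }

# Made-up expression values for the same four genes, in the same order, in both datasets.
cell_lines = {"CL_lung": [8.0, 2.0, 5.0, 1.0], "CL_breast": [1.0, 7.0, 2.0, 6.0]}
pdx_models = {"PDX_lung": [7.5, 2.5, 4.0, 1.5], "PDX_breast": [2.0, 6.5, 1.0, 7.0]}

print(nearest_pdx(cell_lines, pdx_models))
# {'CL_lung': 'PDX_lung', 'CL_breast': 'PDX_breast'}
```

Correlation distance ignores overall scale and shift of a whole profile, but it does nothing about gene-specific technical differences between the two datasets, which is the harder problem discussed below.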

A simple approach would be to match known cancer-associated genetic signatures between cell lines and PDX models. However, this does not guarantee that the selected preclinical model will represent the human disease, or even the PDX. Drug response is more complicated than the presence or absence of a mutation targeted by a drug. A single mutation, or even a set of mutations, does not guarantee transferability; we need to consider more dimensions of molecular information when making such decisions. However, the more data types we use to describe tumor models, the easier it is to make mistakes when calculating the distance between cell lines and PDX models.
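The signature-matching approach can be sketched as a set-overlap score. This minimal Python example assumes each model is reduced to the set of genes in which it carries mutations; the Jaccard index and the function names are illustrative choices, and the mutation sets are made up.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_by_signature(cell_line_mutations, pdx_mutations):
    """Rank PDX models by how much their mutation set overlaps the cell line's."""
    scores = {name: jaccard(cell_line_mutations, muts) for name, muts in pdx_mutations.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up mutation sets; gene names are used only as labels here.
cell_line = {"KRAS", "TP53", "STK11"}
pdx_mutations = {
    "PDX_A": {"KRAS", "TP53"},
    "PDX_B": {"EGFR", "TP53"},
    "PDX_C": {"KRAS", "TP53", "STK11", "KEAP1"},
}

for name, score in rank_by_signature(cell_line, pdx_mutations):
    print(f"{name}: {score:.2f}")
# PDX_C: 0.75
# PDX_A: 0.67
# PDX_B: 0.25
```

Note what the score cannot see: PDX_C ranks first purely on shared mutations, with no information about expression or copy number, which is why a mutation match alone does not guarantee a matching drug response.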

Translating insights from cell lines into PDX: the wrong way

The figure below shows what happens when the information transfer is not done properly and distances are then calculated from the molecular features of cell lines and PDX models. The scatterplots show how cell lines and PDX models cluster based on their molecular features (gene expression, mutations, and copy number variation). As you can see, the PDX models and cell lines fall into completely separate clusters in the left-hand plot. They also do not group by tumor tissue of origin (right-hand plot). Because of the naive data integration used, you would not be able to match your cell lines of interest to the right PDX models.

Figure 1: Clustering based on molecular features of cell lines (CCLE, marked red in the left plot) and PDX models (marked blue in the left plot). The right plot is identical to the left one, but colored by the tumor of origin of the samples.
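The failure in Figure 1 can be reproduced on a toy scale. The minimal Python sketch below assumes made-up profiles for two tissues and a hypothetical gene-specific technical offset added to every PDX sample; it pools all samples and asks which other sample is closest to each one by Euclidean distance.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor(samples):
    """For each sample, return the name of the closest other sample in the pooled set."""
    return {
        name: min((other for other in samples if other != name),
                  key=lambda other: euclidean(vec, samples[other]))
        for name, vec in samples.items()
    }

# Made-up expression profiles for four genes; same tissue = same underlying biology.
lung = [8.0, 2.0, 6.0, 2.0]
breast = [2.0, 8.0, 2.0, 6.0]
# Hypothetical gene-specific technical offset affecting every PDX sample
# (e.g. a different sequencing protocol); it carries no biological signal.
pdx_offset = [9.0, 11.0, 10.0, 12.0]

samples = {
    "CL_lung": lung,
    "CL_breast": breast,
    "PDX_lung": [g + o for g, o in zip(lung, pdx_offset)],
    "PDX_breast": [g + o for g, o in zip(breast, pdx_offset)],
}

print(nearest_neighbor(samples))
# {'CL_lung': 'CL_breast', 'CL_breast': 'CL_lung', 'PDX_lung': 'PDX_breast', 'PDX_breast': 'PDX_lung'}
```

Every sample's nearest neighbor comes from its own dataset rather than from the same tissue, which is the pattern in the left panel of Figure 1: the technical offset outweighs the biology.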

Translating insights from cell lines into PDX: the right way

Data from cell lines and PDX models often lack a one-to-one correspondence. In addition, batch effects such as differences in sequencing techniques can affect how comparable cell lines are to PDX models. Both make it harder to transfer insights from one to the other. The discordant datasets must therefore be rectified so that they represent the same information during learning.
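As a rough illustration of what rectification means, here is a deliberately simple Python sketch. It is not the Arcas method: it swaps in per-dataset centering, i.e. subtracting each gene's mean within each dataset separately, which removes purely additive, gene-specific offsets. The data and the offset are made up, and the approach assumes both datasets contain a similar mix of tissues.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def center_per_dataset(dataset):
    """Subtract each gene's mean, computed within this dataset only."""
    profiles = list(dataset.values())
    means = [sum(gene) / len(profiles) for gene in zip(*profiles)]
    return {name: [v - m for v, m in zip(vec, means)] for name, vec in dataset.items()}

def match_to_pdx(cell_lines, pdx_models, center=True):
    """Map each cell line to its nearest PDX model, optionally centering each dataset first."""
    if center:
        cell_lines = center_per_dataset(cell_lines)
        pdx_models = center_per_dataset(pdx_models)
    return {
        name: min(pdx_models, key=lambda p: euclidean(vec, pdx_models[p]))
        for name, vec in cell_lines.items()
    }

# Made-up profiles for two tissues and a hypothetical additive PDX-specific offset.
lung = [12.0, 6.0, 10.0, 6.0]
breast = [6.0, 12.0, 6.0, 10.0]
offset = [5.0, -5.0, 4.0, -3.0]

cell_lines = {"CL_lung": lung, "CL_breast": breast}
pdx_models = {
    "PDX_lung": [g + o for g, o in zip(lung, offset)],
    "PDX_breast": [g + o for g, o in zip(breast, offset)],
}

print("raw:", match_to_pdx(cell_lines, pdx_models, center=False))
# raw: {'CL_lung': 'PDX_breast', 'CL_breast': 'PDX_breast'}
print("centered:", match_to_pdx(cell_lines, pdx_models))
# centered: {'CL_lung': 'PDX_lung', 'CL_breast': 'PDX_breast'}
```

On this toy data the raw distances send the lung cell line to the breast PDX, while centering recovers the tissue match. Real batch effects are rarely this clean: they can be multiplicative, nonlinear, or confounded with differences in tissue composition between datasets, which is why dedicated integration methods are needed.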

Arcas technology performs data integration and rectification simultaneously: the machine learning method focuses on biological similarities between datasets and is not biased by technical differences. In this way, data from cell lines and PDX models can be integrated, providing a way to match each cell line to its nearest neighboring PDX model and to generate insights from multiple datasets simultaneously.

The figure below shows the results of properly integrating cell lines and PDX models with the Arcas platform. The cell lines and PDX models are now intermixed (left panel) and cluster according to tumor of origin (right panel).

Figure 2: Clustering based on molecular features of cell lines (CCLE, marked red in the left plot) and PDX models (marked blue in the left plot). The right plot is identical to the left one, but colored by the tumor of origin of the samples.
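One way to put a number on the difference between Figure 1 and Figure 2 is a neighborhood mixing score: for each sample, the fraction of its k nearest neighbors that come from the other dataset, averaged over all samples. The Python sketch below is an illustrative assumption, not how the figures above were evaluated; the profiles are made up, and the "separated" set uses a hypothetical technical offset on the PDX samples.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mixing_score(samples, dataset_of, k=1):
    """Mean fraction of each sample's k nearest neighbors that belong to a different dataset."""
    fractions = []
    for name, vec in samples.items():
        # All other samples, sorted from closest to farthest.
        others = sorted((o for o in samples if o != name),
                        key=lambda o: euclidean(vec, samples[o]))
        neighbors = others[:k]
        fractions.append(sum(dataset_of[o] != dataset_of[name] for o in neighbors) / k)
    return sum(fractions) / len(fractions)

dataset_of = {"CL_lung": "cell_line", "CL_breast": "cell_line",
              "PDX_lung": "pdx", "PDX_breast": "pdx"}

# Made-up profiles: PDX samples shifted by a technical offset (separated) ...
separated = {"CL_lung": [8.0, 2.0, 6.0, 2.0], "CL_breast": [2.0, 8.0, 2.0, 6.0],
             "PDX_lung": [17.0, 13.0, 16.0, 14.0], "PDX_breast": [11.0, 19.0, 12.0, 18.0]}
# ... versus PDX samples that differ from their cell lines only by small noise (mixed).
mixed = {"CL_lung": [8.0, 2.0, 6.0, 2.0], "CL_breast": [2.0, 8.0, 2.0, 6.0],
         "PDX_lung": [8.5, 2.0, 6.0, 2.5], "PDX_breast": [2.5, 8.0, 2.0, 6.5]}

print(mixing_score(separated, dataset_of))  # 0.0
print(mixing_score(mixed, dataset_of))      # 1.0
```

A score of 0.0 means every sample's nearest neighbor comes from its own dataset, as in Figure 1; here the mixed set scores 1.0 because each PDX sits right next to its same-tissue cell line. Mixing alone is not enough, though: a useful integration should also keep samples grouped by tumor of origin, as in the right panel of Figure 2.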

In summary, our technology allows us to match cell lines to PDX models, or any other disease model to primary tumors. We also have the expertise to determine whether a disease model has diverged so far that it is no longer a useful model for any primary tumor. In this way, insights obtained in the preclinical phases of drug development can be projected to later stages with more confidence, resulting in better patient stratification and biomarker discovery and, consequently, lower costs and shorter timelines.
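A divergence check of the kind mentioned above could, in its simplest form, look like the Python sketch below. This is a hypothetical illustration, not the method referred to in the text: it assumes models and primary tumors have already been integrated into a shared feature space (here, made-up 2-D coordinates) and flags any model whose nearest primary tumor lies farther away than a chosen `max_distance` threshold.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def flag_diverged_models(models, tumors, max_distance):
    """Return the models whose nearest primary tumor is farther than max_distance.

    Assumes both inputs already live in a shared, integrated feature space.
    """
    flagged = []
    for name, vec in models.items():
        nearest = min(euclidean(vec, t) for t in tumors.values())
        if nearest > max_distance:
            flagged.append(name)
    return flagged

# Made-up coordinates in a hypothetical integrated embedding.
tumors = {"T1": [0.0, 0.0], "T2": [5.0, 5.0]}
models = {"M1": [0.5, 0.0], "M2": [5.0, 4.0], "M3": [10.0, -3.0]}

print(flag_diverged_models(models, tumors, max_distance=2.0))  # ['M3']
```

In practice the threshold would need calibrating, for example against typical distances between tumors of the same type; the value used here is arbitrary.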

Altuna Akalin

Bioinformatics Scientist, writing about data analysis, genomics, bioinformatics, and science. https://al2na.co