A summary of a Vaticle Community talk by David Dylus, researcher in systems biology at Roche. The talk was delivered virtually at Orbit 2021 in April.

Central to drug discovery is finding relevant targets in disease mechanisms. Currently, however, the known targets have largely been tested already. In this project, David and his team designed a system of rules to infer and uncover hidden connections between targets and diseases.

In this talk, David shows how his team at Roche identified potential new targets that Open Targets had not ranked highly. This was made possible by TypeDB, which the team used to store the relevant data and then to find the underlying biological evidence for these new targets.

Three data sources were used in this project: STRING, OMA and DisGeNET.

STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases. This covers not only the literature, but also experimental evidence that one protein interacts with another. This means that we can restrict a search to proteins that have an experimentally validated protein-protein interaction.

OMA was used to add gene families, that is, paralogous genes: similar genes that arose as duplicates within the genome, have been retained, and may have similar functions.

Finally, DisGeNET was used to provide variant information, which allowed the team to link mutations to genes: for example, if you have a mutation in your genome, if that mutation is related to a gene, and if that gene is related to a disease, we can associate the disease with that mutation. This database was also mentioned by Tomás in his Orbit 2021 talk [Computational Future of Biology].
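The mutation-to-gene-to-disease chain described above can be sketched in a few lines of plain Python. This is only an illustration of the reasoning, not DisGeNET's data model or the team's code: the dictionaries, the identifiers in them, and the function name `diseases_for_variant` are all invented for the example.

```python
# Hypothetical toy data: none of these identifiers come from DisGeNET.
variant_to_genes = {
    "variant-1": {"GENE_A"},
    "variant-2": {"GENE_B", "GENE_C"},
}
gene_to_diseases = {
    "GENE_A": {"Disease"},
    "GENE_B": set(),
    "GENE_C": {"Disease", "OtherDisease"},
}


def diseases_for_variant(variant):
    """Follow variant -> gene -> disease links and collect the diseases reached."""
    diseases = set()
    for gene in variant_to_genes.get(variant, set()):
        diseases |= gene_to_diseases.get(gene, set())
    return diseases
```

A variant with no known gene link simply yields an empty set, which mirrors the fact that no association can be drawn without the intermediate gene.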

This leads us to ask a question like this:

Do people who carry this mutation also show some incidence of a particular disease?

To answer this, we need to check how close a specific mutation is to a gene, and then determine whether this variant somehow modulates that gene. That would explain why we see the phenotype of this type of disease.

Initially, the team looked at existing targets from the Open Targets database and selected those that already rank highly and are known to have high association scores. Unfortunately, because of internal Roche IP, the actual targets cannot be named, so they have been renamed here.

David called the high-ranking targets taken from Open Targets billionDollarTarget and bestTarget. They generally have high association scores, that is, they are strongly associated with a disease in which David's team is interested. In what follows, we will see how TypeDB can be used to find targets that do not rank highly in Open Targets, but that still indirectly modulate the disease and are therefore potentially valuable targets to study.

For this purpose, David built a schema and a set of rules in TypeQL. Below is just a very small excerpt of how such data can be modeled, taken from BioGrakn-Covid, a Vaticle community project led by Konrad Mysliwiec (Data Science Software Engineer, Roche). Note that this is only a selected excerpt; the full schema can be found in the BioGrakn-Covid schema file.

define

gene sub fully-formed-anatomical-structure,
    owns gene-symbol,
    owns gene-name,
    plays gene-disease-association:associated-gene;

disease sub pathological-function,
    owns disease-name,
    owns disease-id,
    owns disease-type,
    plays gene-disease-association:associated-disease;

protein sub chemical,
    owns uniprot-id,
    owns uniprot-entry-name,
    owns uniprot-symbol,
    owns uniprot-name,
    owns ensembl-protein-stable-id,
    owns function-description,
    plays protein-disease-association:associated-protein;

protein-disease-association sub relation,
    relates associated-protein,
    relates associated-disease;

gene-disease-association sub relation,
    owns disgenet-score,
    relates associated-gene,
    relates associated-disease;

With the right schema, rules, and data in place, we can write the first query. The relation queried below is one that David's team called gene-disease-inference; its attribute order with the value 1 means that it is a direct relation. The query looks like this:

match
$d isa disease, has disease-name "Disease";
$r ($gene, $d) isa gene-disease-inference, has order 1;
get $r, $d, $gene;

In the result, we can see that billionDollarTarget, bestTarget, and youWillNeverGuessTarget are linked to Disease. We also see that these three targets have order: 1, indicating a direct and previously known link between the disease and the genes. However, the goal is to find new targets.

To this end, the team wrote the query below. It looks for diseases and genes that are linked by a gene-disease-inference relation with order: 2, but explicitly excludes those that are already connected by a gene-disease-inference relation with order: 1:

match
$d isa disease, has disease-name "Disease";
$r ($gene, $d) isa gene-disease-inference, has order 2;
not {($gene, $d) isa gene-disease-inference, has order 1;};
get $r, $d, $gene;
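The effect of the negation in this query can be mimicked in plain Python as a set difference: take everything reachable at order 2 and drop whatever is already known at order 1. This is a sketch of the filtering logic only, not of how TypeDB evaluates the query. The tuple layout and the function name `novel_targets` are assumptions, and the extra order-2 row for bestTarget is a hypothetical added to show the exclusion at work.

```python
# (gene, disease, order) triples; target names follow the article's placeholders.
inferences = [
    ("billionDollarTarget", "Disease", 1),
    ("bestTarget", "Disease", 1),
    ("youWillNeverGuessTarget", "Disease", 1),
    ("whatCouldIBeTarget", "Disease", 2),
    ("awesomeTarget", "Disease", 2),
    ("thatTarget", "Disease", 2),
    ("bestTarget", "Disease", 2),  # hypothetical: also reachable indirectly
]


def novel_targets(inferences, disease):
    """Genes linked to the disease at order 2 but not already linked at order 1."""
    direct = {g for g, d, order in inferences if d == disease and order == 1}
    indirect = {g for g, d, order in inferences if d == disease and order == 2}
    return sorted(indirect - direct)


print(novel_targets(inferences, "Disease"))
# ['awesomeTarget', 'thatTarget', 'whatCouldIBeTarget']
```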

This query returns a completely different list of genes: whatCouldIBeTarget, awesomeTarget and thatTarget. All of these are targets that are connected to Disease through a gene-disease-inference with order: 2.

If you are not yet familiar with TypeDB Workbase: you can right-click one of the inferred relations and select Explain from the drop-down menu. This explains the inferences and shows how these targets relate to the disease through the roles that they play.

If we explain the inference that links deadTarget, we see that this target is part of the same gene family as youWillNeverGuessTarget, which has order: 1. This inference was made by a rule, which allows us to infer new data from existing data. In this case, we found a previously unknown, indirect connection between the two targets.

The logic of the rule that produces this inference breaks down as follows:

If:

a gene target is associated with a disease

and this target is also in the same gene family as another target that has already been found to have a strong association with the disease

Then:

this gene target and the disease should be linked through a gene-disease-inference relation
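The same rule logic can be sketched in plain Python. This is one reading of the rule as described here, not the team's TypeQL rule: the function name `infer_by_gene_family`, the family identifiers, and the genes unlinkedGene and lonelyGene are invented for illustration.

```python
def infer_by_gene_family(associated_genes, strong_genes, gene_families):
    """Return genes that are associated with the disease and share a gene
    family with a gene already known to be strongly associated with it."""
    inferred = set()
    for members in gene_families.values():
        # Only families containing at least one strongly associated gene count.
        if members & strong_genes:
            inferred |= (members & associated_genes) - strong_genes
    return inferred


# Hypothetical toy data built around the article's placeholder names.
associated = {"youWillNeverGuessTarget", "deadTarget", "lonelyGene"}
strong = {"youWillNeverGuessTarget"}
families = {
    "family-1": {"youWillNeverGuessTarget", "deadTarget", "unlinkedGene"},
    "family-2": {"lonelyGene"},
}

print(infer_by_gene_family(associated, strong, families))
# {'deadTarget'}
```

In this sketch unlinkedGene is dropped because it has no association with the disease at all, and lonelyGene is dropped because its family contains no strongly associated gene, so both conditions of the rule must hold.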

For the other new targets, awesomeTarget and bestTarget, we see that the inferences are based on a protein-protein interaction that connects them to whatCouldIBeTarget. If we explain that relation, we see that it is connected to billionDollarTarget through a gene-disease-association, and that the two possibly share the same disease variant.

Although awesomeTarget and thatTarget appear in the Open Targets database for the disease of interest, they rank very low. That means they had some connection to this disease, but not a strong one. TypeDB uncovered new evidence suggesting that these targets could deserve a higher ranking.

In this way, David's team at Roche was able to leverage TypeDB's reasoning engine to find new targets that might have been missed with standard or more direct approaches.

Biology is a very complex field that is constantly evolving. Data that held true in the past may not hold true today. We are constantly dealing with new confounders, different methods, and biological noise. The goal is to find new ways to modulate the disease that are backed by strong biological evidence.

Finding a new target does not necessarily mean that it is a solution or that it is ready for experiments. However, it is a good hypothesis from which to start digging into its effectiveness in modulating a particular disease, whether to find a cure or to provide better treatments for patients.

Instead of targeting a single protein, more advanced targeting can be done by integrating additional information such as protein complexes and pathways. For example, we could look for multiple genes that are part of the same pathway. If a drug is unable to modulate one target enough to cause a positive change in the patient's condition, we may consider targeting multiple points along the same pathway.
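As a rough illustration of the pathway idea, the plain-Python sketch below groups candidate genes by pathway and keeps only the pathways that contain several candidates. The pathway names, the genes, and the function `pathways_with_multiple_targets` are hypothetical; the talk does not describe how such a lookup would be implemented.

```python
from collections import defaultdict


def pathways_with_multiple_targets(gene_pathways, candidates, minimum=2):
    """Group candidate genes by pathway; keep pathways with at least `minimum` of them."""
    by_pathway = defaultdict(set)
    for gene in candidates:
        for pathway in gene_pathways.get(gene, ()):
            by_pathway[pathway].add(gene)
    return {p: genes for p, genes in by_pathway.items() if len(genes) >= minimum}


# Hypothetical toy data: which pathways each gene takes part in.
gene_pathways = {
    "geneA": ["pathway-1"],
    "geneB": ["pathway-1", "pathway-2"],
    "geneC": ["pathway-2"],
    "geneD": ["pathway-3"],
}

print(pathways_with_multiple_targets(gene_pathways, ["geneA", "geneB", "geneD"]))
# {'pathway-1': {'geneA', 'geneB'}}
```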

David also mentioned that he was considering extending the rules, for example to infer higher-order relations and so examine third-, fourth-, or fifth-degree connections to a particular disease. There is also room to expand beyond protein-protein interactions and to include very specific query constraints. For example, we could require that genes X and Y are part of the same pathway, are expressed in the same cell type, are shown to be up- or down-regulated in the disease, and so on. Such constraints act like boundary conditions: they sharpen our targeting and make it far more valuable to our process.
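One way to picture higher-order connections is a breadth-first search that starts from the directly associated genes and assigns every gene the smallest order at which it is reached. The sketch below is plain Python under the assumptions that interactions are undirected and that order 1 means a direct association; the function name `inference_orders` and the gene names are invented, and this is not how the team's rules are written.

```python
from collections import deque


def inference_orders(direct_genes, neighbours, max_order):
    """Assign each reachable gene the lowest order at which it connects to the disease."""
    orders = {gene: 1 for gene in direct_genes}
    queue = deque(direct_genes)
    while queue:
        gene = queue.popleft()
        if orders[gene] >= max_order:
            continue  # do not expand beyond the requested degree
        for other in neighbours.get(gene, ()):
            if other not in orders:
                orders[other] = orders[gene] + 1
                queue.append(other)
    return orders


# Hypothetical interaction chain: geneA - geneB - geneC - geneD.
neighbours = {
    "geneA": ["geneB"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneB", "geneD"],
    "geneD": ["geneC"],
}

print(inference_orders(["geneA"], neighbours, max_order=3))
# {'geneA': 1, 'geneB': 2, 'geneC': 3}
```

Capping the search with `max_order` keeps the result set manageable, since every extra degree tends to pull in many more weakly related genes.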

