The CANDO plan


[everyone please copy in from relevant emails]

CANDO main page

Goals Progress
Compile structure and compound libraries
The goal of the CANDO project is to repurpose existing drugs to target neglected diseases in underserved populations. To accomplish this, we will use compounds that have passed the toxicity evaluation of FDA phase II studies. Compound libraries are available from KEGG, DrugBank, and ChEMBL. We are also selecting a set of high-resolution protein crystal structures to use in our drug binding simulations. From these structures we will select the most relevant regions to screen. These regions are chosen either because they have been co-crystallized with small molecules or by knowledge-based prediction algorithms. There are curated sets of these protein binding sites that have been derived from the Protein Data Bank. The set with the highest quality and most extensive annotation will be used as our gold standard for binding sites. In the case where a binding site is not known, or where we wish to explore the protein structure more extensively, we will use a combination of prediction methods. The strategy, which we used in CASP9, is to find structural homologues that have been bound to a small molecule and map the bound region onto our protein in question. In addition, we will use an in-house prediction algorithm, MFS, to predict clusters of amino acid residues that are deemed to be functionally important.
In Progress
Shotgun multitarget fragment based docking with dynamics
Our docking method is centered around the scoring function developed by Brady Bernard. We evaluate the probability of a small molecule binding to the protein by exhaustively scoring the inter-atomic distances between the drug and protein atoms.
In Progress
Cluster and reprioritise compound structure Matrix
Targeting all known protein structures with each drug will generate protein profiles that give us insight into how a drug is working at the atomic level. We will then incorporate known drug affinities into the profiles in order to establish the relative activity of each small molecule.
In Progress
Verify binding, inhibition, and correctness through in vitro and in vivo studies
For diseases where specific proteins are known to be involved, we will test drugs that are predicted to bind with high affinity and selectivity in relation to all other proteins. We will perform tests in-vitro to establish the activity of the drug on the protein of interest.
Not Yet Started
Demonstrate efficacy through well designed clinical studies
Compounds that have shown in-vitro activity will be moved into clinical trials. As these molecules are readily available and non-toxic at previously evaluated doses, we hope to start clinical trials within two years.
Not Yet Started
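
As a rough illustration of the docking goal above (exhaustively scoring the inter-atomic distances between drug and protein atoms), here is a minimal sketch. It is not the scoring function referred to in the table: the distance bins, the score values, and the names `pair_score` and `score_pose` are all made up for illustration, and a real knowledge-based potential would depend on atom types and be fitted to structural data.

```python
import math

# Hypothetical distance-binned scores in arbitrary units (lower = more favourable).
# These numbers are illustrative assumptions, not fitted values.
BIN_WIDTH = 1.0                                   # angstroms per bin
BIN_SCORES = [5.0, 5.0, 2.0, -1.0, -0.5, -0.2]    # bins 0-1, 1-2, ..., 5-6 angstroms
CUTOFF = BIN_WIDTH * len(BIN_SCORES)              # pairs beyond this contribute nothing


def pair_score(distance):
    """Look up the score for one ligand-protein atom pair by distance bin."""
    if distance >= CUTOFF:
        return 0.0
    return BIN_SCORES[int(distance // BIN_WIDTH)]


def score_pose(ligand_atoms, protein_atoms):
    """Sum the pair scores over every ligand-atom / protein-atom pair."""
    total = 0.0
    for ligand_xyz in ligand_atoms:
        for protein_xyz in protein_atoms:
            total += pair_score(math.dist(ligand_xyz, protein_xyz))
    return total


# Toy coordinates: two protein atoms and two single-atom ligand poses.
protein = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
good_pose = [(0.0, 3.5, 0.0)]    # sits at contact distance from the first atom
clash_pose = [(0.5, 0.0, 0.0)]   # overlaps the first atom

print(score_pose(good_pose, protein), score_pose(clash_pose, protein))
```

The exhaustive double loop costs one lookup per atom pair, which is why the per-compound cost matters once thousands of compounds are screened against tens of thousands of structures.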

Computational multidisease multitarget screening pipeline. Follow the link at right for a detailed description.



Within 2 years, we need to be able to produce sets of predicted hits/leads for any arbitrary disease ranked by probability (or some other measure) that it will work. That is, someone should be able to specify a set of structures (or a disease that selects a set of structures) and we need to be able to say what compound (or set of compounds) will likely bind, inhibit, or work in a druglike fashion against that entire collection the best, the second best, the third best, and so on. This compound/structure list is the Matrix and the first version of it (based on the Dunbrack 90 + DrugBank + some random compounds) should be done by this time. I think it's doable.
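
The query described above (specify a set of structures, get back compounds ranked best, second best, and so on) can be sketched as follows. This is only an assumed shape for the Matrix: the dict-of-dicts layout, the example scores, the name `rank_compounds`, and the choice to rank by the worst score across the set (so that a compound has to do well against the entire collection) are illustrative assumptions, since the text leaves the measure open ("probability (or some other measure)").

```python
def rank_compounds(matrix, structures, aggregate=min):
    """Rank compounds by an aggregate of their scores over the chosen structures.

    matrix[compound][structure] is assumed to be a score where higher means
    more likely to bind. `aggregate=min` ranks by the weakest hit in the set.
    """
    ranked = []
    for compound, row in matrix.items():
        scores = [row[s] for s in structures]
        ranked.append((compound, aggregate(scores)))
    # Best aggregate score first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy Matrix with made-up scores.
matrix = {
    "compound_A": {"prot_1": 0.9, "prot_2": 0.2, "prot_3": 0.8},
    "compound_B": {"prot_1": 0.6, "prot_2": 0.7, "prot_3": 0.1},
    "compound_C": {"prot_1": 0.5, "prot_2": 0.5, "prot_3": 0.9},
}

# A "disease" here is simply the set of structures it selects.
for compound, score in rank_compounds(matrix, ["prot_1", "prot_2"]):
    print(compound, score)
```

Passing a different `aggregate` (for example `max`, or a mean) changes what "best against the collection" means without touching the Matrix itself.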

For the ones for which we have downstream collaborators, especially those who will be testing in the clinic, we will then quickly verify our predictions in the wet lab, with Kd studies using Biacore SPR, to ensure that the compound at least binds. We can also do a few other preliminary studies to ensure there is indeed inhibition, perhaps even of function, as in the herpes protease case. We then pass the molecule(s) on to a well designed clinical study that can give us a clear answer on the efficacy of a drug on a small scale.

So within 5 years, we would provide proof of concept that the CANDO platform works and is a viable means of doing in virtuale drug discovery.

Worse than random?

As with CASP1, we can do worse than random when we are actually trying to DO stuff. While we'll be doing wet sanity checks, we could have pathological cases that score well but don't work, overwhelming our docking method. We need to watch out for these and weed them out. This will happen unless we are careful with our training/testing methods so that we don't introduce knowledge about the test set into our algorithm. The definition of "test set" is very broad; this problem also applies to "de novo" algorithms that don't have a training set.


The docking platform is currently "fragment based docking with dynamics", which, thinking about it, really is an integrated dynamics/fragmentation strategy. I used to separate it as: (1) incorporation of dynamics; (2) fragmentation of compounds at their rotatable bonds, then docking the fragments individually, doing the dynamics, and rejoining the most viable conformations.
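
The fragmentation step in (2) can be pictured as cutting a molecular graph at its rotatable bonds and collecting the connected pieces. The sketch below assumes the rotatable flag is already supplied for each bond (real perception of rotatable bonds would need a cheminformatics toolkit), and the function name and toy molecule are hypothetical. The cut bonds are returned as well, since they are what a later rejoining step would need.

```python
def fragment_at_rotatable_bonds(n_atoms, bonds):
    """Split a molecule into rigid fragments by removing rotatable bonds.

    bonds is a list of (atom_a, atom_b, is_rotatable) tuples over atom
    indices 0..n_atoms-1. Returns (fragments, cut_bonds).
    """
    adjacency = {i: set() for i in range(n_atoms)}
    cut_bonds = []
    for a, b, is_rotatable in bonds:
        if is_rotatable:
            cut_bonds.append((a, b))      # remember where to rejoin later
        else:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Connected components of what is left are the rigid fragments.
    seen = set()
    fragments = []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        fragments.append(sorted(component))
    return fragments, cut_bonds


# Toy six-atom chain with two rotatable bonds (1-2 and 3-4).
toy_bonds = [(0, 1, False), (1, 2, True), (2, 3, False), (3, 4, True), (4, 5, False)]
fragments, cut_bonds = fragment_at_rotatable_bonds(6, toy_bonds)
print(fragments, cut_bonds)
```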

Furthermore, rather than having something that will dock a single molecule to a single structure (which he has to do), Gaurav will deal with ~5000 molecules and ~50,000 structures. That's the input Brian will provide. The output Gaurav provides will be a Matrix with compounds as rows and structures as columns, along with the best 1, N, etc. ranked structures.
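
A minimal sketch of that output, assuming the Matrix is a plain list of rows (one per compound, one column per structure) and that a higher score is better. The scoring callable is a stand-in lookup into made-up numbers, not the docking method, and the names `build_matrix` and `top_structures` are hypothetical.

```python
def build_matrix(compounds, structures, score_fn):
    """Compounds as rows, structures as columns, one score per cell."""
    return [[score_fn(c, s) for s in structures] for c in compounds]


def top_structures(matrix, compounds, structures, n):
    """For each compound, return the names of its n best-scoring structures."""
    best = {}
    for name, row in zip(compounds, matrix):
        order = sorted(range(len(structures)), key=lambda j: row[j], reverse=True)
        best[name] = [structures[j] for j in order[:n]]
    return best


# Made-up scores standing in for real docking results.
TOY_SCORES = {
    ("cmpd_1", "struct_a"): 0.1, ("cmpd_1", "struct_b"): 0.8, ("cmpd_1", "struct_c"): 0.5,
    ("cmpd_2", "struct_a"): 0.9, ("cmpd_2", "struct_b"): 0.3, ("cmpd_2", "struct_c"): 0.4,
}

compounds = ["cmpd_1", "cmpd_2"]
structures = ["struct_a", "struct_b", "struct_c"]
toy_matrix = build_matrix(compounds, structures, lambda c, s: TOY_SCORES[(c, s)])
print(top_structures(toy_matrix, compounds, structures, n=2))
```

At ~5000 by ~50,000 the full Matrix has roughly 250 million cells, so in practice the rows would likely be computed and stored in chunks rather than held as nested Python lists.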

BAB/DEE type search

BAB/DEE type search built into the docking

BAB refers to branch and bound algorithms, which are used to solve discrete and combinatorial optimization problems. It is basically a set of strategies for enumeration. DEE is the dead-end elimination algorithm, which is also an optimization method, used basically to identify the parts of a combination that cannot be part of the global minimum.
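
To make the two ideas concrete, here is a small sketch on a toy problem: pick one conformation per fragment so that the sum of self energies and pairwise energies is minimal. The energies are made-up integers and the function names are hypothetical. The elimination test is the original single-conformation DEE criterion (discard r if its best case is still worse than the worst case of some alternative t), and the branch and bound uses a simple optimistic bound on the fragments not yet assigned. How either would actually be wired into the docking is not specified here.

```python
def pair(pair_e, i, r, j, s):
    """Pair energy of fragment i in conformation r with fragment j in conformation s."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(choice, self_e, pair_e):
    n = len(choice)
    energy = sum(self_e[i][choice[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][choice[i]][choice[j]]
    return energy


def dead_end_eliminate(self_e, pair_e):
    """Drop conformations that provably cannot be part of the global minimum."""
    n = len(self_e)
    alive = [set(range(len(e))) for e in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                best_r = self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j]) for j in others)
                for t in sorted(alive[i] - {r}):
                    worst_t = self_e[i][t] + sum(
                        max(pair(pair_e, i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:      # r can never beat t: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return [sorted(a) for a in alive]


def branch_and_bound(self_e, pair_e, candidates):
    """Depth-first search over fragments, pruning branches that cannot win."""
    n = len(self_e)
    # tail[k]: optimistic (lowest possible) energy added by fragments k..n-1.
    tail = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        tail[k] = tail[k + 1] + min(
            self_e[k][r] + sum(min(pair_e[(j, k)][s][r] for s in candidates[j])
                               for j in range(k))
            for r in candidates[k])
    best = [float("inf"), None]

    def search(depth, choice, energy):
        if energy + tail[depth] >= best[0]:   # bound: this path is not viable
            return
        if depth == n:
            best[0], best[1] = energy, tuple(choice)
            return
        for r in candidates[depth]:
            added = self_e[depth][r] + sum(
                pair_e[(j, depth)][choice[j]][r] for j in range(depth))
            choice.append(r)
            search(depth + 1, choice, energy + added)
            choice.pop()

    search(0, [], 0.0)
    return best[1], best[0]


# Toy energies: three fragments with 3, 2 and 3 conformations.
self_e = [[0, 2, 5], [1, 0], [3, 1, 0]]
pair_e = {
    (0, 1): [[0, 4], [1, 0], [9, 9]],
    (0, 2): [[2, 0, 3], [0, 1, 2], [9, 9, 9]],
    (1, 2): [[0, 2, 1], [3, 0, 2]],
}
pruned = dead_end_eliminate(self_e, pair_e)
best_choice, best_energy = branch_and_bound(self_e, pair_e, pruned)
print(pruned, best_choice, best_energy)
```

DEE shrinks the candidate lists before any enumeration happens, while branch and bound decides during the search whether a path down the tree is still viable; the two compose naturally, as in the last lines of the sketch.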

, and it's indeed the same idea but it's not doing the same thing obviously (in fact,

it sort of has become a fractal. working at atomic level

atomic level pipeline optimization

The pipeline works at the fragment level: keep the best options that can be joined. An external BAB/DEE will end where we don't have any joining to do (or the joining will still have to be done)

but rather simply whether or not a path down a search tree is viable or not. there is overlap...

the fragment based method allows for this and other optimisations.

don't need fragmentation for this.

You can do this at the atomic level and eliminate unlikely candidates, and you can use any forcefield or docking method for this optimisation when you're doing the shotgun style of approach, that's all.

The gain of speed per compound comes from doing it shotgun. It's a pretty obvious thing to do, but it needs to be done.

there should be a large number of published algorithms to do this (dig up old Communications of the ACM journals from the 70s, which is where I got my clique finding algorithm that works quite well).

Pipeline Implementation


1. Select a set of high resolution protein structures to screen against. The initial screen will use a curated set with a 2.5 Å resolution cutoff and 90% similarity.
2. Annotate binding sites for the docking screen. This can be done manually, with prediction software, or a combination of both.
3. Determine a set of small molecules to use in the screen. For the CANDO project, we are looking for a set of molecules that have passed toxicity studies and have shown drug-like properties. The best resources for this screen seem to be the 1400 FDA approved small molecules in DrugBank, as well as the molecules in clinical trials from ChEMBL.
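
Step 1 above can be sketched as a simple filter. The sketch makes two assumptions that the list does not state: that "2.5A resolution" is an upper limit, and that "90% similarity" means no two selected structures share more than 90% pairwise identity. The record layout, the precomputed `identity` table, and the name `select_structures` are hypothetical, and the greedy selection shown is just one common way to build a non-redundant set.

```python
def select_structures(records, identity, max_resolution=2.5, max_identity=0.90):
    """Keep well-resolved structures, skipping near-duplicates of ones already kept.

    records:  list of {"id": str, "resolution": float} (resolution in angstroms)
    identity: dict mapping frozenset({id_a, id_b}) to a pairwise identity in 0..1;
              missing pairs are treated as unrelated (0.0).
    """
    kept = []
    # Visit the best-resolved structures first so they represent their cluster.
    for rec in sorted(records, key=lambda r: r["resolution"]):
        if rec["resolution"] > max_resolution:
            continue
        is_redundant = any(
            identity.get(frozenset((rec["id"], k["id"])), 0.0) > max_identity
            for k in kept)
        if not is_redundant:
            kept.append(rec)
    return [r["id"] for r in kept]


# Made-up records and identities for illustration.
records = [
    {"id": "struct_1", "resolution": 1.8},
    {"id": "struct_2", "resolution": 2.2},
    {"id": "struct_3", "resolution": 3.0},
    {"id": "struct_4", "resolution": 2.4},
]
identity = {
    frozenset(("struct_1", "struct_2")): 0.95,
    frozenset(("struct_1", "struct_4")): 0.40,
}
print(select_structures(records, identity))
```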

Small Molecule Libraries

Binding Site Identification

paper with a bunch more, slightly older free computational screening tools

