Applying multi-parameter optimisation in drug discovery

Jan 19, 2013

MPO methods have been used in a wide range of fields from engineering to economics and, more recently, drug discovery. In this article Matthew Segall, CEO, Optibrium, discusses how MPO can be applied effectively in drug discovery to guide rigorous and objective decisions on the selection and design of compounds.

Matthew Segall

Finding a successful drug is a delicate balancing act. It is necessary to simultaneously optimise many, often conflicting, requirements to identify a compound that will ultimately become a safe and efficacious drug. Methods for guiding this process, commonly referred to as multi-parameter optimisation (MPO), have been developed [1], and in this article we will explore how these can be applied in practice to improve productivity and efficiency in drug discovery.

When searching for a potential drug it is not sufficient to find a highly potent compound against the intended therapeutic target; selectivity against off-targets, appropriate pharmacokinetics and an absence of toxicity at the therapeutic dose are also necessary to reach the market and achieve a strong position. Unfortunately, these requirements are often conflicting; for example, increasing lipophilicity will often improve potency but this is also correlated with poor absorption, increased metabolic clearance and a higher chance of non-specific toxicity. The high rate of attrition in pharmaceutical R&D and the increasing cost attest to the challenge that this balancing act presents.

One key to reducing costs and reducing late stage attrition is to simultaneously consider as many compound properties as possible from the earliest stages of drug discovery. By identifying high quality compounds with a good balance of properties as early as possible, resources can be focused on the areas of chemistry with a high chance of downstream success. An overly narrow focus on a single property, typically target potency, early in the optimisation process can be risky. Avoiding this reduces the chance of encountering a dead end, where a critical property cannot be achieved within a potent lead series, leading to many, long iterations in lead optimisation.

The need to generate data on many properties for potentially large numbers of compounds has led to the development of high throughput in vitro assays and in silico models for a wide range of physicochemical, absorption, distribution, metabolism and elimination (ADME) and toxicity endpoints. However, the avalanche of data that these can generate poses a new challenge for drug discovery scientists: how to analyse these data effectively in order to make good, quick decisions regarding the selection and design of compounds. The human brain is not reliable when juggling complex data to make decisions, and unconscious biases can often impair efficiency and productivity [2]. Furthermore, this challenge is heightened by the fact that the data generated in early discovery almost always have significant uncertainty due either to experimental variability or statistical error in predictive models.

This underlying uncertainty brings its own challenge: even with the best experimental or predictive methods in early discovery, it is impossible to say with confidence that a given chemistry will achieve the goals of a project. Furthermore, it is easy to incorrectly discard a compound based on an uncertain piece of data, leading to missed opportunities to find a good drug. Therefore, while it is important to focus quickly on the best chemistry for a drug discovery project, it is also necessary to first explore broadly. Where possible, a range of avenues for exploration should be identified, which can be studied in detail to validate the initial hypothesis and confirm the direction the project should take.

Multi-parameter optimisation

Figure 1: An example of a multi-parameter scoring profile defining the properties of interest, the criterion for each property and the relative importance of those criteria. Underlying each criterion is a desirability function defining the relationship between a compound’s property value and how likely it is to achieve the project’s objective. An example is shown to the right, in blue, for the target potency (pKi). This indicates that ideally the pKi would be greater than 8 (Ki lower than 10 nM), below a pKi of 7 (Ki greater than 100 nM) the compound would not be of interest, and between a pKi of 7 and 8 the desirability increases linearly. The histogram in the background shows the distribution of pKi values in the data set

A wide range of MPO methods have been applied to compound optimisation in drug discovery; a detailed review may be found in reference [1]. Probably the most common approach is the use of rules of thumb, such as Lipinski’s Rule of Five (RoF) [3], that provide guidelines for the characteristics of compounds that are similar to those of successful drugs. These are very easy to interpret and apply, which has led to their undoubted popularity and positive effect on the quality of compounds. However, these simple rules have significant limitations. First, the correlations between the simple characteristics employed by these rules and the complex biological endpoints are not strong. Second, the rules are typically defined for a particular goal, e.g. oral absorption in the case of the RoF, and applying them to select compounds for other objectives may not be appropriate. Third, these simple rules are often applied as filters, making inappropriately harsh distinctions between compounds; for example, a compound with a calculated logP of 5.01 is not significantly worse than one with a logP of 4.99, particularly given that such a calculation typically has an uncertainty of approximately 0.5 log units. These limitations can lead to missed opportunities and wasted effort.
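The logP example can be made concrete with a short calculation. The sketch below assumes that the prediction error is Gaussian with a standard deviation of 0.5 log units (the Gaussian form is an assumption made for illustration) and compares a hard cut-off at 5 with the probability that the true logP is at or below 5. The function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a Gaussian with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def passes_hard_filter(predicted_logp, threshold=5.0):
    """Rule-of-thumb style filter: a strict yes/no on the predicted value."""
    return predicted_logp <= threshold


def probability_below(predicted_logp, threshold=5.0, sd=0.5):
    """Probability that the true logP is at or below the threshold,
    assuming Gaussian error (sd of 0.5 log units, as quoted in the text)."""
    return normal_cdf(threshold, predicted_logp, sd)


for logp in (4.99, 5.01):
    print(logp, passes_hard_filter(logp), round(probability_below(logp), 3))
```

Under these assumptions the hard filter accepts one compound and rejects the other, while the two probabilities differ by less than 0.02, which is the sense in which the filter draws an inappropriately harsh distinction.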

Figure 2: Graph illustrating the output of probabilistic scoring. The compounds in a data set are ordered along the x-axis from the highest to lowest scoring. The score is plotted on the y-axis along with error bars showing the uncertainty (1 standard deviation) in the overall score due to the uncertainty in the underlying data

More sophisticated methods for MPO can bring together data of any type, experimental or predicted, allowing acceptable trade-offs to be defined and allowing for uncertainty in the underlying data to be rigorously taken into account. For example, the Probabilistic Scoring method, employed by the StarDrop software [4], allows a desired profile of property criteria to be defined by a user or project team as illustrated in Figure 1. This ‘scoring profile’ allows the requirements for each property to be defined along with the relative importance of each individual criterion to the overall success of the project. In addition to simple thresholds or ranges, more subtle relationships between the property value and a compound’s desirability can be defined as a ‘desirability function’ (also illustrated in Figure 1). Once this is defined, the data for each compound are assessed against the scoring profile, taking into account the uncertainty in the data due to experimental variability or statistical error in a prediction. The result is a score that estimates the likelihood of success of the compound against the ideal profile of properties and an uncertainty in the overall score. This allows compounds with a good balance of properties to be easily prioritised and makes it clear when compounds can be distinguished with confidence. This enables the project team to focus effort on chemistries with the highest chance of success while avoiding missed opportunities when the data do not support confident rejection of compounds. This is illustrated in Figure 2.
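The idea of assessing uncertain data against a desirability function can be sketched in a few lines. The example below is not the StarDrop implementation; it is a minimal illustration that assumes Gaussian uncertainty on each property, averages the desirability over that distribution, and multiplies the results across properties. The pKi ramp follows the description in Figure 1; the logP criterion, the compound values and all function names are invented, and the relative importance of criteria is omitted for brevity.

```python
import math


def linear_ramp(value, low, high):
    """Desirability of 0 below `low`, 1 above `high`, linear in between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def expected_desirability(mean, sd, desirability, steps=400):
    """Average a desirability function over a Gaussian centred on `mean`.

    Integrates numerically over mean +/- 4 sd with the midpoint rule and
    normalises by the sampled weight, so the result stays within [0, 1].
    """
    if sd == 0:
        return desirability(mean)
    width = 8.0 * sd / steps
    total = weight_sum = 0.0
    for i in range(steps):
        x = mean - 4.0 * sd + (i + 0.5) * width
        weight = math.exp(-0.5 * ((x - mean) / sd) ** 2)
        total += weight * desirability(x)
        weight_sum += weight
    return total / weight_sum


def overall_score(compound, profile):
    """Multiply the expected desirabilities of all properties in the profile."""
    score = 1.0
    for name, desirability in profile.items():
        mean, sd = compound[name]
        score *= expected_desirability(mean, sd, desirability)
    return score


# Hypothetical profile: the pKi ramp from Figure 1 plus an invented logP criterion.
profile = {
    "pKi": lambda v: linear_ramp(v, 7.0, 8.0),
    "logP": lambda v: 1.0 - linear_ramp(v, 3.5, 5.0),
}
# Each property is (value, standard deviation); the numbers are made up.
compound = {"pKi": (7.5, 0.3), "logP": (3.0, 0.5)}
print(round(overall_score(compound, profile), 3))
```

A compound whose pKi sits in the middle of the ramp receives an expected desirability of about 0.5 for that property regardless of the uncertainty, whereas a value close to a threshold is penalised or rewarded only in proportion to how likely it is to fall on either side.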


Figure 3. Examples of two selections of 20 compounds from a set of 267. For each selection, the ‘chemical space’ is plotted to illustrate the diversity of the full data (the diversity is defined by Tanimoto similarity of 2D fingerprints) and each point is coloured by score from high (yellow) to low (red). A similar graph to that shown in Figure 2 is also plotted for each selection. (a) illustrates a selection biased towards score, with the selected compounds shown in light blue. This shows that the compounds are selected from the highest-scoring compounds, but that they are focussed on a few small regions of similar chemistry. (b) illustrates a selection biased more towards diversity, selecting a wider range of both diversity and score

Given the uncertainty in the data and hence in the overall assessment of compound quality, it may not be appropriate to focus too heavily on a single series of closely related compounds. In particular, early in a project it is important to explore a broad range of diverse chemistries to mitigate risk, investigate potential backup series and, where predictive models are being used, validate the predicted hypothesis. Achieving this balance between quality and diversity is also a form of MPO and a number of approaches have been developed to assist the exploration of this trade-off. It is not possible to sort compounds by their diversity, because diversity is a property of a set of compounds. Achieving a good balance of diversity and quality therefore requires many different selection strategies to be explored; for example, there are approximately 3×10^25 ways to select 30 compounds from a set of 100. Many of the methods to guide the exploration of this trade-off are based on ‘genetic’ algorithms that use the principles of evolution to ‘evolve’ a population of different selection strategies and identify one that achieves an appropriate balance [5]. Examples of two such trade-offs can be seen in Figure 3, showing the effects of changing the bias in the selection from quality (score) to diversity.
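The size of the search space, and the flavour of an evolutionary search over selections, can be illustrated as follows. This is a deliberately simple, mutation-only sketch rather than any published algorithm: the fitness is an assumed weighted blend of mean score and mean pairwise distance, the compounds are placed on a line instead of in a fingerprint-based chemical space, and all names and data are hypothetical.

```python
import math
import random


def mean_pairwise_distance(selection, distance):
    """Average distance over all pairs in the selection (a set-level property)."""
    pairs = [(a, b) for i, a in enumerate(selection) for b in selection[i + 1:]]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def fitness(selection, scores, distance, diversity_weight):
    """Weighted blend of mean score and mean pairwise distance."""
    quality = sum(scores[c] for c in selection) / len(selection)
    diversity = mean_pairwise_distance(selection, distance)
    return (1.0 - diversity_weight) * quality + diversity_weight * diversity


def evolve_selection(scores, distance, size, diversity_weight,
                     population=30, generations=200, seed=0):
    """Mutation-only evolutionary search over subsets of `size` compounds."""
    rng = random.Random(seed)
    compounds = sorted(scores)
    pool = [rng.sample(compounds, size) for _ in range(population)]

    def key(selection):
        return fitness(selection, scores, distance, diversity_weight)

    for _ in range(generations):
        # Keep the fitter half, then refill the pool with mutated copies.
        parents = sorted(pool, key=key, reverse=True)[: population // 2]
        children = []
        for parent in parents:
            child = list(parent)
            outside = [c for c in compounds if c not in child]
            child[rng.randrange(size)] = rng.choice(outside)  # swap one member
            children.append(child)
        pool = parents + children
    return max(pool, key=key)


# The count quoted in the text: ways to choose 30 compounds from 100.
print(f"{math.comb(100, 30):.2e}")

# Toy data: compounds sit on a line; scores and positions are invented.
positions = {f"c{i}": i / 19 for i in range(20)}
scores = {name: 1.0 - abs(pos - 0.2) for name, pos in positions.items()}


def distance(a, b):
    return abs(positions[a] - positions[b])


print(sorted(evolve_selection(scores, distance, size=4, diversity_weight=0.0)))
print(sorted(evolve_selection(scores, distance, size=4, diversity_weight=1.0)))
```

Changing `diversity_weight` between 0 and 1 moves the selection from a tight cluster of high-scoring compounds towards a spread across the whole set, mirroring the two selections in Figure 3.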

Practical application of MPO

MPO algorithms provide a powerful basis to guide the efficient identification of high quality compounds. But how can they be applied in practice in drug discovery?

The full value of these approaches can only be realised if they are easily accessible to all decision-makers in a project. The majority of these are experimental scientists responsible for the design, synthesis and testing of compounds. Analysis by computational experts provides valuable insights into design decisions; however, this analysis often introduces a delay before the results can be determined and reported back. The greatest impact on project decisions comes when many strategies can be explored, with instant feedback, before reaching a confident decision. Therefore, it is important that access is via a user-friendly interface through which it is possible to define multi-parameter objectives and then interpret the results in a visual way.


Figure 4. Two examples of visual feedback that help to guide the redesign of compounds in order to improve the overall chance of success. The histogram in (a) indicates the impact of each individual property on the overall score; a high bar indicates a high confidence that the property is good while a low bar indicates a significant negative impact of a property value (the colours correspond to the key in Figure 1). This suggests that the most significant risks for this compound are due to high logP and hERG inhibition. (b) is an example of the Glowing Molecule that shows the key structural influences on a predicted property of a compound, in this case logP. The red regions are those that have a significant impact increasing the predicted property, while the blue regions correspond to regions with a significant impact decreasing the property value

It is also important that the algorithms are not ‘black boxes’ that accept data and output a result with little or no explanation. The objective is to provide tools to guide the decisions of experts, not to make a decision automatically that is then accepted blindly. Therefore, the output of the analysis must provide clearly interpretable results and guidance on potential issues or strategies for improvement. Some illustrative examples are shown in Figure 4.

MPO methods may be applied throughout the drug discovery process. When designing libraries for use in high throughput screening it is important to cover a wide diversity of chemistry to ensure the greatest chance of finding a potent hit. However, even here, it is beneficial to select compounds with appropriate properties to provide good starting points for hit-to-lead wherever possible. In focussed library design, the results of virtual screens can be combined with predictions of other important compound properties to provide good quality hits. During hit-to-lead, the goal is to identify one or more high quality lead series; here it is important to find series with good ADME properties and no overt toxicity in order to give the best chance of rapid progress through lead optimisation. At this hit-to-lead stage, it is important to explore as many options as possible to minimise the chance of getting locked into a lead series with a consistent problem. It can often be difficult to ‘hop’ to a new lead series in order to resolve a problem without sacrificing potency, which can lead to additional, time consuming iterations, increasing the cost and time of lead optimisation. Finally, during lead optimisation, MPO can help to optimise all of the required parameters simultaneously; too much focus on optimisation of a single property can lead to the sacrifice of other important factors which must then be re-optimised in turn, increasing the number of design-synthesis-test iterations.

Early in the drug discovery process, when large libraries of compounds (virtual or synthesised) are often considered, little or no experimental data may be available. In this case, the design and prioritisation of compounds will be primarily guided by data from predictive models. This has been the most common way in which MPO has been applied, due to the quantity and complexity of the data that may be generated. MPO of predicted properties may also be used to guide the exploration of very large numbers of virtual ideas, automatically generated from a starting structure in order to increase the breadth of the search for optimisation strategies around hit or lead compounds [6, 7], and to prioritise the most interesting ideas for detailed consideration by an expert.

However, with the increasing quantity of in vitro ADME and toxicity data that is now routinely generated, even in the earliest stages of drug discovery, MPO has become equally valuable for the effective use of these experimental data to select compounds for further investigation. An example application of MPO to a data set comprising only in vitro data, to identify compounds with improved in vivo disposition in lead optimisation, is described in reference [8].

Examples

Many examples of the application of MPO to the challenges of drug discovery have been discussed elsewhere [1, 8]. Here we will summarise two recent examples.

MPO-Guided automatic idea generation

Figure 5. An illustration of the chemical space explored around the initial lead that led to the discovery of the drug Duloxetine. The points are coloured by score, from the lowest (0.29) in red to the highest (0.69) in yellow. The initial lead is shown as a dark blue diamond, Duloxetine as a green diamond. The top-three scoring compounds are shown as purple diamonds along with their structures. In this plot, each point represents a compound and the distance between two points indicates their structural similarity; close points are structurally similar while distant points are structurally diverse

Reference 7 illustrates the application of MPO, coupled with automatic idea generation, to the lead compound that ultimately led to the discovery of the serotonin reuptake inhibitor Duloxetine, using the Nova tool in StarDrop. A library of 206 ‘medicinal chemistry transformations’, representing typical compound optimisation steps, was applied iteratively to the lead structure to create three ‘generations’ of related compounds. This could have generated approximately 1.7 million compounds if all possible combinations had been enumerated. Therefore, to control this, only the top 10 per cent were selected from each generation, based on a probabilistic score calculated from properties predicted using quantitative structure activity relationship (QSAR) models of target potency and key ADME properties. After three generations this resulted in a total of approximately 2,200 compound ideas that explored the ‘chemical space’ around the initial lead and proposed a diverse range of interesting structures, as illustrated in Figure 5.
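The generate, score and prune cycle described here can be sketched generically. The code below is not the Nova algorithm: it is a minimal illustration in which ‘compounds’ are integers, ‘transformations’ are simple offsets and the score is the closeness to an arbitrary target, all of which are hypothetical stand-ins for chemical structures, medicinal chemistry transformations and a probabilistic score.

```python
def explore(seed, transformations, score, generations=3, keep_fraction=0.1):
    """Apply every transformation to each retained idea, keep the best-scoring
    fraction of each generation, and return all retained ideas."""
    current = [seed]
    retained = []
    seen = {seed}
    for _ in range(generations):
        candidates = []
        for parent in current:
            for transform in transformations:
                child = transform(parent)
                # Skip transformations that do not apply and duplicate ideas.
                if child is not None and child not in seen:
                    seen.add(child)
                    candidates.append(child)
        candidates.sort(key=score, reverse=True)
        keep = max(1, round(len(candidates) * keep_fraction))
        current = candidates[:keep]
        retained.extend(current)
    return retained


# Toy stand-ins: 20 'transformations' that shift an integer by -10..10.
transformations = [lambda x, d=d: x + d for d in range(-10, 11) if d != 0]


def score(x):
    """Higher is better; the target value of 25 is arbitrary."""
    return -abs(x - 25)


ideas = explore(0, transformations, score)
print(ideas)
```

Pruning each generation keeps the number of ideas manageable while still allowing the search to reach candidates several steps away from the starting point, which is the role played by the 10 per cent cut-off in the study described above.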

Among the top-scoring compounds in the final generation was the drug Duloxetine; its score was statistically equivalent to that of the top-ranked compound, and it was predicted to be better than the lead with a confidence of approximately 90 per cent. Furthermore, the second-ranked compound generated was very similar to another clinical candidate, Litoxetine, differing only in the substitution point of the side chain on the core naphthalene ring and the addition of a single methyl.
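One way to express such a confidence, given two scores and their uncertainties, is the probability that one true score exceeds the other. The sketch below assumes independent Gaussian errors on the two scores, which is an assumption made here for illustration and not necessarily how the published figure was derived; the scores and standard deviations are invented.

```python
import math


def probability_better(score_a, sd_a, score_b, sd_b):
    """P(true score of A > true score of B) for independent Gaussian errors."""
    combined_sd = math.hypot(sd_a, sd_b)
    if combined_sd == 0:
        return 1.0 if score_a > score_b else 0.0
    z = (score_a - score_b) / combined_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Invented scores and uncertainties, for illustration only.
print(round(probability_better(0.60, 0.10, 0.42, 0.10), 2))
```

Two compounds with identical scores give a probability of 0.5, which corresponds to the ‘statistically equivalent’ case in which the data cannot separate them.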

This demonstrated that the combination of idea generation using medicinal chemistry transformations with predictive models and MPO can propose relevant and interesting structures for consideration during drug discovery.

Quantitative estimate of drug likeness

Bickerton et al. [9] introduced a metric that estimates the similarity of a compound’s characteristics to those of known drugs. The quantitative estimate of drug likeness (QED) they propose is a generalisation of simple rules of thumb such as the RoF to provide a single numerical measure of ‘drug-likeness’.

The QED was constructed by examining the frequency distributions of molecular weight, lipophilicity, numbers of hydrogen bond donors and acceptors, polar surface area, number of rotatable bonds, number of aromatic rings and number of structural alerts (i.e. undesirable substructures) for 771 known drugs. Desirability functions were fitted to each of these distributions, such that compounds with a value for which a high frequency of known drugs are observed will receive a high desirability score. The overall QED can then be calculated for a compound by combining the desirability scores for its individual characteristics into a single desirability value by taking a weighted geometric mean, which indicates the similarity of the compound to known drugs.
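The combination step can be written out directly. The sketch below implements a weighted geometric mean, exp(Σ w_i ln d_i / Σ w_i); the desirability values and weights shown are invented for illustration and are not the fitted values from the original study.

```python
import math


def weighted_geometric_mean(desirabilities, weights):
    """exp(sum(w_i * ln d_i) / sum(w_i)); every desirability must be > 0."""
    total_weight = sum(weights[name] for name in desirabilities)
    log_sum = sum(weights[name] * math.log(d) for name, d in desirabilities.items())
    return math.exp(log_sum / total_weight)


# Hypothetical desirabilities (0 to 1) and weights for three characteristics.
desirabilities = {"molecular_weight": 0.9, "lipophilicity": 0.8, "alerts": 0.2}
weights = {"molecular_weight": 1.0, "lipophilicity": 1.0, "alerts": 1.0}

print(round(weighted_geometric_mean(desirabilities, weights), 3))
```

Because the mean is geometric, a single poor characteristic pulls the overall value down more strongly than it would in an arithmetic average, so a compound cannot fully compensate for one very undesirable property with good values elsewhere.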

Unlike rules of thumb, which classify compounds as ‘good’ or ‘bad’, the QED provides a measure of ‘drug likeness’ on a continuous scale. The authors found that the value of the QED correlated with the subjective view of medicinal chemists on the suitability of a compound as a starting point for a medicinal chemistry project over a set of 17,117 diverse compounds. Furthermore, a benchmarking study also found that drugs were, on average, more likely to have a high QED than a general set of small molecule protein ligands. However, while avoiding ‘non-drug like’ compounds will reduce the risk of failure, it should be noted that a ‘drug like’ compound is far from guaranteed to have suitable physicochemical and biological properties to be a successful drug.

Conclusion

The application of MPO to drug discovery can help to efficiently explore many potential avenues for research and quickly and confidently focus synthetic and experimental efforts on those areas of chemistry most likely to yield a high quality drug. This, in turn, reduces the cost and time for drug discovery while improving the chance of downstream success and reducing the chance of missing valuable opportunities.

For MPO to have a strong impact on key decisions in drug discovery, it must be accessible to all members of a drug discovery project team to provide intuitive guidance on design and selection of compounds. Software that supports MPO in a visual and user friendly environment can facilitate collaboration between computational scientists, chemists and biologists to bring consistency and objectivity to decision-making in order to quickly achieve the objectives of a drug discovery project.

References

1. Segall, M. D. Multi-Parameter Optimization: Identifying high quality compounds with a balance of properties. Curr. Pharm. Des. 2012, 18, 1292-1310. Preprint may be downloaded from http://www.optibrium.com/community.
2. Chadwick, A. T.; Segall, M. D. Overcoming psychological barriers to good discovery decisions. Drug Discov. Today 2010, 15, 561-569.
3. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3-25.
4. Optibrium. http://www.optibrium.com/stardrop, visited on 7th January 2012.
5. Fonseca, C. M.; Fleming, P. J. Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. Genetic Algorithms: Proceedings of the Fifth International Conference, San Mateo, CA, 1993; 416-423.
6. Stewart, K.; Shiroda, M.; James, C. Drug Guru: a computer software program for drug design using medicinal chemistry rules. Bioorg. Med. Chem. 2006, 14, 7011-7022.
7. Segall, M. D.; Champness, E. J.; Leeding, C.; Lilien, R.; Mettu, R.; Stevens, B. Applying medicinal chemistry transformations to guide the search for high quality leads and candidates. J. Chem. Inf. Model. 2011, 51, 2967-2976.
8. Segall, M.; Beresford, A.; Gola, J.; Hawksley, D.; Tarbit, M. H. Focus on success: using a probabilistic approach to achieve an optimal balance of properties in drug discovery. Expert Opin. Drug Metab. Toxicol. 2006, 2, 325-337.
9. Bickerton, G. R.; Paolini, G. V.; Besnard, J.; Muresan, S.; Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature Chemistry 2012, 4, 90-98.
