Scaffolds are a core concept in medicinal chemistry, and a single scaffold can be the focus of multiple independent development efforts over an extended period. Scaffold-associated properties can therefore vary over time, possibly showing consistently increasing or decreasing trends. We posit that such trends characterize the attention that the community pays to a scaffold.

This application allows you to query ChEMBL for a scaffold (represented as a SMILES string) and visualize how the properties of the compounds containing that scaffold change over time. This functionality is analogous to Google Trends.

Currently, the properties considered are:

  • Unique compound counts
  • Tested assay count
  • Z-scored bioactivity
  • Solubility (log S at pH = 7.4), computed using StarDrop
  • Fsp3 (fraction of sp3 carbons, a measure of 3D-ness)
  • Chemical beauty, a measure of drug-likeness described in Bickerton et al. (Nat. Chem., 2012, 4(2), 90-98)
  • Synthetic accessibility, based on the RDKit implementation of the method by Ertl & Schuffenhauer (J. Cheminf., 2009, 1:8)

As you might expect, very small or commonly occurring scaffolds can result in long query times. We currently employ some simple heuristics to avoid querying for such cases.

For feedback and issues, please contact guhar@nih.gov

How can I cite this application?

If you find this application useful, consider citing

Zdrazil, B. and Guha, R., J. Med. Chem., 2017, ASAP. DOI: 10.1021/acs.jmedchem.7b00954

Frequently Asked Questions

What version of ChEMBL are you using?
We currently use ChEMBL v23.
What is the application written in?
We use the Play framework with a PostgreSQL backend, and RDKit to support cheminformatics operations. Plots are rendered with Highcharts.
Where can I get the sources for this application?
You can get the source code for this application from its Git repository, where you can also find instructions on compiling and deploying it.
Where can I report bugs or feature requests?
You can email Rajarshi Guha at guhar@nih.gov, but filing an issue on the tracker is preferred and has a better chance of being addressed.
Which ChEMBL assays do you consider?
Currently we consider all assays that report Ki or IC50 values, have at least 5 observations, and have a median absolute deviation (MAD) of the activity values greater than 0. No constraints are applied to species or targets.
How is bioassay activity calculated?
We first compute the median and median absolute deviation (MAD) for each assay that has a standard_type of Ki or IC50. We then remove assays with fewer than 5 observations or a MAD of 0 (to 4 decimal places). Next, we use each assay's median and MAD to convert its standard_value entries to robust Z-scores. The robust Z-scores are then used to compute the median bioactivity trends. Currently, no constraints are applied to target or species.
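
The steps described above can be illustrated with a short Python sketch. It is not the application's actual code: the record layout, the function names (robust_z_scores, yearly_median), and the grouping by year are assumptions made for illustration. The sketch divides by the unscaled MAD; a scaling constant of 1.4826 is often applied to the MAD in robust Z-scores, and the text above does not say which form the application uses, nor whether activity values are transformed beforehand.

```python
from collections import defaultdict
from statistics import median

MIN_OBSERVATIONS = 5  # assays with fewer observations are removed
MAD_DECIMALS = 4      # MAD is compared with 0 after rounding to 4 decimal places


def mad(values):
    """Unscaled median absolute deviation about the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def robust_z_scores(records):
    """Convert activity values to per-assay robust Z-scores.

    records: iterable of (assay_id, compound_id, year, standard_value)
             tuples (a hypothetical layout, assumed already restricted
             to Ki or IC50 measurements).
    Returns a list of (compound_id, year, z) for assays passing the filters.
    """
    by_assay = defaultdict(list)
    for assay_id, compound_id, year, value in records:
        by_assay[assay_id].append((compound_id, year, value))

    scored = []
    for rows in by_assay.values():
        values = [value for _, _, value in rows]
        if len(values) < MIN_OBSERVATIONS:
            continue  # too few observations
        spread = mad(values)
        if round(spread, MAD_DECIMALS) == 0:
            continue  # no usable spread, Z-score undefined
        center = median(values)
        scored.extend(
            (compound_id, year, (value - center) / spread)
            for compound_id, year, value in rows
        )
    return scored


def yearly_median(scored):
    """Median robust Z-score per year (the year grouping is an assumption)."""
    by_year = defaultdict(list)
    for _, year, z in scored:
        by_year[year].append(z)
    return {year: median(zs) for year, zs in sorted(by_year.items())}
```

Standardizing within each assay before pooling means that values from assays with very different scales contribute comparably to the per-year median.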