U.S. Department of Health & Human Services | National Institutes of Health
Predictions on the final evaluation set submitted by all challenge participants are available here.
Results for the final evaluation dataset are available here.


Key Dates

August 18, 2014
NCATS begins accepting submissions

November 14, 2014 (11:59 p.m. ET)
Registration and submission deadline

January 12, 2015
Winners announced


Training Datasets

The complete training dataset is available here. For individual datasets, please use the links below. In the datasets, "1" denotes an active compound and "0" an inactive one.

Nuclear Receptor Signaling Panel

Stress Response Panel
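As a minimal sketch of the 1/0 label convention, the snippet below tallies active and inactive compounds for a single assay. It assumes a hypothetical tab-separated layout with a SMILES string in the first column and the label in the second; the actual download files (SMILES or SDF) may be laid out differently, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_labels(lines):
    """Count active ("1") and inactive ("0") compounds.

    Assumes a hypothetical tab-separated layout: SMILES<TAB>label.
    Rows whose label is neither "1" nor "0" (for example a header
    line) are ignored, as are blank or malformed rows.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # blank or malformed line
        label = row[1].strip()
        if label == "1":
            counts["active"] += 1
        elif label == "0":
            counts["inactive"] += 1
    return counts


# Toy in-memory example (made-up rows, not challenge data).
sample = io.StringIO("CCO\t0\nc1ccccc1\t1\nCC(=O)O\t0\n")
print(count_labels(sample))  # Counter({'inactive': 2, 'active': 1})
```

Counting both classes up front is useful because toxicity assays like these are often heavily imbalanced, which affects both model training and the choice of evaluation metric.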


Final Evaluation

The final evaluation dataset is now available for download in either SMILES or SDF format. Results submitted for this dataset will determine the final ranking of the competition. You may submit multiple times, but only your latest submission will be scored. You can continue submitting to the leaderboard to test your model until October 13, 2014; after that date the leaderboard will close, and all subsequent submissions will count toward the final evaluation set.
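The "only the latest submission counts" rule amounts to keeping, for each participant, the entry with the most recent timestamp. The following is a minimal sketch of that rule; the `Submission` record, its fields, and the team names are hypothetical and do not describe how NCATS actually stored or scored entries.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    # Hypothetical record; field names are illustrative only.
    participant: str
    submitted_at: datetime
    file_name: str


def latest_per_participant(submissions):
    """Keep only each participant's most recent submission.

    If two submissions share the same timestamp, the one seen
    first is kept.
    """
    latest = {}
    for sub in submissions:
        current = latest.get(sub.participant)
        if current is None or sub.submitted_at > current.submitted_at:
            latest[sub.participant] = sub
    return latest


subs = [
    Submission("team_a", datetime(2014, 11, 1, 9, 0), "a_v1.txt"),
    Submission("team_a", datetime(2014, 11, 10, 17, 30), "a_v2.txt"),
    Submission("team_b", datetime(2014, 11, 5, 12, 0), "b_v1.txt"),
]
scored = latest_per_participant(subs)
print({team: s.file_name for team, s in scored.items()})
# {'team_a': 'a_v2.txt', 'team_b': 'b_v1.txt'}
```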

Testing Dataset

The testing dataset is available for download here. Note that this dataset is used only to evaluate performance on the leaderboard; a separate dataset will be used to determine the winner.

Results for the testing dataset are now available for download.

Public Domain Code

The following are links to code developed by our group that may be useful for the challenge. Please feel free to contact us with any questions about the code.

  • LyChI is a structure standardizer that can be used to remove salts and solvents.
  • PCFP is our implementation of the PubChem fingerprint.
  • A number of molecular descriptors (e.g., topological indices, group contribution, solvent accessible surface area, etc.) are available as part of the library synthesizer code base.
  • Tox21Baseline is a simple implementation of a naive Bayes classifier that uses LyChI for structure standardization and PCFP for descriptor/feature extraction, providing baseline models for the 12 datasets in the challenge.
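For readers unfamiliar with the technique Tox21Baseline is built on, here is a minimal, independent sketch of a Bernoulli naive Bayes classifier over binary fingerprint bits, with Laplace smoothing. It is not the Tox21Baseline code: the Bernoulli variant, the smoothing constant `alpha`, and the empirical class priors are assumptions chosen for illustration, and the 4-bit toy vectors merely stand in for real fingerprints such as PCFP.

```python
import math


class BernoulliNaiveBayes:
    """Naive Bayes over binary feature vectors (e.g. fingerprint bits).

    Labels follow the challenge convention: 1 = active, 0 = inactive.
    alpha is a Laplace smoothing constant (an illustrative choice).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior = {}
        self.log_p_on = {}   # log P(bit = 1 | class)
        self.log_p_off = {}  # log P(bit = 0 | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            on, off = [], []
            for j in range(n_features):
                ones = sum(row[j] for row in rows)
                # Smoothed estimate keeps p strictly between 0 and 1.
                p = (ones + self.alpha) / (len(rows) + 2 * self.alpha)
                on.append(math.log(p))
                off.append(math.log(1.0 - p))
            self.log_p_on[c] = on
            self.log_p_off[c] = off
        return self

    def log_scores(self, x):
        """Unnormalized log posterior for each class."""
        return {
            c: self.log_prior[c]
            + sum(self.log_p_on[c][j] if bit else self.log_p_off[c][j]
                  for j, bit in enumerate(x))
            for c in self.classes
        }

    def predict_proba(self, x):
        """P(active | x), normalized stably by subtracting the max score."""
        scores = self.log_scores(x)
        m = max(scores.values())
        total = sum(math.exp(s - m) for s in scores.values())
        return math.exp(scores.get(1, -math.inf) - m) / total


# Toy 4-bit "fingerprints" (made up, not challenge data).
X = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 0, 0]
model = BernoulliNaiveBayes().fit(X, y)
print(round(model.predict_proba([1, 1, 0, 0]), 3))  # 0.9
```

Working in log space avoids numerical underflow when the per-bit probabilities of a long fingerprint are multiplied together, and the smoothing keeps a bit never seen in one class from forcing that class's probability to zero.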