
Motivated by the renewed interest in substructure searching in the literature, we recently developed a proof-of-concept, self-contained substructure search engine that scales to large databases with modest hardware requirements. The current prototype can handle the
PubChem database (>30M structures) with reasonable performance on any modest server with sufficient (>12 GB) RAM. This work is an extension of our
recent work on improving fingerprint screening. While it's tempting to throw out qualitative (and/or unverifiable) performance numbers, we'll let you be the judge. The prototype hosting the entire PubChem (snapshot taken in September of 2011) is
available here. Please bear in mind that this is a prototype, so it might not be able to handle DoS-type queries (e.g., c1ccccc1) gracefully. The binary and source code for the entire prototype are also
available. We'd love to help you deploy it in-house, so feel free to contact us.
3 thoughts on “Large scale substructure searching”