- Algorithmic elegance. The clique finding problem has been well-studied in computer science, so there exist many well-known algorithms available (e.g., see the second DIMACS implementation challenge). Most of the published maximum clique algorithms are quite easy to understand. In fact, the most problematic aspects of the clique-based approach, in terms of code, are likely to be in the construction of the association graph and bounding function. The actual clique finding routine can be expressed in a few lines of code using any modern computer programming language.
- Optimized implementation. Since most efficient clique solvers are based on the branch-and-bound algorithm, they can easily be parallelizable on modern multi-core CPUs. Moreover, with some care the implementation of a clique solver can be done entirely with bit twiddling.
- Built-in support for disconnected MCS. Due to the nature of the association graph, the resulting maximum clique often yields disconnected MCS's. This can be quite useful, for example, in the identification and visualization of bioisostere replacements.
Maximum common subgraph (MCS) is considered as one of the most challenging problems in combinatorial optimization. As a generalization to (sub-) graph isomorphism, MCS has deep roots within cheminformatics. From reaction mapping to enabling molecular graphs as first-class objects in kernel-based QSAR methods, MCS has become a basic necessity in any cheminformatics toolbox. However, the utility of MCS (or lack thereof) is currently limited by our inability to solve MCS efficiently. Pending a significant development in computational complexity theory, this fact will likely to remain true for a forseeable future. Despite the inherent "hardness" in solving MCS, there exist a number of algorithms for solving MCS within reasonable time on modern hardware. For a good survey of MCS algorithms, please see this paper. The purpose of this post, as such, is to briefly describe our approach to solving MCS and, more generally, to the problem of solving MCS for a set of molecular graphs. Multiple MCS (as the latter problem is often known) has numerous applications in cheminformatics such as Markush structure extraction (e.g., see our related R-group decomposition tool), generalizing bioisostere replacement beyond pairwise, (2D) pharmacophore elucidation, and so forth. Clique-based approaches are arguably the current state-of-the-art for solving exact MCS. Here the MCS problem is transformed into one of finding the maximum clique of the corresponding association graph. This approach has a number of desired properties including (but are not limited to) the following: