Approximation Methods for Large Chemical Spaces

Rajarshi Guha

Vertex Pharmaceuticals, Boston, US

The availability of large chemical libraries, virtual or real, gives us the opportunity to explore new regions of chemical space. Doing so, however, depends on having methods to navigate these spaces. In some cases, such as GDB-13 or GDB-17, reduced representations are employed to enable reasonable search times. With the growth of libraries such as the Enamine REAL space, even these reduced approaches may not scale well, and may not be feasible at all if explicit enumeration is required. In this talk I will discuss approximation methods for exploring large chemical spaces, focusing primarily on similarity search. These methods are not guaranteed to identify, say, the most similar molecule to a query, but they usually provide an answer with bounds on the error. I will discuss their application to similarity search and to the characterization of large chemical libraries.
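
To make the idea of an approximate answer with an error bound concrete, here is a minimal sketch using MinHash to estimate Tanimoto (Jaccard) similarity between two feature sets. MinHash is chosen purely as an illustration; the abstract does not say which approximation methods the talk covers. The integer feature ids stand in for fingerprint bits and are hypothetical, as are the function names and the choice of 256 hash functions. Assuming the hashes behave like random functions, Hoeffding's inequality gives P(|estimate - exact| >= eps) <= 2 exp(-2 k eps^2) for k hash functions.

```python
import hashlib


def _hash(feature, seed):
    """Hash an integer feature id under one seeded hash function (illustrative)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(str(feature).encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(features, num_hashes=256):
    """Keep only the smallest hash value per hash function: a fixed-size sketch."""
    return [min(_hash(f, seed) for f in features) for seed in range(num_hashes)]


def estimate_tanimoto(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Tanimoto similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_tanimoto(a, b):
    """Exact Tanimoto (Jaccard) similarity on the full feature sets."""
    return len(a & b) / len(a | b)


# Hypothetical feature sets standing in for two molecular fingerprints.
query = set(range(0, 60))
candidate = set(range(20, 80))

sig_query = minhash_signature(query)
sig_candidate = minhash_signature(candidate)

exact = exact_tanimoto(query, candidate)            # 40 shared / 80 total
approx = estimate_tanimoto(sig_query, sig_candidate)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The signatures have a fixed length regardless of how many features each molecule has, so comparisons cost the same for every pair; the price is an estimate whose error shrinks roughly as one over the square root of the number of hash functions.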