Rahul Haque, Mukti Mohammad and Md Maidul Islam*
Department of Chemistry, Aliah University, India
*Corresponding author: Md Maidul Islam, Department of Chemistry, Aliah University, Kolkata-700156, India
Submission: August 10, 2017; Published: January 08, 2018
ISSN: 2576-9170, Volume 1, Issue 4
Molecular docking plays a significant role in drug discovery, and knowledge of molecular docking will help in designing appropriate drugs in the future. Several docking methodologies are in use, chosen according to the criteria of the drug design problem. Scoring functions play a significant role in sorting out the relevant results obtained from molecular docking. In this review we discuss several docking methods and several approaches to choosing scoring functions.
Molecular docking is a growing field in medicinal research and in the discovery of new biologically active molecules. It predicts the biological activity and several physical parameters of a molecule. Molecular docking research focuses on computationally simulating the molecular recognition process. It aims to find an optimized conformation for both the receptor (protein, DNA, RNA or another biological receptor) and the ligand, together with their relative orientation, such that the free energy of the overall system is minimized.
Molecular docking is one of the most frequently used methods in structure-based drug design, due to its ability to predict the binding conformation of small-molecule ligands in the appropriate target binding site. Characterisation of the binding behaviour plays an important role in the rational design of drugs and in elucidating fundamental biochemical processes [1].
The molecular docking approach can be used to model the interaction between a small molecule and a protein at the atomic level, which allows us to characterize the behavior of small molecules in the binding site of target proteins as well as to elucidate fundamental biochemical processes [2]. Molecular docking attempts to arrange molecules in favorable configurations by matching complementary features [3]. This is a difficult task because there are many ways in which complex molecules can be associated. The problem is further complicated by an exponential dependence on molecule size, so that the number of possible configurations explodes when docking involves biological macromolecules such as proteins or nucleic acid polymers. Current docking methodologies thus invoke either geometric or energy-based schemes to guide configurational sampling [4], the former relying upon the matching of topographical features and the latter upon optimization along a potential energy surface of some kind. Configurational sampling is, however, only half of the problem. The ranking of each configuration by some measurement of complementarity constitutes the other major hurdle [5]. The structure-based design methods used to optimize leads into drugs are now often applied much earlier in the drug discovery process. Protein structure is used in target identification and selection (the assessment of the 'druggability' or tractability of a target), in the identification of hits by virtual screening and in the screening of fragments. Additionally, the key role of structural biology during lead optimization to engineer increased affinity and selectivity into leads remains as important as ever. Each of these topics will be outlined, using the field of kinase drug discovery as an example of the role of structure in lead optimization.
Just as in protein folding, solving the docking problem involves two components: an efficient search procedure and a good scoring function. The two critical elements in a search procedure are speed and effectiveness in covering the relevant conformational space. The scoring function, on the other hand, should be fast enough to allow its application to a large number of potential solutions and, in principle, should effectively discriminate between native and non-native docked conformations. The scoring function should include and appropriately weigh all the energetic ingredients. Hence, as in folding, the performance of a particular docking program should not be viewed as representing one complete piece. To solve the docking problem, ideally, the best matching algorithms and scoring schemes should be combined. Similar considerations and division have recently been discussed [6-8]. The three aspects of docking (representation of the system, conformational search, and scoring) are mutually interrelated. The choice of the system (surface) representation decides the types of conformational search algorithms and the ways to rank potential solutions. Below, we review the principles of the representation, available search algorithms, and scoring schemes. Based on these, we highlight some potentially promising approaches.
Many reviews of docking algorithms and scoring functions have been published in recent years [9-15]. It is commonly reported in these reviews that docking programs are often able to reproduce the correct pose of protein-ligand complexes (alongside many incorrect poses), and that the problem lies in the accurate estimation of the relative binding affinities of ligand poses, i.e. the scoring function.
The scoring function is one of the most important components in structure-based drug design. Despite considerable success, accurate and rapid prediction of protein-ligand interactions is still a challenge in molecular docking. Here we review the three basic types of scoring functions (force-field, empirical, and knowledge-based) and the consensus scoring technique used for protein-ligand docking. The scoring functions that have been developed differ in accuracy and computational efficiency. In this section, we briefly review the scoring functions in the literature developed for protein-ligand interactions in molecular docking [16]. Some of the commonly used scoring functions are summarized in Figure 1 and Tables 1-4.
Figure 1
Table 1: Types of scoring functions
Force-field-based scoring function
Affinities are estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the two molecules in the complex using a force field. The intramolecular energies (also referred to as strain energy) of the two binding partners are also frequently included. Molecular mechanics force fields usually quantify the sum of two energies: the receptor-ligand interaction energy and the internal ligand energy (such as steric strain induced by binding).
Table 2: Success rates of 16 scoring functions for Wang et al.'s test set of 100 diverse protein-ligand complexes, using the criterion of RMSD ≤ 2.0 Å [43].
a "K" stands for knowledge-based scoring functions, "E" for empirical scoring functions, and "F" for force field scoring functions.
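The RMSD ≤ 2.0 Å criterion used in Table 2 can be illustrated with a minimal sketch. It assumes that the predicted and experimental poses list the same atoms in the same order and are already in the receptor's coordinate frame; the function names and the two-atom "ligands" are hypothetical, and no symmetry correction is attempted.

```python
import math

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation (in Å) between two equally ordered coordinate lists."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(pose_a))

def success_rate(pairs, cutoff=2.0):
    """Fraction of (predicted, experimental) pose pairs with RMSD <= cutoff."""
    hits = sum(1 for predicted, experimental in pairs
               if rmsd(predicted, experimental) <= cutoff)
    return hits / len(pairs)

# Hypothetical two-atom ligands, for illustration only.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
good_pose = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # every atom shifted by 0.5 Å
bad_pose = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]   # every atom shifted by 3.0 Å
rate = success_rate([(good_pose, native), (bad_pose, native)])
```

In this toy case one of the two poses falls within the cutoff, so the success rate is one half.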
Force-field-based scoring is further complicated by the fact that it generally requires the introduction of cut-off distances for the treatment of non-bonded interactions, which are more or less arbitrarily chosen and complicate the accurate treatment of long-range effects involved in binding. One typical force field scoring function in molecular docking is that of DOCK, whose energy parameters are taken from the Amber force fields [17-19]. The scoring function is composed of two energy components, a Lennard-Jones VDW term and an electrostatic term:
$$E = \sum_{i}\sum_{j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}\right)$$
Table 3: Correlation coefficients between the experimentally determined binding energies and the calculated binding scores of 17 scoring functions for Wang et al.'s test set [56] of 100 complexes [43].
Table 4: Enrichments of nine scoring functions at the top 5% of the ranked databases for four targets: ERα, MMP3, fXa, and AChE [42].
a For each protein target, the constructed database includes known inhibitors (146 for ERα, 60 for MMP3, 129 for fXa, and 54 for AChE) and 999 random, diverse drug-like molecules that serve as a set of inactive compounds. b The last row lists the maximum theoretically possible enrichments at the top 5% of the ranked database, given the compositions of the databases including the active and inactive compounds.
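The enrichment at the top 5% and its theoretical maximum can be sketched as follows. The source does not spell out the formula behind Table 4, so this assumes the common definition (the fraction of actives in the top slice divided by the fraction of actives in the whole database) and rounds the slice size down. The function names are hypothetical; only the database compositions come from the footnote above.

```python
def enrichment_factor(ranked_is_active, top_fraction=0.05):
    """Enrichment of actives in the top slice of a ranked list (best score first)."""
    n_total = len(ranked_is_active)
    n_top = max(1, int(n_total * top_fraction))  # slice size, rounded down (assumption)
    actives_total = sum(ranked_is_active)
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)

def max_enrichment(n_actives, n_inactives, top_fraction=0.05):
    """Best achievable enrichment: the top slice is filled with actives first."""
    n_total = n_actives + n_inactives
    n_top = max(1, int(n_total * top_fraction))
    best_hits = min(n_actives, n_top)
    return (best_hits / n_top) / (n_actives / n_total)

# Known inhibitors per target, each combined with 999 inactive compounds.
known_inhibitors = {"ERα": 146, "MMP3": 60, "fXa": 129, "AChE": 54}
for target, n_actives in known_inhibitors.items():
    print(target, round(max_enrichment(n_actives, 999), 2))
```

Under this assumed definition every database holds more actives than the top 5% slice can contain, so the ceiling reduces to the database size divided by the number of actives. It is therefore lower for targets with many known inhibitors.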
Here r_ij stands for the distance between protein atom i and ligand atom j, A_ij and B_ij are the VDW parameters, and q_i and q_j are the atomic charges. The effect of solvent is implicitly considered by introducing a simple distance-dependent dielectric constant ε(r_ij) in the Coulombic term. Despite the computational efficiency of the force field scoring function of DOCK, the distance-dependent dielectric factor cannot account for the desolvation effect, an important solvent effect whereby charged groups favor aqueous environments whereas non-polar groups tend to stay in non-aqueous environments. The desolvation energy is a many-body interaction term and depends on the specific geometric and chemical environment of the solute atoms considered. If the desolvation effect is ignored, a scoring function is biased toward Coulombic electrostatic interactions and therefore tends to select highly charged ligands [16] (Table 5).
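The pairwise sum described here can be sketched in a few lines. This is an illustrative sketch rather than DOCK's implementation: it assumes the 12-6 form of the Lennard-Jones term, a dielectric of the form ε(r) = 4r, and a conversion constant of 332.06 to express the Coulombic term in kcal/mol. The atom types, charges and A_ij/B_ij values are hypothetical.

```python
import math

COULOMB_CONSTANT = 332.06  # converts e^2/Å to kcal/mol (assumed unit convention)

def force_field_score(protein_atoms, ligand_atoms, vdw_params, dielectric_scale=4.0):
    """Sum Lennard-Jones and Coulomb terms over all protein-ligand atom pairs.

    Each atom is (x, y, z, atom_type, partial_charge).
    vdw_params maps (protein_type, ligand_type) -> (A_ij, B_ij).
    """
    energy = 0.0
    for xi, yi, zi, type_i, q_i in protein_atoms:
        for xj, yj, zj, type_j, q_j in ligand_atoms:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            a_ij, b_ij = vdw_params[(type_i, type_j)]
            # Distance-dependent dielectric, eps(r) = dielectric_scale * r (assumed form).
            epsilon = dielectric_scale * r
            energy += a_ij / r ** 12 - b_ij / r ** 6
            energy += COULOMB_CONSTANT * q_i * q_j / (epsilon * r)
    return energy

# Hypothetical one-atom "protein" and one-atom "ligand", 3 Å apart.
protein = [(0.0, 0.0, 0.0, "O", -0.5)]
ligand = [(3.0, 0.0, 0.0, "N", 0.5)]
params = {("O", "N"): (500000.0, 600.0)}
score = force_field_score(protein, ligand, params)
```

Because the dielectric depends only on the interatomic distance, nothing in this sum penalizes burying a charged group, which is the desolvation bias discussed above.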
Empirical scoring functions are based on counting the number of various types of interactions between the two binding partners. Counting may be based on the number of ligand and receptor atoms in contact with each other or on the change in solvent accessible surface area (ΔSASA) in the complex compared to the uncomplexed ligand and protein. The coefficients of the scoring function are usually fit using multiple linear regression methods. The interaction terms of the function may include, for example: hydrophobic-hydrophobic contacts (favorable); hydrophobic-hydrophilic contacts (unfavorable; this term accounts for unmet hydrogen bonds, which are an important enthalpic contribution to binding, and one lost hydrogen bond can account for 1-2 orders of magnitude in binding affinity); the number of hydrogen bonds (a favorable contribution to affinity, especially if shielded from solvent; no contribution if solvent exposed); and the number of rotatable bonds immobilized in complex formation (an unfavorable conformational entropy contribution).
These scoring functions are fit to reproduce experimental data, such as binding energies and/or conformations, as a sum of several parameterized functions, as first proposed by Bohm [20]. The design of empirical scoring functions is based on the idea that binding energies can be approximated by a sum of individual uncorrelated terms.
Table 5: Selected force field-based scoring functions.
The total ΔG_binding is given by the sum of the terms corresponding to the interaction energy between the receptor and the ligand and the internal energy of the ligand (when available). Depending on the scoring function, these terms can be obtained by adding the following contributions: (a) electrostatic (E_electrostatic); (b) van der Waals (E_vdW); (c) hydrogen bonding (E_Hbond); and (d) torsional (E_torsion). For two atoms i and j: (a) A_ij and B_ij correspond to the van der Waals parameters for the given atom types; (b) d_ij corresponds to their interatomic distance; (c) q_i and q_j are the atomic partial charges; and (d) ε(d_ij) is a distance-dependent dielectric function. In G-Score, the hydrogen bonding term is a sum of the individual energies from all the donor-acceptor pairs in the complex [59].
The second kind of scoring functions are empirical scoring functions, which estimate the binding affinity of a complex on the basis of a set of weighted energy terms:
$$\Delta G = \sum_i W_i \cdot \Delta G_i$$
where ΔG_i represents different energy terms such as VDW energy, electrostatics, hydrogen bond, desolvation, entropy, hydrophobicity, etc. The corresponding coefficients W_i are determined by fitting the binding affinity data of a training set of protein-ligand complexes with known three-dimensional structures [21-33]. Compared to the force field scoring functions, the empirical scoring functions are much faster in binding score calculations due to their simple energy terms [16] (Table 6).
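How the coefficients W_i might be obtained from a training set can be sketched with an ordinary least-squares fit. This is a minimal sketch, not the procedure of any specific scoring function: it solves the normal equations by Gauss-Jordan elimination, omits a regression constant, and uses a hypothetical training set whose three columns (hydrogen-bond count, lipophilic contact area, rotatable-bond count) and "true" weights are invented for illustration.

```python
def empirical_score(terms, weights):
    """Weighted sum of per-complex energy terms."""
    return sum(w * t for w, t in zip(weights, terms))

def fit_weights(term_rows, affinities):
    """Least-squares weights via the normal equations (X^T X) w = X^T y."""
    n = len(term_rows[0])
    xtx = [[sum(row[a] * row[b] for row in term_rows) for b in range(n)]
           for a in range(n)]
    xty = [sum(row[a] * y for row, y in zip(term_rows, affinities))
           for a in range(n)]
    aug = [xtx[a] + [xty[a]] for a in range(n)]  # augmented matrix
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("energy terms are linearly dependent")
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[a][n] / aug[a][a] for a in range(n)]

# Hypothetical training set: [n_hbonds, lipophilic_area, n_rotatable_bonds].
training_terms = [[2, 100, 3], [4, 150, 5], [1, 300, 2], [3, 50, 8]]
true_weights = [-1.0, -0.02, 0.3]  # invented, used only to generate the data
training_affinities = [empirical_score(row, true_weights) for row in training_terms]

fitted_weights = fit_weights(training_terms, training_affinities)
predicted = empirical_score([2, 120, 4], fitted_weights)
```

Because the synthetic affinities are noise-free, the fit recovers the generating weights. With real binding data the residual error of the regression would set the accuracy of the score.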
Table 6: Selected empirical scoring functions.
The free energy of binding (ΔG_bind) is obtained by adding the contributions to the free energy of terms that correspond to: (a) hydrogen bonding (in LUDI and SYBYL/F-Score the first two terms account for neutral and ionic hydrogen bonds, respectively); (b) hydrophobic or lipophilic interactions (accounting for the hydrophobic effect); (c) ligand rotational entropy (a term that counts all the rotatable single bonds in the ligand, which is supposed to be related to the torsional entropy loss of the ligand upon protein-ligand complexation); (d) contact (accounting for a general distance-dependent potential for protein-ligand atom contacts); and (e) metal (accounting for metal ions residing inside the protein binding pocket). Thus, the different scoring functions differ in: (a) the number and the typology of the terms that contribute to ΔG_bind; and (b) the mathematical function f used to calculate one specific contribution, where this function can depend on an angular (Δα) and/or a distance (ΔR) parameter that penalizes deviations from an ideal geometry. ΔG_H-bond, ΔG_ionic, ΔG_hydrophobic, ΔG_rotor, ΔG_aromatic, ΔG_contact, ΔG_metal and ΔG_lipo are regression coefficients for each corresponding free energy term. ΔG_0 is a regression constant. A_hydrophobic corresponds to the molecular surface area [49].
Table 7: Selected Knowledge-Based Scoring Functions.
Docking is frequently used to predict the binding orientation of small-molecule drug candidates to their protein targets in order to predict the affinity and activity of the small molecule. Hence docking plays an important role in the rational design of drugs. Two approaches are particularly popular within the molecular docking community. One approach uses a matching technique that describes the protein and the ligand as complementary surfaces. The second approach simulates the actual docking process, in which the ligand-protein pairwise interaction energies are calculated. There are three types of docking: protein-protein docking, protein-ligand docking, and combined protein-protein and protein-ligand docking. Docking software includes Affinity, AutoDock, CombiBUILD, DockVision, FRED, FlexiDock, FlexX, Glide and GOLD. Some of these programs are discussed below.
DOCK, in the docking context, is the program whose force-field scoring function is described above. It shares its name with, but is unrelated to, the DOCK (Dedicator of cytokinesis) family of proteins involved in intracellular signaling networks. Studies to date suggest that this protein family acts as guanine nucleotide exchange factors for small G proteins of the Rho family, such as Rac and Cdc42. DOCK family proteins are categorized into four subfamilies based on their sequence homology: DOCK-A, DOCK-B, DOCK-C and DOCK-D.
AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. In addition to being used for docking, the atomic affinity grids can be visualized. This can help, for example, to guide organic synthetic chemists in designing better binders. The current distribution of AutoDock includes two generations of software: AutoDock 4 and AutoDock Vina. AutoDock has been distributed to more than 29,000 users around the world. It is used in academic, governmental, nonprofit and commercial settings. In January 2011, a search of the ISI Citation Index showed that more than 2,700 publications had cited the primary AutoDock methods papers. AutoDock [40] uses the Lamarckian genetic algorithm, which allows favorable phenotypic characteristics to become inheritable. AutoDock was also used to identify inhibitors of the RNA Editing Ligase-1 enzyme of T. brucei, the causative agent of human African trypanosomiasis [41].
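The idea that a Lamarckian genetic algorithm lets locally improved characteristics become inheritable can be illustrated with a toy sketch. This is not AutoDock's implementation: the genes are plain numbers rather than ligand translation, orientation and torsion variables, the score is a hypothetical quadratic rather than a docking energy, and every operator and parameter value below is an assumption chosen for brevity.

```python
import random

def lamarckian_ga(score, n_genes, pop_size=20, generations=30, seed=0):
    """Minimize `score` with a GA whose local-search results are written back to the genes."""
    rng = random.Random(seed)

    def local_search(genes, steps=10, step_size=0.2):
        # Greedy random perturbation: a trial is kept only if it lowers the score.
        best, best_score = list(genes), score(genes)
        for _ in range(steps):
            trial = [g + rng.uniform(-step_size, step_size) for g in best]
            trial_score = score(trial)
            if trial_score < best_score:
                best, best_score = trial, trial_score
        return best, best_score

    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_genes, best_score = None, float("inf")
    for _ in range(generations):
        # Lamarckian step: the locally optimized genes replace the inherited ones.
        refined = [local_search(individual) for individual in population]
        refined.sort(key=lambda pair: pair[1])
        if refined[0][1] < best_score:
            best_genes, best_score = refined[0]
        parents = [genes for genes, _ in refined[: pop_size // 2]]
        children = []
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]  # one-point crossover
            if rng.random() < 0.2:               # occasional Gaussian mutation
                k = rng.randrange(n_genes)
                child[k] += rng.gauss(0.0, 0.5)
            children.append(child)
        population = children
    return best_genes, best_score

# Hypothetical score: squared distance from an arbitrary "ideal pose" vector.
TARGET = [1.0, -2.0, 0.5]

def toy_score(genes):
    return sum((g - t) ** 2 for g, t in zip(genes, TARGET))

best_genes, best_score = lamarckian_ga(toy_score, n_genes=3)
```

The design choice that makes the sketch Lamarckian is that parents are taken from the refined individuals, so offspring inherit the coordinates found by local search rather than the ones their parents started with.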
FLOG (Flexible Ligands Oriented on Grid) searches a database of 3D coordinates to find molecules complementary to a macromolecular receptor of known 3D structure. The philosophy of FLOG is similar to that reported for DOCK [42]. In common with that system, FLOG uses a match-center representation of the volume of the binding cavity. Its algorithm includes the concept of 'essential points', match centers that must be paired with a ligand atom. It also introduces a rapid simplex-based rigid-body optimizer to refine the orientations. Using dihydrofolate reductase as a sample receptor, the FLOG system was shown to select known inhibitors from a large database of drug-like compounds [43]. Fully automated "anchor and grow" methods have been implemented in several programs, such as FLOG.
Genetic Optimization for Ligand Docking (GOLD) [44] explores full ligand flexibility with partial target flexibility using a genetic algorithm. The GOLD algorithm optimizes rotatable dihedrals and ligand-target hydrogen bonds. The fitness of a generation is evaluated based on a maximization of intermolecular hydrogen bonds. The fitness function is the sum of a hydrogen bonding term, a term for steric energy interaction between the protein and the ligand and a Lennard-Jones potential for internal energy of ligand.
GOLD uses multiple subpopulations of the ligand. Its force-field-based scoring function includes three terms: an H-bonding term, an intermolecular dispersion potential, and an intramolecular potential. It achieved a 71% success rate in identifying the experimental binding mode in 100 protein complexes.
AutoDock Vina is a new generation of docking software from the Molecular Graphics Lab. It achieves significant improvements in the average accuracy of the binding mode predictions, while also being up to two orders of magnitude faster than AutoDock 4.
AutoDock 4.2 is faster than earlier versions, and it allows side chains in the macromolecule to be flexible. As before, rigid docking is very fast, and high-quality flexible docking can be done in around a minute. Up to 40,000 rigid dockings can be done in a day on one CPU.
AutoDock 4.2 has a free-energy scoring function that is based on a linear regression analysis, the AMBER force field, and a larger set of diverse protein-ligand complexes with known inhibition constants than was used for AutoDock 3.0. The best model was cross-validated with a separate set of HIV-1 protease complexes, which confirmed that the standard error is around 2.5 kcal/mol. This is enough to discriminate between leads with milli-, micro- and nanomolar inhibition constants.
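The claim that a standard error of about 2.5 kcal/mol separates milli-, micro- and nanomolar inhibitors can be checked with the standard relation ΔG = RT ln K_i. The short sketch below assumes a temperature of 298.15 K and a 1 M standard state; the function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)

def binding_free_energy(inhibition_constant_molar, temperature_kelvin=298.15):
    """Binding free energy in kcal/mol from ΔG = RT ln(Ki), with Ki in mol/L."""
    return GAS_CONSTANT * temperature_kelvin * math.log(inhibition_constant_molar)

for label, ki in [("millimolar", 1e-3), ("micromolar", 1e-6), ("nanomolar", 1e-9)]:
    print(f"{label}: {binding_free_energy(ki):.2f} kcal/mol")
```

Under these assumptions each factor of 1000 in K_i corresponds to roughly 4.1 kcal/mol, which is larger than the quoted standard error.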
BetaDock is molecular docking simulation software based on the theory of the β-complex. Its developers claim that BetaDock is superior to contemporary docking software while requiring less human intervention. The current version of BetaDock is an intermediate deliverable of ongoing research. It prioritizes shape complementarity, based on the theory of the β-complex and the Voronoi diagram, and treats both the receptor and the ligand as rigid bodies [45].
FlexX is a software package to predict protein-ligand interactions. For a protein with known three-dimensional structure and a small ligand molecule, FlexX accurately predicts the geometry of the protein-ligand complex within a few seconds. In one reported application, a set of known active compounds was docked with standard FlexX, and three sets of target-specific receptor-based pharmacophore constraints were derived by statistical analysis of the predicted placements. Applying these receptor-based constraints in a virtual screening protocol utilizing FlexX-Pharm led to significantly improved enrichments.
In FlexX, a base fragment is selected and docked using a "pose-clustering" algorithm. A clustering algorithm is used to merge similar ligand transformations into the active site. Flexible fragments are added incrementally using "MIMUMBA" and evaluated using an overlap function, followed by energy calculations, until the ligand is completely built. The final evaluation uses Böhm's scoring function, which includes H-bond, ionic, aromatic and lipophilic terms.
Table 8
The number of protein-ligand docking programs currently available is high and has been steadily increasing over the last decades. The following list presents an overview of the most common programs, listed alphabetically, with the corresponding year of publication, the organisation or institution involved, a short description, the availability of a web service, and the license. This table is comprehensive but not complete (Table 8).
Table 9: a Top-performing programs/scoring functions are indicated in bold font.
Docking methodologies continue to develop and deliver better results for drug design. The scoring function is one of the most important components in structure-based drug design. Despite considerable success, accurate and rapid prediction of protein-ligand interactions is still a challenge in molecular docking. The number of protein-ligand docking programs currently available is high and has been steadily increasing over the last decades, and many companies are producing newer software to overcome the challenges observed in older programs [46-69].
© 2018 Rahul Haque, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and build upon your work non-commercially.