Virtual Screening
There are millions of chemical 'libraries' that a trained chemist could hope to synthesise. Combinatorial chemists have already demonstrated in several prototype systems that libraries containing 1,000-100,000 compounds can be assembled (54). Virtual screening helps chemists decide which compounds should be synthesised (55). It can be carried out using either a docking method or a pharmacophore method.
Pharmacophores are the "refined" essence of what makes an effective ligand-receptor interaction. They are explicitly three-dimensional, represent the fundamental physicochemical aspects of ligand-receptor interactions, and are extremely useful when experimental structural data are unavailable and homology models are unreliable. In that case, a good pharmacophore model can give powerful insight and screen more effectively (56, 57). Since each conformer is only compared against a three-point pharmacophore model, pharmacophore-based virtual screening can handle a large number of compounds more easily than docking-based virtual screening (54). This is because the memory used for each conformation is smaller than that required for docking.
Pharmacophore-based virtual screening can be seen in the work of Pokhrel et al. (37), who screened more than 11 thousand compounds from the Ambinter database. The most complicated step in the pharmacophore method is model validation. The study used the GH scoring method for pharmacophore model validation, which includes the enrichment factor and the goodness of hit score. This validation requires an additional set of compounds, called decoys, which are usually available from the Database of Useful Decoys: Enhanced (DUD-E) (58). The decoys are screened alongside active compounds; in this case, the authors used known active ACE2 inhibitors from ChEMBL (59) and a literature search. Once the model is validated, it can be used as the query for the screening. Validation can be a problem when no decoy database is available or when the number of active compounds is inadequate.
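The two validation metrics named above can be illustrated with a short calculation. The sketch below uses the commonly cited definitions of the enrichment factor and the Güner-Henry (GH) score; the variable names and the example counts are hypothetical and are not taken from the reviewed study.

```python
def enrichment_factor(ha, ht, a, d):
    """EF = (Ha / Ht) / (A / D).

    ha: actives among the retrieved hits
    ht: total hits retrieved by the pharmacophore model
    a:  total actives in the validation database
    d:  total compounds (actives + decoys) in the database
    """
    return (ha / ht) / (a / d)


def goodness_of_hit(ha, ht, a, d):
    """GH = [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]."""
    yield_and_recall = ha * (3 * a + ht) / (4 * ht * a)
    false_positive_penalty = 1 - (ht - ha) / (d - a)
    return yield_and_recall * false_positive_penalty


if __name__ == "__main__":
    # Hypothetical validation run: 1000 compounds, 50 of them active,
    # 60 hits returned by the model, 45 of which are true actives.
    print("EF:", enrichment_factor(45, 60, 50, 1000))
    print("GH:", goodness_of_hit(45, 60, 50, 1000))
```

A GH score close to 1 indicates a model that retrieves most actives with few decoys; a model that returns exactly the actives and nothing else scores 1.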
Table 2. Summary of studies included in the systematic review.
Ref. | Method, Protocol | Software | Ligand | Target | Grid Size (Å) | MD | ADMET | Candidate Drugs (ΔG) (kcal/mol) | Std. Ref. | Important Residue
(21) | PISA, MolDoc, PhysProp, MD | PDBsum, ADT 1.5.6, AD4.2, DW 4.6.1, GROMACS 2019.2, LigPlot+ v1.4.5 | 36 compounds with a preclinical or clinical trial against previous variants | SP Omicron - hACE2 (7WBL) | 36 x 52.875 x 57.75 | 100 ns |
Molecular Docking
The docking method is another option for virtual screening, and it can also be used to validate virtual screening results. Molecular docking studies are mainly used to predict the binding affinity of the ligand-receptor complex, the preferred binding pose, and the interactions formed at the lowest free energy. Docking studies can also reveal protein-ligand, protein-nucleotide, and protein-protein interactions (PPIs). The noncovalent interactions involved can include ionic bonds, hydrogen bonds, and van der Waals interactions (60). In addition to the software mentioned previously in the summary of studies, several other programs are widely used in molecular docking studies, including RosettaLigand (61), Surflex (62), and LigandFit (63).
Docking is a two-step process. It starts by sampling conformations of the ligand in the receptor and then ranks these conformations using a scoring function. The effectiveness of a docking programme is therefore determined by two major factors: the search algorithm and the scoring function (64).
There are two main classes of search algorithm in molecular docking. In stochastic algorithms, the search is carried out by randomly modifying the conformation of a ligand or of a population of ligands. Examples of this class are Monte Carlo and genetic algorithms (65). Systematic search methods, on the other hand, apply minor modifications to structural parameters, which gradually change the conformation of the ligand. The algorithm examines the energy landscape of the conformational space and, after many search and evaluation cycles, converges on the lowest-energy solution, which corresponds to the most likely binding mode. Systematic algorithms are implemented in GLIDE and DOCK (66).
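To make the idea of a stochastic search concrete, the following is a minimal Monte Carlo sketch with a Metropolis acceptance rule. It is a toy illustration only: the two-number "pose", the quadratic `toy_score` function, and all parameter values are assumptions made for this example and do not correspond to any docking program discussed here.

```python
import math
import random


def toy_score(pose):
    """Hypothetical scoring function (lower is better), minimum at (1.0, -2.0)."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def monte_carlo_search(score, start, steps=5000, step_size=0.5,
                       temperature=0.1, seed=0):
    """Random-perturbation search that keeps track of the best pose seen."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        # Sampling step: randomly perturb every coordinate of the pose.
        candidate = tuple(c + rng.uniform(-step_size, step_size) for c in current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse poses so the search can escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    pose, value = monte_carlo_search(toy_score, (5.0, 5.0))
    print("best pose:", pose, "score:", value)
```

In a real docking program the pose would include the ligand's position, orientation, and torsion angles, and the score would come from one of the scoring functions described in this section.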
Scoring functions are used to predict the binding free energy of the target-ligand complex, which is a measure of the small molecule's binding potency for the biomolecular target. Scoring functions are classified into three types: force-field-based, empirical, and knowledge-based (67). AutoDock scoring is an example of force-field-based scoring, which is derived from a classical force field and evaluates the binding energy as a sum of nonbonded interactions. Empirical scoring, such as GlideScore, is a weighted sum of various types of receptor-ligand interactions. DrugScore is a knowledge-based scoring system that penalises repulsive interactions while favouring preferred contacts between atoms of the protein and the ligand within a given cut-off (64).
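As an illustration of what "a sum of nonbonded interactions" means, the sketch below adds a Lennard-Jones 12-6 term and a Coulomb term over all ligand-receptor atom pairs. This is a generic textbook form, not the actual AutoDock scoring function, which contains further terms. The single `epsilon` and `sigma` values, the dielectric constant, and the atom tuples are hypothetical placeholders.

```python
import math

# Converts q1*q2/r (elementary charges, Å) to kcal/mol.
COULOMB_CONSTANT = 332.06


def pair_energy(r, epsilon, sigma, q1, q2, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair (kcal/mol)."""
    lennard_jones = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(ligand_atoms, receptor_atoms, epsilon=0.15, sigma=3.5):
    """Sum pair energies over every ligand-receptor atom pair.

    Each atom is a tuple (x, y, z, charge). Using one epsilon/sigma for all
    pairs is a simplification made for this sketch.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for rx, ry, rz, rq in receptor_atoms:
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            total += pair_energy(r, epsilon, sigma, lq, rq)
    return total


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0, 0.4), (1.5, 0.0, 0.0, -0.2)]
    receptor = [(4.0, 0.0, 0.0, -0.5), (4.0, 3.0, 0.0, 0.3)]
    print("interaction energy (kcal/mol):", interaction_energy(ligand, receptor))
```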
Grid Size and Parameter
Among the reviewed studies, we found several variations in how the grid size parameter was set and reported. Several studies reported the grid only qualitatively, stating that they used active site residues or critical interacting residues to define it (22, 24, 35, 38). Because no quantitative data are given, this style of reporting reduces the reproducibility of the studies. In other studies, the grid size was determined separately for each receptor target (33), or the same size was used for all tested receptors (23, 25-27, 39-41). Understandably, each PDB structure will have its own suitable range of grid sizes, but the grid should always be larger than the ligand being docked. According to Feinstein and Brylinski, the highest accuracy is obtained when the dimensions of the search space are 2.9 times larger than the radius of gyration of the docked compound (68). They developed a procedure based on this finding to customise the box size for individual query ligands and so maximise docking accuracy. This approach reduces the number of scoring failures caused by overly generous box sizes while also avoiding sampling failures caused by a too-narrow search space.
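The 2.9 ratio can be applied with a few lines of code. The sketch below computes an unweighted radius of gyration from ligand atom coordinates and scales it to a box edge length. Treating every atom equally (no mass weighting) and using a cubic box are assumptions of this example, and the coordinates are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration (Å) of a list of (x, y, z) coordinates."""
    n = len(coords)
    centroid = tuple(sum(atom[i] for atom in coords) / n for i in range(3))
    mean_square = sum(math.dist(atom, centroid) ** 2 for atom in coords) / n
    return math.sqrt(mean_square)


def box_edge(coords, ratio=2.9):
    """Edge length (Å) of a cubic search box sized relative to the ligand."""
    return ratio * radius_of_gyration(coords)


if __name__ == "__main__":
    # Hypothetical ligand heavy-atom coordinates in Å.
    ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0),
              (3.5, 1.2, 0.3), (4.2, 2.4, 0.3)]
    print("Rg (Å):", round(radius_of_gyration(ligand), 3))
    print("box edge (Å):", round(box_edge(ligand), 3))
```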
Database of Chemicals
Although the majority of the chemical structures were obtained from PubChem, the dataset used in each study is unique to its authors. Several studies investigated whether already approved drugs can be repurposed from their original indication to become ACE2 blockers and SARS-CoV-2 inhibitors (31, 35, 36). Another approach to preparing the dataset is to look for drug candidates among compounds found in traditional medicine or plant constituents (22, 24, 25, 27-30, 33, 34, 38-40). The final notable approach is to select candidates from a large dataset of 10,000 to 500,000 compounds (32, 37, 41). These differences in approach may provide beneficial insight from different perspectives, bringing us closer to real drug candidates ready for development.
Target Receptors
A range of PDB IDs for the ACE2 and SARS-CoV-2 Spike Protein receptors appears in the reviewed studies. Several articles used the same PDB ID for the same receptor. For ACE2, four publications (24, 25, 32, 33) used PDB 1R42 (69). Another interesting point is that a single PDB file can serve as a receptor target for either ACE2 or SARS-CoV-2 when it contains both proteins. PDB 6M0J (70), for example, was used as a SARS-CoV-2 Spike Protein target in five articles (22, 27, 30, 40, 41) and as an ACE2 target in three articles (31, 39, 40). Both structures have good crystal resolution (2.20 Å and 2.45 Å, respectively).
Interacting Residues
From the residues reported to interact with the ligands, we summarised those that appear more than once in the reviewed literature. In the SARS-CoV-2 Spike Protein, Tyr453 and Gly496 appear in four different articles, while Arg403, Ser494, and Tyr505 appear in three. Several residues appear in two articles: Cys336, Phe338, Gly339, Asn343, Asp364, Ser373, Tyr449, Gln493, and Asn501.
The binding residues in the ACE2 receptor that appear more than once in the reviewed articles are as follows. His34 was mentioned in five different articles, and Glu37 and Arg393 in four. Asp30, Arg273, and Asp367 featured in three articles, while Lys26, Asn33, Lys363, and Thr371 were found in two.
These residue similarities could help researchers find more specific binding sites, as well as serve as a reference for future drug discovery and development aimed at ACE2 and SARS-CoV-2.
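A tally of this kind is straightforward to reproduce. The sketch below counts in how many articles each residue is reported and keeps those reported at least twice. The study labels and their residue lists are hypothetical placeholders, not the per-article data from this review.

```python
from collections import Counter


def recurring_residues(reported, min_articles=2):
    """Return (residue, article_count) pairs, most frequent first.

    `reported` maps an article label to the residues that article reports.
    Each residue is counted once per article, even if listed repeatedly.
    """
    counts = Counter()
    for residues in reported.values():
        counts.update(set(residues))
    recurring = [(res, n) for res, n in counts.items() if n >= min_articles]
    return sorted(recurring, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical example data.
    example = {
        "study_A": ["His34", "Glu37"],
        "study_B": ["His34", "Glu37", "Lys26"],
        "study_C": ["His34", "His34"],
    }
    for residue, n_articles in recurring_residues(example):
        print(residue, n_articles)
```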
Molecular Dynamics
Advances in software and hardware performance have allowed researchers to adopt molecular dynamics (MD) to address drug discovery problems, especially protein-ligand stability (71). The dynamic nature of receptors has been widely demonstrated, and conformational changes have been related to ligand binding (71).
In brief, an MD simulation begins with the selection and preliminary analysis of the protein structure. The structure must be available as 3D coordinates, which can be downloaded from the Protein Data Bank (PDB). If no 3D structure is available, one can be obtained by homology modelling.
The structure must have no missing residues; any that are missing must be added using software such as Modeller. After the structure has been complexed with the best ligand, the input files for the MD simulation are prepared. Water is added to the complex to build the system, after which the protein is energy-minimised in vacuum and a short MD run is performed with the protein restrained. The system is then equilibrated under the chosen simulation parameters. Once all of these preparations are complete, the long MD simulation can begin.
Thirteen of the 21 reviewed articles performed molecular dynamics simulations to further validate their findings. Simulation times range from 1 ns to 250 ns, typically between 50 ns and 100 ns. Eleven of the 13 ran the simulation at 300 K, one ran it at 310 K, and one did not specify the simulation temperature.
Summary
Following our review of these articles, we make several recommendations to improve research in this area. First, it is critical to select high-quality target receptor structures for in silico experiments. A high-quality structure has a high resolution (a resolution value below 2.5 Å) and no missing residues. This minor adjustment can significantly improve the quality of the research.
Second, a protein-protein interaction analysis could help to confirm that the docking site is in the correct position. It allows the interacting residues of both proteins to be determined and provides additional evidence that the experiment is being carried out at the correct location.
Third, we encourage the use of multiple receptor conformations for virtual screening and molecular docking. This can be accomplished by running an MD simulation on the same PDB structure and capturing a conformation every few nanoseconds. Doing so lets virtual screening take into account which compounds match the receptor across conformations and which do not, while partly accommodating the flexibility of the protein, which is not rigid in nature. Another approach is to use multiple PDB files for the same receptor. This can show how consistent the virtual screening results are, for example whether a different structure of the same macromolecule still produces similar results.
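Selecting a conformation every few nanoseconds amounts to picking evenly spaced frames from the trajectory. The sketch below computes those frame indices from the simulation length and the interval at which frames were saved. The function name and the example values (50 ns run, frames saved every 10 ps, one snapshot every 5 ns) are assumptions for illustration; extracting the frames themselves would be done with the trajectory tools of the MD package in use.

```python
def snapshot_frames(total_ns, save_interval_ps, every_ns):
    """Frame indices at which to capture a receptor conformation.

    total_ns:         length of the MD simulation in ns
    save_interval_ps: time between saved trajectory frames in ps
    every_ns:         desired spacing between captured conformations in ns
    Frame 0 (the starting structure) is not included.
    """
    frames_per_snapshot = round(every_ns * 1000 / save_interval_ps)
    if frames_per_snapshot < 1:
        raise ValueError("snapshot spacing is shorter than the save interval")
    total_frames = round(total_ns * 1000 / save_interval_ps)
    return list(range(frames_per_snapshot, total_frames + 1, frames_per_snapshot))


if __name__ == "__main__":
    frames = snapshot_frames(total_ns=50, save_interval_ps=10, every_ns=5)
    print(len(frames), "conformations at frames", frames)
```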
Fourth, as discussed in the grid size section, we recommend setting the grid size relative to the size of the ligand, at 2.9 times the radius of gyration of the docked molecule (68). This needs particular attention when a very large molecule is used for virtual screening and docking, because a grid that is too small makes the search space too narrow and the docking ineffective.
Last but not least, molecular dynamics simulation should always be included in the research workflow. It can greatly aid the researcher's understanding of the ligand-receptor interaction, specifically how the two interact over time, from which the stability of the docked ligand in the receptor can be inferred. For each complex, 50 ns is a good starting point for the simulation duration, but longer simulations are preferable because they can provide more insight into the interaction. Trajectory analysis of the MD simulation is also useful, and we recommend carrying it through to the free energy calculation step using the MM-GBSA/MM-PBSA method.
Conclusions
We have reviewed several in silico drug discovery studies related to ACE2 and the SARS-CoV-2 Spike Protein. Aside from virtual screening, molecular docking, ADMET prediction, and molecular dynamics simulation, the reviewed studies carried out several additional experiments, including active site determination, quantum chemical calculation, synthetic accessibility prediction, principal component analysis, and free energy calculation. These additional experiments could be extremely beneficial in confirming the results for the lead compounds or drug candidates. We have also highlighted the key interacting residues for each receptor reported in each article and summarised them in order of how often they appeared in the reviewed articles. Finally, we have made several recommendations on how to make future research on this topic more thorough and of higher quality, so that it can provide more precise results.
List of Abbreviations
Act. Site : Active Site Prediction; AD : AutoDock; ADMET : ADMET Prediction; ADT : AutoDockTools; ADV : AutoDock Vina; AL : ADMET Lab; BSP : Binding Site Prediction; DFT : Density Functional Theory; DL : Drug-likeness; DSV : Discovery Studio Visualizer; DW : DataWarrior; FE : Free Energy Calculation; MD : Molecular Dynamics; Mi : Molinspiration Server; MOE : Molecular Operating Environment; MolDoc : Molecular Docking; PCA : Principal Component Analysis; PhyProp : Physical Properties; PISA : Protein Interface Stat. Analysis; PK : Pharmacokinetics; pkCSM : pkCSM Server; PLI : Protein-Ligand Interaction; PT2 : ProTox-II; Qua.Calc. : Quantum Chemical Calculation; SA : SwissADME; SAP : Synthetic Accessibility Prediction; SP : Spike Protein; VS : Virtual Screening.