Exploring QSARs for inhibitory effect of a set of EGFR tyrosine kinase inhibitors by GA-MLR and molecular Docking simulations

ABSTRACT: In this study, a series of 113 EGFR inhibitors with reported anticancer activity was taken from the literature. The work aims to derive multiple statistically robust and appropriately validated QSAR models using easily interpretable molecular descriptors, complemented by molecular docking analysis. This makes it simpler to identify important structural trends and how they relate to anticancer activity. The MLR model was used to suggest novel compounds with improved predicted activity, and the docking study shows how these predicted compounds interact with the enzyme. All predicted compounds were found to form several hydrogen bonds with the receptor and to engage their bulky groups in strong steric interactions with specific regions of the enzyme. Analysis of their pharmacokinetic profiles indicates that the proposed compounds have good pharmacokinetic properties.


INTRODUCTION
Many human malignancies overexpress the epidermal growth factor receptor (EGFR), and this overexpression is linked to a poor prognosis [9][10]. Because EGFR tyrosine autophosphorylation is a growth signal pathway that can be inhibited, inhibitors of this mechanism have attracted considerable attention as potential anticancer medications. Substances that block EGFR autophosphorylation, and with it EGF-stimulated signal transmission, are potentially a new class of anticancer treatments [11][12][13]. The 4-anilinoquinazolines and the related 4-anilinopyrido[d]pyrimidines are the most effective and selective EGFR inhibitors [14][15][16][17][18]. They bind reversibly and competitively at the ATP site, inhibiting both the isolated EGFR and EGF-stimulated EGFR autophosphorylation in cells, and two of these drugs, CP-358,774 and ZD 1839 (Iressa), are currently undergoing clinical trials. Because intracellular ATP levels are high, it is anticipated that reaching intracellular inhibitor concentrations sufficient for sustained inhibition of EGF-stimulated autophosphorylation may be challenging in some cell lines [1][2][3][4][5][6][7][8].
Through competitive binding at the enzyme's ATP site, these compounds act as effective and specific inhibitors of the tyrosine kinase activity of the EGFR. Potent inhibition of the enzyme is linked to small, lipophilic electron-donating groups at the 6- and 7-positions of the quinazoline and electron-withdrawing groups at the 3-position of the aniline ring.
Quantitative structure-activity relationship (QSAR) modelling is a popular branch of computer-aided drug design (CADD) that focuses on predicting activity or properties and on mechanistic interpretation. A QSAR expresses a mathematical relationship between structure and activity. Statistically sound QSAR models, however, frequently contain only complex descriptors, which are difficult to interpret mechanistically in terms of structural features. Consequently, synthetic chemists have had limited success using published QSAR models. One way around this significant constraint is to derive multiple fully validated QSAR models, each containing one or more readily interpretable descriptors [31][32][33][34][35][36].
In this study, a series of 113 EGFR inhibitors with reported anticancer activity was taken from the literature [37][38][39][40]. The work aims to derive multiple statistically robust and appropriately validated QSAR models using easily interpretable molecular descriptors, complemented by molecular docking analysis. As a result, it becomes simpler to identify important structural trends and how they relate to anticancer activity.

Dataset:
Approximately 4885 descriptors, comprising 0D, 1D, 2D and 3D descriptors, were calculated using the Dragon software, followed by the feature selection method in the QSARINS software [43]. The considerably reduced set of 1024 descriptors was then used in QSAR model development.
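The source does not state which criteria the QSARINS feature selection applied. As an illustration only, the sketch below assumes a common pre-filtering scheme: descriptors that are constant or near-constant are dropped, and of any pair whose absolute pairwise correlation reaches a cut-off only the first is kept. The function name `prefilter_descriptors` and the thresholds `var_tol` and `corr_cut` are hypothetical and are not taken from the study.

```python
import numpy as np


def prefilter_descriptors(X, var_tol=1e-8, corr_cut=0.95):
    """Return the column indices kept after a simple descriptor pre-filter.

    Assumed scheme (not taken from the paper):
      1. drop descriptors whose variance is at or below var_tol;
      2. scan the remaining descriptors in order and drop any whose absolute
         Pearson correlation with an already kept descriptor is >= corr_cut.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: keep only descriptors that actually vary across the compounds.
    varying = [j for j in range(X.shape[1]) if np.var(X[:, j]) > var_tol]

    kept = []
    for j in varying:
        # Step 2: compare descriptor j with every descriptor kept so far.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= corr_cut for k in kept
        )
        if not redundant:
            kept.append(j)
    return kept
```

Calling `prefilter_descriptors(X)` on a compounds-by-descriptors matrix returns the indices of the retained columns, so the reduced matrix is `X[:, kept]`. The pairwise scan is quadratic in the number of descriptors, which is acceptable for a sketch but slow for several thousand columns.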

RESULTS AND DISCUSSION
To keep the QSAR models simple and information-rich, the exploratory search was limited to seven variables per model. To build the GA-MLR QSAR models, the 113-molecule dataset was divided by random splitting into a training set (85 molecules, 75%) and a prediction set (28 molecules, 25%).
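A random division of this kind can be sketched as follows. This is a minimal illustration and not the QSARINS procedure: the function name `random_split`, the seed, and the use of compound numbers 1 to 113 as identifiers are assumptions made for the example. With 113 compounds and a 25% test fraction it gives the 85/28 division reported for this dataset.

```python
import random


def random_split(ids, test_fraction=0.25, seed=0):
    """Randomly divide compound identifiers into training and test sets."""
    ids = list(ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test = sorted(ids[:n_test])
    train = sorted(ids[n_test:])
    return train, test


# Hypothetical identifiers: compounds numbered 1..113.
train_ids, test_ids = random_split(range(1, 114))
```

Fixing the seed makes the split repeatable, which matters when several models are compared on the same training and prediction sets.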
Table 1 lists all the compounds together with their topological descriptors and EGFR inhibitory activity.
For the QSAR studies, 85 of the 113 compounds (75%) were selected at random, using the QSARINS software, as the training set for generating the QSAR model, and the remaining 28 compounds (25%) formed the test set used to evaluate the predictive ability of the developed model. Among all the calculated physicochemical and topological descriptors, only the seven descriptors found to be correlated with the activity are listed in Table 1.
In Eq. (1), n denotes the number of data points used in the correlation, r² is the square of the correlation coefficient, r²cv is the square of the cross-validated correlation coefficient obtained by the leave-one-out (LOO) jackknife procedure, and r²pred, the square of the correlation coefficient for the test set compounds, is used to assess the external validity of the correlation. The values of r²cv and r²pred are determined from Eqs. (2) and (3), respectively, where yi,obsd in Eq. (2) refers to the observed activity of compound i in the training set and in Eq. (3) to that of compound i in the test set. Similarly, yi,pred in Eq. (2) refers to the activity of compound i in the training set predicted by the leave-one-out jackknife approach, and yi,pred in Eq. (3) refers to the activity of test set compound i predicted by the model obtained from the training set. In both equations, yav,obsd refers to the average observed activity of the training set compounds.
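Given these symbol definitions, Eqs. (2) and (3) presumably take the standard form below. This is a reconstruction from the definitions in the text, not a copy of the originally typeset equations; the sums run over the training set in Eq. (2) and over the test set in Eq. (3), with the predicted values defined as described above.

```latex
\begin{align}
r^2_{cv} &= 1 - \frac{\sum_i \left(y_{i,obsd} - y_{i,pred}\right)^2}
                     {\sum_i \left(y_{i,obsd} - y_{av,obsd}\right)^2} \tag{2} \\
r^2_{pred} &= 1 - \frac{\sum_i \left(y_{i,obsd} - y_{i,pred}\right)^2}
                       {\sum_i \left(y_{i,obsd} - y_{av,obsd}\right)^2} \tag{3}
\end{align}
```

The two expressions have the same algebraic form and differ only in which compounds are summed over and in how the predicted values are obtained.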
The final two statistical parameters are S, the standard deviation, and F, the Fisher ratio between the variances of the calculated and observed activities. The numbers in parentheses with a ± sign indicate the confidence intervals. The F-value given in parentheses is the standard F-value at the 99 percent level; a calculated F value greater than this indicates a significant correlation. All of the descriptors used in this correlation are thus found to be highly significant, and eliminating them one at a time markedly reduces the significance of the correlation.
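For a multiple linear regression these quantities can be computed as in the sketch below. It assumes an ordinary least-squares fit with an intercept and the usual definitions, with S taken as the standard error of the estimate on n − p − 1 degrees of freedom and F as the ratio of explained to residual variance; the paper does not spell out its formulas, and the function name `mlr_fit_statistics` is hypothetical.

```python
import numpy as np


def mlr_fit_statistics(X, y):
    """Return (r2, s, F) for an ordinary least-squares fit with intercept.

    X is an (n compounds) x (p descriptors) matrix, y the observed activities.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    y_calc = A @ coef

    sse = np.sum((y - y_calc) ** 2)               # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)             # total sum of squares
    r2 = 1.0 - sse / sst
    s = np.sqrt(sse / (n - p - 1))                # standard error of estimate
    f_ratio = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return r2, s, f_ratio
```

The returned F value would then be compared with the tabulated F for (p, n − p − 1) degrees of freedom at the 99 percent level, as described in the text.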
According to these findings, Eq. (1) significantly correlates the inhibitory activity values with the structural descriptors of the compounds. The correlation has strong predictive power, although it does not in itself provide a mechanistic interpretation.
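The internal and external validation measures used to judge this predictive power can be sketched as follows. The code assumes an ordinary least-squares model with an intercept and follows the definitions given in the text, in which both measures use the mean observed activity of the training set in the denominator. The function names are hypothetical, and this is not the QSARINS implementation.

```python
import numpy as np


def _fit(X, y):
    """Least-squares coefficients (intercept first) for y ~ X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def _predict(coef, X):
    """Predicted activities for the rows of X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def loo_q2(X, y):
    """Leave-one-out cross-validated r2 (r2cv) on the training set."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i                  # leave compound i out
        coef = _fit(X[mask], y[mask])
        press += (y[i] - _predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def r2_pred(X_train, y_train, X_test, y_test):
    """External r2 for the test set, using the training-set mean activity."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    coef = _fit(X_train, y_train)
    residuals = y_test - _predict(coef, X_test)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y_test - y_train.mean()) ** 2)
```

Both functions expect two-dimensional descriptor matrices, with one row per compound and one column per descriptor.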
A plot of the calculated against the observed activities for the training and test sets (Fig. 2) shows that the model has strong predictive power: nearly all the points lie close to the straight line, with few exceptions. Using the GA-MLR model (Eq. 1), we predicted the activity of some new compounds, reported in Table 2. Each of the predicted molecules has a higher predicted activity than any compound in the existing series (Table 1).

Docking Analysis
To determine the binding modes of the predicted compounds (Table 2), a molecular docking analysis was performed with the LeadIT FlexX software. A molecule's potency depends on its capacity to interact with the enzyme. Molecular docking requires the crystal structure of the relevant enzyme, which can be obtained from the RCSB Protein Data Bank. The structure with PDB entry code 2bgf (http://www.pdb.org) was chosen. Each of the predicted compounds listed in Table 2 was docked into the enzyme, and Table 3 reports the docking results.
All predicted molecules were docked into the enzyme. To illustrate the most favourable interactions between the inhibitors and the enzyme 2bgf, only compounds 10 and 1 are shown here (Figs. 3 and 4): compound 10 has the highest predicted activity and compound 1 the highest docking score (Table 3). Figs. 3 and 4 show that the predicted compounds interact well with the enzyme. All of them form hydrogen bonds and engage in steric interactions, in which the active clefts of the enzyme surround different moieties of the ligand. The flexibility of an inhibitor determines whether a given moiety can enter an enzyme cavity. These steric interactions may involve dispersion interactions, a type of electronic interaction.
Pharmacokinetic Studies: Data Warrior software [49] was used to determine the pharmacokinetic profiles of the predicted compounds, and the findings are shown in Table 4. These profiles include molecular weight (M.W.), ClogP, the number of hydrogen bond acceptors (H.A.s), the number of hydrogen bond donors (H.D.s), and the number of rotatable bonds (NRBs) [50][51]. Lipinski's rule of five states that a drug is likely to be orally active if it satisfies at least three of the following four criteria:
(i) Its logP should not be more than 5.
(ii) It should not have more than ten hydrogen bond acceptors.
(iii) It should not have more than five hydrogen bond donors (the sum of the N-H and O-H bonds).
(iv) Its molecular weight should not exceed 500.
All of these limits are multiples of five, which is why it is called the rule of five. Compounds with M.W. < 500 and ClogP < 5 are considered to have suitable absorption and permeation. Similarly, Veber's rule states that molecules with NRB < 10 have good oral bioavailability. On these criteria, the pharmacokinetic characteristics of all the predicted compounds are favourable [52-54].
All predicted compounds were found to form several hydrogen bonds with the receptor and to engage their bulky groups in strong steric interactions with specific regions of the enzyme. According to the analysis of their pharmacokinetic profiles, the proposed compounds exhibit good pharmacokinetic properties.
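The Lipinski and rotatable-bond criteria used in the pharmacokinetic analysis can be expressed as a short check, sketched below. The thresholds follow the text, including NRB < 10 for the rotatable-bond criterion. The `Profile` class, the function names and the example values in the usage note are hypothetical and are not data from Table 4.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Pharmacokinetic descriptors of one compound (hypothetical container)."""
    mol_weight: float
    clogp: float
    h_acceptors: int
    h_donors: int
    rotatable_bonds: int


def lipinski_violations(p: Profile) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        p.clogp > 5,          # (i) logP not more than 5
        p.h_acceptors > 10,   # (ii) at most ten hydrogen bond acceptors
        p.h_donors > 5,       # (iii) at most five hydrogen bond donors
        p.mol_weight > 500,   # (iv) molecular weight not above 500
    ]
    return sum(checks)


def passes_lipinski(p: Profile) -> bool:
    """At least three of the four criteria satisfied (at most one violation)."""
    return lipinski_violations(p) <= 1


def passes_rotatable_bond_rule(p: Profile) -> bool:
    """Rotatable-bond criterion as stated in the text: NRB < 10."""
    return p.rotatable_bonds < 10
```

For example, a made-up compound `Profile(450.0, 4.2, 6, 2, 7)` violates none of the four criteria and passes both checks, whereas `Profile(620.0, 6.1, 8, 2, 12)` violates two Lipinski criteria and fails both.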

Figure 3: A representation of the binding of predicted compound 10 in 2bgf

Table 1: Dataset compounds and their activity.

Table 2: Some predicted compounds belonging to the series of Table 1 and their predicted activity using Eq. 1

Table 3: Docking results of predicted molecules

Table 4: Pharmacokinetic properties of the proposed compounds

Figure 2: A graph between predicted and observed activities of the compounds of Table 1

Website: https://bjmas.org/index.php/bjmas/index
Published by European Centre for Research Training and Development UK