Nonparametric Method for Estimation of Controlled Correlations in Studies of VEGF-Hypoxia Relationship

Citation: Kodryan MS, Kuznetsova AV, Klimenko LL, Mazilina AN, Baskakov IV, et al. (2020) Nonparametric Method for Estimation of Controlled Correlations in Studies of VEGF-Hypoxia Relationship. Int J Clin Biostat Biom 6:024. doi.org/10.23937/2469-5831/1510024
Accepted: March 02, 2020; Published: March 04, 2020
Copyright: © 2020 Kodryan MS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract
It is known that vascular endothelial growth factor (VEGF) expression is a response to hypoxia. Hypoxia, in turn, may be detected by oximetry parameters, including venous CO-oximetry indices and the corresponding partial pressures of O2 and CO2. However, no significant correlations between VEGF levels and oximetry parameters were found in groups of patients with ischemic stroke and transient ischemic attack. At the same time, an effect related to the relationship between VEGF and sO2 was observed in the corresponding scatter plots: a correlation between serum levels of VEGF and S100 proteins existed only in the group with severe hypoxia, where sO2 is below a threshold close to 39-40%. Thus the relationship between VEGF level and the saturation index sO2 exists in conjunction with an additional factor, the serum S100 level. To assess the statistical significance of the observed regularity, it is necessary to test three null hypotheses, each stating the independence of one of the involved factors from the other two. The relationship may be a manifestation of the effect of hypoxia on VEGF. To assess the significance of the hypoxia effect as a whole, all three null hypotheses were tested with a technique developed for this purpose, which is based on random permutations and involves nonparametric combinations of criteria related to single oximetry parameters. The significance assessments also involved a multiplicity adjustment to account for the search for additional factors among the variety of biological indices in the analyzed data set. As a result, all three null hypotheses were rejected at the adjusted level p < 0.02 when the effect of hypoxia on the correlation between VEGF and complement component C4 was evaluated.

Introduction
Angiogenesis may be a biological response to insufficient oxygen supply resulting in hypoxia. The key mediator of angiogenesis, and probably of neurogenesis [1], is vascular endothelial growth factor (VEGF), a homodimeric glycoprotein with a molecular weight of approximately 45 kDa. VEGF expression is activated in response to stabilization and nuclear translocation of hypoxia-inducible factor-1 (HIF-1) when the intracellular oxygen level is reduced [2]. The existence of HIF-1 and VEGF signalling pathways is confirmed by high levels of VEGF in patients with chronic obstructive pulmonary disease (COPD) [3] and asthma [4], and in subjects with "plateau red face" [5]. It is known that the immune system is involved in angiogenesis through the secretion of VEGF and other pro-angiogenic factors by macrophages, neutrophils and other immune cells [6][7][8]. The oxygen saturation index sO2 and other CO-oximetry parameters in venous blood reflect the balance between oxygen delivery and oxygen consumption [9]. Low sO2 corresponds to tissue hypoxia [10]. However, correlation coefficients between CO-oximetry parameters and VEGF levels were small and not statistically significant in the data set discussed below.

Figure 1: Correlation between S100 and VEGF levels compared in groups with sO2 < 38.4% (left scatter plot) and sO2 > 38.4% (right scatter plot).

Verification of the effect considered in the paper is possible with the procedure discussed below, which is based on testing several null hypotheses.
Effective methods for calculating a nonparametric combination (NPC) of several dependent permutation tests [15] may be used to assess the significance of the hypoxia effect across several oximetry parameters. Permutation tests are also widely used to control multiplicity in various applied tasks, including high-dimensional tasks related to DNA microarray experiments [16][17][18][19]. There are different ways to control the family-wise error rate (FWER). Single-step or stepwise procedures may be used to obtain adjusted p-values from previously computed raw p-values by the minP correcting procedure, or to calculate adjusted p-values directly from the distributions of test statistics by the maxT technique [20][21][22]. Unlike the mentioned works, our paper focuses on problems associated with multiplicity control and combinations of several criteria when more complicated multifactor regularities are studied.
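The single-step maxT adjustment mentioned above can be illustrated with a minimal sketch. It is not the procedure of this paper: the test statistic (absolute Pearson correlation of one response with each candidate variable), the sample sizes and the helper names `abs_correlations` and `maxt_adjusted_pvalues` are all assumptions made for illustration.

```python
import numpy as np

def abs_correlations(y, X):
    """Absolute Pearson correlation of y with every column of X."""
    yc = (y - y.mean()) / y.std()
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.abs(Xc.T @ yc) / len(y)

def maxt_adjusted_pvalues(t_obs, t_perm):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum over all statistics."""
    max_null = t_perm.max(axis=1)                       # one maximum per permutation
    exceed = (max_null[:, None] >= t_obs[None, :]).sum(axis=0)
    return (exceed + 1) / (t_perm.shape[0] + 1)

rng = np.random.default_rng(1)
n, m, n_perm = 88, 20, 999                              # illustrative sizes only
X = rng.standard_normal((n, m))
y = 0.8 * X[:, 0] + rng.standard_normal(n)              # only column 0 is related to y

t_obs = abs_correlations(y, X)
# Permuting y breaks its link with all columns of X at once (assumed null model).
t_perm = np.array([abs_correlations(rng.permutation(y), X) for _ in range(n_perm)])
p_adj = maxt_adjusted_pvalues(t_obs, t_perm)
print(p_adj.round(3))
```

Because every observed statistic is compared with the same null distribution of the maximum, the adjusted p-values account for the search over all candidate variables without assuming that the tests are independent.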

Data Set and Preliminary Results
The effect of hypoxia on the relationship between VEGF and 138 different biochemical, clinical or biophysical parameters was studied in a group of 88 patients aged 33 to 88 with ischemic stroke and transient ischemic attack. The hypoxia level was assessed by the partial pressures of O2 and CO2 and by CO-oximetry parameters in venous blood, measured with an ABL80 FLEX CO-OX analyser. Serum levels of VEGF, S100 and complement component C4 were measured by enzyme-linked immunosorbent assay (ELISA). For simplicity, the CO-oximetry parameters together with the partial pressures of O2 and CO2 in venous blood are referred to below as oximetry parameters.
Results of standard correlation analysis are given in Table 1.
It can be seen from Table 1 that statistically significant linear ties between VEGF levels and hypoxia are absent.
It is quite possible that linear ties nevertheless exist in combination with some additional factors. This supposition is supported by the effect of the oxygen saturation index (sO2) on the relationship between serum levels of VEGF and S100 proteins.

Correlation coefficients between CO-oximetry parameters and VEGF levels were small and not statistically significant in groups of patients with severe neurological disorders. Correlation coefficients between VEGF and the partial pressures of O2 and CO2 were also not significant. The lack of reliable ties may be attributed, among other things, to the complexity of the existing dependence, in which the relationship between two factors is controlled by a third one. A previous study of the data set with the OVP method [11,12] and visual analysis of the related scatter diagrams uncovered a complex effect involving sO2 and serum levels of VEGF and proteins of the S100 family.
The left scatter diagram in Figure 1 corresponds to the group with sO2 lower than 38.4%. This diagram is consistent with a linear dependence of VEGF on the S100 level. The only exception is one object marked with a red circle. In contrast, there is no noticeable correlation between VEGF and S100 in the right diagram, which corresponds to the group with sO2 greater than 38.4%. This suggests that the correlation observed in the left diagram may be caused by the severe hypoxia that exists when sO2 is below 38.4%. The relationship from Figure 1 may be described by model (1). Model (1) is evidently equivalent to standard piecewise regression if $X_i$ is equal to $Z_k$. It is supposed that the verification procedure must satisfy the following demands:

• The verification procedure must include testing the significance of both variables $X$ and $Z$.

• Hypoxia is assessed by several oximetry parameters. It may be expected that a statistical technique evaluating the effect of hypoxia on the relationship between VEGF and an additional factor $X_i$ would be more powerful if it combines the statistical tests assessing the effect of each oximetry parameter on the relationship between $Y$ and $X_i$.

• The effect from Figure 1 was found by testing a variety of factors, so multiple testing must be taken into account when statistical significance is estimated.
A permutation test is an approach capable of meeting the listed demands owing to the following advantages. A permutation test may be implemented regardless of whether the underlying distributions of the test statistics are known. There are no limitations on data set sizes. Different permutation tests have been proposed for assessing the importance of each predictor in multiple regression models; a method based on testing the significance of the corresponding partial correlation coefficients may be mentioned in this regard [13]. A permutation test was also used to choose between piecewise regression and simple linear regression [14]. However, the mentioned techniques are not suitable for evaluating the significance of the effect considered in this paper.

… small groups evidently is higher. Full compensation of this probability increase is impossible using the multiplier $m_l m_r$ only.
The existence of outlying observations, such as the observation circled in red in the left part of Figure 1, reduces $Q_{2\rho}$ and so may hinder correct statistical evaluation of the studied effect. It is therefore better to use a robust Pearson correlation coefficient.
The robust version of functional (2) is written in terms of $\hat{\rho}_l$ and $\hat{\rho}_r$, the robust correlation coefficients in the left and right groups. Its value $\hat{Q}_{2\rho}(\delta_0)$ at the optimal threshold is used as the statistic for testing the null hypotheses. The p-values for these null hypotheses are then calculated according to the following Procedure I:

• Calculate the optimal threshold $\delta_0$ on the data set $\tilde{S}_t$ as $\delta_0 = \arg\max_{\delta} \hat{Q}_{2\rho}(\delta)$.

The effect of the oxygen saturation index (sO2) on the relationship between serum levels of VEGF and S100 proteins was discovered with the OVP technique [11] and visual analysis of the corresponding scatter plots. The left scatter plot in Figure 1 shows a strong linear relationship between VEGF and S100 in the group with saturation level sO2 below 38.4%. The right scatter plot shows that a noticeable linear dependence between VEGF and S100 is absent in the group with sO2 above 38.4%.

It may be supposed from Figure 1 that hypoxia leads to a correlation between VEGF ($Y$) and S100 ($X_i$), so S100 may be considered an additional factor. Our goal is to assess the statistical significance of the assumed effect.

Verification of the Effect
Complex dependencies testing: The above supposition is too complex to test using a single null hypothesis. In fact, the effect contradicts three null hypotheses, each stating that one of the three involved factors ($Y$, $X_i$ or $Z_k$) is independent of the other two. All three hypotheses must be rejected to be sure that the supposition is correct. Otherwise the observed pattern may have a simpler explanation. For example, it may be attributed to the existence of a linear relationship between $Y$ and $X_i$ when …

In the functional $Q_{2\rho}(\delta)$, $m_l$ is the number of patients in group $\tilde{s}_l$ with $Z_k < \delta$ and $m_r$ is the number of patients in group $\tilde{s}_r$ with $Z_k > \delta$. The threshold $\delta$ is initially unknown. It is proposed to use the optimal threshold $\delta_0 = \arg\max_{\delta} \hat{Q}_{2\rho}(\delta)$. The maximum of the functional $Q_{2\rho}(\delta)$ may be searched by trying all boundaries between distinct values of $Z_k$ existing in the full group $\tilde{s}$. Larger values of $Q_{2\rho}(\delta_0)$ testify more strongly against each of the 3 null hypotheses.

… when $Z_1 \ge 0.4$. So the data are generated to provide an effect similar to the effect from Figure 1 for each combination from the set $\{(Y, X_i, Z_k) \mid i = 1, \dots, 30,\ k = 1, \dots, 7\}$.
3. Variables $X_{31}, \dots, X_{60}$ for the observation $j$ were calculated from $e_{ij}$ and $\delta^{*}$ …

So $Y$ is independent of the variables $X_{61}, \dots, X_{90}$ and $Z_1$.

5. Variables $Z_2, \dots, Z_7$ for the observation $j$ were calculated as $Z_{ij} = Z_{1j}$ if $U_{ij} \le 0.9$ and as $Z_{ij} = 1 - Z_{1j}$ if $U_{ij} > 0.9$.
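Step 5 can be sketched in a few lines. This is an illustration only: the use of an independent uniform draw for every variable and observation, the sample size and the array names are assumptions, and the threshold 0.4 is taken from the condition on $Z_1$ used in the scenario.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000                      # large n only to make the shares stable
z1 = rng.uniform(size=n)        # Z_1 ~ U(0, 1)
u = rng.uniform(size=(6, n))    # assumed: independent uniforms, one row per Z_2..Z_7

# Z_i copies Z_1 with probability 0.9 and is reflected to 1 - Z_1 otherwise.
z_rest = np.where(u <= 0.9, z1, 1.0 - z1)

agreement = (z_rest == z1).mean(axis=1)                  # share of exact copies
same_side = ((z_rest < 0.4) == (z1 < 0.4)).mean(axis=1)  # same side of the 0.4 boundary
print(agreement.round(3), same_side.round(3))
```

Under these assumptions about 90% of the values are exact copies of $Z_1$. A reflected value stays on the same side of 0.4 only when $Z_1$ lies between 0.4 and 0.6, so roughly 0.9 + 0.1 · 0.2 = 0.92 of the observations keep their group, which is why patterns based on $Z_2, \dots, Z_7$ are blurred relative to those based on $Z_1$.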
There are no regularities similar to the regularity from Figure 1 for combinations $(Y, X_i, Z_k)$ when $31 \le i \le 90$. Variables $Z_1, \dots, Z_7$ are included in the scenario to imitate all 7 oximetry factors. For combinations $(Y, X_i, Z_1)$ the regularities are more distinct than for combinations $(Y, X_i, Z_k)$ with $k > 1$. The sets $X_1, \dots, X_{30}$; $X_{31}, \dots, X_{60}$; $X_{61}, \dots, X_{90}$ will be referred to as $\tilde{C}_1$, $\tilde{C}_2$, $\tilde{C}_3$, respectively.

Results of experiment: Results of the first and second experiments are presented in Table 2. Columns of the table correspond to significance levels from p < 0.0001 to p < 0.1. The upper part of the table corresponds to the first experiment ($\delta_1 = 0.75$) and the lower part to the second experiment ($\delta_1 = 0.57$). The cell at the intersection of the row corresponding to significance level $\alpha$ and the column corresponding to subset $\tilde{C}_j$ contains the number of triples $(Y, X_i, Z_1)$ with $X_i \in \tilde{C}_j$ for which all three null hypotheses were rejected at least at level $\alpha$. The number for which all three null hypotheses were rejected at level $\alpha$ is given in the same cell in parentheses.

• Implement step ($a_2$).

It may be seen from
• Then calculate the estimate of the p-value for the null hypothesis …

Experiments with simulated data
Design of experiments: The experiments were designed to imitate the regularity from Figure 1 only for certain groups of variables, while for the remaining ones such regularities were absent. The scenario includes generating variables $Y$, $Z_1, \dots, Z_7$, $X_1, \dots, X_n$. Variables $Y$ and $e_1, \dots, e_n$ were independently sampled from $N(0,1)$; variables $Z_1$, $U_g$ and $U_2, \dots, U_7$ were independently sampled from the continuous uniform distribution $U(0,1)$. Variables $X_1, \dots, X_n$ are calculated from $Y$, $e_1, \dots, e_n$, $Z_1$ and $U_g$. Variables $Z_2, \dots, Z_7$ are calculated from $Z_1$ and $U_2, \dots, U_7$.
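A rough sketch of such an experiment for a single triple $(Y, X, Z_1)$ is given below. Several parts are assumptions rather than the method of this paper: the functional $m_l m_r (\rho_l - \rho_r)^2$ is only a hypothetical stand-in built around the multiplier $m_l m_r$, the formula for $X$ and its coefficients are invented for illustration, ordinary rather than robust correlations are used, and each null hypothesis is imitated by shuffling one variable against the other two. The function names and the minimal group size are likewise hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Plain (non-robust) Pearson correlation of two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def q_functional(y, x, z, delta):
    """Hypothetical stand-in functional: m_l * m_r * (rho_l - rho_r)^2."""
    left = z < delta
    m_l, m_r = int(left.sum()), int((~left).sum())
    rho_l = pearson(y[left], x[left])
    rho_r = pearson(y[~left], x[~left])
    return m_l * m_r * (rho_l - rho_r) ** 2

def best_threshold(y, x, z, min_size=10):
    """Try every boundary between distinct Z values; return (delta_0, Q(delta_0))."""
    zs = np.unique(z)
    best_delta, best_q = None, -np.inf
    for delta in (zs[:-1] + zs[1:]) / 2.0:
        m_l = int((z < delta).sum())
        if min(m_l, len(z) - m_l) < min_size:   # assumed guard against tiny groups
            continue
        q = q_functional(y, x, z, delta)
        if q > best_q:
            best_delta, best_q = float(delta), q
    return best_delta, best_q

def permutation_pvalue(data, shuffled_name, n_perm, rng):
    """Shuffle one variable against the other two and re-optimise the threshold."""
    _, q_obs = best_threshold(**data)
    hits = 0
    for _ in range(n_perm):
        perm = dict(data)
        perm[shuffled_name] = rng.permutation(data[shuffled_name])
        _, q = best_threshold(**perm)
        hits += q >= q_obs
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
n = 200
y = rng.standard_normal(n)                  # Y ~ N(0, 1)
e = rng.standard_normal(n)                  # noise ~ N(0, 1)
z1 = rng.uniform(size=n)                    # Z_1 ~ U(0, 1)
x = np.where(z1 < 0.4, y + 0.5 * e, e)      # assumed planted effect below 0.4

data = {"y": y, "x": x, "z": z1}
delta0, q0 = best_threshold(**data)
pvals = {name: permutation_pvalue(data, name, n_perm=99, rng=rng) for name in data}
print(delta0, pvals)
```

The threshold is re-optimised inside every permutation, so the p-values already account for the search over $\delta$. With a planted boundary at 0.4 the selected threshold is expected to fall near that value.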
The pattern in Figure 2 is similar to the pattern in Figure 1. However, the boundary point for the pattern in Figure 2 was obtained by Procedure I and differs from the boundary for the pattern in Figure 1, which was calculated by the OVP method. The correlation coefficient between VEGF and S100 in the group of 33 cases with sO2 < 39.75% equals 0.64. It increases to 0.88 after removing the outlying object highlighted in the left scatter diagram by a red circle. No relationship between VEGF and S100 exists in the group of 55 cases with sO2 > 39.75%, as may be seen from the right diagram. The corresponding correlation coefficient equals 0.03.
It may be seen from Figure 3 and Figure 4 that the effect of hypoxia on the relationship between VEGF and C4 is similar to the effect of hypoxia on the relationship between VEGF and S100. The correlation coefficient between VEGF and C4 equals 0.47 in the group of 31 cases with sO2 < 39.25%, which corresponds to the left scatter diagram in Figure 3. It increases to 0.76 after removing the outlying object highlighted in the left scatter diagram by a red circle. No significant relationship between VEGF and C4 exists in the group of 57 cases with sO2 > 39.25%, as may be seen from the right diagram. The corresponding correlation coefficient equals -0.11 and increases to 0.05 after removing the outlier highlighted in the right diagram.
Low saturation index sO2 corresponds to high FHHb values.

… combination from $\tilde{C}_3$ when $k = 1$. The number of combinations for which all three null hypotheses were rejected equals 6 inside $\tilde{C}_2$ and 3 inside $\tilde{C}_3$ for $k = 2, \dots, 7$. In the second experiment the number of combinations for which the three null hypotheses were rejected at level $\alpha$ was higher than the number of such combinations practically for all significance levels. So the results of the experiments strongly indicate unbiasedness of the developed criterion.

Experiments with clinical data
The developed technique was applied to the clinical data set described above to find regularities similar to the one shown in Figure 1. Three null hypotheses were tested for combinations $(Y, X_i, Z_k)$, where $Y$ is the serum concentration of VEGF and $Z_1, \dots, Z_7$ are the oximetry parameters sO2, pO2, pCO2, FCOHb, FO2Hb, FMetHb and FHHb. All 138 variables other than VEGF concentration and the oximetry parameters were tried as additional factors $X_1, \dots, X_n$. The most significant effects were revealed when the serum concentration of S100 proteins or of complement component C4 was used as the additional factor $X$. It is seen from Table 3 that all null hypotheses are rejected at significance level p < 0.001 for the combination (VEGF, pO2, C4), at p < 0.002 for the combinations (VEGF, sO2, C4) and (VEGF, FO2Hb, C4), and at p < 0.05 for the combinations (VEGF, FHHb, …

Figure 2: Correlations between S100 and VEGF levels compared in groups with sO2 < 39.75% (left scatter plot) and sO2 > 39.75% (right scatter plot). The boundary point was calculated using Procedure I.
The test statistic in Procedure II is calculated as a combining function of the p-values of the partial tests. Several combining functions are discussed in [15]. According to our experiments, the best performance is achieved with a slightly modified Fisher combining function $\psi$. Let $p_1, \dots, p_k$ be p-values calculated by a permutation test with $N$ random permutations; $\psi$ is then computed from $p_1, \dots, p_k$.

Results of Procedure II applied to the studied data set are presented in Table 4. It can be seen that the global null hypotheses $H^{1}_{0c}$ and $H^{2}_{0c}$ are rejected at level p < 0.0005 when the additional factors are the C4 and S100 concentrations. The global null hypothesis $H^{3}_{0c}$ is rejected at p = 0.0001 when the additional factor is the C4 concentration, but it is not rejected when the additional factor is the S100 concentration.
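The general idea of combining dependent partial permutation tests with a Fisher-type function can be sketched as follows. This is not Procedure II itself: the standard Fisher function $-2\sum_k \log p_k$ is used, and the 1/2 offset that keeps p-values away from zero is a common NPC convention that may differ from the modification used in this paper. The partial statistic (absolute correlation), the simulated data and all function names are assumptions.

```python
import numpy as np

def significance_level(t_values, t_perm):
    """L(t) = (0.5 + #{b: T*_b >= t}) / (B + 1), evaluated column by column."""
    n_perm = t_perm.shape[0]
    counts = (t_perm[None, :, :] >= t_values[:, None, :]).sum(axis=1)
    return (counts + 0.5) / (n_perm + 1)

def fisher_npc(t_obs, t_perm):
    """Combine K dependent partial tests that share the same permutations."""
    p_obs = significance_level(t_obs[None, :], t_perm)[0]   # partial p-values
    p_perm = significance_level(t_perm, t_perm)             # p-values of permuted stats
    psi_obs = -2.0 * np.log(p_obs).sum()
    psi_perm = -2.0 * np.log(p_perm).sum(axis=1)
    p_global = (1 + (psi_perm >= psi_obs).sum()) / (t_perm.shape[0] + 1)
    return p_obs, float(p_global)

def abs_corr(y, Z):
    """Absolute Pearson correlation of y with every column of Z."""
    yc = (y - y.mean()) / y.std()
    Zc = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    return np.abs(Zc.T @ yc) / len(y)

rng = np.random.default_rng(2)
n, k, n_perm = 88, 7, 499                     # illustrative sizes only
y = rng.standard_normal(n)
Z = 0.4 * y[:, None] + rng.standard_normal((n, k))   # 7 noisy indicators related to y

t_obs = abs_corr(y, Z)
# The same permutation of y is used for all partial tests, preserving their dependence.
t_perm = np.array([abs_corr(rng.permutation(y), Z) for _ in range(n_perm)])
p_partial, p_global = fisher_npc(t_obs, t_perm)
print(p_partial.round(4), p_global)
```

Applying each permutation to all partial statistics at once is what lets the combined test handle dependent criteria: the null distribution of $\psi$ is taken from the joint permutation distribution rather than from a chi-square approximation that assumes independence.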

Multiplicity Control
It was necessary to test global null hypotheses from …

Strong correlation between VEGF and S100 exists when FHHb is greater than a certain threshold, while the correlation coefficient is close to zero when FHHb is lower than the threshold, as can be seen in Figure 4. The correlation coefficient between VEGF and C4 in the group of 36 cases with FHHb > 56.3% equals 0.45 and increases to 0.73 after removing the outlier highlighted in the right diagram. No significant relationship between VEGF and C4 exists in the group of 52 cases with FHHb < 56.3%, as may be seen from the left diagram. The corresponding correlation coefficient equals -0.1 and increases to 0.07 after removing the outlier highlighted in the left diagram.
Our goal is to test whether hypoxia affects VEGF production by controlling the relationship between VEGF and some additional factor. The hypoxia effect is manifested through effects associated with different oximetry parameters. The existence of the supposed hypoxia effect simultaneously contradicts several null hypotheses associated with different oximetry parameters. The hypoxia effect may therefore be assessed by testing the global null hypotheses $H^{1}_{0c}$, $H^{2}_{0c}$ and $H^{3}_{0c}$.

Conclusion
The results may be summarized as follows. A method was developed to discover relationships of the following type in data: a significant linear correlation between two factors $Y$ and $X_i$ exists only if a third factor $Z_k$ lies on one side of some threshold $\delta$, while on the other side of $\delta$ the Pearson correlation coefficient is close to zero.
It was suggested to consider such a three-factor relationship statistically significant when three null hypotheses are rejected. Nonparametric permutation tests, whose statistic is the optimal value of a special quality functional, were used to test these hypotheses.
Performance of the method was evaluated on simulated data. Good concordance between the regularities found and the patterns provided by the experiment scenario is seen from Table 2.
The method was applied to test the supposition that hypoxia controls the relationship between serum VEGF concentration and some factor from the analyzed clinical database. It was supposed that hypoxia is manifested by oximetry parameters. Three null hypotheses were rejected for a set of triples $(Y, X_i, Z_k)$, where $Y$ is the VEGF concentration, $X_i$ is some additional factor and $Z_k$ is one of the oximetry parameters.
The significance of the hypoxia effect on the correlation between VEGF level and an additional factor $X_i$ may be assessed as the combined significance of the effects related to different oximetry parameters. Combined significance was evaluated with the NPC method, testing the intersection of the null hypotheses related to the set of triples $(Y, X_i, Z_k)$ over the oximetry parameters $Z_k$. A single-step permutation-based FWE control was implemented to take into account that the additional factor is searched for among 138 variables.
It was shown that the three combined null hypotheses were rejected at significance level p < 0.02 when the concentration of complement C4 is the additional factor. The developed technique may be used in a variety of biomedical tasks where it is necessary to assess the effect of some factor or group of factors on existing linear ties.