Bridging Strategies for In Vitro Diagnostic Clinical Trials in a New Region

Citation: Magari R, Hasan M, Lo K (2020) Bridging Strategies for In Vitro Diagnostic Clinical Trials in a New Region. Int J Clin Biostat Biom 6:028. doi.org/10.23937/2469-5831/1510028
Accepted: July 08, 2020; Published: July 10, 2020
Copyright: © 2020 Magari R, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

The International Medical Device Regulatory Forum of China, organized by the Chinese NMPA to address policies and regulations on medical devices, evidenced the increasing demand for multiregional clinical trials and for bridging strategies to evaluate diagnostic products in new regions [5].
Accuracy performance is one way to evaluate IVD devices in clinical testing. ISO 5725-1 defines accuracy as the closeness of agreement between a test result and an accepted reference value [6]. The choice of reference depends on the diagnostic test and the type of device being tested. A newly discovered test is commonly evaluated for its accuracy against the clinical diagnosis. Clinical trials where the diagnosis is used as the reference method are called 'diagnostic clinical performance' trials, and accuracy is commonly evaluated in terms of sensitivity and specificity [7,8]. For other IVD tests, a marketed device can serve as a predicate for the candidate device under test, and agreement with the comparative method is evaluated based on the closeness of the measurements. These types of clinical trials are referred to as 'method comparison' trials. Bias between devices is an evaluator of agreement that can be estimated statistically using a regression approach [9].

Introduction
Multiregional clinical trials have been used over the last decade or so to evaluate therapeutic products as part of developing global medicines. Approaches applied to these trials are summarized in the International Conference on Harmonisation guideline ICH E5 [1], while several papers addressing bridging studies for therapeutic endpoints have laid the foundation for statistical procedures [2-4]. Recently, market requirements and the regulatory landscape have been changing for In Vitro Diagnostic (IVD) devices as well. Topics of multiregional clinical trials and bridging strategies were widely discussed during the International Medical Device Regulatory Forum of China last year [5].
Hypotheses for equivalence between the predicate and test device in method comparison are:

H0: |B| ≥ B_s versus H1: |B| < B_s

The prespecified bias B_s is usually symmetrical around zero. The absolute value of the bias represents the clinical relevance for an analyte, while the sign indicates the direction of the bias. However, in specific comparisons, the bias might be asymmetrical.
Let P be the prospective power used to size the original clinical trial. Let B̂ and se(B̂) be the estimated bias and its standard error from the original clinical trial, and t_c a statistic that follows a t-distribution with N − 2 degrees of freedom.
Let P̂ denote the post-experimental power of the original trial. It can be computed using Owen's function [18], where t is the critical value of the t-distribution with 1 − α confidence and N − 2 degrees of freedom, Φ(.) is the cumulative distribution function, and ϕ(.) is the density function of the standard normal distribution. The expected power in the new region can be calculated from the same function after shifting the estimated bias by the expected difference Δ. Since precision and the AMI are characteristics of the device, they do not depend on the population; consequently, Δ is only a function of the difference in bias.

Bridging methodologies use this kind of reproducibility and consistency information to assess the similarity between the original clinical trial and bridging studies in the new region. All these methodologies were developed for therapeutic endpoints, i.e., comparing test results to a placebo, survival endpoints, etc. We are not aware of any publication addressing bridging methodologies to support the introduction of IVD products into new geographies or regions.
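Owen's function involves the noncentral t-distribution; as a rough sketch, the post-experimental power of the equivalence test H0: |B| ≥ B_s versus H1: |B| < B_s can be approximated with standard-normal quantities for large N. The function name and the numerical inputs below are hypothetical, and the normal approximation stands in for the exact t-based formula:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def post_hoc_equivalence_power(b_hat, se_b, b_s, z_crit=1.645):
    """Approximate post-experimental power of the equivalence test
    H0: |B| >= B_s vs. H1: |B| < B_s, using a large-sample normal
    approximation in place of the t-based Owen's function.
    Two one-sided tests: both bias bounds must be rejected."""
    lower = phi((b_s - b_hat) / se_b - z_crit)
    upper = phi((b_s + b_hat) / se_b - z_crit)
    return max(0.0, lower + upper - 1.0)

# Hypothetical numbers: estimated bias 2, se 3, tolerable bias 15
print(round(post_hoc_equivalence_power(2.0, 3.0, 15.0), 3))
```

A small estimated bias with a tight standard error relative to B_s yields power near 1, mirroring the behavior of the exact calculation.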
Consequently, the purpose of this paper is to provide a strategy for bridging the results of an original clinical trial into a new region for IVD devices. We consider the two most common types of IVD clinical trials: method comparison and diagnostic clinical performance trials.

Method Comparison Trials
This section applies only to method comparison studies for quantitative measurement procedures. Let X be the measurement with the predicate device and Y the measurement with the test device. Both devices are tested on N subjects, with X_i and Y_i being the measurements of the predicate and test devices, respectively, on the same subject i.
The linear function (1) describes the relationship between the devices:

Y_i = b_0 + b_1 X_i + ε_i    (1)

where b_0 and b_1 are the intercept and slope, and ε_i is the error.
Since both the test and predicate devices measure with a certain degree of imprecision, Linnet [15] proposed an orthogonal least squares approach to estimate the parameters of (1). This approach is usually referred to in the IVD literature as the Deming approach [9]. Non-parametric estimation of the parameters is also commonly used in the IVD industry [9,16].
Bias (B) between the two methods at a certain medically important level X_c can be calculated as:

B = b_0 + (b_1 − 1) X_c    (2)

The standard error of B can be calculated from the ordinary least squares variances and covariance of the slope and intercept. Linnet [17] argued that for a relatively large analytical measuring interval (AMI), a slope close to 1, and equal precision of both devices, the orthogonal least squares estimate of var(B) can be approximated by its ordinary least squares estimate. Considering the variance-covariance of the regression estimates, se(B) can be approximated as:

se(B) ≈ σ √{2 [1/N + (X_c − X̄)² / Σ_i (X_i − X̄)²]}    (3)

where σ is the precision and X̄ is the average of the predicate device. se(B) is proportional to the precision and the AMI, as well as to the distance of X_c from the mean. The function under the square root is multiplied by 2 since both devices are measured with error. Knowledge about the precision of the device, the AMI of the analyte, and a prespecified clinically tolerable bias (B_s) can be used to prospectively calculate power and size the clinical trial.
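A minimal sketch of equations (2) and (3), assuming an ordinary least squares fit as a stand-in for the Deming fit (reasonable when both devices have similar precision) and using the residual standard deviation as the precision estimate σ; the function name and data are hypothetical:

```python
import math

def bias_and_se(x, y, x_c):
    """Estimate the bias between devices at decision level x_c with an
    OLS fit, and approximate its standard error as
    se(B) ~= sigma * sqrt(2 * (1/N + (x_c - x_bar)^2 / Sxx)),
    where the factor 2 reflects measurement error in both devices."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept
    # residual standard deviation as the precision estimate sigma
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    bias = b0 + (b1 - 1.0) * x_c         # equation (2)
    se_b = sigma * math.sqrt(2.0 * (1.0 / n + (x_c - x_bar) ** 2 / sxx))
    return bias, se_b
```

With a constant offset between devices (slope 1), the bias estimate reduces to the intercept, regardless of the decision level chosen.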

Example
A clinical trial to comply with the pre-market regulatory requirements for testing the Beckman Coulter DxH 520 hematology analyzer was conducted at different sites in the United States. The DxH 520 was the test device, while the UniCel DxH 800 Coulter Cellular Analysis System was the predicate device. Hematology analyzers measure several hematology parameters at the same time. In this example, we focus on only one parameter, platelets (PLT), a cell type related to blood coagulation. PLT, expressed as cell counts/µL × 10³, was measured by both devices on N = 196 subjects. Because of the large number of counts for each measurement, both X_i and Y_i are assumed to be normally distributed, with means µ_x and µ_y and the same precision, σ.
Method comparison was used to evaluate the agreement and establish equivalence between the devices. PLT counts in humans exhibit a relatively wide range depending on clinical conditions, which can affect the variability of measurements throughout the AMI. In addition, the relative variability (CV%) of precision in the low range is much larger than in the rest of the AMI. Consequently, there is a pattern of dependency of variability within the AMI that required weighted adjustments for the estimation of the regression parameters.
The two-sided confidence interval of B̂ was calculated as (α = 0.05 and N = 196):

B̂ ± t_(0.975, N−2) se(B̂)
Consistency between the results of the original trial and the expected results in the new region can be expressed as the probability γ that the difference observed in the new region, D_new, is at least a fraction ρ of the difference D observed in the original trial [12] (equation 5). A consistency probability of γ ≥ 0.8 and values of ρ ≥ 0.5 are generally acceptable. When D and D_new are normally distributed, Ikeda & Bretz [12] showed that equation (5) can be approximated by a normal cumulative density function, and the proportion of subjects in the new region can be calculated from this approximation (equation 6).

Comparison of a test result to the clinical truth can have four outcomes: true positive (C = 1, X = 1), false negative (C = 1, X = 0), false positive (C = 0, X = 1), and true negative (C = 0, X = 0). Let N be the total number of subjects in the trial, N = N_TP + N_FN + N_FP + N_TN, where N_TP, N_FN, N_FP, and N_TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively. Based on these outcomes, several conditional probabilities can be calculated to evaluate the diagnostic ability of the test [19]. The most common are sensitivity, specificity, positive/negative predictive values, and positive/negative likelihood ratios. In this section, we focus on sensitivity, the proportion of subjects with disease that test positive:

S = P(X = 1 | C = 1)

The same approach can be used for specificity and predictive values. Let S be a target value for sensitivity and S_s a lower limit of sensitivity that is clinically acceptable. The prospective power (P) can be calculated directly from the cumulative normal distribution function as:

P = Φ( √N_Disease (S − S_s) / √(S (1 − S)) − z_(1−α) )    (9)

while the required number of diseased subjects for the original clinical trial is:

N_Disease = S (1 − S) (z_(1−α) + z_P)² / (S − S_s)²

The similarity index is only a function of the difference in sensitivity between the original clinical trial and the expected estimate in the new region, Δ = S − S_New, while P̂_new can be calculated from the power function (9), S, and Δ.
When D = S − S_s and D_new = S_New − S_s, consistency between the results of the original trial and the new region, and the proportion of subjects in the new region, can be calculated based on equations (5) and (6).
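The sensitivity power and sample-size calculations above can be sketched under the normal approximation to the binomial; the function names are hypothetical and the z-quantiles come from the standard normal distribution:

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: cdf and inverse cdf

def sensitivity_power(n_disease, s, s_s, alpha=0.05):
    """Power to demonstrate that sensitivity S exceeds the clinically
    acceptable lower limit S_s, via the normal approximation."""
    z_alpha = _N.inv_cdf(1 - alpha)
    return _N.cdf(math.sqrt(n_disease) * (s - s_s)
                  / math.sqrt(s * (1 - s)) - z_alpha)

def n_diseased(s, s_s, power=0.80, alpha=0.05):
    """Required number of diseased subjects in the original trial,
    rounded up to the next whole subject."""
    z_alpha = _N.inv_cdf(1 - alpha)
    z_power = _N.inv_cdf(power)
    return math.ceil(s * (1 - s) * (z_alpha + z_power) ** 2
                     / (s - s_s) ** 2)
```

For example, targeting S = 0.90 against a lower limit S_s = 0.80 at 80% power requires on the order of 56 diseased subjects under these hypothetical inputs.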

Example
Data from the sepsis clinical trial presented in Crouser, et al. [20] were used for demonstration purposes. The clinical trial, conducted at three different sites in the US, enrolled N_Disease = 385 sepsis-positive subjects.

A 10% bias is usually tolerable for PLT. Consequently, B_s = 15 cells/µL × 10³ at X_c = 150 cells/µL × 10³, and B_s = 45 cells/µL × 10³ at X_c = 450 cells/µL × 10³. Results are provided in Table 1.
Using this information and equation (4), P̂ and ρ were also calculated for the same interval. A summary of the results for X_c = 150 is shown in Table 2.
The decision for a bridging study corresponds to Δ = 6 for X_c = 150 (Table 2), where the fraction of the difference from the prespecified clinically tolerable bias is (B_s − Δ)/B_s = 0.6. This fraction corresponds to Δ = 18 for X_c = 450. Calculations for P̂ and ρ and the decisions are shown in Table 3, starting from Δ = 17. All decision results over a reasonable interval indicate that no clinical trial is needed in the new region.
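Carrying the decision point over between the two medical decision levels is simple arithmetic; a sketch assuming the fraction is defined as (B_s − Δ)/B_s, with the example's values:

```python
# Fraction of the prespecified tolerable bias remaining at X_c = 150:
b_s_150, delta_150 = 15, 6
fraction = (b_s_150 - delta_150) / b_s_150

# The same fraction at X_c = 450 (B_s = 45) gives the corresponding
# delta at which the Table 3 calculations start:
b_s_450 = 45
delta_450 = b_s_450 * (1 - fraction)
print(fraction, delta_450)
```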
Based on the results from the two medically important levels, it is evident that the bias at the X_c = 150 level represents the worst-case scenario and drives the decision on the clinical trial. The relative value of this bias is 1.24%, compared to 0.92% at X_c = 450.

Diagnostic Clinical Performance Trials
Let C be a binary variable denoting the true clinical status for a disease: C = 1 when the disease is present and C = 0 when the disease is absent. Let X be the result of an IVD test, where X = 1 indicates the presence of the disease and X = 0 indicates its absence. X can have a binary outcome or can be measured on a continuous scale. When measured on a continuous scale, X can be transformed into a binary result by using a cut-off point that separates the values of X into two categories, disease present or absent.

Figure 1 and Table 6 complement the results in Table 5: at least 122 negative samples are needed in the new region.
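Dichotomizing a continuous result at a cut-off and tallying the four outcomes against the true clinical status can be sketched as follows (the function name, cut-off rule, and data are hypothetical):

```python
def dichotomize_and_count(scores, truth, cutoff):
    """Turn a continuous test result into a binary call (X = 1 when the
    score is at or above the cutoff) and tally the four outcomes against
    the true clinical status C, returning the counts plus sensitivity
    and specificity."""
    tp = fn = fp = tn = 0
    for score, c in zip(scores, truth):
        x = 1 if score >= cutoff else 0
        if c == 1 and x == 1:
            tp += 1          # true positive
        elif c == 1 and x == 0:
            fn += 1          # false negative
        elif c == 0 and x == 1:
            fp += 1          # false positive
        else:
            tn += 1          # true negative
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return tp, fn, fp, tn, sensitivity, specificity
```

Shifting the cut-off trades sensitivity against specificity, which is why both must be examined when a continuous assay is dichotomized.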

Reproducibility of the Original Trial
The performance in the original trial, expressed in terms of reproducibility, has a significant effect on the sample size of a bridging study. To demonstrate this, we simulated data, calculating the sample size for the bridging study while varying P̂ and ρ. The proportion of the sample size was calculated for a significance level of α = 0.05 and a given consistency probability.
Bridging studies should be considered when the expected agreement in the new region is inferior to the agreement in the original trial. In method comparison, a bridging study in the new region is not needed if the bias in this region is equal to or smaller than the estimated bias of the original trial. Similarly, no bridging study is necessary for diagnostic clinical performance trials when the sensitivity in the new region is equal to or greater than the sensitivity of the original trial.
We use the ratio of reproducibility and the ratio of differences from the specification (ρ) of the new region to the original trial to decide about the clinical trial in the new region. Shao and Chow [10] recommended a 90% reproducibility ratio as a cutoff level for considering a bridging study in a new region. Values of ρ ≥ 0.5 are based on the Japanese Ministry of Health, Labour and Welfare guidance and are used in several publications [11,12,25]. Both these values are related to therapeutic drug testing, and the authors recommend modifying them according to the specific product being tested.
The reproducibility of the original trial is related to the prospectively calculated sample size and the estimated agreement performance in the trial. Sample size should be based on at least 80% power (90% is preferred for method comparisons), while the estimated performance depends on the product being tested. Prospectively powering a method comparison trial separately for slope and intercept has been discussed by Linnet [17] and Passing & Bablok [21]. We showed an approach to approximate the standard error of bias based on the variance-covariance structure of slope and intercept, and to use knowledge about bias acceptance limits, precision, and the measuring interval to calculate sample size for a predetermined power and confidence level. Other approaches for evaluating agreement in method comparison clinical trials are not discussed in this paper [22,23]. Powering of diagnostic clinical performance trials is based on the normal approximation of a binomial proportion. Approaches such as exact binomial calculations [24], powering both sensitivity and specificity [19], and others are not considered here but might be the subject of future publications.

There is no historical evidence or literature that we know of about using these values to design bridging studies for IVD. We are recommending a stepwise procedure for choosing them. Sometimes there might be no solution for the chosen combination of values; in these cases, a new set of expected differences (step 2) will need to be simulated. We also caution that values of ρ > 0.6 could require the bridging study to be too large to be practical. Ko, et al. [25] also argued that larger values of ρ would indicate that the overall results are dominated by the new region, and the consistency of the results between the original trial and the new region might not be valid. For these reasons, we recommend that the sample size for the bridging study be based on the minimum proportion in the interval 0.5 ≤ ρ ≤ 0.6.

Bias in method comparison studies might be calculated at different medically important levels.
We recommend identifying the highest bias relative to the prespecified clinically tolerable bias across the medical decision levels and making the decision for bridging studies based on that bias. In addition, some IVD devices provide measurements for multiple analytes. In these cases, we recommend that the calculations be performed for each analyte separately and that the decision be made based on the worst-performing analyte. Similarly, both sensitivity and specificity need to be considered in diagnostic clinical performance trials.
The sample size for a bridging study represents the minimum sample size needed to satisfy the predetermined requirements and conditions. However, if necessary, more subjects can be tested to cover the AMI uniformly.
Like other authors in the therapeutic domain, we recommend that the choice of P̂_new/P̂ and ρ be based on the clinical relevance of the IVD under test as well as the regulatory requirements in the new region. In addition, the recommendations for bridging studies in this paper are based on the difference in agreement between the original trial and the expected agreement in the new region. We also showed that the reproducibility of the original trial significantly affects the sample size of the bridging study.