A Signal Detection Analysis of World Health Organization's Pharmacovigilance Database

Citation: Liu Y (2019) A Signal Detection Analysis of World Health Organization’s Pharmacovigilance Database. Int J Clin Biostat Biom 5:023. doi.org/10.23937/2469-5831/1510023
Accepted: December 12, 2019; Published: December 14, 2019
Copyright: © 2019 Liu Y, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction
Reporting of adverse reactions (ARs) related to drugs or medical devices is usually voluntary. One of the major postmarket safety surveillance databases is the World Health Organization's (WHO) global pharmacovigilance database, which contains reports of suspected ARs, so-called Individual Case Safety Reports (ICSRs), collected by national drug authorities in over 110 countries and spanning more than 100,000 different medicinal products. Clinical reviewers evaluate adverse reaction reports to look for new safety concerns that might be related to a marketed product, or to check a manufacturer's compliance with reporting regulations.

For a fixed drug $j$, we define $\mathrm{MLR} = \max_i \mathrm{LR}_{ij}$ as the test statistic. For computational convenience, we may sometimes work with the log-likelihood ratio $\log(\mathrm{LR}_{ij})$, which is a monotone function of $\mathrm{LR}_{ij}$.

In Section 2, a brief overview of the World Health Organization's global pharmacovigilance database is provided. In Section 3, we give a brief review of the likelihood ratio test (LRT) procedure for adverse reaction detection for a single drug, and then propose a generalized likelihood ratio test procedure, GLRT, to detect multiple ARs in a drug class. The performance of the GLRT is evaluated using simulated datasets in Section 4. In Section 5, both the LRT and the GLRT are applied to the 2000-2005 and 2005-2010 data from the WHO's global database. Section 6 contains some discussion and concluding remarks.

World Health Organization's Pharmacovigilance Database
The WHO's global pharmacovigilance database consists of individual reports with demographic information, route of administration, drug/biological information, medical history, treatment indication, and therapy start and end dates. For adverse reaction detection, the preferred terms of the Medical Dictionary for Regulatory Activities (MedDRA) are often used to identify adverse events, such as death, stroke, myocardial infarction, and so on. There are also verbatim drug names in the file for drug/biologic information. In studying the drug-adverse event association, the generic name of the drug is used, which refers to the unique chemical makeup of a drug.
The WHO's global pharmacovigilance database includes reports since 1980; however, researchers and reviewers are more interested in data from recent years. In this article we focus on cases reported to the WHO between 2000 and 2010 for more than 6,500 drugs and 14,000 adverse reactions. For any particular adverse event, the investigators consider all suspect and concomitant drugs.

Inference Procedure
Test procedure for adverse reaction detection of a single drug
After summarizing the data files, the WHO pharmacovigilance data can be presented in tabular form with, say, adverse reactions (ARs) as the row variable and drugs as the column variable (as in Table 1), with $n_{ij}$ as the cell count for the $i$th AR and the $j$th drug, $n_{i.}$ as the sum of counts for the $i$th AR (the $i$th row total), and $n_{.j}$ as the sum of counts for the $j$th drug (the $j$th column total).
We collapse the data structure table into multiple 2 × 2 tables. For a fixed $j$th drug, we have $I$ such tables (Table 2), each associated with an AR ($i = 1, \ldots, I$). We assume that $n_{ij} \sim \mathrm{Poisson}(n_{i.} \times p_{ij})$, where $p_{ij}$ is the reporting rate of the $j$th drug for the $i$th AR, and that $n_{.j} - n_{ij} \sim \mathrm{Poisson}((n_{..} - n_{i.})\, q_{ij})$, where $q_{ij}$ is the reporting rate of the $j$th drug for all other ARs combined, excluding the $i$th AR. We also assume that $n_{ij}$ and $n_{.j} - n_{ij}$ are independent. Since drug $j$ is fixed, unless stated otherwise, we suppress the notational dependence of $p_{ij}$ and $q_{ij}$ on the $j$th drug and write $p_i$ and $q_i$. We define the null hypothesis $H_0$ as $p_i = q_i = p_0$ for all $i$.

For a drug class, the maximum likelihoods under both the null and the two-sided alternative hypotheses are obtained by replacing the parameters with their MLEs in the likelihood functions, leading to the likelihood ratio $\mathrm{LR}_{ik}$ for the $i$th AR in the $k$th drug, where $i = 1, \ldots, I$ and $k = 1, \ldots, K$. The likelihood ratio test statistic for testing $H_0$ is the maximum likelihood ratio (MLR) taken over the $\mathrm{LR}_{ik}$.
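As a worked sketch, the likelihood ratio for one collapsed table can be written out directly from the Poisson assumptions stated above. This is derived here for illustration and is not a copy of the paper's displayed equations; the symbols $\hat p_i$, $\hat q_i$, $\hat p_0$ and $E_{ij}$ are introduced for this purpose only. The same algebra applies with the drug index $k$ in place of $j$.

```latex
% MLEs under the alternative (p_i and q_i free) and under H_0 (p_i = q_i = p_0)
\hat p_i = \frac{n_{ij}}{n_{i.}}, \qquad
\hat q_i = \frac{n_{.j} - n_{ij}}{n_{..} - n_{i.}}, \qquad
\hat p_0 = \frac{n_{.j}}{n_{..}}

% The factors e^{-n_{.j}} and the factorials cancel between numerator and denominator
\mathrm{LR}_{ij}
  = \frac{\hat p_i^{\,n_{ij}} \; \hat q_i^{\,n_{.j} - n_{ij}}}{\hat p_0^{\,n_{.j}}}
  = \left(\frac{n_{ij}}{E_{ij}}\right)^{n_{ij}}
    \left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right)^{n_{.j} - n_{ij}},
\qquad
E_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}}
```

Here $E_{ij}$ is the count expected for the cell under $H_0$, so the ratio grows as the observed count $n_{ij}$ moves away from what the margins alone would predict.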
Because the distribution of MLR under $H_0$ is not analytically tractable, we again use Monte Carlo simulation to obtain it. For each drug $k$ in the drug class, we generate 499 datasets under $H_0$ from the corresponding null distribution and compute 500 values of MLR, including the one from the real data, for $k = 1, \ldots, K$. This results in $500 \times K$ MLR values. The null hypothesis is rejected at the $\alpha = 0.05$ level if the value of MLR from the observed dataset is greater than $T_\alpha$, the $100(1 - \alpha)$th percentile of the $500 \times K$ MLR values. After the AR associated with the largest $\mathrm{LR}_{ik}$ is identified as a signal ($\mathrm{LR}_{ik} > T_\alpha$), we move to the AR with the second largest value of $\mathrm{LR}_{ik}$, determine whether it is a signal, and so on. In this way, the generalized likelihood ratio test procedure controls the Type-I error. It also controls the false discovery rate (FDR), with FDR ≤ α.
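The pooled-threshold step can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `glrt_signals` and its inputs are hypothetical, and it assumes the log-likelihood ratios for the real data and the per-drug MLR values of the simulated null datasets have already been computed.

```python
import numpy as np

def glrt_signals(observed_llr, null_mlr, alpha=0.05):
    """Pooled-threshold step of the GLRT (hypothetical helper).

    observed_llr: (I, K) log-LR for each AR i and drug k from the real data.
    null_mlr:     (n_sim, K) maximum log-LR of each simulated null dataset, per drug.
    """
    observed_llr = np.asarray(observed_llr, dtype=float)
    null_mlr = np.asarray(null_mlr, dtype=float)
    observed_mlr = observed_llr.max(axis=0)            # one observed MLR per drug
    # Pool the (n_sim + 1) * K MLR values, e.g. 500 x K when n_sim = 499.
    pooled = np.concatenate([null_mlr.ravel(), observed_mlr])
    threshold = np.quantile(pooled, 1 - alpha)         # T_alpha
    ar_idx, drug_idx = np.nonzero(observed_llr > threshold)
    order = np.argsort(-observed_llr[ar_idx, drug_idx])  # strongest signal first
    signals = [(int(ar_idx[o]), int(drug_idx[o])) for o in order]
    return threshold, signals

# Toy inputs: 499 null MLR values for each of K = 2 drugs, 3 ARs.
toy_null = np.linspace(0.0, 4.99, 499 * 2).reshape(499, 2)
toy_observed = [[10.0, 0.5], [1.0, 20.0], [0.2, 0.1]]
toy_threshold, toy_signals = glrt_signals(toy_observed, toy_null)
```

Working on the log scale is harmless here because the logarithm is monotone, so the ranking of ARs and the comparison with the threshold are unchanged.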

Applications
In the following, we present the results from applying the likelihood ratio test procedure discussed in Section 3 to the monoamine oxidase inhibitors (MAOIs).
The distribution of MLR under $H_0$ is not analytically tractable and is obtained using Monte Carlo simulation as described below. First, the number of cases for each AR, for a given drug $j$, is simulated under $H_0$. Under $H_0$, since $n_{1j}, \ldots, n_{Ij}$, given the marginal totals $n_{1.}, \ldots, n_{I.}$, are independent $\mathrm{Poisson}(n_{i.}\, p_0)$, $i = 1, \ldots, I$, the joint distribution of $(n_{1j}, \ldots, n_{Ij})$ conditional on $n_{.j}$ and $(n_{1.}, \ldots, n_{I.})$ is multinomial:

$$(n_{1j}, \ldots, n_{Ij}) \mid n_{.j}, n_{1.}, \ldots, n_{I.} \sim \mathrm{Multinomial}\left(n_{.j};\ \frac{n_{1.}}{n_{..}}, \ldots, \frac{n_{I.}}{n_{..}}\right).$$

A total of 499 datasets under $H_0$ are simulated from this multinomial distribution, and 500 MLRs are calculated. The null hypothesis is rejected at the $\alpha = 0.05$ level if the value of MLR from the observed dataset is greater than the 95th percentile of the 500 MLR values (the threshold, $T_\alpha$). The corresponding p-value is $1 - R/500$, where $R$ is the rank of the observed MLR among all 500 MLR values. If the p-value of the observed MLR is less than $\alpha$ (say, 0.05), then the AR associated with this MLR is the strongest signal among all ARs for the $j$th drug under consideration. Having found the strongest signal, we can then move to the second largest $\mathrm{LR}_{ij}$, and so on, and declare them signals if their $\mathrm{LR}_{ij}$ are greater than $T_\alpha$ or the corresponding p-values are less than $\alpha$.
The likelihood ratio test has been shown, analytically and through extensive simulation studies, to control the Type-I error and the false discovery rate (FDR) while retaining good power and sensitivity [7,8]. In the next section, we generalize the likelihood ratio test procedure to detect all AR signals in a drug class. The detection of drug signals for a set of prespecified ARs can be performed in a similar fashion.
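The single-drug Monte Carlo procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than the author's code: the helper names `log_lr` and `lrt_single_drug` are hypothetical, the log-likelihood ratio is the form implied by the Poisson model with expected count $n_{i.} n_{.j} / n_{..}$, and only over-reported cells are treated as candidate signals.

```python
import numpy as np

def log_lr(n_ij, n_i, n_j):
    """Log-likelihood ratio per AR for one drug (hypothetical helper).

    n_ij: counts for the drug (last axis indexes ARs); n_i: AR row totals;
    n_j: total number of reports for the drug.
    """
    n_ij = np.asarray(n_ij, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    expected = n_i * n_j / n_i.sum()           # expected count under H0
    rest = n_j - n_ij                          # drug reports for all other ARs
    with np.errstate(divide="ignore", invalid="ignore"):
        a = np.where(n_ij > 0, n_ij * np.log(n_ij / expected), 0.0)
        b = np.where(rest > 0, rest * np.log(rest / (n_j - expected)), 0.0)
    # Assumption: only cells with n_ij above expectation can be signals.
    return np.where(n_ij > expected, a + b, 0.0)

def lrt_single_drug(n_ij, n_i, n_sim=499, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_ij = np.asarray(n_ij)
    n_i = np.asarray(n_i, dtype=float)
    n_j = int(n_ij.sum())
    observed = log_lr(n_ij, n_i, n_j)
    # Null datasets: multinomial with cell probabilities n_i. / n_..
    null = rng.multinomial(n_j, n_i / n_i.sum(), size=n_sim)
    mlr = np.append(log_lr(null, n_i, n_j).max(axis=1), observed.max())
    threshold = np.quantile(mlr, 1 - alpha)     # T_alpha from the n_sim + 1 MLRs
    rank = int((mlr <= observed.max()).sum())   # rank R of the observed MLR
    p_value = 1 - rank / mlr.size
    signals = np.flatnonzero(observed > threshold)
    return {"threshold": threshold, "p_value": p_value, "signals": signals}

# Toy data: 4 ARs; the first AR is reported far more often than expected (120 vs 50).
result = lrt_single_drug(n_ij=[120, 90, 130, 160], n_i=[1000, 2000, 3000, 4000])
```

Comparing every AR against the same threshold, which is calibrated on the maximum, is what lets the procedure flag the second and third largest ratios without a separate multiplicity correction.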

Test procedure for adverse reaction detection of multiple drugs
In order to develop a test statistic that can identify adverse reactions of multiple drugs in a class, we assume that a drug class has $K$ different drugs (usually $K$ is a small number), and that for the $k$th drug the numbers of reports for the $i$th AR and for all other ARs (excluding the $i$th AR) still follow Poisson distributions: $n_{ik} \sim \mathrm{Poisson}(n_{i.}\, p_{ik})$ and $n_{.k} - n_{ik} \sim \mathrm{Poisson}((n_{..} - n_{i.})\, q_{ik})$.

Across the four drugs, the GLRT detects fewer ARs than the LRT. By cross-checking the ARs in the four MAOI drugs, there are 23 common ARs detected within this drug class. The top ARs are listed in Table 4 and Table 5; postural hypotension, high blood pressure, fainting, abnormal heart rhythm, dizziness, headache, and drowsiness are the strongest ARs for the MAOI class.

Data simulation
We then study the performance of the generalized likelihood ratio test (GLRT) using simulated datasets. We simulate datasets based on the four drugs in the monoamine oxidase inhibitors drug class in WHO's global pharmacovigilance database.
Under the null hypothesis, the data are simulated from the multinomial distribution (3). Under the alternative hypothesis, the data are generated with relative reporting rates $rr_{1k}, \ldots, rr_{Ik}$ for $AE_1, \ldots, AE_I$ in drug $k$ ($k = 1, \ldots, 5$), subject to constraints. The relative reporting rates $rr_{ik}$ are specified as follows: $rr_{ik}$ is assigned a value higher than 1 for ARs selected as signals and 1 for all other ARs. $r_{0k}$ can be regarded as the baseline risk for drug $k$, and it can differ from one drug to another.
The monoamine oxidase inhibitors (MAOIs) are used to treat several conditions, including but not limited to depression, generalized anxiety disorder, agitation, obsessive-compulsive disorder (OCD), manic-depressive disorders, childhood enuresis (bedwetting), major depressive disorder, diabetic peripheral neuropathic pain, neuropathic pain, social anxiety disorder, and posttraumatic stress disorder (PTSD). The drug class includes Nardil (phenelzine), Parnate (tranylcypromine), Marplan (isocarboxazid), Emsam (selegiline), etc. We select four MAOIs labeled as MAOI1, MAOI2, MAOI3, MAOI4 and MAOI5 (in no specific order, to mask their names) using the WHO 2000-2005 and 2005-2010 data sets. The purpose of this analysis is to identify the AR signals (with high disproportionality rates) associated with the MAOI drug class. We apply the likelihood ratio test (LRT) and the generalized likelihood ratio test (GLRT) for detecting adverse reactions.
The results for the MAOI drug class using both the LRT and the GLRT are listed in Table 3. Applying the likelihood ratio test procedure to each of the four drugs in the drug class, 66, 37, 74, and 45 ARs are detected for the four MAOI drugs; using the generalized likelihood ratio test, 61, 32, 68, and 39 ARs are detected, respectively.
We evaluate how the relative reporting rate ($rr$), the sample size ($n_{.k}$), and the number of signals affect the performance of the GLRT through the following four scenarios:
• Scenario 1: One signal is randomly assigned to one drug, and the remaining four drugs are free of signals. Without loss of generality, we assign the signal to the drug with a column total of 12,000.
• Scenario 2: We randomly assign 30 common signals to each drug in the drug class, with a homogeneous relative reporting rate.
• Scenario 3: We randomly assign 30 signals to each drug using homogeneous relative reporting rates ($rr$) across the drug class, but the signals are not necessarily common between drugs.
• Scenario 4: As in Scenario 3, we randomly select 30 signals for each drug independently, but we use inhomogeneous $rr$: the rate assigned to a signal is a multiple of $rr$ (from 2 × rr to 5 × rr) that depends on $n_{i.}$, the total number of reports for the AR.
The false discovery rate is computed as

$$\mathrm{FDR} = \frac{1}{L}\sum_{l=1}^{L} \frac{V_l}{V_l + S_l},$$

where $V_l$ is the number of reactions falsely detected in the $l$th simulated dataset and $V_l + S_l$ is the total number of reactions detected in that dataset. Power, ST, and FDR all take values between 0 and 1. As we shall see in the next section, the GLRT has high sensitivity and low FDR and controls the Type-I error at level $\alpha$, which indicates its superiority over the conventional likelihood ratio test.
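The three performance measures can be computed from the sets of detected ARs, one set per simulated dataset, as in the sketch below. The function name `performance` is hypothetical. Two conventions are assumptions made for this illustration: sensitivity is taken as the fraction of true signals detected in each dataset, averaged over datasets, and a dataset with no detections contributes 0 to the FDR.

```python
def performance(detected_sets, true_signals):
    """Power, sensitivity (ST) and FDR averaged over simulated datasets."""
    true_signals = set(true_signals)
    n_datasets = len(detected_sets)
    power = st = fdr = 0.0
    for detected in detected_sets:
        detected = set(detected)
        s = len(detected & true_signals)   # correctly detected signals (S)
        v = len(detected - true_signals)   # falsely detected signals (V)
        power += 1.0 if detected else 0.0  # H0 rejected if anything is detected
        st += s / len(true_signals) if true_signals else 0.0
        fdr += v / (v + s) if detected else 0.0
    return power / n_datasets, st / n_datasets, fdr / n_datasets

# Toy example: four simulated datasets, true signals are ARs 1 and 2.
toy_power, toy_st, toy_fdr = performance([{1, 2}, set(), {1, 5}, {2}], {1, 2})
```

In this toy example three of the four datasets reject $H_0$, half of the true signals are found on average, and only the third dataset contains a false detection.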

Simulation results
The simulation results shown in Table 7 include power, sensitivity, and false discovery rate for the different scenarios described in Section 5.1.
In Scenario 1, one signal was assigned to the ARs with relatively large or moderate marginal counts (28,216 and 4,362). With fixed $rr = 3$, $n_{i.}$ = 28,216 and sample size $n_{.j}$ = 500, the power is 0.073, ST is 0.13 and FDR is 0.0565. As the sample size $n_{.j}$ increases, the power and ST increase to 1, and FDR decreases from 0.06 to 0.03. When the marginal count of the AR is fixed at $n_{i.}$ = 28,216 and $rr$ increases from 1 to 7, the power increases from 0.06 to 0.75, and then to 1. The same increasing trend is also observed for ST. FDR decreases from 0.05 to 0.03, a value much lower than the level of significance.
In Scenario 4, a rate of 2 × rr is assigned to those AR signals for which $n_{i.}$ (the total number of reports for the AR) falls between 35,000 and 40,000, a rate of 3 × rr to those for which $n_{i.}$ falls between 20,000 and 25,000, a rate of 4 × rr to those for which $n_{i.}$ falls between 15,000 and 20,000, and a rate of 5 × rr to those for which $n_{i.}$ falls between 6,000 and 12,000. $rr$ is set to 1 for ARs that are not selected as signals.
In each simulation, we generate 1,000 datasets.
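A rough sketch of how such datasets might be generated is given below. The function names are hypothetical, and two points are assumptions made for this illustration because the text does not settle them: the alternative is simulated as a multinomial with cell probabilities proportional to $n_{i.} \times rr_i$, and the Scenario 4 ranges are treated as half-open intervals, with signals outside every listed range keeping the base $rr$.

```python
import numpy as np

def scenario4_rr(n_i, signal_idx, rr):
    """Relative reporting rate per AR under the Scenario 4 tiers."""
    n_i = np.asarray(n_i)
    out = np.ones(len(n_i))                  # rr = 1 for ARs that are not signals
    tiers = [(35_000, 40_000, 2), (20_000, 25_000, 3),
             (15_000, 20_000, 4), (6_000, 12_000, 5)]
    for i in signal_idx:
        out[i] = rr                          # assumed default outside the listed ranges
        for low, high, mult in tiers:
            if low <= n_i[i] < high:         # assumed half-open interval
                out[i] = mult * rr
    return out

def simulate_alternative(n_i, n_k, rr_i, n_datasets=1000, seed=0):
    """Draw n_datasets count vectors for one drug with n_k reports in total."""
    rng = np.random.default_rng(seed)
    # Assumed model: cell probabilities proportional to n_i. * rr_i.
    weights = np.asarray(n_i, dtype=float) * np.asarray(rr_i, dtype=float)
    return rng.multinomial(n_k, weights / weights.sum(), size=n_datasets)

# Toy example with six ARs, the first five selected as signals.
toy_n_i = [37_000, 22_000, 16_000, 8_000, 13_000, 500]
toy_rr = scenario4_rr(toy_n_i, signal_idx=[0, 1, 2, 3, 4], rr=1.5)
toy_data = simulate_alternative(toy_n_i, n_k=12_000, rr_i=toy_rr, n_datasets=10)
```

Setting every entry of `rr_i` to 1 reduces the sampler to the null multinomial, so the same function can be used for Type-I error checks.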

Performance characteristics evaluation
The performance of the proposed methods is evaluated using power, sensitivity (ST), and false discovery rate (FDR). First, power is defined as the proportion of the $L$ simulated datasets in which $H_0$ is rejected, where $L$ = 1,000 is the total number of simulations. $H_0$ is rejected when at least one AR in any one drug (in the drug class) is detected as a signal.
The sensitivity of a test is the proportion of actual positives that are correctly identified; in our case, it is the proportion of true signals that are detected. The definition of FDR can be illustrated by a 2 × 2 table as in Table 6, where $V$ is the number of falsely detected signals and $S$ is the number of correctly detected signals.

The generalized likelihood ratio test procedure provides a useful tool to identify potential adverse reactions in a pharmacovigilance database. However, the final discovery of true adverse reactions should also be based on a thorough review of all available medical records.

Competing Interests
The author declares that he has no competing interests.
The effect of sample size is also evaluated when $n_{i.}$ is fixed at 4,362. The trends remain similar for the power, ST and FDR, though the changes are relatively slower.
In Scenario 2, where 30 common signals are assigned to all four drugs in the drug class, the power and FDR are both 0.06 when $rr = 1$. As $rr$ increases, the power increases to 1, the ST increases from 0.01 to 0.85, and the FDR decreases from 0.064 to 0.0009. Because multiple signals are assigned randomly, we use the actual sample size (AS) for $n_{.j}$. Similar trends in power, sensitivity, and FDR are also found for Scenarios 3 and 4.
Besides the effect of the relative reporting rate ($rr$), the effect of the number of selected true signals on the performance of the GLRT is also studied. In Scenario 2, if the number of signals is changed to 10 or 20, similar trends are observed for the power, sensitivity, and FDR, as shown in Table 8. As $rr$ increases, both the power and sensitivity increase, and the FDR decreases. If $rr$ is fixed, as the number of selected signals increases, the power increases but the FDR decreases.

Discussion and Concluding Remarks
In this paper we generalized the likelihood ratio test procedure to detect adverse events for a class of drugs and applied it to the WHO's pharmacovigilance database. The proposed methods can also be used to detect drug signals for a group of pre-specified adverse reactions by interchanging the row and column variables. One of the advantages of the generalized likelihood ratio test presented here is that it can be used to find multiple adverse reactions with both the Type-I error and the false discovery rate controlled while retaining good power and sensitivity. We note that the GLRT tends to detect fewer adverse reactions than the LRT method. This is to be expected: the threshold in the GLRT for the drug class is greater than or equal to the thresholds for each individual drug under the LRT, so the GLRT is more conservative.
The generalized likelihood ratio test procedure pro-