Corresponding author: William J. Montelpare, Ph.D.; School of Kinesiology, Lakehead University; Thunder Bay, Ontario, Canada phone: 807.343.8481; fax: 807.343.8944 email: WMONTELP@FLASH.LAKEHEADU.CA.
|Abstract|
This paper demonstrates the use of "client-side" scripting on the internet to produce a 2 x 2 calculator that can compute the McNemar test of symmetry and the kappa statistic. These two estimates are often combined in analyses that evaluate paired response data having a dichotomous outcome. The application is demonstrated using a comparison between a laboratory test and a field test. The web-based calculator, also referred to as a "webulator", computes the McNemar test of differences as a "z" score, together with the kappa measure of agreement. The tabular output includes the two statistics and provides the associated probabilities, which can then be used to test the null hypothesis. The McNemar test and kappa statistic are extremely valuable in quantitative methods for health research where two or more tests are evaluated to establish differences and/or associations. The availability of a valid and reliable tool on the internet, such as that demonstrated here, can reduce data processing time and assist researchers in the interpretation of outcomes.
Application of McNemar's test of symmetry and kappa statistic in health research
|Introduction|
In many health applications, individuals are evaluated on more than one test to establish the validity of information against existing criteria or against a gold standard. For example, in determining an individual's maximal ability to deliver oxygen to the working muscles (Vo2 max), an individual may be tested on an accepted field test, such as the "one mile walk test", and then again on the "gold standard" maximal treadmill test. One appropriate determination of validity for the field test is a chi-square procedure that tests for differences between individuals' responses on the two tests.
When computing the chi-square for independent samples, a portion of any difference in response could be attributed to intrinsic differences between participants. McNemar's test of symmetry determines the equivalence in responses to two independent factors, either by a single subject over conditions, or between subjects. The approach is therefore considered to be pair-wise. The assumption of the pair-wise comparison is that by using the same group of individuals, tested at two different times, the researcher will reduce the heterogeneity of variance which may occur when comparing data from independent samples. The approach is thus intended to reduce differences in the response distributions attributable to intrinsic differences between participant groups. Further, in a 2 by 2 model as illustrated in Figure 1, the proportion of pairs of individuals arranged for each of the possible outcomes is expected to be equal across the four cells (i.e. 25% of pairs in cell "a", cell "b", cell "c", cell "d").
FIGURE 1. Structure of the 2 x 2 design used in McNemar and kappa calculations
The McNemar equivalence estimate (expressed as either a chi-square or z test) focuses on "discordant pairs", also referred to as the "off-diagonal" elements (i.e. the paired data in upper right cell --cell "b", and lower left cell --cell "c"). The McNemar procedure tests the equality of frequencies in pairs of cells that are symmetric around the diagonal of a 2 by 2 design (the diagonal elements are the paired data in upper left cell --cell "a", and lower right cell --cell "d"). In the computation of the McNemar equivalence estimates, the frequencies in the major diagonal (upper left cell to lower right cell) are ignored. The null hypothesis (Ho: p1. = p.1) implies that the proportion of individuals who score high on the field test and low on the lab test will match the proportion of individuals who score low on the field test and high on the lab test.
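Although the formulas are deferred to Appendix 1, the discordant-cell logic above is compact enough to sketch in the same client-side scripting the webulator relies on. The following is an illustrative sketch of the standard z form of the test (the function name is ours, not taken from the webulator's source):

```javascript
// McNemar z for paired dichotomous responses in a 2 x 2 table.
// Only the discordant cells are used: b (upper right) and
// c (lower left); the diagonal cells a and d are ignored.
function mcnemarZ(b, c) {
  return (b - c) / Math.sqrt(b + c);
}
```

With the Example 1 discordant counts discussed later in the paper (b = 12, c = 11), this returns 1/sqrt(23), which rounds to 0.21.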
The kappa statistic, in contrast, is a measure of agreement. Kappa computations focus on the data in the major diagonal from upper left to lower right, examining whether counts along this diagonal differ significantly from those expected by chance (Streiner, Norman, and Munroe-Blum, 1989). If there were no agreement between the responses on the two tests, then we would expect an identical proportion of individuals to score high versus low on the field test among individuals who scored high on the lab test.
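The chance-correction idea can likewise be sketched in a few lines of script. In this sketch (our own helper, assuming the cell layout of Figure 1), Po is the observed proportion of agreement on the diagonal and Pe is the agreement expected by chance from the marginal proportions:

```javascript
// Kappa for a 2 x 2 table with cells a (+ +), b (- +),
// c (+ -), and d (- -), where n = a + b + c + d.
function kappa2x2(a, b, c, d) {
  const n = a + b + c + d;
  const po = (a + d) / n;                    // observed agreement Po
  const p1 = (a + b) / n, q1 = (a + c) / n;  // "+" marginals
  const p2 = (c + d) / n, q2 = (b + d) / n;  // "-" marginals
  const pe = p1 * q1 + p2 * q2;              // chance agreement Pe
  return (po - pe) / (1 - pe);               // chance-corrected agreement
}
```

For the Example 1 table discussed later (a = 9, b = 12, c = 11, d = 8), this gives (0.425 - 0.5)/(1 - 0.5) = -0.15.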
The application of McNemar's test and the kappa statistic to matched pairs of data was demonstrated by Suchower and Copenhaver (1996) and earlier by Fleiss (1981) who indicated that, especially in research studies that compare pairs of outcomes, such as comparisons of ratings between independent judges, some researchers felt the need to express the extent to which the measure of agreement is beyond that which is expected by chance. As such, while the McNemar test provides an appropriate estimate of differences in responses, Fleiss (1981) described the kappa statistic as a useful technique to correct for chance agreement.
|Building the webulator|
The McNemar-kappa webulator is a relatively simple tool to create for the internet because it is merely an extension of a two by two data input table. The calculations are performed by "scripts" embedded in the HTML document file (Figure 2 below). Scripts are more complex computer language statements, written in interpreted "java"-type languages such as JavaScript. Using these interpreted languages, scripts are written to the user's computer (the client) as part of the published web page. In this way, the scripts and HTML code pages are passed directly to the user during an internet session (Montelpare and McPherson, 1999).
The commands used to produce the webulator shown in Figure 2 are available by requesting to view the source code. The statistical formulae for the McNemar test of symmetry and the kappa statistic are presented in Appendix 1 below using the notation of Suchower and Copenhaver (1996), after Fleiss (1981). Likewise, a SAS program and corresponding output for these data are presented in Appendix 2 below.
|The McNemar Kappa Webulator|
FIGURE 2. A webulator for McNemar and kappa statistics
Enter values for the "a", "b", "c", and "d" cells (the coloured cells) in the spaces provided, then click the "Calculate" button to compute the remainder of the table values.
|Sample Application|
McNemar's test assumes that response data are binary. Therefore, the first step is to arrange your data so that they represent a set of binary outcomes. A simple approach to preparing the data is to list the pair-wise response data from the scores on two measures in column format and then split the columns at the median scores for each variable, as shown in Example 1 below.
Here data are presented as an individual's response to a field test and to a lab test. The median score for the field test was 27, while the median score for the lab test was 54. Notice, in the table below, the chi-square cell assignment is given based on the pair-wise scores relative to the two median scores. For example, the first observation scored "45" on the field test, which is above the median. Likewise, this individual scored "54" on the lab test, which is at the median for this variable. Therefore, the corresponding cell assignment in a two by two chi-square for paired response data is (+,+) = cell "a".
|Example 1. Creating a median split for response data|
|ID||Field test scores||Lab test scores||Compare the subject's score to the median score for the variable. Use (+,-) to indicate the score's position relative to the median score for the entire variable|
|001||45||54||+ + (cell a)
|002||34||23||+ - (cell c)
|003||76||33||+ - (cell c)
|004||55||44||+ - (cell c)
|005||64||53||+ + (cell a)
|006||26||37||- - (cell d)
|007||47||55||+ + (cell a)
|008||37||62||+ + (cell a)
|009||70||38||+ - (cell c)
|010||71||37||+ - (cell c)
|011||15||64||- + (cell b)
|012||14||63||- + (cell b)
|013||16||63||- + (cell b)
|014||15||64||- + (cell b)
|015||14||63||- + (cell b)
|016||16||67||- + (cell b)
|017||17||65||- + (cell b)
|018||17||62||- + (cell b)
|019||10||68||- + (cell b)
|020||11||67||- + (cell b)
|021||25||54||- + (cell b)
|022||34||13||+ - (cell c)
|023||26||53||- - (cell d)
|024||35||14||+ - (cell c)
|025||24||53||- - (cell d)
|026||36||17||+ - (cell c)
|027||27||55||+ + (cell a)
|028||37||12||+ - (cell c)
|029||20||58||- + (cell b)
|030||31||17||+ - (cell c)
|031||15||14||- - (cell d)
|032||14||13||- - (cell d)
|033||16||12||- - (cell d)
|034||15||11||- - (cell d)
|035||14||13||- - (cell d)
|036||76||77||+ + (cell a)
|037||77||75||+ + (cell a)
|038||77||72||+ + (cell a)
|039||70||78||+ + (cell a)
|040||71||77||+ + (cell a)
Next, determine the cell assignment in the 2 x 2 table for each subject based on whether they scored above or below the median score on both variables (indicated with "+" for scores at or above the median score, and "-" for scores below the median score). The simplest way to locate the median score is to arrange the set of scores from lowest to highest and use the formula of Freund and Simon (1991), median position = (n+1)/2, which gives the position of the median within the data set. In the example given here the median can be found at position 20.5, since there are 40 scores in the data set for each variable: (40+1)/2 = 20.5. After organizing the scores in rank order from lowest to highest we see that the median score for the first variable is 27, while the median score for the second variable is 54.
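The median-split bookkeeping described above is easy to mechanize in a script. This sketch uses our own helper names (they are not part of the webulator): the median position follows the (n + 1)/2 rule, averaging the two middle ranks when the position is fractional, and a score at or above the split value counts as "+":

```javascript
// Locate the median via the (n + 1) / 2 position rule; when the
// position falls between two ranks (even n), average those ranks.
function median(scores) {
  const s = [...scores].sort((x, y) => x - y);
  const pos = (s.length + 1) / 2;            // 1-based rank position
  return (s[Math.floor(pos) - 1] + s[Math.ceil(pos) - 1]) / 2;
}

// Assign one (field, lab) pair to cell a, b, c, or d of Figure 1.
// "+" means the score is at or above the split value.
function cellFor(field, lab, fieldSplit, labSplit) {
  const f = field >= fieldSplit, l = lab >= labSplit;
  if (f && l) return "a";   // + +
  if (!f && l) return "b";  // - +
  if (f && !l) return "c";  // + -
  return "d";               // - -
}
```

For example, with splits of 27 and 54, observation 001 (45, 54) lands in cell "a" and observation 011 (15, 64) lands in cell "b", as in Example 1.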
McNemar's test can be applied to larger data tables, but when data are arranged in a 2 by 2 design each pair falls into one of the following outcomes:
|"positive" on field test with "positive" on lab test (cell a)||"positive" on lab test with "negative" on field test (cell b)||"negative" on lab test with "positive" on field test (cell c)||"negative" on field test with "negative" on lab test (cell d)|
The following is the rank order presentation of the data for variable 1 in the "sample table above": 10,11,14,14,14,14,15,15,15,15,16,16,16,17,17,20,24,25,26,26,27,31,34,34,35,36, 37,37,45,47,55,64,70,70,71,71,76,76,77,77
The median score for this data set is 27
The following is the rank order presentation of the data for variable 2 in the "sample table above": 11,12,12,13,13,13,14,14,17,17,23,33,37,37,38,44,53,53,53,54,54,55,55,58,62,62,63,63,63,64,64,65,67,67,68,72,75,77,77,78
The median score for this data set is 54
By organizing the data according to the cells -- a through d, we observe the following frequencies: cell a = 9, cell b = 12, cell c = 11, cell d = 8.
A calculation of chi-square or z then determines the extent to which the proportions of discordant pairs are equivalent. Entering the data into the webulator above using the following arrangement and clicking on the button labelled "Calculate", we observe the following results.
The results of the webulator computation
for the first data set are as follows; the McNemar z score is 0.21, and the kappa
statistic is -0.149 with a corresponding z for kappa of -0.949. The standard error of the
kappa statistic is 0.15613 which gives an upper 95% confidence interval of 0.15601 and a
lower 95% confidence interval of -0.45601. Therefore, we would interpret these results to
suggest that there is no significant difference in the group of individuals that scored
high on variable 1 and low on variable 2 versus those individuals that scored low on
variable 1 and high on variable 2. Likewise, the non-significant kappa statistic (z for kappa = -0.949) indicates that there is no association between the responses on variables 1 and 2 used in this example. In order to verify that the webulator was indeed computing the values accurately, we ran a SAS program using the data from Example 1. The code used in the SAS program is included in Appendix 2 below. The results of the SAS analysis are presented below. Notice that all values are equivalent between the webulator and the SAS output; any slight differences can be attributed to rounding.
SAS output for computation of McNemar z and kappa in Sample 1 data set
StdErrMcNemar z= 0.15792
Po=0.425 Pe=0.5 kappa=-0.15
Kappaz=-0.94987 StdErrkappa z=0.15613
Kappa CI95LLA= -0.45601 Kappa CI95ULA= 0.15601
|Computation of kappa|
Unlike the McNemar test, the kappa statistic uses the "elements of the diagonal", namely the data in cell "a" and cell "d" from the 2 x 2 design. The steps used to compute the kappa statistic, the 95% confidence interval for kappa, and the test of the null hypothesis (H0: k = 0) are presented in Appendix 1 below using the notation of Suchower and Copenhaver (1996), after Fleiss (1981). Likewise, a SAS program and corresponding output for these data are presented in Appendix 2.
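These steps can also be sketched in script form. The function below is our paraphrase of the Appendix 1 notation (Suchower and Copenhaver, 1996, after Fleiss, 1981), not the webulator's actual source: the z test uses a standard error computed under H0: k = 0, while the 95% confidence interval uses a second standard error computed around the observed kappa:

```javascript
// Kappa, its z test under H0: k = 0, and the 95% confidence
// interval for a 2 x 2 table with cells a (+ +), b (- +),
// c (+ -), d (- -), following Fleiss's (1981) formulas.
function kappaStats(a, b, c, d) {
  const n = a + b + c + d;
  const p11 = a / n, p12 = b / n, p21 = c / n, p22 = d / n;
  const p1 = p11 + p12, p2 = p21 + p22;   // row proportions
  const q1 = p11 + p21, q2 = p12 + p22;   // column proportions
  const po = p11 + p22;                   // observed agreement Po
  const pe = p1 * q1 + p2 * q2;           // chance agreement Pe
  const kappa = (po - pe) / (1 - pe);

  // standard error of kappa under the null hypothesis k = 0
  const sumprob = p1 * q1 * (p1 + q1) + p2 * q2 * (p2 + q2);
  const se0 = Math.sqrt((pe + pe * pe - sumprob) / (n * (1 - pe) ** 2));

  // standard error around the observed kappa, for the 95% CI
  const A = p11 * (1 - (p1 + q1) * (1 - kappa)) ** 2
          + p22 * (1 - (p2 + q2) * (1 - kappa)) ** 2;
  const B = (p12 * (q1 + p2) ** 2 + p21 * (q2 + p1) ** 2)
          * (1 - kappa) ** 2;
  const C = (kappa - pe * (1 - kappa)) ** 2;
  const se = Math.sqrt((A + B - C) / (n * (1 - pe) ** 2));

  return { kappa: kappa, z: kappa / se0, stderr: se,
           ci95: [kappa - 1.96 * se, kappa + 1.96 * se] };
}
```

Fed the Example 1 cells (9, 12, 11, 8), this reproduces the values in the SAS output above: kappa = -0.15, z for kappa = -0.94987, and the 95% confidence interval [-0.45601, 0.15601].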
|Your Turn|
Use the following data set and the webulator above to compute the McNemar test and kappa statistic for a 2 x 2 arrangement of subjects. The scenario is as follows: you are presenting a course in which you wish to demonstrate the accuracy of a field test to measure an individual's maximal ability to deliver oxygen to the working muscles (Vo2 max) against the commonly accepted gold standard, the laboratory maximal treadmill test. The data are listed below in three variables for 60 subjects (subject's ID, the field test response, and the lab test response).
Split the two outcome variables, field test and lab test, at their respective median scores. Arrange the data so that each subject is assigned to one of the four cells ("a", "b", "c", or "d") based on their scores on the two outcome variables with respect to the median splits. Enter the frequencies for each cell (i.e. "n" cases were assigned "++", cell "a") into the appropriate cells of the webulator and click the button labelled "Calculate" to compute the McNemar test, the kappa measure of association, and the probabilities associated with the computed values.
Compare your results
In this sample data set the median split for the field test is 42 and the median split for the lab test is 44. Therefore, in the 2 x 2 table the results are as follows: there are 24 cases in the "a" cell, 7 cases in the "b" cell, 6 cases in the "c" cell, and 23 cases in the "d" cell. Entering these values into the webulator above will produce a McNemar z score of 0.28 with a standard error of 0.129. The kappa statistic is 0.566 with a z for kappa of 4.391. The standard error for kappa is 0.1063 with a corresponding lower 95% confidence limit of 0.3582 and an upper 95% confidence limit of 0.775. A SAS program to compute these data and compare responses is presented in Appendix 3 below. The results of the SAS program are identical to the output for the webulator.
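As a cross-check, the whole exercise can be replayed in script form. The sketch below (our own code, not the webulator's) tallies the Appendix 3 data with splits at 42 and 44, counting a score at or above the split as "+", and then computes the McNemar z and kappa from the resulting cells; the tally reproduces the Po = 0.78333 shown in the SAS output:

```javascript
// Field and lab scores for the 60 subjects of Appendix 3,
// listed in subject-ID order (1 through 60).
const field = [45,49,33,40,33,41,53,56,73,36,44,48,35,48,38,40,54,55,74,35,
               32,63,70,60,22,52,30,51,40,41,31,62,67,56,29,51,31,50,41,40,
               55,25,29,31,25,45,24,60,44,36,54,29,37,37,43,44,25,59,45,35];
const lab   = [49,49,36,36,50,48,56,46,66,36,48,48,35,36,45,47,57,45,67,35,
               38,44,68,56,30,54,28,47,38,45,37,42,62,59,27,53,29,46,39,44,
               49,29,31,27,52,39,42,59,32,39,48,31,30,33,45,38,43,58,33,38];

// Tally the 2 x 2 cells: "+" means at or above the split value.
const cells = { a: 0, b: 0, c: 0, d: 0 };
for (let i = 0; i < field.length; i++) {
  const f = field[i] >= 42, l = lab[i] >= 44;
  cells[f ? (l ? "a" : "c") : (l ? "b" : "d")]++;
}

// McNemar z on the discordant cells, kappa on the whole table.
const n = field.length;
const z = (cells.b - cells.c) / Math.sqrt(cells.b + cells.c);
const po = (cells.a + cells.d) / n;                     // observed agreement
const p1 = (cells.a + cells.b) / n, q1 = (cells.a + cells.c) / n;
const pe = p1 * q1 + (1 - p1) * (1 - q1);               // chance agreement
const kappa = (po - pe) / (1 - pe);
```

The tally gives a = 24, b = 7, c = 6, and d = 23 (so Po = 47/60 = 0.78333), with z rounding to 0.28 and kappa to 0.567.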
|Decision Rules|
When the McNemar test is used in a 2 by 2 research design, there is only one degree of freedom. The corresponding chi-square critical value for testing the null hypothesis (Ho: "cell b" = "cell c") at p<0.05 is 3.84, which corresponds to a "z" score of 1.96. Recall that the z score is the square root of the chi-square score; therefore the observed McNemar chi-square must be greater than the critical value of 3.84, or the observed z greater than the critical value of 1.96, in order to reject the null hypothesis (Ho: p1. = p.1) for any comparison.
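The rule above amounts to a one-line check, sketched here with our own function names: the chi-square form is simply the square of the z score, so |z| > 1.96 and chi-square > 3.84 are the same decision at p < 0.05 with one degree of freedom:

```javascript
// Two-sided decision at alpha = 0.05 for a 1-df statistic:
// reject H0 when |z| exceeds the critical value 1.96.
function rejectsNull(z) {
  return Math.abs(z) > 1.96;
}

// The equivalent chi-square statistic is the square of z, so the
// chi-square critical value is 1.96 squared, approximately 3.84.
function chiSquareFromZ(z) {
  return z * z;
}
```

Neither sample's McNemar z (0.21 and 0.28) clears this threshold, while the second sample's z for kappa (4.39) does.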
If the result of the McNemar test is less than the critical values then the researcher must accept the null hypothesis of no difference between the proportion of individuals who scored below the median score on the field test but above the median score on the lab test versus the proportion of individuals that scored above the median score on the field test but below the median score on the lab test.
The results from the computations of the kappa statistic can provide different conclusions than the results from the McNemar test of symmetry. Taking p=0.05 as an acceptable critical alpha level, or point at which to establish statistical importance, the researcher should observe whether or not the z score for kappa is greater than 1.96, since z=1.96 is the critical value associated with the commonly accepted p=0.05 alpha level; this is the value used to determine the statistical significance of the estimated association.
The kappa statistic is translated as the percent agreement in responses. The webulator provides the value of the kappa statistic, the corresponding z score for kappa, and the 95% confidence interval (lower and upper limits, each expressed as a measured percent).
Although there may be no significant difference computed for the McNemar test, a significant z for kappa may still be observed. The interpretation of the kappa statistic is as follows: a statistically significant kappa, tested under the null hypothesis (H0: k = 0), indicates that the agreement between the test responses was unlikely to have occurred merely by chance.
Further, in such examples where no significant difference is computed for the McNemar test but a significant measure of association is computed for kappa, the results suggest that a statistically significant proportion of the sample (the kappa value written as a percent) agreed in the way that they responded on each test.
Such results demonstrate the effectiveness of combining measures of difference with measures of agreement in research designs, where outcomes may not be significantly different, but should not necessarily be construed as similar.
This paper demonstrates the utility of the internet to provide tools that are useful to researchers. In the present example, a "webulator" was developed to compute McNemar's z test as well as the kappa estimate of association. The use of the webulator reduces the computational errors which may occur with moderate to large sample sizes. Likewise, by having such tools posted to the internet, researchers can reduce the time required to work through tedious computations. Webulators of the type presented here increase the accuracy and reliability of reported results, and eliminate errors in computations which may have either a direct or indirect effect on research decisions or tests of hypotheses.
|References|
Agresti, A. (1990). Categorical Data Analysis. Toronto: John Wiley & Sons.
Fleiss, J.L. (1981). Statistical Methods for Rates and Proportions. 2nd edition. Toronto: John Wiley and Sons.
Lehman, E. (1975). Nonparametrics: Statistical Methods Based on Ranks. Toronto: McGraw-Hill.
Montelpare, W.J., and McPherson, M.N. (1999). Data Processing Across the Internet: A Model for Design. I.E.J.H.E., 2(3), 127-137.
Collis, B. (1996). The Internet as an Educational Innovation: Lessons from Experience with Computer Implementation. Educational Technology, November-December, 21-30.
Musciano, C., and Kennedy, B. (1996). HTML, The Definitive Guide. Bonn: O'Reilly & Associates, Inc.
Richer, M., and Richer, J. (1997). Official Netscape Livewire Book. Research Triangle Park: Ventana Communications Group, Inc.
Starr, R. (1997). Delivering Instruction on the World Wide Web: Overview and Basic Design Principles. Educational Technology, May-June, 7-15.
Stemler, L.K. (1997). Educational Characteristics of Multimedia: A Literature Review. Journal of Educational Multimedia and Hypermedia, 6(3/4), 339-359.
Streiner, D., Norman, G., and Munroe-Blum, H. (1989). PDQ Epidemiology. Toronto: B.C. Decker.
Suchower, L.J., and Copenhaver, M.D. (1996). Using the SAS System to Perform McNemar's Test and Calculate the kappa Statistic for Matched Pairs of Data. Proceedings of the NorthEast SAS Users Group Conference, Boston, MA: 686-693.
Appendix 1 -- Computation of kappa
Appendix 2 -- A SAS program to compute McNemar z and kappa in Sample 1.
SAS output for computation of McNemar z and kappa in Sample 1 data set
StdErrMcNemar z= 0.15792
Po=0.425 Pe=0.5 kappa=-0.15
Kappaz=-0.94987 StdErrkappa z=0.15613
Kappa CI95LLA= -0.45601 Kappa CI95ULA= 0.15601
Appendix 3 -- A SAS program to compute McNemar z and kappa in Sample 2.
options pagesize=90 linesize=64;
/* this analysis created by Wm. Montelpare and M.N. McPherson 06_00 */
data sample2;
input @1 id field lab;
** median splits: field test at 42, lab test at 44 **;
if field>=42 then field1=1; else field1=2;
if lab>=44 then lab2=1; else lab2=2;
** indicator variables for the four cells of the 2 x 2 table **;
if field1=1 and lab2=1 then cell_11=1; else cell_11=0;
if field1=2 and lab2=1 then cell_12=1; else cell_12=0;
if field1=1 and lab2=2 then cell_21=1; else cell_21=0;
if field1=2 and lab2=2 then cell_22=1; else cell_22=0;
cards;
1 45 49
21 32 38
41 55 49
2 49 49
22 63 44
42 25 29
3 33 36
23 70 68
43 29 31
4 40 36
24 60 56
44 31 27
5 33 50
25 22 30
45 25 52
6 41 48
26 52 54
46 45 39
7 53 56
27 30 28
47 24 42
8 56 46
28 51 47
48 60 59
9 73 66
29 40 38
49 44 32
10 36 36
30 41 45
50 36 39
11 44 48
31 31 37
51 54 48
12 48 48
32 62 42
52 29 31
13 35 35
33 67 62
53 37 30
14 48 36
34 56 59
54 37 33
15 38 45
35 29 27
55 43 45
16 40 47
36 51 53
56 44 38
17 54 57
37 31 29
57 25 43
18 55 45
38 50 46
58 59 58
19 74 67
39 41 39
59 45 33
20 35 35
40 40 44
60 35 38
;
proc univariate; var field lab;
proc freq; table lab2*field1;
proc means noprint;
var cell_11 cell_12 cell_21 cell_22;
output out=results sum= ncell_11 ncell_12 ncell_21 ncell_22;
proc print data=results;
data results; set results;
n = (ncell_11 + ncell_12 + ncell_21 + ncell_22);
p11 = (ncell_11/n);
p12 = (ncell_12/n);
p21 = (ncell_21/n);
p22 = (ncell_22/n);
p1_=(p11+p12); ** row 1 probabilities **;
p2_=(p21+p22); ** row 2 probabilities **;
p_1 = (p11+p21); ** column 1 probabilities **;
p_2 = (p12+p22); ** column 2 probabilities **;
pie_obs = (p11+p22); ** observed proportion of agreement, Po **;
pie_exp = ((p1_*p_1) + (p2_*p_2)); ** chance-expected agreement, Pe **;
kappa= (pie_obs-pie_exp)/(1-pie_exp); ** kappa statistic **;
sumprob=((p1_*p_1*(p1_ + p_1)) + (p2_*p_2*(p2_ + p_2)));
stderr1 = sqrt((pie_exp + pie_exp**2 - sumprob)/(n*((1-pie_exp)**2))); ** std error of kappa under Ho **;
z = (kappa/ stderr1);
Aterm =( p11*(1-(p1_ + p_1)*(1-kappa))**2 + p22*(1-(p2_ + p_2)*(1-kappa))**2);
Bterm=((p12*(p_1 + p2_)**2 + p21*(p_2 + p1_)**2)*(1-kappa)**2);
Cterm=((kappa - pie_exp*(1-kappa))**2);
numABC= (Aterm + Bterm - Cterm);
stderr2 = sqrt(numABC /(n * ((1-pie_exp)**2)));
ci95LLa = (kappa - 1.96*(stderr2));
ci95ULa = (kappa + 1.96*(stderr2));
proc print data=results;
var pie_obs pie_exp kappa stderr1 z stderr2 ci95LLa ci95ULa Aterm Bterm Cterm;
run;
SAS output for computation of McNemar z and kappa
Po=0.78333 Pe=0.5 kappa= 0.56667
Kappaz=4.39182 StdErrkappa z=0.10631
Kappa CI95LLA= 0.35830 Kappa CI95ULA= 0.77504
StdErrkappa ATERM = 0.25144 StdErrkappa BTERM=0.040592 StdErrkappa CTERM=0.122