Generalized Cochran Mantel Haenszel test for multilevel correlated categorical data: an algorithm and R function


  • D.B.U.S. De Silva
  • M.R. Sooriyarachchi


Algorithm, clustered data, generalized Cochran Mantel Haenszel (GCMH) test, multilevel correlated categorical data, R functions


Multilevel data are commonly encountered in many
applications. Modelling such data requires careful
consideration of the association between underlying
variables at each level of the data structure, which in turn
calls for effective univariate techniques prior to modelling.
However, no univariate tests are currently in use for this
situation. This paper presents the modification and
novel application of a test developed by Zhang and Boos for
testing the association between categorical variables measured
on clusters of observations, for examining initial association in
a multilevel framework. Zhang and Boos used an unpublished
SAS/IML programme to perform their test.
This paper presents an R function for applying the
test, which will be freely available to users since R is
open-source software. The function is tested on a dataset from the
medical field on the respiratory disease severity of
patients attending several different clinics. The explanatory
variables in this study are Age, Gender, Duration
and Symptom, while the response variable, which indicates the
severity of the diagnosis made, is termed Diagnosis. The
results indicate that when the experimental units show low
levels of correlation within clusters with respect to a particular
explanatory variable, the test performs similarly to the
standard Cochran Mantel Haenszel (CMH) test. When the
corresponding correlation is high, the generalized CMH
(GCMH) test yields a smaller p-value than the standard
CMH test. Of the four explanatory variables, only Symptom and
Duration are significantly associated with Diagnosis.
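
As a point of reference for the comparison described above, the standard CMH test is available in base R as mantelhaen.test (package stats). The following is a minimal sketch on simulated data, stratified by a hypothetical clinic factor. It is not the GCMH function presented in this paper, and the data, variable names and effect sizes are assumptions made purely for illustration.

```r
# Simulated, purely illustrative data: NOT the respiratory dataset
# analysed in the paper. All names and probabilities are assumptions.
set.seed(1)
n_clinics <- 5
n_per_clinic <- 40
n_total <- n_clinics * n_per_clinic

# Stratifying (cluster) variable: one level per hypothetical clinic.
clinic <- factor(rep(seq_len(n_clinics), each = n_per_clinic))

# Binary explanatory variable, assigned at random.
symptom <- factor(sample(c("mild", "severe"), n_total, replace = TRUE),
                  levels = c("mild", "severe"))

# Binary response whose probability depends on symptom, so that an
# association exists by construction.
p_severe <- ifelse(symptom == "severe", 0.7, 0.4)
diagnosis <- factor(ifelse(runif(n_total) < p_severe, "severe", "not severe"),
                    levels = c("not severe", "severe"))

# 2 x 2 x K contingency table: Symptom by Diagnosis within each clinic.
tab <- table(Symptom = symptom, Diagnosis = diagnosis, Clinic = clinic)

# Standard CMH test. It treats observations within a stratum as
# independent, which is the assumption a cluster-aware test relaxes.
cmh <- mantelhaen.test(tab)
print(cmh$statistic)
print(cmh$p.value)
```

Because the simulated observations are independent within each clinic, this sketch only illustrates the baseline test. It says nothing about how the two tests differ under within-cluster correlation.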


