Kong X, Gray RH, Moulton LH, Wawer M, Wang MC
Stat Med. 2010 Sep 13. [Epub ahead of print] PMID: 20839368. PMCID: PMC2991598
Abstract
Human papillomavirus (HPV) infection is a common sexually transmitted disease of growing public health importance, and over 40 genotypes have been identified in genital infections. Current HPV cohort studies often follow participants at pre-determined visits, such as every 6 months, and data generated from such epidemiological studies can be described as clustered longitudinal binary data in which correlation arises in two ways: the directionless clustering due to the multiple genotypes tested within an individual, and the temporal correlation among the repeated measurements on the same genotype over time. Current analyses for identifying risk factors associated with HPV incidence and persistence often either do not fully utilize the information in the data set or ignore the correlation between the multiple genotypes. Given the scientific definitions of incidence and persistence, conditional probability modeling provides a natural mathematical tool. We thus present a semi-parametric regression model for such data, in which full specification of the joint multivariate binary distribution is avoided by using a conditioning argument to handle the temporal correlation and generalized estimating equations (GEE) to account for the correlation between the multiple genotypes. The model is applied to the HPV data from the Rakai male circumcision (MC) trial to evaluate the as-treated efficacy of MC and to identify modifiable risk factors for incidence and persistence of oncogenic HPV types in men. A simulation study is performed to provide empirical information on the number of individuals needed for satisfactory power and estimation accuracy of the association parameter estimates in future studies.
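To make the conditioning idea concrete, the following is a minimal Python sketch, not the authors' model. It assumes that incidence can be read as the probability that a genotype is detected at a visit given that it was not detected at the previous visit, and persistence as the probability that it is detected given that it was detected at the previous visit. The function name `transition_rates`, the array layout (individuals × genotypes × visits), and the toy data are all hypothetical. The sketch only pools crude proportions; it includes no covariates and makes no GEE adjustment for the correlation between genotypes.

```python
import numpy as np

def transition_rates(y):
    """Crude pooled incidence and persistence proportions.

    y: 0/1 array of shape (individuals, genotypes, visits).
    Assumed definitions (not taken from the paper's formulas):
      incidence   = P(detected at visit t | not detected at visit t-1)
      persistence = P(detected at visit t | detected at visit t-1)
    """
    y = np.asarray(y)
    prev, curr = y[:, :, :-1], y[:, :, 1:]   # consecutive visit pairs
    at_risk = prev == 0                      # pairs eligible for incidence
    infected = prev == 1                     # pairs eligible for persistence
    incidence = curr[at_risk].mean() if at_risk.any() else float("nan")
    persistence = curr[infected].mean() if infected.any() else float("nan")
    return incidence, persistence

# Hypothetical toy data: 2 individuals, 2 genotypes, 3 visits.
y = np.array([
    [[0, 1, 1], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0]],
])
incidence, persistence = transition_rates(y)
print(incidence, persistence)
```

Conditioning on the previous visit is what lets each visit pair be treated as its own binary outcome. A regression version would replace the pooled proportions with a model for these conditional probabilities and would need a method such as GEE to handle the remaining correlation between genotypes within an individual.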