Preventing Chronic Disease | Models for Count Data With an Application to Healthy Days Measures: Are You Driving in Screws With a Hammer? - CDC
Models for Count Data With an Application to Healthy Days Measures: Are You Driving in Screws With a Hammer?
Hong Zhou, MS, MPH; Paul Z. Siegel, MD, MPH; John Barile, PhD; Rashid S. Njai, PhD; William W. Thompson, PhD; Charlotte Kent, PhD; Youlian Liao, MD
Suggested citation for this article: Zhou H, Siegel PZ, Barile J, Njai RS, Thompson WW, Kent C, et al. Models for Count Data With an Application to Healthy Days Measures: Are You Driving in Screws With a Hammer? Prev Chronic Dis 2014;11:130252. DOI: http://dx.doi.org/10.5888/pcd11.130252.
PEER REVIEWED
Abstract
Introduction
Count data are often collected in chronic disease research, and sometimes these data have a skewed distribution. The number of unhealthy days reported in the Behavioral Risk Factor Surveillance System (BRFSS) is an example of such data: most respondents report zero days. Studies have either categorized the Healthy Days measure or used linear regression models. We used alternative regression models for these count data and examined the effect on statistical inference.
Methods
Using responses from participants aged 35 years or older from 12 states that included a homeownership question in their 2009 BRFSS, we compared 5 multivariate regression models — logistic, linear, Poisson, negative binomial, and zero-inflated negative binomial — with respect to 1) how well the modeled data fit the observed data and 2) how model selections affect inferences.
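The first comparison criterion, how well a model fits the observed counts, can be illustrated with a minimal sketch. It is not the authors' analysis: it uses a small made-up sample rather than BRFSS data, fits intercept-only models with no covariates or survey weights, and estimates the negative binomial dispersion by the method of moments rather than maximum likelihood, so the AIC values are only approximate. All names (`toy_days`, `poisson_loglik`, `negbin_loglik`, `aic`) are hypothetical.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of counts under a Poisson distribution with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def negbin_loglik(counts, mu, alpha):
    """Log-likelihood under a negative binomial (NB2) distribution.

    Parameterized so that variance = mu + alpha * mu**2.
    """
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1.0 - p)
        for y in counts
    )


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * loglik


# Hypothetical toy sample of unhealthy days (0-30), NOT BRFSS data:
# mostly zeros with a long right tail.
toy_days = [0] * 20 + [1, 2, 2, 3, 5, 7, 10, 14, 30, 30]

n = len(toy_days)
mean = sum(toy_days) / n
var = sum((y - mean) ** 2 for y in toy_days) / (n - 1)

# Moment estimate of the dispersion; only valid when var > mean.
alpha = (var - mean) / mean ** 2

aic_poisson = aic(poisson_loglik(toy_days, mean), n_params=1)
aic_negbin = aic(negbin_loglik(toy_days, mean, alpha), n_params=2)

print(f"Poisson AIC:           {aic_poisson:.1f}")
print(f"Negative binomial AIC: {aic_negbin:.1f}")
```

On a sample this skewed, the negative binomial AIC comes out lower than the Poisson AIC despite the extra parameter, which is the kind of fit comparison the Methods describe. A full replication would add covariates, the zero-inflated model, and the survey design.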
Results
Most respondents (66.8%) reported zero mentally unhealthy days. The distribution was highly skewed (variance = 58.7, mean = 3.3 d). Zero-inflated negative binomial regression provided the best-fitting model, followed by negative binomial regression. A significant independent association between homeownership and number of mentally unhealthy days was not found in the logistic, linear, or Poisson regression model but was found in the negative binomial model. The zero-inflated negative binomial model showed that homeowners had 24% higher odds than nonhomeowners of having excess zero mentally unhealthy days (adjusted odds ratio, 1.24; 95% confidence interval, 1.08–1.43), but it did not show an association between homeownership and the number of unhealthy days.
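The summary statistics reported in the Results are enough to see why a Poisson model fits poorly. The sketch below is a back-of-the-envelope check, not the authors' covariate-adjusted models: it treats the reported mean and variance as parameters of a single marginal distribution, and it matches the negative binomial to those two moments as an assumption. Variable names are hypothetical.

```python
import math

# Values reported in the Results paragraph.
mean_days = 3.3
variance_days = 58.7
observed_zero_share = 0.668

# A Poisson distribution has variance equal to its mean, so a ratio
# far above 1 signals overdispersion.
dispersion_ratio = variance_days / mean_days

# Share of zeros a Poisson distribution with this mean would predict.
poisson_zero_share = math.exp(-mean_days)

# Negative binomial (NB2) matched to the same mean and variance:
# variance = mean + alpha * mean**2  =>  solve for alpha.
alpha = (variance_days - mean_days) / mean_days ** 2
negbin_zero_share = (1 + alpha * mean_days) ** (-1 / alpha)

# Reading the adjusted odds ratio as a percent difference in odds.
odds_ratio = 1.24
percent_higher_odds = (odds_ratio - 1) * 100

print(f"Variance-to-mean ratio:        {dispersion_ratio:.1f}")
print(f"Zeros predicted by Poisson:    {poisson_zero_share:.1%}")
print(f"Zeros predicted by neg. binom: {negbin_zero_share:.1%}")
print(f"Zeros observed:                {observed_zero_share:.1%}")
print(f"Odds ratio as percent change:  {percent_higher_odds:.0f}%")
```

Under these assumptions, a Poisson distribution with mean 3.3 would place only about 4% of respondents at zero, far below the 66.8% observed. The moment-matched negative binomial comes much closer but still predicts fewer zeros than were observed, which is consistent with the zero-inflated model fitting best.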
Conclusion
Our comparison of regression models indicates the importance of examining data distribution and selecting models with appropriate assumptions. Otherwise, statistical inferences might be misleading.
Author Information
Corresponding Author: Hong Zhou, MS, MPH, Division of Health Informatics and Surveillance, Center for Surveillance, Epidemiology and Laboratory Services, Centers for Disease Control and Prevention, 1600 Clifton Rd NE, Mailstop E91, Atlanta, GA 30333. Telephone: 404-498-6293. E-mail: HZhou1@cdc.gov.
Author Affiliations: Paul Z. Siegel, Rashid S. Njai, Charlotte Kent, Youlian Liao, William W. Thompson, Centers for Disease Control and Prevention, Atlanta, Georgia; John Barile, University of Hawaii at Manoa, Manoa, Hawaii.