How Do You Know Which Health Care Effectiveness Research You Can Trust? A Guide to Study Design for the Perplexed
EDITOR’S CHOICE — Volume 12 — June 25, 2015
Stephen B. Soumerai, ScD; Douglas Starr, MS; Sumit R. Majumdar, MD, MPH
Suggested citation for this article: Soumerai SB, Starr D, Majumdar SR. How Do You Know Which Health Care Effectiveness Research You Can Trust? A Guide to Study Design for the Perplexed. Prev Chronic Dis 2015;12:150187. DOI: http://dx.doi.org/10.5888/pcd12.150187.
Evidence is mounting that publication in a peer-reviewed medical journal does not guarantee a study’s validity (1). Many studies of health care effectiveness do not show the cause-and-effect relationships that they claim because their research designs are faulty. Mistaken conclusions later reported in the news media can lead to wrong-headed policies and confusion among policy makers, scientists, and the public. Unfortunately, little guidance exists to help distinguish good study designs from bad ones; providing that guidance is the central goal of this article.
There have been major reversals of study findings in recent years. Consider the risks and benefits of postmenopausal hormone replacement therapy (HRT). In the 1950s, epidemiological studies suggested higher doses of HRT might cause harm, particularly cancer of the uterus (2). In subsequent decades, new studies emphasized the many possible benefits of HRT, particularly its protective effects on heart disease — the leading killer of North American women. The uncritical publicity surrounding these studies was so persuasive that by the 1990s, about half the postmenopausal women in the United States were taking HRT, and physicians were chastised for under-prescribing it. Yet in 2003, the largest randomized controlled trial (RCT) of HRT among postmenopausal women found small increases in breast cancer and increased risks of heart attacks and strokes, largely offsetting any benefits such as fracture reduction (3).
The reason these studies contradicted each other had less to do with the effects of HRT than with differences in study design, particularly whether the studies included comparable control groups and data on preintervention trends. In the HRT case, health-conscious women who chose to take HRT for health benefits differed from those who did not, for reasons of choice, affordability, or pre-existing good health (4). Thus, although most observational studies showed a “benefit” associated with taking HRT, the findings were undermined because the study groups were not comparable. These fundamental nuances were not reported in the news media.
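The problem of noncomparable groups can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate_hrt`, the risk levels, and the uptake probabilities are invented for this example and are not taken from any HRT study. For simplicity it assumes the treatment has no effect at all, and then compares women who choose treatment (an observational design) with women assigned to treatment by a coin flip (a randomized design).

```python
import random


def event_rate(outcomes):
    """Share of people in a group who had a heart event."""
    return sum(outcomes) / len(outcomes)


def simulate_hrt(n=100_000, seed=0):
    """Return (observational difference, randomized difference) in event rates.

    All parameters are hypothetical. The treatment has NO true effect here,
    so any difference between groups is produced by the design alone.
    """
    rng = random.Random(seed)
    observational = {True: [], False: []}  # keyed by "took treatment"
    randomized = {True: [], False: []}

    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # Assumed baseline risk: lower for health-conscious women.
        risk = 0.05 if health_conscious else 0.15
        had_event = rng.random() < risk  # independent of treatment

        # Observational design: health-conscious women choose treatment more often.
        chose_treatment = rng.random() < (0.8 if health_conscious else 0.2)
        observational[chose_treatment].append(had_event)

        # Randomized design: a coin flip decides, unrelated to health habits.
        assigned_treatment = rng.random() < 0.5
        randomized[assigned_treatment].append(had_event)

    obs_diff = event_rate(observational[True]) - event_rate(observational[False])
    rct_diff = event_rate(randomized[True]) - event_rate(randomized[False])
    return obs_diff, rct_diff
```

Under these assumed numbers, the observational comparison shows treated women with an event rate roughly 6 percentage points lower than untreated women, even though the simulated treatment does nothing, while the randomized comparison shows a difference close to zero. The apparent "benefit" comes entirely from who chose treatment.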
Another pattern in the evolution of science is that early studies of new treatments tend to show the most dramatic, positive health effects, and these effects diminish or disappear as more rigorous and larger studies are conducted (5). As these positive effects decrease, harmful side effects emerge. Yet the exaggerated early studies, which by design tend to inflate benefits and underestimate harms, have the most influence.
Rigorous design is also essential for studying health policies, which essentially are huge real-world experiments (1). Such policies, which may affect tens of millions of people, include insurance plans with very high patient deductible costs or Medicare’s new economic penalties levied against hospitals for “preventable” adverse events (6). We know little about the risks, costs, or benefits of such policies, particularly for the poor and the sick. Indeed, the most credible literature syntheses conducted under the auspices of the international Cochrane Collaboration commonly exclude from evidence 50% to 75% of published studies because they do not meet basic research design standards required to yield trustworthy conclusions (eg, lack of evidence for policies that pay physicians to improve quality of medical care) (7,8).
This article focuses on a fundamental question: which types of health care studies are most trustworthy? That is, which study designs are most immune to the many biases and alternative explanations that may produce unreliable results (9)? The key question is whether the health “effects” of interventions — such as drugs, technologies, or health and safety programs — are different from what would have happened anyway (ie, what happened to a control group). Our analysis is based on more than 75 years of proven research design principles in the social sciences that have been largely ignored in the health sciences (9). These simple principles show what is likely to reduce biases and systematic errors. We will describe weak and strong research designs that attempt to control for these biases. Those examples, illustrated with simple graphics, will emphasize 3 overarching principles:
1. No study is perfect. Even the most rigorous research design can be compromised by inaccurate measures and analysis, unrepresentative populations, or even bad luck (“chance”). But we will show that most problems of bias are caused by weak designs yielding exaggerated effects.
2. “You can’t fix by analysis what you bungled by design” (10). Research design is too often neglected, and strenuous statistical machinations are then needed to “adjust for” irreconcilable differences between study and control groups. We will show that such differences are often more responsible for any differences (effects) than is the health service or policy of interest.
3. Publishing innovative but severely biased studies can do more harm than good. Sometimes researchers may publish overly definitive conclusions using unreliable study designs, reasoning that it is better to have unreliable data than no data at all and that the natural progression of science will eventually sort things out. We do not agree. We will show how single, flawed studies, combined with widespread news media attention and advocacy by special interests, can lead to ineffective or unsafe policies (1).
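The key question posed above, whether an intervention's "effects" differ from what would have happened anyway, can also be sketched numerically. The example below is a simplified illustration, not the authors' method: the data are invented, and the helper names `fit_line` and `compare_designs` are hypothetical. It contrasts a naive before-and-after comparison with one that first projects the preintervention trend forward.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def compare_designs(series, start):
    """Return (naive pre-post change, change relative to the projected trend).

    `series` is an outcome measured at equal intervals; the policy begins
    at index `start`.
    """
    pre, post = series[:start], series[start:]

    # Weak design: compare average outcomes before and after the policy.
    naive_change = mean(post) - mean(pre)

    # Stronger design: extend the preintervention trend to estimate what
    # would have happened anyway, then compare the observed values with it.
    slope, intercept = fit_line(list(range(start)), pre)
    expected = [intercept + slope * t for t in range(start, len(series))]
    trend_adjusted_change = mean(post) - mean(expected)

    return naive_change, trend_adjusted_change


# Invented data: an adverse-event rate already falling by 2 units per period,
# with a policy introduced at period 6 that has no effect at all.
rates = [100 - 2 * t for t in range(12)]
naive, adjusted = compare_designs(rates, start=6)
```

With this invented series, the naive comparison reports a drop of 12 units after the policy, while the trend-adjusted comparison reports no change. The drop was already under way before the policy began.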
The case examples in this article describe how some of the most common biases and study designs affect research on important health policies and interventions, such as comparative effectiveness of various medical treatments, cost-containment policies, and health information technology.
The examples include visual illustrations of common biases that compromise a study’s results, weak and strong design alternatives, and the lasting effects of dramatic but flawed early studies. Generally, systematic literature reviews provide more conservative and trustworthy evidence than any single study, so we also use the conclusions of such reviews of the broad evidence to supplement the results of a strongly designed study. Finally, we illustrate the impacts of the studies on the news media, medicine, and policy.
This project was supported by a Thomas O. Pyle Fellowship (Dr Soumerai) from the Department of Population Medicine, Harvard Medical School, and Harvard Pilgrim Health Care Institute, Boston; and a grant from the Commonwealth Fund (no. 20120504). Dr Soumerai received grant support from the Centers for Disease Control and Prevention’s Natural Experiments for Translation in Diabetes (NEXT-D). Dr Majumdar receives salary support as a Health Scholar (Alberta Heritage Foundation for Medical Research and Alberta Innovates – Health Solutions) and holds the Endowed Chair in Patient Health Management (Faculties of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada). We are grateful to Dr Jeanne Madden and Wendy Drobnyk for editorial assistance, Ellen Taratus for outstanding editing of this article, and Caitlin Lupton for her careful analysis of numerous articles and graphic design. The Commonwealth Fund is a national, private foundation in New York City that supports independent research on health care issues and makes grants to improve health care practice and policy. The views presented here are those of the author and not necessarily those of The Commonwealth Fund, its directors, officers or staff.
Corresponding Author: Stephen B. Soumerai, ScD, Professor of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 133 Brookline Ave, 6th Floor, Boston, MA 02215. Telephone: 617-509-9942. Email: firstname.lastname@example.org.
Author Affiliations: Douglas Starr, College of Communication, Science Journalism Program, Boston University, Boston, Massachusetts; Sumit R. Majumdar, Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta. Dr Soumerai is also co-chair of the Evaluative Sciences and Statistics Concentration of Harvard University’s PhD Program in Health Policy.
- Majumdar SR, Soumerai SB. The unhealthy state of health policy research. Health Aff (Millwood) 2009;28(5):w900–8. CrossRef PubMed
- Krieger N, Löwy I, Aronowitz R, Bigby J, Dickersin K, Garner E, et al. Hormone replacement therapy, cancer, controversies, and women’s health: historical, epidemiological, biological, clinical, and advocacy perspectives. J Epidemiol Community Health 2005;59(9):740–8. CrossRef PubMed
- Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349(6):523–34. CrossRef PubMed
- Humphrey LL, Chan BK, Sox HC. Postmenopausal hormone replacement therapy and the primary prevention of cardiovascular disease. Ann Intern Med 2002;137(4):273–84. CrossRef PubMed
- Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124. CrossRef PubMed
- Soumerai SB, Koppel R. An ObamaCare penalty on hospitals. The Wall Street Journal; 2013 May 5. http://online.wsj.com/news/articles/SB10001424127887323741004578418993777612184. Accessed June 11, 2014.
- Black AD, Car J, Pagliari C, Anandan C, Cresswell K, Bokun T, et al. The impact of eHealth on the quality and safety of health care: a systematic overview. PLoS Med 2011;8(1):e1000387. CrossRef PubMed
- Urquhart C, Currell R, Grant MJ, Hardiker NR. Nursing record systems: effects on nursing practice and healthcare outcomes. Cochrane Database Syst Rev 2009;(1):CD002099. PubMed
- Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Belmont (CA): Wadsworth Cengage Learning; 2002.
- Light RJ, Singer JD, Willett JB. By design: planning research on higher education. Cambridge (MA): Harvard University Press; 1990.
- Majumdar SR, McAlister FA, Eurich DT, Padwal RS, Marrie TJ. Statins and outcomes in patients admitted to hospital with community acquired pneumonia: population based prospective cohort study. BMJ 2006;333(7576):999. CrossRef PubMed
- Simonsen L, Reichert TA, Viboud C, Blackwelder WC, Taylor RJ, Miller MA. Impact of influenza vaccination on seasonal mortality in the US elderly population. Arch Intern Med 2005;165(3):265–72. CrossRef PubMed
- Eurich DT, Marrie TJ, Johnstone J, Majumdar SR. Mortality reduction with influenza vaccine in patients with pneumonia outside “flu” season: pleiotropic benefits or residual confounding? Am J Respir Crit Care Med 2008;178(5):527–33. CrossRef PubMed
- Eurich DT, Majumdar SR. Statins and sepsis — scientifically interesting but clinically inconsequential. J Gen Intern Med 2012;27(3):268–9. CrossRef PubMed
- Nichol KL, Nordin JD, Nelson DB, Mullooly JP, Hak E. Effectiveness of influenza vaccine in the community-dwelling elderly. N Engl J Med 2007;357(14):1373–81. CrossRef PubMed
- Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet 2004;363(9422):1728–31. CrossRef PubMed
- Campitelli MA, Rosella LC, Stukel TA, Kwong JC. Influenza vaccination and all-cause mortality in community-dwelling elderly in Ontario, Canada, a cohort study. Vaccine 2010;29(2):240–6. CrossRef PubMed
- Hillestad R, Bigelow JH. Health information technology: can HIT lower costs and improve quality? Santa Monica (CA): RAND Corporation; 2005.
- Soumerai SB, Koppel R. A major glitch for digitized health-care records. The Wall Street Journal; 2012 September 17. http://www.wsj.com/articles/SB10000872396390443847404577627041964831020. Accessed June 9, 2014.
- Kellermann AL, Jones SS. What it will take to achieve the as-yet-unfulfilled promises of health information technology. Health Aff (Millwood) 2013;32(1):63–8. CrossRef PubMed
- Koppel R, Metlay JP, Cohen A, Abaluck B, Localio AR, Kimmel SE, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA 2005;293(10):1197–203. CrossRef PubMed
- Soumerai SB, Avery T. Don’t repeat the UK’s electronic health records failure. Huffington Post; 2010 December 1. http://www.huffingtonpost.com/stephen-soumerai/dont-repeat-the-uks-elect_b_790470.html. Accessed June 9, 2014.
- Simon SR, Kaushal R, Cleary PD, Jenter CA, Volk LA, Poon EG, et al. Correlates of electronic health record adoption in office practices: a statewide survey. J Am Med Inform Assoc 2007;14(1):110–7. CrossRef PubMed
- Decker SL, Jamoom EW, Sisk JE. Physicians in nonprimary care and small practices and those age 55 and older lag in adopting electronic health record systems. Health Aff (Millwood) 2012;31(5):1108–14. CrossRef PubMed
- Koppel R, Majumdar SR, Soumerai SB. Electronic health records and quality of diabetes care. N Engl J Med 2011;365(24):2338–9, author reply 2339. CrossRef PubMed
- Cebul RD, Love TE, Jain AK, Hebert CJ. Electronic health records and quality of diabetes care. N Engl J Med 2011;365(9):825–33. CrossRef PubMed
- Cochrane Reviews. London (UK): Cochrane; 2013. http://www.cochrane.org/search/site/cochrane%20reviews. Accessed June 10, 2014.
- Cummings SR, Coates TJ, Richard RJ, Hansen B, Zahnd EG, VanderMartin R, et al. Training physicians in counseling about smoking cessation. A randomized trial of the “Quit for Life” program. Ann Intern Med 1989;110(8):640–7. CrossRef PubMed
- Electronic health records and quality of diabetes care 2011. Princeton (NJ): Robert Wood Johnson Foundation; 2011. http://www.rwjf.org/en/research-publications/find-rwjf-research/2011/09/electronic-health-records-and-quality-of-diabetes-care.html. Accessed June 9, 2014.
- Gurwitz JH, Field TS, Rochon P, Judge J, Harrold LR, Bell CM, et al. Effect of computerized provider order entry with clinical decision support on adverse drug events in the long-term care setting. J Am Geriatr Soc 2008;56(12):2225–33. CrossRef PubMed
- Singh H, Spitzmueller C, Petersen NJ, Sawhney MK, Sittig DF. Information overload and missed test results in electronic health record-based settings. JAMA Intern Med 2013;173(8):702–4. CrossRef PubMed
- Ray WA, Griffin MR, Schaffner W, Baugh DK, Melton LJ 3d. Psychotropic drug use and the risk of hip fracture. N Engl J Med 1987;316(7):363–9. CrossRef PubMed
- Wagner AK, Ross-Degnan D, Gurwitz JH, Zhang F, Gilden DB, Cosler L, et al. Effect of New York State regulatory action on benzodiazepine prescribing and hip fracture rates. Ann Intern Med 2007;146(2):96–103. CrossRef PubMed
- Brauer CA, Coca-Perraillon M, Cutler DM, Rosen AB. Incidence and mortality of hip fractures in the United States. JAMA 2009;302(14):1573–9. CrossRef PubMed
- Luijendijk HJ, Tiemeier H, Hofman A, Heeringa J, Stricker BH. Determinants of chronic benzodiazepine use in the elderly: a longitudinal study. Br J Clin Pharmacol 2008;65(4):593–9. CrossRef PubMed
- Hartikainen S, Rahkonen T, Kautiainen H, Sulkava R. Use of psychotropics among home-dwelling nondemented and demented elderly. Int J Geriatr Psychiatry 2003;18(12):1135–41. CrossRef PubMed
- Wagner AK, Zhang F, Soumerai SB, Walker AM, Gurwitz JH, Glynn RJ, et al. Benzodiazepine use and hip fractures in the elderly: who is at greatest risk? Arch Intern Med 2004;164(14):1567–72. CrossRef PubMed
- Hebert JR, Clemow L, Pbert L, Ockene IS, Ockene JK. Social desirability bias in dietary self-report may compromise the validity of dietary intake measures. Int J Epidemiol 1995;24(2):389–98. CrossRef PubMed
- Adams AS, Soumerai SB, Lomas J, Ross-Degnan D. Evidence of self-report bias in assessing adherence to guidelines. Int J Qual Health Care 1999;11(3):187–92. CrossRef PubMed
- Taveras EM, Gortmaker SL, Hohman KH, Horan CM, Kleinman KP, Mitchell K, et al. Randomized controlled trial to improve primary care to prevent and manage childhood obesity: the High Five for Kids study. Arch Pediatr Adolesc Med 2011;165(8):714–22. CrossRef PubMed
- Bryant MJ, Lucove JC, Evenson KR, Marshall S. Measurement of television viewing in children and adolescents: a systematic review. Obes Rev 2007;8(3):197–209. CrossRef PubMed
- Epstein LH, Roemmich JN, Robinson JL, Paluch RA, Winiewicz DD, Fuerch JH, et al. A randomized trial of the effects of reducing television viewing and computer use on body mass index in young children. Arch Pediatr Adolesc Med 2008;162(3):239–45. CrossRef PubMed
- Soumerai SB, McLaughlin TJ, Gurwitz JH, Guadagnoli E, Hauptman PJ, Borbas C, et al. Effect of local medical opinion leaders on quality of care for acute myocardial infarction: a randomized controlled trial. JAMA 1998;279(17):1358–63. CrossRef PubMed
- Committee on Quality of Health Care in America. Institute of Medicine. To err is human: building a safer health system. Washington (DC): National Academies Press; 2002.
- Pryor D, Hendrich A, Henkel RJ, Beckmann JK, Tersigni AR. The quality ‘journey’ at Ascension Health: how we’ve prevented at least 1,500 avoidable deaths a year — and aim to do even better. Health Aff (Millwood) 2011;30(4):604–11. CrossRef PubMed
- Berwick DM, Hackbarth AD, McCannon CJ. IHI replies to The 100,000 Lives Campaign: a scientific and policy review. Jt Comm J Qual Patient Saf 2006;32(11):628–30, 631–3. PubMed
- Wachter RM, Pronovost PJ. The 100,000 Lives Campaign: a scientific and policy review. Jt Comm J Qual Patient Saf 2006;32(11):621–7. PubMed
- Agency for Healthcare Research and Quality. Statistics on hospital stays. http://hcupnet.ahrq.gov/HCUPnet.jsp?Id=538E72CAE528AF2E&Form=DispTab&JS=Y&Action=%3E%3ENext%3E%3E&__InDispTab=Yes&_Results=&SortOpt=&_Results3=OldWeight. Accessed May 26, 2015.
- Ioannidis JP, Prasad V. Evaluating health system processes with randomized controlled trials. JAMA Intern Med 2013;173(14):1279–80. CrossRef PubMed
- Ackermann RT, Kenrik Duru O, Albu JB, Schmittdiel JA, Soumerai SB, Wharam JF, et al. Evaluating diabetes health policies using natural experiments: the natural experiments for translation in diabetes study. Am J Prev Med 2015;48(6):747–54. CrossRef PubMed