How Do You Know Which Health Care Effectiveness Research You Can Trust? A Guide to Study Design for the Perplexed

EDITOR’S CHOICE — Volume 12 — June 25, 2015

Stephen B. Soumerai, ScD; Douglas Starr, MS; Sumit R. Majumdar, MD, MPH

Suggested citation for this article: Soumerai SB, Starr D, Majumdar SR. How Do You Know Which Health Care Effectiveness Research You Can Trust? A Guide to Study Design for the Perplexed. Prev Chronic Dis 2015;12:150187. DOI: http://dx.doi.org/10.5888/pcd12.150187.

MEDSCAPE CME

Medscape, LLC is pleased to provide online continuing medical education (CME) for this journal article, allowing clinicians the opportunity to earn CME credit.

This activity has been planned and implemented in accordance with the Essential Areas and policies of the Accreditation Council for Continuing Medical Education through the joint providership of Medscape, LLC and Preventing Chronic Disease. Medscape, LLC is accredited by the ACCME to provide continuing medical education for physicians.

Medscape, LLC designates this Journal-based CME activity for a maximum of 1 AMA PRA Category 1 Credit(s)™. Physicians should claim only the credit commensurate with the extent of their participation in the activity.

All other clinicians completing this activity will be issued a certificate of participation. To participate in this journal CME activity: (1) review the learning objectives and author disclosures; (2) study the education content; (3) take the post-test with a 75% minimum passing score and complete the evaluation at www.medscape.org/journal/pcd; (4) view/print certificate.

Release date: June 25, 2015; Expiration date: June 25, 2016

Learning Objectives

Upon completion of this activity, participants will be able to:

Define healthy user bias in health care research and means to reduce it
Assess means to reduce selection bias in health care research
Assess how to overcome confounding factors by indication in health care research
Evaluate social desirability bias and history bias in health care research

EDITORS

Ellen Taratus
Editor, Preventing Chronic Disease.
Disclosure: Ellen Taratus has disclosed no relevant financial relationships.

Camille Martin
Editor, Preventing Chronic Disease.
Disclosure: Camille Martin has disclosed no relevant financial relationships.

Jeanne Madden, PhD
Department of Population Medicine, Harvard Medical School, Boston, Massachusetts.
Disclosure: Jeanne Madden, PhD, has disclosed no relevant financial relationships.

CME AUTHOR
Charles P. Vega, MD
Clinical Professor of Family Medicine, University of California, Irvine
Disclosure: Charles P. Vega, MD, has disclosed the following relevant financial relationships:
Served as an advisor or consultant for: Lundbeck, Inc; McNeil Pharmaceuticals; Takeda Pharmaceuticals North America, Inc.

AUTHORS AND CREDENTIALS
Stephen B. Soumerai, ScD, Professor of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute; Co-chair, Evaluative Sciences and Statistics Concentration of Harvard University's PhD Program in Health Policy, Harvard University, Boston, Massachusetts; Douglas Starr, MS, Co-director of Science Journalism Program at Boston University, Boston University, Boston, Massachusetts; Sumit Majumdar, MD, MPH, FRCPC, Professor of Medicine, Endowed Chair in Patient Health Management, Faculties of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada

Disclosure: Stephen B. Soumerai, Douglas Starr, and Sumit Majumdar have disclosed no relevant financial relationships.

Editor’s Note: The purpose of this Editor’s Choice article is translational in nature. It is intended to illustrate some of the most common examples of potential study bias to help policy makers, journalists, trainees, and the public understand the strengths and weaknesses of various types of health care research and the kinds of study designs that are most trustworthy. It is neither a comprehensive guide nor a standard research methods article. The authors intend to add to these examples of bias in research designs in future brief and easy-to-understand articles designed to show both the scientific community and the broader population why caution is needed in understanding and accepting the results of research that may have profound and long-lasting effects on health policy and clinical practice.

Evidence is mounting that publication in a peer-reviewed medical journal does not guarantee a study’s validity (1). Many studies of health care effectiveness do not show the cause-and-effect relationships that they claim because they have faulty research designs. Mistaken conclusions later reported in the news media can lead to wrong-headed policies and confusion among policy makers, scientists, and the public. Unfortunately, little guidance exists to help distinguish good study designs from bad ones; providing that guidance is the central goal of this article.

There have been major reversals of study findings in recent years. Consider the risks and benefits of postmenopausal hormone replacement therapy (HRT). In the 1950s, epidemiological studies suggested higher doses of HRT might cause harm, particularly cancer of the uterus (2). In subsequent decades, new studies emphasized the many possible benefits of HRT, particularly its protective effects on heart disease — the leading killer of North American women. The uncritical publicity surrounding these studies was so persuasive that by the 1990s, about half the postmenopausal women in the United States were taking HRT, and physicians were chastised for under-prescribing it. Yet in 2003, the largest randomized controlled trial (RCT) of HRT among postmenopausal women found small increases in breast cancer and increased risks of heart attacks and strokes, largely offsetting any benefits such as fracture reduction (3).

The reason these studies contradicted each other had less to do with the effects of HRT than with differences in study design, particularly whether the studies included comparable control groups and data on preintervention trends. In the HRT case, health-conscious women who chose to take HRT for health benefits differed from those who did not, whether for reasons of choice, affordability, or pre-existing good health (4). Thus, although most observational studies showed a “benefit” associated with taking HRT, the findings were undermined because the study groups were not comparable. These fundamental nuances were not reported in the news media.
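The self-selection problem described above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not a reanalysis of any HRT study: the function name simulate_risk_differences, the 5% and 15% event risks, and the 80% and 20% uptake probabilities are all hypothetical. The treatment is given no true effect, yet the observational comparison shows an apparent benefit because healthier people are more likely to choose treatment. Random assignment removes that imbalance.

```python
import random


def simulate_risk_differences(n=200_000, seed=0):
    """Return (observational, randomized) risk differences, treated minus untreated.

    All probabilities are hypothetical and chosen only for illustration.
    The treatment has NO true effect on the outcome.
    """
    rng = random.Random(seed)
    # Each table maps treated? -> [number of events, number of people]
    observational = {True: [0, 0], False: [0, 0]}
    randomized = {True: [0, 0], False: [0, 0]}

    for _ in range(n):
        healthy = rng.random() < 0.5
        # Outcome risk depends only on baseline health, never on treatment.
        event = rng.random() < (0.05 if healthy else 0.15)
        # Observational setting: healthier people opt in far more often.
        chose_treatment = rng.random() < (0.8 if healthy else 0.2)
        # Randomized setting: a coin flip decides, regardless of health.
        assigned_treatment = rng.random() < 0.5

        for table, treated in ((observational, chose_treatment),
                               (randomized, assigned_treatment)):
            table[treated][0] += event
            table[treated][1] += 1

    def risk_difference(table):
        treated_risk = table[True][0] / table[True][1]
        untreated_risk = table[False][0] / table[False][1]
        return treated_risk - untreated_risk

    return risk_difference(observational), risk_difference(randomized)


obs_diff, rct_diff = simulate_risk_differences()
print(f"Observational risk difference: {obs_diff:+.3f}")
print(f"Randomized risk difference:    {rct_diff:+.3f}")
```

Under these assumptions the observational difference is clearly negative (an apparent "benefit"), while the randomized difference stays close to zero, which is the true effect built into the simulation.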

Another pattern in the evolution of science is that early studies of new treatments tend to show the most dramatic, positive health effects, and these effects diminish or disappear as more rigorous and larger studies are conducted (5). As these positive effects decrease, harmful side effects emerge. Yet the exaggerated early studies, which by design tend to inflate benefits and underestimate harms, have the most influence.

Rigorous design is also essential for studying health policies, which essentially are huge real-world experiments (1). Such policies, which may affect tens of millions of people, include insurance plans with very high patient deductible costs or Medicare’s new economic penalties levied against hospitals for “preventable” adverse events (6). We know little about the risks, costs, or benefits of such policies, particularly for the poor and the sick. Indeed, the most credible literature syntheses conducted under the auspices of the international Cochrane Collaboration commonly exclude from evidence 50% to 75% of published studies because they do not meet basic research design standards required to yield trustworthy conclusions (eg, lack of evidence for policies that pay physicians to improve quality of medical care) (7,8).

This article focuses on a fundamental question: which types of health care studies are most trustworthy? That is, which study designs are most immune to the many biases and alternative explanations that may produce unreliable results (9)? The key question is whether the health “effects” of interventions — such as drugs, technologies, or health and safety programs — are different from what would have happened anyway (ie, what happened to a control group). Our analysis is based on more than 75 years of proven research design principles in the social sciences that have been largely ignored in the health sciences (9). These simple principles show what is likely to reduce biases and systematic errors. We will describe weak and strong research designs that attempt to control for these biases. Those examples, illustrated with simple graphics, will emphasize 3 overarching principles:

1. No study is perfect. Even the most rigorous research design can be compromised by inaccurate measures and analysis, unrepresentative populations, or even bad luck (“chance”). But we will show that most problems of bias are caused by weak designs yielding exaggerated effects.

2. “You can’t fix by analysis what you bungled by design” (10). Research design is too often neglected, and strenuous statistical machinations are then needed to “adjust for” irreconcilable differences between study and control groups. We will show that such differences are often more responsible for any differences (effects) than is the health service or policy of interest.

3. Publishing innovative but severely biased studies can do more harm than good. Sometimes researchers may publish overly definitive conclusions using unreliable study designs, reasoning that it is better to have unreliable data than no data at all and that the natural progression of science will eventually sort things out. We do not agree. We will show how single, flawed studies, combined with widespread news media attention and advocacy by special interests, can lead to ineffective or unsafe policies (1).
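The key question raised earlier, whether an observed change differs from what would have happened anyway, can be sketched with simple arithmetic. The example below is illustrative only: the function names, the rates, and the timing of the hypothetical policy are assumptions, not data from any study cited here. Both groups follow the same downward trend and the policy has no effect, so a before-and-after comparison without a control group credits the policy with a change that was already under way.

```python
def pre_post_change(series, start):
    """Naive before/after comparison: mean after the policy minus mean before."""
    pre, post = series[:start], series[start:]
    return sum(post) / len(post) - sum(pre) / len(pre)


def difference_in_differences(study, control, start):
    """Change in the study group minus the change in the control group."""
    return pre_post_change(study, start) - pre_post_change(control, start)


# Hypothetical adverse-event rates per 1,000 patients over 8 periods.
# Both groups decline by 2 per period; the policy starts at period index 4
# in the study group only and, by construction, changes nothing.
study_group = [100, 98, 96, 94, 92, 90, 88, 86]
control_group = [90, 88, 86, 84, 82, 80, 78, 76]
policy_start = 4

naive = pre_post_change(study_group, policy_start)
adjusted = difference_in_differences(study_group, control_group, policy_start)

print(naive)     # -8.0: looks like a large improvement
print(adjusted)  # 0.0: the control group improved just as much
```

The naive estimate simply reflects the pre-existing trend. Comparing against a control group with the same trend is one way, under the assumption that the two groups would otherwise have moved in parallel, to separate the policy's effect from what would have happened anyway.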

The case examples in this article describe how some of the most common biases and study designs affect research on important health policies and interventions, such as comparative effectiveness of various medical treatments, cost-containment policies, and health information technology.

The examples include visual illustrations of common biases that compromise a study’s results, weak and strong design alternatives, and the lasting effects of dramatic but flawed early studies. Systematic literature reviews generally provide more conservative and trustworthy evidence than any single study, so we also use the conclusions of such reviews to supplement the results of strongly designed studies. Finally, we illustrate the impacts of the studies on the news media, medicine, and policy.

Acknowledgments

This project was supported by a Thomas O. Pyle Fellowship (Dr Soumerai) from the Department of Population Medicine, Harvard Medical School, and Harvard Pilgrim Health Care Institute, Boston; and a grant from the Commonwealth Fund (no. 20120504). Dr Soumerai received grant support from the Centers for Disease Control and Prevention’s Natural Experiments for Translation in Diabetes (NEXT-D). Dr Majumdar receives salary support as a Health Scholar (Alberta Heritage Foundation for Medical Research and Alberta Innovates – Health Solutions) and holds the Endowed Chair in Patient Health Management (Faculties of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada). We are grateful to Dr Jeanne Madden and Wendy Drobnyk for editorial assistance, Ellen Taratus for outstanding editing of this article, and Caitlin Lupton for her careful analysis of numerous articles and graphic design. The Commonwealth Fund is a national, private foundation in New York City that supports independent research on health care issues and makes grants to improve health care practice and policy. The views presented here are those of the author and not necessarily those of The Commonwealth Fund, its directors, officers or staff.

Author Information

Corresponding Author: Stephen B. Soumerai, ScD, Professor of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 133 Brookline Ave, 6th Floor, Boston, MA 02215. Telephone: 617-509-9942. Email: ssoumerai@hms.harvard.edu.

Author Affiliations: Douglas Starr, College of Communication, Science Journalism Program, Boston University, Boston, Massachusetts; Sumit R. Majumdar, Faculties of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada. Dr Soumerai is also co-chair of the Evaluative Sciences and Statistics Concentration of Harvard University’s PhD Program in Health Policy.

References

1. Majumdar SR, Soumerai SB. The unhealthy state of health policy research. Health Aff (Millwood) 2009;28(5):w900–8.
2. Krieger N, Löwy I, Aronowitz R, Bigby J, Dickersin K, Garner E, et al. Hormone replacement therapy, cancer, controversies, and women’s health: historical, epidemiological, biological, clinical, and advocacy perspectives. J Epidemiol Community Health 2005;59(9):740–8.
3. Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349(6):523–34.
4. Humphrey LL, Chan BK, Sox HC. Postmenopausal hormone replacement therapy and the primary prevention of cardiovascular disease. Ann Intern Med 2002;137(4):273–84.
5. Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124.
6. Soumerai SB, Koppel R. An ObamaCare penalty on hospitals. The Wall Street Journal; 2013 May 5. http://online.wsj.com/news/articles/SB10001424127887323741004578418993777612184. Accessed June 11, 2014.
7. Black AD, Car J, Pagliari C, Anandan C, Cresswell K, Bokun T, et al. The impact of eHealth on the quality and safety of health care: a systematic overview. PLoS Med 2011;8(1):e1000387.
8. Urquhart C, Currell R, Grant MJ, Hardiker NR. Nursing record systems: effects on nursing practice and healthcare outcomes. Cochrane Database Syst Rev 2009;(1):CD002099.
9. Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Belmont (CA): Wadsworth Cengage Learning; 2002.
10. Light RJ, Singer JD, Willett JB. By design: planning research on higher education. Cambridge (MA): Harvard University Press; 1990.
11. Majumdar SR, McAlister FA, Eurich DT, Padwal RS, Marrie TJ. Statins and outcomes in patients admitted to hospital with community acquired pneumonia: population based prospective cohort study. BMJ 2006;333(7576):999.
12. Simonsen L, Reichert TA, Viboud C, Blackwelder WC, Taylor RJ, Miller MA. Impact of influenza vaccination on seasonal mortality in the US elderly population. Arch Intern Med 2005;165(3):265–72.
13. Eurich DT, Marrie TJ, Johnstone J, Majumdar SR. Mortality reduction with influenza vaccine in patients with pneumonia outside “flu” season: pleiotropic benefits or residual confounding? Am J Respir Crit Care Med 2008;178(5):527–33.
14. Eurich DT, Majumdar SR. Statins and sepsis: scientifically interesting but clinically inconsequential. J Gen Intern Med 2012;27(3):268–9.
15. Nichol KL, Nordin JD, Nelson DB, Mullooly JP, Hak E. Effectiveness of influenza vaccine in the community-dwelling elderly. N Engl J Med 2007;357(14):1373–81.
16. Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet 2004;363(9422):1728–31.
17. Campitelli MA, Rosella LC, Stukel TA, Kwong JC. Influenza vaccination and all-cause mortality in community-dwelling elderly in Ontario, Canada, a cohort study. Vaccine 2010;29(2):240–6.
18. Hillestad R, Bigelow JH. Health information technology: can HIT lower costs and improve quality? Santa Monica (CA): RAND Corporation; 2005.
19. Soumerai SB, Koppel R. A major glitch for digitized health-care records. The Wall Street Journal; 2012 September 17. http://www.wsj.com/articles/SB10000872396390443847404577627041964831020. Accessed June 9, 2014.
20. Kellermann AL, Jones SS. What it will take to achieve the as-yet-unfulfilled promises of health information technology. Health Aff (Millwood) 2013;32(1):63–8.
21. Koppel R, Metlay JP, Cohen A, Abaluck B, Localio AR, Kimmel SE, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA 2005;293(10):1197–203.
22. Soumerai SB, Avery T. Don’t repeat the UK’s electronic health records failure. Huffington Post; 2010 December 1. http://www.huffingtonpost.com/stephen-soumerai/dont-repeat-the-uks-elect_b_790470.html. Accessed June 9, 2014.
23. Simon SR, Kaushal R, Cleary PD, Jenter CA, Volk LA, Poon EG, et al. Correlates of electronic health record adoption in office practices: a statewide survey. J Am Med Inform Assoc 2007;14(1):110–7.
24. Decker SL, Jamoom EW, Sisk JE. Physicians in nonprimary care and small practices and those age 55 and older lag in adopting electronic health record systems. Health Aff (Millwood) 2012;31(5):1108–14.
25. Koppel R, Majumdar SR, Soumerai SB. Electronic health records and quality of diabetes care. N Engl J Med 2011;365(24):2338–9, author reply 2339.
26. Cebul RD, Love TE, Jain AK, Hebert CJ. Electronic health records and quality of diabetes care. N Engl J Med 2011;365(9):825–33.
27. Cochrane Reviews. London (UK): Cochrane; 2013. http://www.cochrane.org/search/site/cochrane%20reviews. Accessed June 10, 2014.
28. Cummings SR, Coates TJ, Richard RJ, Hansen B, Zahnd EG, VanderMartin R, et al. Training physicians in counseling about smoking cessation. A randomized trial of the “Quit for Life” program. Ann Intern Med 1989;110(8):640–7.
29. Electronic health records and quality of diabetes care 2011. Princeton (NJ): Robert Wood Johnson Foundation; 2011. http://www.rwjf.org/en/research-publications/find-rwjf-research/2011/09/electronic-health-records-and-quality-of-diabetes-care.html. Accessed June 9, 2014.
30. Gurwitz JH, Field TS, Rochon P, Judge J, Harrold LR, Bell CM, et al. Effect of computerized provider order entry with clinical decision support on adverse drug events in the long-term care setting. J Am Geriatr Soc 2008;56(12):2225–33.
31. Singh H, Spitzmueller C, Petersen NJ, Sawhney MK, Sittig DF. Information overload and missed test results in electronic health record-based settings. JAMA Intern Med 2013;173(8):702–4.
32. Ray WA, Griffin MR, Schaffner W, Baugh DK, Melton LJ 3d. Psychotropic drug use and the risk of hip fracture. N Engl J Med 1987;316(7):363–9.
33. Wagner AK, Ross-Degnan D, Gurwitz JH, Zhang F, Gilden DB, Cosler L, et al. Effect of New York State regulatory action on benzodiazepine prescribing and hip fracture rates. Ann Intern Med 2007;146(2):96–103.
34. Brauer CA, Coca-Perraillon M, Cutler DM, Rosen AB. Incidence and mortality of hip fractures in the United States. JAMA 2009;302(14):1573–9.
35. Luijendijk HJ, Tiemeier H, Hofman A, Heeringa J, Stricker BH. Determinants of chronic benzodiazepine use in the elderly: a longitudinal study. Br J Clin Pharmacol 2008;65(4):593–9.
36. Hartikainen S, Rahkonen T, Kautiainen H, Sulkava R. Use of psychotropics among home-dwelling nondemented and demented elderly. Int J Geriatr Psychiatry 2003;18(12):1135–41.
37. Wagner AK, Zhang F, Soumerai SB, Walker AM, Gurwitz JH, Glynn RJ, et al. Benzodiazepine use and hip fractures in the elderly: who is at greatest risk? Arch Intern Med 2004;164(14):1567–72.
38. Hebert JR, Clemow L, Pbert L, Ockene IS, Ockene JK. Social desirability bias in dietary self-report may compromise the validity of dietary intake measures. Int J Epidemiol 1995;24(2):389–98.
39. Adams AS, Soumerai SB, Lomas J, Ross-Degnan D. Evidence of self-report bias in assessing adherence to guidelines. Int J Qual Health Care 1999;11(3):187–92.
40. Taveras EM, Gortmaker SL, Hohman KH, Horan CM, Kleinman KP, Mitchell K, et al. Randomized controlled trial to improve primary care to prevent and manage childhood obesity: the High Five for Kids study. Arch Pediatr Adolesc Med 2011;165(8):714–22.
41. Bryant MJ, Lucove JC, Evenson KR, Marshall S. Measurement of television viewing in children and adolescents: a systematic review. Obes Rev 2007;8(3):197–209.
42. Epstein LH, Roemmich JN, Robinson JL, Paluch RA, Winiewicz DD, Fuerch JH, et al. A randomized trial of the effects of reducing television viewing and computer use on body mass index in young children. Arch Pediatr Adolesc Med 2008;162(3):239–45.
43. Soumerai SB, McLaughlin TJ, Gurwitz JH, Guadagnoli E, Hauptman PJ, Borbas C, et al. Effect of local medical opinion leaders on quality of care for acute myocardial infarction: a randomized controlled trial. JAMA 1998;279(17):1358–63.
44. Committee on Quality of Health Care in America. Institute of Medicine. To err is human: building a safer health system. Washington (DC): National Academies Press; 2002.
45. Pryor D, Hendrich A, Henkel RJ, Beckmann JK, Tersigni AR. The quality ‘journey’ at Ascension Health: how we’ve prevented at least 1,500 avoidable deaths a year, and aim to do even better. Health Aff (Millwood) 2011;30(4):604–11.
46. Berwick DM, Hackbarth AD, McCannon CJ. IHI replies to The 100,000 Lives Campaign: a scientific and policy review. Jt Comm J Qual Patient Saf 2006;32(11):628–30, 631–3.
47. Wachter RM, Pronovost PJ. The 100,000 Lives Campaign: a scientific and policy review. Jt Comm J Qual Patient Saf 2006;32(11):621–7.
48. Agency for Healthcare Research and Quality. Statistics on hospital stays. http://hcupnet.ahrq.gov/HCUPnet.jsp?Id=538E72CAE528AF2E&Form=DispTab&JS=Y&Action=%3E%3ENext%3E%3E&__InDispTab=Yes&_Results=&SortOpt=&_Results3=OldWeight. Accessed May 26, 2015.
49. Ioannidis JP, Prasad V. Evaluating health system processes with randomized controlled trials. JAMA Intern Med 2013;173(14):1279–80.
50. Ackermann RT, Kenrik Duru O, Albu JB, Schmittdiel JA, Soumerai SB, Wharam JF, et al. Evaluating diabetes health policies using natural experiments: the natural experiments for translation in diabetes study. Am J Prev Med 2015;48(6):747–54.