Monday, October 21, 2013

PS3-13: Re-Identification Risk Associated with ... [Clin Med Res. 2013] - PubMed - NCBI

Clin Med Res. 2013 Sep;11(3):148. doi: 10.3121/cmr.2013.1176.ps3-13.

PS3-13: Re-Identification Risk Associated with Sharing Linked Genomic and Phenotypic Data from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH).

Abstract

Background/Aims It is now understood that conventional de-identification methods such as the HIPAA Safe Harbor standard do not guarantee anonymity of patient records, which may be vulnerable to a variety of attacks aimed at re-identifying confidential information. We present an analytic framework for evaluating these risks quantitatively so that privacy and scientific utility can be balanced explicitly. As a concrete example, we examine the implications for patient privacy of plans to deposit over 70,000 full-genome genotypes and associated clinical data in the federally managed dbGaP data repository, as a component of an NIH-funded study conducted by the Research Program on Genes, Environment, and Health (RPGEH) at the Kaiser Permanente Northern California Division of Research (KPNC DOR). Risks are examined from multiple perspectives, and risk reduction strategies are discussed.
Methods Two analytic approaches are described: (1) "k-anonymization", which computes risk based only on the distribution of cell sizes in the disclosed dataset; and (2) "k-map", which takes account of the characteristics of potential reference datasets (e.g., voter rolls, disease registries) that may be available to the attacker. Probabilities of re-identification were computed using a random sample of records from actual study participants, assuming disclosure of the following phenotypic attributes: 5-year age group, sex, race (5 categories), and a set of 22 ICD-9-defined common diseases. For method 2, the KPNC EMR was used as a proxy for a highly informative reference dataset.
Results The first method tended to yield very conservative estimates of risk: 9.5% of subjects in the disclosed dataset had unique phenotypic attributes, while 18% were in cells of size <5 [...]
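The two approaches named under Methods can be illustrated with a minimal sketch. It is not the study's implementation, which the abstract does not describe. The toy records, the field names (`age_group`, `sex`, `race`), and the helper names are all hypothetical. The sketch assumes that a "cell" is the group of records sharing one combination of disclosed attributes, and that under k-map a record's risk is one over the size of its matching cell in the reference dataset.

```python
from collections import Counter

# Hypothetical quasi-identifiers; the study also disclosed 22 disease flags.
KEYS = ("age_group", "sex", "race")


def cell_key(record, keys=KEYS):
    """Return the combination of disclosed attributes that defines a record's cell."""
    return tuple(record[k] for k in keys)


def cell_sizes(records, keys=KEYS):
    """Count how many records fall into each cell."""
    return Counter(cell_key(r, keys) for r in records)


def k_anonymity_summary(disclosed, threshold=5, keys=KEYS):
    """Fractions of disclosed records that are unique, or in cells smaller
    than `threshold`, using only the disclosed dataset itself."""
    sizes = cell_sizes(disclosed, keys)
    n = len(disclosed)
    unique = sum(1 for r in disclosed if sizes[cell_key(r, keys)] == 1)
    small = sum(1 for r in disclosed if sizes[cell_key(r, keys)] < threshold)
    return unique / n, small / n


def k_map_risks(disclosed, reference, keys=KEYS):
    """Per-record risk as 1 / (matching cell size in the reference dataset).
    Assumes every disclosed record also appears in the reference dataset."""
    ref_sizes = cell_sizes(reference, keys)
    risks = []
    for r in disclosed:
        size = ref_sizes[cell_key(r, keys)]
        if size == 0:
            raise ValueError("disclosed record has no match in reference data")
        risks.append(1 / size)
    return risks


# Toy data, invented for illustration only.
disclosed = [
    {"age_group": "40-44", "sex": "F", "race": "A"},
    {"age_group": "40-44", "sex": "F", "race": "A"},
    {"age_group": "65-69", "sex": "M", "race": "B"},
    {"age_group": "30-34", "sex": "M", "race": "C"},
]
reference = disclosed + [
    {"age_group": "65-69", "sex": "M", "race": "B"},
    {"age_group": "30-34", "sex": "M", "race": "C"},
    {"age_group": "30-34", "sex": "M", "race": "C"},
    {"age_group": "30-34", "sex": "M", "race": "C"},
]

if __name__ == "__main__":
    print(k_anonymity_summary(disclosed, threshold=5))
    print(k_map_risks(disclosed, reference))
```

In this toy case, two of the four disclosed records are unique within the disclosed dataset, yet none is unique in the larger reference dataset. This is one way the first approach can overstate risk relative to the second.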

KEYWORDS:

Confidentiality, Data Sharing, RPGEH

PMID: 24085938 [PubMed - in process]
PMCID: PMC3788559
Free PMC Article
