Last Updated: 06/14/21

Data Sharing: Privacy and Confidentiality

Human specimen collections often contain links to patient identities and other personal information. The privacy and confidentiality of personal information associated with human specimens, including electronic medical records and genomic data, raise important ethical and regulatory considerations.

Under the revised Common Rule (45 CFR 46), if an individual’s identity cannot “readily be ascertained or associated” with biospecimens or information that are obtained, used, studied, analyzed, or generated by researchers, then the research does not involve a “human subject” under the regulatory definition and therefore does not require Institutional Review Board (IRB) review or informed consent. In addition, under the federal Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA), researchers can access and share data without authorization so long as 18 specified identifiers (such as name, Social Security number, medical record number, and dates) are removed, or if the data have otherwise been de-identified in accordance with a formal determination by a qualified expert. For more information, see the NIH website on HIPAA and research repositories or, more generally, the NIH website on HIPAA and research.
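
As a rough illustration of the identifier-removal approach described above, the following Python sketch drops a set of identifier fields from a single record. The field names, the sample record, and the strip_identifiers helper are all hypothetical and chosen only for illustration. The sketch covers just the example identifiers mentioned above, not all 18, so it should not be read as a complete or compliant de-identification procedure.

```python
# Hypothetical field names, for illustration only. A real dataset must be
# checked against all 18 identifier categories specified in the Privacy Rule.
IDENTIFIER_FIELDS = frozenset({
    "name",
    "ssn",
    "medical_record_number",
    "date_of_birth",
    "admission_date",
})


def strip_identifiers(record, identifier_fields=IDENTIFIER_FIELDS):
    """Return a copy of `record` without the listed identifier fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in identifier_fields
    }


# Made-up sample record; the values are not real data.
sample_record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "medical_record_number": "MRN-0001",
    "date_of_birth": "1970-01-01",
    "diagnosis": "example diagnosis",
    "specimen_type": "tissue",
}

deidentified = strip_identifiers(sample_record)
print(deidentified)
```

Removing named fields does not by itself address the re-identification risk that arises when the remaining data can be linked to other sources.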

Yet even when individual identifiers are removed from specimens or associated data, the accessibility of linkable data in today’s highly networked culture can be ethically problematic. There is growing concern about the ability to identify individuals from information stored in pooled, group-level databases and from matched samples.

Next-generation sequencing technologies are increasingly employed in cancer research, and large databases have been developed that link genome data with disease risk in genome-wide association studies (GWAS). The accumulation of potentially re-identifiable data from GWAS creates added privacy risks for research participants. In 2012, the NCI hosted a think tank on the identifiability of biospecimens and “omic” data to explore the challenges surrounding this complex and multifaceted topic. The workshop resulted in a publication.

To promote robust sharing of genomic data while providing both transparency and appropriate protections to individuals whose data are collected, stored, and disseminated to researchers, the NIH implemented a Genomic Data Sharing (GDS) Policy, effective January 20, 2015.

The GDS Policy applies to all NIH-funded research that generates large-scale human or nonhuman genomic data as well as the use of these data for subsequent research. NIH expects all funded investigators to adhere to the GDS Policy, and compliance with this Policy will become a special term and condition in the Notice of Award or the Contract Award.

The NIH has also made changes to its policy for issuing Certificates of Confidentiality, effective October 1, 2017. For more information on Certificates of Confidentiality, see the NIH Central Resource for Grants and Funding Information page.

Respect for and protection of the interests of research participants are fundamental to NIH’s stewardship of human genomic data. The informed consent under which data or samples are collected is the basis for determining:

  • the appropriateness of data submission to NIH-designated data repositories, and
  • whether the data should be available through unrestricted or controlled access.

Controlled-access data in NIH-designated data repositories are made available for secondary research only after investigators have obtained approval from an NIH data access committee to use the requested data for a particular project. Data in unrestricted-access repositories are publicly available to anyone.