Data, Data Everywhere...: Improving access to population health and health services research data in Canada
Charlyn Black and Kimberlyn McGrail, CHSPR, UBC
Cathy Fooks, Patricia Baranek and Lisa Maslove, Health Network, CPRN
The Canadian Policy Research Networks
Our mission is to create knowledge and lead public debate on social and economic issues important to the well-being of Canadians. Our goal is to help make Canada a more just, prosperous and caring society.
CPRN’s trademark is its ability to help policymakers and citizens debate the beliefs, values, frameworks, policies, programs, and “ways of doing” that will help the country to cope with social and economic transformation.
CPRN fosters integration in a world that is increasingly fragmented by discipline, jurisdiction, language, and culture. It has unique process skills for shared learning, which shape the way research is performed and the way the results are communicated. It is a neutral space, where diverse groups of people can reflect, collaborate, and struggle with their differences in order to arrive at new understandings and to identify common ground.
CPRN’s leaders are dedicated to generating constructive suggestions, based on strong analysis and a pragmatic understanding of what is possible in an imperfect world.
CPRN is independent. It is a non-profit organization with charitable status. It acquires its funding from diverse sources—federal and provincial governments, foundations, and corporations. This diversity ensures that no single voice dominates the research. The Board of Directors ensures good stewardship of these resources.
CPRN is cost-effective. Projects are ambitious in their scope, but costs and risks are spread across a number of funders. Overheads are minimized and start-up times are limited by attracting expertise from universities, think tanks, and other organizations. Dozens of people volunteer their time to participate in the governance and the research process.
The Centre for Health Services and Policy Research
The Centre for Health Services and Policy Research (CHSPR) is an independent research centre based at the University of British Columbia. CHSPR’s mission is to stimulate scientific enquiry into issues of health in population groups, and ways in which health services can best be organized, funded and delivered. Our researchers carry out a diverse program of applied health services and population health research under this agenda.
CHSPR aims to contribute to the improvement of population health by ensuring our research is relevant to contemporary health policy concerns and by working closely with decision makers to actively translate research findings into policy options. Our researchers are active participants in many policy-making forums and provide advice and assistance to both government and non-government organizations in BC, Canada and abroad.
CHSPR receives core funding from the BC Ministry of Health Services to support research with a direct role in informing policy decision-making and evaluating health care reform, and to enable the ongoing development of the BC Linked Health Database. Our researchers are also funded by competitive external grants from provincial, national, and international funding agencies.
Much of CHSPR’s research is made possible through the BC Linked Health Database, a valuable resource of data relating to the encounters of BC residents with various health care and other systems in the province. These data are used in an anonymized form for applied health services and population health research deemed to be in the public interest.
CHSPR has developed strict policies and procedures to protect the confidentiality and security of these data holdings and fully complies with all legislative acts governing the protection and use of sensitive information. CHSPR has over 30 years of experience in handling data from the BC Ministry of Health and other professional bodies, and acts as the access point for researchers wishing to use these data for research in the public interest.
This project was sponsored by the Canadian Institutes of Health Research (CIHR)—Institutes of Population and Public Health, Health Services and Policy Research, Aboriginal Peoples’ Health, Aging, Cancer Research, Circulatory and Respiratory Health, Gender and Health, Genetics, Infection and Immunity, Musculoskeletal Health and Arthritis, Neurosciences, Mental Health and Addiction, and Nutrition, Metabolism and Diabetes—with the Canadian Population Health Initiative (a part of the Canadian Institute for Health Information), Health Canada’s Centre for Surveillance Coordination and Statistics Canada. CIHR’s Institute of Population and Public Health, located at the University of Toronto, served as the administrative lead organization for this project.
The research team gratefully acknowledges the contributions of many people to the development of this report.
A Steering Committee, ably chaired by Diane Watson, provided guidance throughout the course of this project and pointed us to relevant initiatives and expertise. Thank you to: Erica Di Ruggiero, Daniel Friedman, Elizabeth Gyorfi-Dyke, Mary Catherine Lindberg, David Loukidelis, Vera Ndaba, Gregory Sherman and Michael Wolfson.
Michele O’Rourke gave us excellent secretariat support, organizing and minuting steering committee meetings. Kim Gaudreau ably took these tasks over toward the end of the project.
A series of key informant interviews provided us with in-depth understanding of many current and ongoing initiatives related to data development, access, custodianship and inventories. Interviewees included: Richard Alvarez, Nick Black, John Frank, Chuck Humphrey, Louise Ogilvie, Greg Sherman, Steve Slade, Diane Watson, Greg Webster, Michael Wolfson, Glenda Yeates and Jennifer Zelmer.
Over forty anonymous interview participants provided useful insights into the worlds of data collection, custodianship, access and use.
Lars Apland provided able research assistance with the development of the conceptual framework. Mary-Doug Wright and Catherine Howett identified and retrieved resource material. Peter Schaub produced the figures in Chapter 5 of the report.
Michele Wiens, Karen Hofmann and Carole Herbert provided expert assistance in the testing of our conceptual framework/data collection tool.
Three external reviewers—Carolyn DeCoster, Steven Lewis and Don Willison—offered thoughtful critiques of an early draft of the report that helped us refine and in some cases re-frame our findings and recommendations. We are also grateful to Judith Maxwell, president of CPRN, whose insightful comments helped us think about the governance issues surrounding our recommendations. Heidi Matkovich provided invaluable editing of an early draft and the final version of the report.
Cathy Fooks was the Principal Investigator on this project until June 30, 2004 in her previous role as Director of CPRN's Health Network. As of September 2004, Tom McIntosh stepped into this role and provided helpful input and guidance as we finished the report.
The conclusions and opinions found in the report are solely those of the authors and therefore do not necessarily represent the views of the funding partners (CIHR, CIHI-CPHI, Statistics Canada and Health Canada) or the members of the steering committee. Thus no official endorsement by the funding partners or the members of the steering committee is intended or should be inferred.
In the fall of 2002, the Canadian Institutes of Health Research (CIHR) Institutes of Population and Public Health and Health Services and Policy Research jointly issued a request for proposals (RFP) with the Canadian Population Health Initiative (a part of the Canadian Institute for Health Information), Health Canada’s Centre for Surveillance Coordination and Statistics Canada. The objectives of this RFP were to describe the current status of population-based health and health services databases in Canada and to show the potential for their use in innovative and important health research. For the purposes of this project, population-based health and health services data were defined as administrative databases, registries and survey databanks that are representative of an entire population who reside in a geographic region. The RFP noted that while Canada has some of the best-developed data repositories for studying health and health care, "the challenge now lies in enhancing access to and use of current data infrastructure for the purposes of conducting important health research and to make wise investments to increase data and analytic capacity."
The Canadian Policy Research Networks (CPRN), in partnership with the Centre for Health Services and Policy Research (CHSPR) at the University of British Columbia, was awarded the contract to undertake this project. This report presents the results of interviews with data collectors, custodians and users to identify current or emerging issues around collection, storage and use of data; reviews the current landscape of privacy and access issues in Canada; surveys international and Canadian activities in providing information about and access to data sets; outlines considerations for creating an inventory of population health and health services research databases; describes a prototype data collection tool that could assist in the development of such an inventory; and makes recommendations for moving forward the agenda of improving access to and use of Canadian data in the areas of population health and health services research.
The breadth of work undertaken meant that we were able to identify and offer recommendations for major topics, but we were not able to follow every concern in great detail. The broader context was also evolving as we conducted this study, at a speed that made it impossible for us to include everything that might be relevant or of interest. While we were not able to suggest immediate solutions in all cases, or to provide complete coverage of all topics, we hope this report provides useful direction for actions that can be taken to support researchers in population health and health services research in Canada.
The current data environment in Canada involves a large number of players, all with differing mandates and roles. There are many data custodians in Canada, both national and provincial/territorial, that collect and maintain a wide range of population-based health and health services data.
However, these collectors and custodians, including Statistics Canada, the Canadian Institute for Health Information (CIHI), Health Canada, and provincial/territorial government departments and ministries, typically have no explicit mandate to support the research community. Data are collected for public health and surveillance, or, more commonly, in the course of operating health, education and social systems.
Regardless of the original purpose of collection, secondary analysis of such data has great potential for improving our understanding of the impact of public policy and other interventions on individuals and populations. But secondary analysis also necessitates a complex set of arrangements to govern the retention, disclosure and use of data. These issues become particularly contentious when research is not recognized as a primary mandate for the collecting agency.
As a consequence, Canada has a complex, fragmented set of arrangements by which some researchers can obtain access to data for research and others cannot. Interviews with data users and data collectors/custodians highlighted these and other issues.
Views from data users and data collectors/custodians
Forty-three interview respondents from across the country were asked to identify the issues of greatest concern to them, to consider whether an electronic inventory of Canadian population-based health and health services databases would be a useful tool, and to nominate areas for future investment.
Access quickly emerged as the major concern of data users, and responding to researcher requests for access as the major concern of data collectors/custodians. In organizing the observations and suggestions of interview respondents, we used a well-established framework that describes six dimensions of access, each of which reflects the issues, challenges and opportunities facing data users and data collectors/custodians in the development, maintenance and use of population-based health and health services databases.
1. Availability. Research agencies should facilitate discussions between researchers/users and collectors/custodians to improve access to data. In addition, resources should be allocated to encouraging and creating greater linkages across databases and across jurisdictions.
2. Accessibility. Users suggested that every province should have a Statistics Canada Research Data Centre, and further recommended the creation of more provincial data centres within each province. Some custodians are currently exploring ways in which to extend data access through the liberalization of access policies, or the re-negotiation of licensing agreements.
3. Accommodation. Some users called for a pan-Canadian vision, a uniform, standardized policy for data access and a move away from multiple ad hoc arrangements. Others suggested the creation of a single data repository within each province whose sole purpose would be to ensure access to quality data from all government sectors. Other suggestions included developing a universal format for exported data and better data documentation to make it easier for users to understand and use data. From the collector/custodian point of view, suggestions included liberalizing licensing agreements for centres in receipt of provincial government data. Most collectors/custodians suggested that their funding be increased to specifically account for the provision of data as one of their core businesses.
4. Affordability. Users recommended that public data should be free of charge and (subject to suitable privacy/confidentiality controls) available on the internet as it is in the US. Collectors/custodians recommended multi-year funding, and the recognition of the need to fund costs of data maintenance and cleaning. Technologies such as the electronic health record were seen as innovations that would decrease the work of data technicians in updating and cleaning records.
5. Acceptability. Users recommended an education program to address the public’s concerns about privacy and security and to demonstrate the value of research in improving the overall health of the population and the functioning of the health care system. Some believe that it would assist public understanding if privacy legislation distinguished between bona fide researchers and other data users such as commercial organizations. Others recommended that CIHR facilitate discussions with collectors/custodians with a view to negotiating greater flexibility in access to data while providing assurances about protection of privacy and confidentiality. Overall, both users and collectors/custodians reported a need to communicate better and to build relationships and trust with each other.
6. Adequacy. Users and collectors/custodians recommended that data quality be recognized as a priority and that this priority be reflected in the resources apportioned to it. Standardization of data definitions and collection methods, increased training of personnel, education of data users, the development of better data documentation, and the introduction of technologies such as telephone-assisted surveys and electronic health records would all greatly improve data quality.
In response to a question about the utility of an electronic inventory of population databases to support population health and health services research, the majority of both users and collectors/custodians provided either support or qualified support for development of such a resource. If an inventory were to be developed, many respondents suggested that it should be user-friendly, web-based with a dedicated website, and searchable by key words and standardized variables. Many felt it would be useful to have links to the data custodian’s website, data dictionaries, data documentation, as well as links to articles/reports that used the data set. A number of respondents said an inventory should also provide a web-based portal to enhance actual data access.
Most believe that the custodian of the inventory should be an independent, national body, either a new body with this single mandate created through a federal/provincial/territorial agreement or an existing national body, such as CIHI, the CIHR Institutes of Health Services and Policy Research and Population and Public Health, or Statistics Canada.
Respondents had many suggestions for the creation of new data sets, including:
- Health services data—community care, mental health, public health, drugs
- Population health data—chronic diseases, disease staging data
- Biological and physical measurement data
- Longitudinal data—seniors, children, cohort studies
- Special populations—Aboriginal, homeless
Other suggested areas of investment included:
- negotiating standardized access and privacy policies;
- stable and ongoing funding for database maintenance and purchase;
- standardization of existing data sets and creation of data documentation;
- increased training for researchers to use large data sets and conduct secondary analysis, and for technicians to support and maintain large data sets;
- facilitating inter-regional comparisons and data linkage especially between health services data and determinants of health data;
- facilitating better understanding between collectors/custodians and users/researchers; and
- the development of a national vision and strategy for the collection, maintenance, and sharing of publicly funded Canadian data.
Privacy and access issues
The research environment is increasingly complex both because of rising public concerns about the privacy of individuals’ personal information and because many jurisdictions are creating new legislative and regulatory frameworks for the conduct of research using such personal information. To supplement the views of the interview respondents, we undertook a literature review to identify practical issues faced by researchers and data custodians using population-based health and health services data for research in Canada.
Eight primary issues related to privacy protection that face researchers and data custodians were identified from a review of academic and grey literature. These are discussed in relation to facilitating access to data while protecting individual privacy.
1. Consent. Canada has developed what has been termed a “patient-centred” model of privacy protection. Secondary uses of data in this model require special permission or special conditions before consent can be dispensed with, and some mechanism is usually required to mediate competing interests. But there is considerable debate over the circumstances in which explicit consent is needed for the secondary use of personal information for research purposes, leaving researchers confused about their ethical and legal obligations.
2. Data linkage. Data linkage provides for much more powerful analysis than a single data set but raises a series of concerns beyond issues of consent. There is a significant degree of variation in the access and linkage policies of data custodians among provinces. Moreover, the capacity and resources to undertake linkage work vary considerably across the country.
3. Retention and destruction. There is no consistent approach to how data should be archived, the length of time the data should be retained, or protocols for future access, including use for audit purposes. The Canadian research community, including research funding agencies, is beginning to discuss how to better support data infrastructure and allow for compliance with emerging regulatory frameworks for the protection of privacy in research.
4. Security safeguards. There is unanimous agreement about the need for tight safeguards for the protection of data. Data custodians must be clear about what steps they take to safeguard the information they hold and must be transparent and accountable about their processes.
5. Review and oversight and the role of research ethics boards. There is consensus that some clearly competent and independent group must review research proposals to assess the trade-offs between the risks to individual privacy and the societal benefits of the research, and to ensure all possible steps are taken to maintain confidentiality. Canada is increasingly looking towards research ethics boards to play this role, and these will need national mechanisms to ensure consistency in their work (see below).
6. Multiple rules, policies and procedures. Multiple rules, policies and procedures, which vary across jurisdictions and organizations, govern access to research data in Canada. The lack of standardization in access procedures, and in data quality, extraction and linkage, is immensely frustrating for researchers, especially those who wish to work with multiple data sets (and thus custodians), or those who wish to engage in cross-jurisdictional work.
7. Public communication. Engaging the public in discussions about research, privacy and use of data was seen as very important. A view held by many in the research community is that the general public does not understand the importance of the research being undertaken or its potential societal benefits and therefore needs to be convinced of its utility. At least two projects underway in Canada aim to assess public views on the use of personal information in research, and to develop better public communication tools.
8. Legal and policy frameworks surrounding data access and privacy protection. The existing regulatory framework in Canada exhibits clear policy support for non-consensual use of personal information for research purposes, but there is considerable variation in the practicalities of doing so. The federal Personal Information Protection and Electronic Documents Act (PIPEDA) and various new provincial privacy laws have added to the complexity of the legislative environment in Canada, and introduced ambiguities about the steps researchers must take to comply with privacy legislation, especially for cross-provincial work. CIHR’s privacy best practices guidelines are one step toward harmonization or consistency, but it remains to be seen how these guidelines will work with or influence privacy laws across Canada, and whether they will help bring about changes to the existing Tri-Council Policy Statement on Ethical Conduct for Research Involving Humans that are much needed from a population health and health services research perspective.
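Item 2 above identifies data linkage as both analytically powerful and a source of privacy concern. As an illustration only, the following Python sketch shows one widely used pattern: each custodian replaces the personal health number with a keyed hash before release, so records can be matched exactly without researchers ever seeing the identifier. The report does not prescribe this or any other linkage method; the key, the field names (`phn`, `pid`) and the records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, held by a trusted linkage unit and never by researchers.
LINKAGE_KEY = b"example-secret-held-by-linkage-unit"


def pseudonym(health_number: str) -> str:
    """Replace a personal health number with a keyed hash (HMAC-SHA256)."""
    normalized = health_number.replace(" ", "").strip()
    return hmac.new(LINKAGE_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(records, id_field="phn"):
    """Return copies of the records with the identifier replaced by a pseudonym."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the custodian's original data are untouched
        rec["pid"] = pseudonym(rec.pop(id_field))
        out.append(rec)
    return out


def link(left, right, key="pid"):
    """Deterministic (exact-match) inner join of two pseudonymized data sets."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    linked = []
    for rec in left:
        for match in index.get(rec[key], []):
            linked.append({**rec, **match})
    return linked


# Hypothetical example: hospital separations linked to drug dispensations.
hospital = pseudonymize([{"phn": "9876 543 210", "diagnosis": "I21"},
                         {"phn": "1234 567 890", "diagnosis": "J44"}])
drugs = pseudonymize([{"phn": "9876543210", "drug": "ASA"}])

linked = link(hospital, drugs)
print(len(linked))        # 1
print(linked[0]["drug"])  # ASA
```

Exact matching on a pseudonym is the simplest design. Where identifiers are missing or recorded inconsistently across sources, probabilistic linkage on several fields is typically needed instead, which raises correspondingly harder privacy questions.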
A number of potential options for improving access to data for health research while strengthening privacy safeguards were identified in this review. Some of these are already underway in Canada.
1. Develop privacy tool kits. A tool kit that research organizations and data custodians could use to protect privacy while allowing access to data could help to standardize practices. Appropriate tool kits would include techniques for masked (i.e. anonymized) data sharing, techniques for secure transfer of data, consent forms, and procedures to reduce re-identification.
2. Develop best practice privacy guidelines or standards. Because it is the interpretation of legislation that shapes approaches to the use of and access to data, the development of best practices guidelines or voluntary standards for protecting privacy and confidentiality can support harmonization. A pan-Canadian Health Information Privacy and Confidentiality Framework has been developed by federal, provincial and territorial government officials with a view to creating a harmonized series of legislative provisions to protect personal health information.
3. Develop models of data stewardship. The roles of data stewards are worth clarifying and developing further. Separating those holding the data from those undertaking the research by relying on independent assessments of privacy risks and confidentiality protocols removes any actual or perceived conflict of interest. Success would depend on a credible process, quick turnaround, transparent decision-making and assurance of some form of meaningfully independent oversight and accountability.
4. Strengthen and improve the practices of research ethics boards. Further work is currently underway to identify innovative best practices and delineate variation in policies and practices of research ethics boards in governing privacy, confidentiality and security issues in health research.
5. Public communication about research and privacy trade-offs. Public discussion about balancing research and privacy issues could be facilitated through targeted information about data collection and use, templates for effective communication, and relationship-building with reporters interested in health issues.
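The privacy tool kits described in option 1 above would include procedures to reduce re-identification. One such procedure, sketched below in Python purely for illustration, is k-anonymity: indirect identifiers such as age and postal code are coarsened, and any record whose combination of those values is shared by fewer than k records is suppressed before release. The choice of quasi-identifiers, the 10-year age bands, the three-character postal prefix and the threshold k = 5 are all assumptions for this example, not recommendations from the report.

```python
from collections import Counter

# Assumed indirect identifiers (quasi-identifiers) after generalization.
QUASI_IDENTIFIERS = ("age_band", "fsa", "sex")


def generalize(record):
    """Coarsen quasi-identifiers: 10-year age band and 3-character postal prefix."""
    low = (record["age"] // 10) * 10
    return {
        "age_band": f"{low}-{low + 9}",
        "fsa": record["postal_code"][:3].upper(),
        "sex": record["sex"],
        "diagnosis": record["diagnosis"],  # analytic variable, kept as-is
    }


def k_anonymize(records, k=5):
    """Generalize records, then suppress any whose quasi-identifier combination
    is shared by fewer than k records. Returns (released, number_suppressed)."""
    masked = [generalize(r) for r in records]
    counts = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in masked)
    released = [r for r in masked
                if counts[tuple(r[q] for q in QUASI_IDENTIFIERS)] >= k]
    return released, len(masked) - len(released)


# Hypothetical sample: five similar records and one unusual (easily identified) one.
sample = [{"age": 41 + i, "postal_code": "v6t 1z3", "sex": "F", "diagnosis": "E11"}
          for i in range(5)]
sample.append({"age": 83, "postal_code": "V0N 2K0", "sex": "M", "diagnosis": "I50"})

released, suppressed = k_anonymize(sample, k=5)
print(len(released), suppressed)  # 5 1
```

Generalization and suppression trade analytic detail for protection; a real tool kit would combine such checks with secure transfer, access agreements and review of the specific research use.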
Existing inventory and data access activities
An in-depth analysis of relevant initiatives to enhance documentation of and access to data resources revealed significant efforts, international and Canadian, to build inventories of data. In the US and the UK, this work involves the development of data archives, inventories of databases, and web-based portals aimed at researchers, as well as a number of complementary activities. There is a broad variety of approaches, some emerging best practices, and a complex, evolving scientific agenda related specifically to the documentation of data and research resources. Most efforts so far concentrate on the documentation of survey data, often in combination with providing actual access to those data. No best practice models were specifically targeted to the areas of population health and health services research, which cover a vast set of content issues and rely on a similarly wide and increasingly complex variety of data sources.
In Canada, most inventory-building activity is organization-specific and aimed at documenting organizational data holdings only. There is little standardization in approaches and it is not clear how useful these inventories really are to the research community. There is also little coordination in improving data documentation and access. There is no single Canadian portal for identifying data sources and no standard format for compiling information; existing sources are in varying states of maintenance, and coverage by agency is spotty, so the information available is narrow, topic-specific and of variable quality. Finally, there is little sustained effort to provide such access in the area of administrative data, an important resource for population health and health services researchers. In short, there is no coordinated and focused development that could provide a strong foundation for Canada’s research community.
Building an inventory of databases
We were initially given the task of developing an electronic framework for the creation of an inventory of databases relevant to population health and health services research in Canada. We were able to develop a framework that might serve as the content infrastructure for such an endeavour, but our review of the international and Canadian inventory-building activities suggests that actually building a prototype “inventory” would be quite premature.
Three major areas of consideration will need to be addressed by organizations wishing to develop such an inventory. The first is the model: what is the nature of the inventory, how often will it be updated, and so on. The second is stewardship and management: who will assume responsibility for building, populating and maintaining such an inventory? The third is funding: from what source(s) will the considerable funding needed for both start-up and ongoing operating costs be derived? In addition to these issues, potential funders of an inventory must also consider how this effort fits with other work that is currently underway. International efforts in particular show the benefit of building more than a basic inventory of data sets. Additional efforts are required to preserve investments in research data and ultimately enhance our understanding of health and the factors that determine it.
We did, however, develop and pilot test a first version of a data collection tool that can provide the basis for creating a population health and health services research inventory of databases. A number of existing resources were reviewed to identify the best ways to collect information about databases, including general descriptive information, attributes of the data (such as the unit of observation) and the availability of the data for research. From these, we developed a conceptual model to support consistent recording of information about the content of data sets in a manner that provides relevant information about the population and public health and health services research landscape. The utility of the tool was tested, and a prototype ‘database’ was developed from a diverse sample of candidate data sets. We also created a decision tool to be used for deciding whether a particular database is relevant for inclusion in the inventory.
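The conceptual model and decision tool described above could be encoded in many ways. The following Python sketch is a hypothetical illustration, not the tool the project actually built: it records the kinds of information the paragraph mentions (descriptive information, the unit of observation, availability for research) and applies an inclusion rule based on this project's definition of population-based data, that is, administrative databases, registries and survey databanks representative of the entire population residing in a geographic region. All class, field and example names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    ADMINISTRATIVE = "administrative database"
    REGISTRY = "registry"
    SURVEY = "survey databank"
    OTHER = "other"


@dataclass
class DatabaseEntry:
    """One hypothetical inventory record: descriptive information plus data attributes."""
    title: str
    custodian: str
    source_type: SourceType
    unit_of_observation: str      # e.g. "person", "hospital stay"
    geographic_coverage: str      # e.g. "one province"
    population_based: bool        # represents everyone residing in the region?
    available_for_research: bool  # recorded for users; not an inclusion criterion here
    keywords: list[str] = field(default_factory=list)


ELIGIBLE_TYPES = {SourceType.ADMINISTRATIVE, SourceType.REGISTRY, SourceType.SURVEY}


def include_in_inventory(entry: DatabaseEntry) -> bool:
    """Hypothetical decision rule: include administrative databases, registries
    and survey databanks that cover the whole population of a region."""
    return entry.source_type in ELIGIBLE_TYPES and entry.population_based


# Hypothetical candidate data sets.
claims = DatabaseEntry("Physician billing claims", "Provincial ministry of health",
                       SourceType.ADMINISTRATIVE, "physician service", "one province",
                       True, False, ["physician services", "utilization"])
clinic = DatabaseEntry("Single-clinic patient chart audit", "Hospital clinic",
                       SourceType.OTHER, "patient", "one clinic", False, True)

print(include_in_inventory(claims), include_in_inventory(clinic))  # True False
```

Keeping the inclusion rule separate from the record structure means the criteria can be revised, for example to add topic relevance, without changing how entries are recorded.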
Next steps and recommendations
Canada has an international reputation based on the development and implementation of a population health framework—an understanding and recognition of the many factors that influence the health status of individuals and populations. Canada is also known for the collection and research use of administrative data related to health care services. In part, this reputation is based on the availability of universal and comprehensive data about the use of health care services, data that exist because of the funding and administrative structures of provincial, territorial and federal health care services. This reputation also comes from recognition of Canadian researchers as innovators in understanding the power that such data hold, and in converting that understanding into research findings that have provided a wealth of evidence for the policy development process.
Our work suggests, however, that Canada is not currently recognized as a leader when it comes to the systematic organization, archiving, documentation of, and access to data relevant to population health and health services research. Our ten recommendations highlight opportunities to change the situation.
CIHR should take a lead in coordinating a series of activities to address privacy issues that are specific to the population health and health services research community. This work includes:
a) Clarifying the definition of research that has “public value”;
b) Developing a constellation of privacy tools and techniques (including best practice guidelines) to assist researchers and data custodians in protecting privacy while allowing access to data;
c) Strengthening the role of research ethics boards, increasing and harmonizing expertise;
d) Influencing the development and interpretation of regulatory and legislative frameworks to ensure they support privacy-sensitive research, and where possible, that they are harmonized across jurisdictions; and
e) Engaging with the public about the value of health and health services research and how it should be conducted in a privacy-sensitive manner.
CIHR should convene and lead a “coordinating body” that will focus on improving access to population health and health services research data and that will be charged with reviewing and carrying forward the recommendations in this report.
CIHR, as the lead organization for health research in the country, and in cooperation with other funders of health research, should strongly encourage key national and provincial data custodians to review their mandates, with the goal of clarifying and increasing their commitment to providing data and other supports for population health and health services research.
Data custodians of population health and health services data, including the Canadian Institute for Health Information and provincial data custodians, should be encouraged to work with privacy experts and the research community to create and make available public use microdata sets as well as to provide access to more detailed microdata sets for publicly funded research.
Provincial and regional custodians of population health and health services data should develop clear processes and equitable costing mechanisms for making data available to researchers.
CIHR should support the costs of conducting data-based research in population health and health services research by:
1) Under certain circumstances, allowing operational research budgets to include the costs of archiving and documenting large-scale data collection efforts, where there is intent to make those data more broadly available to the research community;
2) Developing funding streams that parallel the “equipment grant” program used by the basic and clinical health research domains.
CIHR should actively pursue opportunities to work with current initiatives with the potential to improve access to research data that supports development of population health and health services research:
a) In the ongoing National Consultation on Access to Scientific Research Data, to ensure that the special circumstances around access to population health and health services data (i.e. privacy considerations around personal health information and dependence on non-research data collectors and custodians) are addressed;
b) In influencing Canada Health Infoway, to ensure that it explicitly considers and builds in mechanisms to support researcher access to data as it invests in prototypical development of information infrastructure.
CIHR should work with partners to develop a web-based “population and public health and health services research” portal that could house an electronic inventory as well as related tools to support the research community to use existing data resources efficiently and in a privacy-sensitive manner.
The partners should review the findings from the interviews and the survey of existing activities to reassess their commitment to building, maintaining and refining an inventory.
If the partners wish to proceed with development of an inventory, they should develop an appropriate vision and business plan. This vision/business plan should:
a) define the objectives of the resource;
b) identify the primary customers to be served;
c) identify a model that can build on Canadian activities already underway to document agency- and topic-specific data holdings;
d) identify a steward or host agency that can competently develop and manage the resource;
e) identify ongoing funding to support development over a period of at least five years; and
f) identify an evaluative process to ensure the resource developed meets the needs of all relevant stakeholders.
There is a great deal that could be done to support the community of population and public health and health services researchers in Canada. Building an inventory of population-based databases, as envisioned by the funders of the RFP for this project, is one option. But there are many issues to consider prior to starting down that particular path.
There is a clear role for a body, working group, or some other organization to take the recommendations in this report and coordinate or monitor activities relevant to them. Less clear is how such a group might be formed and maintained. Our hope is that CIHR will recognize the critical importance of this work in supporting its researchers and will take on this daunting but important challenge.