Categories: Research using linked data, Datasets, ADR UK Partnership
11 May 2023
This statement sets out ADR UK’s current position on the production, access, governance and use of synthetic data. These issues are evolving, so ADR UK’s position will be updated in line with new findings and developments.
Synthetic data, also known as artificial, dummy, simulated or fake data, has emerged as a key area of development in supporting research that uses administrative data. ADR UK has identified the need for such datasets for:
- training purposes
- exploratory analysis to determine if the data is going to be helpful for a particular research project
- instances where researchers need to progress with developing their code, understanding the structure of the data, and testing different statistical methods before they can get access to the real data.
Where health data is held securely, there is a similar demand for this resource.
In 2021, ADR UK commissioned the Behavioural Insights Team to engage government departments in a discussion about synthetic data, their level of comfort with producing it, and the barriers to doing so. The resulting report documents various concerns related to:
- data quality and how well synthetic data reproduces the relationships and characteristics of the real data
- tensions between protecting privacy and retaining a level of utility of synthetic data
- systemic or technical barriers such as a lack of knowledge and understanding, ethical and legal barriers, and inconsistent availability of technological support for users.
The report recommended that ADR UK should:
- encourage the use and sharing of low-fidelity synthetic data across government and with researchers
- expand the use of synthetic data for training and improve the efficiency of live projects
- develop a cross-government repository of synthetic data, accessible to government analysts and accredited researchers without a specific project proposal.
Approaches to the creation and use of synthetic data vary across the ADR UK partnership, as do governance and access arrangements and the level of fidelity of different datasets. We are also aware that different levels of fidelity of any one synthetic dataset might require different levels of access and/or governance. In the document below, we present summaries from each national partnership, reflecting their approach through use cases and supporting the value of debate and difference.