Funding awarded to two projects to explore the use of synthetic data
8 April 2024
Two new projects will collect insights on the perspectives of data owners and providers, and the public, about synthetic data. The projects will focus on low-fidelity synthetic data to explore its potential benefits, costs and utility to administrative data research.
This initiative is jointly funded by the Economic and Social Research Council (ESRC) Data & Infrastructure Programme and ADR UK.
Synthetic data, also known by other names such as artificial, dummy, simulated, mock or fake data, is an emerging area of development for supporting research using securely held administrative data. ADR UK has identified that synthetic datasets could be used:
- for training purposes
- to explore whether a dataset could be helpful for a specific research project
- in instances where a researcher needs to progress with developing their code, understanding the structure of the data, and testing different statistical methods before they can get access to the real data.
The two newly funded projects are an important part of a wider piece of work which aims to make recommendations for developing shared terminology and agreed governance structures for synthetic data, taking the public’s perspective into account.
Data owners’ and data providers’ perspectives on synthetic data
Cristina Magder, Data Collections Development Manager at the UK Data Archive, University of Essex, will lead a project to explore synthetic data from the perspectives of data owners and data providers. Data owners will include government departments and academic research centres. By data providers, we mean the trusted research environments, such as the ONS Secure Research Service and the SAIL Databank, that provide secure access to sensitive datasets for accredited research.
Cristina and her team will use surveys, case studies and a focus group to explore:
- governance and models for sharing synthetic data for research
- costs of creation and ongoing maintenance of synthetic datasets
- the efficiencies that might be gained, such as improved and more focused data access requests.
The team will synthesise findings into a report with recommendations.
Cristina said: “By fostering a deeper understanding and establishing an open and proactive dialogue with data owners and data providers, this project strives to bridge the existing knowledge gap and push the domain of synthetic data into a new era of informed and efficient usage. This will pave the way for an efficient, cost-effective, and publicly acceptable framework of synthetic data production and dissemination.”
Public attitudes to the use of synthetic data
Dr Fiona Lugg-Widger, Deputy Director for Data, and Dr Rob Trubey, Routine Data Study Manager, both based at the Centre for Trials Research, Cardiff University, will lead a public consultation to explore the public’s knowledge and understanding of synthetic data, and their attitudes to its use for research purposes.
Fiona explained: “We want to hear from a diverse mix of people. To do this, we will reach out to community organisations across the UK and run interactive, facilitated workshops and group discussions to elicit their perspectives. Workshops will run in June and July this year.”
The findings of both studies will inform a cross-cutting programme of work which aims to develop a clear roadmap for operationalising scaled production and sharing of synthetic data in a way that is efficient and acceptable to the public, data owners and researchers.
These projects are the result of a funding opportunity which opened in the summer of 2023. We were unable to award funding for the third grant, which was to explore the experiences of researchers using synthetic data.
The projects will run from Spring 2024 for one year.
Project details
A cost-benefit analysis of low-fidelity synthetic data for data owners and providers
Project lead: Cristina Magder, Data Collections Development Manager at the UK Data Service, UK Data Archive, University of Essex
Duration: April 2024 – March 2025
Total funding: £112,986 (Full economic cost)
Public understanding of and attitudes towards low-fidelity synthetic data
Project leads: Dr Fiona Lugg-Widger, Deputy Director for Data, and Dr Rob Trubey, Routine Data Study Manager, Centre for Trials Research, Cardiff University
Duration: March 2024 – January 2025
Total funding: £151,117 (Full economic cost)