Funding opportunity: Operationalising the scaled production and sharing of synthetic data
27 February 2023
ADR UK and the Economic and Social Research Council (ESRC) are inviting applications from individuals or teams to explore the use and potential of synthetic data. The grant holder will evaluate the use of low-fidelity synthetic versions of datasets held securely within UK trusted research environments.
The full economic cost of the grant can be up to £375,000. UK Research and Innovation (UKRI)’s Digital Research Infrastructure fund, the ESRC, and ADR UK will fund 80% of the full economic cost, which is up to £300,000 for a grant at the maximum.
We expect proposed projects to last 18 to 24 months.
The grant holder(s) will produce a report addressing the objectives for the call, including recommendations for the scaled production and sustainable sharing of low-fidelity synthetic versions of administrative and social survey data.
Applications will close on Tuesday 9 May 2023 at 16:00.
On 3 March 2023, we held a webinar for prospective applicants to discuss the opportunity and answer any questions. If you would like to watch the recording, please email Emily Oliver, Head of Research & Capacity Building at ADR UK.
About the opportunity
Synthetic data is a version of a dataset that resembles the real data, but does not include any information about real individuals. It can be a useful tool in helping researchers to understand the structure and potential of a dataset and generate code to analyse it. Yet synthetic data is not currently produced at scale to support research in the public interest.
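To make the idea concrete, the sketch below shows one simple way a low-fidelity synthetic dataset could be produced. It assumes the data is generated from column metadata alone (names, types, ranges and category labels) and never from real records, and that each column is drawn independently, so the table has a realistic structure but no real relationships between variables. The schema, column names, ranges and the `synthesise` function are all hypothetical and are not taken from any dataset or tool named in this call.

```python
import random

# Hypothetical column metadata: names, types and ranges are illustrative only
# and do not describe any real dataset.
SCHEMA = {
    "age": {"type": "int", "min": 16, "max": 90},
    "region": {"type": "category", "values": ["North", "South", "East", "West"]},
    "weekly_pay": {"type": "float", "min": 0.0, "max": 2000.0},
}


def synthesise(schema, n_rows, seed=0):
    """Draw each column independently from its declared range or categories.

    No real records are read, so the rows describe no real individual.
    Relationships between columns are deliberately not reproduced.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for _ in range(n_rows):
        row = {}
        for name, spec in schema.items():
            if spec["type"] == "int":
                row[name] = rng.randint(spec["min"], spec["max"])
            elif spec["type"] == "float":
                row[name] = round(rng.uniform(spec["min"], spec["max"]), 2)
            elif spec["type"] == "category":
                row[name] = rng.choice(spec["values"])
            else:
                raise ValueError(f"unsupported column type: {spec['type']}")
        rows.append(row)
    return rows


# Print a handful of synthetic rows to inspect the structure.
for row in synthesise(SCHEMA, n_rows=5):
    print(row)
```

A researcher could use rows like these to check that analysis code runs against the right column names and types before gaining access to the real data. Because the columns are independent, any statistical results from such data would be meaningless.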
The successful grant holder will need to:
1. Identify a collection of low-fidelity synthetic versions of datasets that are available for researchers to access through the UK Data Service, the Office for National Statistics (ONS) Secure Research Service and other trusted research environments. Datasets identified for the evaluation should include but are not limited to synthetic versions of:
- Annual Survey of Hours and Earnings (ASHE)
- Grading and Admissions Data for England (GRADE)
- MoJ Data First datasets
- Hospital Episode Statistics (HES)
- the National Pupil Database (NPD) and Longitudinal Education Outcomes (LEO), when these become available.
Proposals can also include the creation of new synthetic data, in cases where applicants can justify a need for the purposes of evaluating systems-wide operationalisation.
2. Evaluate the broad set of costs that data owners and trusted research environments incur in creating synthetic data, including both initial costs and ongoing costs such as updates.
3. Evaluate different models for sharing synthetic data, including implications for data owners or data providers in resourcing sharing.
4. Evaluate any improvements in efficiency for data owners and trusted research environments when synthetic data is available.
5. Evaluate how the use of low-fidelity synthetic data affects researchers’ experience of carrying out research using secure administrative or social survey data, for example:
- How far does synthetic data help users to understand the data, as well as scope research questions, in advance of applying for access to the real data?
- What is the impact of synthetic data access on the quality of applications to access real data?
- How useful is synthetic data for developing and testing code outside of the secure environment, either while waiting for access to the real data, or after access has been granted?
6. Make recommendations for scaled production and sharing of low-fidelity synthetic data which are acceptable to data owners, including identifying opportunities for automation to increase efficiency. Although the focus of the project should be on low-fidelity synthetic data, the grant holder should consider how these recommendations could relate to the operationalisation of high-fidelity synthetic data.
Public involvement and engagement
ADR UK is committed to meaningful public involvement and engagement, and will be co-leading a public consultation on public attitudes to synthetic data in parallel to this funding opportunity. The successful candidate or team will be expected to proactively engage with and contribute to the activities of this programme where appropriate.
The outcomes of the public consultation are expected to inform any recommendations for the scaled production and sharing of synthetic data in the final publishable report.
Funding opportunity deliverables
The expected deliverables of the funded proposal include:
- At least one blog post published on the ADR UK website
- Academic paper(s)
- A final report setting out responses to each of the objectives for the call and published on the ADR UK website. This should include overall recommendations for the scaled production and sustainable dissemination of low-fidelity synthetic versions of secure administrative and social survey data.
Read more about why we are launching this funding opportunity.
Eligibility and funding
Proposals are welcome from individual researchers or small teams from eligible research organisations, as specified in the ESRC research funding guide.
We will be looking for demonstrable experience of qualitative or mixed-methods evaluations, together with a willingness to engage with data owners, academic and non-academic researchers, and trusted research environments across the ADR UK partnership and beyond.
|Applicant webinar|Friday 3 March 2023, 13:00 - 14:30|
|Application deadline|Tuesday 9 May 2023, 16:00|
|Panel meeting|Late June 2023|
|Latest start date|Friday 15 September 2023|
How to apply
For more details, including the assessment criteria and how to apply, visit the UKRI Funding Finder.
Read more about synthetic data and the potential impact of this grant.