Longitudinal Education Outcomes synthetic data: What is it and why is it useful?


14 April 2026 Written by Professor Claire Crawford, UCL

You’re a new PhD student thinking about what to research for the next three years. You’ve heard about the Longitudinal Education Outcomes (LEO) data, which links together individuals’ records from schools, further and higher education institutions, and the tax and benefit system. But it’s daunting: dozens of data tables and hundreds, maybe even thousands, of variables, with different information collected at different points in time and variable names changing from year to year.

There’s a spreadsheet with lots of detailed information, but it’s tough to visualise what the data looks like in practice. Should you take the time to apply for access to the data in the Office for National Statistics Secure Research Service (SRS)? That would require you to specify a particular research project, and it could be a while before you gain access within the trusted research environment.

What is the LEO synthetic data?

Enter LEO synthetic data. This is part of an ADR UK ‘research ready data’ grant-funded project I’m leading, working in partnership with the LEO programme team at the Department for Education. Together, we have created a small synthetic (or ‘dummy’) version of the data.

It preserves some of the properties of individual variables in the data, such as the average or median values, and the proportion of individuals with missing data. But, crucially, it does not include any personal data – that is, any information related to an identifiable individual.

Why have we created LEO synthetic data?

The purpose of the synthetic data is twofold:

  • To allow researchers new to LEO to better understand what the real data might look like, and how useful it might be for addressing their research questions:
    • The synthetic version mirrors much of the structure in which the real data is shared: it contains all potential data tables, and most of the variables appear under the same variable names. It also provides some limited scope to check likely sample sizes, which might affect the feasibility of addressing certain research questions, for example by checking the proportion of individuals with linked tax records.
    • This should enable researchers to better understand whether LEO is right for their prospective project, reducing the number of applications to access the real data for projects which, on closer inspection, turn out to be infeasible. It also cuts down on learning time inside the SRS, potentially reducing demand on an already busy system.
  • To facilitate some code preparation outside the SRS:
    • This will not be perfect: the real data inside the SRS is only available via SQL server, while the synthetic data we have created is shared in other file formats to make it easier to access and process outside a secure environment.
    • However, even with these limitations, researchers will still be able to get a head start on code preparation prior to being granted access to the real data. This will again hopefully reduce demand on the SRS and also enable researchers to make faster progress in creating research for the public benefit.
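
As a rough illustration of both purposes, the sketch below loads a small made-up table into an in-memory SQLite database and uses a SQL query to check the share of individuals with a linked tax record. Everything specific here is an assumption for illustration only: the table name `synthetic_people`, the columns `person_id` and `has_tax_record`, the CSV format and the sample rows are all invented, not taken from the LEO synthetic data or its documentation. SQLite is also only a convenient local stand-in, so a query written this way may need adjusting for the SQL dialect used inside the SRS.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for one synthetic table. The column names
# (person_id, has_tax_record) and the rows are invented for illustration;
# they are NOT taken from the LEO synthetic data.
SAMPLE_CSV = """person_id,has_tax_record
1,1
2,0
3,1
4,
5,1
"""


def load_csv_into_sqlite(conn, table_name, csv_text):
    """Load CSV text into a SQLite table, storing empty strings as NULL."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    rows = [[value if value != "" else None for value in row] for row in reader]
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)


def linked_share(conn, table_name, flag_column):
    """Return (total rows, share flagged as linked, share with a missing flag)."""
    query = f'''
        SELECT COUNT(*),
               AVG(CASE WHEN "{flag_column}" = '1' THEN 1.0 ELSE 0.0 END),
               AVG(CASE WHEN "{flag_column}" IS NULL THEN 1.0 ELSE 0.0 END)
        FROM "{table_name}"
    '''
    return conn.execute(query).fetchone()


conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "synthetic_people", SAMPLE_CSV)
n_total, share_linked, share_missing = linked_share(
    conn, "synthetic_people", "has_tax_record"
)
print(f"{n_total} rows, {share_linked:.0%} linked, {share_missing:.0%} missing flag")
```

Because the synthetic data preserves only some properties of individual variables, a figure obtained this way is best read as a rough feasibility check rather than a research finding.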

Getting access to the LEO synthetic data

This initial version of the LEO synthetic data is now available to download from the UK Data Service (UKDS). It’s been classified as ‘safeguarded data’, which means users will need to sign UKDS’s standard end user licence agreement and acknowledge that they understand it is synthetic data, and therefore what it can and cannot be used for. They can then download the data and use it to their heart’s content (within its permitted uses!).

Send us your feedback

The synthetic data we have created is not perfect; we know there are ways in which it could be made more useful to researchers, and we welcome your feedback.

So, if you use it, tell us what you think! How useful was it for you? What improvements, small or large, would have made it even more useful? Email the LEO programme team (LEO.programme@education.gov.uk) to share your thoughts.

This is the first step on a journey, but we hope the synthetic data will help lower the ‘skill bar’ associated with understanding and using LEO data.

