Getting started with LEO: studying the post-16 activities of young people

23 October 2025

In this blog, Dave explores how researchers can use the Longitudinal Education Outcomes (LEO) dataset to study the post-16 activities of young people - from education and training to employment - and shares practical insights from working with this powerful but complex data resource.

Over the last five years, researchers studying education in England have increasingly drawn on a relatively new data resource, Longitudinal Education Outcomes (LEO). LEO links together multiple administrative datasets – from school histories in the National Pupil Database, to employment and earnings data from HM Revenue & Customs, and benefits data from the Department for Work and Pensions.

Perhaps its most widely known use has been to calculate the returns to higher education. But the potential goes far beyond that. Among many other things, LEO can be used to explore post-16 pathways, transitions into (and out of) education, training and work, and long-term outcomes for different groups of young people.

The challenges of working with LEO

The strength of LEO is its scale. With (almost) complete coverage of the state-educated population in England since 2002, it allows researchers to study small groups in a way that would be impossible with survey data. But that same scale brings challenges:

  • Size and complexity: the datasets are vast, requiring significant processing power and storage.
  • Fragmentation: post-16 education data is spread across several sources - the School Census, the Individualised Learner Record (ILR) and the HESA student record. Employment and benefits data are held separately, as is the NCCIS dataset of post-16 activities recorded by local authorities.
  • Sparse documentation: guidance is limited, which makes it harder for new researchers to get started.

In short, even once you’ve navigated the application process, the bar to using LEO is high.

A resource to help: the Youth Transitions Hub

To lower that barrier, we’ve put together a GitHub code repository and wiki. This contains the code we routinely use to take raw LEO data and transform it into a more usable form.

What it includes:

  • Data extraction: pulling out the bits you actually need and leaving the rest
  • Data cleaning: fixing inconsistencies and standardising formats
  • Reshaping: structuring the data so it can be queried more easily
  • Indicators:
    • yearly and monthly summaries of each person’s post-16 activities
    • “pathways” showing the routes learners take through school and further education.
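
To give a flavour of what a monthly summary involves, here is a minimal sketch in Python rather than the repository's T-SQL. It is not the repository's code: the spell records, activity labels and function names are all made up for illustration, and it assumes that each source can be reduced to activity spells with a start and an end date.

```python
from datetime import date

# Hypothetical spell records: (person_id, activity, start, end), end inclusive.
# Real LEO tables have different names and many more fields.
spells = [
    ("A", "school", date(2018, 9, 1), date(2019, 7, 31)),
    ("A", "further_education", date(2019, 9, 1), date(2019, 12, 31)),
    ("A", "employment", date(2019, 11, 15), date(2020, 3, 31)),
]


def months_between(start, end):
    """Yield (year, month) for every calendar month touched by [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_activities(spells):
    """Map (person_id, year, month) to the sorted activities recorded that month."""
    summary = {}
    for person_id, activity, start, end in spells:
        for year, month in months_between(start, end):
            summary.setdefault((person_id, year, month), set()).add(activity)
    return {key: sorted(acts) for key, acts in summary.items()}


monthly = monthly_activities(spells)
```

In this made-up example, November 2019 shows both further education and employment for person A, while August 2019 has no recorded activity at all. How to treat overlaps and gaps like these is a research decision that the sketch deliberately leaves open.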

Because LEO is provided in a SQL Server database, the code is written in T-SQL. We know not everyone will be familiar with this, so we’ve added detailed comments explaining what each script does.

If you use the repository and run into difficulties, we’d love to hear from you - just head to the discussion page and get in touch.

For more insights from the Youth Transitions Community Catalyst project, join our LinkedIn group.
