New derived variables available as part of Longitudinal Education Outcomes - Iteration 2.1

Categories: Data linkage programmes, Research using linked data, ADR England, Office for National Statistics, Children, young people & education

15 May 2026 Written by Prof Claire Crawford, Professor of Economics, UCL CEPEO

This blog by Professor Claire Crawford outlines recent updates to the Longitudinal Education Outcomes (LEO) dataset, including the introduction of new derived variables designed to make the data easier to use while reducing the need to access more detailed personal data. Claire leads an ADR England-funded project, working in partnership with the Department for Education (DfE), to support the development of LEO.

This week, the Office for National Statistics (ONS) released an updated version of the Longitudinal Education Outcomes (LEO) dataset for England – Iteration 2.1. LEO brings together education, tax and benefit records, enabling researchers to track individuals’ progress through the education system and into the labour market.

Iteration 2.1 is a partial update. Its main addition is three extra years of labour market data for individuals already included in LEO Iteration 2, extending the latest tax and benefit data to 2023–24. This means we can now observe the labour market outcomes of the oldest individuals in LEO up to the age of 37.

Alongside this, we at the Centre for Education Policy and Equalising Opportunities (CEPEO), funded by ADR UK, have worked with DfE – the data owners of LEO – to make two further improvements. We’ve simplified the identifiers used to match individuals across different data tables, making LEO easier to use.

More importantly, we’ve created a set of derived variables that users can request. Because researchers no longer need to construct these measures themselves, less of the underlying personal data needs to be shared with them. We also hope these will serve as a ‘default’ set of variables that researchers can use consistently across projects, rather than each creating their own, slightly different, measures.

In this initial release, we’ve focused on information relating to individuals’ school careers – specifically, their histories of eligibility for free school meals (FSM) and of identification as having special educational needs (SEN). FSM eligibility is an indicator of low family income and the main individual-level measure of disadvantage in LEO.

This information is drawn from the school census, which has covered all state school pupils in England since 2001–02. Records are collected as often as every term for pupils attending a state-funded school in England between the ages of 4 and 16. For some cohorts in LEO, this amounts to up to 36 terms (12 years) of data per pupil, all of which users would otherwise have needed to request in order to construct these variables themselves.

Because researchers tailor data to their specific research questions, we have aimed to make these derived variables as comprehensive and flexible as possible. For each individual in LEO, we’ve created variables that allow users to identify:

FSM eligibility:
- The number of terms/years in which individuals were recorded as FSM eligible
- Whether individuals were ever recorded as FSM eligible (aged 4 to 16)

SEN identification:
- The number of terms/years in which individuals were identified as having less severe needs (SEN without a statement)
- The number of terms/years in which individuals were identified as having more severe needs (SEN with a statement)
- Whether individuals were ever identified as having less severe SEN (aged 4 to 16)
- Whether individuals were ever identified as having more severe SEN (aged 4 to 16)
- Whether individuals were ever identified as having one of 14 specific types of SEN (aged 4 to 16)

For both measures:
- The total number of terms/years in which individuals appeared in the school census (to enable users to create proportion or percentage measures of these variables)
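For readers who want to see how measures like these relate to the underlying termly records, here is a minimal Python sketch for a single pupil. It assumes one record per census term and uses hypothetical field names and SEN labels (`CensusTerm`, `fsm_eligible`, `sen`); it is not how DfE or CEPEO actually construct the LEO variables, and the rule used to count a "year" is an assumption made only for this sketch.

```python
from dataclasses import dataclass


# Hypothetical termly school census record for one pupil. Field names and
# SEN labels are illustrative assumptions, not actual LEO variable names.
@dataclass(frozen=True)
class CensusTerm:
    year: int           # academic year, e.g. 2010 for 2010-11
    term: str           # "autumn", "spring" or "summer"
    fsm_eligible: bool
    sen: str            # "none", "sen_without_statement" or "sen_with_statement"


def derive_school_history(records: list[CensusTerm]) -> dict:
    """Collapse one pupil's termly records (assumed one per term) into counts and flags."""
    census_terms = len(records)
    census_years = len({r.year for r in records})
    fsm_terms = sum(r.fsm_eligible for r in records)
    # Assumption: a year counts as an FSM year if the pupil was eligible in
    # any term of that year. Other year-level definitions are possible.
    fsm_years = len({r.year for r in records if r.fsm_eligible})
    less_severe = sum(r.sen == "sen_without_statement" for r in records)
    more_severe = sum(r.sen == "sen_with_statement" for r in records)
    return {
        "census_terms": census_terms,
        "census_years": census_years,
        "fsm_terms": fsm_terms,
        "fsm_years": fsm_years,
        "ever_fsm": fsm_terms > 0,
        "sen_without_statement_terms": less_severe,
        "ever_sen_without_statement": less_severe > 0,
        "sen_with_statement_terms": more_severe,
        "ever_sen_with_statement": more_severe > 0,
        # Proportion measure: the census term count is the denominator.
        "fsm_share": fsm_terms / census_terms if census_terms else None,
    }


records = [
    CensusTerm(2010, "autumn", False, "none"),
    CensusTerm(2010, "spring", True, "none"),
    CensusTerm(2010, "summer", True, "sen_without_statement"),
    CensusTerm(2011, "autumn", False, "sen_without_statement"),
]
summary = derive_school_history(records)
print(summary["fsm_terms"], summary["ever_fsm"], summary["fsm_share"])  # 2 True 0.5
```

The total census count is what turns a raw count into a share: two FSM terms out of four observed terms gives 0.5, whereas two out of 36 would indicate far less persistent disadvantage.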

We’ve also created all of these variables separately for the primary and secondary school periods. Research has shown that the timing of experiences of disadvantage matters for children’s outcomes, and this split provides some insight into when those experiences occur.
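A similar sketch illustrates the primary/secondary split. Each hypothetical record here carries a national curriculum year group (`nc_year`, with Reception coded as 0), and the sketch assumes Reception to Year 6 counts as primary and Years 7 to 11 as secondary; the rule actually used for LEO may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical termly record: nc_year is the national curriculum year group,
# with Reception coded as 0. Field names and coding are assumptions.
@dataclass(frozen=True)
class CensusTerm:
    nc_year: int
    fsm_eligible: bool


def phase(nc_year: int) -> Optional[str]:
    """Assumed split: Reception to Year 6 is primary, Years 7 to 11 secondary."""
    if 0 <= nc_year <= 6:
        return "primary"
    if 7 <= nc_year <= 11:
        return "secondary"
    return None  # outside the age 4 to 16 window covered by the derived variables


def fsm_by_phase(records: list[CensusTerm]) -> dict:
    """Compute the same FSM counts separately for each school phase."""
    out = {}
    for name in ("primary", "secondary"):
        terms = [r for r in records if phase(r.nc_year) == name]
        fsm_terms = sum(r.fsm_eligible for r in terms)
        out[name] = {
            "census_terms": len(terms),
            "fsm_terms": fsm_terms,
            "ever_fsm": fsm_terms > 0,
        }
    return out


# A pupil who only becomes FSM eligible after moving to secondary school
# (one census term per year group shown, for brevity).
history = [CensusTerm(5, False), CensusTerm(6, False), CensusTerm(7, True), CensusTerm(8, True)]
print(fsm_by_phase(history))
```

A single "ever FSM" flag would treat this pupil the same as one who was eligible throughout primary school; the phase-specific versions keep that distinction.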

Of course, these variables won’t suit everyone. They are not ideal for researchers who want to know exactly when an individual was first identified as having SEN, for example, or whether an individual was FSM eligible for one continuous block of time or moved in and out of entitlement. But for those who just want to know whether someone was ever identified as part of one of these groups, or how persistently, these variables should save substantial time and effort while also reducing the amount of personal data required.
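The limitation can be made concrete with a short sketch. It takes a hypothetical list of termly FSM (or SEN) flags in date order and computes the two kinds of measure offered in the derived set (a count and an "ever" flag), alongside two measures that are not included: the first flagged term and the number of separate spells. The two example pupils have identical counts but very different patterns.

```python
def summarise(flags: list[bool]) -> dict:
    """flags: one FSM (or SEN) indicator per census term, in date order (assumed)."""
    flagged_terms = sum(flags)
    # Derived-variable style measures: a count and an "ever" flag.
    summary = {"terms": flagged_terms, "ever": flagged_terms > 0}
    # Measures NOT in the derived set, which need the full termly history:
    # the index of the first flagged term and the number of separate spells.
    summary["first_term"] = flags.index(True) if flagged_terms else None
    summary["spells"] = sum(
        1 for i, flag in enumerate(flags) if flag and (i == 0 or not flags[i - 1])
    )
    return summary


block = [False, True, True, True, True, False]
intermittent = [True, False, True, False, True, True]
print(summarise(block))         # {'terms': 4, 'ever': True, 'first_term': 1, 'spells': 1}
print(summarise(intermittent))  # {'terms': 4, 'ever': True, 'first_term': 0, 'spells': 3}
```

Telling these two pupils apart requires the full termly history, which is exactly the more detailed personal data that the derived variables are designed to avoid sharing where it isn't needed.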

This is the first time a bespoke set of derived variables has been shared as part of LEO and we’d love to hear what you think. Did you find them useful? Are there additional things you’d have liked us to include – or indeed completely different derived variables that you’d like to see made available in future? Get in touch with us via the LEO programme team at DfE to share your thoughts (LEO.programme@education.gov.uk) or start a discussion on the LEO community hub.
