Using linked data to evaluate special educational needs provision and offending risk

Pupils with low school performance are at increased risk of involvement with the criminal justice system. Educational interventions during childhood show some promise for mitigating this risk. But what about provisions which the UK school system already has in place for pupils with special educational needs (SEN)? Can access to SEN provision reduce risk of adolescent offending if pupils show early signs of struggling in school? How can we find out?

Randomised controlled trials aren’t always feasible

Ideally, researchers would answer this question using a randomised controlled trial (RCT). In this type of study, researchers identify a sample of individuals who could benefit from an intervention. They then randomly allocate individuals to either receive that intervention, or to a control group which does not receive the intervention. Finally, they compare the groups’ resulting outcomes.

Using our example, we would identify a sample of pupils showing signs of low school performance. Some would be randomly assigned to receive SEN provision, and the others not. Their resulting offending rates would then be compared.

But this sort of study is not feasible. It would be unethical to withhold SEN provision from a group of struggling pupils just to find out how it affects offending risk. Can we do it another way?

Target trial emulation – a possible alternative

Target trial emulation is an increasingly popular approach. Here, researchers first design a hypothetical RCT, then use existing observational data to emulate (mimic) that design. In our example, the researcher would observe whether or not pupils received SEN provision in the natural course of their school career, and compare their offending outcomes. SEN provision would not be actively supplied or withheld for the purposes of the trial alone.

The Ministry of Justice (MoJ) and Department for Education (DfE) administrative linked dataset is an ideal resource for conducting such a target trial. It contains all the necessary data on pupil demographics, attainment, SEN provision, and offending. Using this data, we could emulate our proposed RCT to investigate whether SEN provision reduces offending risk among pupils showing early signs of struggling in school.

While conducting such a target trial appears feasible, we have identified several challenges which researchers could encounter.

Challenge 1: Confounding

In an RCT, the random allocation of individuals to an intervention or control group means that if the groups show a difference in outcomes, you can be reasonably sure that this is due to the intervention – any other differences between the two groups must have arisen by chance. Without random allocation, systematic differences between the groups can affect the findings. This is called confounding.

Using our example, rates of SEN provision can vary between different schools, which in turn can differ in their offending rates. So, if we just compare pupils who receive SEN provision to those who do not, we can’t say for sure whether any observed differences in their offending rates are a direct result of that provision, or whether they merely reflect other (confounding) factors like attending different schools.

To overcome this problem in target trials, randomisation has to be mimicked using statistical procedures, which aim to make the intervention and control groups comparable with respect to confounding variables that may have an influence on the outcome.
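As a rough illustration of one such procedure, the sketch below compares a crude difference in offending risk with a difference standardised over schools. The counts are invented for illustration and are not drawn from the MoJ-DfE dataset. Standardisation is only one of several possible approaches, and we are not suggesting it is the method a real target trial would use.

```python
from collections import defaultdict

# Invented toy counts (NOT real MoJ-DfE data):
# (school, received_sen, offended, number_of_pupils)
TOY_COUNTS = [
    ("A", True, True, 8), ("A", True, False, 72),
    ("A", False, True, 2), ("A", False, False, 18),
    ("B", True, True, 6), ("B", True, False, 14),
    ("B", False, True, 24), ("B", False, False, 56),
]


def risk(counts, sen):
    """Proportion of pupils who offended, among those with the given SEN status."""
    offended = sum(n for _, s, o, n in counts if s == sen and o)
    total = sum(n for _, s, _, n in counts if s == sen)
    return offended / total


def crude_risk_difference(counts):
    """Risk in the SEN group minus risk in the non-SEN group, ignoring school."""
    return risk(counts, True) - risk(counts, False)


def standardised_risk_difference(counts):
    """Average of within-school risk differences, weighted by school size."""
    by_school = defaultdict(list)
    for row in counts:
        by_school[row[0]].append(row)
    total = sum(n for *_, n in counts)
    difference = 0.0
    for rows in by_school.values():
        weight = sum(n for *_, n in rows) / total
        difference += weight * (risk(rows, True) - risk(rows, False))
    return difference


print(round(crude_risk_difference(TOY_COUNTS), 2))
print(round(standardised_risk_difference(TOY_COUNTS), 2))
```

In this toy data the crude comparison suggests that SEN provision lowers offending risk by about 12 percentage points, but the difference disappears once pupils are compared within the same school. The apparent effect was produced entirely by school acting as a confounder.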

Challenge 2: Sample identification

A target trial’s sample might be restricted by data availability. Using our example, SEN provision was only recorded by the DfE from the academic year 2001/02 onwards, and in turn the MoJ-DfE linked dataset only goes up to 2020/21 at the time of writing (although this will be refreshed periodically). If a researcher is most interested in the impact of SEN provision during primary school, the sample would therefore be limited to pupils aged 4 to 11 between 2001/02 and 2020/21. Once this process has been applied to every variable being used in the trial, the number of pupils with sufficient data coverage to be included might be limited. Researchers planning a target trial using this data would therefore need to carefully map out data availability for all of their variables, to ensure that their proposed analysis is possible.
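The mapping exercise could be sketched along the following lines. Each variable gets an availability window and the ages at which it is needed, and the code works out which school-entry cohorts are fully covered. The SEN window comes from the dates above. The offending window, the age ranges, and the assumption that pupils start school aged 4 are hypothetical placeholders, and academic years are labelled by their starting calendar year (2001 for 2001/02).

```python
# variable: (first_year_available, last_year_available, min_age_needed, max_age_needed)
# Ages are ages at the start of the academic year.
REQUIREMENTS = {
    # Window from the text; ages 4 to 10 assumed to span the start of each primary year.
    "sen_provision": (2001, 2020, 4, 10),
    # Hypothetical placeholder window and follow-up ages, for illustration only.
    "offending": (2001, 2020, 10, 15),
}


def eligible_entry_years(requirements, entry_age=4):
    """Return school-entry years whose pupils have full coverage for every variable."""
    windows = []
    for first_year, last_year, min_age, max_age in requirements.values():
        # A cohort entering in year e needs data from e + (min_age - entry_age)
        # through e + (max_age - entry_age).
        earliest_entry = first_year - (min_age - entry_age)
        latest_entry = last_year - (max_age - entry_age)
        windows.append((earliest_entry, latest_entry))
    start = max(window[0] for window in windows)
    end = min(window[1] for window in windows)
    return list(range(start, end + 1))  # empty if no cohort is fully covered


print(eligible_entry_years(REQUIREMENTS))
```

Under these placeholder assumptions, only cohorts entering school between 2001/02 and 2009/10 have complete coverage. This shows how quickly the usable sample can shrink as more variables are added.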

Challenge 3: Defining timepoints

In a target trial, researchers need to be confident that all events under study happened in a particular order. For example, outcome measurement must be based on variables collected only after the intervention has taken place. If events are recorded in a different order, or their order is unclear, we cannot be confident about their causal relationships. Precise timings can be challenging to ascertain from administrative data.
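A minimal sketch of such an ordering check is shown below. It assumes, purely for illustration, that events are only known to the nearest academic year, so two events recorded in the same year cannot be ordered. The records and field names are invented.

```python
# Invented toy records: the academic year (by starting calendar year) in which
# SEN provision was first recorded, and in which a first offence was recorded.
# offence_year is None when no offence is recorded.
TOY_PUPILS = [
    {"pupil": "p1", "sen_year": 2005, "offence_year": 2012},
    {"pupil": "p2", "sen_year": 2010, "offence_year": 2010},
    {"pupil": "p3", "sen_year": 2011, "offence_year": 2009},
    {"pupil": "p4", "sen_year": 2006, "offence_year": None},
]


def event_order(sen_year, offence_year):
    """Classify the order of intervention and outcome at academic-year precision."""
    if offence_year is None:
        return "no offence recorded"
    if sen_year < offence_year:
        return "intervention first"
    if sen_year > offence_year:
        return "outcome first"
    return "ambiguous"  # same academic year: the order cannot be determined


def usable_pupils(pupils):
    """Keep pupils whose outcome cannot have preceded the intervention."""
    keep = {"intervention first", "no offence recorded"}
    return [
        p["pupil"]
        for p in pupils
        if event_order(p["sen_year"], p["offence_year"]) in keep
    ]


print(usable_pupils(TOY_PUPILS))
```

Here pupil p2 is set aside because both events fall in the same academic year, and p3 because the offence is recorded before the provision began. How such cases should be handled in a real analysis is a design decision that this sketch does not settle.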


Unlike RCTs, target trials can use existing administrative data without the need to allocate individuals to intervention and control groups. This can enable analyses that would not otherwise be possible, opening the potential for new insights to inform policy and practice.

Some methodological challenges do require consideration to ensure the success of such a trial. However, our scoping work indicates that it would be possible to conduct a target trial of SEN provision and offending risk among pupils showing early signs of struggling in school. This would help us to understand whether improving access to SEN provision can mitigate offending risk, and could therefore motivate improved funding and access to these provisions.

Dr Alice Wickersham is an ADR UK-funded Research Fellow using linked MoJ-DfE data made available through Data First.

Find out more about the Fellowship.
