Changes over time to Pupil Matching Reference Numbers in the National Pupil Database

How PMRs are used to securely link data about children

The NPD contains longitudinal information on children enrolled in state schools and nurseries in England. Data are generated by schools, exam boards and children’s social care departments and sent in mandatory submissions to the UK Government’s Department for Education (DfE). Across several modules, data are available on, among other things:

  • pupil characteristics, including deprivation, special educational needs and social care provision
  • the school each child is enrolled in
  • their absences and exclusions
  • exam results.

Each census (conducted once in each of the academic year’s three terms) contains approximately 7 to 8 million enrolled pupils, around 93% of all school-aged children in England. NPD data have been used in a range of studies on children’s education and social care. More recently, linkage to the Hospital Episode Statistics as part of the Education and Child Health Insights using Linked Data (ECHILD) project has enabled powerful, whole population studies examining the relationships between child health and education. Find out more on the ECHILD website.

Although the DfE holds identifiable data that can be used for linkage, external, accredited researchers can apply to access de-identified extracts in trusted research environments such as the Office for National Statistics Secure Research Service. In such cases, children’s longitudinal records can be linked together using the anonymised Pupil Matching Reference (PMR), an encrypted identifier that applies within and across each of the NPD’s modules. Accuracy of the PMR is therefore essential, as it is usually the only key that researchers can use to link records together.

Why understanding changes to PMRs is important

Linkage rates between NPD and Hospital Episode Statistics in ECHILD, particularly in more recent years, are around 99%, suggesting that both sources contain high quality identifier information. This in turn implies that a high degree of confidence in the PMR is warranted. NPD data are subject to a host of cleaning and validation rules prior to submission to DfE, especially around child identifiers, whose accuracy is also implicated in funding allocation. However, errors do occur and DfE allows submitting bodies to amend previously submitted records. As a result, PMRs are liable to change over time and such changes may be applied retrospectively. This means that two separate extracts of the NPD covering the same years but created at different time points are liable to contain different PMRs, which in turn could affect the comparability of results across different studies. Likewise, where researchers obtain refreshes (for example, supplementing an existing extract with more recent data), links between a child’s records in the pre-existing data and those in the updated data may be lost. It is therefore important to understand:

  • the extent to which PMR allocation changes over time
  • what child characteristics predict this.

Answering these questions allows us to gauge the extent of possible biases in analyses that use NPD. The ECHILD team are fortunate to be able to investigate this issue by comparing two separate NPD extracts:

  • The first (made available in December 2020 and designated the “2020 extract”) contains NPD data until the 2018/19 academic year for children born from 1 September 1995.
  • The second (available in May 2023 and called the “2023 extract”) contains records until 2021/22 for children born from 1 September 1984.

By restricting the 2023 extract to children born from 1 September 1995, we were able to calculate the number of PMRs appearing in one but not the other, thereby estimating the rate of change in PMRs between the two. We also calculated relative risks of a PMR not appearing in the other extract according to key demographic variables. In doing so, we aimed to quantify changes over time in PMRs within the NPD and to identify which groups of individuals are most likely to be affected by these changes. 
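
The comparison described above can be illustrated with a minimal Python sketch. It assumes that each extract can be reduced to (census year, PMR) pairs. The row layout, the function name and the toy identifiers are hypothetical and are not the ECHILD team's actual code or data.

```python
from collections import defaultdict


def unmatched_by_census(extract_a, extract_b):
    """Count PMRs in extract_a that are absent from the same census of extract_b.

    Each extract is an iterable of (census_year, pmr) rows (an assumed layout).
    Returns {census_year: (n_pmrs, n_unmatched, pct_unmatched)}.
    """
    def pmrs_per_year(rows):
        grouped = defaultdict(set)
        for year, pmr in rows:
            grouped[year].add(pmr)  # a set keeps each PMR once per census
        return grouped

    a, b = pmrs_per_year(extract_a), pmrs_per_year(extract_b)
    result = {}
    for year, pmrs in a.items():
        missing = pmrs - b.get(year, set())  # set difference: in A but not in B
        result[year] = (len(pmrs), len(missing), 100 * len(missing) / len(pmrs))
    return result


# Invented toy rows, for illustration only.
extract_2020 = [(2006, "A1"), (2006, "B2"), (2006, "C3"), (2006, "D4"),
                (2019, "A1"), (2019, "E5")]
extract_2023 = [(2006, "A1"), (2006, "B2"), (2006, "C3"), (2006, "Z9"),
                (2019, "A1"), (2019, "E5")]

summary = unmatched_by_census(extract_2020, extract_2023)
for year, (n_pmrs, n_missing, pct) in sorted(summary.items()):
    print(f"{year}: {n_missing} of {n_pmrs} PMRs not in the other extract ({pct:.1f}%)")
```

Swapping the two arguments gives the comparison in the other direction (PMRs in the later extract that are absent from the earlier one).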

What we found

The number of rows and PMRs in each extract, and the number and percentage of PMRs that appear in the other extract, are shown in Table 1 in the full publication. The number of PMRs not in the other extract was very low (fewer than 5,100), with the percentage as low as 0.005% in 2006, rising to 0.061% in 2019. These results were almost identical whether examining PMRs in the 2020 or the 2023 extract.

PMRs for children with special educational needs provision were more likely to be consistent

The relative risks of a PMR not appearing in the other extract are shown in Table 2 in the full publication. Again, results were the same regardless of which extract was examined. PMRs assigned to children with SEN provision were less likely to be missing from the other extract than those assigned to children without SEN provision; in other words, PMRs for children with SEN provision were more likely to be consistent. A similar pattern was observed for PMRs assigned to children living outside of London compared to those living in London.

Boys, pupils of ethnic minority status, and those experiencing greater levels of deprivation had a higher risk of inconsistency

By contrast, PMRs assigned to boys, pupils of ethnic minority status, and those experiencing greater levels of deprivation were more likely to not appear in the other extract. There was a clear interaction between IDACI and free school meals eligibility: within each IDACI quintile, there was a higher relative risk of a PMR not appearing in the other extract for children with free school meal eligibility than for those without.
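
As a rough illustration of how a stratified relative risk of this kind can be computed, the sketch below compares children with and without free school meal eligibility within each IDACI quintile. The counts are invented for the example, and the function name and data layout are assumptions; they do not reproduce the figures in the full publication.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


# Invented counts: {idaci_quintile: {fsm_eligible: (unmatched_pmrs, total_pmrs)}}
counts = {
    1: {True: (6, 10_000), False: (12, 40_000)},
    5: {True: (20, 25_000), False: (10, 25_000)},
}

# Relative risk of a PMR not appearing in the other extract,
# FSM-eligible versus not eligible, computed separately in each quintile.
rr_by_quintile = {
    quintile: relative_risk(*groups[True], *groups[False])
    for quintile, groups in counts.items()
}

for quintile, rr in sorted(rr_by_quintile.items()):
    print(f"IDACI quintile {quintile}: relative risk = {rr:.2f}")
```

A relative risk above 1 in every quintile would correspond to the pattern described above, in which free school meal eligibility is associated with a higher risk within each level of area deprivation.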

What these results mean

The extent to which PMRs changed between our two extracts was extremely small (<0.0614% for any given census). While it is striking that records are still being updated at least 14 to 17 years after being submitted (for example, there were 188 PMRs in the 2006 census of the 2020 extract that were not in the same census of the 2023 extract), it was more recent years that were most affected. This recency effect may be due to schools being more likely to update more recent records.

The fact that the numbers of enrolments and PMRs were virtually identical between the two extracts indicates either that PMRs are being replaced outright, or that there is splitting and merging of PMRs in roughly equal measure. Unfortunately, as there is no third persistent identifier (such as the episode key in the Hospital Episode Statistics), and because we do not have access to natural identifiers, it was not possible for us to examine this further. PMRs assigned to children with SEN were less likely to be missing from the other extract (i.e., these PMRs were more likely to persist). This could be due to better quality data being held for children for whom the school and other agencies have had to make extra provision and resource available.

We observed that PMRs assigned to pupils of ethnic minority status and those experiencing greater levels of area-based and familial financial deprivation were most likely to be affected. Similarly, PMRs of children in London were more likely to be affected than PMRs of children outside of London. These findings reflect poorer linkage rates in ECHILD for children with non-White ethnicity, children living in more deprived areas, and children in London, all of which overlap to some degree. Non-English names, for example, are more likely to be mis-spelt and require correction. Children living in more deprived circumstances may be more likely to transfer schools, for example due to managed moves or off-rolling; it is therefore more likely that schools would hold poorer quality information for these children.

Nonetheless, these are relative risks calculated against a very small baseline. Results from this analysis indicate that very few children are affected by changes in PMRs, at least over a 2.5-year period (we are unable to conduct this analysis over longer periods or to estimate cumulative effects). Whilst some groups are more at risk, which may induce some bias in results, the extent of such bias is likely to be minimal overall.
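
To make the point about small baselines concrete, the sketch below converts a relative risk into an absolute difference. It uses the highest census-level percentage reported above (0.061%) as a stand-in baseline and a relative risk of 2, which is assumed purely for illustration and is not an estimate from the study.

```python
def per_100k(risk):
    """Convert a proportion to a rate per 100,000 PMRs."""
    return risk * 100_000


baseline_risk = 0.061 / 100  # 0.061%, used here as a stand-in baseline
hypothetical_rr = 2.0        # assumed for illustration only

elevated_risk = baseline_risk * hypothetical_rr
extra_per_100k = per_100k(elevated_risk - baseline_risk)

print(f"baseline: {per_100k(baseline_risk):.0f} per 100,000")
print(f"elevated: {per_100k(elevated_risk):.0f} per 100,000")
print(f"excess:   {extra_per_100k:.0f} per 100,000")
```

Under these assumptions, even a doubling of risk leaves the large majority of PMRs in the higher-risk group unchanged between extracts.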