Gestational age at birth, chronic conditions, and school outcomes of children born in England
7 August 2023
Authors: Dr Nicolas Libuy, Ruth Gilbert, Louise McGrath-Lone, Ruth Blackburn, David Etoori, and Katie Harron (University College London)
Date: April 2022
A research study using secure data has found that children born even a few weeks too early are less likely to achieve expected levels of attainment than those born at 40 weeks. They are also more likely to have special educational needs. However, chronic conditions in school-aged children contribute more to special educational needs and low academic attainment than preterm birth.
This research highlights how additional support to improve school readiness prior to school entry could be targeted at groups at high risk of low academic attainment, based on early health indicators shown to influence later outcomes such as chronic conditions and preterm birth. The research informed NHS Digital strategies on linking health and education data, and led to the ECHILD project.
This research tested the feasibility of linking data from health and education administrative records in England, using a subset of the Education and Child Health Insights from Linked Data (ECHILD) database. This involved assessing the quality of data, ensuring consistency between sources, checking for linkage bias, and developing optimal strategies for linking large administrative datasets. The research team also studied the association between gestational age and academic attainment for children and young people with underlying chronic conditions.
This research project was supported by ADR UK and the Administrative Data Research Centre for England. It was also supported by the National Institute for Health Research, Great Ormond Street Hospital Biomedical Research Centre, Health Data Research UK (funded by the Medical Research Council and eight other funders), Wellcome Trust, and the UK Research and Innovation Innovation Fellowship (also funded by the Medical Research Council).
Hear Dr Nicolas Libuy discuss this research as part of the ONS Research Excellence series on Thursday 31 August, 11:00-12:00. Register now.
This project was the first to evaluate linkages between health and education data in England. It used the following datasets:
- NHS Personal Demographics Service (PDS): A national electronic database curated by NHS Digital that contains demographic information - including sex, name, and address - for all individuals in England with an NHS number
- Hospital Episode Statistics: An episode-level administrative database that covers all admissions to NHS hospitals in England, as well as all attendances in accident and emergency (from 2007/8) and outpatient appointments (from 2003/4)
- National Pupil Database: A database curated by the Department for Education (DfE) that contains pupil-level information on all children and young people attending state-funded schools in England. It captures information on attainment tests, absences, exclusions, and alternative provisions
- ONS death registration records
The data included around 2.2 million children in England born between 1 September and 31 August in the academic years 1990/91, 1996/97, 1999/00, and 2004/5.
The linkage between the National Pupil Database and Hospital Episode Statistics was carried out by NHS Digital. The de-identified linked data was then transferred to the ONS Secure Research Service by the Department for Education and NHS Digital, enabling the researchers to analyse the data within the ONS Secure Research Service.
DOI (Digital Object Identifier): Office for National Statistics, released 22 August 2022, ONS SRS Metadata Catalogue, dataset, Death Registrations Finalised - England and Wales, 10.57906/jygg-zn40
The linkage between health and educational records in England was a collaboration between University College London (UCL) researchers, the Department for Education, NHS Digital, and the ONS. These databases can only be linked using confidential personal identifiers such as full names, postcodes, date of birth, and sex, which creates technical and governance challenges. In the first part of the project, the UCL research team worked with NHS Digital and the Department for Education to develop a novel multi-step deterministic linkage algorithm. This algorithm linked, for the first time, longitudinal records from state schools to records in the NHS Personal Demographics Service and Hospital Episode Statistics databases. In doing so, the study evaluated the feasibility and success of linkages between health and education data, described the linkage process, and evaluated the quality of linkages.
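As an illustration only, the general idea of a multi-step deterministic linkage can be sketched in Python. The match rules, field names, and the `Person` record layout below are hypothetical and are not the rules used by NHS Digital or the research team. The sketch simply tries the strictest rule first and applies more relaxed rules only to pupils who are still unlinked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, for illustration only.
    record_id: str
    forename: str
    surname: str
    date_of_birth: str  # ISO format, e.g. "2004-09-15"
    sex: str
    postcode: str


# Hypothetical match steps, ordered from strictest to most relaxed.
# Each step lists the fields that must agree exactly.
MATCH_STEPS = [
    ("forename", "surname", "date_of_birth", "sex", "postcode"),
    ("surname", "date_of_birth", "sex", "postcode"),
    ("forename", "surname", "date_of_birth", "sex"),
]


def _key(person, fields):
    """Build a comparison key, ignoring case and spaces."""
    return tuple(
        str(getattr(person, field)).lower().replace(" ", "") for field in fields
    )


def link(pupils, patients):
    """Return {pupil record_id: (patient record_id, step number)}."""
    links = {}
    for step, fields in enumerate(MATCH_STEPS, start=1):
        # Index the health records by the key used at this step.
        index = {}
        for patient in patients:
            index.setdefault(_key(patient, fields), []).append(patient)
        for pupil in pupils:
            if pupil.record_id in links:
                continue  # already linked at a stricter step
            candidates = index.get(_key(pupil, fields), [])
            # Accept the match only if exactly one health record shares the key.
            if len(candidates) == 1:
                links[pupil.record_id] = (candidates[0].record_id, step)
    return links
```

Recording the step at which each pupil was linked is what makes it possible to examine later how the characteristics of linked pupils differ between strict and relaxed rules.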
Linkage quality was evaluated by calculating the distribution of pupils linked at each step of the algorithm according to region, ethnic group, decile of deprivation (measured by the Income Deprivation Affecting Children Index), and cohort year. The researchers calculated the overall linkage rate as the percentage of pupils in the National Pupil Database who were linked to any Hospital Episode Statistics record. They evaluated potential bias resulting from missed matches by comparing the characteristics of pupils in the National Pupil Database who were linked to Hospital Episode Statistics records with pupils who were not.
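The linkage rate calculation described above can be sketched as follows. This is a minimal example that assumes each pupil is a dictionary with a boolean `linked` flag and a grouping field such as a deprivation decile. Both names are hypothetical and do not reflect the structure of the National Pupil Database.

```python
from collections import defaultdict


def linkage_rates(pupils, group_field):
    """Return (overall linkage rate, {group: linkage rate}) as percentages."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for pupil in pupils:
        group = pupil[group_field]
        totals[group] += 1
        linked[group] += int(pupil["linked"])
    overall = 100.0 * sum(linked.values()) / sum(totals.values())
    by_group = {group: 100.0 * linked[group] / totals[group] for group in totals}
    return overall, by_group
```

Comparing the per-group rates with the overall rate gives a first indication of whether missed matches are concentrated in particular groups.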
Following the creation of the linked database, in the second part of the study, the researchers focused on children born in England between 1 September 2004 and 31 August 2005. This was to generate evidence on child development, measured through school attainment and provision for special educational needs (SEN), across the spectrum of gestational age. This included children born early term and post-term, and with and without chronic health conditions. The research team evaluated school attainment using state examinations for Key Stage 1 (age 7) and Key Stage 2 (age 11), and any SEN by age 11. The team stratified analyses (separated the data into groups) by chronic health conditions up to age 2 and by size relative to gestation, and calculated population attributable fractions. These represent the proportion of low academic attainment (or SEN) in the whole population that can be attributed to an exposure (in this case, preterm birth), if it is assumed that preterm birth causes these outcomes.
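To make the idea of a population attributable fraction concrete, the sketch below uses one common unadjusted formulation: the difference between the risk in the whole population and the risk in the unexposed group, divided by the risk in the whole population. The function name and the counts in the usage note are illustrative assumptions. The published study may have used adjusted or stratified estimates.

```python
def population_attributable_fraction(
    exposed_cases, exposed_total, unexposed_cases, unexposed_total
):
    """Crude PAF: (overall risk - risk in unexposed) / overall risk."""
    overall_risk = (exposed_cases + unexposed_cases) / (
        exposed_total + unexposed_total
    )
    unexposed_risk = unexposed_cases / unexposed_total
    return (overall_risk - unexposed_risk) / overall_risk
```

For example, with made-up counts of 30 low-attaining children among 100 born preterm and 90 among 900 not born preterm, the overall risk is 12% and the risk among unexposed children is 10%, so about one sixth of low attainment would be attributed to preterm birth under the causal assumption.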
The results from the linkage evaluation show that linkage rates improved over time: 92% of school children born in 1990/91 linked to a hospital record, compared with 99% of those born in 2004/5. The researchers found that stringent rules requiring exact links disproportionately included children of white ethnic backgrounds, whereas more relaxed rules at later steps linked more children from other ethnic groups. Children from ethnic minority groups and more deprived children were less likely to link to a hospital record. Bias due to linkage errors could lead to an underestimation of the health needs of disadvantaged groups. These findings are relevant to users of any type of linked data where there is no unique, high-quality identifier for linkage. This study shows the value of transparent reporting of linkage errors, and highlights the potential to use missing data methods to minimise linkage bias.
In the second part of the study, there is evidence that chronic conditions in school-aged children contribute more to special educational needs and low academic attainment than preterm birth alone.
However, the prevalence of chronic conditions increased with lower gestational age at birth: 6.1% of children born at 40 weeks had a chronic condition compared with 38.8% of those born before 32 weeks. Children born preterm were still less likely to achieve expected levels of attainment than those born at 40 weeks, and more likely to have SEN:
- The percentage of children not achieving the expected level at Key Stage 1 increased from 7.6% at 41 weeks of gestation to 50% at 24 weeks.
- A similar pattern was seen at Key Stage 2.
- Special educational needs provision ranged from 29% at 41 weeks of gestation to 82.6% at 24 weeks.
In terms of the risk of low academic attainment related to preterm birth, the most at-risk group for not achieving expected levels at Key Stage 1 and Key Stage 2 was early term children (37-38 weeks) with chronic conditions.
This project achieved the first at-scale linkage between Hospital Episode Statistics and the National Pupil Database. By linking education and health data in England, this study informed NHS Digital strategies (such as linkage algorithms) for further linkages of this data. This has resulted in the ECHILD project.
Accredited researchers can apply to DfE and NHS Digital to access the ECHILD database. ECHILD captures longitudinal, linked information on more than 14 million children and young people. It is a major new resource that is providing new insights into the inter-relationships between health and education. This has led to research on the impact of Covid-19 on vulnerable groups. It has also led to further work as part of the Department for Education’s Data Improvement Across Government programme, as ECHILD is a core part of this work.
This study enabled the researchers to recommend that additional support prior to school entry should be targeted at high-risk groups, based on early health indicators and socioeconomic factors shown to influence later outcomes.
Publications and reports
- International Journal of Epidemiology article, April 2022: Gestational age at birth, chronic conditions and school outcomes: a population-based data linkage study of children born in England
- International Journal of Epidemiology Data Resource Profile, 2021: Data Resource Profile: The Education and Child Health Insights from Linked Data (ECHILD) Database
- The Lancet Digital Health correspondence, June 2021: Ethnic bias in data linkage
- International Journal of Population Data Science article, September 2021: Linking education and hospital data in England: linkage process and quality
Presentations and awards
- Early Career Researcher, ONS Research Excellence Awards 2022
- Population Data BC Power of Population Data Science Webinar Series, November 2021: Linking Education and Hospital Data in England
- Institute for Government and ADR UK Data Bites #22, September 2021: Getting things done with data in government
About the ONS Secure Research Service
The ONS Secure Research Service is an accredited trusted research environment, using the Five Safes Framework to provide secure access to de-identified, unpublished data. If you use ONS Secure Research Service data and would like to discuss writing a future case study with us, please ensure you have reported your outputs here: Outputs Reporting Form