ECHILD, children’s social care and the impact of code sharing
Categories: ADR England, Office for National Statistics, Children & young people, Health & wellbeing, Impact, Potential
17 October 2024
Author: Dr Matthew Jay (Senior Data Scientist, ECHILD)
Date: September 2024
A researcher working with ECHILD (Education and Child Health Insights from Linked Data) has developed and shared code designed to clean children’s social care data included in the National Pupil Database. This addresses many of the common cleaning steps that users need, speeding up analysis and increasing the potential of impactful research using this data.
About Children’s Social Care data
There are two national datasets containing information on children receiving Children’s Social Care (CSC) services:
- the Child in Need (CiN) census (which also includes information on referrals to CSC that result in no further action)
- the Children Looked After (CLA) return.
Both datasets are included in the National Pupil Database, which holds data on pupil and school characteristics, educational outcomes, and social care.
The de-identified ECHILD database links this data to the Hospital Episode Statistics and other health datasets for England. It is therefore possible to use the CSC datasets to examine, longitudinally, both the health and education of children receiving CSC services. ECHILD is available for accredited researchers to apply to access in the Office for National Statistics (ONS) Secure Research Service.
The challenge
A key challenge of the CSC datasets is their inherent complexity. As with all administrative data, which is primarily collected for non-research purposes, a significant amount of work is required to clean the data to ensure it is research-ready and analysis-ready.
Particular characteristics of this data add an extra layer of challenge, especially for new users. For example, children’s records are duplicated in the CiN census if a referral is open in more than one year and, as is common to many datasets, children’s records across time may contain conflicting information. As administrative data users know, this kind of effort is often the hardest and most time-consuming part of working with this type of data.
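As a minimal sketch of the duplication problem described above, the following Python example collapses repeated referral rows into one row per referral. It is illustrative only: the shared ECHILD code is written in R, and the field names used here (child_id, referral_date, census_year) and the example records are hypothetical, not the real CiN variable names or data.

```python
from datetime import date

# Hypothetical records: the same referral for child "A" appears in two census years.
records = [
    {"child_id": "A", "referral_date": date(2018, 11, 5), "census_year": 2019},
    {"child_id": "A", "referral_date": date(2018, 11, 5), "census_year": 2020},
    {"child_id": "B", "referral_date": date(2019, 6, 1), "census_year": 2020},
]


def collapse_referrals(rows):
    """Keep one row per (child, referral date), recording the first and
    last census year in which that referral was seen."""
    episodes = {}
    for row in rows:
        key = (row["child_id"], row["referral_date"])
        year = row["census_year"]
        if key not in episodes:
            episodes[key] = {
                "child_id": row["child_id"],
                "referral_date": row["referral_date"],
                "first_census_year": year,
                "last_census_year": year,
            }
        else:
            ep = episodes[key]
            ep["first_census_year"] = min(ep["first_census_year"], year)
            ep["last_census_year"] = max(ep["last_census_year"], year)
    # Sort for a stable, readable order.
    return sorted(episodes.values(), key=lambda ep: (ep["child_id"], ep["referral_date"]))


episodes = collapse_referrals(records)
for ep in episodes:
    print(ep)
```

Identifying a referral by child and referral date is an assumption made for this sketch; a real analysis may need a different key.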
Addressing this challenge
To address this, researcher Dr Matthew Jay from the ECHILD team has developed R code for cleaning the CSC datasets, which is freely available to access via GitHub, with supporting documentation. This code performs a set of cleaning procedures that many users would need to carry out. For example, the code checks for inconsistent dates, cleans conflicting information across children’s records, and derives year variables. It also derives flags to indicate, for instance, whether a child was found by CSC to be in need and/or placed on a Child Protection Plan.
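The kinds of steps listed above might look something like the following Python sketch. It is not the ECHILD code, which is written in R and should be consulted directly on GitHub. The field names (referral_date, closure_date, cpp_start_date) are hypothetical, and the rule that a reporting year runs from 1 April to 31 March and is labelled by its end year is an assumption made for illustration.

```python
from datetime import date


def reporting_year(d):
    """Label a date by the year ending 31 March in which it falls
    (assumed convention: 1 April 2019 to 31 March 2020 is year 2020)."""
    return d.year + 1 if d.month >= 4 else d.year


def clean_episode(row):
    """Return a copy of one referral record with derived variables added."""
    out = dict(row)
    start = row["referral_date"]
    end = row.get("closure_date")
    # Flag records where the closure date precedes the referral date.
    out["dates_inconsistent"] = end is not None and end < start
    # Derive the reporting year of the referral.
    out["referral_year"] = reporting_year(start)
    # Flag whether a Child Protection Plan start date was ever recorded.
    out["ever_cpp"] = row.get("cpp_start_date") is not None
    return out


example = {
    "child_id": "A",
    "referral_date": date(2019, 4, 1),
    "closure_date": date(2019, 3, 1),  # deliberately earlier than the referral
    "cpp_start_date": None,
}
cleaned = clean_episode(example)
print(cleaned["dates_inconsistent"], cleaned["referral_year"], cleaned["ever_cpp"])
```

This sketch only flags inconsistent dates rather than correcting or dropping them, so that the decision on how to handle such records stays with the analyst.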
Users are free to apply this code within the ECHILD data. They are encouraged to adapt it to suit their needs and requirements, suggest improvements and engage in dialogue with the ECHILD team to help improve the code as a shared resource for all CSC data users. As ECHILD holds a complete copy of the CSC data, including children without pupil matching reference numbers, the code can also be applied to projects outside of ECHILD that just use the CSC data, or that use CSC data linked to other sources.
An important caveat: it is impossible to clean and generate a single dataset that will be suitable for all analyses. This code performs some operations that the team envisaged most users would want to perform, but others may legitimately disagree with some of the decisions made. For example, where information across a child’s records differs, the code takes the modal value rather than the last recorded value; which choice is better is really an empirical question that others may wish to study. Some users may need to carry out additional cleaning. For this reason, users are strongly encouraged to study the code and its documentation carefully and to adapt it if they feel changes are necessary.
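To make the modal-versus-last choice concrete, here is a small Python sketch comparing the two rules on one child's records. It is an illustration under stated assumptions, not the ECHILD implementation (which is in R): the example codes are made up, missing values are represented as None, and ties for the mode are broken by first occurrence, which may differ from how the shared code handles ties.

```python
from collections import Counter


def modal_value(values):
    """Most frequent non-missing value; ties go to the value seen first."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]


def last_value(values):
    """Most recently recorded non-missing value (values are in time order)."""
    for v in reversed(values):
        if v is not None:
            return v
    return None


# Hypothetical codes recorded for one child across four years, oldest first.
recorded = ["X1", "X1", None, "X2"]
print(modal_value(recorded))  # X1
print(last_value(recorded))   # X2
```

The two rules disagree here, which is exactly the situation in which an analyst would need to decide which value better suits their research question.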
For any queries about this code, please contact Matthew Jay at matthew.jay@ucl.ac.uk.
Impact
The CSC data cleaning code, developed and shared by the ECHILD team, has improved the usability of this data, increasing the potential for future policy-relevant research. The code has already aided analysis within a number of projects. Researchers in the NIHR (National Institute for Health and Care Research) Children and Families’ Policy Research Unit at UCL have used this code in their work to assess mortality and hospital trajectories of adolescents who received special educational needs (SEN) support or had experiences of social care while in school. The findings of this work were presented at the International Population Data Linkage Network Conference 2024 in Chicago.
A key aim of linked datasets such as ECHILD, and of code sharing, is to facilitate analysis at pace. Removing the need for bespoke data cleaning from scratch for each project has enabled faster analysis, saving researchers valuable time. Some of these researchers have shared testimonies:
- “It has saved me significant time developing a pipeline from scratch” – Dr Isobel Ward, Principal Statistician (ONS) and Researcher (UCL)
- “Access to this information saved me hours of work (re)processing a database, with which I was unfamiliar” – German Pulido, PhD Student (UCL)
- “This cleaned dataset is of high quality and very easy to follow, significantly reducing the time and effort required for me” – Steffi Shi, Research Assistant (UCL)
This code has also helped to maintain the validity of the data, by limiting researcher errors during cleaning, and has improved consistency across analyses using these datasets:
- “(the) code has ensured that I am identifying a group of children with CSC involvement in a manner that is consistent with the wider research group” – Dr Isobel Ward, Principal Statistician (ONS) and Researcher (UCL)
- “Having the data cleaned by an expert is beneficial because it increases consistency of the analysis across different projects” – German Pulido, PhD Student (UCL)
How to access ECHILD data
You can find out more about the ECHILD dataset in the ADR UK Data Catalogue.
Researchers seeking to securely access this dataset must first become accredited researchers under the Digital Economy Act. Researchers should apply for accreditation through the ONS Research Accreditation Service.
Find out more about how to access the dataset on the ECHILD website.
University College London, released 30 July 2024, ONS SRS Metadata Catalogue, dataset, Education and Child Health Insights from Linked Data - England, https://doi.org/10.57906/j1gr-gm30
Code sharing links
- ECHILD data cleaning code share
- ONS Secure Research Service Metadata Catalogue
- ADR UK Learning Hub: Available shared code
About the ONS Secure Research Service
The ONS Secure Research Service is an accredited trusted research environment, using the Five Safes Framework to provide secure access to de-identified, unpublished data.
If you use ONS Secure Research Service data and would like to discuss writing a future case study with us, please get in touch at IDS.Impact@ons.gov.uk. Please also report any outputs here: Outputs Reporting Form.