New data cleaning code available for users of children’s social care datasets

Categories: ADR England, Office for National Statistics, Children, young people & education, Health & wellbeing

17 October 2024

New R code is available for cleaning the children’s social care (CSC) datasets included in the National Pupil Database. This code was developed by the ECHILD (Education and Child Health Insights from Linked Data) team. It is freely available to access via GitHub.

Data on children receiving social care services primarily comes from two national datasets: the Child in Need census and the Children Looked After return. These CSC records are included in multiple ADR UK flagship datasets, including ECHILD and Growing Up in England.

As with any large administrative dataset, the CSC data needs to be cleaned to remove idiosyncrasies, such as duplicate information and conflicting records. To address this challenge, the ECHILD team have written R code that speeds up the data cleaning process for the CSC datasets. This resource was produced for all CSC data users, and the ECHILD team encourage users to get in touch and suggest improvements to the code for the benefit of all.

Disclaimer: This code is based on the ECHILD team’s experience using the data and, while it accounts for many users’ potential needs, it may not meet the data cleaning requirements of all users. Please review the related documentation on GitHub before applying the code.

To learn more about this code and the impact it has achieved so far, read the full impact case study. You can find out more about code sharing on the ADR UK Learning Hub.

For any queries about this code, please contact Matthew Jay at matthew.jay@ucl.ac.uk.
