The new UK Government wants a National Data Library: a brilliant aspiration, if built on solid foundations
21 August 2024
In this blog, Dr Emma Gordon, Director of ADR UK, responds to the Labour manifesto pledge to “create a National Data Library to bring together existing research programmes and help deliver data-driven public services, whilst maintaining strong safeguards and ensuring all of the public benefit”. Emma builds on a related blog published by Gavin Freeguard last month.
Since 2018, ADR UK has worked closely with UK and devolved governments and academic partners to pioneer secure access to administrative data for research across the UK. This has transformed public sector datasets into valuable research assets that benefit wider society. As such, we now have strong foundations for the UK Government to create a National Data Library. We also now have a much clearer sense of the scale of public good research that could be delivered if we worked together to plug some data gaps. The UK Government has an important role to play if we are to achieve this.
Connected journeys through society
Across the ADR UK programme, we constantly challenge ourselves to consider what the missed uses of data are. In other words, what research could be carried out if only we could link dataset x to dataset y and make the result available for research use? Personally, I am constantly drawn to understanding how people’s health affects other areas of their lives. For example, how does children’s health affect their educational attainment? And how does people’s health affect their ability to work?
To give a specific example, working in collaboration with the Ministry of Justice, we have made great strides in opening up access to linkable datasets across the full range of courts data for England and Wales. This means for the first time, we can track cohorts of people through the criminal, family, and civil courts to understand, for example, how legal problems such as homelessness or debt can interact with involvement with the criminal justice system. We can also understand the patterns related to reoffending, including what repeat defendants return to court for, how these offences can become more serious over time, and the relationship between offending and local levels of deprivation.
By linking social care, education outcomes and policing data, we are also now able, for the first time, to understand how children’s involvement in the care system, their educational attainment over time, and their likelihood of offending relate to one another.
Now, imagine if we could also link health data in, to understand the interplay between different health conditions and convictions and cautions. If we know a large proportion of the prison population come into the system with diagnosed mental health conditions, could it be that supporting these people much earlier in their lives, through better access to services and treatment, would lower crime rates over time?
More broadly, there is growing evidence that improvements in life expectancy (and healthy life expectancy) are stalling within the UK and health inequalities are widening. Poor health across the population is also contributing to the UK’s sluggish economic growth. While other countries allow access to population-level linked data to facilitate research on the drivers of poor health and its impacts on economic growth, this is not yet possible in the UK. This missed use of data severely constrains the ability of government, at both UK and devolved levels, to design policies to improve population health and economic outcomes and to evaluate the success of these interventions, limiting evidence-informed decision making across all four UK nations.
Laying the groundwork for sustainable data infrastructure
The ADR UK model is to do the work up-front around data governance, cleaning, and linkage, so that de-identified, research-ready, curated datasets can be maintained over time. To tie this back to the National Data Library concept, this means that the latest data is already there in the “library” (or trusted research environment), ready to be accessed. The researchers don’t first need to work with the data owners to create it. Not only does this greatly shorten data access times, but it also means that bodies of knowledge around using these complex datasets can be built up over time. Researchers can share code and derived data concepts, so the researchers that come after can iterate, refine, and build on what has gone before. None of this was possible with the previous “create and destroy” model of accessing these types of datasets, which was hugely inefficient, both for data owners and researchers.
I am very proud of the work of the ADR UK partnership in opening up access to data across all four UK nations. We have done this iteratively, building in training and capacity building activities and meaningful public engagement every step of the way. This means we now have a model for how to do this in each of the four UK nations, which gives us an incredibly strong foundation to build on, to create a federated, four-nation National Data Library.
However, we know there are gaps we need to fill if we are to realise the full potential of our model. Linking health data with administrative data at scale for England is a key one. We now have the precedent of the ECHILD (Education and Child Health Insights from Linked Data) dataset, which links education outcomes data with health data. There is already a growing body of policy-relevant research done by the team that were funded to create it. Now that we have been able to open the dataset up for wider research use, this body of research will grow even faster.
We now need to build on this with other linked datasets, so we can truly understand the interplay between people’s health and other aspects of their lives. In Scotland and Wales, we already have a vast wealth of health data available for research use. With these nations, it’s about working with UK Government departments to facilitate the linkage of other data (for example, benefits and income) with health data, to help fill in these gaps.
A National Data Library for the public good
Clearly, ADR UK will only ever be one part of a (hopefully) much bigger jigsaw of investments that could make up a future National Data Library. Across UKRI (UK Research & Innovation), there are many more investments that could form other parts of this jigsaw. Add in the suite of successful government-led programmes and data services, and this jigsaw starts to look much more complete. If the UK Government can harness all that is good about investments such as ADR UK and support them to be even better, then I can see how the National Data Library could become a reality, for the greater good of society.
Read more about the work of Ministry of Justice Data First or the ECHILD project. Sign up to the ADR UK Newsletter to receive updates.