Can we use linked administrative data to identify social disadvantage?
6 February 2020
Dr Serena Pattaro from the Scottish Centre for Administrative Data Research (SCADR), part of ADR Scotland, shares her reflections on the value of longitudinal administrative data as a reliable alternative or complement to survey data.
My research, in collaboration with Nick Bailey and Chris Dibben and recently published in Social Indicators Research, illustrates the huge potential of using longitudinal linked administrative data to identify social disadvantage.
This research started with a substantive question about the link between employment and poverty. We know that paid work is central to poverty risks, but less is known about how much work is needed to help people exit poverty. It is also unclear how important stability of employment is for lifting people out of poverty. We wanted to know whether we can capture labour market histories through administrative data in ways that remain meaningful for identifying poverty risks.
Alongside the substantive question, we also had a methodological one. Administrative data from the benefits and tax systems are widely used in the UK to construct indicators of social disadvantage. Key examples are Free School Meals and the Indices of Multiple Deprivation, which are used by both researchers and policy makers to identify and target resources to pupils, schools and communities in the most deprived areas or circumstances. These indicators are often cross-sectional: they capture a person's or an area's situation at a single point in time. In our research, we asked: can we do any better than this? Can we identify social disadvantage using longitudinal measures (taken over time) rather than cross-sectional ones?
What I did
I reconstructed individual labour market histories using routinely collected administrative data on welfare benefits from DWP and annual earnings from employment from HMRC. These were linked to data from a large national survey, the Poverty and Social Exclusion Survey UK (PSE-UK) conducted in 2012. I used the survey data to compare and validate both the cross-sectional and longitudinal indicators of social disadvantage from administrative data.
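To make the distinction between the two kinds of indicator concrete, here is a minimal sketch in Python. It is an illustration only and not the code used in the study: the spell layout (a list of start and end dates per person), the function names and the 60-month window are all assumptions, and real DWP and HMRC records are structured differently.

```python
from datetime import date


def on_benefit(spells, day):
    """Cross-sectional indicator: is the person on benefit on this one day?

    `spells` is a hypothetical list of (start, end) dates, both inclusive.
    """
    return any(start <= day <= end for start, end in spells)


def month_starts(reference, n_months):
    """First day of each of the n_months calendar months before the reference month."""
    year, month = reference.year, reference.month
    starts = []
    for _ in range(n_months):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        starts.append(date(year, month, 1))
    return starts


def share_months_on_benefit(spells, reference, n_months=60):
    """Longitudinal indicator: share of the previous n_months in which the
    person was on benefit on the first day of the month (an assumed rule)."""
    months = month_starts(reference, n_months)
    return sum(on_benefit(spells, m) for m in months) / n_months


# Two invented people, neither on benefit at the reference date.
reference_date = date(2012, 6, 1)
person_a = []                                        # never on benefit
person_b = [(date(2010, 1, 1), date(2011, 12, 31))]  # two years on benefit

for name, spells in [("A", person_a), ("B", person_b)]:
    print(name, on_benefit(spells, reference_date),
          share_months_on_benefit(spells, reference_date))
```

In this invented example the cross-sectional indicator treats the two people as identical, while the longitudinal one separates them because it takes the earlier benefit spell into account.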
I was lucky to join a team of researchers who were involved in the design and analysis of the PSE-UK. It was relatively straightforward for us to access the linked administrative-survey data, because the survey participants had already consented to their survey data being linked to administrative data. In addition, the data linkage itself was carried out internally by DWP analysts, who linked both DWP and HMRC admin records to PSE-UK via the DWP-sponsored Family Resources Survey (FRS) conducted in 2010/11. The FRS is an annual survey which is often used to produce official national statistics on income and poverty. In 2010/11, FRS respondents were asked for permission to be re-contacted for a follow-up survey (PSE-UK 2012).
What I found
My research shows that the longitudinal measures reconstructed from administrative data closely match survey responses and help to improve predictions of poverty risks, compared to measures based on cross-sectional data. This is the first research using admin data linked to poverty data from the Poverty and Social Exclusion Survey. The results are broadly in line with prior research commissioned by DWP and conducted by Stephen McKay, which evaluated the linkage between FRS 2010/11 data and admin records from DWP and HMRC. His results are reported in the 2012 DWP working paper Evaluating approaches to Family Resources Survey data linking. Our research takes this one step further: we test how the longitudinal admin data measures perform when included in a model predicting current poverty risks, in contrast to comparable survey measures and cross-sectional admin measures.
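One simple way to picture this kind of comparison is to ask how well each indicator separates respondents who are poor in the survey from those who are not. The sketch below uses the area under the ROC curve (AUC) for that purpose. This is a stand-in chosen for brevity, not the modelling approach used in the paper, and the six respondents and all their values are invented for illustration.

```python
def auc(scores, labels):
    """Probability that a randomly chosen poor respondent (label 1) has a
    higher score than a randomly chosen non-poor one (label 0).
    Ties count as half a win."""
    poor = [s for s, y in zip(scores, labels) if y == 1]
    not_poor = [s for s, y in zip(scores, labels) if y == 0]
    if not poor or not not_poor:
        raise ValueError("need at least one poor and one non-poor respondent")
    wins = sum((p > n) + 0.5 * (p == n) for p in poor for n in not_poor)
    return wins / (len(poor) * len(not_poor))


# Invented toy data: survey poverty status and two admin-based indicators.
poor_in_survey = [1, 1, 1, 0, 0, 0]
on_benefit_now = [1, 0, 0, 0, 0, 0]                 # cross-sectional, 0 or 1
share_on_benefit = [0.8, 0.5, 0.3, 0.1, 0.0, 0.4]   # longitudinal, 0 to 1

print("cross-sectional AUC:", auc(on_benefit_now, poor_in_survey))
print("longitudinal AUC:", auc(share_on_benefit, poor_in_survey))
```

An AUC of 0.5 means an indicator does no better than chance and 1.0 means it separates the two groups perfectly, so a higher value for the longitudinal indicator would point in the same direction as the finding described above.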
Working with limitations of linked administrative data
There are of course some limitations when using administrative data. For example, I had to work with a limited set of welfare benefits, mainly Jobseeker’s Allowance and Income Support, and the earnings data covered only paid employment, not self-employment. Another important issue is that unemployment as captured by administrative data may differ from unemployment as reported by survey respondents. We need to be aware of these discrepancies when making sense of our results.
Despite these limitations, we can still show that there are clear gains from moving from cross-sectional to longitudinal administrative measures, and from a single domain (welfare benefits) to two domains, where we add earnings to our models.
This blog was originally published on the Scottish Centre for Administrative Data Research (SCADR) website.