Maximising the value of administrative data
On Thursday 4 July 2019, experts from government departments, public bodies, NGOs and academia came together to discuss how to maximise the value of administrative data at a roundtable organised by ADR UK (Administrative Data Research UK) and the Institute for Government.
The event began with four opening presentations which highlighted good practice in making use of administrative data. The roundtable discussion that followed covered the risks and benefits of increasing researchers’ access to administrative data, how to enable that access securely, and how advocates can make the case to policy makers for maximising the value of administrative data. The key themes covered during the event are summarised below.
1. Reframing the debate
Often the debate around the use of government data is framed around the ethical risks of increasing access to data. Participants agreed it is vital to acknowledge that all government data is at least potentially identifiable, even after personal identifiers have been removed, which means sharing data in the wrong way can lead to breaches of privacy and put vulnerable people at risk. Officials outlined ‘worst possible scenarios’ of information getting into the wrong hands, which data owners must keep in mind. However, these risks must be balanced against the risks of not increasing access to data within safe settings: deaths that could have been prevented; abusive relationships that could have been identified and acted on; lives that could have been improved. As well as considering the risks of sharing data, we should ask: ‘is it ethical not to use data to improve people’s lives when we can?’
2. Providing secure access at scale
One of the key challenges is providing secure access to data at scale. Departments are faced with a large number of requests for data. Often researchers don’t know exactly what they want from the data and ask for ‘all of it’, which is unfeasible in the case of many of the enormous datasets government holds. To address this, government should make ‘standard’ slices of the data available to researchers. These may be sufficient for 80% of requests, leaving the department with more capacity to focus on the 20% which require bespoke solutions. Researchers should also ask themselves whether they need individual-level data or could use data aggregated up to a local area level. In addition, departments could do more to inform researchers about what data they have: many don’t even have inventories of the data they hold.
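As a rough illustration of the point about aggregated data, the sketch below collapses individual-level rows into one summary row per local area, so that no individual record needs to leave the department. The field names (`person_id`, `area`, `weekly_income`) and the sample values are hypothetical and chosen only for illustration; they do not describe any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level records; names and values are illustrative only.
records = [
    {"person_id": 1, "area": "A", "weekly_income": 310.0},
    {"person_id": 2, "area": "A", "weekly_income": 290.0},
    {"person_id": 3, "area": "B", "weekly_income": 450.0},
]


def aggregate_by_area(rows, area_key="area", value_key="weekly_income"):
    """Collapse individual-level rows into one summary row per local area."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[area_key]].append(row[value_key])
    # Only counts and means are returned; person-level identifiers are dropped.
    return {
        area: {"n": len(values), "mean": mean(values)}
        for area, values in sorted(grouped.items())
    }


area_summary = aggregate_by_area(records)
```

A researcher whose question can be answered from `area_summary` does not need the individual-level extract at all, which is the kind of request a ‘standard’ slice might serve.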
3. Understanding what is in the data
Many administrative datasets have vast numbers of variables, and it can be hard to identify the officials who understand how the data was originally entered and thus what information is really conveyed. It is, however, crucial that researchers understand this to ensure they use the data appropriately. Data collected for operational purposes may not always capture the information that is of most interest for policy research but, provided researchers understand the nuances and limitations, they can often find ways around these using statistical methods, though the right approach may be different for each research question. Understanding and documenting what the administrative data contains is a challenge, but it is vital to ensure that the data is used in a sensible way.
4. Linking datasets
As one participant highlighted, a challenge in linking administrative data in the UK is that residents do not have a single unique identifier used across datasets. National Insurance numbers, for example, are only used for tax and benefit records, while health records are typically identified by a person’s NHS number. Furthermore, some research questions require identifying people who are in the same family, whereas many administrative datasets do not record these family links. Linking therefore depends on identifying the same individuals in each dataset and working out how individuals relate to one another. There are several possible approaches to overcoming this challenge. For some research questions, it may be sufficient to aggregate data within a local area, while for other applications it may be important to precisely identify the same individuals or people in the same household or family. It will be important to ensure that lessons are shared about the most successful approaches to linking different datasets, but also that there is scope for matching in different ways if a new question requires a different approach. Depending on the research question, researchers may have different appetites for higher- or lower-quality data matches. It was suggested that linked datasets should contain information on the quality of each match, allowing individual researchers to select their own sub-sample.
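The suggestion that linked datasets should carry match-quality information can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any real linkage system: the `LinkedRecord` type, the identifier formats and the 0 to 1 `match_score` scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedRecord:
    tax_id: str         # hypothetical identifier from a tax/benefit dataset
    health_id: str      # hypothetical identifier from a health dataset
    match_score: float  # assumed 0-1 confidence that both IDs are one person


# Illustrative linked records; the values are made up.
linked = [
    LinkedRecord("T1", "H1", 0.99),
    LinkedRecord("T2", "H7", 0.82),
    LinkedRecord("T3", "H4", 0.55),
]


def select_subsample(records, min_score):
    """Keep only links whose match score meets the researcher's threshold."""
    return [r for r in records if r.match_score >= min_score]


# One question may need near-certain links; another may tolerate weaker
# matches in exchange for a larger sample.
strict = select_subsample(linked, min_score=0.95)
lenient = select_subsample(linked, min_score=0.80)
```

Because the score travels with each linked record, the threshold is chosen by the researcher for each question rather than fixed once by whoever performed the linkage.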
5. Learning from best practice
There are a large number of barriers to increasing data access, including legal frameworks, political opposition and resourcing challenges. But much can be learned from successful examples. One major government programme managed to secure access to data from multiple key departments and agencies, but only after two years of work, during which time the programme team were the heaviest users of their department’s legal team. The work was possible because they secured buy-in from senior colleagues early on, and carried out detailed assessments to ensure the data was of sufficient quality throughout, supported by external academic experts. The team needed patience throughout to surmount legal and data protection hurdles. However, the experience made their next administrative data project much quicker, and other departments and agencies could learn from it. Central government could also learn from the devolved administrations, which have made faster progress in some areas.
6. Persuading ministers
Ministers are often nervous about increasing access to data because of concerns around data security and interpretation. Officials and researchers have to be aware of and receptive to these concerns when they attempt to secure political buy-in. But they should also make the case that using accredited safe settings to widen data access can de-risk data sharing. Also, identifying if a policy hasn’t worked will enable the department to adapt its approach and achieve its goals more quickly. Linking administrative data could be the quickest way of enabling officials to provide advice on the policies that will work. In New Zealand, officials won cross-party support for an integrated data system by demonstrating how the investment would improve prevention, enabling them to appeal to the centre left in social justice terms and the centre right in efficiency terms.
7. Clarifying legal constraints
Officials have to interpret a range of laws relating to data access, such as the Digital Economy Act and the General Data Protection Regulation (GDPR), as well as legislation specific to individual departments. However, there is often confusion about how these should be interpreted and applied. For instance, the Health and Social Care Act has been interpreted differently in England and Wales, which has led to difficulties in accessing health data in England for purposes not readily categorised as medical research. Local authorities are quite risk-averse in light of GDPR, even though in most cases the only change is the need to communicate what they are doing with people’s data. There is a case for better training around data access legislation, given a lack of understanding in this area is often a blocker for delivering research and insights using administrative data. Some organisations have invested a lot of time and resources into understanding the legal challenges, but these insights need to be shared and training made more widely available.
8. Educating researchers
Many researchers also feel that they don’t properly understand the risks involved in using administrative data, which affects the way they talk to officials and the way they use data if they have access to it. They need training on these risks, but at the moment such training is patchy and it’s not clear who’s responsible for providing it.
9. What next?
Advocates of using administrative data better often have positive conversations but they need to ensure they are not just conversations and instead build collaboration into the way they work. ADR UK can help with that. As a group of practitioners, attendees agreed they needed to work out who it is they need to persuade, what the most effective strategies for communicating are, and how they can share lessons more systematically.