WDS, or WDS? Or WDS?: the challenges and benefits of research publication tracking

Written by Alex Hulkes, 17 September 2020

As someone who works with data (in ESRC, where I lead our Insights team as part of the collective UKRI analytical effort) I can very much get behind the aims of ADR UK.

ADR UK is a significant and high-profile initiative, supported by UKRI through ESRC with money from the National Productivity Investment Fund. This means that we as a funder need to understand the overall results of the investment and its benefits to the UK. There will of course be an extensive and rigorous evaluation of the programme, but in the meantime I’d like to share a brief analysis which uses publications gathered from the Dimensions bibliometric database to paint a picture of who is using administrative data of the kind accessed through ADR UK, and what they are doing with it.

I say paint a picture because it is currently practically impossible to identify reliably, through bibliometric means, all research outputs that use named data sources. (I should emphasise that this is definitely not a problem with Dimensions data specifically.) There are too many ways in which a publication which does use administrative data might not be identified in a bibliometric database as doing so, and even more ways in which something that’s nothing to do with administrative data might be picked up in error.

For example, the Welsh Demographic Service can easily be searched for in the titles or abstracts of journal articles, and any publications which cite it should be pretty visible. But what if an author, quite reasonably, abbreviates their data source to the WDS? That’s fine: just search for WDS too. I did that. But be prepared – as I wasn’t until I had to – to have to remove lots of publications about ‘White Dwarf Stars’ and ‘Water Distribution Systems’ from your dataset before you analyse it.
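To make the clean-up step concrete, here is a minimal sketch in Python of the kind of filter involved. It is not the method used for the analysis and it does not call the Dimensions API: the function name, the phrase lists and the sample records are all invented for illustration, and it assumes you already have titles and abstracts as plain text.

```python
import re

# Full name of the data source, matched case-insensitively.
FULL_NAME = "welsh demographic service"

# The acronym as a whole word, so "WDS" matches but "crowds" does not.
ACRONYM = re.compile(r"\bWDS\b")

# Other expansions of "WDS" mentioned in the post. This list is an
# assumption; a real one would be longer and built up by inspection.
FALSE_POSITIVE_PHRASES = ("white dwarf star", "water distribution system")


def mentions_dataset(text: str) -> bool:
    """Return True if the text plausibly refers to the Welsh Demographic Service."""
    lowered = text.lower()
    if FULL_NAME in lowered:
        return True  # the full name is unambiguous
    if not ACRONYM.search(text):
        return False  # neither the name nor the acronym appears
    # Acronym only: keep it unless a known competing expansion is present.
    return not any(phrase in lowered for phrase in FALSE_POSITIVE_PHRASES)


# Invented records, purely for illustration.
records = [
    {"title": "Hypothetical record A", "abstract": "Cohort defined using the Welsh Demographic Service."},
    {"title": "Hypothetical record B", "abstract": "Cooling curves of white dwarf stars (WDS)."},
    {"title": "Hypothetical record C", "abstract": "Leak detection in water distribution systems (WDS)."},
    {"title": "Hypothetical record D", "abstract": "Residency history was taken from the WDS."},
]

kept = [r["title"] for r in records if mentions_dataset(r["title"] + " " + r["abstract"])]
print(kept)
```

A filter like this only removes the false positives you already know about. An acronym-only hit with some other, unlisted meaning would still get through, which is why the results remain illustrative rather than definitive.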

This fuzziness places limits on how complete the data, from any source, can ever be and therefore how detailed and reliable the description of the research landscape can be. That’s not to say that it would be impossible to do a more rigorous job of this analysis, but that was not the aim. I kept it deliberately short and sweet: illustrative rather than definitive.

The analysis seems to have been worthwhile, even if the resolution is limited and the focus is likely to be a little off. The publications identified suggest that there is a large and vibrant community of researchers from a range of backgrounds using administrative data. There is plenty of international collaboration and a strong focus on health issues, which suggests a good potential return to the taxpayers who ultimately fund much of the work. And it looks like the data is of much more than purely academic interest.

It would be interesting in future to track these features of the landscape, and perhaps to pin them down with a bit more rigour in an evaluation proper. If you’re a researcher using an ADR UK dataset, you can help to evidence the benefits of what you do by making sure that you mention that fact in a clear and consistent way*. Ideally one that doesn’t introduce too much astronomy into the picture.

*ADR UK will be offering best practice guidance to researchers on how to do this soon, potentially involving the use of Digital Object Identifiers (DOIs) – watch this space.

Read the full report.

