Synthetic data: how a shared language will help advance public good research
16 October 2025
A new peer-reviewed article from Emily Oliver, ADR UK synthetic data lead, and academic partners explores a shared language for synthetic data use.
Data that can be highly valuable for public good research can also sometimes be extremely sensitive. Synthetic data is made to mimic real data, without containing any information about real individuals, making it useful for researchers planning research or learning how to use sensitive data.
But currently, no shared standards or governance exist for synthetic data. And efforts to create them are stymied by a lack of agreed terminology with clear definitions.
Reviewing the existing literature
To tackle this, ADR UK teamed up with academic colleagues to search existing literature for definitions that could be more widely adopted. In a peer-reviewed article published this week in the International Journal of Population Data Science (IJPDS), the authors argue that unsettled definitions of key synthetic data terms pose a significant challenge, leading to miscommunication and misunderstandings. This could also slow the routine production and use of synthetic data for publicly beneficial purposes.
Four key terms identified
Through analysis of the existing literature, the team homed in on the use of four key terms, particularly relevant in the context of privacy preservation: synthetic data, utility, utility measures, and fidelity. They suggest definitions for these terms and make recommendations for their future use.
“Getting everyone on the same page when it comes to terminology for synthetic data is an important starting point. From there we can start to think about frameworks for more routine provision, and build trust and confidence amongst providers, users and the public that synthetic data is worthwhile and value for money.”
Bringing more than clarity and building public discourse
Defining terminology goes beyond providing clarity. It plays an important role in how technologies are developed and how policy and legal agendas evolve. It also steers the direction of public discourse – particularly around the safeguarding of personal data, a heavily debated topic.
Next steps
Take a look at the full article for further details: A Review of Synthetic Data Terminology for Privacy Preserving Use Cases (International Journal of Population Data Science)
The authors welcome opinions from readers in response and encourage further debate on the topic: hub@adruk.org