Inside the 2025 Data Collision Datathon: Using synthetic data for innovation and research capacity building

27 October 2025 Written by Dr Stephanie Lee and Prof Adam Chee

Datacise Open Learning is the ADR Wales Training and Capacity Building programme, dedicated to developing skills, advancing knowledge, and fostering collaboration across the administrative data research community. It is delivered as part of ADR UK.

Data Collision: Where Social Meets Health

Data Collision: Where Social Meets Health brought together a diverse community of researchers, analysts, academics, clinicians, managers, and support professionals from both the social and health domains to explore how data can improve young people’s mental health journeys.

Held on 15 to 16 September 2025 at Cardiff University’s Spark (Social Science Research Park), the event preceded the ADR UK Conference 2025 and marked a significant milestone in collaborative, data-driven learning.

Hosted by ADR Wales and Datacise Open Learning, the two-day datathon united social scientists, data scientists, statisticians, health professionals, and other domain specialists from across the social and policy sectors around a shared goal: How could linked data better support young people as they transition from children’s to adult mental health services?

In this way, the event addressed a real-world question with direct relevance to policy and practice.

Connecting minds through data

The 2025 Data Collision Datathon was designed to catalyse innovation through collaboration and learning. Participants worked side by side in a hands-on, peer learning environment, exploring how synthetic data can be used safely to test ideas, strengthen analytical confidence, and build transferable skills.

By connecting people, disciplines, and data, the event embodied the mission of Datacise Open Learning: advancing capacity building through shared experience, responsible experimentation, and the practical use of synthetic data as a tool for learning and discovery.

A safe space to learn and experiment

Unlike traditional research settings, participants worked with synthetic datasets: data that mirrors real-world structures but contains no identifiable information.

The synthetic datasets were developed and delivered through the expertise and technology support of the Secure eResearch Platform (SeRP) team. Their work made it possible for participants to explore complex data linkages safely within a secure, simulated environment, providing a realistic yet risk-free space to learn, experiment, and collaborate.

This approach let teams test analytical methods, explore ideas responsibly, and collaborate freely without the privacy risks and access restrictions that come with live data.

“It was good to get your hands dirty with the data - Jupyter was a great platform for learning.” 

Datathon participant

Key highlights: from synthetic data use to real-world application

This datathon marked several important milestones for the UK’s data research community.

  • Synthetic data used for training: The first event in the UK to apply synthetic datasets for research capacity building, allowing participants to explore data safely while gaining practical analytical and governance skills transferable to real-world contexts. The use of synthetic data within a trusted research environment demonstrated how innovation can be achieved ethically and securely.
  • Collaboration and community: Participants from universities, research organisations, health services, and government came together to co-design solutions that bridged the social and health domains.
  • Hands-on learning: The datathon promoted experiential, peer-led learning supported by expert mentorship, creating a dynamic environment where ideas could be tested, refined, and shared.
  • Real-world relevance: Centred on a real policy question about the transition from children’s to adult mental health services, the event connected analytical practice directly to public service improvement.
  • Building lasting capacity: The event provided a repeatable model for future datathons and training programmes, reinforcing ADR Wales’s and Datacise Open Learning’s commitment, within the wider ADR UK strategy, to developing a skilled, confident, and collaborative research workforce.

A community of diverse minds: multidisciplinary research

The Data Collision Datathon convened a truly multidisciplinary community bridging the social and health research domains.

This mix of experience and perspective was central to the event’s success. Working in cross-sector teams, participants brought complementary skills to the table, blending technical expertise with contextual understanding to tackle a shared research challenge.

Guided by mentors from Swansea University, Cardiff University, King’s College London, the University of Glasgow, and the Saw Swee Hock School of Public Health (National University of Singapore), participants were encouraged to think creatively, share knowledge, and challenge assumptions.

The atmosphere was described as “intense, enjoyable, and stretching with no time for procrastination”, capturing the energy and focus that defined the event.

"We leaned on each other’s strengths, listened carefully, and stayed agile."

Datathon participant

The challenge: understanding mental health transitions

In partnership with Social Care Wales, the datathon set a real-world research challenge:

How does access to mental health services change for young people as they transition from children’s to adult services, and what impact does this have on their outcomes?

Working with synthetic versions of Welsh administrative datasets, including hospital, GP, education, and social care records, participants explored how data linkage can reveal patterns that inform better policy and practice.

Three teams, three approaches

Each of the three teams brought a unique analytical perspective, showcasing the breadth of approaches within the data community, from classical statistics to modern analytical techniques.

Team A: Tracking Depression Related Health Service Use from Adolescence into Adulthood
Members: Alex Jehu, Catherine Millson, Diana Contreras Mojica, Jen Keating, Jonathan Aron, Louisa Roberts

Team A applied statistical methods such as descriptive analysis and linear regression to examine depression-related care transitions. Their approach prioritised clarity, reproducibility, and interpretability, demonstrating that strong statistical reasoning remains fundamental to evidence-based research.

Team B: Primary Care for Mental Health Service Continuation in Looked After Children in Wales
Members: Rabeea’h Aslam, Ioana Filipaș, Angela Kubik, Emel Yorganci, Lowri O’Donovan, Lama Shakhshir

Team B focused on ensuring continuity of primary care support for looked-after children by identifying patterns and key factors influencing service access during the transition to adulthood. Their work stood out for its practical insights, teamwork, and policy relevance, demonstrating how collaborative analysis can generate meaningful understanding for public good.

Team C: Bridging the Gap, An AI-Driven Mental Health Vulnerability Tool
Members: Arturo Lonighi, Caitlyn Donaldson, Ceri Parsons, Linda Kirkpatrick, Nell Warner, Oliver Cumming, Sarah Ledden

Team C developed an innovative prototype called MAD VAT (Multi Agency Dynamic Vulnerability Assessment Tool), a conceptual model designed to identify young people most at risk of falling through the gaps in mental health support. Using machine learning algorithms such as Random Forests and XGBoost, they demonstrated how synthetic data can be used to prototype predictive tools for social good.

Mentors and guidance

Mentorship played a pivotal role in shaping the success of the datathon. Throughout the event, participants received expert guidance from mentors representing leading research, academic, and data institutions.

Drawing on deep expertise in data science, statistics, social research, and health informatics, mentors supported teams in refining their analytical approaches, strengthening methodological rigour, and ensuring responsible use of data.

The mentoring framework encouraged continuous engagement, from early orientation to active feedback rounds, creating an environment where participants could explore ideas freely and gain confidence in applying new techniques.

"All the mentors were fantastic, approachable, knowledgeable, and generous with their time. They kept us focused, inspired, and supported throughout."

Datathon participant 

Mentors included:

  • Assoc. Prof Mengling Feng and Asst. Prof Swapnil Mishra (National University of Singapore)
  • Dr Katy Huxley (Cardiff University)
  • Dr Yijing Li (King’s College London)
  • Prof Adam Chee, Chris Orton, Dr Ting Wang, and Lewis Hotchkiss (Swansea University)

Their collective input ensured the datathon not only advanced technical understanding but also embodied the collaborative spirit that defines Datacise Open Learning.

Judging and recognition

The datathon concluded with an afternoon of presentations and reflection, where teams showcased their findings to a multidisciplinary judging panel comprising representatives from ADR UK, DARE UK, NHS Wales, Social Care Wales, and a member of the public.

Each team was evaluated on criteria including innovation, methodological rigour, collaboration, and relevance to real-world impact.

"It was remarkable to see the diversity of approaches, from robust statistical models to creative data-driven solutions, all developed in under 48 hours."

Judging panellist

After careful deliberation, Team B was named the overall winner for its clear, practical insights into the continuity of mental health care for looked-after children and its strong demonstration of collaborative problem-solving.

Trophies and certificates were presented at the ADR UK Conference 2025 closing ceremony.

The judging process reflected what the datathon stood for: not competition, but community, celebrating how collaboration can transform complex data into meaningful understanding and positive change for society.

Collaboration in action

Beyond coding and analysis, the datathon highlighted the importance of people and partnerships in research.

Participants emphasised the value of collaboration across disciplines, peer learning, and the chance to learn from experienced mentors. Many described it as “a stretching yet rewarding experience” that built confidence, improved teamwork, and strengthened communication skills under time pressure.

Why synthetic data matters

The datathon illustrated how synthetic data can transform training and capacity building. This work aligns closely with ADR UK’s focus on promoting secure, ethical, and innovative uses of administrative data to inform better decisions and deliver public benefit.

By mimicking the structure of real datasets, synthetic data enables researchers to practise data linkage, experiment with analytical methods, and consider ethical implications, all within a controlled, low-risk environment.

This approach builds the confidence and competence required to handle sensitive information responsibly within real trusted research environments.
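
The idea of practising linkage on structure-preserving synthetic records can be illustrated with a small sketch. It is not the method used by the SeRP team or by any datathon team: the field names (person_id, age, source), the contact rates, and the helper functions are all hypothetical, and every value is drawn at random, so the output carries no real-world meaning. It only shows the shape of the exercise: fictional records from two sources are joined on a shared key and then summarised by age group.

```python
import random


def make_synthetic_people(n, seed=0):
    """Create n fictional people with a made-up linkage key and a random age."""
    rng = random.Random(seed)
    return [{"person_id": f"P{i:04d}", "age": rng.randint(14, 21)} for i in range(n)]


def make_synthetic_events(people, source, rate, seed=0):
    """Give each person one contact record from `source` with probability `rate`."""
    rng = random.Random(seed)
    return [
        {"person_id": p["person_id"], "source": source}
        for p in people
        if rng.random() < rate
    ]


def link(people, *event_tables):
    """Join event tables to people on person_id, recording which sources each person appears in."""
    linked = {p["person_id"]: {**p, "sources": set()} for p in people}
    for table in event_tables:
        for row in table:
            linked[row["person_id"]]["sources"].add(row["source"])
    return list(linked.values())


def share_with(records, source):
    """Fraction of records that have at least one contact from `source`."""
    if not records:
        return 0.0
    return sum(source in r["sources"] for r in records) / len(records)


# Hypothetical sources and rates, chosen only for illustration.
people = make_synthetic_people(200, seed=1)
gp = make_synthetic_events(people, "gp", rate=0.6, seed=2)
hospital = make_synthetic_events(people, "hospital", rate=0.2, seed=3)
linked = link(people, gp, hospital)

under_18 = [p for p in linked if p["age"] < 18]
adults = [p for p in linked if p["age"] >= 18]
print(f"GP contact, under 18: {share_with(under_18, 'gp'):.2f}")
print(f"GP contact, 18 and over: {share_with(adults, 'gp'):.2f}")
```

Because the records are random, any difference between the two groups is noise. The value for learners lies in rehearsing the linkage and summary steps before applying them to real data inside a trusted research environment.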

"What struck me was both the scientific rigour and innovative use of coding that all the teams were able to bring to their projects, that were all designed to answer a real policy question, even when working with synthetic data."

Dr Emma Gordon, Director of ADR UK

Key outcomes

The Data Collision Datathon achieved lasting impact across learning, innovation, and collaboration. It strengthened partnerships between health and social research communities, showcased how diverse analytical approaches can enhance understanding, and promoted hands-on, experiential learning supported by expert mentorship.

Its inclusive design showed that diversity of expertise drives creativity and problem-solving, while its collaborative spirit fostered a supportive community committed to advancing responsible, data-driven research and innovation.

Looking ahead

Plans for Data Collision 2026 are already underway. Building on this success, the datathon model will continue to strengthen the UK’s administrative data research capacity through ADR UK-supported initiatives, promoting responsible innovation, cross-sector collaboration, and the ethical use of data for public good.

Acknowledgements

The organisers gratefully acknowledge the support of ADR UK, DARE UK, Social Care Wales, and the judging panel, whose partnership made this event possible.

The development and delivery of the synthetic datasets were made possible through the expertise and technological support of the Secure eResearch Platform (SeRP) team, whose work enabled participants to explore complex data linkages safely within a secure, simulated environment.

Special thanks are also extended to mentors and colleagues from the Saw Swee Hock School of Public Health (National University of Singapore) for their contribution to the datathon programme design and delivery, drawing upon their work with the MIT Critical Data Consortium, an initiative of the Massachusetts Institute of Technology (MIT).

A Post-Datathon Compendium

For a comprehensive account of the datathon, download the full Data Collision Datathon: A Post-Datathon Compendium.
