TDCC-SSH: Synthetic data

Leveraging the potential of sensitive data in SSH research

Synthetic data is data with approximately the same statistical properties as an original dataset, but without its privacy-sensitive information. By making synthetic data available instead of (or prior to) the actual dataset, scientists gain faster and easier access to the information contained in confidential data. In this project, two open-source tools for creating synthetic data are used to unlock existing datasets, including datasets archived at DANS.

Work packages

WP1 (Synthetic data generation tools): creating open-source software (metasyn and DP-CGANS) to produce privacy-friendly and realistic synthetic data.
WP2 (Integrating tools in data repositories): creating a plug-in that enables synthetic data generation in the DANS Data Station SSH, in the ingest pipeline and/or the user interface.
WP3 (Synthetic data use cases in SSH): creating pilot implementations of synthetic data at various partner institutions.
WP4 (Legal and privacy constraints): writing a white paper or publication on how to overcome legal and privacy obstacles when implementing synthetic versions of sensitive data.
WP5 (Outreach and project management): organising events, creating teaching materials, and ensuring the project runs smoothly.

The full proposal is openly available: doi:10.5281/zenodo.15697035
Follow our progress here: github.com/tdcc-synthetic-data

Team

Erik-Jan van Kesteren, Principal investigator, Utrecht University (WP1, WP3, WP5)

Freek Dijkstra, Project manager, SURF (WP4, WP5)

Chang Sun, Assistant professor, Maastricht University (WP1, WP3)

Ricarda Braukmann, Data Station Manager, DANS (WP2)

Team member (TBD), Researcher, Utrecht University (WP4, WP5)

Raoul Schram, Research engineer, Utrecht University (WP1, WP3)

Tim Kok, Project manager, SURF (WP4, WP5)

André Castro, Team Lead Integration Systems Team, DANS (WP2)

Alessandra Polimeno, Software Developer, DANS (WP2)

René van Horik, Research Data Management Specialist, DANS (WP2)

Funding

This project is funded by the TDCC-SSH Challenge call 2024 (grant DOI: 10.61686/KMOUW39663).


Partners

Utrecht University, Maastricht University, DANS, SURF, YOUth, CBS, Firmbackbone, DUO, KB, ODISSEI