Leveraging the potential of sensitive data in SSH research
Synthetic data is data with (more or less) the same properties as an original dataset but without its privacy-sensitive information. By making synthetic data available instead of (or prior to) the actual dataset, scientists gain faster and easier access to confidential data. In this project, two tools for creating synthetic data are used to unlock existing datasets, including datasets archived at DANS.
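To make the idea concrete, here is a minimal sketch of one simple way to produce synthetic data: summarise each column separately, then sample new rows from those summaries. It is not the algorithm used by metasyn or DP-CGANS, and it offers no formal privacy guarantee (rare categories, for example, would still appear). The function names `fit_marginals` and `synthesize` and the example records are hypothetical.

```python
import random
import statistics


def fit_marginals(rows):
    """Summarise each column on its own (assumed approach, for illustration only)."""
    model = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: keep only the mean and standard deviation.
            model[column] = ("numeric", statistics.mean(values), statistics.stdev(values))
        else:
            # Categorical column: keep only how often each value occurs.
            counts = {}
            for v in values:
                counts[v] = counts.get(v, 0) + 1
            model[column] = ("categorical", list(counts), list(counts.values()))
    return model


def synthesize(model, n, seed=0):
    """Draw n new rows from the per-column summaries; no original row is copied."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for column, spec in model.items():
            if spec[0] == "numeric":
                row[column] = rng.gauss(spec[1], spec[2])
            else:
                row[column] = rng.choices(spec[1], weights=spec[2])[0]
        rows.append(row)
    return rows


# Made-up records standing in for a sensitive dataset.
original = [
    {"age": 34, "income": 41000, "region": "north"},
    {"age": 51, "income": 58000, "region": "south"},
    {"age": 27, "income": 32000, "region": "north"},
    {"age": 45, "income": 49500, "region": "south"},
]

model = fit_marginals(original)
synthetic = synthesize(model, 5)
for row in synthetic:
    print(row)
```

Because each column is sampled independently, relationships between columns (such as age and income) are lost. That is the trade-off of this simple approach: less information is disclosed, but the result is less realistic.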
| Work package | Title | Description |
|---|---|---|
| WP1 | Synthetic data generation tools | Creating open source software (metasyn and DP-CGANS) to produce privacy-friendly and realistic synthetic data. |
| WP2 | Integrating tools in data repositories | Creating a plug-in to allow synthetic data generation in DANS Data Station SSH in the ingest pipeline and/or the user interface. |
| WP3 | Synthetic data use-cases in SSH | Creating pilot implementations of synthetic data at various partner institutions. |
| WP4 | Legal and privacy constraints | Creating a whitepaper or publication on how to overcome implementation issues with synthetic versions of sensitive data. |
| WP5 | Outreach and project management | Organising events, creating teaching materials, and ensuring our project runs smoothly. |
The full proposal is openly available: doi:10.5281/zenodo.15697035
Follow our progress here: github.com/tdcc-synthetic-data
| Name | Role | Affiliation |
|---|---|---|
| Erik-Jan van Kesteren | Principal investigator | Utrecht University |
| Freek Dijkstra | Project manager | SURF |
| Chang Sun | Assistant professor | Maastricht University |
| Ricarda Braukmann | Data Station Manager | DANS |
| Team member (TBD) | Researcher | Utrecht University |
| Raoul Schram | Research engineer | Utrecht University |
| Tim Kok | Project manager | SURF |
| André Castro | Team Lead Integration Systems Team | DANS |
| Alessandra Polimeno | Software Developer | DANS |
| René van Horik | Research Data Management Specialist | DANS |
This project was funded by the TDCC-SSH Challenge call 2024 under grant ID DOI:10.61686/KMOUW39663.