Understanding the balance between utility, privacy and the uncertainty associated with synthetic data sets

Sensitive datasets, such as healthcare records or census microdata, are often too inaccessible to be used to full effect. Synthetic data (artificially generated data that replicates the statistical properties of real-world data without containing any identifiable information) offers an alternative. However, synthetic data is poorly understood in two respects: how well it preserves the privacy of the individuals on whom the synthesis is based, and its utility (i.e. how representative the data are of the underlying population).

Last Modified: 5/14/2023
Added on: 6/22/2021

