Distortion Metrics
Introduction
Any privacy treatment applied to a Dataset inherently removes information from the Source data. However, the specifics of the treatment have a major impact on the utility of the treated data. The Privacy Dynamics anonymizer uses a proprietary process designed to minimize the distortion of treated data, and therefore maximize its utility, while achieving privacy targets.
We use Distortion as a proxy for utility. After each treatment, we measure the Distortion of the treated data and include Distortion metrics in the Dataset Report.
Metric Definitions
- Rows Treated: A row is considered "treated" if any field on that row was changed.
- Cells Treated: The overall share of cells (instances of a field on a record) that were changed.
- Distribution of Values: The profile of the values in the selected field, before and after treatment.
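As an illustration of the definitions above, the following is a minimal sketch of how these metrics could be computed from a before/after pair of tables. It is not the Privacy Dynamics implementation: the function names (`distortion_metrics`, `value_distribution`), the list-of-dicts table format, and the sample data are all hypothetical, and it assumes the treated table keeps the same rows, row order, and fields as the source.

```python
from collections import Counter


def distortion_metrics(source, treated):
    """Compare two tables (lists of dicts) with identical shape and row order.

    Hypothetical helper for illustration only.
    """
    if len(source) != len(treated):
        raise ValueError("source and treated must have the same number of rows")

    rows_changed = 0
    cells_changed = 0
    total_cells = 0
    for before, after in zip(source, treated):
        # Count the fields on this row whose value differs after treatment.
        changed = sum(1 for field in before if before[field] != after[field])
        total_cells += len(before)
        cells_changed += changed
        if changed:
            # A row counts as treated if any field on it was changed.
            rows_changed += 1

    return {
        "rows_treated": rows_changed,
        "cells_treated": cells_changed / total_cells if total_cells else 0.0,
    }


def value_distribution(rows, field):
    """Profile of the values in one field: value -> number of occurrences."""
    return Counter(row[field] for row in rows)


# Made-up sample data: only the "age" field of the first two rows changes.
source = [
    {"age": 34, "zip": "97201"},
    {"age": 36, "zip": "97201"},
    {"age": 51, "zip": "97210"},
    {"age": 51, "zip": "97210"},
]
treated = [
    {"age": 35, "zip": "97201"},
    {"age": 35, "zip": "97201"},
    {"age": 51, "zip": "97210"},
    {"age": 51, "zip": "97210"},
]

metrics = distortion_metrics(source, treated)
print(metrics)                              # rows treated and share of cells treated
print(value_distribution(source, "age"))    # profile before treatment
print(value_distribution(treated, "age"))   # profile after treatment
```

In this sample, 2 of the 4 rows are treated and 2 of the 8 cells are changed, so the share of cells treated is 0.25. The two `age` profiles show how the distinct values 34 and 36 collapse into a single value, 35, after treatment.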
The Impact of k
Because Privacy Dynamics uses a group-based micro-aggregation approach, the choice of k affects the Distortion of the Destination dataset. A higher k means larger groups, more privacy, and more distortion. A lower k (down to a minimum of 2) provides less privacy and less distortion, so it may be appropriate for applications that are particularly sensitive to distortion.
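The relationship between k and distortion can be sketched with a generic, single-field micro-aggregation. This is a simplified textbook-style approach, not the proprietary Privacy Dynamics process: it sorts the values, forms consecutive groups of at least k records, and replaces each value with its group mean. The function names (`microaggregate`, `mean_absolute_change`) and the sample ages are hypothetical.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k neighbors.

    Illustrative fixed-size grouping on one numeric field only.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")

    # Indices of the values in ascending order, so groups hold similar values.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # If fewer than k values would remain, fold them into this group
        # so that every group keeps at least k members.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


def mean_absolute_change(before, after):
    """Average absolute difference per value: a simple distortion measure."""
    return sum(abs(b - a) for b, a in zip(before, after)) / len(before)


ages = [21, 23, 30, 34, 45, 47, 60, 64]  # made-up sample data
for k in (2, 4):
    treated = microaggregate(ages, k)
    print(f"k={k}: treated={treated}, "
          f"mean absolute change={mean_absolute_change(ages, treated)}")
```

On this sample, k=2 produces four groups of two and a mean absolute change of 1.5, while k=4 produces two groups of four and a mean absolute change of 6.5. Larger groups pull each value further from its original, which is the trade-off described above.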