File:Estimating the success of re-identifications in incomplete datasets using generative models.pdf
From Wikimedia Commons, the free media repository
Original file (1,239 × 1,629 pixels, file size: 7.01 MB, MIME type: application/pdf, 9 pages)
Summary
Description | English: While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model. |
Date | |
Source | https://www.nature.com/articles/s41467-019-10933-3 |
Author | Luc Rocher (ORCID: orcid.org/0000-0002-9956-1187), Julien M. Hendrickx & Yves-Alexandre de Montjoye |
Licensing
This file is licensed under the Creative Commons Attribution 4.0 International license.
- You are free:
  - to share – to copy, distribute and transmit the work
  - to remix – to adapt the work
- Under the following conditions:
  - attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
File history
 | Date/Time | Dimensions | User | Comment |
---|---|---|---|---|
current | 22:21, 21 April 2021 | 1,239 × 1,629, 9 pages (7.01 MB) | Koavf | Uploaded a work by Luc Rocher (ORCID: orcid.org/0000-0002-9956-1187), Julien M. Hendrickx & Yves-Alexandre de Montjoye from https://www.nature.com/articles/s41467-019-10933-3 with UploadWizard |
Metadata
Field | Value |
---|---|
Short title | Estimating the success of re-identifications in incomplete datasets using generative models |
Image title | Nature Communications, doi:10.1038/s41467-019-10933-3 |
Author | Luc Rocher |
Software used | Springer |
Conversion program | iText® 5.3.5 ©2000-2012 1T3XT BVBA (AGPL-version) |
Encrypted | no |
Page size | 595.276 x 782.362 pts |
Version of PDF format | 1.4 |