I know I'm late to the party, but I heard about differential privacy for the first time this week. The idea is to introduce noise into the data so that it's impossible to find out afterwards who answered what, without damaging the global results.
Let's say you're running a study about drugs and want to ask people about their habits. Instead of asking them directly, you ask them to flip a coin. If they get heads, they answer the question honestly. If it's tails, they flip again: this time, heads means they answer yes, and tails means they answer no.
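Here's a quick sketch of that coin-flip protocol in Python (the function name is just mine, and I'm assuming fair coins):

```python
import random

def randomized_response(truth: bool) -> bool:
    """One respondent's answer under the coin-flip protocol."""
    if random.random() < 0.5:        # first flip: heads -> answer honestly
        return truth
    return random.random() < 0.5     # tails -> second flip: heads = yes, tails = no
```

Whatever a respondent actually does, their recorded answer is random half the time, which is exactly what gives them deniability.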
With this method, it's impossible to know for sure who uses drugs, because about 25% of people answered yes purely because of the coin, so any individual can claim their yes came from the coin. But at a larger scale, you can still subtract the noise and estimate the actual proportion of users in the population.
If 100 people take this test and you get 35 yes answers, you can estimate that 20% of people use drugs: about 25 of the yeses come from the coin, and only the 50 honest answers carry signal, so (35 - 25) / 50 = 0.2.
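That arithmetic generalizes: with fair coins, P(yes) = 0.5 * p_true + 0.25, so you can invert it. A small sketch (again, names are mine):

```python
def estimate_true_rate(answers: list[bool]) -> float:
    """Invert the randomized-response noise.

    P(yes) = 0.5 * p_true + 0.25, so p_true = (P(yes) - 0.25) / 0.5.
    """
    p_yes = sum(answers) / len(answers)
    return (p_yes - 0.25) / 0.5
```

Feeding it 35 yeses out of 100 gives back the 20% from the example above. Note the estimate is only accurate on average; on a sample this small it can easily be off by several points, which is the price you pay for the privacy.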
It seems that Apple uses this technique a lot, and so do Google and, I guess, a few others. While I understand how it works in social studies, it's still unclear to me how the tech and data industries leverage it, or what we really gain from it down the road.