Posterior distribution for randomized responses
Suppose you want to know how many people cheat on their taxes. If you ask them directly, it is likely that some of the cheaters will lie. You can get a more accurate estimate if you ask them indirectly, like this: Ask each person to flip a coin and, without revealing the outcome,
• If they get heads, they report YES.
• If they get tails, they honestly answer the question, “Do you cheat on your taxes?”
If someone says YES, we don’t know whether they actually cheat on their taxes; they might have flipped heads. Knowing this, people might be more willing to answer honestly. Suppose you survey 100 people this way and get 80 YESes and 20 NOs. Based on this data, what is the posterior distribution for the fraction of people who cheat on their taxes? What is the most likely quantity in the posterior distribution?
==
This is an exercise from the book Think Bayes. I cannot figure out how to compute the posterior distribution, so I am reaching out for help.
Solution 1:[1]
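import numpy as np
from empiricaldist import Pmf
from utils import decorate  # plotting helper shipped with the Think Bayes notebooks

# Candidate values for the fraction of cheaters, with a uniform prior
hypos = np.linspace(0, 1, 101)
prior = Pmf(1, hypos)
prior.normalize()
prior.plot(label='prior')
# P(YES | x) = 1/2 + x/2 (heads, or tails and cheats); P(NO | x) = (1 - x)/2
likelihood = {'Y': 0.5 + hypos / 2, 'N': (1 - hypos) / 2}
# Update a copy of the prior once per response
posterior = prior.copy()
dataset = 'Y' * 80 + 'N' * 20
for data in dataset:
    posterior *= likelihood[data]
posterior.normalize()
posterior.plot(label='80 YES, 20 NO')
decorate(xlabel='Proportion of cheaters', ylabel='PMF')
print(posterior.max_prob())  # most likely fraction of cheaters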
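As a cross-check that does not depend on `empiricaldist`, here is a minimal NumPy-only sketch of the same grid update. It assumes a uniform prior and a fair coin, so P(YES | x) = (1 + x)/2 and P(NO | x) = (1 - x)/2, and it works in log space to avoid multiplying 100 small numbers. The variable names and the 101-point grid are illustrative choices, not part of the original answer. Under these assumptions the posterior is proportional to (1 + x)^80 (1 - x)^20, and setting the derivative of its logarithm to zero gives x = (80 - 20)/(80 + 20) = 0.6, which the grid result should match.

```python
import numpy as np

# Grid of candidate cheater fractions (assumed resolution: 101 points)
x = np.linspace(0, 1, 101)
n_yes, n_no = 80, 20

# Log-likelihood under the randomized-response model:
# P(YES | x) = (1 + x) / 2, P(NO | x) = (1 - x) / 2
# log(0) at x = 1 yields -inf, which becomes probability 0 below.
with np.errstate(divide="ignore"):
    log_like = n_yes * np.log((1 + x) / 2) + n_no * np.log((1 - x) / 2)

# Uniform prior, so the posterior is just the normalized likelihood
posterior = np.exp(log_like - log_like.max())
posterior /= posterior.sum()

# Most likely value on the grid, and the closed-form maximizer
map_estimate = x[np.argmax(posterior)]
closed_form = (n_yes - n_no) / (n_yes + n_no)
print(map_estimate, closed_form)
```

The closed form only applies when there are at least as many YESes as NOs; otherwise the maximum sits at the boundary x = 0.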
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Serge |