k-man 26 Report post Posted March 12, 2013 Here is a modified version of Bonanova's puzzle Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box. Your tickets have numbers p_{1}, p_{2}, p_{3}, p_{4} and p_{5} on them. You don't know how many tickets are in the box, but you are asked to estimate how many there are. What estimate of n has the highest likelihood of being correct? Quote Share this post Link to post Share on other sites
0 bushindo 14 Report post Posted March 12, 2013 Here is a modified version of Bonanova's puzzle Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box. Your tickets have numbers p_{1}, p_{2}, p_{3}, p_{4} and p_{5} on them. You don't know how many tickets are in the box, but you are asked to estimate how many there are. What estimate of n has the highest likelihood of being correct? Here's an answer We wish to find P( N | p_{1}, p_{2}, p_{3}, p_{4}, p_{5}). Let's assume that we have a prior distribution P(N). Also, the probabilities of the obtained number given N is P( p_{1}, p_{2}, p_{3}, p_{4}, p_{5} | N ) = 1/( _{N}C_{5}). So, here's the Bayesian derivations So, if we have P(N), the derivation above is straightforward. We are not given P(N), so let's assume that N is uniformly distributed between max( p_{1}, p_{2}, p_{3}, p_{4}, p_{5 }) and infinity. Unlike the previous bonanova puzzle, under this improper prior assumption, the denominator term in the equation above converges. Then P( N | p_{1}, p_{2}, p_{3}, p_{4}, p_{5}) is proportional to 1/( _{N}C_{5}), which implies that we should pick N = max( p_{1}, p_{2}, p_{3}, p_{4}, p_{5 }). Quote Share this post Link to post Share on other sites
0 Krishna Kutty 2 Report post Posted March 13, 2013 the greatest of p1,p2,p3,p4 and p5 is the best guess from what i have learned from the last puzzle. Quote Share this post Link to post Share on other sites
0 k-man 26 Report post Posted March 13, 2013 I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle Quote Share this post Link to post Share on other sites
0 bushindo 14 Report post Posted March 13, 2013 I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle You're right I'm convinced that the answer, as you say, is not the same as bonanova's answer. My previous answer is incorrect, but I have yet to locate the faulty reasoning. Back to the drawing board. Quote Share this post Link to post Share on other sites
0 bushindo 14 Report post Posted March 17, 2013 I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle After some more thoughts about this problem, I think I found my source of confusion and am ready to take another stab at the answer I think the key here is that there are several different estimates of N, depending on what kind of property you want your estimate to have. Under the requirement that our estimate of N has "the highest likelihood of being correct", then I think the previous answer in post 2 (N = max{ p1, p2, p3, p4, p5} ) is still correct. Perhaps an example would make it clear the different types of estimates one can have. Suppose that we have a coin labelled 0 and 1 on the two sides. The coin is biased such that it would land on the 1 face 3/4 of the time. If we want the estimate of the next flip that has the highest likelihood of being right, we would choose 1. However, if we want an estimate of the flip, x, that minimizes the squared error between the flip value and x, we would choose x = .75. I was originally convinced that my previous Bayesian formulation is incorrect, but was unable to find the error in the reasoning. I tried re-deriving the Bayesian answer by conditioning on the summary statistics max{ p1, p2, p3, p4, p5} instead of the set { p1, p2, p3, p4, p5}, and I got the same answer as before. So, apparently that Bayesian formulation is correct. It then occurred to me that we might be thinking of different estimator for N as the solution. From earlier, we found that P( N | { p1, p2, p3, p4, p5} ) is proportional to 1/_{N}C_{5} for N between max{ p1, p2, p3, p4, p5} and infinity. Then it makes sense that the N with the highest likelihood of being correct is N = max{ p1, p2, p3, p4, p5}. This is known as the maximum a posteriori estimate, so N_{max_posteriori} = max{ p1, p2, p3, p4, p5} However, if the OP had asked for the estimate of N such that we minimize the squared difference between N and our estimate (see minimum mean error estimate), then the answer would have been the posterior mean of the function P( N | { p1, p2, p3, p4, p5} ) N_{min_squared_error} = sum( P( N | { p1, p2, p3, p4, p5} ) * N ) for N from max{ p1, p2, p3, p4, p5} and infinity. For instance, let's look at a concrete examle. Let's say we sampled 5 points, and they are (50, 14, 26, 4, 24). Then, N_{max_posteriori} = max{50, 14, 26, 4, 24} = 50 N_{min_squared_error} = sum( P( N | { p1, p2, p3, p4, p5} ) * N ) to infinity = 65.32685. (Note the non-integer nature of this estimate. This guarantees that this guess of N will never be precisely correct since N is an integer) For some posterior distribution (e.g., gaussian), the N_{max_posteriori} and N_{min_squared_error} are the same. But apparently, they are different in this situation. The requirements on the estimate of N in the OP ("the highest likelihood of being correct") points towards the maximum a posteriori estimate as the solution, though. Quote Share this post Link to post Share on other sites
0 plasmid 41 Report post Posted March 17, 2013 I agree with Bushindo's and Krishna's answers on this one. You might be tempted to say that the most likely value for n is twice the mean of the five numbers. It might make intuitive sense that including data from more numbers than just the largest one should give you a better estimate of the true underlying distribution than just considering the largest number. And I think that would be the correct answer if you were JUST given the average of the five numbers. But I don't think that applies if you're given a particular specific set of numbers.As a very small example, suppose you drew three numbers {2, 3, 4} and need to decide whether to say n is more likely to be 4 or 5. There are 1/4 ways to draw that distribution from a set of 4, and 1/10 ways to draw it from a set of 5. If n=4 or n=5 would have been equally likely in the absence of any information about which numbers were drawn, then given that distribution, I think you would have to say that n=4 is now more than twice as likely as n=5. Quote Share this post Link to post Share on other sites
Here is a modified version of Bonanova's puzzle
Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box.
Your tickets have numbers p_{1}, p_{2}, p_{3}, p_{4} and p_{5} on them.
You don't know how many tickets are in the box, but you are asked to estimate how many there are.
What estimate of n has the highest likelihood of being correct?
Share this post
Link to post
Share on other sites