Posted March 12, 2013 Here is a modified version of Bonanova's puzzle Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box. Your tickets have numbers p_{1}, p_{2}, p_{3}, p_{4} and p_{5} on them. You don't know how many tickets are in the box, but you are asked to estimate how many there are. What estimate of n has the highest likelihood of being correct? 0 Share this post Link to post Share on other sites

0 Posted March 12, 2013 Here is a modified version of Bonanova's puzzle Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box. Your tickets have numbers p_{1}, p_{2}, p_{3}, p_{4} and p_{5} on them. You don't know how many tickets are in the box, but you are asked to estimate how many there are. What estimate of n has the highest likelihood of being correct? Here's an answer We wish to find P( N | p_{1}, p_{2}, p_{3}, p_{4}, p_{5}). Let's assume that we have a prior distribution P(N). Also, the probabilities of the obtained number given N is P( p_{1}, p_{2}, p_{3}, p_{4}, p_{5} | N ) = 1/( _{N}C_{5}). So, here's the Bayesian derivations So, if we have P(N), the derivation above is straightforward. We are not given P(N), so let's assume that N is uniformly distributed between max( p_{1}, p_{2}, p_{3}, p_{4}, p_{5 }) and infinity. Unlike the previous bonanova puzzle, under this improper prior assumption, the denominator term in the equation above converges. Then P( N | p_{1}, p_{2}, p_{3}, p_{4}, p_{5}) is proportional to 1/( _{N}C_{5}), which implies that we should pick N = max( p_{1}, p_{2}, p_{3}, p_{4}, p_{5 }). 0 Share this post Link to post Share on other sites

0 Posted March 13, 2013 the greatest of p1,p2,p3,p4 and p5 is the best guess from what i have learned from the last puzzle. 0 Share this post Link to post Share on other sites

0 Posted March 13, 2013 I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle 0 Share this post Link to post Share on other sites

0 Posted March 13, 2013 I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle You're right I'm convinced that the answer, as you say, is not the same as bonanova's answer. My previous answer is incorrect, but I have yet to locate the faulty reasoning. Back to the drawing board. 0 Share this post Link to post Share on other sites

0 Posted March 17, 2013 I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle After some more thoughts about this problem, I think I found my source of confusion and am ready to take another stab at the answer I think the key here is that there are several different estimates of N, depending on what kind of property you want your estimate to have. Under the requirement that our estimate of N has "the highest likelihood of being correct", then I think the previous answer in post 2 (N = max{ p1, p2, p3, p4, p5} ) is still correct. Perhaps an example would make it clear the different types of estimates one can have. Suppose that we have a coin labelled 0 and 1 on the two sides. The coin is biased such that it would land on the 1 face 3/4 of the time. If we want the estimate of the next flip that has the highest likelihood of being right, we would choose 1. However, if we want an estimate of the flip, x, that minimizes the squared error between the flip value and x, we would choose x = .75. I was originally convinced that my previous Bayesian formulation is incorrect, but was unable to find the error in the reasoning. I tried re-deriving the Bayesian answer by conditioning on the summary statistics max{ p1, p2, p3, p4, p5} instead of the set { p1, p2, p3, p4, p5}, and I got the same answer as before. So, apparently that Bayesian formulation is correct. It then occurred to me that we might be thinking of different estimator for N as the solution. From earlier, we found that P( N | { p1, p2, p3, p4, p5} ) is proportional to 1/_{N}C_{5} for N between max{ p1, p2, p3, p4, p5} and infinity. Then it makes sense that the N with the highest likelihood of being correct is N = max{ p1, p2, p3, p4, p5}. This is known as the maximum a posteriori estimate, so N_{max_posteriori} = max{ p1, p2, p3, p4, p5} However, if the OP had asked for the estimate of N such that we minimize the squared difference between N and our estimate (see minimum mean error estimate), then the answer would have been the posterior mean of the function P( N | { p1, p2, p3, p4, p5} ) N_{min_squared_error} = sum( P( N | { p1, p2, p3, p4, p5} ) * N ) for N from max{ p1, p2, p3, p4, p5} and infinity. For instance, let's look at a concrete examle. Let's say we sampled 5 points, and they are (50, 14, 26, 4, 24). Then, N_{max_posteriori} = max{50, 14, 26, 4, 24} = 50 N_{min_squared_error} = sum( P( N | { p1, p2, p3, p4, p5} ) * N ) to infinity = 65.32685. (Note the non-integer nature of this estimate. This guarantees that this guess of N will never be precisely correct since N is an integer) For some posterior distribution (e.g., gaussian), the N_{max_posteriori} and N_{min_squared_error} are the same. But apparently, they are different in this situation. The requirements on the estimate of N in the OP ("the highest likelihood of being correct") points towards the maximum a posteriori estimate as the solution, though. 0 Share this post Link to post Share on other sites

0 Posted March 17, 2013 I agree with Bushindo's and Krishna's answers on this one. You might be tempted to say that the most likely value for n is twice the mean of the five numbers. It might make intuitive sense that including data from more numbers than just the largest one should give you a better estimate of the true underlying distribution than just considering the largest number. And I think that would be the correct answer if you were JUST given the average of the five numbers. But I don't think that applies if you're given a particular specific set of numbers.As a very small example, suppose you drew three numbers {2, 3, 4} and need to decide whether to say n is more likely to be 4 or 5. There are 1/4 ways to draw that distribution from a set of 4, and 1/10 ways to draw it from a set of 5. If n=4 or n=5 would have been equally likely in the absence of any information about which numbers were drawn, then given that distribution, I think you would have to say that n=4 is now more than twice as likely as n=5. 0 Share this post Link to post Share on other sites

Posted

Here is a modified version of Bonanova's puzzle

Suppose

tickets numbered 1-nare in a box, and you drawnfiveof them without putting them back in the box.Your tickets have numbers

on them.p,_{1}p,_{2}p,_{3}pand_{4}p_{5}You don't know how many tickets are in the box, but you are asked to estimate how many there are.

What estimate of

has the highest likelihood of being correct?n## Share this post

## Link to post

## Share on other sites