• 0
Sign in to follow this  
Followers 0

How Many Tickets? (Part 2)

Question

Posted · Report post

Here is a modified version of Bonanova's puzzle

Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box.

Your tickets have numbers p1, p2, p3, p4 and p5 on them.

You don't know how many tickets are in the box, but you are asked to estimate how many there are.

What estimate of n has the highest likelihood of being correct?


0

Share this post


Link to post
Share on other sites

6 answers to this question

  • 0

Posted · Report post

Here is a modified version of Bonanova's puzzle

Suppose n tickets numbered 1-n are in a box, and you draw five of them without putting them back in the box.

Your tickets have numbers p1, p2, p3, p4 and p5 on them.

You don't know how many tickets are in the box, but you are asked to estimate how many there are.

What estimate of n has the highest likelihood of being correct?

Here's an answer

We wish to find P( N | p1, p2, p3, p4, p5). Let's assume that we have a prior distribution P(N). Also, the probabilities of the obtained number given N is P( p1, p2, p3, p4, p5 | N ) = 1/( NC5). So, here's the Bayesian derivations

post-14842-0-19129300-1363116895_thumb.p

So, if we have P(N), the derivation above is straightforward.

We are not given P(N), so let's assume that N is uniformly distributed between max( p1, p2, p3, p4, p5 ) and infinity. Unlike the previous bonanova puzzle, under this improper prior assumption, the denominator term in the equation above converges. Then P( N | p1, p2, p3, p4, p5) is proportional to 1/( NC5), which implies that we should pick N = max( p1, p2, p3, p4, p5 ).

0

Share this post


Link to post
Share on other sites
  • 0

Posted · Report post

the greatest of p1,p2,p3,p4 and p5 is the best guess from what i have learned from the last puzzle.

0

Share this post


Link to post
Share on other sites
  • 0

Posted · Report post

I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle

0

Share this post


Link to post
Share on other sites
  • 0

Posted · Report post

I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle

You're right

I'm convinced that the answer, as you say, is not the same as bonanova's answer. My previous answer is incorrect, but I have yet to locate the faulty reasoning. Back to the drawing board.

0

Share this post


Link to post
Share on other sites
  • 0

Posted · Report post

I wouldn't post this puzzle if the answer was the same as the original Bonanova's puzzle

After some more thoughts about this problem, I think I found my source of confusion and am ready to take another stab at the answer

I think the key here is that there are several different estimates of N, depending on what kind of property you want your estimate to have. Under the requirement that our estimate of N has "the highest likelihood of being correct", then I think the previous answer in post 2 (N = max{ p1, p2, p3, p4, p5} ) is still correct.

Perhaps an example would make it clear the different types of estimates one can have. Suppose that we have a coin labelled 0 and 1 on the two sides. The coin is biased such that it would land on the 1 face 3/4 of the time. If we want the estimate of the next flip that has the highest likelihood of being right, we would choose 1. However, if we want an estimate of the flip, x, that minimizes the squared error between the flip value and x, we would choose x = .75.

I was originally convinced that my previous Bayesian formulation is incorrect, but was unable to find the error in the reasoning. I tried re-deriving the Bayesian answer by conditioning on the summary statistics max{ p1, p2, p3, p4, p5} instead of the set { p1, p2, p3, p4, p5}, and I got the same answer as before. So, apparently that Bayesian formulation is correct.

It then occurred to me that we might be thinking of different estimator for N as the solution. From earlier, we found that

P( N | { p1, p2, p3, p4, p5} ) is proportional to 1/NC5 for N between max{ p1, p2, p3, p4, p5} and infinity.

Then it makes sense that the N with the highest likelihood of being correct is N = max{ p1, p2, p3, p4, p5}. This is known as the maximum a posteriori estimate, so

Nmax_posteriori = max{ p1, p2, p3, p4, p5}

However, if the OP had asked for the estimate of N such that we minimize the squared difference between N and our estimate (see minimum mean error estimate), then the answer would have been the posterior mean of the function P( N | { p1, p2, p3, p4, p5} )

Nmin_squared_error = sum( P( N | { p1, p2, p3, p4, p5} ) * N ) for N from max{ p1, p2, p3, p4, p5} and infinity.

For instance, let's look at a concrete examle. Let's say we sampled 5 points, and they are (50, 14, 26, 4, 24). Then,

Nmax_posteriori = max{50, 14, 26, 4, 24} = 50

Nmin_squared_error = sum( P( N | { p1, p2, p3, p4, p5} ) * N ) to infinity = 65.32685. (Note the non-integer nature of this estimate. This guarantees that this guess of N will never be precisely correct since N is an integer)

For some posterior distribution (e.g., gaussian), the Nmax_posteriori and Nmin_squared_error are the same. But apparently, they are different in this situation. The requirements on the estimate of N in the OP ("the highest likelihood of being correct") points towards the maximum a posteriori estimate as the solution, though.

0

Share this post


Link to post
Share on other sites
  • 0

Posted · Report post

I agree with Bushindo's and Krishna's answers on this one. You might be tempted to say that the most likely value for n is twice the mean of the five numbers. It might make intuitive sense that including data from more numbers than just the largest one should give you a better estimate of the true underlying distribution than just considering the largest number. And I think that would be the correct answer if you were JUST given the average of the five numbers. But I don't think that applies if you're given a particular specific set of numbers.

As a very small example, suppose you drew three numbers {2, 3, 4} and need to decide whether to say n is more likely to be 4 or 5. There are 1/4 ways to draw that distribution from a set of 4, and 1/10 ways to draw it from a set of 5. If n=4 or n=5 would have been equally likely in the absence of any information about which numbers were drawn, then given that distribution, I think you would have to say that n=4 is now more than twice as likely as n=5.

0

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0

  • Recently Browsing   0 members

    No registered users viewing this page.