Colored balls in a bag

BMAD · May 6, 2014

There are N balls in the bag. Each one has a distinct color. You draw one ball randomly from the bag, record its color and put it back. Calculate the probability that after T draws you've seen M distinct colors.

(T > 0, M > 0, M <= min(T, N) )

m00li · May 6, 2014

Required probability = selecting T balls from an infinite choice of m colors where atleast 1 ball from each of m colors is present/ selecting T balls from an infinite choice of n colors (not all n colors need be present in the selection)

Calculating Denominator = Selecting T balls from an infinite selection of n color balls:

Assume a string of n-1 0s. These 0s divide the string into n sections. We can fill T 1s in these T sections in ^n+T-1C_T ways. Now, assume each section represents one of the n distinct colors. The number of 1s in each section represents the number of balls of that color. So, this representation gives us the number of ways selecting T balls from a set of n color balls where T<=n

Hence, Denominator = ^n+T-1C_T

Now, Numerator can be arrived at as

first selecting m colors = ⁿC_m ways

then selecting 1 ball from each of these m colors = 1 way

then selecting the remaining T-m balls from an infinite selection of these m color (not all m need be present in the selection) = ^m+(T-m)-1C_T-m (refer denominator) = ^T-1C_T-m

Hence required probability = ⁿC_m*^T-1C_T-m/^n+T-1C_T

bonanova · May 9, 2014

There will be M draws of a different color, each with p_i = (N+1-i)/N for i=1 to M.
Then there will be (T-M) draws of any of the M specified colors, each with p = M/N

Sequence does not matter, so take them in the above order.

First M draws: p₁ = N! / N^M (N-M)!

Last T-M draws: p₂ = (M/N)^T-M

Answer: p = p₁p₂ = N! M^T-M / N^T(N-M)!

bonanova · May 10, 2014

M different colors means: exactly M different colors, not: at least M colors, right?

plasmid · May 10, 2014

Bonanova: using that formula with 3 balls being drawn 3 times...

probability of drawing 1 color

3! 1^3-1 / 3³(3-1)! = 6*1 / 27*2 = 6/54

probability of drawing 2 colors

3! 2^3-2 / 3³(3-2)! = 6*2 / 27*1 = 12/27

probability of drawing 3 colors

3! 3^3-3 / 3³(3-3)! = 6*1 / 27*1 = 6/27

Sum of all three of those is less than 1.

I'm having trouble following m00li's logic. I got lost as soon as he started talking about selecting balls from an infinite choice of colors (which was pretty soon).

m00li · May 11, 2014

Bonanova: using that formula with 3 balls being drawn 3 times...
probability of drawing 1 color
3! 1^3-1 / 3³(3-1)! = 6*1 / 27*2 = 6/54
probability of drawing 2 colors
3! 2^3-2 / 3³(3-2)! = 6*2 / 27*1 = 12/27
probability of drawing 3 colors
3! 3^3-3 / 3³(3-3)! = 6*1 / 27*1 = 6/27
Sum of all three of those is less than 1.
I'm having trouble following m00li's logic. I got lost as soon as he started talking about selecting balls from an infinite choice of colors (which was pretty soon).

You are not alone Plasmid. I am having trouble too I must have been high on something that day

My solution is for cases where a 'selection' is being made out of n different colors. I should have calculated permutations and not selections.

plasmid · May 11, 2014

If you number the colors of the balls {1, 2, 3 ... N} then you can consider the T balls that you draw as an ordered list (B
₁, B₂, B₃ … B_T) where the value of each B_X term is the number corresponding to the color pulled on draw number X. So if there were four colored balls {1=red, 2=blue, 3=white, 4=purple} and you had three draws and got (red, white, red), then you would have (B₁=red=1, B₂=white=3, B₃=red=1).

If there are N different colored balls and you draw T balls with replacement, then there are N^T different possible outcomes for the ordered list (B₁, B₂, B₃ ... B_T). The probability that you will draw M different colors is [the number of different ordered lists that include M different colors] divided by [the total number of different possible ordered lists, which is N^T].

The enumerate the different possible (B₁, B₂, B₃ ... B_T) lists with M different colors, I will temporarily lose generality: WITH loss of generality, number the colors that are drawn from color #1 to color #M so the terms in the list (B₁, B₂, B₃ ... B_T) each must take on a value from 1 to M, and each number from 1 to M must appear in the list at least once. To convince you that this does lose generality, consider the case in the first paragraph with N=4 and T=3 – if you number the colors from 1 to M and solve for M=1, then only one color (color #1) can appear in the list and the only solution is (1, 1, 1), but in actuality there are four solutions: (red, red, red,), (blue, blue, blue), (white, white, white), and (purple, purple, purple). In general, if you consider the solution found by renumbering the colors from 1 to M, then to get the real solution you must then multiply by (N choose M) to account for each of the different color combinations that could be used in the set from 1 to M. So the answer will be along the lines of (N choose M) * (ways of arranging M different colors in a list that's T elements long).

The number of ways of making a list that's T elements long using only M=1 colors is easy: that's just 1. In fact, that's too easy... to make things more complicated I'll rewrite that as 1^T.

The number of ways of making a list that's T elements long using M=2 colors can be calculated as (total number of ways of making a list with either M=1 or M=2 colors) minus (total number of ways of making a list with M=1 color for each of those two colors). The first of those two terms is just 2^T, and in general would be M^T. The second term for the case of M=2 would be (number of solutions if you have 1 color) times (number of ways of picking 1 color out of the M colors) = 1^T times (2 choose 1). So the final solution for M=2 is
2^T – (2 choose 1)*1^T

(Remember that I'm writing 1 as 1^T to make things complicated.)

The number of ways of making a list that's T elements long using M=3 can be similarly calculated as (number of ways of making a list with M=1 or M=2 or M=3 colors) minus (number of ways of making a list with M=2 colors) minus (number of ways of making a list with M=1 colors). That gives
3^T – (3 choose 2)*(2^T – (2 choose 1)*1^T) – (3 choose 1)*1^T

These terms are starting to get long, so for simplicity I'll write S_X as the solution for M=X (remembering that we're losing generality by renumbering the colors from 1 to M as I pointed out earlier).
S₁ = 1^T
S₂ = 2^T – (2 choose 1)*S₁
S₃ = 3^T – (3 choose 2)*S₂ – (3 choose 1)*S₁
S_M = M^T – (M choose M-1)*S_M-1 – (M choose M-2)*S_M-2 – … – (M choose 1)*S₁

To account for all different combinations of colors that could be used in the solution set (reversing the loss of generality imposed by renumbering from 1 to M), the final answer would be
(N choose M)*S_M
which is converted to a probability by dividing by N^T.

I cannot come up with a good way of simplifying that to keep the calculation from becoming unwieldy with a bunch of recursion.

#!/usr/bin/perl
use strict;
use warnings;
 
use List::Util qw(min sum);
use Math::Counting ':student';
# If you don't already have List::Util and Math::Counting installed,
#   then type "cpan List::Util" and "cpan Math::Counting" from the
#   command prompt
 
my $iterations=10000;
 
for(my $balls=1; $balls<21; $balls++) {
  for(my $draws=1; $draws<21; $draws++) {
    
    # First, run the simulation
    # @colors is an array where $colors[X] is the number of trials
    #   where X different color balls were drawn
    # @drawn is an array where $drawn[X] is zero if the color X
    #   has not been drawn and is one if color X has been drawn
    my @colors = (0) x ($balls+1);
    for(my $iteration=0; $iteration<$iterations; $iteration++) {
      my @drawn = (0) x ($balls+1);
      for(my $draw=0; $draw<$draws; $draw++) {
        $drawn[int(rand($balls))] = 1;
      }
      $colors[sum(@drawn)]++;
    }
    
    # Next, find out what the calculated answer would be
    # S(X) and M are as they are explained in the answer on brainden
    my @s = (0);
    for(my $m=1; $m<$balls+1; $m++) {
      $s[$m] = $m**$draws;
      for(my $iteration=1; $iteration<$m; $iteration++) {
        $s[$m] -= combination($m, $iteration)*$s[$iteration];
      }
    }
    
    # Last, print the results
    print "Balls (N): $balls, Draws (T): $draws\n";
    for(my $color=1; $color < min($balls, $draws)+1; $color++) {
      print "Simulated  probability of drawing $color colors (M): " . ($colors[$color] / $iterations) . "\n";
      print "Calcualted probability of drawing $color colors (M): " . (combination($balls, $color)*$s[$color]/($balls**$draws)) . "\n";
    }
    print "\n";
  }
}

bonanova · May 12, 2014

My formula made such sense to me (this is often the case until someone explains the logic that I overlook) that I posted it. Then, embarrassingly, it did not agree with simulations.

The second part of the probability (after M colors appeared in M draws)
appeared to be incorrect by a factor of 2. I've been chasing that factor of 2
for several days, but only part time owing to preparations for a choral
performance this weekend. More, perhaps, to follow this week.

plasmid, your post #7 may have found it, since you have agreement with simulation.
I'll read it after I've given my approach another shot.

FWIW here is my simulation of 10000 cases for N = 10.
The 10 columns are for M = 1 ... 10.
The 20 rows are for T = 1 ... 20
The table entries are percent, rounded to nearest integer.

My equations agree for the major diagonal.

But they underestimate the probabilities elsewhere.

10 MNTHIST 10000
100 0 0 0 0 0 0 0 0 0
10 90 0 0 0 0 0 0 0 0
1 27 72 0 0 0 0 0 0 0
0 6 43 50 0 0 0 0 0 0
0 1 18 51 30 0 0 0 0 0
0 0 6 34 45 15 0 0 0 0
0 0 2 18 43 31 6 0 0 0
0 0 1 8 33 40 17 2 0 0
0 0 0 4 21 41 28 6 0 0
0 0 0 2 12 36 35 14 1 0
0 0 0 1 7 28 38 22 4 0
0 0 0 0 4 20 38 29 8 1
0 0 0 0 2 15 34 35 13 1
0 0 0 0 1 10 30 38 19 3
0 0 0 0 1 6 24 39 25 5
0 0 0 0 0 4 20 39 30 7
0 0 0 0 0 3 16 37 35 10
0 0 0 0 0 2 12 34 39 13
0 0 0 0 0 1 9 32 42 17
0 0 0 0 0 1 6 28 44 22

m00li · May 25, 2014

More recursions:

Let f(t,m) denote the number of ways of arranging m colored balls in t slots, where t >= m and supply of each m coloured ball is infinite. Then final answer is ⁿC_mf(t,m) / n^t

Here are the different recursions I have come up with for f(t,m) (alas, no direct formula in m,n)

1) m^t - ^mC₁f(t,1) - ^mC₂f(t,2) - ^mC₃f(t,3) .... - ^mC_m-1f(t,m-1)

2) ^tC₁f(t-1,m-1) + ^tC₂f(t-2,m-1) + ^tC₃f(t-3,m-1) + ... + ^tC_t-m+1f(m-1,m-1)

3) mf(t-1,m-1) + m²f(t-2,m-1) + m³f(t-3,m-1) + ... + m^t-m+1f(m-1,m-1)

4) m( f(t-1,m) + f(t-1,m-1) )

where f(t,1) = 1, f(t,2) = 2^t-2, and f(t,t) = t!

bonanova · May 25, 2014

So, I wonder. Is there no closed form solution?

m00li · May 25, 2014

More recursions:

Let f(t,m) denote the number of ways of arranging m colored balls in t slots, where t >= m and supply of each m coloured ball is infinite. Then final answer is ⁿC_mf(t,m) / n^t

Here are the different recursions I have come up with for f(t,m) (alas, no direct formula in m,n)

1) m^t - ^mC₁f(t,1) - ^mC₂f(t,2) - ^mC₃f(t,3) .... - ^mC_m-1f(t,m-1)

2) ^tC₁f(t-1,m-1) + ^tC₂f(t-2,m-1) + ^tC₃f(t-3,m-1) + ... + ^tC_t-m+1f(m-1,m-1)

3) mf(t-1,m-1) + m²f(t-2,m-1) + m³f(t-3,m-1) + ... + m^t-m+1f(m-1,m-1)

4) m( f(t-1,m) + f(t-1,m-1) )

where f(t,1) = 1, f(t,2) = 2^t-2, and f(t,t) = t!

And finally one kind of non recursive form:

5) f(t,m) = m^t - ^mC₁(m-1)^t + ^mC₂(m-2)^t - ^mC₃(m-3)^t + ....(-1)^m-1(^mC_m-1) (This one gives the AHA insight into the solution)

Edited May 25, 2014 by m00li

Sign In

Colored balls in a bag

Question

Link to comment

Share on other sites

10 answers to this question

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members