BrainDen.com - Brain Teasers

Birthday Distribution


Tom Mercer

Question

What are the odds that this group of employees at an American corporation (multiple nationalities and ethnicities, balanced genders) has a naturally distributed set of birthdays?

04/08/13 04/08/13 04/27/13 05/02/13 05/12/13 06/12/13 06/26/13 07/03/13 07/19/13 08/05/13 08/08/13 09/16/13 10/06/13 10/09/13 10/10/13 10/26/13 11/04/13 11/19/13 11/26/13 12/04/13
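As a quick check on the data itself, here is a small sketch that converts the dates above to day-of-year numbers and finds the longest run of consecutive days with no birthday, treating the calendar as circular (Dec 31 wraps to Jan 1). It assumes a non-leap year (2013, as in the dates); the function names are hypothetical, chosen for illustration.

```python
from datetime import date

# Birthdays from the question, as MM/DD/YY; 2013 is a non-leap year.
RAW = ("04/08/13 04/08/13 04/27/13 05/02/13 05/12/13 06/12/13 06/26/13 "
       "07/03/13 07/19/13 08/05/13 08/08/13 09/16/13 10/06/13 10/09/13 "
       "10/10/13 10/26/13 11/04/13 11/19/13 11/26/13 12/04/13")

def day_of_year(mmddyy: str) -> int:
    """Convert 'MM/DD/YY' to a 0-based day index (Jan 1 = 0) in 2013."""
    month, day, _ = (int(part) for part in mmddyy.split("/"))
    return (date(2013, month, day) - date(2013, 1, 1)).days

def longest_empty_run(days: list[int], year_length: int = 365) -> int:
    """Longest run of consecutive days with no birthday, wrapping around the year."""
    occupied = sorted(set(days))  # shared birthdays count once
    runs = [b - a - 1 for a, b in zip(occupied, occupied[1:])]
    # Run from the last birthday of the year, across New Year, to the first one.
    runs.append(occupied[0] + year_length - occupied[-1] - 1)
    return max(runs)

birthdays = [day_of_year(s) for s in RAW.split()]
print(len(birthdays), longest_empty_run(birthdays))
```

The wrap-around run (from 12/04 to 04/08) is the one that matters here.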


5 answers to this question



Sorry if you find this disappointing, but I don't think the question can be answered in the meaningful way you are hoping for, at least not as far as I know.

Statistics can answer certain questions, and there are well-developed tools for questions along the lines of: "I have two groups of patients, one treated with a drug and one treated with placebo. My null hypothesis is that the two groups had the same change in blood pressure after treatment. Assuming the distributions are Gaussian, I'll compute the probability of seeing a difference at least as large as the observed one if both groups' changes were drawn from the same distribution. If that probability is very low (by convention, usually <5%), I'll feel comfortable rejecting the null hypothesis and concluding that the drug had an effect."

Those types of questions can be answered. In this case, if your null hypothesis is that the birthdays were drawn from a uniform distribution, then most of the standard tools built on Gaussian assumptions wouldn't apply. I think you would be stuck saying that, under a uniform distribution, any particular set of birthdays is exactly as likely as any other.

Edit: if you had a very large sample of people, you could perhaps ask the question: "What is the probability that the number of people with a birthday on each day of the year follows a distribution that you would expect to see if birthdays fell on random days?" But this example obviously has too few people for that sort of analysis.
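To make "too few people" concrete, the sketch below runs the usual Pearson chi-square goodness-of-fit calculation on the monthly counts from the question, against a null where every day of a non-leap year is equally likely. It is only an illustration; the month list is transcribed from the dates in the question, and the function name is hypothetical. A common rule of thumb asks for an expected count of at least 5 in every category.

```python
from collections import Counter

# Month (1-12) of each of the 20 birthdays listed in the question.
MONTHS = [4, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 10, 10, 11, 11, 11, 12]
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year

def chi_square_by_month(months: list[int]) -> tuple[float, float]:
    """Return the Pearson chi-square statistic against a uniform-by-day null,
    together with the smallest expected count in any month."""
    n = len(months)
    observed = Counter(months)  # missing months count as 0
    expected = [n * d / 365 for d in DAYS_IN_MONTH]
    stat = sum((observed[m] - e) ** 2 / e
               for m, e in zip(range(1, 13), expected))
    return stat, min(expected)

stat, min_expected = chi_square_by_month(MONTHS)
print(f"chi-square = {stat:.2f}, smallest expected count = {min_expected:.2f}")
```

With 20 people every expected monthly count is below 2, far short of the rule of thumb, so the test's usual approximation is not trustworthy here.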

If you have a different null hypothesis, for example you noticed that there are no birthdays in January-March and want to compute the probability of that, then it would be possible. But then you run into the problems of post-hoc analysis. For example, if you have a group of 11 people and notice that none of them has a birthday in January, you can compute the probability that no one in a group of 11 has a January birthday (about 38%, since $(11/12)^{11} \approx 0.38$) and treat that as a measure of how improbable your distribution is. But that misses the point that with only 11 people there MUST be at least one month without a birthday, so you would be hard pressed to claim that your distribution was improbable.
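Both numbers in the 11-person example can be checked exactly. The sketch below assumes 12 equally likely months (as the example does) and uses exact fractions; the second function applies inclusion-exclusion over the set of empty months. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p_no_birthday_in_given_month(n: int) -> Fraction:
    """P(none of n people is born in one month chosen in advance), equal-length months."""
    return Fraction(11, 12) ** n

def p_some_month_empty(n: int, months: int = 12) -> Fraction:
    """P(at least one month has no birthday), by inclusion-exclusion over empty months."""
    return sum((-1) ** (k + 1) * comb(months, k) * Fraction(months - k, months) ** n
               for k in range(1, months + 1))

print(float(p_no_birthday_in_given_month(11)))  # about 0.38
print(p_some_month_empty(11))                   # exactly 1: 11 people cannot cover 12 months
```

The contrast is the post-hoc problem in miniature: a specific month being empty has a modest probability, while some month being empty is certain.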

Edited by plasmid


OK, to bound the question further: what is the probability that the birthdays of 20 people leave 124 CONSECUTIVE days of the year with no birthday at all (any 124 consecutive days, to avoid the post-hoc errors plasmid describes)?

I'm looking for an explanation of two things:
1. Is this distribution unlikely, and how unlikely?
2. Why or how could this happen (self-selection through interviews, ...)?

I'm baffled. This is the office where I work, and I know it's statistically significant, but I don't know how unlikely it is. I'm also curious what could cause it. Is this a known trait of other offices or social groups?

Gladwell's book (Outliers) describes athletes being selected for size: because youth hockey groups players by calendar year, the relatively older and bigger children get picked early, so a disproportionate share of elite hockey players are born in the first three months of the year. However, I can't think of anything in an office that would select against 4 months of the year, especially since the office is mixed in gender, nationality, and ethnicity.

What's going on here?



Computing the probability that N people all have birthdays outside January-March (those specific months) is straightforward.

The probability that one person has a birthday outside January-March (assuming that being born in any month of the year is equally probable and neglecting days/month) is 0.75. The probability that N people all have birthdays outside January-March is then $0.75^N$. So in this case, $0.75^{20} \approx 0.003$, or about 0.3%. Accounting for days/month and leap years ends up not making a significant difference.
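The arithmetic is easy to reproduce. The sketch below computes the equal-months figure and, assuming birthdays are uniform over days, the versions based on day counts (January-March is 90 of 365 days in a common year and 91 of 366 in a leap year), which supports the point that the correction is small.

```python
N = 20  # people in the office

# Probability that none of N people has a birthday in January-March.
by_month = 0.75 ** N          # 9 of 12 equally likely months
non_leap = (275 / 365) ** N   # 275 of 365 days fall outside Jan-Mar
leap = (275 / 366) ** N       # 275 of 366 days fall outside Jan-Mar

for label, p in [("equal months", by_month), ("non-leap year", non_leap), ("leap year", leap)]:
    print(f"{label:>13}: {p:.4f}")
```

All three land between 0.3% and 0.35%.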

Phil's simulation might be the only reasonable way of answering the case where the "gap" can be anywhere. Solving analytically for the probability that N randomly chosen birthdays leave a largest gap of at least X days seems tricky.
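If an analytic cross-check is wanted, one possible route (a suggestion, not something from this thread) is to treat birthdays as continuous points on a circle and use a classical inclusion-exclusion result for uniform spacings: for n uniform points on a circle of circumference 1, P(largest arc ≥ x) = Σ_k (-1)^(k+1) C(n, k) (1 - kx)^(n-1), summed over k with kx < 1. Mapping "124 empty days" to neighbouring birthdays at least 125 days apart, x = 125/365, is an assumption, and the formula ignores whole-day discreteness and shared birthdays, so treat the result as an approximation to compare with a simulation. The function name is hypothetical.

```python
from math import comb

def p_largest_gap_at_least(n: int, fraction: float) -> float:
    """Continuous approximation: P(the largest arc between neighbouring points
    is >= `fraction`) for n uniform points on a circle of circumference 1."""
    total = 0.0
    k = 1
    # Terms with k * fraction >= 1 are zero, so the sum stops there.
    while k <= n and k * fraction < 1:
        total += (-1) ** (k + 1) * comb(n, k) * (1 - k * fraction) ** (n - 1)
        k += 1
    return total

print(f"{p_largest_gap_at_least(20, 125 / 365):.4f}")
```

As a sanity check, two points always leave an arc of at least half the circle, and the formula returns 1 for n = 2, x = 0.5.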
