Birthday Distribution

Question

Posted

What are the odds that this group of employed people at an American corporation (multiple nationalities, ethnicities, balanced genders) has naturally distributed birthdays?

04/08/13 04/08/13 04/27/13 05/02/13 05/12/13 06/12/13 06/26/13 07/03/13 07/19/13 08/05/13 08/08/13 09/16/13 10/06/13 10/09/13 10/10/13 10/26/13 11/04/13 11/19/13 11/26/13 12/04/13


5 answers to this question

Posted

There are only these 20 people / birthdays. Each date is one person. The year is 2013 only because the LibreOffice spreadsheet where I wrote down the birthdays requires a year. You can ignore the year (13).


Posted (edited)

Sorry if you find this disappointing, but I don't think the question can be answered in any meaningful way that you would be hoping for. At least not as far as I know.

Statistics can answer certain questions, and there are well-developed tools for answering questions along the lines of "I have two groups of patients: those that were treated with a drug and those that were treated with placebo. I have a null hypothesis that the two groups had the same change in blood pressure after treatment. I'll test the probability that the observed changes in blood pressure for the two groups could have been drawn from the same distribution, assuming that the distributions are Gaussian. If that probability is very low (usually <5% by convention), then I'll feel comfortable rejecting the null hypothesis and therefore concluding that the drug had an effect."

Those types of questions can be answered. In this case, if your null hypothesis is that the birthdays were drawn from a uniform distribution, then most of the tools that are based on statistics for Gaussian distributions wouldn't apply. I think you would be stuck having to say that any specific set of birthdays is exactly as likely as any other to have been drawn from a uniform distribution.

Edit: if you had a very large sample of people, you could perhaps ask the question: "What is the probability that the number of people with a birthday on each day of the year follows a distribution that you would expect to see if birthdays fell on random days?" But this example obviously has too few people for that sort of analysis.

If you have a different null hypothesis, like you noticed that there are no birthdays in January-March and want to compute the probability that that would occur, then it would be possible. But then you run into problems associated with post-hoc analysis. For example, if you have a group of 11 people and you notice that none of them has a birthday in January, then you can compute the probability that a group of 11 people would have no one with a January birthday (which I think would be ~38%) and might be tempted to use that number to judge how improbable your distribution with no January birthdays was. But that would miss the point that with a group of 11 people, there MUST be at least one month without a birthday, so you would be hard-pressed to really claim that your distribution was improbable.
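The arithmetic in the 11-person example can be checked with a few lines of Python. This is only a sketch under the same simplification used above (12 equally likely months); the variable names are illustrative.

```python
# Assumption: every month is equally likely (1/12 each), as in the example above.
n_people = 11

# Chance that a specific, pre-chosen month (e.g. January) has no birthdays.
p_no_january = (11 / 12) ** n_people
print(f"P(no January birthdays) = {p_no_january:.3f}")  # 0.384

# 11 people can occupy at most 11 of the 12 months, so at least one month
# is always empty: noticing *some* empty month after the fact is certain.
max_months_covered = min(n_people, 12)
p_some_month_empty = 1.0 if max_months_covered < 12 else None
print(f"P(at least one empty month) = {p_some_month_empty}")
```

The contrast between the two printed values is the post-hoc problem: the first is for a month chosen in advance, the second for a month picked after looking at the data.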

Edited by plasmid

Posted

Ok, to bound the question further: what is the probability that 20 people have no birthdays for 124 CONSECUTIVE days of the year (any 124 consecutive days, to avoid the post-hoc errors plasmid describes)?
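The 124-day figure can be reproduced from the dates in the original post. The sketch below assumes a non-leap year and lets the gap wrap around New Year; the function name `longest_empty_run` is just an illustrative choice.

```python
from datetime import date

# Birthdays from the original post as (month, day). The year is irrelevant,
# so a non-leap year (2013) is assumed for day-of-year arithmetic.
BIRTHDAYS = [
    (4, 8), (4, 8), (4, 27), (5, 2), (5, 12), (6, 12), (6, 26),
    (7, 3), (7, 19), (8, 5), (8, 8), (9, 16), (10, 6), (10, 9),
    (10, 10), (10, 26), (11, 4), (11, 19), (11, 26), (12, 4),
]

def longest_empty_run(month_days, year=2013):
    """Longest run of consecutive birthday-free days, wrapping around New Year."""
    days = sorted({date(year, m, d).timetuple().tm_yday for m, d in month_days})
    days_in_year = date(year, 12, 31).timetuple().tm_yday
    # Circular distance from each birthday to the next distinct one.
    gaps = [
        ((days[(i + 1) % len(days)] - d) % days_in_year) or days_in_year
        for i, d in enumerate(days)
    ]
    return max(gaps) - 1  # empty days strictly between two birthdays

print(longest_empty_run(BIRTHDAYS))  # 124 (5 December through 7 April)
```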

I'm looking for some explanation: 1. Is this distribution unlikely (and how unlikely)? 2. Why/how could this happen (self-selection through interviews...)?

I'm baffled. This is the office where I work, and I know it's statistically significant, but don't know how unlikely this is. I'm also curious what could cause this? Is this a known trait of other offices or social groups?

Gladwell's book talks about athletes getting selected for size, which means almost all NHL players are born during 3 months. However, I can't think of anything in an office that would anti-select for 4 months of the year, especially given that the office is mixed gender and mixed nationality and mixed ethnicity.

What's going on here?


Posted (edited)

Out of 100000 sims, I got 15565 years in which there was a 120-day time span with no birthdays among the 20 people.

So roughly a 15% chance.
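The simulation code was not posted, so the following is only a sketch of one way such a Monte Carlo estimate could be set up, not a reconstruction of the run above. It assumes 365 equally likely days, a gap that may wrap around New Year, and hypothetical function names. The estimate depends on these conventions and on the gap length used (the thread mentions both 120 and 124 days), so its output need not match the figure quoted above.

```python
import random

DAYS_IN_YEAR = 365  # assumption: leap years ignored, all days equally likely

def longest_birthday_free_run(birthdays):
    """Longest run of consecutive empty days on a circular 365-day year."""
    days = sorted(set(birthdays))
    # Circular distance from each occupied day to the next occupied day.
    gaps = [
        ((days[(i + 1) % len(days)] - d) % DAYS_IN_YEAR) or DAYS_IN_YEAR
        for i, d in enumerate(days)
    ]
    return max(gaps) - 1

def estimate_gap_probability(n_people=20, min_gap=124, trials=100_000, seed=0):
    """Fraction of simulated groups with at least min_gap consecutive empty days."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(DAYS_IN_YEAR) for _ in range(n_people)]
        if longest_birthday_free_run(birthdays) >= min_gap:
            hits += 1
    return hits / trials
```

Calling `estimate_gap_probability(min_gap=120)` and `estimate_gap_probability(min_gap=124)` side by side shows how sensitive the answer is to the exact gap length.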

Edited by phil1882

Posted

Getting the probability that a set of N people all have a birthday that's not in January-March (and those specific months) would be straightforward.

The probability that one person has a birthday that isn't within January-March (assuming that being born in any month of the year is equally probable and neglecting days/month) would be 0.75. The probability that N people all have a birthday outside of the January-March range would be 0.75^N. So in this case, 0.75^20 ~= 0.003, or about 0.3%. Accounting for days/month and leap years ends up not making a significant difference.
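As a quick check of the fixed-window calculation, the sketch below compares the equal-months approximation with a day-count version. It assumes a non-leap year, in which January-March covers 90 of 365 days; the variable names are illustrative.

```python
n_people = 20

# Equal-months approximation: 9 of 12 months lie outside January-March.
p_months = 0.75 ** n_people

# Day-count version (assumption: non-leap year, 31 + 28 + 31 = 90 excluded days).
p_days = (275 / 365) ** n_people

print(f"equal months: {p_months:.4f}")  # 0.0032
print(f"day counts:   {p_days:.4f}")    # 0.0035
```

Both values round to roughly 0.3%, consistent with the remark that days per month make little difference.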

Phil's simulation might be the only reasonable way of answering the case where the "gap" can be anywhere. Analytically solving for the probability that if you pick N birthdays you will get a largest gap between birthdays of at least X seems tricky.
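For what it's worth, a closed form is available if the year is idealized as a continuous circle with N independent, uniformly placed points: inclusion-exclusion over the N gaps gives P(largest gap >= x) = sum over k >= 1 of (-1)^(k+1) * C(N, k) * (1 - k*x)^(N-1), keeping only terms with k*x < 1. The sketch below implements that formula. The continuous-circle idealization and the function name are assumptions, and the result is an approximation to the discrete 365-day problem that need not agree with the simulated figure quoted earlier in the thread.

```python
from math import comb

def prob_largest_gap_at_least(n_people, gap_fraction):
    """P(largest circular gap >= gap_fraction) for n_people uniform points
    on a circle of circumference 1 (continuous approximation of the year)."""
    total = 0.0
    k = 1
    while k <= n_people and k * gap_fraction < 1:
        term = comb(n_people, k) * (1 - k * gap_fraction) ** (n_people - 1)
        total += term if k % 2 == 1 else -term
        k += 1
    return total

# 20 people, a gap of 124 days out of 365 (assumed non-leap year).
print(prob_largest_gap_at_least(20, 124 / 365))  # roughly 0.0075 under these assumptions
```

With only two points the largest gap is always at least half the circle, and `prob_largest_gap_at_least(2, 0.5)` returns 1.0 accordingly, which is a handy sanity check on the formula.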
