What are the odds that this group of employed people at an American corporation (multiple nationalities, ethnicities, balanced genders) has naturally distributed birthdays?

Started by Tom Mercer, Sep 25 2013 03:51 AM

5 replies to this topic

Posted 25 September 2013 - 03:51 AM

What are the odds that this group of employed people at an American corporation (multiple nationalities, ethnicities, balanced genders) has naturally distributed birthdays?

Posted 25 September 2013 - 03:53 AM

There are only these 20 people/birthdays; each date is one person. The dates show 2013 only because the LibreOffice spreadsheet where I wrote down the birthdays has to have a year, so you can ignore the year.

Posted 27 September 2013 - 02:36 AM

Sorry if you find this disappointing, but I don't think the question can be answered in any meaningful way that you would be hoping for. At least not as far as I know.

Statistics can answer certain questions, and there are well-developed tools for answering questions along the lines of "I have two groups of patients: those that were treated with a drug and those that were treated with placebo. I have a null hypothesis that the two groups had the same change in blood pressure after treatment. I'll test the probability that the observed changes in blood pressure for the two groups could have been drawn from the same distribution, assuming that the distributions are Gaussian. If that probability is very low (usually <5% by convention), then I'll feel comfortable rejecting the null hypothesis and therefore concluding that the drug had an effect."

Those types of questions can be answered. In this case, if your null hypothesis is that the birthdays were drawn from a uniform distribution, then most of the tools that are based on statistics for Gaussian distributions wouldn't apply. I think you would be stuck having to say that any possible distribution of birthdays would be equally likely to have been drawn from a uniform distribution.

*Edit: if you had a very large sample of people, you could perhaps ask the question: "What is the probability that the number of people with a birthday on each day of the year follows a distribution that you would expect to see if birthdays fell on random days?" But this example obviously has too few people for that sort of analysis.*

If you have a different null hypothesis, like you noticed that there are no birthdays in January-March and want to compute the probability that that would occur, then it would be possible. But then you run into problems associated with post-hoc analysis. For example, if you have a group of 11 people and you notice that none of them have a birthday in January, then you can compute the probability that for any group of 11 people you would have no one with a January birthday (which I think would be ~38%) and therefore claim that your distribution where there are no January birthdays was improbable. But it would miss the point that with a group of 11 people, there MUST be at least one month without a birthday, so you would be hard pressed to really claim that your distribution was improbable.
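To put numbers on the January example, here is a minimal Python sketch. It assumes a 365-day year with every day equally likely, and the function names are made up for illustration. It computes the chance that one month named in advance is empty, then estimates by simulation the chance that some month (any of the 12) is empty.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Assumption: non-leap year, all 365 days equally likely.
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
MONTH_ENDS = list(accumulate(MONTH_LENGTHS))  # running totals: 31, 59, ..., 365


def month_of(day):
    """Map a day of the year (0..364) to a month index (0 = January)."""
    return bisect_right(MONTH_ENDS, day)


def p_specific_month_empty(n_people, month=0):
    """Exact chance that one month chosen in advance has no birthdays."""
    return ((365 - MONTH_LENGTHS[month]) / 365) ** n_people


def p_some_month_empty(n_people, trials=20_000, seed=0):
    """Monte Carlo estimate of the chance that at least one month is empty."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        months_seen = {month_of(rng.randrange(365)) for _ in range(n_people)}
        if len(months_seen) < 12:
            hits += 1
    return hits / trials


for n in (11, 20):
    print(n, "people: P(no January) =", round(p_specific_month_empty(n), 3),
          "| P(some month empty) ~", round(p_some_month_empty(n), 3))
```

For 11 people the second number is exactly 1, since 11 people can cover at most 11 of the 12 months. That is the post-hoc trap: the ~38% only applies if January was picked before looking at the data.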

**Edited by plasmid, 27 September 2013 - 02:47 AM.**

Posted 29 September 2013 - 08:13 PM

Ok, to bound the question further: what is the probability that 20 people have no birthdays for 124 CONSECUTIVE days of the year (any 124 consecutive days, to avoid the post-hoc errors plasmid describes)?

I'm looking for some explanation: 1. Is this distribution unlikely (and how unlikely)? 2. Why/how could this happen? Self-selection through interviews...

I'm baffled. This is the office where I work, and I know it's statistically significant, but don't know how unlikely this is. I'm also curious what could cause this? Is this a known trait of other offices or social groups?

Gladwell's book talks about athletes getting selected for size, which means almost all NHL players are born during 3 months. However, I can't think of anything in an office that would anti-select for 4 months of the year, especially given that the office is mixed gender and mixed nationality and mixed ethnicity.

What's going on here?

Posted 29 September 2013 - 09:55 PM

Spoiler for

**Edited by phil1882, 29 September 2013 - 09:55 PM.**

Posted 30 September 2013 - 01:47 PM

Getting the probability that a set of N people all have a birthday that's not in January-March (and those specific months) would be straightforward.
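For what it's worth, the fixed-window case is just one power. A sketch assuming a non-leap year (January-March = 31 + 28 + 31 = 90 days) and uniformly random birthdays; `prob_all_outside_window` is a hypothetical helper name:

```python
def prob_all_outside_window(n_people, window_days=90, year_days=365):
    """Chance that every one of n_people is born outside one fixed window.

    Assumes each day of the year is equally likely and people are independent.
    """
    return ((year_days - window_days) / year_days) ** n_people


for n in (11, 20):
    print(f"N = {n}: P(no January-March birthdays) = {prob_all_outside_window(n):.4f}")
```

For N = 20 this comes out to roughly 0.35%, but only if the January-March window was chosen before looking at the birthdays.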

Phil's simulation might be the only reasonable way of answering the case where the "gap" can be anywhere. Analytically solving for the probability that if you pick N birthdays you will get a largest gap between birthdays of at least X seems tricky.
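A simulation along those lines might look like the sketch below. This is not Phil's code, just one way to do it, assuming 365 equally likely days, a year that wraps around from December into January, and made-up function names. It draws N birthdays, finds the longest run of consecutive birthday-free days, and counts how often that run is at least X days.

```python
import random


def longest_empty_run(birthdays, year_days=365):
    """Longest stretch of consecutive days with no birthday, wrapping at year end.

    birthdays: iterable of day numbers in 0..year_days-1.
    """
    days = sorted(set(birthdays))
    if not days:
        return year_days
    # Empty days strictly between neighbouring birthdays.
    runs = [later - earlier - 1 for earlier, later in zip(days, days[1:])]
    # Empty days from the last birthday around the year end to the first one.
    runs.append(days[0] + year_days - days[-1] - 1)
    return max(runs)


def estimate_gap_probability(n_people=20, gap_days=124, trials=50_000, seed=0):
    """Monte Carlo estimate of P(longest empty run >= gap_days)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(365) for _ in range(n_people)]
        if longest_empty_run(birthdays) >= gap_days:
            hits += 1
    return hits / trials


print("Estimated P(gap of at least 124 days, 20 people):",
      estimate_gap_probability())
```

For 20 people and a 124-day gap, a simple union bound, 20 × (241/365)^19 ≈ 0.75%, caps the answer, so the estimate should come out a bit under 1%.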
