bonanova Posted June 18, 2021 Report Share Posted June 18, 2021 You are given an exhaustive 5-letter English word list. How would you determine the probability that 7 randomly chosen letters form at least one of those words? Or the expected number of words they form? Quote Link to comment Share on other sites More sharing options...
0 harey Posted June 25, 2021 Report Share Posted June 25, 2021 A table with English words by the set of all combinations (with repetition) of 7 letters. However, even with some tricks like sorting the letters, I do not think my computer would give me the answer overnight. Remains the Monte-Carlo method. However, I am pretty sure it is not the expected answer. A hint? Quote Link to comment Share on other sites More sharing options...
0 Thalia Posted July 26, 2021 Report Share Posted July 26, 2021 Spoiler Can't come up with all the math right now but I'll take a general stab at it. There are 26 letters in the alphabet so 26^5 possible 5 letter sequences. Hopefully the word list was in a word document so you have a word count... You pick 7 letters at random. So 7!/(7-5)!=2520 sequences that may or may not be words. Then you'd determine the probability that after picking 2520 sequences out of 26^5 possibilities, you haven't formed a word. Subtract that off 1 and you have the probability of getting at least one word. As for the expected number of words, divide the number of words in the list by 26^5 to get the percentage of sequences that are words. Take that percentage of 2520. I think... that class was so long ago... Quote Link to comment Share on other sites More sharing options...
0 plasmid Posted August 22, 2021 Report Share Posted August 22, 2021 Building on Thalia's answer... Spoiler The problem of finding the probability of getting a word is tricky. For the sake of simplicity I'll assume you're given a list of words that are culled for any pairs of words that use the same set of letters in a different order. Suppose you were given a list of all 2 letter words so you know the probability of any two random letters forming a word is P2 = [number of 2 letter combinations that are words] / [number of possible random 2 letter combinations] x [number of 2 letter combinations that can be formed from pulling 2 random letters (if you get the letters EM then the word ME counts)] P2 = [length of culled word list] / 262 x 2! (I'm going to neglect correcting it for the possibility of pulling two identical letters where a swap wouldn't generate a new word since that makes the math easier), and you were asked the probability that a random 3 letters would form a word. If you allow a naive approach, you could say the odds that the first two of the three random letters would form a word are P2, and so are the odds that the second and third letters would form a word, and so are the odds that the first and third letters would form a word. So the odds that a random 3 letters would NOT form a word is the product of the odds that no pair will form a word. The odds that the random 3 letters WOULD form a 2 letter word is then P3 = 1 - (1-P2) x (1-P2) x (1-P2), or 1 – (1-P2)3. By that logic, the probability that seven random letters could generate a five letter word would be given by P5 = [length of culled word list] / 265 x 5! P7 = 1 – (1-P5)[7 Choose 5] = 1 – (1-P5)21 The problem with that logic is that it assumes independence of events: that the probability that 5 of the 7 random letters forms a word is not affected by the probability that any other set of 5 of the 7 random letters forms a word. (That's an implicit assumption when you say [the probability that events X and Y both happen] = [the probability that event X happens] times [the probability that event Y happens].) In reality, some letters are more commonly used than others and if you get a Q in the list then all of the probabilities involving that letter drop a lot and if you get an E then all of the probabilities involving that letter increase a lot. So the formula above might give you a rough answer, but if you want a more accurate answer then you would need some way of correcting for that. And I'm not sure how to do it. Perhaps ironically, getting the expected number of words that are formed might be easier. In that case you don't care whether events are independent or not, you can just calculate the total size of the list (not culling for words with the same set of letters in different orders), say the number of expected words formed by a random 5 letters would be E5 = [length of unculled word list] / 265 x 5! and the number of different sets of 5 letters out of 7 random letters is 21 so E7 = 21 x E5 (Maybe you would also need to correct for making the same word with different sets of letters, like not having a random 3 letters of MME count the word ME twice, depending on how you want to handle that scenario.) While it's true that some random 7 letters will give a ton of words and other random 7 letters will give few or none, if you just care about getting the average then that doesn't matter and you don't need to pull your hair out worrying about non-independence of the probabilities like with the first problem. (I'm still cheating by neglecting correcting for the possibility of drawing repeats of the same letter in that 5! term, but that seems like a relatively minor sin for this estimation.) Quote Link to comment Share on other sites More sharing options...
Question
bonanova
You are given an exhaustive 5-letter English word list. How would you determine the probability that 7 randomly chosen letters form at least one of those words? Or the expected number of words they form?
Link to comment
Share on other sites
3 answers to this question
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.