Central limit theorem for binomial distribution

The central limit theorem is widely used in probability and statistics. It states that the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, even if the original variables themselves are not normally distributed (provided they are independent, identically distributed and have finite variance).

Central limit theorem

In probability we often use the De Moivre-Laplace theorem, which is a special case of the $CLT$. It states that the normal distribution may be used as an approximation to the binomial distribution under certain conditions.

For every $n\geq 1$, let $X_{n}\sim B(n,p)$ with $p\in (0,1)$ and $q=1-p$. If $Z\sim N(0,1)$, then for every $x \in \mathbb{R}$ we have:
$$\lim_{n \to \infty} P \Big(\frac{X_{n}-np}{\sqrt{npq}} \leq x \Big)=P(Z \leq x)=\Phi(x)$$

Proposition. This version of the $CLT$ is often used in the following form: for $b \in \mathbb{R}$ and large $n$, $$P(X_{n} \leq b)=P \Big(\frac{X_{n}-np}{\sqrt{npq}} \leq \frac{b-np}{\sqrt{npq}}\Big) \approx \Phi\Big(\frac{b-np}{\sqrt{npq}}\Big)$$
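The approximation in the proposition can be sketched as a small Python helper. This is only an illustration: the function name `binom_cdf_normal_approx` is made up for this sketch, and $\Phi$ is taken from `statistics.NormalDist` in the Python standard library (Python 3.8 or newer is assumed).

```python
from math import sqrt
from statistics import NormalDist


def binom_cdf_normal_approx(b, n, p):
    """Approximate P(X <= b) for X ~ B(n, p) by Phi((b - n*p) / sqrt(n*p*q)).

    Hypothetical helper for illustration; no continuity correction is applied,
    to match the formula in the proposition.
    """
    q = 1.0 - p
    z = (b - n * p) / sqrt(n * p * q)  # standardized value of b
    return NormalDist().cdf(z)         # Phi(z) for the standard normal


# When b equals the mean n*p, the standardized value is 0 and Phi(0) = 0.5.
half = binom_cdf_normal_approx(50, 100, 0.5)
print(half)
```

Calling the helper with $b=np$ is a quick sanity check, since the standardized value is then $0$ and $\Phi(0)=0.5$.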


Example 1

If every newborn baby has an equal chance of being a boy or a girl, find the probability that among $1000$ newborns there are at most $490$ girls.

Solution 

Looking at the formula above we can see we need $4$ pieces of information.

  • $n$ – the number of “trials”, i.e. number of newborns
  • $p$ – probability of our event, i.e. having a girl
  • $q=1-p$
  • $b$ – our limit,  i.e. maximum number of girls

In the text of this problem we can find all the information we need.

  • $n=1000$
  • $p=\frac{1}{2}, q=\frac{1}{2}$
  • $b=490$

Now we can substitute these values into the $CLT$ formula for binomial variables: $$P(X_{n} \leq b)=P \Big(\frac{X_{n}-np}{\sqrt{npq}} \leq \frac{b-np}{\sqrt{npq}}\Big) \approx \Phi\Big(\frac{b-np}{\sqrt{npq}}\Big)$$

Writing $X$ for $X_{1000}$, this becomes $$P(X \leq 490) \approx \Phi\Big(\frac{490-1000\cdot \frac{1}{2}}{\sqrt{1000\cdot \frac{1}{2} \cdot \frac{1}{2}}}\Big)$$

Evaluating the argument gives $$P(X\leq 490)\approx \Phi(-0.6325)$$

Using the rule $\Phi(-x)=1-\Phi(x)$ we have $$P(X\leq 490)\approx 1-\Phi(0.6325)$$

Rounding the argument to $0.63$, the table of the normal distribution gives $\Phi(0.63)=0.7357$. Finally, we can calculate the wanted probability: $$P(X\leq 490)\approx 1-0.7357=0.2643$$

The probability of having at most $490$ girls among $1000$ newborns is approximately $26.43\%$.
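As a rough check of the calculation above, the following Python sketch evaluates the normal approximation without rounding the argument and compares it with the exact binomial sum. The variable names are chosen for this illustration only, and only the standard library is used.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, b = 1000, 0.5, 490
q = 1 - p

# Standardized value (b - n*p) / sqrt(n*p*q), as in the CLT formula.
z = (b - n * p) / sqrt(n * p * q)

# Normal approximation of P(X <= 490), using the unrounded z.
approx = NormalDist().cdf(z)

# Exact binomial probability; since p = q = 1/2 every outcome has weight 1/2^n,
# so the sum can be done in exact integer arithmetic before dividing.
exact = sum(comb(n, k) for k in range(b + 1)) / 2**n

print(f"z = {z:.4f}, approximation = {approx:.4f}, exact = {exact:.4f}")
```

The approximation computed this way differs slightly in the third decimal from the table-based result, because the table lookup rounds the argument to two decimals. The exact value comes out somewhat higher than the approximation, which is expected since no continuity correction is applied.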


Example 2 

An exam has $40$ questions, each with $4$ possible answers, only one of which is correct. A correct answer earns $15$ points and a wrong answer costs $5$ points. If a student picks the answers at random, what is the probability that he scores at least $120$ points?

Solution 

In this case, we need to keep track of two things: the number of correct answers (and consequently the number of wrong ones) and the total number of points. To simplify the calculations, we’ll use two random variables.

$\bullet$ $X$ – # of correct answers
$\bullet$ $Y$ – # of points in total

$X$ is a binomial variable, $X\sim B(40,\frac{1}{4})$. We want to connect these two variables in one equation. The total number of points is obtained by adding $15$ points for every correct answer and subtracting $5$ points for every wrong one. Since $X$ is the number of correct answers, the number of wrong ones is $40-X$. Putting that into an equation, we have: $$Y=15 \cdot X-5(40-X)$$

This simplifies to $$Y=15X-200+5X=-200+20X$$

Further, the total number of points has to be at least $120$: $$P(Y\geq 120)=P(-200+20X\geq 120)=P(X\geq 16)$$

Since the $CLT$ formula is stated for probabilities of the form $P(X \leq b)$, we rewrite the probability in that form. Because $X$ takes only integer values, $X<16$ is the same as $X\leq 15$: $$P(X\geq 16)=1-P(X < 16)=1-P(X \leq 15)\approx 1-\Phi\Big(\frac{15-40\cdot \frac{1}{4}}{\sqrt{40\cdot \frac{1}{4} \cdot \frac{3}{4}}}\Big)= 1-\Phi(1.8257)\approx 1-0.9664=0.0336$$

Finally, the probability that the student scores at least $120$ points by choosing answers at random is approximately $3.36\%$.
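The steps of this solution can be retraced with a short Python sketch. It derives the required number of correct answers from $Y=20X-200$, evaluates the normal approximation, and, for comparison, sums the exact binomial probabilities. All variable names are assumptions made for this illustration.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist

n, p = 40, 0.25
q = 1 - p

# Y = 15*X - 5*(40 - X) = 20*X - 200, so Y >= 120 means X >= (120 + 200) / 20.
min_correct = ceil((120 + 200) / 20)

# P(X >= min_correct) = 1 - P(X <= min_correct - 1), approximated with Phi.
z = ((min_correct - 1) - n * p) / sqrt(n * p * q)
approx = 1 - NormalDist().cdf(z)

# Exact binomial tail, summed term by term for comparison.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(min_correct, n + 1))

print(f"need at least {min_correct} correct answers")
print(f"z = {z:.4f}, approximation = {approx:.4f}, exact = {exact:.4f}")
```

With only $n=40$ trials and $p$ far from $\frac{1}{2}$, the exact tail probability is noticeably smaller than the approximation, so the result here is best read as a rough estimate.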


Example 3

A roulette wheel has $18$ red pockets, $18$ black pockets and one zero. Suppose we bet on red. If the ball lands in a red pocket we win $\$1$, otherwise we lose $\$1$. What is the probability that we are ahead after $1000$ rounds?

Solution

We play a thousand rounds, therefore $n=1000$. From the rest of the text we can also calculate the needed probabilities:

$\bullet$ probability of winning a dollar is: $p=\frac{18}{37}$. Consequently, $q=\frac{19}{37}$.

Let $X$ be the number of winning rounds. To be ahead after $1000$ rounds we have to win more than $500$ of them, i.e. we need the probability of $X>500$. $$P(X>500)=1-P(X\leq 500)\approx 1-\Phi\Big(\frac{500-1000\cdot \frac{18}{37}}{\sqrt{1000 \cdot \frac{18}{37} \cdot \frac{19}{37}}}\Big)$$

Evaluating the argument of $\Phi$ gives approximately $0.85$, so the probability is approximately $1-\Phi(0.85)$.

From the table of the normal distribution we read the value of $\Phi(0.85)$, which is equal to $0.8023$. Finally, $$1-0.8023=0.1977$$

The probability that we are ahead after $1000$ rounds is approximately $19.77\%$.
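The roulette calculation can be checked with the Python sketch below. It computes the normal approximation without rounding the argument and, as a comparison, the exact binomial tail in integer arithmetic. Variable names are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 1000
p = 18 / 37   # probability of red
q = 19 / 37   # probability of black or zero

# Normal approximation of P(X > 500) = 1 - P(X <= 500).
z = (500 - n * p) / sqrt(n * p * q)
approx = 1 - NormalDist().cdf(z)

# Exact tail: P(X = k) = C(n, k) * 18^k * 19^(n-k) / 37^n.
# The numerator is summed as an exact integer and divided once at the end.
numerator = sum(comb(n, k) * 18**k * 19**(n - k) for k in range(501, n + 1))
exact = numerator / 37**n

print(f"z = {z:.4f}, approximation = {approx:.4f}, exact = {exact:.4f}")
```

Summing integers first avoids multiplying very large binomial coefficients by very small floating-point powers. Both values land a little under $20\%$, with the exact probability slightly below the approximation.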


Example 4

In $1920$ there were $2$ trains for $1000$ passengers going from Chicago to Los Angeles. Let’s say each passenger is equally likely to choose either train. What is the minimal number of seats each train must have in order to be $99\%$ sure that every passenger gets a seat?

Solution

The number of passengers is interpreted as the number of trials, meaning $n=1000$. Let $X$ be the number of people on the first train, so $X\sim B(1000,\frac{1}{2})$. Consequently, the number of people on the second train is equal to $1000-X$.

However, $X$ cannot take just any value. It is certainly at most $1000$, but it also must not exceed the number of seats on the train. Let $k$ denote the number of seats on each train.

The number of people on each train has to be at most $k$, and this has to hold with probability at least $99\%$. $$P(X \leq k, 1000-X \leq k) \geq 0.99$$

Since both inequalities have to hold at the same time, we are looking for their intersection.

Intersection: $X \leq k, 1000-X \leq k  \rightarrow  X \leq k, -X \leq k -1000$ Consequently, $$1000-k \leq X \leq k$$ As in the previous lesson, we can rewrite this probability using the standard normal distribution:

$$P(1000-k \leq X \leq k)\approx\Phi\Big(\frac{k-1000\cdot \frac{1}{2}}{\sqrt{1000\cdot \frac{1}{2}\cdot \frac{1}{2}}}\Big) - \Phi\Big(\frac{1000-k -1000\cdot \frac{1}{2}}{\sqrt{1000\cdot \frac{1}{2}\cdot \frac{1}{2}}}\Big)$$

Simplifying and imposing the required probability, $$\Phi\Big(\frac{k-500}{\sqrt{250}}\Big) - \Phi\Big(\frac{500-k}{\sqrt{250}}\Big) \geq 0.99$$

To solve this, we’ll use another rule of the normal distribution: $\Phi(-x)=1-\Phi(x)$. Since $\frac{500-k}{\sqrt{250}}=-\frac{k-500}{\sqrt{250}}$, the inequality above becomes $$\Phi\Big(\frac{k-500}{\sqrt{250}}\Big) -\Big(1- \Phi\Big(\frac{k-500}{\sqrt{250}}\Big)\Big)\geq 0.99$$

This is equal to $$2\Phi\Big(\frac{k-500}{\sqrt{250}}\Big)-1\geq 0.99$$

Further, $$\Phi\Big(\frac{k-500}{\sqrt{250}}\Big)\geq 0.995$$

Now, in the table of the normal distribution we look for the first value of $\Phi$ that is at least $0.995$. The row and column of that entry give the corresponding argument, which is $2.58$, so $\Phi(2.58)\geq 0.995$. Since $\Phi$ is increasing, it is enough to require $$\frac{k-500}{\sqrt{250}}\geq 2.58$$ Solving this inequality we get $$k\geq 540.79$$ and since $k$ is a whole number, $$k\geq 541$$ The minimal number of required seats per train is $541$.
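The table lookup in this solution can also be done numerically. The Python sketch below uses `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8 or newer is assumed) to find the argument at which $\Phi$ reaches $0.995$, then rounds the resulting bound on $k$ up to a whole number of seats. The helper `coverage` is a hypothetical name introduced only for this check.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, p = 1000, 0.5
mu = n * p                      # expected passengers on the first train
sigma = sqrt(n * p * (1 - p))   # standard deviation, sqrt(250)

std_normal = NormalDist()

# Smallest z with Phi(z) >= 0.995; the printed table rounds this up to 2.58.
z = std_normal.inv_cdf(0.995)

# (k - 500) / sqrt(250) >= z  gives  k >= 500 + z * sqrt(250); round up.
k = ceil(mu + z * sigma)


def coverage(seats):
    """Normal approximation of P(1000 - seats <= X <= seats) = 2*Phi(.) - 1."""
    return 2 * std_normal.cdf((seats - mu) / sigma) - 1


print(f"z = {z:.4f}, k = {k}")
print(f"coverage with {k} seats: {coverage(k):.4f}")
print(f"coverage with {k - 1} seats: {coverage(k - 1):.4f}")
```

The more precise quantile is slightly smaller than the table value $2.58$, but after rounding up it leads to the same number of seats, and one seat fewer falls just short of the $99\%$ requirement under this approximation.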