Quantiles

Quantiles are a very useful tool in statistics. We use them to summarize a group of numbers: instead of examining every value in a long list, we focus on a few representative numbers.

Intuitively speaking, ‘quantiles’ are the values that divide the sample into equal-sized parts.

Definition

Let $y_{1}, \cdots, y_{N}$ be an ordered statistical sequence, i.e. $y_{1} \leq \cdots \leq y_{N}$. For each $j = 1, \cdots, n-1$ let us denote

$$r = Int \left(j\frac{N}{n} + 1\right).$$

Quantiles of order $n$ are values $K_{1}, \cdots, K_{n-1}$ which we calculate using the following formula:

$$K_{j} = \begin{cases} y_{r} & \text{if } j\frac{N}{n} \notin \mathbf{N} \\ \frac{y_{r-1} + y_{r}}{2} & \text{if } j\frac{N}{n} \in \mathbf{N} \end{cases}, \quad j = 1, \cdots, n-1.$$

Quantiles of order $n$ determine $n$ intervals: $\left[y_{1}, K_{1}\right>, \left<K_{1}, K_{2}\right>, \cdots, \left<K_{n-1}, y_{N}\right]$.

Furthermore, each of these intervals contains at most $\frac{100}{n}\%$ of the values of the sequence.
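The definition above can be sketched in Python as follows. This is only a minimal illustration of the formula for $K_{j}$: the function name `quantile` and its parameters are assumptions made for this sketch, not part of any library. Integer division is used to decide whether $j\frac{N}{n}$ is a natural number, which avoids floating-point rounding.

```python
def quantile(data, j, n):
    """Return K_j, the j-th quantile of order n, per the definition above.

    `quantile`, `data`, `j` and `n` are illustrative names (assumptions).
    """
    y = sorted(data)          # the ordered sequence y_1 <= ... <= y_N
    N = len(y)
    if N == 0:
        raise ValueError("data must not be empty")
    if not 1 <= j <= n - 1:
        raise ValueError("j must satisfy 1 <= j <= n - 1")

    # k = Int(j*N/n); rem == 0 exactly when j*N/n is a natural number.
    k, rem = divmod(j * N, n)

    # With r = k + 1 (1-based), y_r is y[k] in 0-based Python indexing.
    if rem == 0:
        return (y[k - 1] + y[k]) / 2   # average of y_{r-1} and y_r
    return y[k]                        # y_r


# Quartiles of a sample of N = 12 values.
sample = [-1, -3, 0, -1, -1, 5, 0, -3, 1, 2, 3, 3]
print([quantile(sample, j, 4) for j in (1, 2, 3)])  # [-1.0, 0.0, 2.5]
```

Note that statistical libraries often use other conventions (for example, linear interpolation between neighbouring values), so their results may differ slightly from this definition.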

Special types of quantiles

The quantile of order $2$ is the median, the quantiles $Q_{1}, Q_{2}, Q_{3}$ of order $4$ are called quartiles, the quantiles of order $10$ are called deciles and the quantiles of order $100$ are called percentiles. In other words, quartiles divide the distribution into $4$ equal parts, deciles into $10$ equal parts and percentiles into $100$ equal parts.

Notice that the second quartile always corresponds to the median of the given set.

$Q_{1}$ is called the lower quartile and $Q_{3}$ the upper quartile.

The lower quartile is the middle value of the lower half.

The upper quartile is the middle value of the upper half.

The deciles are $9$ values which split the data set into $10$ equal-sized parts.

Quartiles are special cases of percentiles. The $25$-th percentile is also called the first quartile. The $50$-th percentile is also called the median. The $75$-th percentile is also called the third quartile.

The percentiles of a distribution are $99$ values which split the data set into $100$ equal-sized parts. A percentile tells us which value is higher than a certain percentage of the rest of the data set. For instance, a number at the "$60$-th percentile" is higher than $60 \%$ of the other given numbers.

Percentiles are often used to report test scores. For example, if you are at the $70$-th percentile, your score was better than that of $70 \%$ of test takers.
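To make the test-score interpretation concrete, here is a small Python sketch that computes the percentage of scores lying strictly below a given score. The function name `percent_below` and the list of scores are hypothetical and serve only as an illustration.

```python
def percent_below(values, x):
    """Percentage of entries in `values` that are strictly less than x.

    `percent_below` is an illustrative helper name (an assumption).
    """
    if not values:
        raise ValueError("values must not be empty")
    return 100 * sum(v < x for v in values) / len(values)


# Hypothetical test scores of ten test takers.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Seven of the ten scores are below 90, so a score of 90 beats 70 %.
print(percent_below(scores, 90))  # 70.0
```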

In addition, here is the list of some other specific quantiles:

Terciles – quantiles of order $3$

Quintiles – quantiles of order $5$

Sextiles – quantiles of order $6$

Septiles – quantiles of order $7$

Octiles – quantiles of order $8$

Duodeciles – quantiles of order $12$

Vigintiles – quantiles of order $20$

Permilles – quantiles of order $1000$

Examples

Example 1: Find the quartiles for the following data: $-1, -3, 0, -1, -1, 5, 0, -3, 1, 2, 3, 3$.

Solution:

First, we need to put the list of given numbers in order: $-3, -3, -1, -1, -1, 0, 0, 1, 2, 3, 3, 5$.

Furthermore, $N = 12, n = 4$.

From $\frac{N}{4} = 3, 2\frac{N}{4} = 6, 3\frac{N}{4} = 9$ we get

$$Q_{1} = \frac{y_{3}+y_{4}}{2} = -1, Q_{2} = M_{e} = \frac{y_{6} + y_{7}}{2} = 0, Q_{3} = \frac{y_{9} + y_{10}}{2} = 2.5.$$

Example 2: Find the deciles $D_{1}, D_{3}$ and $D_{8}$ for the following data: $22, 20, 24, 30, 32, 28, 35$.

Solution:

First, we need to put the list of given numbers in order: $20, 22, 24, 28, 30, 32, 35$.

Furthermore, $N = 7, n = 10$.

From $\frac{N}{10} = \frac{7}{10} = 0.7 \notin \mathbf{N}$ we get $r = Int(0.7) + 1 = 0 + 1 = 1$ and

$$D_{1} = y_{1} = 20.$$

From $3\frac{N}{10} = 3\frac{7}{10} = 2.1 \notin \mathbf{N}$ we get $r = Int(2.1) + 1 = 2 + 1 = 3$ and

$$D_{3} = y_{3} = 24.$$

From $8\frac{N}{10} = 8\frac{7}{10} = 5.6 \notin \mathbf{N}$ we get $r = Int(5.6) + 1 = 5 + 1 = 6$ and

$$D_{8} = y_{6} = 32.$$

Quantiles for grouped data

If the distribution of a numeric variable is grouped in classes, then the $j$-th quantile class of order $n$ is defined as the first class $[L_{1}, L_{2}]$ whose cumulative frequency is greater than or equal to $j \frac{N}{n}$.

If $f_{kvant}$ is the frequency of the $j$-th quantile class, $l$ its size, and $F(L_{1})$ the cumulative frequency (the sum of all frequencies) before the $j$-th quantile class, then the $j$-th quantile is estimated by the value

$$K_{j} = L_{1} + \frac{j\frac{N}{n} - F(L_{1})}{f_{kvant}}l. \ (*)$$
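Formula (*) can be sketched in Python as shown below. The function name `grouped_quantile`, the representation of classes as `(lower, upper, frequency)` tuples and the sample classes are all assumptions made for this illustration; the data are hypothetical.

```python
def grouped_quantile(classes, j, n):
    """Estimate K_j for grouped data using formula (*).

    `classes` is a list of (lower, upper, frequency) tuples ordered by
    lower bound. All names here are illustrative assumptions.
    """
    N = sum(freq for _, _, freq in classes)
    target = j * N / n                 # j * N / n
    cumulative = 0                     # F(L1): frequency before the class
    for lower, upper, freq in classes:
        # The quantile class is the first one whose cumulative
        # frequency reaches the target.
        if freq > 0 and cumulative + freq >= target:
            size = upper - lower       # l, the class size
            return lower + (target - cumulative) / freq * size
        cumulative += freq
    raise ValueError("no quantile class found")


# Hypothetical grouped data: three classes with N = 20 observations.
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
print(grouped_quantile(classes, 2, 4))  # 15.0 (estimated median)
```

The estimate assumes that the values are spread evenly within the quantile class, which is what the linear term in (*) expresses.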

Example 3: Salaries of employees of a certain company are grouped in classes and shown in the table below. Find the first, second and third quartile classes and calculate the quartiles. Interpret the results.

Solution:

Notice that instead of (cumulative) frequencies $f_{i}$ we can observe (cumulative) percentages $p_{i} \cdot 100 = \frac{f_{i}}{N} \cdot 100$. Therefore, multiplying the numerator and the denominator of the fraction in formula (*) by $\frac{100}{N}$, we get

$$K_{j} = L_{1} + \frac{j \frac{N}{n} \cdot \frac{100}{N} - \frac{F(L_{1})}{N} \cdot 100}{\frac{f_{kvant}}{N} \cdot 100}l = L_{1} + \frac{j \frac{100}{n} - \frac{F(L_{1})}{N} \cdot 100}{p_{kvant} \cdot 100}l,$$

where $p_{kvant}$ is the proportion of the $j$-th quantile class of order $n$, i.e. the proportion of the first class whose cumulative percentage is greater than or equal to $j \frac{100}{n}$.

Working with cumulative percentages, from $\frac{100}{4} = 25, 2 \cdot \frac{100}{4} = 50, 3 \cdot \frac{100}{4} = 75$ we see that the first quartile class is $1500.5$ – $1700.5$, the median class is $1700.5$ – $1900.5$ and the third quartile class is $1900.5$ – $2100.5$. Furthermore,

$$Q_{1} = 1500.5 + \frac{25 - 21.7}{16.5} \cdot 200 = 1540.5$$

$$M_{e} = 1700.5 + \frac{50 - 38.2}{23.8} \cdot 200 \approx 1799.7$$

$$Q_{3} = 1900.5 + \frac{75 - 62}{14.9} \cdot 200 \approx 2075.$$

In conclusion, approximately a quarter of the employees have a salary below $1540.5$ €, approximately half of the employees have a salary below $1799.7$ €, while approximately a quarter of the employees have a salary above $2075$ €.
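The arithmetic of Example 3 can be checked with a short Python sketch of the percentage form of the formula. The function name `grouped_quantile_pct` and its parameter names are assumptions made for this illustration; the numbers are the class bounds, class size and percentages used in the solution above.

```python
def grouped_quantile_pct(lower, size, cum_pct_before, class_pct, target_pct):
    """Percentage form of the grouped-data quantile estimate.

    lower          -- L1, lower bound of the quantile class
    size           -- l, the class size
    cum_pct_before -- cumulative percentage before the quantile class
    class_pct      -- percentage of observations in the quantile class
    target_pct     -- j * 100 / n
    (All names are illustrative assumptions.)
    """
    return lower + (target_pct - cum_pct_before) / class_pct * size


q1 = grouped_quantile_pct(1500.5, 200, 21.7, 16.5, 25)
me = grouped_quantile_pct(1700.5, 200, 38.2, 23.8, 50)
q3 = grouped_quantile_pct(1900.5, 200, 62.0, 14.9, 75)

# Rounded to one decimal place, as in the solution.
print(round(q1, 1), round(me, 1), round(q3, 1))
```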