**Statistics** is the study of methods for collecting, analyzing and presenting empirical data. It has two main branches: **descriptive statistics** and **inferential statistics**.

**Descriptive statistics** focuses on describing a sample, without trying to draw conclusions about a larger population.

**Inferential statistics** uses information collected from a sample to draw conclusions about the population from which the sample was selected.

## Population and samples

A **population** is the set or collection under observation. The term often refers to a group of people, but it can also refer to objects, events or observations. A population can be **finite** or **infinite**: it is finite if it is possible to count its members and infinite if it is not.

*Example 1:* The population of students at a specific university.

*Example 2:* If we are studying the height of adult men, the population is the set of heights of all the men in the world.

*Example 3:* All daily minimum temperatures in January for major cities in Europe.

Researchers often want to know certain information about a population, but they rarely have data for every person or thing in it. For that reason, they often select a **sample** of the population.

A **sample** is any subset of the population, i.e. a smaller group of its elements that is used to represent the population. It is important that the sample is **random**, meaning that every element of the population has an equal chance of being selected. The process of selecting a sample from the population is called **sampling**. Researchers develop hypotheses about the population based on the information collected from the sample.

For example, suppose you go to a chocolate bar and see that the owner offers samples of some chocolate products. Of course, you won’t taste every product in the bar, and the owner surely wouldn’t be happy if you tasted everything for free. Therefore, your opinion about the chocolate bar is based on only a few products, i.e. on the samples offered.
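Drawing a random sample, in which every element of the population has an equal chance of being selected, can be sketched in Python with the standard `random` module. This is only an illustration: the population of student ID numbers, the sample size and the helper name `draw_simple_random_sample` are assumptions made for the example.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct elements chosen without replacement.

    Every element of the population is equally likely to be selected.
    The optional seed only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical finite population: ID numbers of 1000 students.
student_ids = list(range(1, 1001))

# Draw a sample of 50 students.
sample = draw_simple_random_sample(student_ids, 50, seed=42)
print(len(sample), "students selected out of", len(student_ids))
```

Passing a seed is a convenience for repeating the same draw; omitting it gives a different sample on each run.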

A **parameter** is a value that describes a characteristic of an entire population, for example the **population mean** or the population **standard deviation**. A parameter depends on all elements of the population.
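For a small finite population, the two parameters mentioned above can be computed directly. The sketch below uses Python's standard `statistics` module; the five heights are made-up values chosen only for illustration. Note that `pstdev` is the population standard deviation, which divides by the number of elements in the population.

```python
import statistics

# Hypothetical population: heights (in cm) of all five members of a small group.
heights_cm = [170, 175, 180, 165, 185]

# Both parameters depend on every element of the population.
population_mean = statistics.mean(heights_cm)
population_std = statistics.pstdev(heights_cm)

print("population mean:", population_mean)
print("population standard deviation:", round(population_std, 3))
```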

## Numeric and categorical variables

A **variable** is any characteristic, quantity or number that we can measure or count. It is also called a **data item**. There are two main types: numeric and categorical.

**Numeric variables** (or **quantitative variables**) can be interval or ratio.

**Interval variables** are variables that assign a **real number** to every member of the population and have an intrinsic order. Furthermore, a **unit of measurement** and a **conventional zero** (a zero point chosen by agreement) are defined. An example is a variable that assigns to each day the air temperature, measured at the same place and at the same time of day.

**Ratio variables** have the same properties as interval variables, except that their zero is not chosen by agreement: a value of zero indicates that the measured property is absent in the observed element. An example is the time a person needs to run $100$ metres.
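One way to see the difference between a zero chosen by agreement and a zero that means the property is absent is to compare ratios of values after a change of units. The sketch below is only an illustration: the temperatures and running times are made-up values, and `celsius_to_fahrenheit` is a helper name introduced here.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval variable: made-up air temperatures on two days, in degrees Celsius.
temp_day1_c, temp_day2_c = 10.0, 20.0
ratio_celsius = temp_day2_c / temp_day1_c
ratio_fahrenheit = (
    celsius_to_fahrenheit(temp_day2_c) / celsius_to_fahrenheit(temp_day1_c)
)

# Ratio variable: made-up 100 m running times of two people, in seconds.
time_a_s, time_b_s = 12.0, 24.0
ratio_seconds = time_b_s / time_a_s
ratio_milliseconds = (time_b_s * 1000) / (time_a_s * 1000)

print("temperature ratios:", ratio_celsius, ratio_fahrenheit)
print("running time ratios:", ratio_seconds, ratio_milliseconds)
```

The temperature ratio depends on the unit, so a statement such as "twice as warm" has no fixed meaning for an interval variable. The ratio of running times is the same in any unit, which is what makes statements such as "twice as long" meaningful for a ratio variable.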

**Categorical variables** (or **qualitative variables**) can be nominal or ordinal.

**Nominal variables** are variables that assign an **attribute** to every member of the population and do not have an intrinsic order. For example, we assign the number $1$ to every person who answers a referendum question with ‘yes’ and the number $0$ otherwise.

**Ordinal variables** are variables that assign a **symbol** or **number** to every member of the population and have an intrinsic order. For example, we can assign to every student at a certain university their grade in some course. Another example is assigning a level of professional qualification to every person in some group.
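The two kinds of categorical variables can be sketched in Python as follows. The referendum answers, the qualification levels and their order are hypothetical values chosen for illustration, and the names `LEVELS` and `rank` are introduced only for this example.

```python
# Nominal variable: referendum answers coded as 1 for 'yes' and 0 otherwise.
# The codes are only labels; their numeric order carries no meaning.
answers = ["yes", "no", "yes", "yes"]
codes = [1 if answer == "yes" else 0 for answer in answers]

# Ordinal variable: hypothetical qualification levels, listed from lowest to highest.
LEVELS = ["primary", "secondary", "bachelor", "master"]
rank = {level: position for position, level in enumerate(LEVELS)}

# Because the levels have an intrinsic order, sorting people by level is meaningful.
qualifications = ["master", "primary", "bachelor"]
sorted_qualifications = sorted(qualifications, key=lambda level: rank[level])

print(codes)
print(sorted_qualifications)
```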

## Discrete and continuous variables

If a numeric variable can take on any value from some interval $\left<a, b\right> \subseteq \mathbf{R}$ for $a, b \in \mathbf{R}, a<b$, it is called a **continuous variable**. Otherwise, it is called a **discrete variable**.

*Example 4:* The height of a person is a continuous variable defined on the population of all people who have been born or who will be born.

*Example 5:* Suppose we flip a coin and observe the outcome, ‘head’ or ‘tail’. The variable that assigns the number $1$ to the outcome ‘head’ and the number $0$ to the outcome ‘tail’ is a discrete variable.
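The coin variable from Example 5 can be sketched in Python. The function name `coin_variable`, the number of flips and the seed are assumptions made for this illustration; the flips are simulated with the standard `random` module.

```python
import random


def coin_variable(outcome):
    """Assign 1 to the outcome 'head' and 0 to the outcome 'tail'."""
    return 1 if outcome == "head" else 0


# Simulate ten coin flips (seeded so the run is reproducible).
rng = random.Random(0)
outcomes = [rng.choice(["head", "tail"]) for _ in range(10)]
values = [coin_variable(outcome) for outcome in outcomes]

# The variable only ever takes the two isolated values 0 and 1,
# never a whole interval of real numbers, so it is discrete.
print(values)
```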

## High-dimensional variables

We can treat several variables observed on the same population as one **high-dimensional variable**. Such a variable can be represented by a **matrix**. For example, if $X = (X_{1}, \ldots, X_{k})$, where $X_{i}, i = 1, \ldots, k$ are variables on the same population $S$, then the corresponding matrix is:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ \vdots & \vdots & \vdots & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ik} \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix}.$$

The value $a_{ij}$ is the value of the $j$-th variable $X_{j}$ on the $i$-th member of the population. Therefore, the first column represents $X_{1}(S)$, the second column $X_{2}(S)$, and so on.

*Example 6:* The data that a census taker collects about citizens, for instance age, place of birth and number of household members, form one high-dimensional variable.
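The row and column layout described above can be sketched in Python with the census variables from Example 6. The three records are made-up values, and the names `variables`, `records` and `column` are introduced only for this illustration. Each row holds one member of the population and each column holds one variable.

```python
# Names of the observed variables X_1, X_2, X_3 (one per column).
variables = ("age", "place_of_birth", "household_members")

# Hypothetical census records: one row per member of the population.
records = [
    (34, "Town A", 3),
    (71, "Town B", 1),
    (8, "Town A", 5),
]


def column(rows, j):
    """Return the values of the j-th variable (0-based) for all members."""
    return [row[j] for row in rows]


ages = column(records, 0)
household_sizes = column(records, 2)

print(ages)
print(household_sizes)
```

Here `records[i][j]` plays the role of $a_{ij}$, with the difference that Python counts rows and columns from zero.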