Probability is a number that reflects the possibility that a particular event will occur. In other words, it quantifies (on a scale of 0-1 or 0-100%) how likely the event is to occur.

Probability is a part of mathematics that provides models for describing random processes. These mathematical tools can be used to create theoretical models for random phenomena and use them to make predictions. Like all models, the probability model is a simplification of the world. However, the model is useful as soon as it captures the essential features.

In this article, we present 9 basic formulas and concepts in probability that every data scientist should understand and master in order to handle any project involving probability.

The probability of an event is always between 0 and 1 (or 0% and 100%):

\[0 \le P(A) \le 1\]

  • If the event is impossible: \(P(A) = 0\)
  • If the event is certain: \(P(A) = 1\)

For example, throwing a 7 with a standard six-sided die (faces ranging from 1 to 6) is impossible, so its probability is equal to 0. Throwing heads or tails with a coin is certain, so its probability is equal to 1.

If the elements of the sample space (the set of all possible results of a random experiment) are equally probable (i.e., all elements have the same probability), the probability of an event is equal to the number of favorable cases (the number of ways it can occur) divided by the number of possible cases (the total number of results):

\[P(A) = \frac{\text{number of favorable cases}}{\text{number of possible cases}}\]

For example, all numbers on a six-sided die are equally probable because they all have the same probability of occurring. Thus, the probability of rolling a 3 with the die is

\[P(3) = \frac{\text{number of favorable cases}}{\text{number of possible cases}} = \frac{1}{6}\]

because there is only one favorable case (there is only one face with a 3), and there are 6 possible cases (because there are 6 faces in total).
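As a minimal sketch in R, the ratio of favorable to possible cases can be computed by enumerating the sample space of the die. The helper `classical_prob()` is a hypothetical name introduced here for illustration; it is not part of base R.

```r
# Sample space of a fair six-sided die (all outcomes equally probable)
sample_space <- 1:6

# Hypothetical helper: number of favorable cases / number of possible cases
classical_prob <- function(event, space) {
  sum(space %in% event) / length(space)
}

p_three <- classical_prob(3, sample_space)   # one favorable case out of six
p_seven <- classical_prob(7, sample_space)   # impossible event: no favorable case
p_any   <- classical_prob(1:6, sample_space) # certain event: all cases favorable

c(p_three = p_three, p_seven = p_seven, p_any = p_any)
```

This only works when all outcomes are equally probable, which is the assumption stated above.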

The probability of the complement (or opposite) of an event is:

\[P(\text{not A}) = P(\bar{A}) = 1 - P(A)\]

For example, the probability of not rolling a 3 with a die is:

\[P(\bar{A}) = 1 - P(A) = 1 - \frac{1}{6} = \frac{5}{6}\]
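The complement rule can also be checked empirically. The sketch below simulates a large number of die rolls (the seed and the number of rolls are arbitrary choices made for this illustration) and compares the observed share of rolls that are not a 3 with the theoretical value 5/6.

```r
set.seed(42)        # arbitrary seed, only for reproducibility
n_rolls <- 100000   # arbitrary number of simulated rolls
rolls <- sample(1:6, size = n_rolls, replace = TRUE)

p_three_hat     <- mean(rolls == 3) # estimate of P(A)
p_not_three_hat <- mean(rolls != 3) # estimate of P(not A)

# The two estimates always sum to 1, as the complement rule requires
p_three_hat + p_not_three_hat

# The simulated value should be close to the theoretical one
c(estimate = p_not_three_hat, theory = 5 / 6)
```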

The probability of the union of two events is the probability that either one of them occurs:

\[P(\text{A or B}) = P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

Assume that the probability of a fire in each of two houses in a given year is:

  • in house A: 60%, so \(P(A) = 0.6\)
  • in house B: 45%, so \(P(B) = 0.45\)
  • in at least one of the two houses: 80%, so \(P(A \cup B) = 0.8\)

Graphically we have

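Rearranging the formula gives the probability of a fire in both houses, \(P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.6 + 0.45 - 0.8 = 0.25\), and the probability of a fire in house A or house B is indeed

\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
\[= 0.6 + 0.45 - 0.25 = 0.8\]

By summing \(P(A)\) and \(P(B)\), the intersection of A and B, i.e. \(P(A \cap B)\), is counted twice. For this reason, we subtract it so that it is counted only once.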
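The arithmetic of the fire example can be reproduced in a few lines of R. This is only a sketch with variable names chosen for this illustration; the three input probabilities are the ones given in the example.

```r
p_a      <- 0.60 # P(A): fire in house A
p_b      <- 0.45 # P(B): fire in house B
p_a_or_b <- 0.80 # P(A or B): fire in at least one of the two houses

# Rearranged addition rule: P(A and B) = P(A) + P(B) - P(A or B)
p_a_and_b <- p_a + p_b - p_a_or_b
p_a_and_b

# Addition rule: subtracting the intersection avoids counting it twice
p_union <- p_a + p_b - p_a_and_b
p_union
```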

If two events are mutually exclusive (i.e., they cannot occur simultaneously), the probability that both occur is equal to 0, so the formula above becomes

\[P(A \cup B) = P(A) + P(B)\]

For example, the event “rolling a 3” and the event “rolling a 6” on a six-sided die are mutually exclusive because they cannot both occur on the same roll. Since their joint probability is equal to 0, the probability of rolling a 3 or a 6 on a six-sided die is

\[P(3 \cup 6) = P(3) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}\]

If two events are independent, the probability that both occur (i.e., the joint probability) is the product of the probabilities of the two events:

\[P(\text{A and B}) = P(A \cap B) = P(A) \cdot P(B)\]

For example, if two coins are flipped, the probability that both coins land on tails is

\[P(T_1 \cap T_2) = P(T_1) \cdot P(T_2) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}\]

Note that \(P(A \cap B) = P(B \cap A)\).
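The two-coin result can be verified by listing the four equally probable outcomes. The sketch below uses base R's `expand.grid()`; the column names `coin1` and `coin2` are chosen for this illustration.

```r
# All equally probable outcomes of flipping two fair coins
coins <- expand.grid(
  coin1 = c("H", "T"),
  coin2 = c("H", "T"),
  stringsAsFactors = FALSE
)
coins

# Joint probability: share of outcomes where both coins show tails
p_both_tails <- mean(coins$coin1 == "T" & coins$coin2 == "T")

# Marginal probabilities of tails for each coin
p_t1 <- mean(coins$coin1 == "T")
p_t2 <- mean(coins$coin2 == "T")

# For independent events, the joint probability equals the product
c(joint = p_both_tails, product = p_t1 * p_t2)
```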

If two events are mutually exclusive, their joint probability is equal to 0:

\[P(A \cap B) = 0\]

The independence of two events can be verified using the product formula for independent events. If the equality holds, the two events are said to be independent; otherwise, they are said to be dependent. Formally, events A and B are independent if and only if

\[P(A \cap B) = P(A) \cdot P(B)\]

  • In the example of the two coins:

\[P(T_1 \cap T_2) = \frac{1}{4}\]

and

\[P(T_1) \cdot P(T_2) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}\]

so the following equality holds:

\[P(T_1 \cap T_2) = P(T_1) \cdot P(T_2) = \frac{1}{4}\]

These two events are thus independent, denoted \(T_1 \perp\!\!\!\perp T_2\).

  • In the example of a fire in two houses (see the union of two events above):

\[P(A \cap B) = 0.25\]

and

\[P(A) \cdot P(B) = 0.6 \cdot 0.45 = 0.27\]

so the following equality does not hold:

\[P(A \cap B) \ne P(A) \cdot P(B)\]

These two events are thus dependent (or not independent), denoted \(A \not\!\perp\!\!\!\perp B\).
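The independence check applied to both examples can be wrapped in a small function. This is a sketch: `is_independent()` and its tolerance argument `tol` are hypothetical names introduced here, and the tolerance only guards against floating-point rounding.

```r
# Hypothetical helper: TRUE if P(A and B) equals P(A) * P(B) up to rounding
is_independent <- function(p_a, p_b, p_a_and_b, tol = 1e-9) {
  abs(p_a_and_b - p_a * p_b) < tol
}

# Two coins: P(T1) = P(T2) = 1/2 and P(T1 and T2) = 1/4
coins_independent <- is_independent(p_a = 0.5, p_b = 0.5, p_a_and_b = 0.25)

# Fire in two houses: P(A) = 0.6, P(B) = 0.45 and P(A and B) = 0.25
fires_independent <- is_independent(p_a = 0.6, p_b = 0.45, p_a_and_b = 0.25)

c(coins = coins_independent, fires = fires_independent)
```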

Consider two events A and B with \(P(B) > 0\). The conditional probability of A given (knowing) B is the probability that event A occurs, given that event B has occurred:

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]
\[= \frac{P(B \cap A)}{P(B)} \text{ (since } P(A \cap B) = P(B \cap A))\]

Note that, in general, the probability of A given B is not equal to the probability of B given A, that is, \(P(A | B) \ne P(B | A)\).
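To see that the two conditional probabilities differ, the formula can be applied in both directions to the fire example, using \(P(A) = 0.6\), \(P(B) = 0.45\) and \(P(A \cap B) = 0.25\). The variable names in this sketch are chosen for illustration.

```r
p_a       <- 0.60 # P(A): fire in house A
p_b       <- 0.45 # P(B): fire in house B
p_a_and_b <- 0.25 # P(A and B): fire in both houses

# Conditional probability: joint probability divided by the probability
# of the conditioning event
p_a_given_b <- p_a_and_b / p_b # P(A | B)
p_b_given_a <- p_a_and_b / p_a # P(B | A)

round(c(p_a_given_b = p_a_given_b, p_b_given_a = p_b_given_a), 4)
```

Here \(P(A | B) \approx 0.5556\) while \(P(B | A) \approx 0.4167\).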

From the conditional probability formula, we can derive the multiplicative law:

\[P(A | B) = \frac{P(A \cap B)}{P(B)} \text{ (Eq. 1)}\]
\[P(A | B) \cdot P(B) = \frac{P(A \cap B)}{P(B)} \cdot P(B)\]
\[P(A | B) \cdot P(B) = P(A \cap B) \text{ (multiplicative law)}\]
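The multiplicative law can be checked by enumeration on a six-sided die. The two events used in this sketch (A: the roll is even, B: the roll is greater than 3) are chosen here for illustration and do not come from the examples above.

```r
die <- 1:6 # equally probable outcomes

a <- die %% 2 == 0 # A: the roll is even
b <- die > 3       # B: the roll is greater than 3

p_b       <- mean(b)     # P(B)
p_a_and_b <- mean(a & b) # P(A and B), computed directly
p_a_given_b <- mean(a[b]) # P(A | B): share of even rolls among those in B

# Multiplicative law: P(A | B) * P(B) should equal P(A and B)
c(law = p_a_given_b * p_b, direct = p_a_and_b)
```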

If the two events are independent, \(P(A \cap B) = P(A) \cdot P(B)\), and:

  • if \(P(B) > 0\), the conditional probability becomes

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]
\[P(A | B) = \frac{P(A) \cdot P(B)}{P(B)}\]
\[P(A | B) = P(A) \text{ (Eq. 2)}\]

  • if \(P(A) > 0\), the conditional probability becomes

\[P(B | A) = \frac{P(B \cap A)}{P(A)}\]
\[P(B | A) = \frac{P(B) \cdot P(A)}{P(A)}\]
\[P(B | A) = P(B) \text{ (Eq. 3)}\]

Equations 2 and 3 mean that knowing that one event occurred has no effect on the probability of the outcome of another event. This is, in fact, the definition of independence: if knowing that one event occurred does not help predict (does not affect) the outcome of another event, the two events are essentially independent.

Bayes’ theorem

Bayes’ theorem can be derived from the conditional probability formula and the multiplicative law:

\[P(B | A) = \frac{P(B \cap A)}{P(A)} \text{ (from conditional probability)}\]
\[P(B | A) = \frac{P(A \cap B)}{P(A)} \text{ (since } P(A \cap B) = P(B \cap A))\]
\[P(B | A) = \frac{P(A | B) \cdot P(B)}{P(A)} \text{ (from multiplicative law)}\]

which is equivalent to

\[P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} \text{ (Bayes’ theorem)}\]
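As a sketch, Bayes’ theorem can be written as a one-line R function and checked against the direct definition of conditional probability, reusing the fire example (\(P(A) = 0.6\), \(P(B) = 0.45\), \(P(A \cap B) = 0.25\)). The function name `bayes()` is a hypothetical name introduced for this illustration.

```r
# Hypothetical helper: P(A | B) = P(B | A) * P(A) / P(B)
bayes <- function(p_b_given_a, p_a, p_b) {
  p_b_given_a * p_a / p_b
}

p_a       <- 0.60
p_b       <- 0.45
p_a_and_b <- 0.25

p_b_given_a <- p_a_and_b / p_a # P(B | A) from the definition

# P(A | B) obtained in two ways: via Bayes' theorem and via the definition
via_bayes      <- bayes(p_b_given_a, p_a, p_b)
via_definition <- p_a_and_b / p_b

c(via_bayes = via_bayes, via_definition = via_definition)
```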


Consider the following problem to illustrate conditional probability and Bayes’ theorem:

A blood test is performed to determine the presence of a disease. When a person has the disease, the test detects it in 80% of cases. When the disease is not present, the test is negative in 90% of cases. Experience has shown that the probability of having the disease is 10%. A researcher wants to know the probability that an individual has the disease given that the test result is positive.

To answer this question, the following events are defined:

  • P: the test result is positive
  • D: the person has the disease

In addition, we illustrate the problem with a tree diagram:

(The probabilities of the four scenarios must sum to 1, because these 4 scenarios cover all possible cases.)
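The probabilities of the four scenarios follow from the three figures in the problem statement through the multiplicative law. A minimal sketch in R, with variable names chosen for this illustration:

```r
p_disease <- 0.10 # P(D): probability of having the disease
p_pos_d   <- 0.80 # P(P | D): test positive when the disease is present
p_neg_nd  <- 0.90 # P(not P | not D): test negative when the disease is absent

# Multiplicative law along each branch of the tree
p_d_pos  <- p_disease * p_pos_d              # disease and positive test
p_d_neg  <- p_disease * (1 - p_pos_d)        # disease and negative test
p_nd_pos <- (1 - p_disease) * (1 - p_neg_nd) # no disease and positive test
p_nd_neg <- (1 - p_disease) * p_neg_nd       # no disease and negative test

scenarios <- c(
  d_pos = p_d_pos, d_neg = p_d_neg,
  nd_pos = p_nd_pos, nd_neg = p_nd_neg
)
scenarios
sum(scenarios) # the four scenarios cover all cases, so this is 1
```

The four values are 0.08, 0.02, 0.09 and 0.81.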

We are looking for the probability that an individual has the disease given that the test result is positive, \(P(D | P)\). According to the conditional probability formula (Eq. 1), we have:

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]

Applied to our problem, with \(P(D \cap P) = P(P | D) \cdot P(D) = 0.8 \cdot 0.1 = 0.08\) from the multiplicative law:

\[P(D | P) = \frac{P(D \cap P)}{P(P)}\]
\[P(D | P) = \frac{0.08}{P(P)} \text{ (Eq. 4)}\]

From the tree diagram, we can see that a positive test result is possible in two scenarios: (i) when the person has the disease, or (ii) when the person does not actually have the disease (because the test is not always correct). To find the probability of a positive test result, \(P(P)\), we need to sum these two scenarios:

\[P(P) = P(D \cap P) + P(\bar{D} \cap P) = 0.08 + 0.09 = 0.17\]

Eq. 4 then becomes

\[P(D | P) = \frac{0.08}{0.17} = 0.4706\]

The probability of having the disease given that the test result is positive is only 47.06%. This means that in this particular case (with these percentages), an individual has less than one chance out of two of having the disease, even knowing that his or her test is positive!

This relatively small percentage is due to the fact that the disease is quite rare (only 10% of the population is affected) and that the test is not always correct (sometimes it detects the disease when it is not present, and sometimes it fails to detect it when it is present). As a result, a higher proportion of the population is healthy with a positive result (9%) than sick with a positive result (8%). This explains why several diagnostic tests are often performed before a result is announced, especially for rare diseases.
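To illustrate how strongly this result depends on how rare the disease is, the sketch below recomputes \(P(D | P)\) for several prevalences while keeping the test characteristics of the example (80% and 90%). The function name `p_disease_given_pos()` and the alternative prevalences are assumptions made for this illustration; only the 10% case comes from the example.

```r
# Hypothetical helper: P(D | P) from the prevalence and the test characteristics
p_disease_given_pos <- function(prevalence, p_pos_d = 0.80, p_neg_nd = 0.90) {
  p_d_pos  <- p_pos_d * prevalence              # P(D and P)
  p_nd_pos <- (1 - p_neg_nd) * (1 - prevalence) # P(not D and P)
  p_d_pos / (p_d_pos + p_nd_pos)                # P(D and P) / P(P)
}

# The example above: prevalence of 10%
round(p_disease_given_pos(0.10), 4) # 0.4706

# The same test applied to rarer and more common diseases
prevalences <- c(0.01, 0.10, 0.30, 0.50)
data.frame(
  prevalence = prevalences,
  p_disease_given_pos = round(p_disease_given_pos(prevalences), 4)
)
```

The rarer the disease, the lower the probability that a positive result corresponds to an actual case.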

Based on the disease and diagnostic test presented above, we explain the most common accuracy measures:

  • False negatives
  • False positives
  • Sensitivity
  • Specificity
  • Positive predictive value
  • Negative predictive value

Before diving into the details of these accuracy measures, here is an overview of the measures and the tree diagram with labels added for each of the 4 scenarios:

Adapted from Wikipedia

False negatives

The false negatives (FN) are the people who are incorrectly labeled as not having the disease or condition, when in reality it is present. It’s like telling a woman who is 7 months pregnant that she’s not pregnant.

From the tree diagram we have:

\[FN = P(D \cap \bar{P}) = 0.02\]

False positives

The false positives (FP) are the people who are incorrectly labeled as having the disease or condition, when in reality it is not present. It’s like telling a man that he is pregnant.

From the tree diagram we have:

\[FP = P(\bar{D} \cap P) = 0.09\]


Sensitivity

The sensitivity of a test, also called recall, measures the ability of the test to detect the condition when it is present (the percentage of sick people who are correctly identified as having the disease):

\[\text{Sensitivity} = \frac{TP}{TP + FN}\]

where TP denotes the true positives.

From the tree diagram we have:

\[\text{Sensitivity} = \frac{TP}{TP + FN} = P(P | D) = 0.8\]

Specificity

The specificity of a test measures the ability of the test to correctly rule out the disease when it is absent (the percentage of healthy people who are correctly identified as not having the disease):

\[\text{Specificity} = \frac{TN}{TN + FP}\]

where TN denotes the true negatives.

From the tree diagram we have:

\[\text{Specificity} = \frac{TN}{TN + FP} = P(\bar{P} | \bar{D}) = 0.9\]

Positive predictive value

The positive predictive value, also called precision, is the proportion of positive results that correspond to the presence of the condition, that is, the proportion of positive results that are true positives:

\[PPV = \frac{TP}{TP + FP}\]

From the tree diagram we have:

\[PPV = \frac{TP}{TP + FP} = P(D | P) = \frac{P(D \cap P)}{P(P)}\]
\[= \frac{0.08}{0.08 + 0.09} = 0.4706\]

Negative predictive value

The negative predictive value is the proportion of negative results that correspond to the absence of the condition, that is, the proportion of negative results that are true negatives:

\[NPV = \frac{TN}{TN + FN}\]

From the tree diagram we have:

\[NPV = \frac{TN}{TN + FN} = P(\bar{D} | \bar{P}) = \frac{P(\bar{D} \cap \bar{P})}{P(\bar{P})}\]
\[= \frac{0.81}{0.81 + 0.02} = 0.9759\]
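The four ratios can be computed together from the probabilities of the four scenarios used in this example (0.08, 0.02, 0.09 and 0.81). A minimal sketch in R, with variable names chosen for this illustration:

```r
# Probabilities of the four scenarios of the example
tp <- 0.08 # true positives:  disease and positive test
fn <- 0.02 # false negatives: disease and negative test
fp <- 0.09 # false positives: no disease and positive test
tn <- 0.81 # true negatives:  no disease and negative test

sensitivity <- tp / (tp + fn) # P(P | D)
specificity <- tn / (tn + fp) # P(not P | not D)
ppv         <- tp / (tp + fp) # P(D | P)
npv         <- tn / (tn + fn) # P(not D | not P)

round(c(
  sensitivity = sensitivity,
  specificity = specificity,
  ppv = ppv,
  npv = npv
), 4)
```

The rounded values are 0.8, 0.9, 0.4706 and 0.9759, matching the results above.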

To use the formula for equally probable outcomes (number of favorable cases divided by number of possible cases), one must be able to count the number of possible elements.

There are 3 main counting techniques in probability:

  1. Multiplication
  2. Permutation
  3. Combination

See below how to count the number of possible elements in the case of equally probable outcomes.


The multiplication rule is as follows:

\[\#(A \times B) = (\#A) \times (\#B)\]

where \(\#\) denotes the number of elements.


In a restaurant, a customer has to choose an appetizer, a main course and a dessert. The restaurant offers 2 appetizers, 3 main courses and 2 desserts. How many different choices are possible?

There are 12 different possible choices (i.e., \(2 \cdot 3 \cdot 2\)).
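The 12 menus can also be listed explicitly with base R's `expand.grid()`, which builds every combination of its arguments. The dish names below are placeholders invented for this sketch; only the counts (2, 3 and 2) come from the example.

```r
# Placeholder dish names: only the number of options per course matters
menus <- expand.grid(
  appetizer = c("soup", "salad"),
  main      = c("fish", "meat", "pasta"),
  dessert   = c("cake", "fruit"),
  stringsAsFactors = FALSE
)

head(menus)

# Number of different menus, by enumeration and by the multiplication rule
n_menus_enumerated <- nrow(menus)
n_menus_rule       <- 2 * 3 * 2
c(enumerated = n_menus_enumerated, rule = n_menus_rule)
```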


The number of permutations is as follows:

\[P^r_n = n \times (n - 1) \times \cdots \times (n - r + 1) = \frac{n!}{(n - r)!}\]

with \(r\) the length, \(n\) the number of elements, and \(r \le n\). Note that \(0! = 1\) and \(k! = k \times (k - 1) \times (k - 2) \times \cdots \times 2 \times 1\) if \(k = 1, 2, \dots\)

Order is important in permutations!


Count the permutations of length 2 of the set \(A = \{a, b, c, d\}\), without repeating a letter. How many permutations do you find?


\[P^2_4 = \frac{4!}{(4-2)!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{2 \cdot 1} = 12\]

In R:

library(gtools) # provides permutations()

x <- c("a", "b", "c", "d")

# See all different permutations
perms <- permutations(
  n = 4, r = 2, v = x,
  repeats.allowed = FALSE
)
perms
##       [,1] [,2]
##  [1,] "a"  "b" 
##  [2,] "a"  "c" 
##  [3,] "a"  "d" 
##  [4,] "b"  "a" 
##  [5,] "b"  "c" 
##  [6,] "b"  "d" 
##  [7,] "c"  "a" 
##  [8,] "c"  "b" 
##  [9,] "c"  "d" 
## [10,] "d"  "a" 
## [11,] "d"  "b" 
## [12,] "d"  "c"

# Count the number of permutations
nrow(perms)
## [1] 12
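The same count can be obtained in base R without any additional package, either from the factorial formula or by enumerating ordered pairs. This is a sketch: `n_permutations()` is a hypothetical helper name introduced for this illustration.

```r
# Hypothetical helper: n! / (n - r)!, the number of permutations of length r
n_permutations <- function(n, r) {
  stopifnot(r <= n)
  factorial(n) / factorial(n - r)
}

n_by_formula <- n_permutations(n = 4, r = 2)

# Enumeration: all ordered pairs of letters, then drop pairs that repeat a letter
x <- c("a", "b", "c", "d")
pairs <- expand.grid(first = x, second = x, stringsAsFactors = FALSE)
pairs <- pairs[pairs$first != pairs$second, ]
n_by_enumeration <- nrow(pairs)

c(formula = n_by_formula, enumeration = n_by_enumeration)
```

Both approaches give 12, in line with the formula above.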


The number of combinations is as follows:

\[C^r_n = \frac{P^r_n}{r!} = \frac{n!}{r!(n - r)!} = {n \choose r}\]
\[= \frac{n}{r} \times \frac{n - 1}{r - 1} \times \dots \times \frac{n - r + 1}{1}\]

with \(r\) the length, \(n\) the number of elements, and \(r \le n\).

Order is not important in combinations!


What is the probability that there are 3 girls and 2 boys in a family of 5 children? Assume that the probabilities of having a girl and a boy are the same.


  • Number of ways to have 3 girls and 2 boys (favorable cases): \(C^3_5 = {5 \choose 3} = \frac{5!}{3!(5-3)!} = 10\)
  • Number of possible cases: \(2^5 = 32\)

\[\Rightarrow P(\text{3 girls and 2 boys}) = \frac{\text{\# favorable cases}}{\text{\# possible cases}} = \frac{10}{32} = 0.3125\]

In R:

  • Number of ways to have 3 girls and 2 boys:

choose(n = 5, k = 3)
## [1] 10

  • Number of possible cases:

2^5
## [1] 32

  • Probability of 3 girls and 2 boys:

choose(n = 5, k = 3) / 2^5
## [1] 0.3125
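As a cross-check, the 32 possible families can be enumerated and the favorable ones counted directly. The sketch below uses only base R; the labels "G" and "B" and the variable names are chosen for this illustration.

```r
# All 2^5 equally probable sequences of 5 children ("G" = girl, "B" = boy)
families <- expand.grid(rep(list(c("G", "B")), 5), stringsAsFactors = FALSE)
n_possible <- nrow(families)

# Number of girls in each family
n_girls <- rowSums(families == "G")

# Favorable cases: exactly 3 girls (and therefore 2 boys)
n_favorable <- sum(n_girls == 3)

p_enumeration <- n_favorable / n_possible
p_formula     <- choose(n = 5, k = 3) / 2^5

c(favorable = n_favorable, possible = n_possible)
c(enumeration = p_enumeration, formula = p_formula)
```

Enumeration gives 10 favorable cases out of 32, that is, 0.3125, the same value as the formula.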

Thanks for reading. I hope this article helped you understand the most important formulas and concepts in probability theory.

As always, if you have a question or suggestion related to the topic covered in this article, add it as a comment so other readers can benefit from the discussion.
