Statistics and Probability

Course: Machine Learning
Semester: S1 2023

  • Machine learning is built around analysing data and building models from that data.
  • This requires a statistical and probabilistic approach to data analysis.

Discrete Random Variables

  • A sample of data is drawn from some probability distribution.
  • For example, rolling a die is a probabilistic situation with discrete outcomes.
  • The probability of obtaining the value $v_i$ from a 6-sided die roll (e.g. rolling a 3 corresponds to $i=3$) is given by:

$$p_i = \Pr(x = v_i), \quad i = 1, \dots, 6$$

  • From the axioms of probability, the following must also hold, where $m$ is the number of possible outcomes ($m = 6$ for the die):

$$p_i \ge 0$$

$$\sum_{i=1}^{m} p_i = 1$$
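
As a quick check of these two properties, the sketch below simulates rolls of a fair die and estimates each $p_i$ from the observed frequencies. The fairness of the die, the number of rolls, the fixed seed and the helper name `empirical_pmf` are all assumptions made for illustration; they are not part of the notes.

```python
import random
from collections import Counter

def empirical_pmf(num_rolls, seed=0):
    """Estimate p_i for a fair 6-sided die from simulated rolls (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    # Relative frequency of each face v_i = 1, ..., 6
    return {v: counts[v] / num_rolls for v in range(1, 7)}

pmf = empirical_pmf(60_000)
for value, p in pmf.items():
    print(f"Pr(x = {value}) ~ {p:.3f}")  # each estimate should be close to 1/6
print("sum of estimates:", sum(pmf.values()))  # should be 1 up to rounding
```

Every estimate is non-negative by construction, and the estimates sum to 1 (up to floating-point rounding) because each roll lands on exactly one face.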

Continuous Random Variables

  • We now consider some $x \in \mathbb{R}$, that is, a variable $x$ that can take on any real value.
  • The probability that $x$ lies in the interval $(a,b)$ is given by:

$$\Pr[x \in (a,b)] = \int_{a}^{b} p(x)\, dx$$

  • Here $p(x)$ is the probability density function.
  • As before, the following properties hold:

$$p(x) \ge 0$$

$$\int_{-\infty}^{\infty} p(x)\, dx = 1$$
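
To make the integral concrete, the sketch below approximates $\Pr[x \in (a,b)]$ numerically for one particular density. The standard normal is chosen purely as an example (the notes do not fix a distribution), and `normal_pdf` and `integrate` are hypothetical helper names; the midpoint rule stands in for the exact integral.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density p(x) of a normal distribution (example choice, not from the notes)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over (a, b) with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Pr[x in (-1, 1)] for the standard normal
prob = integrate(normal_pdf, -1.0, 1.0)

# Normalisation check: (-10, 10) stands in for (-inf, inf), since the
# standard normal has negligible mass outside that range.
total = integrate(normal_pdf, -10.0, 10.0)

print("Pr[x in (-1, 1)] ~", prob)
print("integral over (-10, 10) ~", total)
```

For the standard normal, the first value should come out near 0.683 and the second near 1, matching the normalisation property.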

  • Computationally, sampling from distributions (such as for random number generation) is important for ML.
  • Typically, we assume distributions for the variables in our data, and build models/estimates from there.
  • Important things in ML:
    • Bayes' rule, conditional probability
    • Expected value, summary statistics
    • Multivariate distributions
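
The point above about sampling from distributions can be illustrated with inverse transform sampling, one common way to turn uniform random numbers into draws from another distribution. This is a minimal sketch under stated assumptions: the exponential distribution, the rate of 2.0, the seed and the helper name `sample_exponential` are illustrative choices, not something the notes prescribe.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one sample from Exponential(rate) by inverting its CDF (illustrative helper)."""
    u = rng.random()  # uniform on [0, 1)
    # CDF is F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate
    return -math.log(1.0 - u) / rate

rng = random.Random(0)  # fixed seed so the run is repeatable
rate = 2.0              # assumed rate parameter
samples = [sample_exponential(rate, rng) for _ in range(100_000)]

# The sample mean is a summary statistic that estimates the expected value 1 / rate
sample_mean = sum(samples) / len(samples)
print("sample mean:", sample_mean, "theoretical mean:", 1.0 / rate)
```

The sample mean should land close to the theoretical expected value of 0.5, which also shows how an assumed distribution and a summary statistic relate in practice.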