*These notes closely follow Introduction to Probability, book and video lectures, by Dimitri P. Bertsekas and John N. Tsitsiklis.*

We go over the basics of probabilistic modelling, including the three probability axioms. Then we discuss conditional probability and Bayes' theorem. We end with the topic of independence.
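A minimal sketch of Bayes' theorem for a single event $A$ and an observation $B$, with $P(B)$ expanded by the total probability theorem. The function name `posterior` and the numbers below are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A) and P(B | not A)."""
    # Total probability theorem: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
    return likelihood * prior / evidence


# Hypothetical inputs: P(A) = 0.01, P(B | A) = 0.95, P(B | not A) = 0.05
print(posterior(0.01, 0.95, 0.05))
```

When $P(B \mid A) = P(B \mid A^c)$, observing $B$ carries no information about $A$ and the function returns the prior unchanged, which is the independence case.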

In this note we go over what discrete random variables are and how to calculate their probability mass functions (PMFs). We look at a few common types of discrete random variables. Finally, we discuss expectation, variance, conditioning, and independence for discrete random variables.

We study continuous random variables and their probability density functions (PDFs), which replace the PMFs of the discrete case. We then discuss cumulative distribution functions (CDFs), which can be used to unify calculations for both discrete and continuous variables. There is a near one-to-one correspondence between the topics in the discrete and continuous cases.

Given the PDF/PMF of a random variable $X$, how do we get the PDF/PMF of a derived random variable $Y = g(X)$? This is exactly what we go over in this short note. We also contrast linear transformations of a random variable with linear transformations of a function.
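As a worked instance of the simplest case, assume $X$ is continuous with PDF $f_X$ and that $Y = aX + b$ with $a \neq 0$ (the symbols $a$ and $b$ are introduced here only for illustration). The derived density is then

$$
f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right).
$$

The factor $1/|a|$ compensates for the stretching of the horizontal axis, so that the area under $f_Y$ is still 1.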

We have already seen how to derive the distribution of a function of a random variable, $Y = g(X)$, given the distribution of $X$ itself. Here we learn how to derive the distribution of the sum of two independent random variables (a convolution), and how to calculate the covariance and correlation of two random variables. Covariance and correlation play an important role in predicting the value of one random variable given another (as in machine learning).
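A minimal sketch of how covariance and correlation can be estimated from paired samples, assuming equally weighted observations. The helper names `covariance` and `correlation` and the sample data are hypothetical.

```python
def covariance(xs, ys):
    """Sample estimate of E[(X - E[X]) (Y - E[Y])] from paired observations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance normalized by both standard deviations; lies in [-1, 1]."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5


# Hypothetical data in which y is an exact linear function of x
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(covariance(xs, ys), correlation(xs, ys))
```

Note that `covariance(xs, xs)` is simply the variance of the sample, which is why it appears in the normalization.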

Probability was introduced as the measure of likelihood of an event, but we've been picturing it as the frequency of occurrence of the event. Similarly, expected value was introduced as the weighted average of all possible values, but we've been thinking of it as the average value of a variable over many repetitions of the experiment. This note goes into why these interpretations are valid.
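The frequency interpretation can be checked numerically. The sketch below, with a hypothetical helper `empirical_frequency` and an arbitrarily chosen probability of 0.3, simulates independent trials and reports the fraction of successes, which should settle near the true probability as the number of trials grows (the law of large numbers).

```python
import random


def empirical_frequency(p, n_trials, seed=0):
    """Run n_trials independent trials with success probability p
    and return the observed fraction of successes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials


for n in (10, 1_000, 100_000):
    print(n, empirical_frequency(0.3, n))
```

The same experiment also illustrates the expected-value interpretation: the fraction of successes is the sample mean of a variable that takes the value 1 with probability $p$ and 0 otherwise, whose expected value is $p$.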

Statistical inference is the process of extracting information about an unknown variable or an unknown model from available data. The Bayesian approach essentially tries to move the field of statistics back to the realm of probability theory. The unknown variables are treated as random variables with known prior distributions.

In the Bayesian view, the unknown variables are treated as random variables with known prior distributions. By contrast, in classical inference, the unknown quantity $\theta$ is viewed as a deterministic constant that happens to be unknown. Classical inference then strives to develop an estimate of $\theta$ that has some performance guarantees.