Vaibhav Gupta AI Researcher | Software Engineer


These notes closely follow Introduction to Probability, book and video lectures, by Dimitri P. Bertsekas and John N. Tsitsiklis.

Probability Basics

We go over the basics of probabilistic modelling, including the three probability axioms. Then we discuss conditional probability and Bayes' theorem. We end with the topic of Independence.

Discrete Random Variables

In this note we go over what Discrete Random Variables are, and how we calculate their Probability Mass Functions. We look at a few common types of discrete random variables. Finally, we discuss Expectation, Variance, Conditional Probabilities and Independence of discrete random variables.

Continuous Random Variables

We study continuous random variables and their probability density functions. PDFs replace PMFs from the discrete case. We then discuss Cumulative Distribution Functions (CDFs), which can be used to unify calculations for both discrete and continuous variables. There is a near one-to-one correspondence between the topics in the discrete and continuous cases.


Given the PDF/PMF of a random variable $X$, how do we get the PDF/PMF of a derived random variable $Y = f(X)$? This is exactly what we go over in this short note. We also contrast linear transformations of a random variable with linear transformations of a function.

Further Topics on Random Variables

We have already seen how to derive the distribution of a function of a random variable $Y = g(X)$, given the distribution of $X$ itself. Here we learn how to derive the distribution of the sum of two independent random variables, which is the convolution of their distributions, and how to calculate the covariance and correlation of two random variables. Covariance and correlation play an important role in predicting the value of one random variable given another, as is done in machine learning.

Limit Theorems

Probability was introduced as a measure of the likelihood of an event, but we've been picturing it as the frequency of occurrence of the event. Similarly, expected value was introduced as the weighted average of all possible values, but we've been thinking of it as the average value of a variable over many repetitions of the experiment. This note goes into why these interpretations are valid.
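
As a rough illustration of the frequency interpretation, here is a minimal simulation sketch. The function names `empirical_frequency` and `sample_mean_die`, the seed, and the choice of a biased coin and a fair six-sided die are assumptions made for this example, not taken from the notes or the book.

```python
import random


def empirical_frequency(p, n, seed=0):
    """Fraction of n independent trials in which an event of probability p occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


def sample_mean_die(n, seed=0):
    """Average of n independent rolls of a fair six-sided die (expected value 3.5)."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(n))
    return total / n


if __name__ == "__main__":
    # As n grows, the frequency should settle near p and the mean near 3.5.
    for n in (10, 1_000, 100_000):
        print(n, empirical_frequency(0.3, n), sample_mean_die(n))
```

With more trials, the printed frequency should drift toward 0.3 and the sample mean toward 3.5, which is the behaviour the limit theorems make precise.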