$$ \newcommand{\g}{\,|\,} \newcommand{\pdd}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\te}{\!=\!} \newcommand{\ttimes}{\!\times\!} $$

MLPR background self-test

Please answer these questions before the end of the first week of teaching. If more than one or two of these questions are a struggle, please seriously consider taking a different course.

Make sure to have a proper attempt before looking at the answers. If necessary, work through tutorials on probability, linear algebra, and calculus first. If you look at the answers before you’ve had an honest attempt, it will be too easy to trick yourself into thinking you have no problems.

  1. Two binary variables \(x\) and \(y\) have the joint probability table: \[ \begin{array}{r|ll} P(x,y)& x\te1& x\te2\\ \hline y\te1 & 0.1 & 0.3 \\ y\te2 & 0.2 & 0.4 \\ \end{array} \]

    1. What is the conditional probability distribution \(P(x\g y\te1)\)?

    2. Are the two variables independent? Justify your answer.

  2. Random variable \(X\) has mean \(\mu_x\) and standard deviation \(\sigma_x\). Similarly, random variable \(Y\) has mean \(\mu_y\) and standard deviation \(\sigma_y\). The variables \(X\) and \(Y\) are independent.

    An outcome from random variable \(Z\) is generated by combining the two outcomes from the random variables above as follows: \(z = x + 3y\).

    What are the mean and standard deviation of \(Z\)?

    1. Show that for any discrete random variables \(X\) and \(Y\), and any outcomes \(x\) and \(y\): \[ P(X\te x, Y\te y) \le P(Y\te y). \] Also state when the inequality is an equality.

    2. Does the inequality \(p(x,y) \le p(y)\) hold for the joint and marginal probability density functions (PDFs) of real-valued random variables \(x\) and \(y\)? Justify your answer.

  3. Two matrices are:

    \[ A = \left[ \begin{array}{ccc} 1 & 3 & 5 \\ 4 & 6 & 0 \\ \end{array} \right],\quad B = \left[ \begin{array}{ccc} 0 & 5 & 2 \\ 7 & 9 & 1 \\ 8 & 2 & 3 \\ \end{array} \right]. \]

    What shape is \(C\te AB\), and what is \(C_{2,3}\)?

  4. \(A\) is a \(2\ttimes3\) matrix, a matrix with 2 rows and 3 columns.

    \(X\) is an \(N\ttimes 3\) matrix, where each row gives one of a set of \(N\) points scattered around a 3-dimensional space.

    1. What is the size of the matrix \(Y = X A^\top A\)? What can you say about the geometrical arrangement of the points represented by rows of this matrix? (Hint: consider \(X A^\top\) first.)

    2. Could the matrix \(A^\top A\) be invertible? Justify your answer.

    3. Could the matrix \(A A^\top\) be invertible? Why? If it can be inverted, is the inverse \(A^{-\top} A^{-1}\)? (Notation: \(A^{-\top} = (A^\top)^{-1} = (A^{-1})^\top\).)

  5. Given the function \(f(x,y) = (2x^2 + 3xy)^2\), find the vector of partial derivatives: \[ \nabla f = \left[ \begin{array}{c} \pdd{f}{x} \\[1ex] \pdd{f}{y} \end{array} \right]. \]

    1. For a small vector \(\delta \te [\delta_x~\delta_y]^\top\), give an interpretation of the inner product \((\nabla f)^\top\delta\).

    2. Hence show that for small moves from a position \((x,y)\), the direction that will increase the function \(f\) the most points along \(\nabla f\).

      (Hint: recall that \(\mathbf{a}^\top\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta\), where \(\theta\) is the angle between the two vectors \(\mathbf{a}\) and \(\mathbf{b}\), and \(|\cdot|\) gives the length of a vector.)

  6. Derive the variance of a random variable drawn from a Uniform\([0,1]\) distribution.

    If you aren’t sure where to start, remind yourself of the definition of the variance of a probability distribution. You’ll need to do some integration involving the PDF of the Uniform distribution.

  7. Programming question for those not from CS backgrounds:

    In the programming language of your choice, write a function to return the minimum value in a list of numbers. If your favourite programming language has a min() function, please ignore that fact, and write the algorithm yourself using the lower-level tools provided by the language.

    This course assumes you have done some programming before!
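
If the identity \(\mathbf{a}^\top\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta\) quoted in the hint to the gradient question is unfamiliar, it may help to check it numerically before using it. The sketch below is only an illustration: the vectors `a` and `b` are arbitrary values chosen for the example, and the helper names `dot`, `norm`, and `angle_between` are hypothetical. The angle is computed from the polar angles of the two vectors, so the check does not assume the identity it is testing. It does not answer any of the questions above.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle from 2-D vector a to 2-D vector b, from their polar angles."""
    return math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])


# Arbitrary example vectors (assumed values, not taken from the questions).
a = [3.0, 1.0]
b = [-1.0, 2.0]

theta = angle_between(a, b)
lhs = dot(a, b)                            # a^T b
rhs = norm(a) * norm(b) * math.cos(theta)  # |a||b|cos(theta)

print(lhs, rhs)  # the two values should agree up to rounding error
```

Trying a few other pairs of vectors, including perpendicular ones, is a quick way to build intuition for how the inner product depends on the angle.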