# 1.7  2018/03/30 17:52:47

---- General questions ------------------------------------------------
Q: How can I display the images of characters in the data set?
A: There is a Matlab sample function:
    /afs/inf.ed.ac.uk/group/teaching/inf2b/cwk2/matlab_codes/dispImage.m

Q: Are there any templates for the functions/scripts I should submit?
   Yes, please see:
     /afs/inf.ed.ac.uk/group/teaching/inf2b/cwk2/matlab_codes (for Matlab)
     /afs/inf.ed.ac.uk/group/teaching/inf2b/cwk2/python_codes (for Python)
   
Q: How should I measure the user time taken?
A: Ideally it should be measured with a function equivalent to the UNIX/Linux 
   time command (/usr/bin/time), which gives three measurements - 'real time',
   'user CPU time', and 'system time', where 'user CPU time' is the
   one requested in the coursework. However, none of Matlab time
   measurement functions seem to be equivalent to the time command.
   Thus, please use 'tic' and 'toc' for Matlab.
   For Python, please use time.clock().

   Please note that user time should be measured on a DICE machine.
   If you have measured the time on a non-DICE machine, e.g. your
   laptop, and you have no time to try a DICE machine, it is fine that
   report the time in your report, but please indicate so, and
   describes the computing enviroment you used.

Q: I get a (slightly) different user time every time I measure. Should
   I take the average? 
A: No, it is not necessary for this coursework - it's fine that you
   measure the time only once, and report it. 

---- Submissions ------------------------------------------------------
Q: Are we restricted to submitting only the files described in the
   assignment or can we make separate functions and turn them all in?
A: The latter. Please submit all pieces of your code including those
   for helper functions, so that markers can run your code. 

Q: What should I submit?
   Read the coursework specifications first, and the checklist:
     http://www.inf.ed.ac.uk/teaching/courses/inf2b/coursework/inf2b_cwk2_2018_checklist.txt

---- Task 1 -----------------------------------------------------------
Q: My k-NN code takes hours to finish...
   How k-NN classification can be implemented efficiently?
   
A: K-NN requires the calculation of Euclidean distances between each
   pair of data points in a test set and those in a training set, which
   is the main bottleneck when data sets are large.

   The key technique for accelerating the run time of your Matlab or
   Python code is vectorisation. Generally speaking, the more number
   of data points you put into a vector or matrix and the more you
   reduce for-loops, the faster your code runs (as long as sufficient
   amount of memory is available).   

 Details:
   We assume that a test data set X is a N-by-D matrix, and a training
   data set Y is a M-by-D matrix, where N is the number of test
   instances, M is the number of training instances, and D is the
   dimension of the feature vector space.  

         / X_1 \      / x_{11} ... x_{1D} \  
   X =  |   .   | =  |    .           .    |       ... Test data set   
        |   .   |    |    .           .    |
         \ X_N /      \ x_{N1} ... x_{ND} /

         / Y_1 \      / y_{11} ... y_{1D} \
   Y =  |   .   | =  |    .           .    |      .... Training data set
        |   .   |    |    .           .    |
         \ Y_M /      \ y_{M1} ... y_{MD} /

   where X_i = (x_{i1}, ..., x_{iD}) denotes the i-th test data, and
         Y_j = (y_{j1}, ..., y_{jD}) denotes the j-th training data.

   (NB: Here, we use a row vector to denote the feature vector of a data,
        as opposed to a column vector in Inf2b lectures and notes) 
   (Notation: '_' is used here for subscripts, '^' for superscripts, like LaTeX)

   The (squared) Euclidean distance d_{ij} is given as
      d_{ij} = || X_i - Y_j ||^2 = (X_i - Y_j) * (X_i - Y_j)^T

   In k-NN, for each test data instance X_i, we need to calculate
         d_{i1}, ..., d_{iM}, 
   to find the k closest data points in Y, so that, for all test instances,
   we calculate distance matrix DI of N-by-M:
            / d_{11} ... d_{1M} \
      DI = |    .          .     |
           |    .          .     |
            \ d_{N1} ... d_{NM} /

   In Matlab, DI can be efficiently computed with pdist2() function, e.g.
      DI = pdist2(X, Y, 'squaredeuclidean');
   in Python,
      DI = scipy.spatial.distance.cdist(X, Y, 'sqeuclidean') .
   However, this coursework does not allow you to use those functions,
   but ask you to implement similar functions of your own. The following
   outlines how it can be done. 
 
   At first, note that d_{ij} can be decomposed in the following manner.
     d_{ij} = (X_i - Y_j) * (X_i - Y_j)^T
            = X_i * X_i^T - 2 X_i * Y_j^T + Y_j * Y_j^T
            
   Let       / X_1 * X_1^T \            / Y_1 * Y_1^T \
       XX = |      .        | ,   YY = |      .        | ,
            |      .        |          |      .        |
             \ X_N * X_N^T /            \ Y_M * Y_M^T /

   so that XX and YY are N-by-1 and M-by-1 matrices, respectively.
   Then DI can be obtained with standard matrix operations:

     DI = (XX,...,XX) - 2 * X * Y^T + (YY,...,YY)^T
           <-- M -->                   <-- N -->

   In Matlab, repmat() can be used to repeat copies of matrix.

   If the size of DI is too large for your computer to handle, and
   causing a lot of memory swapping in your machine, consider defining
   the matrix as a single precision variable rather than double.
   
   Further speed-up is possible if you discard terms that are not
   relevant to classification.

 Debugging tips:
   Coding for vectorisation is sometimes tricky and potentially buggy.
   Make sure, using a small number of samples, that your vectorised
   code gives the same distance as the one without vectorisation.

Q: Task1.3 implicitly requests us to name the variables dynamically for
   confusion matrices, which is not recommended generally.
A: You are right. The specification has been modified, so that you can
   use the same variable name 'cm' to save a confusion matrix to
   a file named 'cmk.mat', where k is the number of nearest neighbours.
   In case that you've implemented dynamic variables already, you do
   not need to change the code to meet this new specification.

---- Task 2 -----------------------------------------------------------
Q: Can I use the add-one smoothing?
A: Yes, if you'd like to, but please indicate it in your report.

Q: What should I do with the problem with log(0) = -infinity?
   (which makes the following calculation/operations useless)
A: You can avoid the issue by replacing 0 with a very small
   number, e.g. 1.0E-10.

---- Task 3 -----------------------------------------------------------
Q: How can I calculate covariance matrices using vectorisation?
A: See the lecture slides for Lecture 8.

Q: What should I do with the zero determinant of a covariance matrix?
A: Please try the logdet() function provided in the course directory.

Q: Which type of covariance matrix should I use - the one normalised
   by N or N-1? 
A: Please use the one normalised by N, because MLE is assumed.

Q: How many different approaches should I try for Task3.3?
A: See https://piazza.com/class/jc7iuf9dssg44f?cid=252