COMP4702 Revision Session
- Don’t focus on memorising formulae - you need to understand them
- If you need a formula, it will be given
- Gaussian Process, GMMs - assessable
- Doing the calculations by hand - possible, but tedious (probably not asked?)
- How to do the convolution question on the practice exam
- Calculations are easy given that the kernel is 2x2 and the input matrix is 8x8
- If the kernel doesn’t line up at the edges, use zero-padding on the matrix
- “Will pooling type always be stated?” -> Yes
- Really only two types - max pooling and average pooling
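The convolution and pooling steps above can be sketched in NumPy. This is a minimal illustration (not the exam's exact numbers): a toy 8x8 input, a toy 2x2 kernel, "valid" sliding-window cross-correlation as used in CNNs, and non-overlapping max/average pooling. The `conv2d` and `pool2d` helpers are written here for illustration.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Valid 2D cross-correlation (as in CNNs), with optional zero-padding."""
    if pad:
        image = np.pad(image, pad)  # zero-padding when sizes don't line up
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(image, size=2, mode="max"):
    """Non-overlapping max or average pooling (stride == window size)."""
    h, w = image.shape
    blocks = image[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size)
    agg = np.max if mode == "max" else np.mean
    return agg(blocks, axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "image"
k = np.array([[1.0, 0.0], [0.0, -1.0]])          # toy 2x2 kernel
feat = conv2d(img, k)          # 2x2 kernel over 8x8 input -> 7x7 feature map
pooled = pool2d(feat[:6, :6])  # crop to even size, then 2x2 max pool -> 3x3
```

Note the shape arithmetic: an 8x8 input with a 2x2 kernel and stride 1 gives a 7x7 output (8 - 2 + 1); padding the input first would enlarge the output accordingly.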
Practice Exam
B
D
A
C?
C - No Cov parameters, so spherical clusters / mixing coefficients
D - Still supervised learning/gradient descent/backpropagation
B
C
A
A
Key point to take from the picture shown: you will not get 100% performance regardless of how complex the model is, simply because the features do not separate the classes (they overlap).
Generalisation performance is performance on future, unseen data, assuming that data not used for training follows the same distribution as the training data. We can see in the figure that the classes in this dataset overlap, which means that no classifier can achieve perfect accuracy on it.
Q2
- Euclidean distance results in a straight-line decision boundary between the two datapoints.
- Generalisation is about performance on future unseen data.
- Classes overlap, assuming that the test data follows the same distribution
- Complex classes can over-fit to the training data.
- Issue: estimating generalisation performance
- Generalisation error must be estimated in practice from a finite set of data (e.g., hold-out or cross-validation)
- So the estimate is noisy.
Q3
a) “The loss function (to update the critic and generator).”
b) Generate a random vector from a Normal/Gaussian distribution with mean 0 and identity covariance matrix, then pass it through the model (generator) to get the output.
c) Separate learning rates give more flexibility, allowing the critic and generator to learn at different rates. The cost is another hyper-parameter that needs to be tuned, which means extra computational expense and time to properly configure the model.
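Part (b) above, the sampling step, can be sketched in a few lines. Everything here is a hypothetical stand-in: the "generator" is just a fixed random linear map, since the point is only that `z` is drawn from N(0, I) and then pushed through the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generator: a fixed random linear map standing in for a
# trained network (the assumed dimensions are arbitrary).
latent_dim, out_dim = 16, 64
W = rng.normal(size=(out_dim, latent_dim))

def generate(n_samples):
    """Sample z ~ N(0, I), then pass it through the generator."""
    z = rng.standard_normal((n_samples, latent_dim))  # mean 0, identity covariance
    return z @ W.T                                    # generated ("fake") outputs

fakes = generate(5)
```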
Q5. a) You can just use a look-up table to get 100% performance on the training data, but this does not generalise.
b) Hold-out or cross-validation for both.
c) If you use a single training set, the error estimate is going to be poor (it is based on one sample, not an average). The method chosen in (b) would reduce the variance of our error estimates.
This is the argument to why bagging works.
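The variance-reduction argument behind bagging can be checked numerically: averaging B independent noisy estimates of the same quantity divides the variance by B. The sketch below uses assumed toy numbers (true value 0.3, noise std 0.1, B = 25) purely to illustrate the effect.

```python
import numpy as np

rng = np.random.default_rng(1)

# One noisy estimate vs the average of B independent noisy estimates:
# for independent estimates, averaging divides the variance by B.
true_value, noise_std, B = 0.3, 0.1, 25
single = true_value + rng.normal(0, noise_std, size=10_000)
averaged = true_value + rng.normal(0, noise_std, size=(10_000, B)).mean(axis=1)

print(np.var(single))    # close to noise_std**2 = 0.01
print(np.var(averaged))  # close to 0.01 / 25
```

In bagging the bootstrap replicates are not fully independent, so the reduction is smaller than 1/B, but the direction of the effect is the same.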