COMP4702 Revision Session
- Don’t focus on memorising formulae - you need to understand them
- If you need a formula, it will be given
- Gaussian Process, GMMs - assessable
- Doing the calculations by hand - possible, but tedious (probably not asked?)
- How to do the convolution question on the practice exam
- Calculations are easy given that the kernel is 2x2 and the input matrix is 8x8
- If the kernel doesn’t line up at the edges, use zero-padding on the matrix
- “Will pooling type always be stated?” -> Yes
- Really only two types - max pooling and average pooling
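The convolution and pooling steps above can be sketched in NumPy. This is a minimal illustration (not the exam's exact numbers): a toy 8x8 input, a toy 2x2 kernel, "valid" sliding-window cross-correlation as used in CNNs, and non-overlapping max/average pooling. The `conv2d` and `pool2d` helpers are written here for illustration.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Valid 2D cross-correlation (as in CNNs), with optional zero-padding."""
    if pad:
        image = np.pad(image, pad)  # zero-padding when sizes don't line up
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(image, size=2, mode="max"):
    """Non-overlapping max or average pooling (stride == window size)."""
    h, w = image.shape
    blocks = image[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size)
    agg = np.max if mode == "max" else np.mean
    return agg(blocks, axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "image"
k = np.array([[1.0, 0.0], [0.0, -1.0]])          # toy 2x2 kernel
feat = conv2d(img, k)          # 2x2 kernel over 8x8 input -> 7x7 feature map
pooled = pool2d(feat[:6, :6])  # crop to even size, then 2x2 max pool -> 3x3
```

Note the shape arithmetic: an 8x8 input with a 2x2 kernel and stride 1 gives a 7x7 output (8 - 2 + 1); padding the input first would enlarge the output accordingly.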
Practice Exam
B
D
A
C?
C - No Cov parameters, so spherical clusters / mixing coefficients
D - Still supervised learning/gradient descent/backpropagation
B
C
A
A
Key point to take from the picture shown: you will not get 100% performance regardless of how complex the model is, simply because the features do not separate the classes (they overlap).
Generalisation performance is performance on future, unseen data, assuming that data not used for training follows the same distribution as the training data. We can see in the figure that the classes in this dataset overlap, which means that no classifier can achieve perfect accuracy on it.
Q2
- Euclidean distance results in a straight-line decision boundary between the two datapoints.
- Generalisation is about performance on future unseen data.
- Classes overlap, assuming that the test data follows the same distribution
- Complex classes can over-fit to the training data.
- Issue: estimating generalisation performance
- Generalisation error must be estimated in practice from a finite set of data (e.g., hold-out or cross-validation)
- So the estimate is noisy.
Q3
a) “The loss function (to update the critic and generator).”
b) Generate a random vector from a Normal/Gaussian distribution with mean 0 and identity covariance matrix, then pass it through the model (generator) to get the output.
c) Separate learning rates give more flexibility, allowing the critic and generator to learn at different rates. The cost is another hyper-parameter that needs to be tuned, which means extra computational expense and time to properly configure the model.
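Part (b) above, the sampling step, can be sketched in a few lines. Everything here is a hypothetical stand-in: the "generator" is just a fixed random linear map, since the point is only that `z` is drawn from N(0, I) and then pushed through the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generator: a fixed random linear map standing in for a
# trained network (the assumed dimensions are arbitrary).
latent_dim, out_dim = 16, 64
W = rng.normal(size=(out_dim, latent_dim))

def generate(n_samples):
    """Sample z ~ N(0, I), then pass it through the generator."""
    z = rng.standard_normal((n_samples, latent_dim))  # mean 0, identity covariance
    return z @ W.T                                    # generated ("fake") outputs

fakes = generate(5)
```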
Q5. a) You can just use a look-up table to get 100% performance on the training data, but this does not generalise.
b) Hold-out or cross-validation for both.
c) If you use a single training set, the error estimate is going to be poor (it is based on one sample, not an average). The method chosen in (b) would reduce the variance of our error estimates.
This is the argument to why bagging works.
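The variance-reduction argument behind bagging can be checked numerically: averaging B independent noisy estimates of the same quantity divides the variance by B. The sketch below uses assumed toy numbers (true value 0.3, noise std 0.1, B = 25) purely to illustrate the effect.

```python
import numpy as np

rng = np.random.default_rng(1)

# One noisy estimate vs the average of B independent noisy estimates:
# for independent estimates, averaging divides the variance by B.
true_value, noise_std, B = 0.3, 0.1, 25
single = true_value + rng.normal(0, noise_std, size=10_000)
averaged = true_value + rng.normal(0, noise_std, size=(10_000, B)).mean(axis=1)

print(np.var(single))    # close to noise_std**2 = 0.01
print(np.var(averaged))  # close to 0.01 / 25
```

In bagging the bootstrap replicates are not fully independent, so the reduction is smaller than 1/B, but the direction of the effect is the same.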