1) Universal Approximation Theorem implies:
A) Any continuous function on a compact domain can be approximated arbitrarily well with enough hidden neurons
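A minimal numpy sketch of the idea (the sin target, random hidden weights, and the widths are illustrative assumptions, not part of the question): with one hidden ReLU layer whose hidden weights are random and whose output weights are fit by least squares, the approximation error typically shrinks as the width grows.

    import numpy as np

    # Approximate f(x) = sin(3x) on [-1, 1] with one hidden ReLU layer.
    # Hidden weights are random; only the output weights are fit, so the
    # error typically shrinking as width grows illustrates the idea.
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 400).reshape(-1, 1)
    y = np.sin(3 * x)
    for width in (5, 50, 500):
        W = rng.normal(size=(1, width))                # hidden weights
        b = rng.normal(size=width)                     # hidden biases
        H = np.maximum(0, x @ W + b)                   # ReLU activations
        coef, *_ = np.linalg.lstsq(H, y, rcond=None)   # fit output layer
        err = np.max(np.abs(H @ coef - y))
        print(f"width={width:4d}  max abs error={err:.4f}")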
2) Softmax ensures:
A) Outputs are non-negative and sum to 1, forming a valid probability distribution
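A small numpy check (the logits are an arbitrary example): the exponentials are normalized, so the outputs are non-negative and sum to 1.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))   # subtract max for numerical stability
        return e / e.sum()

    logits = np.array([2.0, 1.0, -1.0])
    p = softmax(logits)
    print(p, p.sum())               # non-negative entries summing to 1.0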
3) Are deep networks always overparameterized?
A) It depends on the definition of overparameterization (e.g., parameter count relative to the number of training examples)
4) Increasing depth without regularization increases:
A) Variance
5) A perceptron cannot solve XOR because:
A) The XOR data are not linearly separable, and a single perceptron can only learn a linear decision boundary
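A quick numpy demonstration, assuming the classic perceptron learning rule with a step activation: trained on XOR it never classifies all four points correctly, because no single line separates the two classes.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # XOR inputs
    y = np.array([0, 1, 1, 0])                       # XOR labels
    w, b = np.zeros(2), 0.0
    for _ in range(100):                             # perceptron updates
        for xi, yi in zip(X, y):
            pred = int(xi @ w + b > 0)
            w = w + (yi - pred) * xi
            b = b + (yi - pred)
    acc = np.mean((X @ w + b > 0).astype(int) == y)
    print(f"perceptron accuracy on XOR: {acc:.2f}")  # stuck at 0.75 or less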
6) Vanishing gradients are worst with:
A) Sigmoid, whose derivative is at most 0.25, so gradients shrink multiplicatively with depth
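A short numpy illustration (the depths are arbitrary choices): even at its largest point, the sigmoid derivative is 0.25, so the product of derivatives along a deep chain of sigmoid layers shrinks toward zero.

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    s = sigmoid(0.0)
    d = s * (1 - s)                  # derivative at its largest point, 0.25
    for depth in (5, 10, 20):
        print(f"depth={depth:2d}  gradient factor <= {d ** depth:.2e}")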
7) A very high learning rate may:
A) Prevent convergence
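A tiny illustration on f(w) = w^2, where each gradient step multiplies w by (1 - 2*lr): a moderate learning rate converges, while a rate above 1 makes the iterates grow instead of settling.

    # Gradient descent on f(w) = w**2 (gradient 2*w): the update is
    # w <- (1 - 2*lr) * w, so lr > 1 gives |1 - 2*lr| > 1 and w diverges.
    for lr in (0.1, 1.5):
        w = 1.0
        for _ in range(10):
            w -= lr * 2 * w
        print(f"lr={lr}: w after 10 steps = {w:.3g}")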
8) If validation loss increases while training loss decreases:
A) Overfitting
9) ReLU avoids vanishing gradients because:
A) It has a constant gradient of 1 for positive inputs
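A one-line numpy check (the sample inputs are arbitrary): the ReLU gradient is exactly 1 wherever the input is positive, so it does not shrink backpropagated gradients the way sigmoid does.

    import numpy as np

    def relu_grad(z):
        return (z > 0).astype(float)   # 1 for positive inputs, 0 otherwise

    print(relu_grad(np.array([-2.0, -0.5, 0.5, 3.0])))   # [0. 0. 1. 1.]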
10) Dropout is approximately equivalent to L2 regularization:
A) Yes, approximately, under certain assumptions (e.g., for linear models, where dropout behaves like a scaled L2 penalty)
11) Does dropout always improve test accuracy?
A) No, it depends on model capacity
12) If all weights initialized to zero:
A) The symmetry problem occurs: all neurons in a layer receive identical gradients and learn identical features
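A minimal numpy sketch of the symmetry problem, assuming a tiny 2-2-1 tanh network with squared-error loss (the architecture and data point are illustrative): with every weight at zero, the two hidden units always receive identical gradients, so after training their weight rows are still identical (here the hidden weights never move at all; only the output bias learns).

    import numpy as np

    x, y = np.array([1.0, 2.0]), 1.0
    W1, b1 = np.zeros((2, 2)), np.zeros(2)
    W2, b2 = np.zeros(2), 0.0
    lr = 0.1
    for _ in range(100):
        h = np.tanh(W1 @ x + b1)
        yhat = W2 @ h + b2
        d_out = yhat - y                  # dL/dyhat for squared error
        dW2, db2 = d_out * h, d_out
        dh = d_out * W2 * (1 - h ** 2)    # backprop through tanh
        dW1, db1 = np.outer(dh, x), dh
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    print("hidden rows identical:", np.allclose(W1[0], W1[1]))  # True
    print("W1 after training:\n", W1)     # still all zeros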
