YeonJu

This post only summarizes the parts of the AI502 course that were new to me.

RNN

RNNs are used when the data has time steps (sequential data).
Cross-entropy loss is the usual choice.
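
A minimal sketch of a per-time-step cross-entropy loss, to make the note concrete. It assumes a classification setting where the RNN emits one vector of logits per time step. The names `softmax` and `sequence_cross_entropy` and the example numbers are made up for illustration and are not from the lecture.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_cross_entropy(logits_per_step, targets):
    """Average of -log p(target) over all time steps (hypothetical helper)."""
    losses = []
    for logits, target in zip(logits_per_step, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Two time steps, three classes; targets are class indices.
logits_per_step = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]
loss = sequence_cross_entropy(logits_per_step, targets)
```

With uniform logits over three classes the loss equals log 3, which is a handy sanity check.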
Vanishing gradient
- Long-term dependencies cannot be learned
- Initialize the recurrent weight matrix to the identity matrix and use ReLU activation
- regularizer
- skip-connections
  - Leaky units
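
As a rough illustration of the leaky-unit idea from the list above, the sketch below uses the running-average form h_t = alpha * h_{t-1} + (1 - alpha) * v_t on a scalar state. The function name `leaky_unit`, the scalar simplification, and the values of `alpha` are assumptions for illustration only.

```python
def leaky_unit(inputs, alpha, h0=0.0):
    """Scalar leaky unit: h_t = alpha * h_{t-1} + (1 - alpha) * v_t."""
    h = h0
    history = []
    for v in inputs:
        h = alpha * h + (1.0 - alpha) * v
        history.append(h)
    return history

# A single impulse at the first step, followed by silence.
inputs = [1.0] + [0.0] * 9

slow = leaky_unit(inputs, alpha=0.9)  # alpha near 1: the past decays slowly
fast = leaky_unit(inputs, alpha=0.1)  # alpha near 0: the past is forgotten quickly
```

After ten steps the `alpha=0.9` unit still holds a noticeable trace of the impulse, while the `alpha=0.1` unit has essentially forgotten it. That linear self-connection is what lets information (and gradient) survive over longer spans.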
Gradient Clipping
- Prevents exploding gradients
- Normalize the gradient when its norm exceeds a threshold $v$
- $\text{if } \lVert g \rVert > v:\ g \leftarrow \frac{g\,v}{\lVert g \rVert}$
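
The clipping rule above can be sketched in a few lines. This is a plain-Python illustration on a flat list of gradient values, not any particular library's implementation. `clip_by_norm` is a hypothetical name, and `threshold` plays the role of v.

```python
import math

def clip_by_norm(grad, threshold):
    """Rescale grad to have L2 norm `threshold` if its norm exceeds it."""
    norm = math.sqrt(sum(x * x for x in grad))
    if norm > threshold:
        scale = threshold / norm  # g <- g * v / ||g||
        return [x * scale for x in grad]
    return list(grad)  # small gradients pass through unchanged

g = [3.0, 4.0]                  # L2 norm is 5
clipped = clip_by_norm(g, 1.0)  # same direction, norm rescaled to about 1
small = clip_by_norm(g, 10.0)   # norm below threshold, left as is
```

Only the magnitude changes. The direction of the gradient is preserved, which is why this is described as normalizing rather than truncating.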
Learning both long-term and short-term dependencies
- Clockwork RNN
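
A small sketch of the scheduling idea behind the Clockwork RNN: the hidden state is split into modules, each with its own clock period, and a module is updated only at time steps divisible by its period. The periods `[1, 2, 4, 8]` and the helper name `active_modules` are assumptions for illustration, and the actual weight updates are omitted.

```python
def active_modules(t, periods):
    """Indices of the modules that update at time step t (t % period == 0)."""
    return [i for i, period in enumerate(periods) if t % period == 0]

# Hypothetical exponentially spaced clock periods, one per module.
periods = [1, 2, 4, 8]

# Which modules tick at each of the first eight time steps.
schedule = {t: active_modules(t, periods) for t in range(1, 9)}
```

Module 0 ticks at every step and tracks short-term structure, while the module with period 8 changes rarely and can carry long-term information.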
Recursive Neural Network
- Reduces the depth.
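
To see why a tree structure reduces depth, the sketch below compares the number of composition steps on the longest path for a left-to-right chain and for a balanced binary tree. The balanced-tree assumption and the helper names `chain_depth` and `tree_depth` are mine. The tree shape used in practice depends on the model, for example a parse tree.

```python
def chain_depth(n_leaves):
    """A chain folds in one element per step, so the depth grows linearly."""
    return max(n_leaves - 1, 0)

def tree_depth(n_leaves):
    """Levels needed when adjacent nodes are merged pairwise until one remains."""
    depth = 0
    n = n_leaves
    while n > 1:
        n = (n + 1) // 2  # pair up nodes; an odd leftover is carried up
        depth += 1
    return depth

for n in (8, 1000):
    print(n, chain_depth(n), tree_depth(n))
```

For a sequence of length 1000 the chain needs 999 compositions along its longest path, while the balanced tree needs 10, so the distance a gradient has to travel grows roughly logarithmically rather than linearly.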