[Chapter7] self-attention

따옹 2023. 6. 17. 11:31

7.2 Transformers : Self-Attention Network

Transformers

Because of the limitations of RNNs, Transformers remove the recurrent connections.

Input vectors (𝑥1,…,𝑥𝑛) are mapped to output vectors (𝑦1,…,𝑦𝑛).

Transformers use an encoder-decoder structure: the encoder maps an input sequence of symbol representations (𝑥1,…,𝑥𝑛) to a sequence of continuous representations 𝐳 = (𝑧1,…,𝑧𝑛).

Given 𝐳, the decoder then generates an output sequence (𝑦1,…,𝑦𝑚) of symbols one element at a time.
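The "one element at a time" part can be sketched as a simple loop. This is only an illustration: `greedy_decode`, `toy_step`, and the `<s>` / `</s>` markers are hypothetical names, and `toy_step` is a stand-in that copies the encoder sequence rather than a real decoder.

```python
def greedy_decode(z, step, start="<s>", end="</s>", max_len=10):
    """Generate symbols one at a time, feeding each output back in as context.

    z: encoder representations.
    step: hypothetical function (z, prefix) -> next symbol.
    """
    output = [start]
    while len(output) < max_len:
        symbol = step(z, output)
        output.append(symbol)
        if symbol == end:
            break
    return output[1:]  # drop the start marker


# Stand-in "decoder" for illustration only: copies the encoder sequence, then stops.
def toy_step(z, prefix):
    i = len(prefix) - 1  # number of symbols generated so far
    return z[i] if i < len(z) else "</s>"


result = greedy_decode(["a", "b", "c"], toy_step)
```

Each call to `step` sees everything generated so far, which is why decoding is sequential even though the encoder side is not.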

Self-attention: the core idea of the Transformer

Even without an RNN, it can directly extract and use information from an arbitrarily large context.

Q, K, and V can all come from the encoder, or all from the decoder, so the computation can be parallelized.

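It produces a new sequence of vectors with the same dimension as the input (the new vectors are ones that further improve speech recognition accuracy).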
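A minimal sketch of how self-attention turns n input vectors into n new vectors, assuming the standard scaled dot-product form (softmax of Q·K / sqrt(d_k), then a weighted sum of V). The helper names and the identity projection matrices are assumptions chosen to keep the toy example readable. A real model learns `w_q`, `w_k`, `w_v`.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of n vectors.

    x: n x d input; w_q, w_k, w_v: d x d_k projections (illustrative values).
    Returns n output vectors, one per input position.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs = []
    for q_i in q:
        # Compare this position's query against every key in the sequence.
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        weights = softmax(scores)
        # The output is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))]
        )
    return outputs


# Toy input: 3 positions, 2 dimensions; identity projections for readability.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = self_attention(x, identity, identity, identity)
```

No position depends on the output of another position, so the loop over `q` could run in parallel (in practice it is a single matrix multiplication).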

7.2.3 Positional Encoding : Modeling Input order

A vector added to the input so that the model can learn relative positions.

A value between 0 and 1 is added to each dimension; the goal is to find a meaningful positional encoding vector.

What I found in the paper..

Why a combination of sin and cos?
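One answer to the sin/cos question, assuming the paper meant here is the original Transformer paper ("Attention Is All You Need"): it uses PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Pairing sin with cos at each frequency means the encoding of position pos + k is a fixed rotation of the encoding of pos, which should make relative offsets easy to learn. The sketch below checks that numerically. The function names are my own, and `d_model` is assumed to be even.

```python
import math


def positional_encoding(pos, d_model, base=10000.0):
    """Sinusoidal encoding: even indices use sin, odd indices use cos."""
    pe = []
    for i in range(0, d_model, 2):  # i is the even index 2i from the formula
        angle = pos / (base ** (i / d_model))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe


def shift(pe, k, d_model, base=10000.0):
    """Rotate each (sin, cos) pair by offset k.

    The rotation depends only on k, not on the original position.
    """
    out = []
    for i in range(0, d_model, 2):
        w = 1.0 / (base ** (i / d_model))
        s, c = pe[i], pe[i + 1]
        out.extend([
            s * math.cos(w * k) + c * math.sin(w * k),  # sin(w * (pos + k))
            c * math.cos(w * k) - s * math.sin(w * k),  # cos(w * (pos + k))
        ])
    return out


d_model = 8
pe_5 = positional_encoding(5, d_model)
pe_8 = positional_encoding(8, d_model)
shifted = shift(pe_5, 3, d_model)
max_error = max(abs(a - b) for a, b in zip(shifted, pe_8))
```

Here `max_error` should be at floating-point noise level, meaning the encoding of position 8 can be recovered from position 5 using only the offset 3. With sin alone this linear relation would not hold.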