Backpropagation
In a deep neural network, we
1. compute the gradient, that is, the partial derivative of the loss function with respect to a given parameter (weight/bias), and
2. update the parameters toward their optimum with an optimizer such as SGD (stochastic gradient descent).
To obtain the partial derivative in step 1 with respect to a weight that sits deep inside the network, we need the chain rule. Since the chain rule passes derivatives backward through the network, multiplying them along the way, this method is called backpropagation.
In the following network, let us use backpropagation to differentiate the loss function with respect to the weights, which are written as matrices.
For each node in the figure, we can write:
$$\begin{align} & \textbf{d}_{1}=\textbf{n}_{0}\textbf{W}_{1}+\textbf{b}_{1}\\&\textbf{n}_{1}=\textbf{f}_{1}(\textbf{d}_{1})\\&\textbf{d}_{2}=\textbf{n}_{1}\textbf{W}_{2}+\textbf{b}_{2}\\&\textbf{n}_{2}=\textbf{f}_{2}(\textbf{d}_{2})\ \end{align}$$
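As a rough NumPy sketch of this forward pass (the layer sizes, the choice of tanh for both activations, and the random values are assumptions for illustration, not taken from the figure):

```python
import numpy as np

# Assumed toy sizes: 2 inputs -> 3 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
n0 = rng.standard_normal((1, 2))            # input row vector n0
W1 = rng.standard_normal((2, 3)); b1 = rng.standard_normal((1, 3))
W2 = rng.standard_normal((3, 2)); b2 = rng.standard_normal((1, 2))
f = np.tanh                                 # assume f1 = f2 = tanh

d1 = n0 @ W1 + b1                           # d1 = n0 W1 + b1
n1 = f(d1)                                  # n1 = f1(d1)
d2 = n1 @ W2 + b2                           # d2 = n1 W2 + b2
n2 = f(d2)                                  # n2 = f2(d2) = [y1_hat, y2_hat]
```

These variables are reused in the gradient sketches further down.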
The loss function is $L=(\hat{y_{1}}-y_{1})^{2}+(\hat{y_{2}}-y_{2})^{2}$.
Writing the output as the row vector $\textbf{n}_{2}=\left[\begin{matrix} \hat{y_{1}} & \hat{y_{2}} \end{matrix}\right]$, the loss can be expanded as
$$\begin{align*} L &=(\hat{y_{1}}-y_{1})^{2}+(\hat{y_{2}}-y_{2})^{2} \\&=(\textbf{n}_{2}-\textbf{y})(\textbf{n}_{2}-\textbf{y})^{T} \\ &= \left(\left[\begin{matrix} \hat{y_{1}} & \hat{y_{2}} \end{matrix}\right]-\left[\begin{matrix} y_{1} & y_{2} \end{matrix}\right]\right)\left(\left[\begin{matrix} \hat{y_{1}} \\ \hat{y_{2}} \end{matrix}\right]-\left[\begin{matrix} y_{1} \\ y_{2} \end{matrix}\right]\right) \\&=\left[\begin{matrix} \hat{y_{1}}-y_{1} & \hat{y_{2}}-y_{2} \end{matrix}\right]\left[\begin{matrix} \hat{y_{1}}-y_{1} \\ \hat{y_{2}}-y_{2} \end{matrix}\right] \\&= (\hat{y_{1}}-y_{1})^{2}+(\hat{y_{2}}-y_{2})^{2} \end{align*}$$
That is, the loss function can be written in vector form as $L=(\textbf{n}_{2}-\textbf{y})(\textbf{n}_{2}-\textbf{y})^{T}$.
To differentiate $L$ with respect to the weight matrices, we first vectorize them.
$$\mathbf{w_{1}}=\textrm{vec}(\mathbf{W_{1}})$$
$$\mathbf{w_{2}}=\textrm{vec}(\mathbf{W_{2}})$$
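As a side note (the stacking order is my assumption, not stated in the post): in NumPy, $\textrm{vec}(\cdot)$ can be taken to be a plain reshape into a column vector; the row-major (C-order) flattening used here is the ordering consistent with the Kronecker-product factors written below.

```python
# vec(W): flatten each weight matrix into one long column vector (row-major order).
w1 = W1.reshape(-1, 1)   # shape (6, 1) for the assumed 2x3 W1
w2 = W2.reshape(-1, 1)   # shape (6, 1) for the assumed 3x2 W2
```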
To take the partial derivative of the function $L$ with respect to the vector $\mathbf{w}$, we can use the chain rule.
(For reference, see item 5 of https://deep-learning-basics.tistory.com/1.)
$$\frac{\partial L}{\partial \mathbf{w^{T}_{2}}}=\frac{\partial \textbf{d}_{2}}{\partial \mathbf{w^{T}_{2}}}\frac{\partial \textbf{n}_{2}}{\partial \mathbf{d^{T}_{2}}}\frac{\partial L}{\partial \mathbf{n^{T}_{2}}} \tag{1}$$
$$(\textbf{w}_{2}\to\textbf{d}_{2}\to \textbf{n}_{2}\to L)$$
$$\frac{\partial L}{\partial \mathbf{w^{T}_{1}}}=\frac{\partial \textbf{d}_{1}}{\partial \mathbf{w^{T}_{1}}}\frac{\partial \textbf{n}_{1}}{\partial \mathbf{d^{T}_{1}}}\frac{\partial \mathbf{d_{2}}}{\partial \mathbf{n^{T}_{1}}}\frac{\partial \textbf{n}_{2}}{\partial \mathbf{d^{T}_{2}}}\frac{\partial L}{\partial \mathbf{n^{T}_{2}}} \tag{2}$$
$$(\textbf{w}_{1}\to\textbf{d}_{1}\to \textbf{n}_{1}\to \textbf{d}_{2}\to \textbf{n}_{2}\to L)$$
We obtain the partial derivative by differentiating from the back of the chain and multiplying the factors together.
Let us first compute the factors of equation (1).
$$\begin{align} & \frac{\partial \textbf{d}_{2}}{\partial \textbf{w}^{T}_{2}}=\textbf{n}_{1}^{T}\otimes\textbf{I} \\& \frac{\partial \textbf{n}_{2}}{\partial \textbf{d}^{T}_{2}}=\textrm{diag}(\textbf{f}_{2}^{'}(\textbf{d}_{2})) \\& \frac{\partial L}{\partial \textbf{n}^{T}_{2}}=2(\textbf{n}_{2}-\textbf{y})^{T} \end{align}$$
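A minimal NumPy sketch of equation (1), continuing the toy variables above; the target vector y and the tanh derivative are assumptions, and one entry of the analytic gradient is checked against a finite-difference estimate:

```python
y  = np.array([[0.3, -0.7]])                 # assumed target row vector
df = lambda d: 1.0 - np.tanh(d) ** 2         # derivative of the assumed tanh activation

dd2_dw2 = np.kron(n1.T, np.eye(2))           # dd2/dw2^T = n1^T (x) I,     shape (6, 2)
dn2_dd2 = np.diag(df(d2).ravel())            # dn2/dd2^T = diag(f2'(d2)),  shape (2, 2)
dL_dn2  = 2.0 * (n2 - y).T                   # dL/dn2^T  = 2 (n2 - y)^T,   shape (2, 1)
grad_w2 = dd2_dw2 @ dn2_dd2 @ dL_dn2         # dL/dw2^T,                   shape (6, 1)

# Finite-difference check on the first entry of vec(W2), i.e. W2[0, 0].
eps = 1e-6
L0  = ((n2 - y) @ (n2 - y).T).item()
W2p = W2.copy(); W2p[0, 0] += eps
n2p = f(f(n0 @ W1 + b1) @ W2p + b2)
Lp  = ((n2p - y) @ (n2p - y).T).item()
print(grad_w2[0, 0], (Lp - L0) / eps)        # the two numbers should agree closely
```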
Computing the factors of equation (2) gives the following.
$$\begin{align} & \frac{\partial \textbf{d}_{1}}{\partial \mathbf{w^{T}_{1}}}=\textbf{n}_{0}^{T}\otimes\textbf{I} \\& \frac{\partial \textbf{n}_{1}}{\partial \mathbf{d^{T}_{1}}}=\textrm{diag}(\textbf{f}_{1}^{'}(\textbf{d}_{1})) \\& \frac{\partial \mathbf{d_{2}}}{\partial \mathbf{n^{T}_{1}}}=\textbf{W}_{2} \\& \frac{\partial \textbf{n}_{2}}{\partial \textbf{d}^{T}_{2}}=\textrm{diag}(\textbf{f}_{2}^{'}(\textbf{d}_{2})) \\& \frac{\partial L}{\partial \mathbf{n^{T}_{2}}}=2(\textbf{n}_{2}-\textbf{y})^{T} \end{align}$$
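Following the same pattern, a sketch of equation (2), reusing the variables from the snippets above (again assuming f1 = f2 = tanh and row-major vec):

```python
dd1_dw1 = np.kron(n0.T, np.eye(3))           # dd1/dw1^T = n0^T (x) I,     shape (6, 3)
dn1_dd1 = np.diag(df(d1).ravel())            # dn1/dd1^T = diag(f1'(d1)),  shape (3, 3)
dd2_dn1 = W2                                 # dd2/dn1^T = W2,             shape (3, 2)
dn2_dd2 = np.diag(df(d2).ravel())            # dn2/dd2^T = diag(f2'(d2)),  shape (2, 2)
dL_dn2  = 2.0 * (n2 - y).T                   # dL/dn2^T  = 2 (n2 - y)^T,   shape (2, 1)

grad_w1 = dd1_dw1 @ dn1_dd1 @ dd2_dn1 @ dn2_dd2 @ dL_dn2   # dL/dw1^T, shape (6, 1)
grad_W1 = grad_w1.reshape(W1.shape)          # fold back into the 2x3 layout of W1
```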