  • VGG Net (2014.09)
    AI\ML\DL/Paper Review 2023. 9. 16. 14:25
    ๋ฐ˜์‘ํ˜•

    *  *  *

    Table 1. ConvNet configurations of VGGNet

     

    VGGNet is the CNN that finished as runner-up in the classification task of the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge). The paper, "Very deep convolutional networks for large-scale image recognition", sets out to examine how increasing network depth affects performance. VGGNet comes in configurations A, A-LRN, B, C, D, and E; this post covers D, which has 16 weight layers.

     

    Combining configuration D with the MLP (the fully connected layers) below it gives the architecture shown in the figure below.

     

     

     

    I worked out the output shape after each layer of model D in Table 1. Repeating conv3 layers increases the number of channels, while each max pooling halves the spatial size. Each 3x3 convolution uses padding 1 and stride 1, so the convolutions themselves leave the spatial size unchanged.
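
    The shape calculation can be reproduced with a few lines of plain Python. This is a minimal sketch, not the code from the post: the helper names (conv_out, pool_out, trace_shapes) are made up for illustration, and it assumes 3x3 convs with stride 1 and padding 1 and 2x2 max pooling with stride 2, as in the paper.

```python
# Trace (channels, height, width) through VGG configuration D.
cfg_d = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


def trace_shapes(cfg, in_shape=(3, 224, 224)):
    """Return the output shape after every element of cfg."""
    channels, height, width = in_shape
    shapes = []
    for v in cfg:
        if v == "M":
            height, width = pool_out(height), pool_out(width)
        else:
            channels = v
            height, width = conv_out(height), conv_out(width)
        shapes.append((channels, height, width))
    return shapes


shapes = trace_shapes(cfg_d)
for v, shape in zip(cfg_d, shapes):
    print(f"{str(v):>4} -> {shape}")
```

    With a 224x224 input the last entry is (512, 7, 7); calling trace_shapes(cfg_d, (3, 64, 64)) ends at (512, 2, 2) instead.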

     

     

    After the last conv3-512 block and its max pooling, the feature map is 7x7 and is passed to the FC layers. The reason is that the receptive field of each pixel in the 7x7 feature map does not cover the whole image, only part of it, so the FC layers connect all of those positions and take every region into account.
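
    The size of that receptive field can be estimated with the usual recurrence (the field grows by (kernel - 1) times the accumulated stride at every layer). The sketch below is an illustration under the same layer assumptions as the paper (3x3 convs with stride 1, 2x2 max pooling with stride 2); the function name is hypothetical and padding effects at the image border are ignored.

```python
# Theoretical receptive field of one pixel after the last max pooling of configuration D.
cfg_d = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def receptive_field(cfg, conv_kernel=3, pool_kernel=2, pool_stride=2):
    """Return (receptive field, accumulated stride), both in input pixels."""
    rf, jump = 1, 1
    for v in cfg:
        if v == "M":
            rf += (pool_kernel - 1) * jump
            jump *= pool_stride  # pooling doubles the step between neighbouring outputs
        else:
            rf += (conv_kernel - 1) * jump  # stride-1 conv leaves the step unchanged
    return rf, jump


rf, jump = receptive_field(cfg_d)
print(f"receptive field: {rf} px, step between neighbouring outputs: {jump} px")
```

    Under these assumptions each of the 7x7 positions sees a 212x212 window, a little less than the 224x224 input.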

     

    Thinking about why VGG Net does not keep applying conv + max pooling until the map shrinks to 1x1:

    • Making the network too deep causes problems such as a convoluted, bumpy loss landscape*, which makes training hard and leads to underfitting.
      *Li, Hao, et al. "Visualizing the loss landscape of neural nets." Advances in neural information processing systems 31 (2018).
    • Too much max pooling throws away too much spatial information: where each feature is located gets blurred.

     

    Passing through FC layers greatly increases the number of weights: in VGG16 the FC layers hold close to 90% of all the weights in the network (about 124M of about 138M).
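
    The share of the FC layers can be checked by counting parameters by hand. This is a rough sketch in plain Python with made-up helper names; it assumes configuration D with biases, a 512x7x7 input to the classifier, two hidden layers of 4096 units, and 1000 output classes.

```python
# Count conv and FC parameters of VGG16 (configuration D), weights plus biases.
cfg_d = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, kernel=3):
    total = 0
    for v in cfg:
        if v == "M":
            continue  # pooling has no parameters
        total += kernel * kernel * in_channels * v + v
        in_channels = v
    return total


def fc_params(sizes):
    """sizes lists the layer widths, e.g. [in, hidden, hidden, out]."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))


conv_total = conv_params(cfg_d)
fc_total = fc_params([512 * 7 * 7, 4096, 4096, 1000])
total = conv_total + fc_total
fc_share = fc_total / total
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}  fc share: {fc_share:.1%}")
```

    Under these assumptions the FC layers account for roughly 89% of the parameters.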

     

    Inception Net and ResNet, which perform far better than VGG Net, apply GAP (global average pooling) after the last conv layer to reduce the feature map to 1x1. This works because:

    1. Once the map has been reduced to 7x7, passed through a few more conv layers, and then reduced to 1x1, the receptive field of a single pixel covers almost the entire image.
    2. The model itself is much deeper.

    So the features remain highly meaningful even after GAP.

    A single fc layer after GAP is therefore enough. In other words, the network gets the most out of the CNN without relying on an MLP.
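
    To get a feel for what dropping the MLP saves, the two kinds of head can be compared by parameter count. This is only an illustration: the GAP head below is a hypothetical one placed on the same 512-channel feature map with 1000 classes, not the actual head of any particular Inception or ResNet variant.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# VGG-style head: flatten 512x7x7, then 4096 -> 4096 -> 1000.
vgg_head = (linear_params(512 * 7 * 7, 4096)
            + linear_params(4096, 4096)
            + linear_params(4096, 1000))

# GAP head (assumed): average each of the 512 channels to one value, then one fc layer.
gap_head = linear_params(512, 1000)

print(f"VGG-style head: {vgg_head:,} parameters")
print(f"GAP + one fc:   {gap_head:,} parameters")
print(f"ratio: about {vgg_head / gap_head:.0f}x")
```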

     

     

    ๋ชจ๋ธ ๊ตฌํ˜„

     

    1. Imports

    # Notebook-only shell command (Jupyter/Colab); in a terminal run `pip install torchinfo`.
    !pip install torchinfo

    import torch
    from torch import nn
    from torchinfo import summary

    2. Store configuration D in a dictionary

    cfgs = {"D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512, "M", 512, 512, 512, "M"]}

    This is D (the ConvNet part) from Table 1.

    An int is the number of output channels of a conv layer, and "M" (a str) stands for max pooling.

     

    3. Define the VGG model class

     

    1) Class definition

     

    • For the ConvNet part, a separate make_layers function is defined that appends layers according to the elements of cfg.

    • Adaptive average pooling is applied after the ConvNet to prevent errors.
      After the last max pooling the output should be 512x7x7, but if the input image is 64x64
      instead of 224x224, an error occurs (five max poolings leave 2x2, not 7x7), because the first
      FC layer expects 512x7x7 inputs. To prevent this, adaptive average pooling after the ConvNet
      always produces a 7x7 map, replicating values when it has to enlarge the map.
    • In the classifier, ReLU and dropout are applied between the fc layers.
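
    The points above can be sketched as a small PyTorch module. This is a sketch under stated assumptions, not the class from the post: the name VGGSketch and the parameters feature_channels, hidden, and dropout are hypothetical, and the feature extractor is passed in ready-made rather than built from cfg, so that the demo can use a tiny stand-in network instead of the full 138M-parameter model.

```python
import torch
from torch import nn


class VGGSketch(nn.Module):
    """Conv features -> adaptive 7x7 average pooling -> three fc layers."""

    def __init__(self, features, feature_channels=512, hidden=4096,
                 num_classes=1000, dropout=0.5):
        super().__init__()
        self.features = features
        # Always yields a 7x7 map, whatever the input resolution was.
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(feature_channels * 7 * 7, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        return self.classifier(x)


# Tiny stand-in feature extractor (hypothetical) so the demo stays light.
tiny_features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
model = VGGSketch(tiny_features, feature_channels=8, hidden=16, num_classes=10)
model.eval()

with torch.no_grad():
    out_224 = model(torch.randn(2, 3, 224, 224))
    out_64 = model(torch.randn(2, 3, 64, 64))  # other resolutions work too
    # A 2x2 map is enlarged to 7x7 by the adaptive pooling.
    pooled = nn.AdaptiveAvgPool2d((7, 7))(torch.randn(1, 8, 2, 2))

print(out_224.shape, out_64.shape, pooled.shape)
```

    Without the adaptive pooling, the 64x64 input would reach the first fc layer with the wrong number of features and raise a shape error.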

     

    2) Weight initialization

     

    • This part initializes the weights when init_weights is True. Calling self.modules() iterates over every module (layer) in the model. nn.Conv2d layers are initialized with the Kaiming He method,
      and nn.Linear layers have their weights drawn from a normal distribution with mean 0 and standard deviation 0.01. nn.init.constant_(), applied to both kinds of layer, sets the bias to 0.
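
    A minimal sketch of such an initialization routine is shown below, written as a standalone function rather than a method. The post only says "Kaiming He method", so the choice of kaiming_normal_ with mode="fan_out" is an assumption (it mirrors what torchvision's VGG does); the small demo network is hypothetical.

```python
from torch import nn


def init_weights(model):
    """Initialize conv and linear layers of `model` in place."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # Assumed variant of Kaiming initialization (normal, fan_out, ReLU gain).
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, 0, 0.01)  # mean 0, std 0.01
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)


# Small demo network, only to show the function being applied.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 5),
)
init_weights(net)
print(net[0].bias, net[3].weight.std())
```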

     

    3) Forward pass

     

    The input image is passed through all the layers.

     

    4) ConvNet layer definition

     

    This function appends the ConvNet layers. It is useful because it works unchanged for the other configurations in Table 1, not only D.

    layers starts as an empty list, and the initial in_channels is set to 3 because the input is an RGB image.

    The model is created as VGG(cfgs["D"]), so the function walks through the list for D, which mixes int and str elements:

    If an element is an int, an nn.Conv2d layer is added.

    If an element is a str, an nn.MaxPool2d layer is added.

     

    Each number in the list is an out_channels value, and it is also the in_channels of the next layer, so

    in_channels = v is updated after each conv layer.

     

    Finally, all the layers are wrapped in nn.Sequential. (nn.Sequential takes modules as separate arguments rather than a list, so the list must be unpacked with *.)
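
    The description above can be sketched as follows. This is a reconstruction from the text, not a copy of the notebook: placing a ReLU after every conv follows the paper, and the optional BatchNorm branch is an assumption suggested by the batch_norm argument used when the model is created.

```python
import torch
from torch import nn

cfgs = {"D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]}


def make_layers(cfg, batch_norm=False):
    """Build the conv/pool stack described by cfg."""
    layers = []
    in_channels = 3  # RGB input
    for v in cfg:
        if isinstance(v, int):
            layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
            if batch_norm:
                layers.append(nn.BatchNorm2d(v))
            layers.append(nn.ReLU(inplace=True))
            in_channels = v  # this layer's output feeds the next conv
        elif v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            raise ValueError(f"unknown cfg element: {v!r}")
    return nn.Sequential(*layers)  # unpack the list into separate arguments


features = make_layers(cfgs["D"])
n_conv = sum(isinstance(m, nn.Conv2d) for m in features)
n_pool = sum(isinstance(m, nn.MaxPool2d) for m in features)

# A small 32x32 input keeps the check cheap; five poolings reduce it to 1x1.
with torch.no_grad():
    out = features(torch.randn(1, 3, 32, 32))

print(n_conv, n_pool, out.shape)
```

    The 13 conv layers counted here, together with the 3 fc layers of the classifier, give the 16 weight layers of configuration D.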

     

    5) model summary

    model = VGG(cfgs["D"], batch_norm=False)
    summary(model, input_size=(2, 3, 224, 224), device='cpu')

    torchinfo's summary function prints the model structure and parameter counts in a readable form.

    Let's check the output shapes!

    code: https://github.com/jeongin7103/VGGNet/blob/main/vggnet.ipynb

     

     


    Receptive field of 3x3

     

    https://deep-learning-basics.tistory.com/58

     


    Working through the earlier post shows that applying a 3x3 conv twice gives a 5x5 receptive field.

    VGGNet also stacks two 3x3 convs to obtain a 5x5 receptive field.

    Part of VGGNet
    VGGNet obtaining a 5x5 receptive field with two 3x3 convs
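
    The 5x5 claim can be verified by brute force in one dimension: track which input positions each output position depends on. The sketch below uses plain Python sets and a hypothetical helper name; it assumes stride 1 and no padding, which is enough to show the size of the window.

```python
def conv_dependencies(prev, kernel):
    """prev[i] is the set of input positions feeding position i.

    Returns the same structure after a stride-1, unpadded conv of the given size.
    """
    return [set().union(*prev[i:i + kernel])
            for i in range(len(prev) - kernel + 1)]


length = 9
inputs = [{i} for i in range(length)]  # each input position depends only on itself

after_one_3 = conv_dependencies(inputs, 3)
after_two_3 = conv_dependencies(after_one_3, 3)
single_5 = conv_dependencies(inputs, 5)

print("one 3x3 conv sees :", sorted(after_one_3[0]))
print("two 3x3 convs see :", sorted(after_two_3[0]))
print("one 5x5 conv sees :", sorted(single_5[0]))
```

    Every output of the two stacked 3-wide convs depends on exactly the same 5 input positions as the corresponding output of a single 5-wide conv.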

     

    ๊ทธ๋Ÿฐ๋ฐ ์™œ 3x3 ๋ฅผ ๊ตณ์ด 2๋ฒˆ ํ• ๊นŒ? ๊ทธ๋ƒฅ 5x5 conv ํ•œ ๋ฒˆ ํ•˜๋ฉด ํ•œ ๋ฒˆ์— ๋ฐ”๋กœ receptive field 5x5๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์„ ํ…๋ฐ..

    ์ด์œ ๋Š” ๋ฐ”๋กœ ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค๋Š” ๋ฐ ์žˆ๋‹ค.

     

    3x3 conv ๋ฅผ ๋‘ ๋ฒˆ ํ•˜๋ฉด ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ 9+9 ๊ฐœ๋งŒํผ ํ•„์š”ํ•˜๋‹ค.

    ๊ทธ๋Ÿฐ๋ฐ 5x5 conv๋ฅผ ํ•œ ๋ฒˆ ํ•˜๋ฉด ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐœ์ˆ˜๊ฐ€ ๋ฐ”๋กœ 25๊ฐœ ๋งŒํผ ์ƒ๊ธด๋‹ค.

    ๋˜‘๊ฐ™์€ receptive field๋ฅผ ์–ป๋Š”๋ฐ 3x3 conv๋ฅผ ๋‘ ๋ฒˆ ํ•˜๋Š” ๊ฒƒ์ด ๋” ํšจ์œจ์ ์œผ๋กœ (ํŒŒ๋ผ๋ฏธํ„ฐ ์ˆ˜๋ฅผ ๋” ์ ๊ฒŒ) ์–ป์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.
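
    With real channel counts the same comparison scales by the product of input and output channels. The sketch below is a simple illustration with made-up helper names; it assumes the same number of channels C going in and out of every layer and ignores biases.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weight count of one conv layer, ignoring biases."""
    return kernel * kernel * in_channels * out_channels


def compare(channels):
    """Return (weights of two stacked 3x3 convs, weights of one 5x5 conv)."""
    two_3x3 = 2 * conv_weights(3, channels, channels)
    one_5x5 = conv_weights(5, channels, channels)
    return two_3x3, one_5x5


for c in (1, 64, 512):
    two_3x3, one_5x5 = compare(c)
    print(f"C={c}: two 3x3 = {two_3x3:,}  one 5x5 = {one_5x5:,}  "
          f"ratio = {two_3x3 / one_5x5:.2f}")
```

    The ratio stays at 18/25 = 0.72 for any C, so the saving is 28% regardless of width.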

     

     

     
