๐Ÿ“ AI & Bigdata/AI & ML & DL

[DL] CNN - Initializing Weights for the Convolutional and FC Layers

SOIT 2022. 8. 18. 14:09

DNN์—์„œ Convolutional ๋ฐ FC Layers์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜(Weight)๊ฐ€ ํŠน์ • ๋ฐฉ์‹์œผ๋กœ ์ดˆ๊ธฐํ™”(Initializing).

 

The ResNet implementation (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) uses the following PyTorch code to initialize the weights:

n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
m.weight.data.normal_(0, math.sqrt(2. / n))
  • ๊ฐ€์ค‘์น˜๋Š” ํ‰๊ท ์ด 0์ธ ์ •๊ทœ ๋ถ„ํฌ์™€ ํ•„ํ„ฐ ์ปค๋„ ์ฐจ์›์˜ ํ•จ์ˆ˜์ธ ํ‘œ์ค€ ํŽธ์ฐจ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ดˆ๊ธฐํ™”๋ฉ๋‹ˆ๋‹ค. 
  • ์ด๊ฒƒ์€ ๋„คํŠธ์›Œํฌ ๊ณ„์ธต์˜ ์ถœ๋ ฅ ๋ถ„์‚ฐ์ด ์‚ฌ๋ผ์ง€๊ฑฐ๋‚˜ ํญ๋ฐœํ•˜๋Š” ๊ฒƒ, ์ฆ‰ ๋งค์šฐ ์ปค์ง€๋Š” ๋Œ€์‹  ํ•ฉ๋ฆฌ์ ์ธ ํ•œ๊ณ„ ๋‚ด์—์„œ ๊ฒฝ๊ณ„๋ฅผ ์œ ์ง€ํ•˜๋„๋ก ํ•˜๊ธฐ ์œ„ํ•ด ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค. 
  • ์ด ์ดˆ๊ธฐํ™” ๋ฐฉ๋ฒ•์€ Kaiming He et al.์˜ ๋‹ค์Œ ๋…ผ๋ฌธ์— ์ž์„ธํžˆ ์„ค๋ช…๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. 
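
For context, here is a minimal sketch of the loop that surrounds this snippet in the ResNet source, applying the initialization to every convolutional layer (the BatchNorm branch follows the same source; nn.init.kaiming_normal_ is the modern torch.nn.init helper that implements the same scheme):

import math
import torch.nn as nn

def init_weights(model):
    # He initialization: N(0, sqrt(2/n)) with n = k*k*out_channels
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
            # modern equivalent:
            # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()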

Section 1: Implementing Convolution as Matrix Multiplication

Section 2: Forward Pass (Without Bias)

Section 3: Forward Pass (With Bias)

Section 4: Other Rectifiers

  • Considers other commonly used activation functions ("rectifiers"), such as the tanh and sigmoid functions.

 


Section 1: Implementing Convolution as Matrix Multiplication

Convolution and matrix multiplication are different operations. However, by appropriately unfolding the input matrix (or the kernel matrix), a convolution can be implemented as a matrix multiplication: each k×k block that the kernel slides over becomes one row of an unfolded matrix, which is then multiplied by the flattened kernel.

 

 

์ž…๋ ฅ ํ–‰๋ ฌ์„ ํŽผ์น˜๊ณ  ํ–‰๋ ฌ ๊ณฑ์…ˆ์œผ๋กœ ์ƒ๊ด€ ๊ด€๊ณ„๋ฅผ ๊ตฌํ˜„ํ•˜๋Š” ํŒŒ์ด์ฌ ์ฝ”๋“œ

from scipy import signal
import numpy as np

def unfold_matrix(X, k):
    n, m = X.shape[0:2]
    xx = np.zeros(((n - k + 1) * (m - k + 1), k**2))
    row_num = 0
    for i in range(n - k + 1):
        for j in range(m - k + 1):
            # collect a k*k block of elements and flatten it into a row
            xx[row_num, :] = X[i:i+k, j:j+k].flatten()
            row_num = row_num + 1
    return xx

w = np.array([[1, 2, 3], [4, 5, 6], [-1, -2, -3]], np.float32)
#x = np.random.randn(5,5)
x = np.array([[-0.21556299, -0.11002319, -0.3499612,   1.49290769, -0.50435978],
 [ 0.06348409,  0.66873375,  0.14251138, -1.6414004 , -0.91561852],
 [-2.52451962, -1.97544675, -0.24609529, -1.11489934, -1.44793437],
 [ 1.26260575, -0.62047366,  0.12274525,  0.25200227, -0.83925847],
 [-1.54336488, -0.05100702,  0.36608208,  0.51712927, -0.97133877]])

n, m = x.shape[0:2]
k = w.shape[0]

# reference result: 'valid' cross-correlation of x with w
y = signal.correlate2d(x, w, mode='valid')

# same result via unfolding + matrix multiplication
x_unfolded = unfold_matrix(x, k)
w_flat = w.flatten()
yy = np.matmul(x_unfolded, w_flat)
yy = yy.reshape((n - k + 1, m - k + 1))
print(yy)
# verify yy == y
assert np.allclose(y, yy)
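
As an aside, the same unfolding can be written without explicit loops using NumPy's sliding_window_view (available in NumPy 1.20+; a sketch, not part of the original code):

from numpy.lib.stride_tricks import sliding_window_view

def unfold_matrix_fast(X, k):
    # every k*k sliding block of X, flattened into one row each,
    # in the same row order as the loop version above
    return sliding_window_view(X, (k, k)).reshape(-1, k * k)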

 

Section 2: Forward Pass (Without Bias)

๋„คํŠธ์›Œํฌ๊ฐ€ ๊นŠ์–ด์ง์— ๋”ฐ๋ผ ๋„คํŠธ์›Œํฌ ์ถœ๋ ฅ์˜ ๋ถ„์‚ฐ์ด ์‚ฌ๋ผ์ง€๊ฑฐ๋‚˜ ๊ณผ๋„ํ•˜๊ฒŒ ์ปค์ง€๋Š” ๋Œ€์‹  ๊ฒฝ๊ณ„๋ฅผ ์œ ์ง€ํ•˜๋„๋ก ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•œ ์ ์ ˆํ•œ ๋ถ„์‚ฐ์„ ์„ ํƒํ•˜๋Š” ๊ฒƒ

 

# number of layers
num_layers = 10
 
class layer(object):
    def __init__(self, _m, _n):
        # m: fan-in (number of rows of the weight matrix)
        # n: fan-out (number of columns of the weight matrix)
        self.m = _m
        self.n = _n
        self.activation = 'relu'
        #self.activation = 'tanh'
        #self.activation = 'sigmoid'
 
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
 
    def forward(self, x, use_bias = False):
        # x is a row vector
        # weights and biases are re-sampled on every forward pass, so each
        # trial of the Monte Carlo experiment below uses a fresh random network
        self.W = np.random.normal(0, np.sqrt(2.0/self.m), (self.m,self.n))
        self.b = np.random.normal(0, np.sqrt(2.0/num_layers), self.n)
        self.y = np.dot(x, self.W)
        if (use_bias):
            self.y = self.y + self.b
        if (self.activation == 'relu'):
            self.a = np.maximum(0., self.y)
        if (self.activation == 'tanh'):
            self.a = np.tanh(self.y)
        if (self.activation == 'sigmoid'):
            self.a = self.sigmoid(self.y)
        return self.a, self.y
 
 
layers = []
# even numbered layers have a 5*10 weight matrix
# odd numbered layers have a 10*5 weight matrix
for i in range(num_layers):
    layers.append(layer(5 if(i % 2 == 0) else 10, 10 if(i % 2 == 0) else 5))
 
num_trials = 100000
# records the network output (activations of the last layer)
a = np.zeros((num_trials, 5))
# records the network input
i = np.zeros((num_trials, 5))
# records the pre-activation outputs (y) of the last layer
y = np.zeros((num_trials, 5))
for trial in range(0,num_trials):
    # input to the network is uniformly distributed in (0, 3), so E(x) != 0.
    # Note that the distribution of the input is different from the distribution of the weights.
    x = 3*np.random.rand(1, 5)
    i[trial, :] = x
    for layer_no in range(0,num_layers):
        x, y_ = layers[layer_no].forward(x, False)
    a[trial, :] = x
    y[trial, :] = y_
 
#E(x^2) (expected value of the square of the input)
E_x2 = np.mean(np.multiply(i,i), 0)
 
# E(a^2) (expected value of the square of the activations of the last layer)
E_a2 = np.mean(np.multiply(a,a), 0)
# verify E_a2 ~ E_x2
 
# var(y): Variance of the output before applying activation function
Var_y = np.var(y,0)
# verify Var_y ~ 2*E_a2
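
A quick way to check the two relationships flagged in the comments above (a minimal sketch; with He initialization, E(a^2) should stay on the same order as E(x^2), and Var(y) should be roughly twice E(a^2)):

print('E(x^2):', E_x2)
print('E(a^2):', E_a2)   # same order of magnitude as E(x^2)
print('Var(y):', Var_y)  # roughly 2 * E(a^2)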

 

 


Section 3: Forward Pass (With Bias)

 

The weights and biases are initialized as follows:

self.W = np.random.normal(0, np.sqrt(2.0/self.m), (self.m,self.n))
self.b = np.random.normal(0, np.sqrt(2.0/num_layers), self.n)

 

๋„คํŠธ์›Œํฌ๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋™์•ˆ use_bias = True๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

for layer_no in range(0,num_layers):
    x, y_ = layers[layer_no].forward(x, True)

 

Backward Pass

It turns out that the initialization method does not need to be modified when the backward pass is taken into account. Propagating gradients back through the fully connected and convolutional layers gives rise to matrix multiplications and convolutions of slightly different dimensions, but the same variance analysis carries through.
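
In PyTorch, this forward/backward distinction shows up as the mode argument of kaiming_normal_; a minimal sketch (the layer shapes here are made up for illustration):

import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3)
# 'fan_in' preserves the variance of activations in the forward pass;
# 'fan_out' preserves the variance of gradients in the backward pass.
# Either choice alone is enough to keep a deep network well-scaled.
nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')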


Section 4: Other Rectifiers

์‹œ๊ทธ๋ชจ์ด๋“œ ํ™œ์„ฑํ™” ํ•จ์ˆ˜์˜ ํฌ๊ธฐ๋ฅผ ์กฐ์ •ํ•˜๊ณ  ํŽธํ–ฅ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

 

The key takeaway is that the standard method of initializing weights, sampling from a normal distribution with μ = 0 (and standard deviation sqrt(2/n), as above), was designed for the ReLU activation function. It also works well for tanh activations, but it does not work well for the sigmoid.
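
This can be checked empirically with the experiment from Section 2 by switching the activation before rerunning the trial loop (a sketch reusing the layer class defined above):

# rerun the Section 2 experiment with a different activation
for l in layers:
    l.activation = 'sigmoid'   # or 'tanh'
# with 'tanh', E(a^2) still tracks E(x^2); with 'sigmoid', the outputs
# cluster around 0.5 and the variance relationships break down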

 

 

 

 

 

Reference: TELESENS, Ankur Mohan
