http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

Deep Learning with PyTorch: A 60 Minute Blitz

Neural Network

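Neural networks can be constructed using the torch.nn package.

You have already had a brief taste of autograd; nn relies on autograd to define models and differentiate them. An nn.Module contains layers, and its forward(input) method returns the output.

As an example, consider the network below, which classifies digit images.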

[Figure: convnet]

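It is a simple feed-forward network. It takes the input, passes it through several layers one after another, and finally produces the output.

A typical training procedure for a neural network is as follows.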

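Define a neural network that has learnable parameters (weights).
Iterate over a dataset of inputs.
Process the input through the network.
Compute the loss (how far the output is from the correct answer).
Propagate gradients back to the network's parameters.
Update the weights of the network, typically with a simple update rule: weight = weight - learning_rate * gradient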
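
The steps above can be sketched without any library at all. The following is only a toy illustration, assuming a one-parameter model y = weight * x, a squared-error loss, and a hand-derived gradient. The function name `fit` and the dataset are made up for this example and are not part of the tutorial.

```python
# Toy illustration of the training procedure with one learnable weight.
def fit(data, learning_rate=0.01, epochs=200):
    weight = 0.0                                         # 1. define the learnable parameter
    for _ in range(epochs):
        for x, target in data:                           # 2. iterate over the dataset
            output = weight * x                          # 3. process the input
            loss = (output - target) ** 2                # 4. compute the loss
            gradient = 2 * (output - target) * x         # 5. d(loss)/d(weight), derived by hand
            weight = weight - learning_rate * gradient   # 6. apply the update rule
    return weight


# Hypothetical dataset generated from y = 3 * x
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned = fit(data)
print(round(learned, 3))
```

In the rest of this section, torch.nn and autograd take over steps 1, 3, 4, and 5, so the gradient never has to be derived by hand.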

Define the network

Let's define this network.

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Out:

Net(
  (conv1): Conv2d (1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d (6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120)
  (fc2): Linear(in_features=120, out_features=84)
  (fc3): Linear(in_features=84, out_features=10)
)

To build a network you only need to define the forward function. The backward function, where gradients are computed, is defined for you automatically by autograd. You can use any Tensor operation inside forward.
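
As a small illustration of this point, the sketch below defines only forward and lets autograd derive the backward pass. It assumes PyTorch 0.4 or later, where plain tensors track gradients and the Variable wrapper is optional. The `Scale` module is hypothetical and exists only for this example.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical one-parameter module: output = sin(x) * scale."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        # Any differentiable tensor operation may be used here.
        return torch.sin(x) * self.scale


module = Scale()
x = torch.tensor([0.0, 1.0])
out = module(x).sum()
out.backward()  # no backward method was written; autograd derives it

# d(out)/d(scale) = sin(0) + sin(1)
print(module.scale.grad)
```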

The learnable parameters of a model are returned by net.parameters().

params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight

Out:

10
torch.Size([6, 1, 5, 5])
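
The count of 10 follows from the five layers of Net each holding a weight and a bias. The sketch below shows this on a shortened stack, assuming the same layer types as conv1 and fc3. The `nn.Sequential` container is used here only to keep the example short and is not how Net is built.

```python
import torch.nn as nn

# Two of the layer types used in Net; each contributes a weight and a bias.
layers = nn.Sequential(
    nn.Conv2d(1, 6, 5),
    nn.Linear(84, 10),
)

for name, param in layers.named_parameters():
    print(name, tuple(param.size()))

shapes = [tuple(p.size()) for p in layers.parameters()]
```

The first shape, (6, 1, 5, 5), reads as 6 output channels, 1 input channel, and a 5x5 kernel.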

The input to forward is an autograd.Variable, and so is the output. The expected input size for this network (LeNet) is 32x32. To use it on the MNIST dataset, resize the images to 32x32.

input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print(out)

Out:

Variable containing:
0.0023 -0.0613 -0.0397 -0.1123 -0.0397 0.0330 -0.0656 -0.1231 0.0412 0.0162
[torch.FloatTensor of size 1x10]
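
The 32x32 input size and the 16 * 5 * 5 input features of fc1 are linked. The arithmetic below traces the spatial size through the two convolution and pooling stages, assuming no padding, stride 1 for the convolutions, and a pooling stride equal to the window size, which are the defaults used in Net. The helper `conv_out` is written for this example only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32
side = conv_out(side, kernel=5)            # conv1: 32 -> 28
side = conv_out(side, kernel=2, stride=2)  # max pool: 28 -> 14
side = conv_out(side, kernel=5)            # conv2: 14 -> 10
side = conv_out(side, kernel=2, stride=2)  # max pool: 10 -> 5

flat_features = 16 * side * side           # 16 channels come out of conv2
print(side, flat_features)
```

With a 28x28 input the final side length would be 4 rather than 5, and fc1 would no longer match, which is why the images need to be resized.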

Zero the gradient buffers of all parameters, then backpropagate with random gradients.

net.zero_grad()
out.backward(torch.randn(1, 10))
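
The argument passed to backward above is needed because out is not a scalar. The sketch below shows what that argument does on a much smaller case, assuming PyTorch 0.4 or later. The values in `upstream` are arbitrary and chosen only to make the result easy to read.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x  # non-scalar output, so backward needs a gradient argument

upstream = torch.tensor([1.0, 10.0, 100.0])  # arbitrary values for the example
y.backward(upstream)

# Each element of x.grad is upstream * dy/dx = upstream * 2
print(x.grad)
```

The supplied tensor plays the role of the gradient flowing in from whatever would come after y, and it is multiplied through the graph by the chain rule.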

Note

torch.nn supports only mini-batches. Every module in the torch.nn package expects a mini-batch of samples as input, not a single sample.

For example, nn.Conv2d takes a 4D tensor of nSamples x nChannels x Height x Width.

If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension.
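
A minimal sketch of the fake batch dimension, assuming a single-channel 32x32 sample like the one Net expects. The variable names are chosen for this example.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

single = torch.randn(1, 32, 32)  # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)      # add a leading batch dimension of size 1

print(batch.size())
out = conv(batch)
print(out.size())
```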

Before going further, let's recap the classes seen so far.

Recap:

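torch.Tensor - A multi-dimensional array.
autograd.Variable - Wraps a Tensor and records the history of operations applied to it. It has the same API as a Tensor, with some additions such as backward(). It also holds the gradient with respect to the tensor.
nn.Module - Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, and so on.
nn.Parameter - A kind of Variable that is automatically registered as a parameter when assigned as an attribute of a Module.
autograd.Function - Implements the forward and backward definitions of an autograd operation. Every Variable operation creates at least one Function node, which connects to the functions that created the Variable and encodes its history.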
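
The automatic registration mentioned for nn.Parameter can be seen in a short sketch. The `Demo` module and its attribute names are hypothetical and exist only to contrast a parameter with a plain tensor attribute.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    """Hypothetical module used only to show parameter registration."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(3))  # registered as a parameter
        self.scratch = torch.zeros(3)               # plain tensor, not registered


demo = Demo()
names = [name for name, _ in demo.named_parameters()]
print(names)
```

Only weight is listed, so only weight would be handed to an optimizer through demo.parameters().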

At this point, we have covered:

Defining a neural network
Processing inputs and calling backward

Still left:

Computing the loss
Updating the weights of the network

Loss Function

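A loss function takes the (output, target) pair and computes a value that estimates how far the output is from the target.

The nn package provides several different loss functions. A simple one is nn.MSELoss, which computes the mean-squared error between the output and the target.

For example: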

output = net(input)
target = Variable(torch.arange(1, 11).float().view(1, -1))  # a dummy target: float, same shape as the output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Out:

Variable containing:
38.8243
[torch.FloatTensor of size 1]
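
To make the mean-squared error concrete, the sketch below compares nn.MSELoss with the same quantity computed by hand. The two small tensors are made-up values, and the example assumes PyTorch 0.4 or later for `torch.tensor` and `.item()`.

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0]])
target = torch.tensor([[0.0, 4.0]])

criterion = nn.MSELoss()
loss = criterion(output, target)

# Same quantity by hand: mean of the squared differences, (1 + 4) / 2
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```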

Now, if you follow loss backward using its .grad_fn attribute, you will see a graph of computations that looks like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

So when we call loss.backward(), the whole graph is differentiated with respect to the loss, and every Variable in the graph will have its .grad Variable accumulated with the gradient.

For illustration, let us follow a few steps backward.

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

Out:

<MseLossBackward object at 0x7fe4c18539e8>
<AddmmBackward object at 0x7fe3f5498550>
<ExpandBackward object at 0x7fe4c18539e8>
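
Rather than chaining next_functions by hand, the same walk can be written as a loop. This is a sketch on a much smaller graph than Net, assuming PyTorch 0.4 or later. It follows only the first input of each node, and the node names it prints differ between PyTorch versions, so none are listed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(4, 2)
x = torch.randn(1, 4)
loss = F.mse_loss(F.relu(fc(x)), torch.zeros(1, 2))

names = []
fn = loss.grad_fn
while fn is not None:
    names.append(type(fn).__name__)
    # Follow the first input of each node; stop when a node has no inputs.
    fn = fn.next_functions[0][0] if fn.next_functions else None

print(" <- ".join(names))
```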

Backprop

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.

Now we call loss.backward() and look at conv1's bias gradients before and after the backward pass.

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Out:

conv1.bias.grad before backward
Variable containing:
0
0
0
0
0
0
[torch.FloatTensor of size 6]

conv1.bias.grad after backward
Variable containing:
1.00000e-02 *
  7.4571
-0.4714
-5.5774
-6.2058
  6.6810
  3.1632
[torch.FloatTensor of size 6]
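
The accumulation behaviour described at the start of this section is easy to reproduce on a single tensor. The sketch below assumes PyTorch 0.4 or later. The function sum(x^2) is chosen only because its gradient, 2x, is easy to check.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

(x ** 2).sum().backward()
first = x.grad.clone()          # gradient of sum(x^2) is 2x

(x ** 2).sum().backward()       # no reset: the new gradient is added to the old one
accumulated = x.grad.clone()

x.grad.zero_()                  # manual reset, similar in effect to net.zero_grad()
(x ** 2).sum().backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```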

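We have now seen how to use loss functions.

Read later:

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list is available in the torch.nn documentation.

The only thing left to learn is:

Updating the weights of the network.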

Update the weights

The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

We can implement this in simple Python code:

learning_rate = 0.01
for f in net.parameters():
	f.data.sub_(f.grad.data * learning_rate)
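
On PyTorch 0.4 or later, the same manual update is often written with torch.no_grad() instead of going through .data. The sketch below is one such variant, shown on a small nn.Linear model rather than on Net. The model, the input, and the names are assumptions made for this example.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
loss = model(torch.ones(1, 3)).sum()
loss.backward()

learning_rate = 0.01
before = model.weight.detach().clone()

with torch.no_grad():                    # the update itself should not be tracked by autograd
    for p in model.parameters():
        p -= learning_rate * p.grad      # weight = weight - learning_rate * gradient

print(model.weight)
```

Here the gradient of the loss with respect to the weight is a tensor of ones, so each weight moves down by exactly learning_rate.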

However, when working with neural networks you may want to use various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, and so on. To enable this, the torch.optim package implements all of these methods. Using it is very simple:

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
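
The snippet above shows a single step. Placed inside an actual loop it might look like the sketch below, which assumes PyTorch 0.4 or later and uses a small nn.Linear model with made-up regression data instead of Net. The learning rate and step count are arbitrary choices for the example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical regression data: 16 samples, 3 features, target = sum of the features
inputs = torch.randn(16, 3)
targets = inputs.sum(dim=1, keepdim=True)

model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(50):
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)
    loss.backward()                           # compute fresh gradients
    optimizer.step()                          # apply the update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The last recorded loss should be lower than the first, which is a quick sanity check that the loop is wired correctly.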

Note

Note that the gradient buffers have to be zeroed manually with optimizer.zero_grad(), because gradients are accumulated, as explained in the Backprop section.
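
Switching to one of the other update rules named earlier (Nesterov-SGD, Adam, RMSProp) only changes the line that creates the optimizer; the training loop stays the same. The sketch below constructs each of them for a small stand-in model. The learning rates and momentum are placeholder values, not recommendations.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 1)  # stand-in for net

# Hyperparameters below are placeholders chosen for the example.
optimizers = {
    "sgd": optim.SGD(model.parameters(), lr=0.01),
    "nesterov": optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True),
    "adam": optim.Adam(model.parameters(), lr=0.001),
    "rmsprop": optim.RMSprop(model.parameters(), lr=0.001),
}

for name, opt in optimizers.items():
    print(name, type(opt).__name__)
```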
