PyTorch (2) Automatic Differentiation

I tried out PyTorch's automatic differentiation (autograd).

import numpy as np
import torch
import torch.nn as nn

First, import the required libraries.

# Create tensors
# With requires_grad=False a tensor is excluded from differentiation and its grad stays None
x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# Build the computation graph
# y = 2 * x + 3
y = w * x + b

# Compute the gradients
y.backward()

# Print the gradients
print(x.grad)  # dy/dx = w = 2
print(w.grad)  # dy/dw = x = 1
print(b.grad)  # dy/db = 1

tensor(2.)
tensor(1.)
tensor(1.)

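With requires_grad=False, a tensor is excluded from differentiation and its grad stays None.
requires_grad=False is handy in fine-tuning when you want to freeze the parameters of a layer.
After you build a computation graph and call backward(), the gradient is stored in the grad attribute of every leaf tensor in the graph that has requires_grad=True.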
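Here is a minimal sketch of both points. It reuses the scalar setup above with b switched to requires_grad=False, and then freezes a layer the way you might during fine-tuning. The `nn.Linear(3, 2)` layer and the `inp` tensor are illustrative names, not from the original.

```python
import torch
import torch.nn as nn

# Only x and w are tracked; b is treated as a constant by autograd
x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=False)

y = w * x + b
y.backward()
print(x.grad)  # tensor(2.)
print(w.grad)  # tensor(1.)
print(b.grad)  # None

# Freezing a layer: switch off requires_grad on all of its parameters
layer = nn.Linear(3, 2)
for p in layer.parameters():
    p.requires_grad_(False)

# Gradients still flow back to the input, but not into the frozen parameters
inp = torch.randn(4, 3, requires_grad=True)
layer(inp).sum().backward()
print(layer.weight.grad)  # None
print(inp.grad.shape)     # torch.Size([4, 3])
```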

Next, I redo the examples from "How to Use Theano (2) Automatic Differentiation" (2015/5/18) in PyTorch instead of Theano.

Example 1

$y = x^2$

$\displaystyle \frac{dy}{dx} = 2x$

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)

tensor(4.)

y is an expression in the variable x. Calling y.backward() stores the gradient in the grad attribute of each variable.
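One detail of the grad attribute is easy to miss: backward() adds into grad rather than overwriting it. The following is a small sketch using Example 1's y = x^2 at x = 2. It shows the accumulation and the in-place reset with `grad.zero_()`. The variable names `first` and `accumulated` are only for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# First backward pass: dy/dx = 2x = 4
y = x ** 2
y.backward()
first = x.grad.clone()
print(first)        # tensor(4.)

# A second backward pass on a freshly built graph is added to x.grad
y = x ** 2
y.backward()
accumulated = x.grad.clone()
print(accumulated)  # tensor(8.)

# Reset the gradient in place before computing a new one
x.grad.zero_()
y = x ** 2
y.backward()
print(x.grad)       # tensor(4.)
```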

Example 2

$y = e^x$

$\displaystyle \frac{dy}{dx} = e^x$

x = torch.tensor(2.0, requires_grad=True)
y = torch.exp(x)
y.backward()
print(x.grad)

tensor(7.3891)

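Do not use NumPy functions such as numpy.exp() when building a computation graph, because autograd cannot track operations that happen outside PyTorch.
Use the dedicated tensor functions, such as torch.exp(), instead.
These functions are differentiable, so backpropagation through the computation graph works.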
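This sketch contrasts the two routes, using the values from Example 2. torch.exp() records a grad_fn in the graph. Getting a NumPy value requires `detach()` first, which cuts the result off from autograd. The name `y_np` is illustrative.

```python
import numpy as np
import torch

x = torch.tensor(2.0, requires_grad=True)

# torch.exp() is recorded in the graph, so y has a grad_fn and can be backpropagated
y = torch.exp(x)
print(y.grad_fn is not None)  # True
y.backward()
print(x.grad)                 # tensor(7.3891), i.e. e^2

# The NumPy route needs a detached copy; the result is not a tensor
# and has no connection to x, so no gradient can flow through it
y_np = np.exp(x.detach().numpy())
print(isinstance(y_np, torch.Tensor))  # False
```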

Example 3

$y = \sin(x)$

$\displaystyle \frac{dy}{dx} = \cos(x)$

x = torch.tensor(np.pi, requires_grad=True)
y = torch.sin(x)
y.backward()
print(x.grad)

tensor(-1.)
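To confirm that the autograd value agrees with the definition of the derivative, you can compare it against a central difference (f(x+h) - f(x-h)) / 2h. The step size h = 1e-6 and the use of float64 are my own choices for this sketch, not part of the original.

```python
import math
import torch

def f(t):
    return torch.sin(t)

x0 = math.pi
h = 1e-6

# Autograd gradient, in float64 so the comparison is not limited by float32 precision
x = torch.tensor(x0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

# Central difference approximation of dy/dx at x0
xp = torch.tensor(x0 + h, dtype=torch.float64)
xm = torch.tensor(x0 - h, dtype=torch.float64)
numeric_grad = ((f(xp) - f(xm)) / (2 * h)).item()

print(autograd_grad)  # approximately -1.0, i.e. cos(pi)
print(numeric_grad)   # approximately -1.0
```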

Example 4

$y = (x - 4)(x^2 + 6)$

$\displaystyle \frac{dy}{dx} = 3 x^2 - 8 x + 6$

x = torch.tensor(0.0, requires_grad=True)
y = (x - 4) * (x ** 2 + 6)
y.backward()
print(x.grad)

tensor(6.)
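The same derivative can be evaluated at several points in one pass. backward() needs a scalar, so the sketch sums the outputs first. Because each y[i] depends only on x[i], the gradient of the sum is exactly dy/dx at each point. The sample points are arbitrary.

```python
import torch

# Evaluate dy/dx = 3x^2 - 8x + 6 at several points at once
x = torch.tensor([-1.0, 0.0, 1.0, 2.0], requires_grad=True)
y = (x - 4) * (x ** 2 + 6)

# Sum to a scalar; element i of the gradient is dy_i/dx_i
y.sum().backward()

xd = x.detach()
expected = 3 * xd ** 2 - 8 * xd + 6
print(x.grad)    # values 17, 6, 1, 2
print(expected)  # same values
```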

Example 5

$y = (\sqrt{x} + 1)^3$

$\displaystyle \frac{dy}{dx} = \frac{3 (\sqrt{x} + 1)^2}{2 \sqrt{x}}$

x = torch.tensor(2.0, requires_grad=True)
y = (torch.sqrt(x) + 1) ** 3
y.backward()
print(x.grad)

tensor(6.1820)
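PyTorch also ships `torch.autograd.gradcheck`, which checks autograd against finite differences for you. It expects double-precision inputs. This sketch applies it to Example 5 and also evaluates the analytic formula above at x = 2. The tolerances passed to gradcheck are my own choice.

```python
import torch
from torch.autograd import gradcheck

def f(t):
    return (torch.sqrt(t) + 1) ** 3

# gradcheck compares analytic (autograd) and numerical gradients; float64 is expected
x = torch.tensor([2.0], dtype=torch.float64, requires_grad=True)
ok = gradcheck(f, (x,), eps=1e-6, atol=1e-4)
print(ok)  # True

# Analytic derivative 3(sqrt(x) + 1)^2 / (2 sqrt(x)) at x = 2
xd = x.detach()
analytic = 3 * (torch.sqrt(xd) + 1) ** 2 / (2 * torch.sqrt(xd))
print(analytic)  # approximately 6.1820
```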

Example 6

Finally, an example with partial derivatives.

$z = (x + 2 y)^2$

$\displaystyle \frac{\partial z}{\partial x} = 2 (x + 2y)$

$\displaystyle \frac{\partial z}{\partial y} = 4 (x + 2y)$

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = (x + 2 * y) ** 2
z.backward()
print(x.grad)  # dz/dx
print(y.grad)  # dz/dy

tensor(10.)
tensor(20.)
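As an alternative to backward(), `torch.autograd.grad` returns the partial derivatives as a tuple and leaves `.grad` untouched. This sketch uses it on Example 6 and checks the result against the analytic formulas 2(x + 2y) and 4(x + 2y).

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = (x + 2 * y) ** 2

# Returns (dz/dx, dz/dy) instead of accumulating into x.grad and y.grad
dz_dx, dz_dy = torch.autograd.grad(z, (x, y))
print(dz_dx, dz_dy)  # tensor(10.) tensor(20.)
print(x.grad)        # None

# Analytic partial derivatives with x + 2y = 5
s = (x + 2 * y).item()
print(2 * s, 4 * s)  # 10.0 20.0
```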

Differentiating the loss

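In neural networks, the usual approach is to take the partial derivatives of the loss with respect to the parameters (weights and biases) and use them to update the parameters by gradient descent.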
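The following minimal sketch repeats that update p ← p - lr · ∂L/∂p for several steps by hand, without an optimizer. It assumes the same shapes as the code below (5 samples, 3 → 2 units). The seed, learning rate, and step count are illustrative choices. The gradients are cleared each step because backward() accumulates, and the update runs under `torch.no_grad()` so it is not recorded in the graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed
x = torch.randn(5, 3)
y = torch.randn(5, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
lr = 0.01

losses = []
for step in range(100):
    loss = criterion(linear(x), y)
    losses.append(loss.item())

    # Clear gradients from the previous step, then backpropagate
    linear.zero_grad()
    loss.backward()

    # Gradient descent update, kept outside the computation graph
    with torch.no_grad():
        for p in linear.parameters():
            p -= lr * p.grad

print(losses[0], losses[-1])  # the loss should go down
```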

# batch size = 5, input feature dimension = 3
x = torch.randn(5, 3)
# batch size = 5, output feature dimension = 2
y = torch.randn(5, 2)

# Create a Linear layer
# 3 units => 2 units
linear = nn.Linear(3, 2)

# Parameters of the Linear layer
print('w:', linear.weight)
print('b:', linear.bias)

# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# forward
pred = linear(x)

# loss = L
loss = criterion(pred, y)
print('loss:', loss)

# backpropagation
loss.backward()

# Print the gradients
print('dL/dw:', linear.weight.grad)
print('dL/db:', linear.bias.grad)

# Update the parameters by hand using the gradients: p - lr * dL/dp
print('*** by hand')
print(linear.weight.sub(0.01 * linear.weight.grad))
print(linear.bias.sub(0.01 * linear.bias.grad))

# One gradient descent step via the optimizer
optimizer.step()

# Print the parameters after one update step
# They match the results of the manual update above
print('*** by optimizer.step()')
print(linear.weight)
print(linear.bias)

w: Parameter containing:
tensor([[ 0.4176,  0.2302,  0.3942],
        [-0.3258,  0.0489, -0.3333]], requires_grad=True)
b: Parameter containing:
tensor([0.4269, 0.2872], requires_grad=True)
loss: tensor(1.3395, grad_fn=<MseLossBackward>)
dL/dw: tensor([[ 0.4404,  0.4512,  0.9893],
        [-0.6777, -0.2535, -0.5191]])
dL/db: tensor([0.6095, 0.6305])
*** by hand
tensor([[ 0.4132,  0.2257,  0.3843],
        [-0.3191,  0.0514, -0.3281]], grad_fn=<ThSubBackward>)
tensor([0.4208, 0.2809], grad_fn=<ThSubBackward>)
*** by optimizer.step()
Parameter containing:
tensor([[ 0.4132,  0.2257,  0.3843],
        [-0.3191,  0.0514, -0.3281]], requires_grad=True)
Parameter containing:
tensor([0.4208, 0.2809], requires_grad=True)
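Instead of comparing the printed values by eye, you can check the match programmatically. This sketch uses a fixed seed, so its numbers differ from the run above. It computes the manual update under `torch.no_grad()`, which also keeps the `grad_fn` seen in the "by hand" output off the result, and then compares with `torch.allclose` after `optimizer.step()`. This relies on plain SGD (no momentum or weight decay) being exactly p - lr * grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed; values differ from the run above
x = torch.randn(5, 3)
y = torch.randn(5, 2)
linear = nn.Linear(3, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

loss = criterion(linear(x), y)
loss.backward()

# Manual update outside the graph, so no grad_fn is attached
with torch.no_grad():
    w_manual = linear.weight - 0.01 * linear.weight.grad
    b_manual = linear.bias - 0.01 * linear.bias.grad

optimizer.step()

print(torch.allclose(linear.weight, w_manual))  # True
print(torch.allclose(linear.bias, b_manual))    # True
```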