[Quant 1.2] Some PyTorch Basics

2022-06-08 04:27 Author: 安平生一人好_

Video link: https://www.youtube.com/watch?v=c36lUUr864M

Since I have never learned PyTorch, I need to start from the basics.

The examples in this post use three packages:

import torch
import numpy as np
import matplotlib.pyplot as plt

1. Creating tensors

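Creating uninitialized tensors of different dimensions

Note that torch.empty only allocates memory and does not initialize it. The values shown below are whatever happened to be in memory (often close to 0, but not guaranteed). For a tensor that is actually all zeros, use torch.zeros with the same arguments.

A 1-D uninitialized tensor of length 3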

torch.empty(3)

#tensor([1.1210e-44, -0.0000e+00, 0.0000e+00])

A 2-D $2 \times 3$ uninitialized tensor

torch.empty(2,3)

# tensor([[9.8091e-45, 0.0000e+00, 0.0000e+00],
#         [0.0000e+00, 0.0000e+00, 0.0000e+00]])

A 3-D $2 \times 3 \times 4$ uninitialized tensor

torch.empty(2,3,4)

#tensor([[[ 2.9147e-43,  0.0000e+00, -1.1201e-19,  4.5779e-41],
#           [ 6.0009e+36,  4.5779e-41,  0.0000e+00,  7.0065e-45],
#           [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00]],
 
#          [[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
#           [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
#           [ 1.4013e-45,  0.0000e+00,  0.0000e+00,  0.0000e+00]]])
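To contrast torch.empty with a tensor that is guaranteed to be zero, here is a minimal sketch. The variable names uninit and zeros are only illustrative and do not come from the video.

```python
import torch

# torch.empty only allocates memory; its contents are arbitrary.
uninit = torch.empty(2, 3)

# torch.zeros allocates memory and fills every element with 0.
zeros = torch.zeros(2, 3)

print(uninit.shape)            # same shape as zeros, values undefined
print(zeros)                   # every element is exactly 0
print(bool((zeros == 0).all()))
```

If the values matter, torch.zeros (or torch.ones, torch.rand) is the safer choice. torch.empty is mainly useful when every element will be overwritten anyway.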

Creating tensors of ones in different dimensions

A 2-D $2 \times 2$ tensor of ones

torch.ones(2,2)

# tensor([[1., 1.],
#         [1., 1.]])

Creating random tensors of different dimensions (torch.rand samples uniformly from $[0, 1)$)

torch.rand(2,2)

# tensor([[0.2246, 0.5603],
#         [0.5463, 0.8566]])

Creating random integer tensors of different dimensions

Create a 3-D $3 \times 3 \times 3$ random integer tensor whose elements are integers in the interval $[2, 8)$

x = torch.randint(2,8,(3,3,3))
x

# tensor([[[5, 3, 3],
#          [5, 7, 7],
#          [2, 3, 4]],

#         [[5, 3, 7],
#          [4, 6, 5],
#          [3, 4, 5]],

#         [[7, 5, 7],
#          [3, 6, 2],
#          [2, 7, 7]]])
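A quick way to convince yourself that the lower bound is included and the upper bound is excluded is to check the minimum and maximum of a larger sample. This is only a sketch. The seed and the sample size are arbitrary choices of mine.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Draw many integers from the half-open interval [2, 8).
sample = torch.randint(2, 8, (1000,))

print(sample.min().item() >= 2)  # lower bound 2 is allowed
print(sample.max().item() < 8)   # upper bound 8 is never produced
print(sample.dtype)              # integer dtype (torch.int64 by default)
```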

Creating a tensor from a list

Use a nested list to create a $2 \times 2$ tensor

my_ten = torch.tensor([[2.5,0.1],[1,2]])

my_ten
# tensor([[2.5000, 0.1000],
#         [1.0000, 2.0000]])

my_ten.size()
#  torch.Size([2, 2])

Setting the data type of a tensor

Create a 2-D $2 \times 2$ tensor of ones whose elements are float16 floating-point numbers

x = torch.ones(2,2,dtype=torch.float16)
x

# tensor([[1., 1.],
#         [1., 1.]], dtype=torch.float16)
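The dtype of an existing tensor can be inspected and converted. The sketch below uses the standard .dtype attribute and .to() method. The variable names are illustrative.

```python
import torch

half = torch.ones(2, 2, dtype=torch.float16)
print(half.dtype)           # torch.float16
print(half.element_size())  # bytes per element: 2 for float16

# .to() returns a converted copy; the original keeps its dtype.
single = half.to(torch.float32)
print(single.dtype)         # torch.float32

# Without an explicit dtype, PyTorch infers one from the data.
from_ints = torch.tensor([1, 2])
from_mixed = torch.tensor([[2.5, 0.1], [1, 2]])
print(from_ints.dtype)      # torch.int64
print(from_mixed.dtype)     # torch.float32
```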

Setting whether a tensor tracks gradients

Create a 2-D $2 \times 3$ random tensor for which gradients can be computed

torch.rand(2,3,requires_grad = True)

# tensor([[0.9961, 0.2444, 0.6532],
#         [0.5307, 0.6206, 0.5152]], requires_grad=True)

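Later on, we will pass tensors into some function whose input is a multi-dimensional tensor and whose output is a scalar. If the input tensor has requires_grad = True, we can call backward() on the output to compute the gradient of the function with respect to that tensor. If the input has requires_grad = False (and nothing else in the computation requires gradients), calling backward() raises a RuntimeError.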
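The difference between the two cases can be seen in a small sketch. The function used here (multiply by 2, then sum) is just an illustrative choice, not something from the video.

```python
import torch

# Case 1: the input tracks gradients, so backward() works.
a = torch.ones(3, requires_grad=True)
out_a = (a * 2).sum()        # scalar output
out_a.backward()
print(a.grad)                # gradient of sum(2*a) w.r.t. a: all 2s

# Case 2: the input does not track gradients, so backward() fails.
b = torch.ones(3)            # requires_grad defaults to False
out_b = (b * 2).sum()
failed = False
try:
    out_b.backward()
except RuntimeError as err:
    failed = True
    print("backward() failed:", err)
```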

2. Some tensor operations

Adding and subtracting tensors

x = torch.rand(2,3)
y = torch.rand(2,3)
x, y
# (tensor([[0.5501, 0.8308, 0.2830],
#          [0.4184, 0.3558, 0.0589]]),
#  tensor([[0.3422, 0.9984, 0.7679],
#          [0.2108, 0.6127, 0.6060]]))

x + y
# tensor([[0.8923, 1.8292, 1.0509],
#         [0.6291, 0.9684, 0.6648]])

x.add(y)
# tensor([[0.8923, 1.8292, 1.0509],
#         [0.6291, 0.9684, 0.6648]])

x, y 
# (tensor([[0.5501, 0.8308, 0.2830],
#          [0.4184, 0.3558, 0.0589]]),
#  tensor([[0.3422, 0.9984, 0.7679],
#          [0.2108, 0.6127, 0.6060]]))

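Adding tensors with '+' creates a new tensor; the original tensors are unchanged. The '-' operator behaves the same way.

Here '+' and '-' can be replaced by the methods .add() and .sub(). These also create a new tensor and do not change the values of the original tensors x and y.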

x = torch.randint(2,(2,3))
x
# tensor([[1, 0, 1],
#         [0, 0, 1]])

x.add_(1)
# tensor([[2, 1, 2],
#         [1, 1, 2]])

x.sub_(1)
# tensor([[1, 0, 1],
#         [0, 0, 1]])

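However, if we append an underscore to .add() and .sub(), then .add_() and .sub_() modify the tensor they are called on in place. In this example, the value of x was changed twice. This convention is common in PyTorch: many methods create a new tensor when written without the underscore and modify the original tensor in place when written with it.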
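One way to see the difference is to compare memory addresses with data_ptr(). This is a minimal sketch with variable names of my own choosing.

```python
import torch

x = torch.zeros(2, 3)
ptr_before = x.data_ptr()   # address of x's underlying storage

y = x.add(1)                # out-of-place: returns a new tensor
print(y.data_ptr() != ptr_before)  # y lives in different memory
print(x)                    # x is still all zeros

x.add_(1)                   # in-place: x itself is modified
print(x.data_ptr() == ptr_before)  # same memory as before
print(x)                    # x is now all ones
```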

Slicing tensors

x = torch.rand(5,3)
x
# tensor([[0.8021, 0.0619, 0.2424],
#         [0.1589, 0.9536, 0.5429],
#         [0.3194, 0.4105, 0.3977],
#         [0.4445, 0.5245, 0.5164],
#         [0.2404, 0.9453, 0.9572]])

x[1,:]
# tensor([0.1589, 0.9536, 0.5429])

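Tensors are sliced the same way as NumPy ndarrays. In the example above, x[1,:] selects the row at index 1 of the $5 \times 3$ tensor, i.e. its second row, since indexing starts at 0.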
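A few more slicing patterns, shown on a tensor with predictable values so the results are easy to check. The tensor t is an illustrative stand-in for the random x above.

```python
import torch

# Rows: [0,1,2], [3,4,5], [6,7,8], [9,10,11], [12,13,14]
t = torch.arange(15).reshape(5, 3)

row1 = t[1, :]        # row at index 1: values 3, 4, 5
col0 = t[:, 0]        # column at index 0: values 0, 3, 6, 9, 12
block = t[1:3, :2]    # rows 1-2, columns 0-1: [[3, 4], [6, 7]]
elem = t[1, 1]        # a 0-dim tensor holding 4

print(row1, col0, block, sep="\n")
print(elem.item())    # .item() converts a one-element tensor to a Python number
```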

Reshaping tensors

x = torch.rand(5,3)
x
# tensor([[0.8021, 0.0619, 0.2424],
#         [0.1589, 0.9536, 0.5429],
#         [0.3194, 0.4105, 0.3977],
#         [0.4445, 0.5245, 0.5164],
#         [0.2404, 0.9453, 0.9572]])

y = x.view(15)
y
# tensor([0.8021, 0.0619, 0.2424, 0.1589, 0.9536, 0.5429, 0.3194, 0.4105, 0.3977,
#         0.4445, 0.5245, 0.5164, 0.2404, 0.9453, 0.9572])
 
y = x.view(-1,5)
y
# tensor([[0.8021, 0.0619, 0.2424, 0.1589, 0.9536],
#         [0.5429, 0.3194, 0.4105, 0.3977, 0.4445],
#         [0.5245, 0.5164, 0.2404, 0.9453, 0.9572]])

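With the .view() method, we can reshape a tensor to the size we want, as long as the total number of elements stays the same.

y = x.view(15) flattens the $5 \times 3$ tensor above into a 1-D tensor of length 15.

y = x.view(-1,5) reshapes the $5 \times 3$ tensor to $? \times 5$. The ? is written as -1 in the call, and its value is inferred from the total number of elements, so this statement is equivalent to y = x.view(3,5).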
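One detail worth knowing is that .view() does not copy data: the result shares memory with the original tensor. A small sketch follows. The names src, flat and wide are illustrative.

```python
import torch

src = torch.arange(15.0).reshape(5, 3)

flat = src.view(15)      # 1-D tensor with 15 elements
wide = src.view(-1, 5)   # -1 is inferred as 15 / 5 = 3
print(flat.shape)        # torch.Size([15])
print(wide.shape)        # torch.Size([3, 5])

# Writing through the view changes the original, because both
# share the same underlying storage.
flat[0] = 100.0
print(src[0, 0].item())  # 100.0
```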

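3. Tensor gradients

As mentioned earlier, a tensor carries a boolean attribute called requires_grad, which determines whether gradients can be computed with respect to that tensor.

For example, we can decide whether a tensor tracks gradients when we create it: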

torch.randn(2,3,requires_grad=True)
# tensor([[ 2.1278,  1.1417,  0.6102],
#         [-1.3501,  0.5458,  2.5938]], requires_grad=True)

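This creates a 2-D $2 \times 3$ random tensor whose elements follow the standard normal distribution, and for which gradients can be computed.

Besides setting requires_grad at creation time, we can also stop gradient tracking for a tensor that already exists. There are three ways: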

x = torch.randn(3,requires_grad=True)
x

# tensor([-1.1794,  1.0465, -1.3400], requires_grad=True)

1. .detach()

y = x.detach()
y

# tensor([-1.1794,  1.0465, -1.3400])

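x.detach() returns a new tensor that does not require gradients. Note that it shares the same underlying data with x rather than copying it, so an in-place change to one is visible in the other.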
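The following sketch checks both properties of .detach(): the result does not track gradients, and it shares storage with the source. The variable names are my own, and the .clone() call is an addition for the case where an independent copy is wanted.

```python
import torch

src = torch.randn(3, requires_grad=True)

det = src.detach()
print(det.requires_grad)                 # False
print(det.data_ptr() == src.data_ptr())  # True: same storage

# For an independent copy, detach and then clone.
copy = src.detach().clone()
print(copy.data_ptr() == src.data_ptr()) # False: separate storage

# An in-place change to the detached tensor shows up in src too.
det.zero_()
print(src)                               # all zeros now
```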

2. with torch.no_grad():

with torch.no_grad():
    y = x + 2
    print(y)
    
# tensor([0.8206, 3.0465, 0.6600])

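Inside this block, operations are not tracked for gradients, regardless of whether the tensors involved require them. Here we ignore the fact that x requires gradients, add 2 to every element of x, and assign the result to a new variable y. The resulting tensor y therefore does not require gradients.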
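To make the effect of the block visible, the same operation can be run inside and outside torch.no_grad() and the requires_grad flags compared. This is a sketch with illustrative names.

```python
import torch

src = torch.randn(3, requires_grad=True)

tracked = src + 2            # outside the block: tracked
with torch.no_grad():
    untracked = src + 2      # inside the block: not tracked

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
print(src.requires_grad)        # True: src itself is unchanged
```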

3. .requires_grad_(False)

x.requires_grad_(False)

# tensor([-1.1794,  1.0465, -1.3400])

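Recall .add_() and .sub_() above: methods ending in an underscore modify the tensor they are called on. The same applies here. In this example we changed x itself, so that gradients can no longer be computed for it.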

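Computing gradients: backward() and .grad

My current understanding is that if we apply various operations to a tensor (addition, subtraction, multiplication, division, summing all elements, taking the mean of all elements, and other element-wise or tensor-wise operations) to obtain a new scalar tensor (a single number), and assign it to another variable, PyTorch records the sequence of operations applied to the original tensor.

In short, if we operate on tensor A and end up with a scalar tensor B, PyTorch automatically builds the functional relationship (the computation graph) from A to B.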

Consider a function of two variables:

$f(x,y) = e^x + \ln(y)$

The gradient of this function is

$\begin{aligned} \nabla f &= \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) \\ &= \left(e^x, \frac{1}{y}\right) \end{aligned}$

Therefore, at $(x,y) = (1,2)$ the gradient is

$\nabla f|_{x=1, y=2} = \left(e, \frac{1}{2}\right)$

If we use PyTorch to compute the gradient of $f$ at $(1,2)$, the equivalent code is

my_tensor = torch.tensor([1,2],requires_grad = True,dtype = torch.float64)
my_res = torch.exp(my_tensor[0]) + torch.log(my_tensor[1])
my_res.backward()
print(my_tensor.grad)
print(my_res)

# tensor([2.7183, 0.5000], dtype=torch.float64)
# tensor(3.4114, dtype=torch.float64, grad_fn=<AddBackward0>)

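Line 1 creates the tensor $(1,2)$, the point at which we want the gradient. Remember that requires_grad = True is required.

Line 2 builds the relationship between the tensor $(1,2)$ (my_tensor) and the target tensor (my_res). In words: add the natural exponential of element 0 of the tensor to the natural logarithm of element 1.

Line 3 calls my_res.backward() to compute the gradient of my_res with respect to my_tensor. The relationship between my_tensor and my_res that the first two lines describe to PyTorch is exactly the $f(x,y)$ above.

Line 4 reads the gradient of the function at the tensor $(1,2)$ from my_tensor.grad.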
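As a sanity check, the autograd result can be compared numerically with the analytic gradient $(e, \frac{1}{2})$ derived above. This is a minimal sketch. The names point, value and expected are illustrative.

```python
import math
import torch

# Point (1, 2) at which the gradient is evaluated.
point = torch.tensor([1.0, 2.0], requires_grad=True, dtype=torch.float64)

# f(x, y) = exp(x) + ln(y)
value = torch.exp(point[0]) + torch.log(point[1])
value.backward()

# Analytic gradient (e^x, 1/y) at (1, 2).
expected = torch.tensor([math.e, 0.5], dtype=torch.float64)

print(point.grad)
print(torch.allclose(point.grad, expected))  # True
```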

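Note that each time we build the relationship between the independent and dependent variables as in line 2, we can by default call backward() only once. One might expect a second call to simply recompute the same gradient, but PyTorch frees the intermediate results of the graph after the first backward(), so a second call on the same graph generally raises a RuntimeError (unless retain_graph=True was passed to the first call). In short: build the relationship once, backpropagate once.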
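The sketch below illustrates this behavior using the retain_graph argument of backward(). It reuses the function $f$ from above. The variable names are my own, and the exact error message depends on the PyTorch version.

```python
import torch

t = torch.tensor([1.0, 2.0], requires_grad=True)
res = torch.exp(t[0]) + torch.log(t[1])

# Keep the graph alive so that backward() can be called again.
res.backward(retain_graph=True)
first = t.grad.clone()

# Second call is allowed because the graph was retained. The new
# gradient is added to the one already stored in t.grad.
res.backward()
print(torch.allclose(t.grad, 2 * first))  # True

# The graph has now been freed, so another call is expected to fail.
third_failed = False
try:
    res.backward()
except RuntimeError as err:
    third_failed = True
    print("backward() failed:", err)
print(third_failed)
```

In ordinary training loops retain_graph is rarely needed, because the forward computation is simply rerun in every iteration.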

weights = torch.ones(4,requires_grad = True)

for epoch in range(2):
    model_output = (weights*3).sum()
    
    model_output.backward()
    
    print(weights.grad)
    
# tensor([3., 3., 3., 3.])
# tensor([6., 6., 6., 6.])

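In the code above, before each backward() we run model_output = (weights*3).sum() to rebuild the relationship. Even though the relationship is identical in every iteration, we rebuild it each time; otherwise backward() would typically raise an error, since by default a graph can be backpropagated through only once.

But a new issue appears: in the two iterations, both the relationship and the input tensor (weights) are unchanged, yet the two gradients differ. This is because .grad is an attribute of the input tensor, and each backward() call adds the newly computed gradient to whatever is already stored in .grad. So in this loop, once we have read the gradient we want, we should add a line weights.grad.zero_() to reset the gradient to 0, so that it does not affect the next iteration. The corrected code is: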

weights = torch.ones(4,requires_grad = True)

for epoch in range(2):
    model_output = (weights*3).sum()
    
    model_output.backward()
    
    print(weights.grad)
    
    weights.grad.zero_()
  
# tensor([3., 3., 3., 3.])
# tensor([3., 3., 3., 3.])

4. Exercise: where does the following function attain its minimum?

$f(x,y) = x^2 + xy + y^2 + x + y$

Using the first-order conditions, the minimum is at $(-\frac{1}{3}, -\frac{1}{3})$; I won't write out the details.
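For readers who want the skipped steps, here is one way to write out the first-order conditions. The Hessian check at the end is my addition, to confirm that the stationary point is a minimum.

```latex
\begin{aligned}
\frac{\partial f}{\partial x} &= 2x + y + 1 = 0 \\
\frac{\partial f}{\partial y} &= x + 2y + 1 = 0
\end{aligned}
\quad\Longrightarrow\quad
x = y,\; 3x + 1 = 0
\quad\Longrightarrow\quad
(x, y) = \left(-\tfrac{1}{3}, -\tfrac{1}{3}\right)

\nabla^2 f =
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},
\qquad
\det \nabla^2 f = 3 > 0,\quad
\frac{\partial^2 f}{\partial x^2} = 2 > 0
```

Subtracting the two equations gives $x = y$, and substituting back gives $3x + 1 = 0$. Since the Hessian is positive definite everywhere, $f$ is strictly convex and the stationary point is its unique global minimum, with $f(-\frac{1}{3}, -\frac{1}{3}) = -\frac{1}{3}$.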

x_list = []
y_list = []

i = 0
lr = 0.05
x = torch.randn(2,requires_grad = True)
while (i < 10000):
    y = x[0]**2 + x[0] * x[1] + x[1]**2 + x[0] + x[1]
    y.backward()
    if abs(x.grad).mean() < 1/100000:
        break
    #print(x) tensor([-0.0180, -1.1985], requires_grad=True)
    x = x - lr * x.grad
    #print(x) tensor([-0.0063, -1.1278], grad_fn=<SubBackward0>)
    x.detach_()  
    x.requires_grad_(True)
    #print(x) tensor([-0.0063, -1.1278], requires_grad=True)
    i += 1
    
    x_list.append(x.detach().numpy()[0])
    y_list.append(x.detach().numpy()[1])
x

# tensor([-0.3333, -0.3333], requires_grad=True)

Equivalently, we can also write

i = 0
lr = 0.05
x = torch.randn(2,requires_grad = True)
while (i < 10000):
    y = x[0]**2 + x[0] * x[1] + x[1]**2 + x[0] + x[1]
    y.backward()
    if abs(x.grad).mean() < 1/100000:
        break
    #print(x) tensor([-0.6739, -0.4994], requires_grad=True)
    with torch.no_grad():
        x -= lr * x.grad
    #print(x) tensor([-0.6316, -0.4658], requires_grad=True)
    x.grad.zero_()
    i += 1

x

# tensor([-0.3333, -0.3333], requires_grad=True)
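The same update rule is also available through PyTorch's built-in optimizer torch.optim.SGD, which performs the step x -= lr * x.grad and offers zero_grad() to reset gradients. The video does not use it at this point, so treat the following as an optional sketch. The fixed seed and the step count of 2000 are arbitrary choices of mine.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

x = torch.randn(2, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.05)  # plain gradient descent

for step in range(2000):
    optimizer.zero_grad()   # clear gradients from the previous step
    y = x[0]**2 + x[0] * x[1] + x[1]**2 + x[0] + x[1]
    y.backward()            # compute dy/dx into x.grad
    optimizer.step()        # x <- x - lr * x.grad

print(x)  # should be close to (-1/3, -1/3)
```

With plain SGD (no momentum) this is the same computation as the manual loop with torch.no_grad(), only with the bookkeeping handled by the optimizer.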

Finally, let's visualize how the iterates gradually approach the true solution.

fig,ax = plt.subplots(1,figsize=(10,4))
ax.scatter(x_list,y_list,label='Estimated minimum')
ax.scatter(-1/3,-1/3,label='True minimum')
ax.legend()
plt.show()
