The softmax regression model is the generalization of logistic regression to multi-class classification problems.
## Getting Started

Consider a problem with multiple classes, e.g., predicting which of the classes "cat", "dog", or "horse" a given image belongs to.
A standard way to represent categorical data is one-hot encoding: a vector with as many components as there are classes, where the component corresponding to the sample's class is set to 1 and all others are set to 0. For example, (1, 0, 0) corresponds to "cat", (0, 1, 0) to "dog", and (0, 0, 1) to "horse".
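PyTorch provides this encoding directly; a small illustration, assuming the cat/dog/horse class order from the example above:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 1, 2])  # cat, dog, horse as integer class indices
print(F.one_hot(labels, num_classes=3))
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1]])
```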
Clearly, if we used a sigmoid to score each class independently, the resulting per-class scores would not form a probability distribution, and we could not reliably pick the best class (two classes might even end up with equal scores). We would like a function that guarantees:
$$P(y = i) \ge 0, \qquad \sum_{i=0}^{k-1} P(y = i) = 1$$

(assuming there are $k$ known classes in total). The softmax function does exactly this: it transforms the unnormalized predictions into values that are non-negative and sum to 1:
$$\mathrm{softmax}(z_i) = P(y = i) = \frac{\exp(z_i)}{\sum_{j=0}^{k-1} \exp(z_j)}, \quad i \in \{0, \dots, k-1\}$$
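As a quick illustration, here is a minimal softmax sketch in PyTorch. Subtracting the row maximum is a standard numerical-stability trick, not part of the definition; it leaves the result unchanged:

```python
import torch

def softmax(z):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    z = z - z.max(dim=-1, keepdim=True).values
    e = torch.exp(z)
    return e / e.sum(dim=-1, keepdim=True)

logits = torch.tensor([2.0, 1.0, 0.1])  # unnormalized predictions for k = 3 classes
p = softmax(logits)
print(p)        # tensor([0.6590, 0.2424, 0.0986]) -- all non-negative
print(p.sum())  # tensor(1.) -- sums to 1
```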
- The sigmoid function is better suited to multi-label classification (multiple correct answers; non-exclusive outputs).
- The softmax function is better suited to multi-class classification (exactly one correct answer; mutually exclusive outputs).
For the differences and connections between the softmax and sigmoid functions, see this article.
## The Model

For a minibatch of $\mathcal{B}$ samples $\mathbf{X} \in \mathbb{R}^{\mathcal{B} \times n}$, where the feature dimension (number of inputs) is $n$, and $k$ output classes, the weights are $\mathbf{W} \in \mathbb{R}^{n \times k}$ and the bias is $\mathbf{b} \in \mathbb{R}^{1 \times k}$. The vectorized expression for softmax regression is:
$$\mathbf{Z} = \mathbf{X}\mathbf{W} + \mathbf{b}, \qquad \hat{\mathbf{Y}} = \mathrm{softmax}(\mathbf{Z})$$
For sample $i$, the model predicts the probability that the sample belongs to each class, $\hat{\mathbf{y}}^{(i)}$, and we pick the most likely class $q$:
$$\underset{q}{\arg\max}\ \hat{y}^{(i)}_{q}$$
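A minimal sketch of this forward pass in PyTorch (the sizes $\mathcal{B} = 4$, $n = 784$, $k = 10$ and the random parameters are example values, not trained ones):

```python
import torch

B, n, k = 4, 784, 10             # example sizes: batch, features, classes
X = torch.randn(B, n)
W = torch.randn(n, k) * 0.01     # untrained weights, for illustration only
b = torch.zeros(1, k)

Z = X @ W + b                    # (B, k) logits
Y_hat = torch.softmax(Z, dim=1)  # each row is a probability distribution over k classes
pred = Y_hat.argmax(dim=1)       # the most likely class q for each sample
print(Y_hat.sum(dim=1))          # each row sums to 1
print(pred.shape)                # torch.Size([4])
```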
## Measuring Prediction Quality

For sample $i$, the model predicts per-class probabilities $\hat{\mathbf{y}}^{(i)}$, while $\mathbf{y}^{(i)}$ is the true label (note: it is one-hot encoded, so exactly one component is 1 and the rest are 0). The loss function is the cross-entropy loss:
$$l^{(i)} = -\sum_{q=1}^{k} y^{(i)}_q \ln \hat{y}^{(i)}_{q}$$

Since $\mathbf{y}^{(i)}$ is one-hot, the sum collapses to a single term, $l^{(i)} = -\ln \hat{y}^{(i)}_{c}$, where $c$ is the true class of sample $i$.
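A sketch connecting this formula to the PyTorch implementation below: computing $-\ln \hat{y}^{(i)}_{c}$ by hand matches `F.cross_entropy`, which takes raw logits and integer class indices rather than one-hot vectors (the logits and labels here are arbitrary example values):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)        # 3 samples, k = 5 classes (random example values)
labels = torch.tensor([1, 0, 4])  # true classes as indices, not one-hot vectors

# Manual cross-entropy: log-softmax, then -log p of the true class, averaged
log_p = F.log_softmax(logits, dim=1)
manual = -log_p[torch.arange(3), labels].mean()

# Built-in: fuses log-softmax and negative log-likelihood in one call
builtin = F.cross_entropy(logits, labels)
print(torch.allclose(manual, builtin))  # True
```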
## Implementation in PyTorch (Custom Module Version)

### Preparing the Dataset

```python
import torch
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim
```
We will use the MNIST handwritten digit dataset, which has 10 labels (the digits 0-9).
### Loading and Preprocessing the Dataset

We need to convert the images from PIL format to PyTorch tensors:
- Before: $\mathbb{Z}^{28 \times 28}$, with pixel values in $\{0, \dots, 255\}$
- After: $\mathbb{R}^{1 \times 28 \times 28}$, with pixel values in $[0, 1]$, in $C \times H \times W$ order
So we create a `torchvision.transforms.Compose` instance, which chains multiple image-transformation operations:
```python
transform = transforms.Compose([
    transforms.ToTensor(),                      # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST's per-channel mean and std
])
```
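As a quick sanity check (using a dummy random image in place of a real MNIST sample, and the `transform` defined above), the transform produces the tensor shape and range described earlier:

```python
import numpy as np
from PIL import Image

# A dummy 28x28 grayscale "image" standing in for one MNIST sample
img = Image.fromarray(np.random.randint(0, 256, (28, 28), dtype=np.uint8))
x = transform(img)
print(x.shape)           # torch.Size([1, 28, 28]) -- C x H x W
print(x.min(), x.max())  # roughly [-0.42, 2.82] after normalization
```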
```python
batch_size = 64

train_dataset = datasets.MNIST(root='../dataset/mnist/',
                               train=True,
                               download=True,  # set to False if the data is already on disk
                               transform=transform)
test_dataset = datasets.MNIST(root='../dataset/mnist/',
                              train=False,
                              download=True,
                              transform=transform)

train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)
```
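Fetching one batch from the loader above confirms the shapes the model will receive (a quick sketch; the label values depend on shuffling):

```python
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])
```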
### Designing the Model

```python
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # flatten (B, 1, 28, 28) into (B, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)    # raw logits; CrossEntropyLoss applies softmax itself

model = Net()
```
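A quick shape check with a dummy batch (random values, just to verify the wiring of the model defined above):

```python
dummy = torch.randn(64, 1, 28, 28)  # a fake batch shaped like real MNIST input
out = model(dummy)
print(out.shape)  # torch.Size([64, 10]) -- one logit per class
```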
### Loss Function and Optimizer

```python
criterion = torch.nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax internally
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
```
### Training

```python
def train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:
            # Report the average loss over the last 300 batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0

def test():
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients needed for evaluation
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs, dim=1)  # index of the largest logit per row
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))
```
```python
if __name__ == '__main__':
    for epoch in range(10):  # train for 10 epochs, testing after each
        train(epoch)
        test()
```
Output:

```
[1, 300] loss: 0.217
[1, 600] loss: 0.090
[1, 900] loss: 0.043
Accuracy on test set: 89 %
[2, 300] loss: 0.032
[2, 600] loss: 0.028
[2, 900] loss: 0.022
Accuracy on test set: 93 %
[3, 300] loss: 0.019
[3, 600] loss: 0.017
[3, 900] loss: 0.015
Accuracy on test set: 95 %
[4, 300] loss: 0.013
[4, 600] loss: 0.013
[4, 900] loss: 0.012
Accuracy on test set: 95 %
[5, 300] loss: 0.010
[5, 600] loss: 0.009
[5, 900] loss: 0.009
Accuracy on test set: 96 %
[6, 300] loss: 0.008
[6, 600] loss: 0.008
[6, 900] loss: 0.007
Accuracy on test set: 97 %
[7, 300] loss: 0.006
[7, 600] loss: 0.006
[7, 900] loss: 0.007
Accuracy on test set: 97 %
[8, 300] loss: 0.005
[8, 600] loss: 0.005
[8, 900] loss: 0.005
Accuracy on test set: 97 %
[9, 300] loss: 0.004
[9, 600] loss: 0.004
[9, 900] loss: 0.004
Accuracy on test set: 97 %
[10, 300] loss: 0.003
[10, 600] loss: 0.003
[10, 900] loss: 0.004
Accuracy on test set: 97 %
```