PyTorch: implementing L2 and L1 regularization

Posted on 2021-03-03 by WalkonNet

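1. Implementing L2 regularization with torch.optim optimizers

torch.optim bundles many optimizers, such as SGD, Adadelta, Adam, Adagrad and RMSprop. Each of them takes a weight_decay argument that sets the weight decay rate, which plays the role of the λ coefficient in L2 regularization. Note that the optimizers in torch.optim only provide L2 regularization. The docstring describes the weight_decay argument as: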

weight_decay (float, optional): weight decay (L2 penalty) (default: 0)

With a torch.optim optimizer, L2 regularization can be set like this:

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.01)
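As a rough illustration of what weight_decay does, the sketch below trains two copies of a small model for one plain SGD step. One copy uses weight_decay in the optimizer; the other adds 0.5 * wd * sum(p**2) over all parameters to the loss by hand. The model, the data and the values of wd and lr are made up for this example. Under these assumptions the two updates should match, biases included, while the loss value computed in the first case contains no penalty term.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
wd, lr = 0.1, 0.5  # illustrative values, not taken from the article

model_a = nn.Linear(4, 3)
model_b = copy.deepcopy(model_a)  # identical starting weights
x = torch.randn(8, 4)             # synthetic batch
y = torch.randint(0, 3, (8,))
criterion = nn.CrossEntropyLoss()

# (a) penalty handled inside the optimizer
opt_a = optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
loss_a = criterion(model_a(x), y)  # reported loss: data term only
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# (b) penalty added to the loss explicitly, over weights AND biases
opt_b = optim.SGD(model_b.parameters(), lr=lr)
data_loss_b = criterion(model_b(x), y)
penalty = 0.5 * wd * sum(p.pow(2).sum() for p in model_b.parameters())
loss_b = data_loss_b + penalty
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

same_update = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print("same update:", same_update)
print("reported loss (a):", loss_a.item(), "loss with penalty (b):", loss_b.item())
```

The comparison is done with SGD without momentum to keep the arithmetic easy to follow.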

However, this approach has a few issues:

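(1) Regularization usually penalizes only the weights W of the model and leaves the biases b alone, whereas the weight_decay argument of a torch.optim optimizer applies to every parameter handed to the optimizer, weights w and biases b alike. Penalizing b with L2 can in many cases lead to serious underfitting, so it is usually enough to regularize only the weights w. (PS: I was not sure about this at first, because the docstring only says "weight decay (L2 penalty)". The optimizers do apply weight_decay to all parameters of a parameter group, biases included, unless the biases are placed in a separate group with weight_decay=0.)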

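(2) Drawback: the torch.optim optimizers hard-code L2 regularization and cannot do L1 regularization. If you need L1, you have to implement it yourself, as the custom class in Section 3 does.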

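(3) According to the regularization formula, adding the penalty should make the loss larger than before, and the penalty term should grow with weight_decay: if the penalty contributes 10 to the loss at weight_decay=1, it should contribute roughly 100 times as much at weight_decay=100. With the torch.optim approach, however, if you still compute the loss with loss_fun = nn.CrossEntropyLoss(), you will find that the reported loss stays about the same as without regularization, no matter how you change weight_decay. That is because loss_fun never adds the penalty on the weights W; the optimizer applies the decay to the gradients inside step().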

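(4) Regularizing through the torch.optim optimizers is perfectly fine! It is just easy to misread. Personally I prefer the way TensorFlow does it: you only need tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES), and the implementation lines up almost one-to-one with the regularization formula.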

(5) Github project source code: click to enter
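For point (1), one way to keep weight_decay away from the biases without leaving torch.optim is to pass parameter groups to the optimizer. The sketch below is only an illustration: the two-layer model, the learning rate and the rule "name ends with bias" are assumptions made for the example.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model; any nn.Module works the same way
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Assumed rule: every parameter whose name ends with "bias" is exempt
    if name.endswith("bias"):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = optim.Adam(
    [
        {"params": decay, "weight_decay": 0.01},   # weights: L2 penalty applied
        {"params": no_decay, "weight_decay": 0.0}, # biases: no penalty
    ],
    lr=1e-3,
)

for group in optimizer.param_groups:
    print(len(group["params"]), "tensors, weight_decay =", group["weight_decay"])
```

Each group keeps its own weight_decay, so the per-group value overrides whatever default the optimizer would otherwise use.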

To address these issues, I wrote a custom regularization method, similar to the way TensorFlow implements regularization.

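2. How to tell whether regularization is affecting the model?

Generally speaking, the main purpose of regularization is to keep the model from overfitting. Overfitting itself is sometimes hard to diagnose, but checking whether regularization is acting on the model is easy. Below are log excerpts of the loss and accuracy during training, one run without regularization and the others with it:

2.1 Loss and accuracy without regularization

The optimizer is Adam with weight_decay=0.0, i.e. no regularization: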

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.0)

Loss and accuracy printed during training:

step/epoch:0/0,Train Loss: 2.418065, Acc: [0.15625]
step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
step/epoch:20/0,Train Loss: 0.973226, Acc: [0.8125]
step/epoch:30/0,Train Loss: 1.215165, Acc: [0.65625]
step/epoch:40/0,Train Loss: 1.808068, Acc: [0.65625]
step/epoch:50/0,Train Loss: 1.661446, Acc: [0.625]
step/epoch:60/0,Train Loss: 1.552345, Acc: [0.6875]
step/epoch:70/0,Train Loss: 1.052912, Acc: [0.71875]
step/epoch:80/0,Train Loss: 0.910738, Acc: [0.75]
step/epoch:90/0,Train Loss: 1.142454, Acc: [0.6875]
step/epoch:100/0,Train Loss: 0.546968, Acc: [0.84375]
step/epoch:110/0,Train Loss: 0.415631, Acc: [0.9375]
step/epoch:120/0,Train Loss: 0.533164, Acc: [0.78125]
step/epoch:130/0,Train Loss: 0.956079, Acc: [0.6875]
step/epoch:140/0,Train Loss: 0.711397, Acc: [0.8125]

2.2 Loss and accuracy with regularization

The optimizer is Adam with weight_decay=10.0, i.e. a regularization weight of lambda = 10.0:

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=10.0)

Loss and accuracy printed during training in this case:

step/epoch:0/0,Train Loss: 2.467985, Acc: [0.09375]
step/epoch:10/0,Train Loss: 5.435320, Acc: [0.40625]
step/epoch:20/0,Train Loss: 1.395482, Acc: [0.625]
step/epoch:30/0,Train Loss: 1.128281, Acc: [0.6875]
step/epoch:40/0,Train Loss: 1.135289, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1.455040, Acc: [0.5625]
step/epoch:60/0,Train Loss: 1.023273, Acc: [0.65625]
step/epoch:70/0,Train Loss: 0.855008, Acc: [0.65625]
step/epoch:80/0,Train Loss: 1.006449, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.939148, Acc: [0.625]
step/epoch:100/0,Train Loss: 0.851593, Acc: [0.6875]
step/epoch:110/0,Train Loss: 1.093970, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1.699520, Acc: [0.625]
step/epoch:130/0,Train Loss: 0.861444, Acc: [0.75]
step/epoch:140/0,Train Loss: 0.927656, Acc: [0.625]

With weight_decay=10000.0:

step/epoch:0/0,Train Loss: 2.337354, Acc: [0.15625]
step/epoch:10/0,Train Loss: 2.222203, Acc: [0.125]
step/epoch:20/0,Train Loss: 2.184257, Acc: [0.3125]
step/epoch:30/0,Train Loss: 2.116977, Acc: [0.5]
step/epoch:40/0,Train Loss: 2.168895, Acc: [0.375]
step/epoch:50/0,Train Loss: 2.221143, Acc: [0.1875]
step/epoch:60/0,Train Loss: 2.189801, Acc: [0.25]
step/epoch:70/0,Train Loss: 2.209837, Acc: [0.125]
step/epoch:80/0,Train Loss: 2.202038, Acc: [0.34375]
step/epoch:90/0,Train Loss: 2.192546, Acc: [0.25]
step/epoch:100/0,Train Loss: 2.215488, Acc: [0.25]
step/epoch:110/0,Train Loss: 2.169323, Acc: [0.15625]
step/epoch:120/0,Train Loss: 2.166457, Acc: [0.3125]
step/epoch:130/0,Train Loss: 2.144773, Acc: [0.40625]
step/epoch:140/0,Train Loss: 2.173397, Acc: [0.28125]
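One possible reading of the weight_decay=10000.0 log: if the task has 10 classes (an assumption; the article does not state the number of classes, although a starting loss near 2.4 and a starting accuracy near 0.1 would be consistent with it), a model whose weights have been pushed toward zero outputs nearly uniform logits, and the cross entropy of uniform logits is ln(10), about 2.30. The short check below computes that value.

```python
import math
import torch
import torch.nn as nn

num_classes = 10  # assumption, not stated in the article
logits = torch.zeros(4, num_classes)  # what an all-zero network would output
labels = torch.tensor([0, 3, 7, 9])   # arbitrary labels

uniform_loss = nn.CrossEntropyLoss()(logits, labels).item()
print(uniform_loss, math.log(num_classes))
```

A loss that plateaus near this value together with near-chance accuracy suggests the penalty is dominating the update.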

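2.3 Notes on regularization

Overall, comparing the loss and accuracy logs of the models trained with and without regularization, we can see that with regularization the loss drops more slowly and the accuracy rises more slowly. The loss and accuracy of the unregularized model also fluctuate more (higher variance), while those of the regularized model are smoother.

The larger the regularization weight lambda, the smoother the curves become. This is the penalty that regularization imposes on the model: it makes the model behave more smoothly, which is how regularization helps against overfitting.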
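To put a number on "fluctuates more", the logs can be parsed and their spread compared. The helper below is a hypothetical sketch (parse_log and LOG_PATTERN are names introduced here); the sample text is the first three lines of the log in 2.1, and the same function can be fed each full log in turn.

```python
import re
import statistics

# Matches lines such as: step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
LOG_PATTERN = re.compile(
    r"step/epoch:(\d+)/(\d+),Train Loss: ([\d.]+), Acc: \[([\d.]+)\]"
)

def parse_log(text):
    """Return (losses, accuracies) parsed from lines in the log format above."""
    losses, accs = [], []
    for match in LOG_PATTERN.finditer(text):
        losses.append(float(match.group(3)))
        accs.append(float(match.group(4)))
    return losses, accs

sample = """
step/epoch:0/0,Train Loss: 2.418065, Acc: [0.15625]
step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
step/epoch:20/0,Train Loss: 0.973226, Acc: [0.8125]
"""

losses, accs = parse_log(sample)
print("loss std:", statistics.pstdev(losses))
print("acc std:", statistics.pstdev(accs))
```

With only 15 logged steps per run, such statistics are a rough indication at best.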

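3. A custom regularization method

To work around the two limitations of the torch.optim optimizers, namely that they only implement L2 regularization and that they penalize every parameter in the network, this section implements regularization in a way similar to TensorFlow.

3.1 A custom Regularization class

The regularization is wrapped in a Regularization class. Every method is commented, so read through it at your own pace and leave a comment if you have questions.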

import torch

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = 'cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))


class Regularization(torch.nn.Module):
    def __init__(self, model, weight_decay, p=2):
        '''
        :param model: the model to regularize
        :param weight_decay: regularization coefficient
        :param p: order of the norm, 2 by default;
                  p=2 gives L2 regularization, p=1 gives L1 regularization
        '''
        super(Regularization, self).__init__()
        if weight_decay <= 0:
            raise ValueError("param weight_decay must be > 0")
        self.model = model
        self.weight_decay = weight_decay
        self.p = p
        self.weight_list = self.get_weight(model)
        self.weight_info(self.weight_list)

    def to(self, device):
        '''
        Select the device to run on
        :param device: cuda or cpu
        :return: self
        '''
        self.device = device
        super().to(device)
        return self

    def forward(self, model):
        self.weight_list = self.get_weight(model)  # fetch the latest weights
        reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
        return reg_loss

    def get_weight(self, model):
        '''
        Collect the weight tensors of the model
        :param model:
        :return: list of (name, parameter) pairs whose name contains 'weight'
        '''
        weight_list = []
        for name, param in model.named_parameters():
            if 'weight' in name:
                weight = (name, param)
                weight_list.append(weight)
        return weight_list

    def regularization_loss(self, weight_list, weight_decay, p=2):
        '''
        Sum the p-norms of the weight tensors and scale the sum by weight_decay
        :param weight_list: list of (name, parameter) pairs
        :param p: order of the norm, 2 by default
        :param weight_decay: regularization coefficient
        :return: the regularization loss
        '''
        reg_loss = 0
        for name, w in weight_list:
            # torch.norm returns the (unsquared) p-norm of the flattened tensor
            norm = torch.norm(w, p=p)
            reg_loss = reg_loss + norm

        reg_loss = weight_decay * reg_loss
        return reg_loss

    def weight_info(self, weight_list):
        '''
        Print the names of the regularized weights
        :param weight_list: list of (name, parameter) pairs
        :return:
        '''
        print("---------------regularization weight---------------")
        for name, w in weight_list:
            print(name)
        print("---------------------------------------------------")
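A quick sanity check of the idea behind the class can be done with a functional stand-in. The helper norm_penalty below is hypothetical (it is not part of the article's code) and mirrors the same rule: sum torch.norm over every parameter whose name contains 'weight', then scale by weight_decay. The hand-picked weights make the expected values easy to verify: the 2-norm of [3, -4] is 5, and the bias never enters the graph.

```python
import torch
import torch.nn as nn

def norm_penalty(model, weight_decay, p=2):
    """Hypothetical helper: weight_decay * sum of p-norms of '*weight*' tensors."""
    total = 0.0
    for name, param in model.named_parameters():
        if "weight" in name:
            total = total + torch.norm(param, p=p)
    return weight_decay * total

layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[3.0, -4.0]]))
    layer.bias.fill_(100.0)  # large bias; it should not affect the penalty

penalty = norm_penalty(layer, weight_decay=10.0, p=2)  # 10 * sqrt(3^2 + 4^2)
penalty.backward()

print(penalty.item())     # 10 * 5
print(layer.weight.grad)  # 10 * w / ||w||
print(layer.bias.grad)    # None: the bias is not part of the penalty
```

Note that the name filter 'weight' also matches the scale tensors of normalization layers such as BatchNorm, which may or may not be what you want.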

3.2 Using the Regularization class

Usage is simple: treat it like any other PyTorch module. For example:

import torch
import torch.nn as nn
import torch.optim as optim

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))

weight_decay = 100.0  # regularization coefficient

# my_net and learning_rate are defined elsewhere in the project
model = my_net().to(device)
# Set up the regularizer
if weight_decay > 0:
    reg_loss = Regularization(model, weight_decay, p=2).to(device)
else:
    print("no regularization")

criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = softmax + cross entropy
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no weight_decay argument needed

# train
batch_train_data = ...
batch_train_label = ...

out = model(batch_train_data)

# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
    loss = loss + reg_loss(model)
total_loss = loss.item()  # plain Python float, for logging only

# backprop
optimizer.zero_grad()  # clear the accumulated gradients
loss.backward()  # backward() must be called on the tensor, not on the float from .item()
optimizer.step()
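For readers who want something they can run as is, here is a self-contained sketch of one training step with an explicit norm penalty. Everything in it is a stand-in: the small nn.Sequential replaces my_net, the batch is random, and weight_decay and the learning rate are arbitrary. The penalty is computed inline with the same rule as the class above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))  # stand-in for my_net
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2)  # no weight_decay here
weight_decay = 0.1  # arbitrary value for the example

data = torch.randn(32, 8)            # synthetic batch
labels = torch.randint(0, 3, (32,))  # synthetic labels

before = [param.detach().clone() for param in model.parameters()]

out = model(data)
data_loss = criterion(out, labels)
# Penalize only tensors whose name contains 'weight'
reg = weight_decay * sum(
    torch.norm(w, p=2) for name, w in model.named_parameters() if "weight" in name
)
loss = data_loss + reg

optimizer.zero_grad()
loss.backward()   # call backward() on the tensor so autograd can trace the graph
optimizer.step()

logged = loss.item()  # float, safe to print or store
changed = any(
    not torch.equal(old, new) for old, new in zip(before, model.parameters())
)
print("data loss:", data_loss.item(), "total loss:", logged, "updated:", changed)
```

Keeping the tensor for backward() and the float for logging avoids calling backward() on a Python number.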

Loss and accuracy printed during training:

(1) weight_decay=0.0, no regularization

step/epoch:0/0,Train Loss: 2.379627, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1.473092, Acc: [0.6875]
step/epoch:20/0,Train Loss: 0.931847, Acc: [0.8125]
step/epoch:30/0,Train Loss: 0.625494, Acc: [0.875]
step/epoch:40/0,Train Loss: 2.241885, Acc: [0.53125]
step/epoch:50/0,Train Loss: 1.132131, Acc: [0.6875]
step/epoch:60/0,Train Loss: 0.493038, Acc: [0.8125]
step/epoch:70/0,Train Loss: 0.819410, Acc: [0.78125]
step/epoch:80/0,Train Loss: 0.996497, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.474205, Acc: [0.8125]
step/epoch:100/0,Train Loss: 0.744587, Acc: [0.8125]
step/epoch:110/0,Train Loss: 0.502217, Acc: [0.78125]
step/epoch:120/0,Train Loss: 0.531865, Acc: [0.8125]
step/epoch:130/0,Train Loss: 1.016807, Acc: [0.875]
step/epoch:140/0,Train Loss: 0.411701, Acc: [0.84375]

(2) weight_decay=10.0, with regularization

---------------------------------------------------
step/epoch:0/0,Train Loss: 1563.402832, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1530.002686, Acc: [0.53125]
step/epoch:20/0,Train Loss: 1495.115234, Acc: [0.71875]
step/epoch:30/0,Train Loss: 1461.114136, Acc: [0.78125]
step/epoch:40/0,Train Loss: 1427.868164, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1395.430054, Acc: [0.6875]
step/epoch:60/0,Train Loss: 1363.358154, Acc: [0.5625]
step/epoch:70/0,Train Loss: 1331.439697, Acc: [0.75]
step/epoch:80/0,Train Loss: 1301.334106, Acc: [0.625]
step/epoch:90/0,Train Loss: 1271.505005, Acc: [0.6875]
step/epoch:100/0,Train Loss: 1242.488647, Acc: [0.75]
step/epoch:110/0,Train Loss: 1214.184204, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1186.174561, Acc: [0.71875]
step/epoch:130/0,Train Loss: 1159.148438, Acc: [0.78125]
step/epoch:140/0,Train Loss: 1133.020020, Acc: [0.65625]

(3) weight_decay=10000.0, with regularization

step/epoch:0/0,Train Loss: 1570211.500000, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1522952.125000, Acc: [0.3125]
step/epoch:20/0,Train Loss: 1486256.125000, Acc: [0.125]
step/epoch:30/0,Train Loss: 1451671.500000, Acc: [0.25]
step/epoch:40/0,Train Loss: 1418959.750000, Acc: [0.15625]
step/epoch:50/0,Train Loss: 1387154.000000, Acc: [0.125]
step/epoch:60/0,Train Loss: 1355917.500000, Acc: [0.125]
step/epoch:70/0,Train Loss: 1325379.500000, Acc: [0.125]
step/epoch:80/0,Train Loss: 1295454.125000, Acc: [0.3125]
step/epoch:90/0,Train Loss: 1266115.375000, Acc: [0.15625]
step/epoch:100/0,Train Loss: 1237341.000000, Acc: [0.0625]
step/epoch:110/0,Train Loss: 1209186.500000, Acc: [0.125]
step/epoch:120/0,Train Loss: 1181584.250000, Acc: [0.125]
step/epoch:130/0,Train Loss: 1154600.125000, Acc: [0.1875]
step/epoch:140/0,Train Loss: 1128239.875000, Acc: [0.125]

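Compared with the L2 regularization built into the torch.optim optimizers, the Regularization class achieves the same regularizing effect, and, as in TensorFlow, the reported loss now includes the regularization term.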
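The logged totals can be checked against the formula total = cross entropy + weight_decay * (sum of weight norms). The arithmetic below takes the step-0 totals from logs (2) and (3) and assumes the cross-entropy part is about 2.38, the step-0 loss of the unregularized run; that last value is a guess, since the data loss of the regularized runs is not logged separately.

```python
# Step-0 totals copied from the logs above
loss_wd10 = 1563.402832
loss_wd10000 = 1570211.5
data_loss_guess = 2.379627  # assumption: step-0 loss of the unregularized run

# Solve total = data_loss + weight_decay * norm_sum for norm_sum
norm_sum_wd10 = (loss_wd10 - data_loss_guess) / 10.0
norm_sum_wd10000 = (loss_wd10000 - data_loss_guess) / 10000.0
print(norm_sum_wd10, norm_sum_wd10000)
```

Both runs imply a sum of weight norms of roughly 156 to 157 at initialization, which is what one would expect if the logged loss is dominated by the penalty and scales linearly with weight_decay.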

In addition, the parameter p selects the norm: p=2 gives L2 regularization and p=1 gives L1 regularization.
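A small numeric check of what torch.norm returns for the two values of p, on a hand-picked tensor. It also shows that the 2-norm used here is not the same quantity as the squared penalty 0.5 * sum(w**2), which is the loss whose gradient equals the w term that an optimizer's weight_decay adds (per unit of weight_decay).

```python
import torch

w = torch.tensor([[3.0, -4.0], [0.0, 0.0]])

l1 = torch.norm(w, p=1)         # sum of absolute values: 3 + 4
l2 = torch.norm(w, p=2)         # square root of the sum of squares: sqrt(9 + 16)
half_sq = 0.5 * w.pow(2).sum()  # squared form, whose gradient is simply w

print(l1.item(), l2.item(), half_sq.item())
```

Because the unsquared norm and the squared form have different gradients, the same numeric weight_decay does not mean the same regularization strength in the two approaches.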

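4. Github project source code

Github project source code: click to enter

The above is my personal experience. I hope it serves as a useful reference, and I hope you will continue to support WalkonNet. If anything is wrong or incomplete, corrections are welcome.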
