ConvNeXt in Practice: Plant Seedling Classification
Preface
ConvNeXts are constructed entirely from standard ConvNet modules and compete with Transformers in accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.
Paper: https://arxiv.org/pdf/2201.03545.pdf
Code: https://github.com/facebookresearch/ConvNeXt
If GitHub is not accessible, you can use the mirror below:
https://gitcode.net/hhhhhhhhhhwwwwwwwwww/ConvNeXt
Key characteristics of ConvNeXt:
7×7 convolution kernels. Classic CNNs such as VGG and ResNet use small kernels, but ConvNeXt demonstrates that large kernels are effective. The authors tried kernel sizes of 3, 5, 7, 9, and 11: accuracy improved from 79.9% (3×3) to 80.6% (7×7) while the network's FLOPs stayed roughly constant, and the benefit of kernel size saturates at 7×7.
The GELU (Gaussian Error Linear Unit) activation function. GELU combines the ideas behind dropout, zoneout, and ReLU: the input is multiplied by a 0/1 mask that is generated stochastically, with probabilities depending on the input itself. In experiments it outperforms both ReLU and ELU (a quick formula check follows this list).
LayerNorm instead of BatchNorm.
An inverted bottleneck. Figures 3(a) to (b) in the paper illustrate the configurations. Although the FLOPs of the depthwise convolution layer increase, the shortcut 1×1 convolutions in the downsampling residual blocks shrink so much that the whole network drops to 4.6 GFLOPs, and accuracy improves from 80.5% to 80.6%. In the ResNet-200/Swin-B regime this step brings even larger gains (81.9% to 82.6%) while also reducing FLOPs.
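The GELU item above can be pinned down with its closed form, GELU(x) = x · Φ(x), where Φ is the standard normal CDF. A minimal sketch checking that PyTorch's built-in implementation matches this formula:

import torch
import torch.nn.functional as F

x = torch.randn(4)
# GELU(x) = x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
manual = x * 0.5 * (1.0 + torch.erf(x / 2 ** 0.5))
print(torch.allclose(F.gelu(x), manual, atol=1e-6))  # True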
The ConvNeXt residual block
The residual block is the core of the whole model.
Implementation:
# Excerpted from the official convnext.py. LayerNorm below is the custom LayerNorm
# defined in that file (it supports both channels_first and channels_last layouts);
# DropPath comes from timm.
import torch
import torch.nn as nn
from timm.models.layers import DropPath


class Block(nn.Module):
    r""" ConvNeXt Block. There are two equivalent implementations:
    (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)
    (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
    We use (2) as we find it slightly faster in PyTorch

    Args:
        dim (int): Number of input channels.
        drop_path (float): Stochastic depth rate. Default: 0.0
        layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.
    """

    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise conv
        self.norm = LayerNorm(dim, eps=1e-6)
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise/1x1 convs, implemented with linear layers
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones((dim)),
                                  requires_grad=True) if layer_scale_init_value > 0 else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        input = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        if self.gamma is not None:
            x = self.gamma * x
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = input + self.drop_path(x)
        return x
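To see the block in action, here is a quick shape check (a sketch assuming the official convnext.py has been placed under the Model folder, as described later in this article):

import torch
from Model.convnext import Block

blk = Block(dim=96)             # 96 channels, as in the first stage of convnext_tiny
x = torch.randn(1, 96, 56, 56)  # dummy (N, C, H, W) input
print(blk(x).shape)             # torch.Size([1, 96, 56, 56]); the block preserves shape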
Data augmentation: Cutout and Mixup
ConvNeXt uses Cutout and Mixup; to improve the score I added both augmentations to my code as well. The official implementation uses timm, but I chose torchtoolbox instead. Install it with:
pip install torchtoolbox
Cutout, which masks out a random square region of the input image, is applied in the transforms:
from torchtoolbox.transform import Cutout
import torchvision.transforms as transforms

# Data preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    Cutout(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
Mixup is applied in the train method and needs this import: from torchtoolbox.tools import mixup_data, mixup_criterion
for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
    data, labels_a, labels_b, lam = mixup_data(data, target, alpha)
    optimizer.zero_grad()
    output = model(data)
    loss = mixup_criterion(criterion, output, labels_a, labels_b, lam)
    loss.backward()
    optimizer.step()
    print_loss = loss.data.item()
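For intuition, mixup blends each batch with a shuffled copy of itself and mixes the two losses in the same proportion. The sketch below shows the idea; it is a conceptual re-implementation, not torchtoolbox's exact code:

import numpy as np
import torch

def mixup_data_sketch(x, y, alpha=0.2):
    # Sample the mixing coefficient from Beta(alpha, alpha) and blend
    # each sample with a randomly chosen partner from the same batch.
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

def mixup_criterion_sketch(criterion, pred, y_a, y_b, lam):
    # The loss is the same convex combination of the losses on both label sets.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)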
Project structure
Use the tree command to print the project structure.
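A sketch of the layout this article builds up (reconstructed from the steps that follow; the root and test-script names are placeholders):

ConvNeXt_demo/
├── data/
│   ├── train/
│   └── test/
├── dataset/
│   ├── __init__.py
│   └── dataset.py
├── Model/
│   └── convnext.py
├── train_connext.py
└── test.py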
The dataset
The dataset is the Plant Seedlings Classification dataset, with 12 classes in total. Download link:
Link (extraction code: syng)
Create a data folder in the project root. After obtaining the dataset, unzip train and test into the data folder.
Importing the model file
Find convnext.py in the official repository and place it in the Model folder.
Installing and importing the required libraries
The model depends on the timm library; if it is not installed yet, run:
pip install timm
Create train_connext.py and import the required packages:
import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
from dataset.dataset import SeedlingData
from torch.autograd import Variable
from Model.convnext import convnext_tiny
from torchtoolbox.tools import mixup_data, mixup_criterion
from torchtoolbox.transform import Cutout
Setting global parameters
Set the GPU device, the learning rate, the batch size, the number of epochs, and so on.
# Global parameters
modellr = 1e-4
BATCH_SIZE = 8
EPOCHS = 300
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Data preprocessing
The preprocessing is kept simple here; feel free to experiment with more elaborate pipelines. Note that Normalize with mean and std of 0.5 maps each channel from [0, 1] to [-1, 1].
# Data preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    Cutout(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
Reading the data
Next, create __init__.py and dataset.py under the dataset folder and write the code below into dataset.py.
The core logic of the code:
Step 1: build a dictionary that maps each class name to a numeric ID.
Step 2: in __init__, collect the image paths. The test set is a single flat directory and is read directly; the training images live in per-class subfolders under train, so we first enumerate the class folders and then the image paths inside each. Finally, sklearn's train_test_split splits the images into training and validation sets at a 7:3 ratio.
Step 3: in __getitem__, read a single image and its label. Because some images have a 32-bit depth (e.g. RGBA), the image is converted to RGB when loaded.
The code:
# coding:utf8
import os

from PIL import Image
from sklearn.model_selection import train_test_split
from torch.utils import data

Labels = {'Black-grass': 0, 'Charlock': 1, 'Cleavers': 2, 'Common Chickweed': 3,
          'Common wheat': 4, 'Fat Hen': 5, 'Loose Silky-bent': 6, 'Maize': 7,
          'Scentless Mayweed': 8, 'Shepherds Purse': 9,
          'Small-flowered Cranesbill': 10, 'Sugar beet': 11}


class SeedlingData(data.Dataset):

    def __init__(self, root, transforms=None, train=True, test=False):
        """
        Main goal: collect the paths of all images and split them into
        training, validation, and test data.
        """
        self.test = test
        self.transforms = transforms

        if self.test:
            imgs = [os.path.join(root, img) for img in os.listdir(root)]
            self.imgs = imgs
        else:
            imgs_labels = [os.path.join(root, img) for img in os.listdir(root)]
            imgs = []
            for imglabel in imgs_labels:
                for imgname in os.listdir(imglabel):
                    imgpath = os.path.join(imglabel, imgname)
                    imgs.append(imgpath)
            trainval_files, val_files = train_test_split(imgs, test_size=0.3, random_state=42)
            if train:
                self.imgs = trainval_files
            else:
                self.imgs = val_files

    def __getitem__(self, index):
        """
        Return the data of one image at a time.
        """
        img_path = self.imgs[index]
        img_path = img_path.replace("\\", '/')
        if self.test:
            label = -1
        else:
            labelname = img_path.split('/')[-2]
            label = Labels[labelname]
        data = Image.open(img_path).convert('RGB')  # some images are 32-bit; force RGB
        data = self.transforms(data)
        return data, label

    def __len__(self):
        return len(self.imgs)
Then call SeedlingData in train_connext.py to read the data; remember to import the dataset.py we just wrote (from dataset.dataset import SeedlingData).
# Read the data
dataset_train = SeedlingData('data/train', transforms=transform, train=True)
dataset_test = SeedlingData("data/train", transforms=transform_test, train=False)
# Load the data
train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE, shuffle=False)
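A quick sanity check on the 7:3 split (the exact counts depend on your copy of the dataset):

print(len(dataset_train), len(dataset_test))  # roughly 70% / 30% of the training images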
Setting up the model
Set the loss function to nn.CrossEntropyLoss().
- Set the model to convnext_tiny and replace the final fully connected layer with a 12-way output (the number of classes in the dataset).
- Use the Adam optimizer.
- Use cosine annealing as the learning-rate schedule.
# Instantiate the model and move it to the GPU
criterion = nn.CrossEntropyLoss()
# criterion = SoftTargetCrossEntropy()
model_ft = convnext_tiny(pretrained=True)
num_ftrs = model_ft.head.in_features
model_ft.head = nn.Linear(num_ftrs, 12)  # ConvNeXt's classifier is `head`, not `fc`
model_ft.to(DEVICE)
# Plain Adam with a low learning rate
optimizer = optim.Adam(model_ft.parameters(), lr=modellr)
cosine_schedule = optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=20, eta_min=1e-9)
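With T_max=20 and eta_min=1e-9, CosineAnnealingLR sweeps the learning rate along a cosine curve from 1e-4 down to roughly 1e-9 over 20 epochs and back up over the next 20, repeating across the 300 epochs (PyTorch's CosineAnnealingLR is periodic when stepped past T_max). A standalone sketch with a dummy parameter that prints the schedule:

import torch
import torch.optim as optim

opt = optim.Adam([torch.zeros(1, requires_grad=True)], lr=1e-4)
sched = optim.lr_scheduler.CosineAnnealingLR(optimizer=opt, T_max=20, eta_min=1e-9)
for epoch in range(40):
    print(epoch, sched.get_last_lr()[0])  # cosine descent to epoch 20, then back up
    sched.step()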
Defining the training and validation functions
alpha = 0.2 is the parameter required by Mixup.
# Training loop
alpha = 0.2

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    sum_loss = 0
    total_num = len(train_loader.dataset)
    print(total_num, len(train_loader))
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device, non_blocking=True), target.to(device, non_blocking=True)
        data, labels_a, labels_b, lam = mixup_data(data, target, alpha)
        optimizer.zero_grad()
        output = model(data)
        loss = mixup_criterion(criterion, output, labels_a, labels_b, lam)
        loss.backward()
        optimizer.step()
        print_loss = loss.data.item()
        sum_loss += print_loss
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                       100. * (batch_idx + 1) / len(train_loader), loss.item()))
    ave_loss = sum_loss / len(train_loader)
    print('epoch:{},loss:{}'.format(epoch, ave_loss))

ACC = 0

# Validation loop
def val(model, device, test_loader):
    global ACC
    model.eval()
    test_loss = 0
    correct = 0
    total_num = len(test_loader.dataset)
    print(total_num, len(test_loader))
    with torch.no_grad():
        for data, target in test_loader:
            data, target = Variable(data).to(device), Variable(target).to(device)
            output = model(data)
            loss = criterion(output, target)
            _, pred = torch.max(output.data, 1)
            correct += torch.sum(pred == target)
            print_loss = loss.data.item()
            test_loss += print_loss
        correct = correct.data.item()
        acc = correct / total_num
        avgloss = test_loss / len(test_loader)
        print('\nVal set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            avgloss, correct, len(test_loader.dataset), 100 * acc))
        if acc > ACC:
            # `epoch` here is the module-level loop variable from the training loop below
            torch.save(model_ft, 'model_' + str(epoch) + '_' + str(round(acc, 3)) + '.pth')
            ACC = acc

# Training
for epoch in range(1, EPOCHS + 1):
    train(model_ft, DEVICE, train_loader, optimizer, epoch)
    cosine_schedule.step()
    val(model_ft, DEVICE, test_loader)
Now training can begin.
Around 10 epochs of training already give decent results.
Testing
Approach 1
The test set is a flat directory: all images sit directly under data/test/.
Step 1: define the classes. The order of this tuple must match the class order used during training; do not change it!
classes = ('Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat',
           'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse',
           'Small-flowered Cranesbill', 'Sugar beet')
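To rule out ordering mistakes, the tuple can also be derived directly from the Labels dict used in training (a small sketch, assuming Labels is imported from the dataset.py above):

from dataset.dataset import Labels

# Sort the class names by their numeric IDs so the index matches the model's outputs.
classes = tuple(sorted(Labels, key=Labels.get))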
Step 2: define the transforms. Use the same transforms as the validation set, without any data augmentation.
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
Step 3: load the model and move it to DEVICE.
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# torch.save(model, ...) pickled the whole model object, so Model/convnext.py must
# still be importable from the same location for torch.load to succeed.
model = torch.load("model_8_0.971.pth")
model.eval()
model.to(DEVICE)
Step 4: read the images and predict their classes. Note that images are read with PIL's Image rather than cv2, because the transforms above do not accept cv2's NumPy arrays.
path = 'data/test/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file).convert('RGB')  # force RGB; some images are 32-bit
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name:{},predict:{}'.format(file, classes[pred.data.item()]))
Complete test code:
import os

import torch
import torch.utils.data.distributed
import torchvision.transforms as transforms
from PIL import Image
from torch.autograd import Variable

classes = ('Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat',
           'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse',
           'Small-flowered Cranesbill', 'Sugar beet')
transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.load("model_8_0.971.pth")
model.eval()
model.to(DEVICE)

path = 'data/test/'
testList = os.listdir(path)
for file in testList:
    img = Image.open(path + file).convert('RGB')  # force RGB; some images are 32-bit
    img = transform_test(img)
    img.unsqueeze_(0)
    img = Variable(img).to(DEVICE)
    out = model(img)
    # Predict
    _, pred = torch.max(out.data, 1)
    print('Image Name:{},predict:{}'.format(file, classes[pred.data.item()]))
Running results:
Approach 2
The second approach reads the images through the custom Dataset class. The first three steps are the same as above; the difference is in step 4, where SeedlingData is used to load the data.
dataset_test = SeedlingData('data/test/', transform_test, test=True)
print(len(dataset_test))
# Predict the label for each file in the folder
for index in range(len(dataset_test)):
    item = dataset_test[index]
    img, label = item
    img.unsqueeze_(0)
    data = Variable(img).to(DEVICE)
    output = model(data)
    _, pred = torch.max(output.data, 1)
    print('Image Name:{},predict:{}'.format(dataset_test.imgs[index], classes[pred.data.item()]))
Running results:
This concludes ConvNeXt in practice: plant seedling classification. For more material on ConvNeXt and plant seedling classification, follow other related articles on WalkonNet!