Object Detection with Deep Learning and OpenCV

Posted on 2021-12-27 by WalkonNet

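Object detection with deep learning and OpenCV

When it comes to deep learning-based object detection, there are three primary methods you are likely to encounter: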

Faster R-CNNs (Ren et al., 2015)

You Only Look Once (YOLO) (Redmon et al., 2015)

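Single Shot Detectors (SSDs) (Liu et al., 2015)

Faster R-CNNs are probably the most "heard of" method for object detection with deep learning. However, the technique can be hard to understand (especially for beginners in deep learning), hard to implement, and hard to train.

Furthermore, even with the "faster" R-CNN implementation (where the "R" stands for "Region", as in region proposals), the algorithm can be quite slow, on the order of 7 FPS.

If pure speed is the goal, we tend to use YOLO, since it is much faster and can process 40-90 FPS on a Titan X GPU. The super-fast variant of YOLO can even reach 155 FPS.

The problem with YOLO is that its accuracy leaves much to be desired.

SSDs, originally developed by Google, are a balance between the two. The algorithm is also more straightforward than Faster R-CNNs.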

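MobileNets: efficient (deep) neural networks

When building object detection networks we normally use an existing network architecture, such as VGG or ResNet. These architectures can be very large, on the order of 200-500MB. Because of their size and the resulting number of computations, they are unsuitable for resource-constrained devices. Instead, we can use MobileNets (Howard et al., 2017), another paper by Google researchers. These networks are called "MobileNets" because they are designed for resource-constrained devices such as your smartphone. MobileNets differ from traditional CNNs in their use of depthwise separable convolution. The general idea behind depthwise separable convolution is to split convolution into two stages:

A 3×3 depthwise convolution.
Followed by a 1×1 pointwise convolution.

This allows us to actually reduce the number of parameters in the network. The trade-off is accuracy: MobileNets are normally not as accurate as their larger big brothers, but they are much more resource efficient.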
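To make the parameter saving concrete, here is a rough sketch that counts the weights of a standard 3×3 convolution and of its depthwise separable counterpart. The channel counts (256 in, 256 out) are illustrative assumptions rather than figures from the MobileNets paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    # every output channel has its own k x k filter over all input channels
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    # stage 1: one k x k depthwise filter per input channel
    depthwise = k * k * c_in
    # stage 2: 1 x 1 pointwise convolution that mixes the channels
    pointwise = c_in * c_out
    return depthwise + pointwise


# assumed layer shape, chosen only for illustration
standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
ratio = standard / separable
print(standard, separable, round(ratio, 1))
```

For this assumed layer the separable version needs roughly 8.7 times fewer weights, which is where the smaller model size comes from.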

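Deep learning-based object detection with OpenCV

MobileNet SSD was first trained on the COCO dataset (Common Objects in Context) and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).

We can therefore detect 20 object classes in images (+1 for the background class): airplanes, bicycles, birds, boats, bottles, buses, cars, cats, chairs, cows, dining tables, dogs, horses, motorbikes, people, potted plants, sheep, sofas, trains, and TV monitors.

In this section we will use MobileNet SSD together with the deep neural network (dnn) module in OpenCV to build our object detector.

Open a new file, name it object_detection.py, and insert the following code: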

import numpy as np
import cv2

if __name__ == "__main__":
	image_name = '11.jpg'
	prototxt = 'MobileNetSSD_deploy.prototxt.txt'
	model_path = 'MobileNetSSD_deploy.caffemodel'
	confidence_ta = 0.2
	# initialize the list of class labels MobileNet SSD was trained to
	# detect, then generate a random bounding box color for each class
	CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
		"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
		"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
		"sofa", "train", "tvmonitor"]
	COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

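First, import the required packages.

Then define the global parameters:

image_name: path to the input image.
prototxt: path to the Caffe prototxt file.
model_path: path to the pre-trained model.
confidence_ta: minimum probability threshold for filtering out weak detections. The default is 20%.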
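The script above hard-codes these four values. If you would rather pass them on the command line, one possible sketch uses the standard argparse module. The option names (--image, --prototxt, --model, --confidence) and the build_parser helper are assumptions made for this example; the defaults simply mirror the values above.

```python
import argparse


def build_parser():
    # hypothetical command line interface; option names are illustrative
    parser = argparse.ArgumentParser(description="MobileNet SSD object detection")
    parser.add_argument("--image", default="11.jpg",
                        help="path to the input image")
    parser.add_argument("--prototxt", default="MobileNetSSD_deploy.prototxt.txt",
                        help="path to the Caffe prototxt file")
    parser.add_argument("--model", default="MobileNetSSD_deploy.caffemodel",
                        help="path to the pre-trained model")
    parser.add_argument("--confidence", type=float, default=0.2,
                        help="minimum probability for keeping a detection")
    return parser


# parse an explicit argument list so the example does not depend on sys.argv
args = build_parser().parse_args(["--image", "11.jpg", "--confidence", "0.5"])
print(args.image, args.confidence)
```

In a real script you would call parse_args() without arguments so that the values come from the actual command line.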

The script then initializes the class labels and the bounding box colors.

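	# load our serialized model from disk
	print("[INFO] loading model...")
	net = cv2.dnn.readNetFromCaffe(prototxt, model_path)
	# load the input image and construct an input blob for it by
	# resizing it to a fixed 300x300 pixels and normalizing it
	# (note: the SSD model expects 300x300 pixel input)
	image = cv2.imread(image_name)
	if image is None:
		# cv2.imread returns None when the file cannot be read
		raise FileNotFoundError("could not read image: " + image_name)
	(h, w) = image.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843,
		(300, 300), 127.5)
	# pass the blob through the network and obtain the detections
	# and predictions
	print("[INFO] computing object detections...")
	net.setInput(blob)
	detections = net.forward()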

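Load the model from disk.

Read the image.

Extract the height and width of the image, then compute a 300 x 300 pixel blob from it.

Pass the blob to the neural network as its input.

Compute the forward pass for the input and store the result as detections.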
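The two numeric arguments passed to blobFromImage, 0.007843 and 127.5, control normalization: the mean 127.5 is subtracted from every pixel and the result is multiplied by the scale factor 0.007843 (about 1/127.5), which maps 8-bit values into roughly [-1, 1]. The NumPy-only sketch below reproduces just that arithmetic and the channel-first layout. It assumes the image is already 300x300, uses a synthetic array in place of a real image, and the to_blob helper is hypothetical, not an OpenCV function.

```python
import numpy as np


def to_blob(image_bgr, scale=0.007843, mean=127.5):
    # subtract the mean, then scale, as blobFromImage does with these arguments
    pixels = image_bgr.astype(np.float32)
    normalized = (pixels - mean) * scale
    # reorder height x width x channels to channels x height x width
    # and add a leading batch dimension
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]


# synthetic stand-in for an image that has already been resized to 300x300
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)
blob = to_blob(image)
print(blob.shape, float(blob.min()), float(blob.max()))
```

The sketch leaves out the resizing, cropping, and channel swapping that the real function can also perform.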

	# loop over the detections
	for i in np.arange(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with
		# the prediction
		confidence = detections[0, 0, i, 2]
		# filter out weak detections by ensuring the confidence is
		# greater than the minimum confidence
		if confidence > confidence_ta:
			# extract the index of the class label from `detections`,
			# then compute the (x, y)-coordinates of the bounding box
			idx = int(detections[0, 0, i, 1])
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")
			# display the prediction
			label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
			print("[INFO] {}".format(label))
			cv2.rectangle(image, (startX, startY), (endX, endY),
				COLORS[idx], 2)
			y = startY - 15 if startY - 15 > 15 else startY + 15
			cv2.putText(image, label, (startX, y),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
	# show the output image and save a copy to disk
	cv2.imshow("Output", image)
	cv2.imwrite("output.jpg", image)
	cv2.waitKey(0)

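We loop over the detections and first extract the confidence value of each one.

If the confidence is above our minimum threshold, we extract the class label index and compute the bounding box around the detected object.

We then extract the (x, y)-coordinates of the box, which we will use shortly to draw the rectangle and display the text.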
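The indexing in the loop reflects the layout of each detection row: index 1 holds the class index, index 2 the confidence, and indices 3 to 6 the box corners as fractions of the image width and height. The sketch below applies the same scaling to a single made-up row. Clipping the result to the image bounds is an addition that the original script does not perform, and the scale_box helper is hypothetical.

```python
import numpy as np


def scale_box(detection_row, width, height):
    # corners are stored as fractions, so multiply by the image size
    box = detection_row[3:7] * np.array([width, height, width, height])
    # added safeguard (not in the original script): keep the box inside the image
    box = np.clip(box, 0, [width - 1, height - 1, width - 1, height - 1])
    return tuple(int(v) for v in box)


# made-up detection row: [image id, class index, confidence, x1, y1, x2, y2]
row = np.array([0, 15, 0.97, 0.25, 0.125, 0.75, 1.05])
box = scale_box(row, width=640, height=480)
print(box)
```

Here the bottom edge (1.05 of the height) falls outside the image, so it is clipped to the last valid row.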

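Next, we build a text label containing the CLASSES name and the confidence.

Using the label, we print it to the terminal and then draw a colored rectangle around the object with the (x, y)-coordinates extracted earlier.

In general we want the label displayed above the rectangle, but if there is no room, we display it just below the top of the rectangle.

Finally, we overlay the colored text onto the image using the y value we just computed.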
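The label placement rule is easy to check in isolation. The small helper below (label_y is a name chosen for this sketch, not part of the original script) wraps the same conditional expression used in the code, with the 15 pixel offset as a default argument.

```python
def label_y(start_y, offset=15):
    # place the label above the box when there is room, otherwise just inside it
    return start_y - offset if start_y - offset > offset else start_y + offset


# a box far from the top edge gets its label above it,
# a box near the top edge gets its label below the top line
print(label_y(100), label_y(20))
```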

Result: the annotated image is shown in a window and saved as output.jpg.

Detecting objects in video with OpenCV

Open a new file, name it video_object_detection.py, and insert the following code:

import time

import cv2
import imutils
import numpy as np

video_name = '12.mkv'
prototxt = 'MobileNetSSD_deploy.prototxt.txt'
model_path = 'MobileNetSSD_deploy.caffemodel'
confidence_ta = 0.2

# initialize the list of class labels MobileNet SSD was trained to
# detect, then generate a set of bounding box colors for each class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(prototxt, model_path)

# open the input video and set up the writer for the annotated output
print('[INFO] starting video stream...')
vs = cv2.VideoCapture(video_name)
fps = 30    # FPS of the saved video; adjust as needed
size = (600, 325)    # must match the size of the frames written to the file
fourcc = cv2.VideoWriter_fourcc(*'XVID')
videowrite = cv2.VideoWriter('output.avi', fourcc, fps, size)
time.sleep(2.0)

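Define the global parameters:

video_name: path to the input video.
prototxt: path to the Caffe prototxt file.
model_path: path to the pre-trained model.
confidence_ta: minimum probability threshold for filtering out weak detections. The default is 20%.

The script then initializes the class labels and the bounding box colors.

Load the model.

Initialize the VideoCapture object.

Set up the VideoWriter object and its parameters. The size given to VideoWriter must match the size of the frames written to it; otherwise the video cannot be saved.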
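One way to keep the two sizes in agreement is to derive the writer size from the source resolution instead of hard-coding it. The helper below is a hypothetical sketch (scaled_size is not part of OpenCV or imutils). It assumes an aspect-ratio-preserving resize to a target width with the height truncated to an integer, and the 1920x1040 source resolution is made up for illustration.

```python
def scaled_size(src_width, src_height, target_width):
    # keep the aspect ratio: scale the height by the same factor as the width
    ratio = target_width / float(src_width)
    return (target_width, int(src_height * ratio))


# assumed source resolution, chosen only for illustration
size = scaled_size(1920, 1040, 600)
print(size)
```

With a real capture, the source dimensions could be read from the VideoCapture object through cv2.CAP_PROP_FRAME_WIDTH and cv2.CAP_PROP_FRAME_HEIGHT before the writer is created.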

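Next, we loop over the frames of the video and feed each one to the detector. The logic of this part is the same as for image detection. The code is as follows: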

# loop over the frames from the video stream
while True:
    ret_val, frame = vs.read()
    # stop when the video has no more frames
    if not ret_val:
        break
    frame = imutils.resize(frame, width=1080)
    print(frame.shape)
    # grab the frame dimensions and convert the frame to a blob
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)

    # pass the blob through the network and obtain the detections and predictions
    net.setInput(blob)
    detections = net.forward()

    # loop over the detections
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > confidence_ta:
            # extract the index of the class label from the
            # `detections`, then compute the (x, y)-coordinates of
            # the bounding box for the object
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # draw the prediction on the frame
            label = "{}: {:.2f}%".format(CLASSES[idx],
                                         confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                          COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
    # show the output frame
    cv2.imshow("Frame", frame)
    # resize to the writer's frame size so the frame is actually saved
    videowrite.write(cv2.resize(frame, size))
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
videowrite.release()
# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()

Result:

https://www.bilibili.com/video/BV19i4y197kh?spm_id_from=333.999.0.0

