Extracting Data from Figures in Papers with the Python Module plotdigitizer: A Worked Example

Technical Background

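Researchers in every field often run into the same problem: a good paper contains valuable data, but the data appears only as a figure. If we could extract the data from that figure, we could present it again in our own style and even plot our own improved data alongside it. Extracting data from figures in papers is therefore a very practical need. As an example, we use an earlier blog post of ours on quantum annealing, which contains the following figure: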

In this article, we show how to use Python to extract the data from such a figure.

Installing plotdigitizer

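We install the third-party Python library plotdigitizer with pip. Its main purpose is to extract data from images automatically. To speed up the installation, we can use Tencent's PyPI mirror: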

[dechin@dechin-manjaro plotdigitizer]$ python3 -m pip install -i https://mirrors.cloud.tencent.com/pypi/simple plotdigitizer
Looking in indexes: https://mirrors.cloud.tencent.com/pypi/simple
Collecting plotdigitizer
 Downloading https://mirrors.cloud.tencent.com/pypi/packages/89/bb/ff753093458c05ce3b52fd17527b6b0622ca096aadcf561c6316320ab793/plotdigitizer-0.1.3-py3-none-any.whl (20 kB)
Collecting loguru<0.6.0,>=0.5.3
 Downloading https://mirrors.cloud.tencent.com/pypi/packages/6d/48/0a7d5847e3de329f1d0134baf707b689700b53bd3066a5a8cfd94b3c9fc8/loguru-0.5.3-py3-none-any.whl (57 kB)
   |████████████████████████████████| 57 kB 521 kB/s 
Collecting opencv-python<5.0.0,>=4.5.1
 Downloading https://mirrors.cloud.tencent.com/pypi/packages/2a/9a/ff309b530ac1b029bfdb9af3a95eaff0f5f45f6a2dbe37b3454ae8412f4c/opencv_python-4.5.1.48-cp38-cp38-manylinux2014_x86_64.whl (50.4 MB)
   |████████████████████████████████| 50.4 MB 467 kB/s 
Collecting numpy<2.0.0,>=1.19.5
 Downloading https://mirrors.cloud.tencent.com/pypi/packages/c7/e6/dccac76b7e825915ffb906beeba5a953597b6cfe1fe686b5276e122cb07c/numpy-1.20.1-cp38-cp38-manylinux2010_x86_64.whl (15.4 MB)
   |████████████████████████████████| 15.4 MB 20.4 MB/s 
Collecting matplotlib<4.0.0,>=3.3.4
 Downloading https://mirrors.cloud.tencent.com/pypi/packages/ab/20/60cfe5d611ac86df07b7b1f9b9582f22f7eda5edbe2124ba85bdf3133822/matplotlib-3.3.4-cp38-cp38-manylinux1_x86_64.whl (11.6 MB)
   |████████████████████████████████| 11.6 MB 4.4 MB/s 
Requirement already satisfied: python-dateutil>=2.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (0.10.0)
Requirement already satisfied: pillow>=6.2.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (8.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (1.3.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /home/dechin/anaconda3/lib/python3.8/site-packages (from matplotlib<4.0.0,>=3.3.4->plotdigitizer) (2.4.7)
Requirement already satisfied: six>=1.5 in /home/dechin/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.1->matplotlib<4.0.0,>=3.3.4->plotdigitizer) (1.15.0)
Installing collected packages: loguru, numpy, opencv-python, matplotlib, plotdigitizer
 Attempting uninstall: numpy
  Found existing installation: numpy 1.19.2
  Uninstalling numpy-1.19.2:
   Successfully uninstalled numpy-1.19.2
 Attempting uninstall: matplotlib
  Found existing installation: matplotlib 3.3.2
  Uninstalling matplotlib-3.3.2:
   Successfully uninstalled matplotlib-3.3.2
Successfully installed loguru-0.5.3 matplotlib-3.3.4 numpy-1.20.1 opencv-python-4.5.1.48 plotdigitizer-0.1.3
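
The pip log above shows plotdigitizer 0.1.3 pulling in loguru, opencv-python, numpy and matplotlib, and upgrading the existing numpy and matplotlib along the way. As a small sketch that is not part of plotdigitizer itself, the installed versions can also be checked from Python with the standard-library importlib.metadata module; the function name installed_versions is our own.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip log above.
PACKAGES = ["plotdigitizer", "loguru", "opencv-python", "numpy", "matplotlib"]


def installed_versions(names=PACKAGES):
    """Return {distribution name: installed version, or None if it is missing}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:15s} {ver or 'not installed'}")
```

A None entry means pip did not install that distribution into the interpreter you are running.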

We can check whether the installation succeeded by running the help command:

[dechin@dechin-manjaro plotdigitizer]$ plotdigitizer -h
usage: plotdigitizer [-h] --data-point DATA_POINT [--location LOCATION] [--plot PLOT] [--output OUTPUT]
           [--preprocess] [--debug]
           INPUT

Digitize image.

positional arguments:
 INPUT         Input image file.

optional arguments:
 -h, --help      show this help message and exit
 --data-point DATA_POINT, -p DATA_POINT
            Datapoints (min 3 required). You have to click on them later. At least 3 points
            are recommended. e.g -p 0,0 -p 10,0 -p 0,1 Make sure that point are comma
            separated without any space.
 --location LOCATION, -l LOCATION
            Location of a points on figure in pixels (integer). These values should appear in
            the same order as -p option. If not given, you will be asked to click on the
            figure.
 --plot PLOT      Plot the final result. Requires matplotlib.
 --output OUTPUT, -o OUTPUT
            Name of the output file else trajectory will be written to <INPUT>.traj.csv
 --preprocess     Preprocess the image. Useful with bad resolution images.
 --debug        Enable debug logger
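
The help text requires at least three -p data points, given either with matching -l pixel locations or by clicking on the figure. One way to see why three are enough: three non-collinear reference points fix a 2D affine map from pixel coordinates to data coordinates. The NumPy sketch below illustrates only that idea; it is not taken from plotdigitizer's source and may differ from how the library actually calibrates the axes. The helper names fit_affine and pixel_to_data, and the pixel locations in the example, are hypothetical.

```python
import numpy as np


def fit_affine(pixels, values):
    """Fit data = A @ pixel + b from >= 3 (pixel, data) point pairs.

    pixels and values are sequences of (x, y) pairs in the same order,
    mirroring how -l locations correspond to -p points.
    Returns (A, b) with A of shape (2, 2) and b of shape (2,).
    """
    P = np.asarray(pixels, dtype=float)
    V = np.asarray(values, dtype=float)
    if P.shape != V.shape or P.ndim != 2 or P.shape[1] != 2 or len(P) < 3:
        raise ValueError("need at least 3 matching (x, y) pairs")
    # Each row [px, py, 1] times the unknown 3x2 matrix [A^T; b] gives a data point.
    M = np.hstack([P, np.ones((len(P), 1))])
    if np.linalg.matrix_rank(M) < 3:
        raise ValueError("reference points must not be collinear")
    coeffs, *_ = np.linalg.lstsq(M, V, rcond=None)
    return coeffs[:2].T, coeffs[2]


def pixel_to_data(pixels, A, b):
    """Map an (N, 2) array of pixel coordinates to data coordinates."""
    return np.asarray(pixels, dtype=float) @ A.T + b


if __name__ == "__main__":
    # Data values from the -p options used later in this article;
    # the pixel locations are made up for illustration.
    values = [(0, -1), (20, 0), (0, 0.1)]
    pixels = [(50, 400), (450, 100), (50, 70)]
    A, b = fit_affine(pixels, values)
    print(pixel_to_data([(250, 250)], A, b))  # approximately [[10, -0.5]]
```

With more than three reference points, lstsq returns the least-squares fit, which can average out small clicking errors.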

Running the Command and the Output Figure

First place the image you want to extract data from in the current directory, then run the following command:

plotdigitizer ./test1.png -p 0,-1 -p 20,0 -p 0,0.1 --plot output.png
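
If you prefer to drive the tool from a Python script, the command line can be assembled from the options listed in the -h output and run with a timeout. This is a minimal sketch with hypothetical helper names (build_args, run_with_timeout). It passes points in the --data-point=x,y form so that argparse cannot mistake a value with a leading minus sign for another option, and it relies on subprocess.run killing the child process when the timeout expires, which automates the manual kill -9 the author needed on Manjaro Linux.

```python
import subprocess
import sys


def build_args(image, points, locations=None, plot=None, output=None):
    """Assemble a plotdigitizer command line (option names from its -h output).

    points and locations are sequences of (x, y) pairs; each pair is written
    as "x,y" without spaces, as the help text requires.
    """
    if len(points) < 3:
        raise ValueError("plotdigitizer needs at least 3 data points")
    if locations is not None and len(locations) != len(points):
        raise ValueError("each -l location must match a -p point, in the same order")
    args = ["plotdigitizer", str(image)]
    # The "--opt=value" form stops argparse from reading a value such as
    # "-1,0" (leading minus) as another option.
    args += [f"--data-point={x},{y}" for x, y in points]
    if locations is not None:
        # The help text says pixel locations are integers.
        args += [f"--location={int(x)},{int(y)}" for x, y in locations]
    if plot:
        args += ["--plot", str(plot)]
    if output:
        args += ["--output", str(output)]
    return args


def run_with_timeout(cmd, timeout=120.0):
    """Run cmd and return its exit code, or None if it was killed on timeout.

    On timeout, subprocess.run kills the child (SIGKILL on POSIX, the same
    signal as kill -9), waits for it, then raises TimeoutExpired.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        print(f"killed after {timeout} s: {cmd[0]}", file=sys.stderr)
        return None
```

For the command in this article, build_args("test1.png", [(0, -1), (20, 0), (0, 0.1)], plot="output.png") produces the same options in --data-point= form, and the resulting list can be passed to run_with_timeout(cmd, timeout=300). If plotdigitizer is not on PATH, subprocess.run raises FileNotFoundError; only the direct child process is killed on timeout.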

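The command above extracts the data from test1.png; the -o option can be used to save it as a CSV table. In practice we found that, even without the --plot option, the program kept displaying figures on Manjaro Linux, and the process could only be stopped forcibly with kill -9; this may be a bug in the library. Here is the figure redrawn from the extracted data: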

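When the run finishes, the figure is written to the temporary folder tmp/plotdigitizer/. Note that figures produced by earlier runs are overwritten by later temporary files.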
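
Because later runs overwrite the images in tmp/plotdigitizer/, one workaround is to copy them elsewhere after each run. The sketch below is our own; it assumes that this folder lies under the system temporary directory (for example /tmp on Linux) and that the outputs are PNG files, so adjust src_dir and pattern if your setup differs. The function name archive_outputs is hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path


def archive_outputs(src_dir, dest_root, pattern="*.png"):
    """Copy files matching pattern from src_dir into a fresh timestamped folder.

    Returns the list of copied paths ([] if src_dir is missing or has no matches).
    """
    src = Path(src_dir)
    if not src.is_dir():
        return []
    files = sorted(p for p in src.glob(pattern) if p.is_file())
    if not files:
        return []
    base = Path(dest_root) / time.strftime("run-%Y%m%d-%H%M%S")
    dest, n = base, 1
    while dest.exists():  # two archives created within the same second
        dest = base.with_name(f"{base.name}-{n}")
        n += 1
    dest.mkdir(parents=True)
    copied = []
    for f in files:
        # copy2 also preserves the original modification time.
        copied.append(Path(shutil.copy2(f, dest / f.name)))
    return copied


if __name__ == "__main__":
    # Assumption: the article's tmp/plotdigitizer/ is <system temp dir>/plotdigitizer.
    src = Path(tempfile.gettempdir()) / "plotdigitizer"
    for path in archive_outputs(src, "plotdigitizer_runs"):
        print("saved", path)
```

Calling archive_outputs right after each plotdigitizer run keeps every set of figures in its own folder.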

Summary

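This article has only introduced and demonstrated the basic usage of plotdigitizer. Because it is an image-data tool written in Python, it fits the habits and way of thinking of Python users better than tools built on other stacks. Although various problems may come up in practice, such as the hanging process described above, overall it is a fairly good tool and worth recommending.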

Copyright Notice

This article was first published at: https://www.cnblogs.com/dechinphy/p/plotdigitizer.html
Author ID: DechinPhy
More original articles: https://www.cnblogs.com/dechinphy/
