parse: a Python text-parsing library that needs no regular expressions

Posted on 2021-10-25 by WalkonNet

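1. A real-world example

Let me illustrate with a real case where I recently used parse.

Below is a flow entry from ovs (Open vSwitch). I need to find out how many bytes and how many packets from a virtual machine (that is, from its network port) have passed through this flow entry. In other words, I need the n_bytes and n_packets values for each in_port.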

cookie=0x9816da8e872d717d, duration=298506.364s, table=0, n_packets=480, n_bytes=20160, priority=10,ip,in_port="tapbbdf080b-c2" actions=NORMAL

What would you do?

Split on commas first, then split each piece on the equals sign to pull out the value?

Feel free to try it. I expect the code you end up with will look just as I imagine: not the slightest bit elegant.
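A hand-rolled version of that comma-then-equals approach might look like the minimal sketch below, assuming the flow entry keeps the layout shown above. The helper name naive_flow_fields is hypothetical. The special cases it has to handle (a bare ip flag with no equals sign, actions separated by a space instead of a comma, quoted values) are exactly what makes this style fragile:

```python
# Hand-rolled parsing: split on commas, then on '='.
flow = ('cookie=0x9816da8e872d717d, duration=298506.364s, table=0, '
        'n_packets=480, n_bytes=20160, priority=10,ip,'
        'in_port="tapbbdf080b-c2" actions=NORMAL')


def naive_flow_fields(line):
    """Hypothetical helper: turn one ovs flow entry into a dict of strings."""
    fields = {}
    # actions= follows a space, not a comma, so cut it off first
    match_part, _, actions = line.partition(' actions=')
    for item in match_part.split(','):
        item = item.strip()
        if '=' not in item:          # bare flags such as "ip" carry no value
            continue
        key, value = item.split('=', 1)
        fields[key] = value.strip('"')   # drop the quotes around in_port
    fields['actions'] = actions
    return fields


fields = naive_flow_fields(flow)
# Every value is still a string, so numbers need an explicit int()
print(fields['in_port'], int(fields['n_packets']), int(fields['n_bytes']))
```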

Let me show you how I did it.

I used a third-party package called parse, which has to be installed separately:

$ python -m pip install parse

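From this example you should get a sense of how powerful parse is for parsing well-structured strings.

2. What parse returns

parse returns only one of two kinds of result:

1. If there is no match, parse returns None

>>> from parse import parse
>>> parse("halo", "hello") is None
True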

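2. If there is a match, parse returns a Result instance. The template has to match the entire string, which is why "hello" does not match "hello world" below: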

>>> parse("hello", "hello world")
>>> parse("hello", "hello")
<Result () {}>
>>>

If your template does not name its fields (anonymous fields), the Result behaves like a list and is indexed by position:

>>> profile = parse("I am {}, {} years old, {}", "I am Jack, 27 years old, male")
>>> profile
<Result ('Jack', '27', 'male') {}>
>>> profile[0]
'Jack'
>>> profile[1]
'27'
>>> profile[2]
'male'

If your template does name its fields, the Result behaves like a dictionary and is indexed by field name:

>>> profile = parse("I am {name}, {age} years old, {gender}", "I am Jack, 27 years old, male")
>>> profile
<Result () {'gender': 'male', 'age': '27', 'name': 'Jack'}>
>>> profile['name']
'Jack'
>>> profile['age']
'27'
>>> profile['gender']
'male'

3. Reusing a pattern

As with re, parse lets you compile a pattern once and reuse it.

>>> from parse import compile
>>> 
>>> pattern = compile("I am {}, {} years old, {}")
>>> pattern.parse("I am Jack, 27 years old, male")
<Result ('Jack', '27', 'male') {}>
>>> 
>>> pattern.parse("I am Tom, 26 years old, male")
<Result ('Tom', '26', 'male') {}>

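4. Type conversion

You may have noticed in the examples above that the age parse extracts is "27", a string. Is there a way to convert values to the type we want at extraction time?

You can write it like this: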

>>> from parse import parse
>>> profile = parse("I am {name}, {age:d} years old, {gender}", "I am Jack, 27 years old, male")
>>> profile
<Result () {'gender': 'male', 'age': 27, 'name': 'Jack'}>
>>> type(profile["age"])
<class 'int'>

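Besides integers, are there other formats?

There are plenty of built-in formats, for example:

Matching a date and time (day/month order):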

>>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
<Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>

For more types, see the official documentation:

Type	Characters Matched	Output
l	Letters (ASCII)	str
w	Letters, numbers and underscore	str
W	Not letters, numbers and underscore	str
s	Whitespace	str
S	Non-whitespace	str
d	Digits (effectively integer numbers)	int
D	Non-digit	str
n	Numbers with thousands separators (, or .)	int
%	Percentage (converted to value/100.0)	float
f	Fixed-point numbers	float
F	Decimal numbers	Decimal
e	Floating-point numbers with exponent e.g. 1.1e-10, NAN (all case insensitive)	float
g	General number format (either d, f or e)	float
b	Binary numbers	int
o	Octal numbers	int
x	Hexadecimal numbers (lower and upper case)	int
ti	ISO 8601 format date/time e.g. 1972-01-20T10:21:36Z (“T” and “Z” optional)	datetime
te	RFC2822 e-mail format date/time e.g. Mon, 20 Jan 1972 10:21:36 +1000	datetime
tg	Global (day/month) format date/time e.g. 20/1/1972 10:21:36 AM +1:00	datetime
ta	US (month/day) format date/time e.g. 1/20/1972 10:21:36 PM +10:30	datetime
tc	ctime() format date/time e.g. Sun Sep 16 01:03:52 1973	datetime
th	HTTP log format date/time e.g. 21/Nov/2011:00:07:11 +0000	datetime
ts	Linux system log format date/time e.g. Nov 9 03:37:44	datetime
tt	Time e.g. 10:21:36 PM -5:30	time

5. Stripping whitespace during extraction

Strip whitespace on both sides with ^ (centre alignment):

>>> parse('hello {} , hello python', 'hello     world    , hello python')
<Result ('    world   ',) {}>
>>> 
>>> 
>>> parse('hello {:^} , hello python', 'hello     world    , hello python')
<Result ('world',) {}>

Strip leading whitespace with > (right alignment):

>>> parse('hello {:>} , hello python', 'hello     world    , hello python')
<Result ('world   ',) {}>

Strip trailing whitespace with < (left alignment):

>>> parse('hello {:<} , hello python', 'hello     world    , hello python')
<Result ('    world',) {}>

6. Case-sensitivity switch

parse is case-insensitive by default: writing hello or HELLO makes no difference.

If you need case to matter, pass case_sensitive=True:

>>> parse('SPAM', 'spam')
<Result () {}>
>>> parse('SPAM', 'spam') is None
False
>>> parse('SPAM', 'spam', case_sensitive=True) is None
True

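7. Matching a number of characters

Strict matching: .N (precision) sets the maximum number of characters

>>> parse('{:.2}{:.2}', 'hello')  # character count does not fit
>>> 
>>> parse('{:.2}{:.2}', 'hell')   # character count fits
<Result ('he', 'll') {}>

Loose matching: N (width) sets the minimum number of characters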

>>> parse('{:.2}{:2}', 'hello') 
<Result ('h', 'ello') {}>
>>> 
>>> parse('{:2}{:2}', 'hello') 
<Result ('he', 'llo') {}>

To combine strict or loose matching with a type conversion, append the type after the number:

>>> parse('{:2}{:2}', '1024') 
<Result ('10', '24') {}>
>>> 
>>> 
>>> parse('{:2d}{:2d}', '1024') 
<Result (10, 24) {}>

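8. Three important attributes

A Result object has three very important attributes:

fixed: a tuple of the anonymous fields, extracted by position

named: a dictionary of the named fields

spans: the (start, end) position of each matched field in the string

The code below shows how they differ: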

>>> profile = parse("I am {name}, {age:d} years old, {}", "I am Jack, 27 years old, male")
>>> profile.fixed
('male',)
>>> profile.named
{'age': 27, 'name': 'Jack'}
>>> profile.spans
{0: (25, 29), 'age': (11, 13), 'name': (5, 9)}
>>>

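9. Custom type conversions

The matched string is passed as the argument to the corresponding conversion function.

Take the string-to-integer conversion we covered earlier: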

>>> parse("I am {:d}", "I am 27")
<Result (27,) {}>
>>> type(_[0])
<class 'int'>
>>>

This is roughly equivalent to:

>>> def myint(string):
...     return int(string)
... 
>>> 
>>> 
>>> parse("I am {:myint}", "I am 27", dict(myint=myint))
<Result (27,) {}>
>>> type(_[0])
<class 'int'>
>>>

With this you can build all kinds of custom behaviour; for example, turning the matched string into upper case:

>>> def shouty(string):
...    return string.upper()
...
>>> parse('{:shouty} world', 'hello world', dict(shouty=shouty))
<Result ('HELLO',) {}>
>>>

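10. Summary

The convenience parse brings to string parsing is plain to see, and it is easy to pick up.

In simple scenarios, parse is far more productive than hand-writing regular expressions with re. The resulting code is tidy and readable, and maintaining it later is painless. I recommend giving it a try.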

That is all for parse, the Python text-parsing library that needs no regular expressions. For more about parse, see WalkonNet's other related articles.
