Python File Data Analysis: Cleaning and Extracting Data

Posted on 2022-08-24 by WalkonNet

Background

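Python 2 could not read Chinese (non-ASCII) file paths directly, so a separate helper function was needed. When I tried Python 3 back in 2018, it could not read them directly either.

Trying it again now, I found that Python 3 reads Chinese paths directly.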

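You need a few .txt files, either existing ones or ones you create. Ideally each should hold a few records (name, mobile number, address).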
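If you do not have any data files yet, a small script can create them. This is a sketch under stated assumptions. The file names class1.txt and class2.txt, the function name make_sample_dir, and every record are made up for illustration. Only the folder name 課堂實訓 matches the folder the article's code reads. Because the folder name is Chinese, running the script also confirms that Python 3 handles non-ASCII paths.

```python
import glob
import os
import tempfile

# Hypothetical sample records (name, mobile number, address); all values are made up.
SAMPLE_FILES = {
    "class1.txt": [
        "張三 13800000001 北京市",
        "李四 13800000002 上海市",
    ],
    "class2.txt": [
        "王五 13800000003 廣州市",
        "張三 13800000001 北京市",  # repeated number, useful for testing deduplication
    ],
}


def make_sample_dir(base):
    """Create base/課堂實訓/*.txt with the sample records and return the folder path."""
    folder = os.path.join(base, "課堂實訓")  # non-ASCII directory name
    os.makedirs(folder, exist_ok=True)
    for filename, lines in SAMPLE_FILES.items():
        path = os.path.join(folder, filename)
        # Write UTF-8 explicitly so reading the files back does not depend on the OS locale
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
    return folder


if __name__ == "__main__":
    # Demo in a throwaway directory; glob and open accept the Chinese path as a plain str
    with tempfile.TemporaryDirectory() as base:
        folder = make_sample_dir(base)
        print(sorted(glob.glob(os.path.join(folder, "*.txt"))))
```

To create the folder next to your script instead, call `make_sample_dir(".")`.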

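Requirements

Before writing the code, it helps to set yourself a few requirements so the goal is clear:

Read all matching files under the given directory path
Read each .txt file's records line by line
Use a regular expression to extract the mobile number from each line
Save the mobile numbers to an Excel file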
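The third requirement hides two pitfalls:

- `re.search()` returns `None` for a line with no match, so indexing the result directly fails on such lines.
- The pattern `[0-9]{11}` also matches the first 11 digits of a longer run of digits, such as an 18-digit ID number.

Here is a minimal sketch that handles both. It assumes a phone number is exactly 11 consecutive digits, enforced with the lookarounds `(?<![0-9])` and `(?![0-9])`. The name `extract_phone` and that stricter pattern are illustrative assumptions, not part of the original code.

```python
import re

# Exactly 11 consecutive digits: the lookarounds reject longer digit runs.
# (Assumption: this is what counts as a phone number in these files.)
PHONE_RE = re.compile(r"(?<![0-9])[0-9]{11}(?![0-9])")


def extract_phone(line):
    """Return the first standalone 11-digit number in line, or None if there is none."""
    match = PHONE_RE.search(line)
    return match[0] if match else None  # match[0] is the whole matched text


lines = [
    "張三 13800000001 北京市\n",
    "no number on this line\n",
    "ID 110101199001011234 only\n",  # 18 digits: rejected by the lookarounds
]
print([extract_phone(line) for line in lines])  # ['13800000001', None, None]

# The looser pattern returns a fragment of the ID number instead:
print(re.search(r"[0-9]{11}", lines[2])[0])  # 11010119900
```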

Approach

1) Read the files
2) Read the data
3) Merge the data
4) Match with a regular expression
5) Remove duplicates
6) Export and save the data
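The six steps map naturally onto small functions. Below is a minimal sketch that covers steps 1 to 5 and leaves step 6 to the xlwt code. It makes several assumptions:

- The files are UTF-8.
- The function names are hypothetical.
- It deliberately swaps `sum()` for `itertools.chain.from_iterable()` and `set()` for `dict.fromkeys()`, so merging stays linear and the output order is stable.

```python
import glob
import itertools
import os
import re

PHONE_RE = re.compile(r"[0-9]{11}")  # same pattern as the article's code


def find_files(folder):
    """Step 1: locate the .txt files, sorted for a stable order."""
    return sorted(glob.glob(os.path.join(folder, "*.txt")))


def read_lines(path):
    """Step 2: read one file's lines (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return f.readlines()


def collect_phones(folder):
    """Steps 3-5: merge, match, and deduplicate; returns numbers in first-seen order."""
    per_file = [read_lines(p) for p in find_files(folder)]
    lines = itertools.chain.from_iterable(per_file)      # 3) merge
    matches = (PHONE_RE.search(line) for line in lines)  # 4) regex match
    numbers = [m[0] for m in matches if m]               # skip lines with no match
    return list(dict.fromkeys(numbers))                  # 5) deduplicate, keep order
```

Calling `collect_phones("課堂實訓")` returns the list that step 6 writes to Excel.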

Code

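import glob
import re

import xlwt  # third-party package: pip install xlwt

data = []   # one list of lines per file
phone = []  # extracted phone numbers

# 1) Find every .txt file in the folder (relative to the current working directory)
filelocation = glob.glob(r'課堂實訓/*.txt')
print(filelocation)

# 2) Read each file line by line
for path in filelocation:
    with open(path, encoding='utf-8') as file:  # assumes the files are UTF-8
        data.append(file.readlines())
print(data)

# 3) Merge the per-file lists into one list of lines
combine_data = sum(data, [])
print(combine_data)

# 4) Take the first 11-digit number on each line, skipping lines without one
for a in combine_data:
    data1 = re.search(r'[0-9]{11}', a)
    if data1:
        phone.append(data1[0])

# 5) Remove duplicates (set() does not preserve order)
phone = list(set(phone))
print(phone)
print(len(phone))

# 6) Save to Excel, one number per row in column 0
f = xlwt.Workbook(encoding='utf-8')
sheet1 = f.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(phone)):
    sheet1.write(i, 0, phone[i])
f.save('phonenumber.xls')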

Result

Running the script generates an Excel file, phonenumber.xls.

Analysis

import glob
import re
import xlwt

glob locates the files, re provides regular expressions, and xlwt writes the Excel file.

1) Read the files

filelocation = glob.glob(r'課堂實訓/*.txt')

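This returns a list of all .txt files in the specified directory. The path is relative to the current working directory, and the list is empty if nothing matches.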
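Two `glob` details matter here:

- The order of the returned list depends on the file system, so sorting it makes the processing order reproducible.
- The pattern `**` descends into subfolders only when `recursive=True` is passed.

A minimal sketch follows. The helper name `list_txt_files` and the demo file names are hypothetical.

```python
import glob
import os
import tempfile


def list_txt_files(folder, recursive=False):
    """Return the .txt paths under folder in a stable (sorted) order."""
    if recursive:
        # "**" matches zero or more subdirectories only when recursive=True
        pattern = os.path.join(folder, "**", "*.txt")
    else:
        pattern = os.path.join(folder, "*.txt")
    return sorted(glob.glob(pattern, recursive=recursive))


with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "sub"))
    for rel in ("b.txt", "a.txt", "notes.csv", os.path.join("sub", "c.txt")):
        open(os.path.join(base, rel), "w").close()  # create an empty file
    top = list_txt_files(base)
    everything = list_txt_files(base, recursive=True)
    print([os.path.relpath(p, base) for p in top])  # ['a.txt', 'b.txt']
    print([os.path.relpath(p, base) for p in everything])
```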

2) Read the data

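for path in filelocation:
    with open(path, encoding='utf-8') as file:
        data.append(file.readlines())
print(data)

Loop over the .txt files found under the path, one file per iteration
Open the file for the current iteration (the with statement closes it afterwards)
Read that file's contents line by line with readlines(), which returns a list of lines
Use append() to add that whole list of lines to data, so data holds one list per file
Printing data shows the contents of all the .txt files as lists of lines inside a single list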
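The shape of `data` is easy to misread, so here is a minimal sketch that reproduces it:

- `readlines()` keeps the trailing `"\n"` on each line.
- `append()` adds each file's list as a single element.

The helper name `read_all` and the UTF-8 default are assumptions. Without an explicit encoding, `open()` uses the platform's locale encoding, for example cp936 (GBK) on Simplified Chinese Windows.

```python
import os
import tempfile


def read_all(paths, encoding="utf-8"):
    """Return one list of lines per file, mirroring the article's `data` list."""
    data = []
    for path in paths:
        # `with` closes each file; the explicit encoding avoids the platform default
        with open(path, encoding=encoding) as f:
            data.append(f.readlines())  # appends the file's whole list of lines
    return data


with tempfile.TemporaryDirectory() as base:
    paths = []
    for name, text in (("a.txt", "A 13800000001\nB 13800000002\n"),
                       ("b.txt", "C 13800000003\n")):
        path = os.path.join(base, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    print(read_all(paths))
    # [['A 13800000001\n', 'B 13800000002\n'], ['C 13800000003\n']]
```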

3) Merge the data

combine_data = sum(data, [])

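sum(data, []) starts from an empty list and concatenates each file's list onto it, merging the per-file lists into a single flat list of lines.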
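`sum(data, [])` works, but each `+` builds a new list that copies everything accumulated so far, which gets slow with many files. `itertools.chain.from_iterable()` gives the same flat list in a single pass. The sample `data` below is made up for illustration.

```python
import itertools

data = [["line 1\n", "line 2\n"], ["line 3\n"], []]  # one inner list per file

combined_sum = sum(data, [])  # the article's approach: repeated list concatenation
combined_chain = list(itertools.chain.from_iterable(data))  # single linear pass

print(combined_sum == combined_chain)  # True
print(combined_chain)                  # ['line 1\n', 'line 2\n', 'line 3\n']
```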

4) Regular-expression matching plus deduplication (steps 4 and 5)

print(combine_data)
for a in combine_data:
    data1 = re.search(r'[0-9]{11}', a)
    if data1:  # re.search returns None when a line has no 11-digit number
        phone.append(data1[0])
phone = list(set(phone))
print(phone)
print(len(phone))

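re.search(): returns the first match in the line (data1[0] is the matched text), or None if the line contains no match
set(): builds an unordered collection of unique elements, which removes duplicates; the order of the resulting list is not guaranteed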
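Because `set()` is unordered, `list(set(phone))` may list the numbers in a different order from run to run. String hashing is randomized per process. Two common alternatives are shown below. `dict.fromkeys()` keeps the first occurrence of each number in its original position, since dicts preserve insertion order from Python 3.7. `sorted()` gives a deduplicated, ascending list. The sample numbers are made up.

```python
phones = ["13800000002", "13800000001", "13800000002", "13800000003"]

unordered = list(set(phones))             # article's approach: order may vary between runs
first_seen = list(dict.fromkeys(phones))  # keeps first-occurrence order (Python 3.7+)
ascending = sorted(set(phones))           # deduplicated and sorted

print(len(unordered))  # 3
print(first_seen)      # ['13800000002', '13800000001', '13800000003']
print(ascending)       # ['13800000001', '13800000002', '13800000003']
```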

6) Export and save the data

# Save to Excel
f = xlwt.Workbook(encoding='utf-8')
sheet1 = f.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(phone)):
    sheet1.write(i, 0, phone[i])
f.save('phonenumber.xls')

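Workbook(encoding='utf-8'): creates the workbook and sets its text encoding; encoding must be passed as a keyword argument, not as the single string 'encoding=utf-8'
add_sheet('sheet1', cell_overwrite_ok=True): creates the worksheet; cell_overwrite_ok=True allows the same cell to be written more than once
write(x, y, z): the arguments are the row index, column index, and value (both indexes start at 0)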
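One limit is worth knowing for larger inputs. An .xls worksheet holds at most 65,536 rows, and xlwt refuses row indexes beyond that. Below is a minimal sketch that starts a new sheet whenever that limit is reached. The names `chunk` and `write_phones_xls` are hypothetical. xlwt is imported inside the function so the chunking logic can be used without it. Phone numbers are written as `str` values, so they are stored as text cells exactly as extracted.

```python
XLS_MAX_ROWS = 65536  # row limit of a single .xls (BIFF8) worksheet


def chunk(items, size=XLS_MAX_ROWS):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_phones_xls(phones, filename="phonenumber.xls"):
    """Write phones to column 0, starting a new sheet every XLS_MAX_ROWS rows."""
    import xlwt  # third-party; imported here so chunk() works without it

    book = xlwt.Workbook(encoding="utf-8")  # keyword argument, not a single string
    # `or [[]]` guarantees at least one (empty) sheet when phones is empty
    for n, part in enumerate(chunk(phones) or [[]], start=1):
        sheet = book.add_sheet(f"sheet{n}", cell_overwrite_ok=True)
        for row, number in enumerate(part):
            sheet.write(row, 0, number)  # str values are stored as text cells
    book.save(filename)
```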

