Python Data Analysis: Common NumPy Functions Explained

Posted on 2022-05-23 by WalkonNet

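Reading Files

Reading and writing files is a basic skill in data analysis.

CSV (Comma-Separated Values) is a common file format. Database dump files are typically in CSV format, with each field in the file corresponding to a column of the database table.

NumPy's loadtxt function makes it easy to read a CSV file: it splits the fields automatically and loads the data into a NumPy array.

1. Saving or creating a new file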

import numpy as np

i = np.eye(3)  # eye(n) creates an n-by-n identity matrix
print(i)
np.savetxt('test.txt', i)  # savetxt() writes the array to test.txt

savetxt() overwrites test.txt if the file already exists; if it is not in the directory, the file is created and saved.

Output:

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
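The overwrite behavior described above can be checked with a small round trip. This is a minimal sketch: the temporary directory and the variable names identity and restored are illustrative choices, not part of the original listing.

```python
import os
import tempfile

import numpy as np

identity = np.eye(3)

# Work in a throwaway directory so no existing file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")
    np.savetxt(path, identity)        # the file does not exist yet, so it is created
    np.savetxt(path, identity * 2)    # the second call replaces the content
    restored = np.loadtxt(path)       # read the file back into an array

print(restored)
```

Only the values from the second call come back, which suggests that savetxt replaces the file rather than appending to it.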

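2. Reading a CSV file with loadtxt

1) First, in the directory where the program is saved, create a file named data.csv containing the data shown in the figure below:

2) Read the file as follows: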

c,v=np.loadtxt('data.csv', delimiter=',', usecols=(6,7), unpack=True)

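The usecols argument is a tuple of zero-based column indices; (6, 7) selects the 7th and 8th fields, which are the closing price and the trading volume of the stock in the file above. Setting unpack to True stores the columns separately, so the closing-price array is assigned to c and the volume array to v.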
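To try usecols and unpack without a file on disk, the sketch below feeds loadtxt an in-memory buffer. It assumes the eight-field row layout used later in this article (code, date, an empty field, open, high, low, close, volume); the names sample and table are illustrative.

```python
import io

import numpy as np

# Three rows: code, date, (empty), open, high, low, close, volume
sample = io.StringIO(
    "603112,2022-4-1,,13.56,13.97,13.55,13.87,3750000\n"
    "603112,2022-4-2,,13.75,14.25,13.69,14.03,4003500\n"
    "603112,2022-4-3,,13.69,14.11,13.61,13.95,3956500\n"
)

# unpack=True returns one array per selected column.
c, v = np.loadtxt(sample, delimiter=",", usecols=(6, 7), unpack=True)
print(c)
print(v)

# Without unpack, the same call returns a single 2-D array (rows x columns).
sample.seek(0)
table = np.loadtxt(sample, delimiter=",", usecols=(6, 7))
print(table.shape)  # (3, 2)
```

Because only columns 6 and 7 are selected, the date and the empty field are never converted to numbers.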

3. Common functions

Volume-weighted average, time-weighted average, arithmetic mean, median, variance, and so on.
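As a quick illustration of what the two weighted averages compute, here is a minimal sketch with made-up numbers. The prices and volumes are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical closing prices and volumes, for illustration only.
c = np.array([10.0, 20.0, 30.0])
v = np.array([1.0, 1.0, 2.0])

# Volume-weighted average: sum(price * volume) / sum(volume)
vwap = np.average(c, weights=v)
manual = np.sum(c * v) / np.sum(v)   # (10 + 20 + 60) / 4 = 22.5
print(vwap, manual)

# Time weights 0, 1, 2, ...: later prices count more, the first one counts zero.
t = np.arange(len(c))
twap = np.average(c, weights=t)      # (0*10 + 1*20 + 2*30) / 3
print(twap)
```

Note that weights built with arange start at 0, so the first price is ignored, and a single-element array would leave the weights summing to zero.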

import numpy as np

i = np.eye(3)  # eye(n) creates an n-by-n identity matrix
print(i)
np.savetxt('test.txt', i)  # savetxt() creates and saves test.txt

# Read the CSV file.
# usecols is a tuple of zero-based column indices: (6, 7) selects the 7th and
# 8th fields, the closing price and the volume. unpack=True splits the columns
# into separate arrays, assigned to c and v respectively.
c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)
vwap = np.average(c, weights=v)  # volume-weighted average price: v is the weights
print(vwap)
print('\n')
print(np.mean(c))  # arithmetic mean
print('\n')
t = np.arange(len(c))
print(t)
print('\n')
twap = np.average(c, weights=t)  # time-weighted average: later prices weigh more
print(twap)
print('\n')
# Columns 4 and 5 (the 5th and 6th fields) hold the daily high and low.
h, l = np.loadtxt('data.csv', delimiter=',', usecols=(4, 5), unpack=True)

print(np.max(h))  # highest value
print(np.min(l))  # lowest value
print('\n')
print(np.ptp(h))  # ptp() gives the range: maximum minus minimum
print(np.ptp(l))
print('\n')
print(np.median(c))  # median: the middle value of the sorted data
print(np.sort(c))  # sorted prices, to verify the median (np.msort was removed in NumPy 2.0)
# Variance
variance = np.var(c)  # var() computes the variance
print(variance)

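Running the calculations both in code and in Excel gives the following results:

For the calculations that follow, add a few more rows to data.csv and save it as shown below (the dates are written in this form so they can be read and processed later):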

603112,2022-4-1,,13.56,13.97,13.55,13.87,3750000
603112,2022-4-2,,13.75,14.25,13.69,14.03,4003500
603112,2022-4-3,,13.69,14.11,13.61,13.95,3956500
603112,2022-4-4,,14.3,14.3,13.73,13.89,4250000
603112,2022-4-5,,14.1,14.5,13.93,14,4013500
603112,2022-4-6,,14.5,15.4,14.35,15.4,9056500
603112,2022-4-7,,16,16.94,15.85,16.94,3750000

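4. Stock returns

The most common measure in the stock market is the percentage change: how much today's close moved relative to yesterday's, that is, (today's close - yesterday's close) / yesterday's close * 100. NumPy's diff() function returns an array of the differences between adjacent elements. Because each element is subtracted from its neighbor, the result has one element fewer than the original array.

After the modification above there are 7 days of closing prices, so diff() returns only 6 values.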
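The length relationship can be checked without data.csv. The sketch below hard-codes the seven closing prices from the rows above; the names changes and pct are illustrative.

```python
import numpy as np

# The seven closing prices from the modified data.csv.
c = np.array([13.87, 14.03, 13.95, 13.89, 14.0, 15.4, 16.94])

changes = np.diff(c)            # same as c[1:] - c[:-1]
pct = changes / c[:-1] * 100    # percent change relative to the previous close

print(len(c), len(changes))     # 7 6
print(np.round(pct, 2))
```

Dividing by c[:-1] drops the last price, so every change is paired with the close of the day before it.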

import numpy as np

# Read the CSV file
c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6, 7), unpack=True)

# Simple returns
# diff returns an array of the differences between adjacent elements
results = np.diff(c)
print(results)
print('\n')
results1 = np.diff(c) / c[:-1] * 100  # percent change relative to the previous day
print(results1)
print('\n')
Standard_deviation = np.std(results)  # standard deviation of the daily price changes
print(Standard_deviation)

Results, comparing the code output with Excel:

5. Log returns and volatility

1) Log returns: apply the log function to every closing price, then apply diff to the result.

logreturns = np.diff( np.log(c) )
print(logreturns)

Output:

[ 0.01146966 -0.00571839 -0.00431035 0.00788817 0.09531018 0.09531018]
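Two properties of log returns are easy to verify on the same seven closing prices. This is a minimal sketch: the names simple and total are illustrative, and np.log1p(x) computes log(1 + x).

```python
import numpy as np

c = np.array([13.87, 14.03, 13.95, 13.89, 14.0, 15.4, 16.94])

simple = np.diff(c) / c[:-1]        # simple returns as fractions, not percent
logreturns = np.diff(np.log(c))

# Each log return equals log(1 + simple return).
print(np.allclose(logreturns, np.log1p(simple)))    # True

# Log returns add up: their sum is the log return over the whole period.
total = np.log(c[-1] / c[0])
print(np.isclose(logreturns.sum(), total))          # True
```

The additive property is one reason log returns are convenient when combining several periods.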

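2) What where does

The where function returns the indices of all elements that satisfy a given condition. For example, the logreturns array above contains two values below 0. The code below picks out the indices of the positive entries of results1.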

posretindices = np.where(results1 > 0) 
print('Indices with positive returns1',posretindices)

Output:

Indices with positive returns1 (array([0, 3, 4, 5], dtype=int64),)
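The trailing comma in the output shows that where returns a tuple. A minimal sketch of how to unpack it and use it for indexing follows; the names pos_idx and neg are illustrative, and the prices are the seven closes from data.csv.

```python
import numpy as np

c = np.array([13.87, 14.03, 13.95, 13.89, 14.0, 15.4, 16.94])
results1 = np.diff(c) / c[:-1] * 100

# With one argument, where returns a tuple holding one index array per dimension.
(pos_idx,) = np.where(results1 > 0)
print(pos_idx)               # [0 3 4 5]
print(results1[pos_idx])     # the positive returns themselves

# A boolean mask selects elements directly, without going through indices.
neg = results1[results1 < 0]
print(len(neg))              # 2
```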

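3) Volatility: here volatility is taken as the standard deviation of the log returns divided by their mean, then divided by the square root of the reciprocal of the number of periods. The code below computes it on an annual and on a monthly basis.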

# Float literals keep the division from truncating to 0 under Python 2.
annual_volatility = (np.std(logreturns) / np.mean(logreturns)) / np.sqrt(1. / 252.)
print(annual_volatility)
# Monthly volatility
month_volatility = (np.std(logreturns) / np.mean(logreturns)) / np.sqrt(1. / 12.)
print(month_volatility)
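The formula above divides the standard deviation by the mean before scaling. A different convention, shown here only for comparison and not taken from this article, multiplies the standard deviation of the daily log returns by the square root of the number of trading days per year. The sketch assumes 252 trading days; TRADING_DAYS, article_vol and conventional_vol are illustrative names.

```python
import numpy as np

c = np.array([13.87, 14.03, 13.95, 13.89, 14.0, 15.4, 16.94])
logreturns = np.diff(np.log(c))

TRADING_DAYS = 252  # assumed number of trading days per year

# The article's formula: std / mean, divided by sqrt(1/252).
article_vol = np.std(logreturns) / np.mean(logreturns) / np.sqrt(1.0 / TRADING_DAYS)

# Alternative convention: std of daily log returns times sqrt(252).
conventional_vol = np.std(logreturns) * np.sqrt(TRADING_DAYS)

print(article_vol, conventional_vol)
```

Dividing by sqrt(1/252) is the same as multiplying by sqrt(252), so the two figures differ only by the division by the mean log return.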

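6. Date analysis

Handling dates is always tedious. NumPy is geared toward floating-point arithmetic, so dates need some special handling.

From the code above we know that changing the usecols=(6,7) argument of np.loadtxt('data.csv', delimiter=',', usecols=(6,7), unpack=True) reads different columns. The date is in the 2nd column, so its index is 1 (indices start at 0). We can define a new array for the dates and load them into it.

The code is as follows:

dates, c = np.loadtxt('data.csv', delimiter=',', usecols=(1, 6), unpack=True)  # read columns 1 and 6 into dates and c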

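However, this raises an error when run, because loadtxt tries to convert every selected field, including the date string, to a float.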
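The failure can be reproduced on a single date string. This is a minimal sketch: date_text is the first date in data.csv, and the float() call stands in for the default conversion that loadtxt applies to each field.

```python
from datetime import datetime

date_text = "2022-4-1"  # first date in data.csv

# A date string is not a valid float, so the default conversion fails.
try:
    float(date_text)
except ValueError as exc:
    print("float() failed:", exc)

# Parsing the text as a date works; weekday() maps it to 0 (Monday) .. 6 (Sunday).
weekday = datetime.strptime(date_text, "%Y-%m-%d").date().weekday()
print(weekday)  # 4, a Friday
```

Turning each date into a number this way is what makes the column loadable into a float array.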

The code needs to be changed as follows:

import numpy as np
from datetime import datetime

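def datestr2num(s):
    # Convert a date field to a weekday number, 0 (Monday) to 6 (Sunday).
    # loadtxt may pass the field as bytes or as str depending on the NumPy
    # version, so decode only when needed.
    if isinstance(s, bytes):
        s = s.decode('ascii')
    return datetime.strptime(s, "%Y-%m-%d").date().weekday()

# Read the CSV file
dates, close = np.loadtxt('data.csv', delimiter=',', usecols=(1, 6),
                          converters={1: datestr2num}, unpack=True)
print(dates)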

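Output: [4. 5. 6. 0. 1. 2. 3.]. Weekdays are likewise numbered from 0 to 6, with 0 standing for Monday. To illustrate the analysis better, real data can be used, that is, real trading data downloaded directly from the 通信達 software, as shown in the figure below:

(Note: this file has one column fewer than before, because the empty column is gone.)

Modify the code as follows: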

import numpy as np
from datetime import datetime

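def datestr2num(s):
    # Convert a date field to a weekday number, 0 (Monday) to 6 (Sunday).
    # loadtxt may pass the field as bytes or as str depending on the NumPy
    # version, so decode only when needed.
    if isinstance(s, bytes):
        s = s.decode('ascii')
    return datetime.strptime(s, "%Y-%m-%d").date().weekday()

# Read the CSV file; the close is now in column 5
dates, c = np.loadtxt('data.csv', delimiter=',', usecols=(1, 5),
                      converters={1: datestr2num}, unpack=True)
print(dates)

print(len(dates))  # number of days in the export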

Output:

As the figure above shows, the export contains 420 days of data.

Compute the statistics by weekday, Monday through Friday:

averages = np.zeros(5)  # five slots for the average close per trading day; 0-4 stand for Monday to Friday
for i in range(5):  # loop over weekday codes 0 to 4
    indices = np.where(dates == i)  # indices of all rows that fall on this weekday
    prices = np.take(c, indices)  # closing prices for this weekday
    avg = np.mean(prices)  # average close for this weekday
    averages[i] = avg  # store it in the averages array
    print('day', i)
    # print('prices', prices)
    print("Average", avg)

print(averages)
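The same grouping can be tried without the 420-day export. The sketch below reuses the seven sample rows from earlier, with weekday codes derived from their dates (2022-4-1 was a Friday), and compares np.take with a boolean mask as an alternative. The sample is far too small to say anything about real weekday effects.

```python
import numpy as np

# Weekday codes and closes for the seven sample rows.
dates = np.array([4, 5, 6, 0, 1, 2, 3], dtype=float)
c = np.array([13.87, 14.03, 13.95, 13.89, 14.0, 15.4, 16.94])

indices = np.where(dates == 4)
print(np.take(c, indices).shape)   # (1, 1): the tuple from where adds a dimension
print(c[dates == 4].shape)         # (1,): a boolean mask stays one-dimensional

# Average close for each weekday code 0 (Monday) to 4 (Friday).
averages = np.array([c[dates == day].mean() for day in range(5)])
print(averages)
print(np.argmax(averages))         # 3, Thursday in this tiny sample
```

Both forms give the same mean, since np.mean flattens its input by default; the mask version simply avoids the extra dimension.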

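Beyond this, we can also find the highest and lowest close over the 420 days, as well as the highest and lowest of the weekday averages. Modify the code as follows: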

import numpy as np
from datetime import datetime

def datestr2num(s):
    # Convert a date field to a weekday number, 0 (Monday) to 6 (Sunday).
    # loadtxt may pass the field as bytes or as str depending on the NumPy
    # version, so decode only when needed.
    if isinstance(s, bytes):
        s = s.decode('ascii')
    return datetime.strptime(s, "%Y-%m-%d").date().weekday()

# Read the CSV file
dates, c = np.loadtxt('data.csv', delimiter=',', usecols=(1, 5),
                      converters={1: datestr2num}, unpack=True)

averages = np.zeros(5)  # five slots for the average close per trading day; 0-4 stand for Monday to Friday
for i in range(5):  # loop over weekday codes 0 to 4
    indices = np.where(dates == i)  # indices of all rows that fall on this weekday
    prices = np.take(c, indices)  # closing prices for this weekday
    avg = np.mean(prices)
    averages[i] = avg  # store the weekday average; averages ends up with 5 values
    print('day', i)
    # print('prices', prices)
    print("Average", avg)

print(averages)
print('\n')

print('the top close price:', np.max(c))  # highest close
print('the low close price:', np.min(c))  # lowest close
print('\n')

top = np.max(averages)  # largest value in averages
print("Highest average", top)
print("Top day of the week", np.argmax(averages))  # argmax returns the index of the largest element
print('\n')

bottom = np.min(averages)  # smallest value in averages
print("Lowest average", bottom)
print("Bottom day of the week", np.argmin(averages))  # argmin returns the index of the smallest element

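The output is as follows:

Summary

This article imported real stock trading data and ran some initial calculations on it with common NumPy functions. The functions used were: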

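loadtxt(): reads a CSV file, splits the fields automatically, and loads the data into a NumPy array.

savetxt(): creates and saves a text file such as test.txt.

np.loadtxt('data.csv', delimiter=',', usecols=(6,7)): the usecols parameter selects which columns to read.

np.average(c, weights=v): weighted average, with v as the weights.

np.mean(c): arithmetic mean.

np.max(h): maximum value.

np.min(l): minimum value.

np.ptp(h): range, the difference between the maximum and the minimum.

np.median(c): median, the middle value of the data.

np.msort(c): sorts the price array (removed in NumPy 2.0; np.sort(c) gives the same result for a 1-D array).

np.var(c): variance.

np.diff(c): returns an array of the differences between adjacent elements.

np.std(results): standard deviation.

np.diff(np.log(c)): log returns.

np.where(results1 > 0): indices of the elements that satisfy the condition.

np.sqrt(): square root.

s.decode('ascii'): decodes the bytes object s into a str.

np.take(c, indices): picks the elements of c at the given indices, here the closing prices for each weekday.

np.argmax(averages): index of the largest element in the array.

np.argmin(averages): index of the smallest element in the array.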

