Python DataFrame: Merging and Deduplication in Detail

Posted on 2022-08-09 by WalkonNet

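1. Merging

1.1 Structural merging

Combine two datasets that share the same structure (the same columns) by stacking their rows.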

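1.1.1 The concat function

Function signature:

pd.concat([dataFrame1, dataFrame2, …], ignore_index=False)

Parameters: ignore_index=False (the default) keeps each input's original index labels, so the merged index may have gaps or repeats; ignore_index=True discards them and renumbers the rows from 0.

Example: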

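import pandas as pd
import numpy as np

# Create a 5-row, 2-column frame of random integers in [0, 10)
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

# Split the data into two pieces (row 2 is left out) and store them in a list
data_list = [df[0:2], df[3:]]

# Keep the original index labels
df1 = pd.concat(data_list, ignore_index=False)

# Renumber the index from 0
df2 = pd.concat(data_list, ignore_index=True)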

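Output:

---------------- df ----------------
   A  B
0  7  8
1  7  3
2  5  9
3  4  0
4  1  8
---------------- df1 ----------------
   A  B
0  7  8
1  7  3
3  4  0   # label 2 does not appear here, so the index is not contiguous
4  1  8
---------------- df2 ----------------
   A  B
0  7  8
1  7  3
2  4  0
3  1  8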
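Because the example above uses random values, the effect of ignore_index is easier to check with fixed data. The following is a minimal sketch; the frame contents and the names top, bottom, kept, and renumbered are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Illustrative data (assumed values); five rows so both slices are non-empty
df = pd.DataFrame({"A": [7, 7, 5, 4, 1], "B": [8, 3, 9, 0, 8]})

top = df[0:2]    # rows with index labels 0 and 1
bottom = df[3:]  # rows with index labels 3 and 4

# ignore_index=False keeps each piece's original labels
kept = pd.concat([top, bottom], ignore_index=False)

# ignore_index=True discards them and renumbers from 0
renumbered = pd.concat([top, bottom], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 3, 4]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

The row values are identical in both results; only the index labels differ.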

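1.1.2 The append function

Function signature:

df.append(df1, ignore_index=True)

Parameters: ignore_index=False keeps the original index labels; ignore_index=True renumbers the rows from 0. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this section applies to older versions only.

Example: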

import pandas as pd
import numpy as np

# Create a 5-row, 2-column DataFrame of random integers
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

# Create the data to append (3 rows, same columns)
narry = np.random.randint(0, 10, (3, 2))
data_list = pd.DataFrame(narry, columns=['A', 'B'])

# Append the rows and renumber the index (requires pandas < 2.0)
df1 = df.append(data_list, ignore_index=True)

Output:

---------------- df ----------------
   A  B
0  5  6
1  1  2
2  5  3
3  1  8
4  1  2
---------------- df1 ----------------
   A  B
0  5  6
1  1  2
2  5  3
3  1  8
4  1  2
5  8  1
6  3  5
7  1  1
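On pandas 2.0 and later, where DataFrame.append is no longer available, pd.concat can do the same job. The sketch below uses small fixed frames; the values and the names extra and combined are assumptions made for illustration.

```python
import pandas as pd

# Illustrative frames (assumed values) with the same columns
df = pd.DataFrame({"A": [5, 1], "B": [6, 2]})
extra = pd.DataFrame({"A": [8, 3], "B": [1, 5]})

# Plays the role of df.append(extra, ignore_index=True)
combined = pd.concat([df, extra], ignore_index=True)

print(combined)
```

Like append, concat returns a new DataFrame and leaves df and extra unchanged.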

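1.2 Column merging

Combine columns from two tables that describe the same records, matching rows on one or more key columns.

Function signature:

pd.merge(left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None)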

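Parameters:

Parameter	Description
how	Join type: inner, left, right, or outer; defaults to inner
on	Column name(s) to join on; must be present in both tables
left_on	Column name(s) in the left table to join on
right_on	Column name(s) in the right table to join on
left_index	Whether to use the left table's row index as the join key; defaults to False
right_index	Whether to use the right table's row index as the join key; defaults to False
sort	Whether to sort the merged result by the join keys; defaults to False
copy	Defaults to True, which always copies the data into the new structure; setting it to False can improve performance
suffixes	Suffixes appended to column names that appear in both tables; defaults to ('_x', '_y')
indicator	If True, adds a _merge column showing which table each row came from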
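To make the how and indicator parameters concrete, here is a small sketch. The two tables and their values are hypothetical, chosen so that one key exists only on each side.

```python
import pandas as pd

# Hypothetical tables: "c" exists only on the left, "d" only on the right
left = pd.DataFrame({"key": ["a", "b", "c"], "data1": [0, 1, 2]})
right = pd.DataFrame({"key": ["a", "b", "d"], "data2": [10, 11, 13]})

# inner keeps only the keys present in both tables
inner = pd.merge(left, right, on="key", how="inner")

# outer keeps every key; indicator=True adds a _merge column whose
# values are "left_only", "right_only", or "both"
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(outer[["key", "_merge"]])
```

The _merge column is a convenient way to audit a join, for example to list the keys that failed to match.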

Example 1:

import pandas as pd
 
df1 = pd.DataFrame({'key':['a','b','c'], 'data1':range(3)})
df2 = pd.DataFrame({'key':['a','b','c'], 'data2':range(3)})
df = pd.merge(df1, df2) # with no on argument, merge joins on the columns the two tables share

Result:

---------------- df1 ----------------
  key  data1
0   a      0
1   b      1
2   c      2
---------------- df2 ----------------
  key  data2
0   a      0
1   b      1
2   c      2
---------------- df ----------------
  key  data1  data2
0   a      0      0
1   b      1      1
2   c      2      2

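Example 2:

# To join on several keys, pass the key columns as a list
import pandas as pd

right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'lval': [4, 5, 6, 7]})

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})

# lval exists in both tables, so the result gets lval_x and lval_y
df = pd.merge(left, right, on=['key1', 'key2'], how='outer')

Result: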

---------------- right ----------------
  key1 key2  lval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
---------------- left ----------------
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
---------------- df ----------------
  key1 key2  lval_x  lval_y
0  foo  one     1.0     4.0
1  foo  one     1.0     5.0
2  foo  two     2.0     NaN
3  bar  one     3.0     6.0
4  bar  two     NaN     7.0
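The left_on, right_on, and suffixes parameters from the table are not exercised by the examples above. The following sketch shows them together; the orders and customers tables, their column names, and the suffix strings are all hypothetical.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
orders = pd.DataFrame({"customer_id": [1, 2, 2], "value": [100, 250, 80]})
customers = pd.DataFrame({"id": [1, 2], "value": [5, 9]})

# left_on/right_on name the key column on each side;
# suffixes renames the overlapping "value" column
merged = pd.merge(
    orders,
    customers,
    left_on="customer_id",
    right_on="id",
    how="left",
    suffixes=("_order", "_customer"),
)

print(merged)
```

Both key columns (customer_id and id) appear in the result, and the shared value column is split into value_order and value_customer instead of the default value_x and value_y.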

2. Deduplication

Function signature:

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

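Parameters:

Parameter	Description
subset	Column name or list of column names to compare; optional, defaults to None (compare all columns)
keep	One of {'first', 'last', False}; defaults to 'first'
first	Keep the first occurrence of each duplicated row and drop the later ones
last	Keep the last occurrence of each duplicated row and drop the earlier ones
False	Drop every row that has a duplicate
inplace	Boolean, defaults to False. With inplace=True the duplicates are removed from the original DataFrame directly; with the default False the original is left untouched and a deduplicated copy is returned.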
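The difference between the two inplace settings can be sketched as follows. The frame data and its values are assumptions made for illustration; the columns A and B match the signature above.

```python
import pandas as pd

# Illustrative frame (assumed values): rows 0 and 1 are identical
data = pd.DataFrame({"A": [1, 1, 2], "B": ["x", "x", "y"]})

# Default inplace=False: a new DataFrame is returned, data is unchanged
deduped = data.drop_duplicates(subset=["A", "B"], keep="first")
print(len(data), len(deduped))  # 3 2

# inplace=True: data itself is modified and the call returns None
result = data.drop_duplicates(subset=["A", "B"], keep="first", inplace=True)
print(result, len(data))  # None 2
```

Because the in-place call returns None, writing data = data.drop_duplicates(inplace=True) would replace the frame with None.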

Examples:

Remove rows that are exact duplicates of an earlier row

data.drop_duplicates(inplace=True)

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

df.drop_duplicates()

Result:

---------------- df before deduplication ----------------
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
---------------- df after deduplication ----------------
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

Use subset to remove rows that are duplicated in specific columns

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

df.drop_duplicates(subset=['brand'])

Result:

brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5

Use keep to drop duplicates while keeping the last occurrence

df.drop_duplicates(subset=['brand', 'style'], keep='last')

Result:

brand style rating
1 Yum Yum cup 4.0
2 Indomie cup 3.5
4 Indomie pack 5.0
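The remaining keep option, keep=False, is not shown above. As a short sketch on the same df (the name unique_only is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# keep=False drops every row whose (brand, style) pair occurs more than once
unique_only = df.drop_duplicates(subset=['brand', 'style'], keep=False)

print(unique_only)
```

Only row 2 (Indomie, cup, 3.5) survives, because it is the only row whose brand and style combination appears exactly once.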

