Scraping the KFC Website with Python: an AJAX POST Request, Step by Step

Posted on 2021-10-13 by WalkonNet

Preparation

Inspect the request that the KFC website sends: its method is POST.


The request header X-Requested-With: XMLHttpRequest shows that the KFC website loads this data with an AJAX request.
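As a small illustration of that check, the sketch below tests a set of request headers for the X-Requested-With marker. The helper name is_ajax_request and the sample headers dictionary are hypothetical; in practice the headers would be copied from the browser's network panel.

```python
def is_ajax_request(headers):
    """Return True if the headers carry the marker many AJAX libraries send."""
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get('x-requested-with', '').lower() == 'xmlhttprequest'


# Hypothetical headers, as they might be copied from the browser's network panel.
sample_headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
}
print(is_ajax_request(sample_headers))
```

The header is only a convention, so its presence is a strong hint, while its absence does not prove that a request was not made by script.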


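These two checks define the goal of this scraper:
send AJAX-style POST requests to the KFC website and fetch the first 10 pages of KFC store locations in Shanghai.

Analysis

To fetch the first 10 pages of Shanghai store locations, first analyze the URL and form data behind each page.

Page 1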

# page1
# http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname
# POST
# cname: 上海
# pid:
# pageIndex: 1
# pageSize: 10

Page 2

# page2
# http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname
# POST
# cname: 上海
# pid:
# pageIndex: 2
# pageSize: 10

Page 3 and later pages follow the same pattern.
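The pattern can be sketched in a few lines. The helper name build_form_data is hypothetical; the field names and values are the ones listed above, and only pageIndex varies from page to page.

```python
def build_form_data(page):
    """Return the form fields for one results page; only pageIndex varies."""
    return {
        'cname': '上海',
        'pid': '',
        'pageIndex': page,
        'pageSize': '10',
    }


# Form data for pages 1 to 3: everything except pageIndex is identical.
for page in range(1, 4):
    print(build_form_data(page))
```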

Program entry point

First, a recap of the basic steps for fetching a page with urllib:

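# Use urllib to fetch the source of the Baidu home page
import urllib.request

# 1. Define the URL, i.e. the address you want to visit
url = 'http://www.baidu.com'

# 2. Send a request to the server as a browser would; response holds the reply
response = urllib.request.urlopen(url)

# 3. Read the page source from the response into content
# read() returns the body as raw bytes,
# so decode() is needed to turn the bytes into a string
content = response.read().decode('utf-8')

# 4. Print the data
print(content)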

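The scraper follows the same three steps:

1. Define the URL, i.e. the address you want to visit

2. Send a request to the server as a browser would and receive the response

3. Read the page source from the response into content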

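if __name__ == '__main__':
    start_page = int(input('Enter the first page number: '))
    end_page = int(input('Enter the last page number: '))

    for page in range(start_page, end_page+1):
        # Build the request object
        request = create_request(page)
        # Fetch the page source
        content = get_content(request)
        # Save the data
        down_load(page, content)

Accordingly, the main block calls one helper function for each step: create_request, get_content and down_load.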

Locating the data behind the URL


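The key to scraping is finding the right endpoint. In this case, the Preview tab of the request shows the JSON data for the page, which confirms that this is the data we want.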


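Building the URL

Every page request to the KFC website uses the same URL, so we store it as base_url.

base_url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'

Parameters

As before, look for the pattern: only 'pageIndex' depends on the page number.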

    data = {
        'cname': '上海',
        'pid': '',
        'pageIndex': page,
        'pageSize': '10'
    }

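POST request

The parameters of a POST request must be encoded. urlencode() returns a string, but the data argument of a request must be bytes, so call encode() on the result of urlencode().

The parameters go into the request object: unlike GET parameters, POST parameters are not appended to the URL but passed as the data argument when the request object is built.

So encode data: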

data = urllib.parse.urlencode(data).encode('utf-8')
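To see what this step produces, the sketch below encodes the form data for page 1 and prints the intermediate values. The variable names form, query and body are illustrative only.

```python
import urllib.parse

form = {'cname': '上海', 'pid': '', 'pageIndex': 1, 'pageSize': '10'}

# urlencode() percent-encodes the values and returns a str.
query = urllib.parse.urlencode(form)
print(query)   # cname=%E4%B8%8A%E6%B5%B7&pid=&pageIndex=1&pageSize=10

# encode() turns that str into bytes, the type a POST body must have.
body = query.encode('utf-8')
print(type(body).__name__)   # bytes
```

The Chinese city name becomes a percent-encoded UTF-8 sequence, and the empty pid field is sent as an empty value.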

Getting the request headers (one way to get past anti-scraping checks)


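That is, the User-Agent (UA) field in the request headers.

The User-Agent is a special header string that lets the server identify the client's operating system and version, CPU type, browser and version, browser engine, rendering engine, browser language, browser plug-ins, and so on.

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Edg/94.0.992.38'
}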

Building the request object

Once the parameters, base_url and the request headers are ready, build the request object.

request = urllib.request.Request(base_url, headers=headers, data=data)
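Building a Request object does not contact the server, so it can be inspected safely before anything is sent. The sketch below is one way to confirm that supplying data turns the request into a POST; the shortened User-Agent value is a placeholder, not the one from the article.

```python
import urllib.request

base_url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
headers = {'User-Agent': 'Mozilla/5.0'}  # shortened placeholder value
data = b'cname=%E4%B8%8A%E6%B5%B7&pid=&pageIndex=1&pageSize=10'

# Constructing the object only stores the values; nothing is sent yet.
request = urllib.request.Request(base_url, headers=headers, data=data)

print(request.get_method())   # POST, because data is not None
print(request.full_url)
print(request.data)
```

Without the data argument, get_method() would report GET, which is why the form data has to be passed here rather than appended to the URL.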

Fetching the page source

Pass the request object to urlopen(), which sends the request to the server as a browser would and returns the response.

response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')
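The call above raises an exception if the server rejects the request or cannot be reached. A minimal sketch of a more defensive variant follows; the function name get_content_safely and the 10-second default timeout are assumptions for illustration, not part of the original script.

```python
import urllib.error
import urllib.request


def get_content_safely(request, timeout=10):
    """Return the decoded response body, or None if the request fails."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 500.
        print('HTTP error:', err.code)
    except urllib.error.URLError as err:
        # No usable answer, for example a DNS failure or a refused connection.
        print('Connection problem:', err.reason)
    return None
```

HTTPError is a subclass of URLError, so it has to be caught first; otherwise the more general handler would swallow it.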

Reading the response body and saving the data

The read() method returns the body as raw bytes, so use decode() to turn it into a string.

content = response.read().decode('utf-8')

Then write the downloaded data to a file. With the with open() as fp syntax, the file is closed automatically when the block ends.

def down_load(page, content):
    with open('kfc_' + str(page) + '.json', 'w', encoding='utf-8') as fp:
        fp.write(content)
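Since the endpoint returns JSON, the text could also be parsed before it is written, so that a non-JSON reply (an error page, for instance) is caught early. The following is only a sketch of that idea; save_json and its prefix parameter are hypothetical names, and the original down_load works without it.

```python
import json


def save_json(page, content, prefix='kfc_'):
    """Parse content as JSON, then write it back out pretty-printed."""
    # json.loads raises json.JSONDecodeError (a ValueError) if content is not JSON.
    parsed = json.loads(content)
    path = prefix + str(page) + '.json'
    with open(path, 'w', encoding='utf-8') as fp:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
        json.dump(parsed, fp, ensure_ascii=False, indent=2)
    return path
```

Calling save_json(page, content) in place of down_load(page, content) would produce the same file names with indented, human-readable contents.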

Full code

# AJAX POST request to the KFC website: fetch the first 10 pages of KFC store locations in Shanghai
# page1
# http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname
# POST
# cname: 上海
# pid:
# pageIndex: 1
# pageSize: 10
# page2
# http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname
# POST
# cname: 上海
# pid:
# pageIndex: 2
# pageSize: 10
import urllib.parse
import urllib.request


def create_request(page):
    base_url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
    data = {
        'cname': '上海',
        'pid': '',
        'pageIndex': page,
        'pageSize': '10'
    }
    data = urllib.parse.urlencode(data).encode('utf-8')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Edg/94.0.992.38'
    }
    request = urllib.request.Request(base_url, headers=headers, data=data)
    return request

def get_content(request):
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    return content

def down_load(page, content):
    with open('kfc_' + str(page) + '.json', 'w', encoding='utf-8') as fp:
        fp.write(content)

if __name__ == '__main__':
    start_page = int(input('Enter the first page number: '))
    end_page = int(input('Enter the last page number: '))
    for page in range(start_page, end_page+1):
        # Build the request object
        request = create_request(page)
        # Fetch the page source
        content = get_content(request)
        # Save the data
        down_load(page, content)

Results

Running the script writes one file per requested page, named kfc_<page>.json.

