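Scraping data for every episode of 今日說法 with Python
Purpose
The goal is to collect the main content and air time of every 2021 episode of 今日說法 (Legal Report).
The show's page is at: http://tv.cctv.com/lm/jrsf/index.shtml
I no longer remember exactly how I arrived at this approach, so I am posting the code first and will fill in the explanation when I have time.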
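The stated goal is limited to 2021 episodes, so a year filter on the scraped update time may be useful. The following is a minimal sketch of one way to do that. The helper name `is_from_year` is hypothetical, and the sample timestamp strings are assumptions, since the exact format shown on the episode pages is not recorded here.

```python
import re

# First four-digit year (19xx or 20xx) that is not part of a longer run of digits.
YEAR_PATTERN = re.compile(r'(?<!\d)((?:19|20)\d{2})(?!\d)')


def is_from_year(update_time, year=2021):
    """Return True if the first four-digit year found in update_time equals year."""
    match = YEAR_PATTERN.search(update_time)
    return match is not None and int(match.group(1)) == year


# Hypothetical timestamp strings; the real page format may differ.
samples = ['2021年08月20日 13:35', '2020-12-31 12:35', '']
kept = [s for s in samples if is_from_year(s)]
```

A check like this could be applied to each scraped update time before the row is written, so that episodes from other years are skipped.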
Code
import re

import requests
import xlwt

# Show page: https://tv.cctv.com/lm/jrsf/index.shtml
# Browser-like User-Agent, shared by every request.
HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}


def get_data(page):
    """Return the raw response text for one page of the show's video list."""
    url = ('https://api.cntv.cn/NewVideo/getVideoListByColumn?id=TOPC1451464665008914'
           '&n=1000&sort=desc&p={pageNo}&mode=0&serviceId=tvcctv&cb=Callback').format(pageNo=page)
    response = requests.get(url=url, headers=HEADERS)
    return response.text


if __name__ == "__main__":
    book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    sheet = book.add_sheet('今日說法', cell_overwrite_ok=True)
    # Compile the patterns once instead of on every iteration.
    url_pattern = re.compile(r'url":"(.*?\.shtml)"', re.S)
    # The two labels must match the episode page text character for character.
    # NOTE: cctv.com pages are written in Simplified Chinese; if these patterns
    # match nothing, check the labels (including the colon) against the page source.
    time_pattern = re.compile(r'更新時間:</em>(.*?)</p>', re.S)
    intro_pattern = re.compile(r'視頻簡介:</em>(.*?)</p>', re.S)
    count = 0  # next spreadsheet row to write
    for page in range(1, 5):
        page_content = get_data(page)
        # Drop the backslashes from escaped slashes (\/) so the URLs match directly.
        episode_urls = url_pattern.findall(page_content.replace('\\', ''))
        for episode_url in episode_urls:
            resp = requests.get(url=episode_url, headers=HEADERS)
            resp.encoding = 'utf-8'
            update_time = time_pattern.findall(resp.text)
            jianjie = intro_pattern.findall(resp.text)
            # Column 0: update time; column 1: video description ('' if not found).
            sheet.write(count, 0, update_time[0] if update_time else '')
            sheet.write(count, 1, jianjie[0] if jianjie else '')
            count += 1
    book.save("./data_5.xls")
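The list request above passes cb=Callback, which suggests the response is wrapped as JSONP, i.e. Callback(...), rather than returned as bare JSON. Instead of pulling URLs out with a regular expression, one option is to strip the wrapper and decode the payload with the json module. This is only a sketch: the helper names `parse_jsonp` and `episode_urls` are hypothetical, the data/list/url layout is an assumption about the response, and the sample string is made up for illustration.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as Callback({...}); and decode the JSON inside."""
    match = re.search(r'^[^(]*\((.*)\)\s*;?\s*$', text.strip(), re.S)
    if match is None:
        raise ValueError('response is not in JSONP form')
    return json.loads(match.group(1))


def episode_urls(payload):
    """Collect episode page URLs; the data -> list -> url layout is assumed."""
    items = payload.get('data', {}).get('list', [])
    return [item['url'] for item in items if item.get('url', '').endswith('.shtml')]


# Made-up sample shaped like the assumed response, not real API output.
sample = ('Callback({"data":{"list":[{"url":'
          '"https:\\/\\/tv.cctv.com\\/2021\\/08\\/20\\/VIDEexample.shtml"}]}});')
urls = episode_urls(parse_jsonp(sample))
```

Decoding the JSON also unescapes the slashes in each URL, so the manual replace('\\', '') step would no longer be needed. If the real response has a different layout, only `episode_urls` would need to change.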
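Results
The script saves its output to data_5.xls: one row per episode, with the update time in the first column and the video description in the second.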
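The script above saves its output with xlwt as a legacy .xls workbook. If that format is not required, the same two columns could be written as CSV using only the standard library. The sketch below is an alternative, not what the original code does; the function name `write_rows`, the column headers, and the sample rows are all made up for illustration.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write a header plus (update_time, summary) rows as CSV to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(['update_time', 'summary'])
    writer.writerows(rows)


# Made-up rows for illustration; real rows would come from the scraper.
rows = [('2021-08-20', 'Example summary, with a comma'), ('2021-08-21', '')]
buffer = io.StringIO()  # in-memory stand-in for a real file
write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function can be given a file opened with open(path, 'w', newline='', encoding='utf-8-sig'); the utf-8-sig encoding adds a byte order mark, which helps Excel detect UTF-8 when the file contains Chinese text.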