Python RuntimeError problems and how to fix them

Here is the error message that appears:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
 
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
 
            if __name__ == '__main__':
                freeze_support()
                …
 
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
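
This error occurs on platforms where multiprocessing uses the spawn start method (Windows, and macOS since Python 3.8): every child process re-imports the main module, so any top-level code that starts new processes runs again inside the child before its bootstrapping has finished. A minimal sketch that reproduces the error (the function is purely illustrative):

import multiprocessing as mp

def square(x):
    return x * x

# Top-level pool creation: under the spawn start method each child
# re-imports this module, reaches this line during import, and raises
# the RuntimeError shown above.
pool = mp.Pool(2)
print(pool.map(square, range(4)))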

Here is the original code that triggers the error:

import multiprocessing as mp
import time
from urllib.request import urlopen
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import re
 
base_url = "https://morvanzhou.github.io/"
 
# crawl: download a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()
 
# parse: extract the title and links from a page
def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('a', {"href": re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url, url['href']) for url in urls])
    url = soup.find('meta', {'property': "og:url"})['content']
    return title, page_urls, url
 
unseen = set([base_url])
seen = set()
restricted_crawl = True
 
pool = mp.Pool(4)
count, t1 = 1, time.time()
while len(unseen) != 0:                 # still get some url to visit
    if restricted_crawl and len(seen) > 20:
        break
    print('\nDistributed Crawling...')
    crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
    htmls = [j.get() for j in crawl_jobs]      # request connection
 
    print('\nDistributed Parsing...')
    parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
    results = [j.get() for j in parse_jobs]    # parse html
 
    print('\nAnalysing...')
    seen.update(unseen)         # seen the crawled
    unseen.clear()              # nothing unseen
 
    for title, page_urls, url in results:
        print(count, title, url)
        count += 1
        unseen.update(page_urls - seen)     # get new url to crawl
print('Total time: %.1f s' % (time.time()-t1))    # 16 s !!!

Here is the corrected code:

import multiprocessing as mp
import time
from urllib.request import urlopen
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import re
 
base_url = "https://morvanzhou.github.io/"
 
# crawl: download a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()
 
# parse: extract the title and links from a page
def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('a', {"href": re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url, url['href']) for url in urls])
    url = soup.find('meta', {'property': "og:url"})['content']
    return title, page_urls, url
 
def main():
    unseen = set([base_url])
    seen = set()
    restricted_crawl = True
 
    pool = mp.Pool(4)
    count, t1 = 1, time.time()
    while len(unseen) != 0:                 # still get some url to visit
        if restricted_crawl and len(seen) > 20:
            break
        print('\nDistributed Crawling...')
        crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
        htmls = [j.get() for j in crawl_jobs]      # request connection
 
        print('\nDistributed Parsing...')
        parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
        results = [j.get() for j in parse_jobs]    # parse html
 
        print('\nAnalysing...')
        seen.update(unseen)         # seen the crawled
        unseen.clear()              # nothing unseen
 
        for title, page_urls, url in results:
            print(count, title, url)
            count += 1
            unseen.update(page_urls - seen)     # get new url to crawl
    print('Total time: %.1f s' % (time.time()-t1))    # 16 s !!!
 
 
if __name__ == '__main__':
    main()

In summary, the fix is to gather your top-level code into a function, and then add

if __name__ == '__main__':
    main()

at the bottom of the module. With this guard in place, a child process that re-imports the module never calls main(), so it never tries to start new processes during its bootstrapping phase, and the error disappears.
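
The same guard applies to any multiprocessing entry point, not just this crawler. A minimal sketch of the idiom (names are illustrative):

import multiprocessing as mp

def square(x):
    return x * x

def main():
    # The pool is created only when this module runs as a script.
    with mp.Pool(2) as pool:
        print(pool.map(square, range(4)))

if __name__ == '__main__':
    # Child processes re-import this module, but for them __name__ is
    # '__mp_main__' rather than '__main__', so they never call main()
    # and never try to start processes of their own.
    main()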

Python error: RuntimeError

This section covers errors of the type RuntimeError: fails to pass a sanity check due to a bug in the windows runtime.

Causes of this error

1. There is an incompatibility between your Python and numpy versions; for example, the combination I was using, Python 3.9 with numpy 1.19.4, produces this error.

2. numpy 1.19.4 has this problem with many current Python versions: that release runs a sanity check at import time for an fmod bug in the Windows runtime, and on affected Windows systems the check itself raises this RuntimeError.
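
You can confirm which release is installed from the command line (note that checking via "import numpy" inside Python will itself fail on an affected system, because the sanity check runs at import time):

pip show numpy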

Solution

In PyCharm, downgrade the numpy version under File -> Settings -> Project: pycharmProjects -> Project Interpreter (a pip alternative is sketched after the steps below).

1. Open the interpreter settings, as shown in the screenshot below:

[Screenshot: Step 1]

2. Double-click numpy to change its version:

[Screenshot: Step 2]

3. Tick the version checkbox so the version field becomes editable, then install the lower version you need:

[Screenshot: Step 3]

Once that is done, re-run your program and the error should be gone.
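
If you prefer the command line to PyCharm's package manager, the same downgrade can be done with pip; 1.19.3 is the usual fallback release, but adjust the version pin to whatever your project needs:

pip install numpy==1.19.3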

The above is based on my personal experience; I hope it serves as a useful reference, and I hope you will keep supporting WalkonNet.
