Why does this re.search return 'NoneType'?

Posted 2024-06-08 13:50:12

Male | A programmer who enjoys writing Python code.

I want to build a web crawler that downloads HTML from a website, but I am not very familiar with the re module, so I am stuck.

import urllib2
def download(url):
    print("Downloading: " + url)
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print("Download error: ", e.reason)
        html = None
    return html

FIELD = ('area', 'population', 'iso', 'country', 'capital', 'continent', 'tld', 'currency_code', 'currency_name', 'phone',
    'postal_code_format', 'postal_code_regex', 'languages', 'neighhbours')

import re
def re_scraper(html):
    results = {}
    for field in FIELD:
        results[field] = re.search('<tr id="places_%s__row">.*?<td class="w2p_fw">(.*?)</td>' % field, html).group()
    return results

import time
NUM_ITERATIONS = 1000
html = download("http://example.webscraping.com/view/Afghanistan-1")
for name, scraper in [('Regular expressions', re_scraper), ('BeautifulSoup', bs_scraper), ('Lxml', lxml_scraper)]:
    start = time.time()
    for i in range(NUM_ITERATIONS):
        if scraper == re_scraper:
            re.purge()
        result = scraper(html)
        assert (result['area'] == '647,500 square kilometres')
    end = time.time()
    print('%s: %.2f seconds' % (name, end - start))

Error message:

File "E:/.../Projects/new.py", line 20, in re_scraper
    results[field] = re.search('<tr id="places_%s__row">.*?<td class="w2p_fw">(.*?)</td>' % field, html).group()
AttributeError: 'NoneType' object has no attribute 'group'
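
The traceback means that `re.search` found no match and returned `None`, so the `.group()` call had nothing to work on. One way to narrow this down is to check which field names fail to match. The following is only a minimal sketch: `sample_html`, `ROW_PATTERN` and `unmatched_fields` are hypothetical names introduced here, and the one-row sample stands in for the real downloaded page.

```python
import re

# Hypothetical one-row sample standing in for the downloaded page.
sample_html = (
    '<tr id="places_area__row"><td class="w2p_fl">'
    '<label for="places_area" id="places_area__label">Area: </label></td>'
    '<td class="w2p_fw">647,500 square kilometres</td></tr>'
)

# Same pattern as in the question, with the field name left as a placeholder.
ROW_PATTERN = '<tr id="places_%s__row">.*?<td class="w2p_fw">(.*?)</td>'


def unmatched_fields(fields, html):
    """Return the field names whose row pattern finds no match in html."""
    return [f for f in fields if re.search(ROW_PATTERN % f, html) is None]


# 'area' has a row in the sample; the misspelled name does not.
print(unmatched_fields(['area', 'neighhbours'], sample_html))
```

Running the same check with the full `FIELD` tuple against the real page would list every name that has no matching row.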

The HTML is:

<tr id="places_area__row"><td class="w2p_fl"><label for="places_area" id="places_area__label">Area: </label></td><td class="w2p_fw">647,500 square kilometres</td>

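I have tested the code and confirmed that the HTML and the regex are fine. The problem may lie in field or FIELD. I suspect their types might be causing this error, but how can I fix it?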
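
A likely cause is the spelling rather than the type: `FIELD` contains `'neighhbours'` (double "h"), so the pattern for that field matches nothing and `re.search` returns `None`. Separately, `.group()` with no argument returns the whole match, not the captured cell, so the later `assert` on `'area'` would fail; `.group(1)` returns the captured text. The sketch below assumes the page's row id uses the spelling `neighbours`, and uses a hypothetical one-row `sample_html` in place of the real page.

```python
import re

# Assumption: the page's row ids use the spelling 'neighbours' (single 'h').
FIELD = ('area', 'population', 'iso', 'country', 'capital', 'continent',
         'tld', 'currency_code', 'currency_name', 'phone',
         'postal_code_format', 'postal_code_regex', 'languages', 'neighbours')

ROW_PATTERN = '<tr id="places_%s__row">.*?<td class="w2p_fw">(.*?)</td>'


def re_scraper(html):
    """Extract each field's cell text; store None when a row is missing."""
    results = {}
    for field in FIELD:
        match = re.search(ROW_PATTERN % field, html)
        # group(1) is the captured cell; guard against a failed search.
        results[field] = match.group(1) if match else None
    return results


# Hypothetical one-row sample standing in for the downloaded page.
sample_html = (
    '<tr id="places_area__row"><td class="w2p_fl">'
    '<label for="places_area" id="places_area__label">Area: </label></td>'
    '<td class="w2p_fw">647,500 square kilometres</td></tr>'
)

result = re_scraper(sample_html)
print(result['area'])
```

Storing `None` for a missing row keeps the loop running; raising an error that names the field would be an equally reasonable choice if a missing row should stop the scrape.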


0 answers

There are no answers yet.
