How to handle NoneType objects when scraping HTML pages from the web

Published 2024-05-14 21:08:12


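I want to scrape a page from Rotten Tomatoes.

I am trying to collect the names of all the directors of the movies listed there. So far my code has worked well. However, the page includes a movie called WORLD ON A WIRE, and that entry has no director name. When I run the code, it raises an error like NoneType object is not iterable. How do I handle empty fields when scraping a web page?

My code: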

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://editorial.rottentomatoes.com/guide/best-sci-fi-movies-of-all-time/'
r = requests.get(url, headers=headers)  # , proxies=proxies)
content = r.content
soup = BeautifulSoup(content, 'html.parser')  # name the parser explicitly
director = []
for d in soup.find_all('div', attrs={'class': 'info director'}):
    # find() returns None when the div has no <a> tag; iterating None raises the error
    for a in d.find('a'):
        director.append(a)
        print(a)



1 Answer
User
#1 · Posted 2024-05-14 21:08:12

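In your code, d.find('a') returns None when a div contains no <a> tag, and None is not iterable, which is what causes the error in Python. (When a match does exist, find('a') returns a single tag, so the inner loop walks over that tag's children rather than over a list of links.) You should use find_all('a') instead of find('a'): it always returns a list-like result that is simply empty when nothing matches, so the inner loop is skipped for entries with no director.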
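The difference between the two calls can be seen on a small inline snippet. This is only a sketch: the HTML below is made up to mimic one entry with a director link and one without, and the director name is a placeholder, not data from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div has no <a> tag,
# like the entry on the page that is missing a director.
html = """
<div class="info director"><a href="#">Director A</a></div>
<div class="info director"></div>
"""

soup = BeautifulSoup(html, "html.parser")
blocks = soup.find_all("div", attrs={"class": "info director"})

# find() gives a single Tag, or None when nothing matches.
first_hits = [block.find("a") for block in blocks]

# find_all() always gives a list-like result, possibly empty.
all_hits = [block.find_all("a") for block in blocks]

print(first_hits[1])       # None, so "for a in ..." would fail here
print(len(all_hits[1]))    # 0, so a for loop simply does nothing
```

Because the empty result is still iterable, a loop over find_all('a') needs no special case for the entry without a director.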

Your code should look like this:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://editorial.rottentomatoes.com/guide/best-sci-fi-movies-of-all-time/'
r = requests.get(url, headers=headers)  # , proxies=proxies)
content = r.content
soup = BeautifulSoup(content, 'html.parser')  # name the parser explicitly
director = []
for d in soup.find_all('div', attrs={'class': 'info director'}):
    # find_all() returns an empty result when there is no <a>, so the loop is skipped
    for a in d.find_all('a'):
        director.append(a.string)
        print(a.string)
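Skipping the empty entry means the director list ends up shorter than the list of movies. If you would rather keep one value per movie, one option is to check for None explicitly and record a placeholder. The sketch below assumes the same made-up markup as a stand-in for the real page; first_link_text and the "unknown" placeholder are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page;
# the second entry has no director link.
html = """
<div class="info director"><a href="#">Director A</a></div>
<div class="info director"></div>
"""


def first_link_text(block, default=None):
    """Return the text of the first <a> inside block, or default if there is none."""
    link = block.find("a")
    if link is None:  # find() returns None when nothing matches
        return default
    return link.get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
directors = [
    first_link_text(block, default="unknown")
    for block in soup.find_all("div", attrs={"class": "info director"})
]
print(directors)  # ['Director A', 'unknown']
```

This keeps the result aligned with the order of the entries on the page, at the cost of only reading the first link in each block.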
