TypeError: 'NoneType' object is not callable when calling split on a BeautifulSoup element

3 votes

3 answers

8458 views

Asked 2025-04-17 19:05

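I was playing around with BeautifulSoup and Requests today, so I decided to write a simple crawler that follows the links on a page to a depth of 2 (if that makes sense). All the links on the page I'm scraping are relative (for example: <a href="/free-man-aman-sethi/books/9788184001341.htm" title="A Free Man">). To turn these relative links into absolute ones, I want to use urljoin to join the page URL with each relative link.

To do that, I first need to extract the href value from the <a> tag, and I thought I would use split for this: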

#!/bin/python
#crawl.py
import requests
from bs4 import BeautifulSoup
from urlparse import urljoin

html_source=requests.get("http://www.flipkart.com/books")
soup=BeautifulSoup(html_source.content)
links=soup.find_all("a")
temp=links[0].split('"')

But this gives me the following error:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    temp=links[0].split('"')
TypeError: 'NoneType' object is not callable

I dived in without reading the documentation carefully, so I realize this is probably not the best way to achieve my goal, but why is there a TypeError?

error handling web scraping beautifulsoup typeerror requests relative links absolute links href extraction

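3 Answers

I just ran into the same error, so here is my experience four years later: if you need to split a soup element, you can also convert it to a string with str() before splitting. In your case, that would be:

    temp = str(links[0]).split('"')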
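
To see what the str() route gives you, here is a small sketch that splits plain strings shaped like the tag in the question. The second string, with the attributes swapped, is made up for illustration. It suggests the position of the href in the result depends on attribute order, so treat this approach as fragile.

```python
# Plain strings standing in for str(links[0]); no parsing library involved.
tag_text = '<a href="/free-man-aman-sethi/books/9788184001341.htm" title="A Free Man">'
swapped_text = '<a title="A Free Man" href="/free-man-aman-sethi/books/9788184001341.htm">'

parts = tag_text.split('"')
swapped_parts = swapped_text.split('"')

# Index 1 is the href only when href happens to be the first attribute.
print(parts[1])          # /free-man-aman-sethi/books/9788184001341.htm
print(swapped_parts[1])  # A Free Man
```

Reading the attribute from the Tag object, as the other answers suggest, does not depend on attribute order.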

Answered 2025-04-17 by Python大师

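Because the Tag class uses proxying for attribute access (as Pavel pointed out, this is so that child elements can be reached wherever possible), looking up an attribute that cannot be found returns the default value None.

A contrived example: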

>>> print soup.find_all('a')[0].bob
None
>>> print soup.find_all('a')[0].foobar
None
>>> print soup.find_all('a')[0].split
None

You need to use:

soup.find_all('a')[0].get('href')

where:

>>> print soup.find_all('a')[0].get
<bound method Tag.get of <a href="test"></a>>
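
The proxying described above can be imitated with a toy class. This is only a sketch of the idea and not BeautifulSoup's actual implementation: ToyTag and everything inside it are hypothetical names, and it is written in Python 3 syntax. Unknown attribute names are treated as child-tag lookups that fall back to None, while get is a real method and so is found normally.

```python
class ToyTag:
    """Toy stand-in for a parsed tag. NOT bs4's real Tag class."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children or [])

    def __getattr__(self, wanted):
        # Python calls this only when normal attribute lookup fails.
        if wanted.startswith("__"):
            raise AttributeError(wanted)
        # Treat the unknown name as "find a child tag with this name".
        for child in self.__dict__.get("children", []):
            if child.name == wanted:
                return child
        return None  # no such child: silently hand back None

    def get(self, key, default=None):
        # A real method, so __getattr__ is never consulted for it.
        return self.attrs.get(key, default)


link = ToyTag("a", {"href": "/books"}, [ToyTag("span")])

try:
    link.split('"')  # link.split is None, and calling None fails
except TypeError as exc:
    message = str(exc)
    print(message)  # 'NoneType' object is not callable
```

Here link.span returns the child tag, link.split returns None, and link.get('href') returns the attribute value, which mirrors the three behaviours shown in this answer.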

Answered 2025-04-17 by Python大师

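links[0] is not a string; it is a bs4.element.Tag object. When you try to use split on it, it looks for a child element named split, which does not exist, so you get None back. Calling that None is what raises the TypeError.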

In [10]: l = links[0]

In [11]: type(l)
Out[11]: bs4.element.Tag

In [17]: print l.split
None

In [18]: None()   # :)

TypeError: 'NoneType' object is not callable

You can look up HTML attributes by indexing:

In [21]: links[0]['href']
Out[21]: '/?ref=1591d2c3-5613-4592-a245-ca34cbd29008&_pop=brdcrumb'

Or use the get method, which avoids an error when the attribute is missing:

In [24]: links[0].get('href')
Out[24]: '/?ref=1591d2c3-5613-4592-a245-ca34cbd29008&_pop=brdcrumb'


In [26]: print links[0].get('wharrgarbl')
None

In [27]: print links[0]['wharrgarbl']

KeyError: 'wharrgarbl'
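
Once the href values are available, the urljoin step from the question might look like the sketch below. It uses Python 3, where urljoin lives in urllib.parse rather than urlparse. The absolutize helper and the hard-coded hrefs list are hypothetical; the list stands in for values collected with get('href'), including a None for a tag that has no href.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

BASE_URL = "http://www.flipkart.com/books"


def absolutize(base_url, hrefs):
    """Join each usable href onto base_url, skipping missing values."""
    absolute = []
    for href in hrefs:
        if not href:  # get('href') gives None when the attribute is absent
            continue
        absolute.append(urljoin(base_url, href))
    return absolute


# Stand-in for [tag.get('href') for tag in links]
hrefs = [
    "/free-man-aman-sethi/books/9788184001341.htm",
    None,
    "/?ref=1591d2c3-5613-4592-a245-ca34cbd29008&_pop=brdcrumb",
]

absolute_links = absolutize(BASE_URL, hrefs)
for url in absolute_links:
    print(url)
```

Because both hrefs start with a slash, urljoin resolves them against the site root rather than against /books.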

Answered 2025-04-17 by Python大师
