BS4 + Python3: unable to crawl tree: 'NavigableString' object has no attribute 'has_attr'

2 votes

2 answers

1604 views

Asked 2025-04-17 23:41

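I'm just starting to learn Python (so far I only know PowerShell), and I'd like to learn web scraping with BS4 and Python 3.

Here's a simple example I'm practicing with: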

<h1 class="entry-title">
<a href="test1.html">test1</a></h1>
<h1 class="entry-title">
<a href="test2.html" rel="bookmark">test2</a></h1>

What I want is to get the details (href and .string) only for the tags that have a "rel" attribute.

Here's my code:

for h1_Tag in soup.find_all(("h1", { "class" : "entry-title" })):
    for a_Tag in h1_Tag.contents:
        if a_Tag.has_attr('rel'):
           print (a_Tag)

But I get this error: AttributeError: 'NavigableString' object has no attribute 'has_attr'.

What am I doing wrong? Any help is much appreciated.

Thanks!


2 Answers

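Another approach is SoupStrainer, which lets you parse only the parts of the document that match conditions you set up front. I ran this with Python 2.7 and BeautifulSoup 4.3.2, but the logic is the same on Python 3.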

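from bs4 import BeautifulSoup, SoupStrainer

# Only keep tags that carry a rel attribute while parsing
only_rel = SoupStrainer(rel=True)

# test.html holds the HTML from the question
with open("test.html") as ofile:
    soup = BeautifulSoup(ofile, "html.parser", parse_only=only_rel)

# print() with one argument behaves the same on Python 2.7 and 3
print(soup)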

Result:

<a href="test2.html" rel="bookmark">test2</a>
[Finished in 0.2s]

Let us know if this helps.

Answered 2025-04-17 by Python大师


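You are looping over everything in .contents, and that includes NavigableString objects such as text nodes. Only Tag objects have has_attr().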
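To make that concrete, here is a minimal sketch using the HTML from the question. Inlining the HTML as a string and using the built-in "html.parser" are my assumptions, since the question doesn't show how soup was built. It prints the types found in .contents and shows one way to keep the original loop by skipping anything that isn't a Tag.

```python
from bs4 import BeautifulSoup, Tag

# HTML from the question, inlined here for illustration
html = """<h1 class="entry-title">
<a href="test1.html">test1</a></h1>
<h1 class="entry-title">
<a href="test2.html" rel="bookmark">test2</a></h1>"""

soup = BeautifulSoup(html, "html.parser")

# .contents holds every direct child, including the newline between
# <h1 ...> and <a ...>, which is a NavigableString, not a Tag.
first_h1 = soup.find("h1", class_="entry-title")
child_types = [type(child).__name__ for child in first_h1.contents]
print(child_types)

# Same loop as in the question, but only Tag objects are asked for has_attr().
matches = []
for h1_tag in soup.find_all("h1", class_="entry-title"):
    for child in h1_tag.contents:
        if isinstance(child, Tag) and child.has_attr("rel"):
            matches.append(child)

for a_tag in matches:
    print(a_tag["href"], a_tag.string)
```

The isinstance check is just one option; it keeps your loop structure intact if you need to walk .contents for other reasons.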

If you want all the elements that have a rel attribute, you can search for them directly:

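# Tag name and attribute filter are passed as two separate arguments
for h1_Tag in soup.find_all("h1", {"class": "entry-title"}):
    # rel=True matches only <a> tags that have a rel attribute
    for a_Tag in h1_Tag.find_all('a', rel=True):
        print(a_Tag)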

The rel=True argument restricts the search to elements that have that attribute; <a> tags without a rel attribute are skipped.

Answered 2025-04-17 by Python大师
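Since the question asks for the href and .string specifically, here is a small self-contained sketch of pulling those two values out. The inlined HTML, the "html.parser" choice, and the variable names are my assumptions for illustration. It also shows a CSS selector via select() as a possible alternative that should give the same matches.

```python
from bs4 import BeautifulSoup

# HTML from the question, inlined here for illustration
html = """<h1 class="entry-title">
<a href="test1.html">test1</a></h1>
<h1 class="entry-title">
<a href="test2.html" rel="bookmark">test2</a></h1>"""

soup = BeautifulSoup(html, "html.parser")

# Collect (href, text) for every <a> with a rel attribute inside the h1 tags
results = []
for h1_tag in soup.find_all("h1", class_="entry-title"):
    for a_tag in h1_tag.find_all("a", rel=True):
        results.append((a_tag["href"], str(a_tag.string)))

print(results)

# Alternative: a CSS selector, where a[rel] means "<a> that has a rel attribute"
selected = [(a["href"], str(a.string))
            for a in soup.select("h1.entry-title a[rel]")]
print(selected)
```

Wrapping .string in str() just turns the NavigableString into a plain Python string, which is handy if you plan to store or compare the values later.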

