Unable to parse certain text inside comments

from bs4 import BeautifulSoup, Comment

content = """
<a href="https://extratorrent.ag/"><p>Hi there!!</p></a>
<a href="https://thepiratebay.se/"><p>Hi again!!</p></a>
"""
soup = BeautifulSoup(content, 'lxml')
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    data = BeautifulSoup(comment.next_element, "lxml")
    for item in data.select("p"):
        print(item.text)

Traceback (most recent call last):
  File "C:\AppData\Local\Programs\Python\Python35-32\Social.py", line 9, in <module>
    data = BeautifulSoup(comment.next_element,"lxml")
  File "C:\AppData\Local\Programs\Python\Python35-32\lib\site-packages\bs4\__init__.py", line 191, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable

1 Answer

Answer 1 · Posted on 2024-04-24 08:11:35

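Switch to html.parser, then access the inner p tag directly.

The advantage of html.parser is that it does not wrap the parsed data in extra <html><body>...</body></html> tags. You can then read the content of each p tag with comment.next_element.p.text.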

soup = BeautifulSoup(content, 'html.parser')
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    print(comment.next_element.p.text)

Hi there!!
Hi again!!
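For a version that can be run on its own, here is a minimal sketch of the same idea. It assumes the markup contains an HTML comment immediately before each link. The comment texts `first` and `second` and the helper name `paragraphs_after_comments` are hypothetical and only there for illustration. The sketch also guards against the element after a comment not being a tag, which the short snippet above does not do.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Hypothetical markup: the <!--...--> comments are assumed for illustration.
content = """
<!--first--><a href="https://extratorrent.ag/"><p>Hi there!!</p></a>
<!--second--><a href="https://thepiratebay.se/"><p>Hi again!!</p></a>
"""


def paragraphs_after_comments(markup):
    """Return the text of the first <p> inside the tag that follows each comment."""
    # html.parser does not wrap the fragment in <html><body>, so the element
    # right after each comment is the <a> tag itself.
    soup = BeautifulSoup(markup, "html.parser")
    texts = []
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        following = comment.next_element
        # next_element can be a text node or None, so only handle real tags.
        if isinstance(following, Tag) and following.p is not None:
            texts.append(following.p.get_text())
    return texts


if __name__ == "__main__":
    for text in paragraphs_after_comments(content):
        print(text)
```

As for the original error, a likely cause is that `comment.next_element` is a `Tag`, not a string, and the `BeautifulSoup` constructor expects markup as a string or a file-like object. If re-parsing is really wanted, passing `str(comment.next_element)` should avoid the problem, although going through `.p` directly is simpler.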
