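I've written a script in Python with BeautifulSoup to parse an address from a webpage. However, when I run the script below and it reaches the line starting with address = [item.find_next_sibling().get_text(strip=True), I get AttributeError: 'NavigableString' object has no attribute 'text'. I can get rid of the error with the alternative shown after the script, but I'd like to stick with the way I'm currently doing it. What can I do?
This is my attempt: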
import requests
from bs4 import BeautifulSoup

URL = "https://beta.companieshouse.gov.uk/officers/lX9snXUPL09h7ljtMYLdZU9LmOo/appointments"

def fetch_names(session, link):
    session.headers = {"User-Agent": "Mozilla/5.0"}
    res = session.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    for items in soup.select("#content-container dt"):
        # the error appears in the following line
        address = [item.find_next_sibling().get_text(strip=True) for item in items if "correspondence address" in item.text.lower()][0]
        print(address)

if __name__ == '__main__':
    with requests.Session() as session:
        fetch_names(session, URL)
I can get rid of the error by doing the following instead, but I'd like to stick with the approach I tried in my script:
items = soup.select("#content-container dt")
address = [item.find_next_sibling().get_text(strip=True) for item in items if "correspondence address" in item.text.lower()][0]
print(address)
EDIT:
It's not an answer, but this is how I tried to play around (still unsure how to apply .find_previous_sibling(), though):
import requests
from bs4 import BeautifulSoup

URL = "https://beta.companieshouse.gov.uk/officers/lX9snXUPL09h7ljtMYLdZU9LmOo/appointments"

def fetch_names(session, link):
    session.headers = {"User-Agent": "Mozilla/5.0"}
    res = session.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    for items in soup.select("#content-container dt"):
        address = [item for item in items.strings if "correspondence address" in item.lower()]
        print(address)

if __name__ == '__main__':
    with requests.Session() as session:
        fetch_names(session, URL)
It produces the following (with no NavigableString issue):
[]
['Correspondence address']
[]
[]
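items is not a list of nodes but a single node, so you shouldn't iterate over it here with for item in items. Just replace the list comprehension with a check on that single node. Alternatively, you can change the BeautifulSoup selector to look up the correspondence address directly by its id, #correspondence-address-value-1.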
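As a rough illustration of both suggestions, here is a minimal sketch that runs against an inline HTML sample rather than the live page. The markup in SAMPLE_HTML, the helper name find_address, and the id correspondence-address-value-1 are assumptions made for the example; the real page may be structured differently. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may differ.
SAMPLE_HTML = """
<div id="content-container">
  <dl>
    <dt>Name</dt>
    <dd>Jane Doe</dd>
    <dt>Correspondence address</dt>
    <dd id="correspondence-address-value-1"> 1 Example Street, London </dd>
  </dl>
</div>
"""


def find_address(soup):
    """Return the text of the <dd> following the matching <dt>, or None."""
    for dt in soup.select("#content-container dt"):
        # dt is a single Tag, so test its own text instead of looping over it.
        if "correspondence address" in dt.get_text().lower():
            dd = dt.find_next_sibling("dd")
            if dd is not None:
                return dd.get_text(strip=True)
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Iterating over a Tag walks its children; for a <dt> holding only text,
# those children are NavigableString objects, not Tags.
first_dt = soup.select_one("#content-container dt")
child_types = [type(child).__name__ for child in first_dt]

# Suggestion 1: keep the loop over <dt> nodes, but check each node directly.
address = find_address(soup)

# Suggestion 2: select the value element directly by its (assumed) id.
by_id = soup.select_one("#correspondence-address-value-1")
address_by_id = by_id.get_text(strip=True) if by_id is not None else None

print(child_types)
print(address)
print(address_by_id)
```

The loop version stays close to the original script: only the inner comprehension is replaced by a check on the node itself, and returning None avoids indexing into an empty list when no label matches.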