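I am practicing web scraping in Python with BeautifulSoup.
When extracting the job location from `div class="companyLocation"`, I want only the location string that comes directly after the opening `div` tag.
In some cases, though, there are extra `a aria-label` or `span` elements inside the div that contain unwanted strings such as "+1 location".
I can't figure out how to get rid of these, so I am asking for your advice.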
<div class="companyLocation">United States
<span><a aria-label="Same Python Developer job in 1 other location" class="more_loc" href="/addlLoc/redirect?tk=1fgg7b6pa306m001&jk=d724dab9a2d2af2c&dest=%2Fjobs%3Fq%3Dpython%26limit%3D50%26grpKey%3DkAO5nvwVmAPOkxWgAwHyBwN0Y2w%253D" rel="nofollow">
+1 location</a></span>
<span class="remote-bullet">•</span><span>Remote</span></div>, United States+1 location•Remote
Here is my Python code for reference. The problem occurs in the `if a.string is None:` case.
You can reproduce the div+span HTML shown above with this line from the code: print(f"{a}, {a.text}")
import requests
from bs4 import BeautifulSoup

url = "https://www.indeed.com/jobs?q=python&limit=50"
extracts_url = requests.get(url)
extracts_soup = BeautifulSoup(extracts_url.text, 'html.parser')
soup_jobs = extracts_soup.find_all("div", {"class": "job_seen_beacon"})

for soup_job in soup_jobs:
    for a in soup_job.select("div.companyLocation"):
        if a.string is not None:
            pass
        # problem (below)
        if a.string is None:
            print(f"{a}, {a.text}")
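As background for the `a.string is None` case, the following is a small sketch of how `.string` behaves in BeautifulSoup: it returns the text only when the tag has a single child, and `None` when the tag contains several children. The "Austin, TX" value and the shortened markup are made-up sample data, not output from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a div whose only child is a text node.
simple = BeautifulSoup(
    '<div class="companyLocation">Austin, TX</div>', "html.parser"
).div

# Shortened version of the div from the question: text plus nested spans.
nested = BeautifulSoup(
    '<div class="companyLocation">United States'
    '<span><a class="more_loc">+1 location</a></span>'
    '<span class="remote-bullet">•</span><span>Remote</span></div>',
    "html.parser",
).div

print(simple.string)      # the single text child
print(nested.string)      # None, because the div has several children
print(nested.get_text())  # all nested text joined together
```

This is why `a.text` in the loop above prints the location together with "+1 location" and "Remote": `.text` collects the text of every descendant, not just the leading string.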
Does this work?
You mixed up the `if` statements. Try the following:
Output:
Now it works fine.
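One possible way to keep only the leading location text, offered as a sketch and not necessarily the approach used in the answer above, is to collect just the text nodes that are direct children of the div. The helper name `direct_text` is hypothetical, and the `href` in the sample HTML is shortened to `#` for readability.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Return only the text that is a direct child of `tag`, skipping nested tags."""
    # string=True matches text nodes; recursive=False limits the search to direct children.
    parts = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in parts if p.strip())


# Sample markup modeled on the div in the question (href shortened).
html = """<div class="companyLocation">United States
<span><a aria-label="Same Python Developer job in 1 other location" class="more_loc" href="#">
+1 location</a></span>
<span class="remote-bullet">•</span><span>Remote</span></div>"""

div = BeautifulSoup(html, "html.parser").select_one("div.companyLocation")
location = direct_text(div)
print(location)
```

In the original loop, calling `direct_text(a)` instead of `a.text` would handle both cases, since a div with a single text child also has that text as its direct child. Note that this assumes the location is always a direct text node of the div; if a page wraps the location itself in a `span`, it would be skipped.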