How to exclude unwanted tags when using BeautifulSoup in Python

<div class="companyLocation">United States <span><a aria-label="Same Python Developer job in 1 other location" class="more_loc" href="/addlLoc/redirect?tk=1fgg7b6pa306m001&jk=d724dab9a2d2af2c&dest=%2Fjobs%3Fq%3Dpython%26limit%3D50%26grpKey%3DkAO5nvwVmAPOkxWgAwHyBwN0Y2w%253D" rel="nofollow"> +1 location</a></span> <span class="remote-bullet">•</span><span>Remote</span></div>, United States+1 location•Remote

import requests
from bs4 import BeautifulSoup

url = "https://www.indeed.com/jobs?q=python&limit=50"

extracts_url = requests.get(url)
extracts_soup = BeautifulSoup(extracts_url.text, 'html.parser')
soup_jobs = extracts_soup.find_all("div", {"class": "job_seen_beacon"})

for soup_job in soup_jobs:
    for a in soup_job.select("div.companyLocation"):
        if a.string is not None:
            pass
        #problem(below)
        if a.string is None:
            print(f"{a}, {a.text}")

2 answers

User

#1 · edited 2024-05-14 04:13:06

Would this work?

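    #problem(below)
    if a.string is None:
        data = ''
        for child in a.children:
            # Text nodes have no tag name; skip tags such as <span>
            # and whitespace-only strings between them
            if not child.name and child.strip():
                data += child
        print(data.strip())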
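
The same idea can be tried without hitting the live site. The sketch below runs against an abbreviated copy of the sample div from the question (the href is shortened to "#" for illustration, and the helper name direct_text is hypothetical). It relies on find_all(string=True, recursive=False), which returns only the text nodes that are direct children of a tag:

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the sample div from the question (href shortened for illustration).
SAMPLE = (
    '<div class="companyLocation">United States '
    '<span><a class="more_loc" href="#"> +1 location</a></span> '
    '<span class="remote-bullet">•</span><span>Remote</span></div>'
)


def direct_text(tag):
    """Return only the text that sits directly inside tag, skipping child tags."""
    # recursive=False limits the search to direct children; string=True keeps text nodes only.
    parts = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in parts if p.strip())


soup = BeautifulSoup(SAMPLE, "html.parser")
div = soup.select_one("div.companyLocation")
location = direct_text(div)
print(location)
```

For the sample above, only the "United States" part survives, because everything else lives inside a child span.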

User

#2 · edited 2024-05-14 04:13:06

You mixed up your if statements. Try the following:

import requests
from bs4 import BeautifulSoup

url = "https://www.indeed.com/jobs?q=python&limit=50"

extracts_url = requests.get(url)
extracts_soup = BeautifulSoup(extracts_url.text, 'html.parser')
soup_jobs = extracts_soup.find_all("div", {"class": "job_seen_beacon"})

for soup_job in soup_jobs:
    for a in soup_job.select("div.companyLocation"):
        if a.string is not None:
            print(f"{a}, {a.text}")

Output:

<div class="companyLocation">United States</div>, United States
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation">Boulder, CO</div>, Boulder, CO
<div class="companyLocation">Houston, TX</div>, Houston, TX
<div class="companyLocation">Allen, TX</div>, Allen, TX
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation">New York, NY</div>, New York, NY
<div class="companyLocation">New York, NY</div>, New York, NY
<div class="companyLocation">New York State</div>, New York State
<div class="companyLocation">Austin, TX</div>, Austin, TX
<div class="companyLocation">Research Triangle Park, NC</div>, Research Triangle Park, NC
<div class="companyLocation">New York, NY</div>, New York, NY
<div class="companyLocation">Cary, NC</div>, Cary, NC
<div class="companyLocation">Raleigh, NC</div>, Raleigh, NC
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation"><span>Remote</span></div>, Remote
<div class="companyLocation">Houston, TX</div>, Houston, TX
<div class="companyLocation">Bellevue, WA</div>, Bellevue, WA
<div class="companyLocation">Houston, TX</div>, Houston, TX

Now it works fine.
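
Why the check on a.string behaves this way: .string returns the text when a tag has exactly one child (following single-child tags downward, which is why the div wrapping only a span reports "Remote"), and it is None as soon as a tag has more than one child. Filtering on it therefore skips the mixed divs entirely. If those locations should be kept with the extra tags dropped, one possible approach, sketched here on hand-written markup modelled on the question's sample (the class names more_loc and remote-bullet come from that sample; the href is shortened), is to decompose() the unwanted tags before reading the text:

```python
from bs4 import BeautifulSoup

# Hand-written markup modelled on the question's sample, not fetched from the site.
html = (
    '<div class="companyLocation">United States</div>'
    '<div class="companyLocation"><span>Remote</span></div>'
    '<div class="companyLocation">United States '
    '<span><a class="more_loc" href="#"> +1 location</a></span> '
    '<span class="remote-bullet">•</span><span>Remote</span></div>'
)
soup = BeautifulSoup(html, "html.parser")
divs = soup.select("div.companyLocation")

# .string is None for the third div because it has more than one child.
string_values = [d.string for d in divs]

cleaned = []
for d in divs:
    # Remove the tags we do not want from the tree (class names assumed from the sample).
    for unwanted in d.select("a.more_loc, span.remote-bullet"):
        unwanted.decompose()
    # Join the remaining text pieces with a space and strip surrounding whitespace.
    cleaned.append(d.get_text(" ", strip=True))

print(string_values)
print(cleaned)
```

Note that decompose() modifies the parsed tree in place, so read anything you need from the unwanted tags before removing them.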
