How do I exclude certain BeautifulSoup results that I don't want?

from bs4 import BeautifulSoup
import requests

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))

2 Answers

User

#1 · Edited 2024-05-29 03:44:03

You can use the str.startswith() method:

from bs4 import BeautifulSoup
import requests

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

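for tag in soup.find_all('a'):
    link = tag.get('href')  # None when the <a> tag has no href attribute
    # Skip tags without an href and in-page anchors such as "#cite_note-1"
    if link and not link.startswith('#'):
        print(link)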
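
If more than one kind of link should be excluded, note that str.startswith() also accepts a tuple of prefixes. The following is only a minimal sketch: the helper name keep_link, the prefix list, and the sample hrefs are illustrative assumptions, not part of the original answer.

```python
# Hypothetical prefixes to exclude; adjust them to your own needs.
EXCLUDED_PREFIXES = ('#', 'javascript:', 'mailto:')

def keep_link(href):
    """Return True if href is non-empty and starts with none of the excluded prefixes."""
    if not href:  # covers None (tag without href) and empty strings
        return False
    return not href.startswith(EXCLUDED_PREFIXES)

# Sample values standing in for tag.get('href') results (made up for illustration).
hrefs = ['#cite_note-1', '/wiki/Main_Page', None,
         'mailto:someone@example.org', 'https://example.org/']
kept = [h for h in hrefs if keep_link(h)]
print(kept)  # ['/wiki/Main_Page', 'https://example.org/']
```

In the loop above, the same check could be applied as keep_link(tag.get('href')) before printing.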

User

#2 · Edited 2024-05-29 03:44:03

You can use the CSS selector a[href]:not([href^="#"]). It selects every <a> tag that has an href attribute, except those whose href value starts with the # character:

import requests
from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/List_of_Wikipedia_mobile_applications'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

for link in soup.select('a[href]:not([href^="#"])'):
    print(link['href'])
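
To see what the selector matches without making a network request, it can be tried on a small inline document. This is a sketch under stated assumptions: the HTML snippet below is made up for illustration and is not taken from the Wikipedia page.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration: one in-page anchor, one tag without href,
# and two ordinary links.
html = '''
<a href="#section">Jump to section</a>
<a href="/wiki/Main_Page">Main page</a>
<a name="anchor-without-href">No href here</a>
<a href="https://example.org/">External</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# a[href] requires the attribute; :not([href^="#"]) drops values starting with "#".
links = [a['href'] for a in soup.select('a[href]:not([href^="#"])')]
print(links)  # ['/wiki/Main_Page', 'https://example.org/']
```

Because a[href] already requires the attribute, indexing with a['href'] is safe here and never raises KeyError.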
