Dynamically finding an href tag

Published 2024-04-24 21:02:46


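I'm trying to extract "Information Technology" as output from my BeautifulSoup search, but I haven't figured out how, because the `sector` value in the URL changes depending on the stock ticker.

Can someone tell me how to extract this information?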

<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>
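Because the `sector` value changes per ticker, one option is to match the link by the presence of a `sector` query parameter rather than by its value. The following is a minimal sketch that uses only the standard library (`html.parser` and `urllib.parse` in place of BeautifulSoup) and runs against the snippet above instead of the live page. The class name `SectorLinkParser` is hypothetical, and the assumption that the sector link is the only one carrying a `sector` parameter is based solely on this one example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class SectorLinkParser(HTMLParser):
    """Hypothetical helper: collect (sector_id, link_text) for each <a> whose href has a 'sector' parameter."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._current_sector = None  # sector id of the <a> we are currently inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # HTMLParser has already unescaped &amp; to & in attribute values
        query = parse_qs(urlparse(href).query)
        if "sector" in query:
            self._current_sector = query["sector"][0]
            self._text_parts = []

    def handle_data(self, data):
        if self._current_sector is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_sector is not None:
            text = "".join(self._text_parts).strip()
            self.results.append((self._current_sector, text))
            self._current_sector = None


snippet = (
    '<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/'
    'sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>'
)

parser = SectorLinkParser()
parser.feed(snippet)
parser.close()
print(parser.results)  # [('45', 'Information Technology')]
```

Keying on the parameter name instead of its value means the same parser should work whatever sector number a given ticker maps to.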

My code:

import requests
from bs4 import BeautifulSoup

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'

html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
detail_tags_sector.find_all('a')

3 answers

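To get the text from an anchor element, you need to access the `.text` attribute of each anchor. Your code then becomes:

import requests
from bs4 import BeautifulSoup

url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
contents = []

html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'html.parser')
# .text returns the text inside each <a> element
for anchor in detail_tags_sector.find_all('a'):
    contents.append(anchor.text)
print(contents)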
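Collecting every anchor's `.text` returns many unrelated links. A narrower variant, sketched here offline against the question's snippet plus one made-up unrelated link, passes a function as the `href` filter to `find_all`, so only anchors whose URL contains `sector=` are kept. Treating `sector=` as the distinguishing substring is an assumption drawn from the single example URL in the question.

```python
from bs4 import BeautifulSoup

# The question's link plus one hypothetical unrelated link, to show the filter working.
html = (
    '<a href="/eresearch/goto/evaluate/news.jhtml?symbols=AAPL">News</a>'
    '<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/'
    'sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>'
)

soup = BeautifulSoup(html, 'html.parser')

# find_all accepts a function as an attribute filter: it receives the href value
# (None when the tag has no href) and keeps the tag when the function returns True.
sector_links = soup.find_all('a', href=lambda h: h is not None and 'sector=' in h)

contents = [a.get_text(strip=True) for a in sector_links]
print(contents)  # ['Information Technology']
```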

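The problem with these answers is that they collect the text of every link on the page, and there are quite a few. To select only the Information Technology string, just add: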

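soup = BeautifulSoup(html, 'lxml')  # html fetched as in the question's code
info = soup.select_one('[href*="sectors_in"]')  # first element whose href contains "sectors_in"
print(info.text)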

Output:

Information Technology
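`select_one` returns `None` when nothing matches, so `info.text` raises `AttributeError` on any page that lacks the sector link. Below is a guarded sketch of the same CSS attribute-substring selector, run offline against the question's snippet; the helper name `get_sector` is hypothetical.

```python
from bs4 import BeautifulSoup


def get_sector(html):
    """Hypothetical helper: return the text of the first link whose href contains 'sectors_in', or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # [href*="..."] matches any element whose href attribute contains the substring
    link = soup.select_one('a[href*="sectors_in"]')
    return link.get_text(strip=True) if link is not None else None


snippet = (
    '<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/'
    'sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>'
)
print(get_sector(snippet))               # Information Technology
print(get_sector('<a href="/x">X</a>'))  # None
```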

You can use any of the following options.

import requests
from lxml.html.soupparser import fromstring
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup=fromstring(html)
findSearch = soup.xpath('//a[contains(text(), "Information Technology")]/text()')
print(findSearch[0])

Or

from bs4 import BeautifulSoup
import requests
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'

html = requests.get(url).text
detail_tags_sector = BeautifulSoup(html, 'lxml')
for link in detail_tags_sector.find_all('a'):
    print(link.text)

Or

from bs4 import BeautifulSoup    
import requests
url = 'https://eresearch.fidelity.com/eresearch/goto/evaluate/snapshot.jhtml?symbols=AAPL'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
for link in soup.find_all('a'):
    print(link.text)

Let me know if this helps.
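The first option's XPath matches on the text "Information Technology" itself, so it only finds the link when the sector name is already known. A sketch that matches on the link's `href` instead, run offline with `lxml.html` against the question's snippet plus one made-up unrelated link, might look like this; keying on `sectors_in_market` in the URL is an assumption based on the single example link.

```python
from lxml import html as lxml_html

snippet = (
    '<div>'
    '<a href="/eresearch/goto/evaluate/news.jhtml?symbols=AAPL">News</a>'
    '<a href="http://eresearch.fidelity.com/eresearch/markets_sectors/sectors/'
    'sectors_in_market.jhtml?tab=learn&amp;sector=45">Information Technology</a>'
    '</div>'
)

tree = lxml_html.fromstring(snippet)
# Select by the link's target rather than its text, so the sector name need not be known in advance.
sectors = tree.xpath('//a[contains(@href, "sectors_in_market")]/text()')
print(sectors[0] if sectors else None)  # Information Technology
```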
