Web scraping data from a performance gauge

StrongBuy (https://www.tradingview.com/symbols/NASDAQ-MDB/): <div class="arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo arrowStrongBuyShudder-3xsGK8k5">
Buy (https://www.tradingview.com/symbols/NYSE-XOM/): <div class="arrow-F-uE7IX8 arrowToBuy-1R7d8UMJ arrowBuyShudder-3GMCnG5u">
StrongSell (https://www.tradingview.com/symbols/NASDAQ-IDEX/): <div class="arrow-F-uE7IX8 arrowToStrongSell-3UWimXJs arrowStrongSellShudder-2UJhm0_C">

import pyppdf.patch_pyppeteer
from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

async def get_page():
    code = 'NASDAQ-MDB'
    r = await asession.get(f'https://www.tradingview.com/symbols/{code}/')
    await r.html.arender(wait=3)
    return r

results = asession.run(get_page)
for result in results:
    arrow_class_placeholder = "//div[contains(@class,'arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo')]//div[1]"
    arrow_class_name = result.html.xpath(arrow_class_placeholder,first=True)
    if arrow_class_name == "//div[contains(@class,'arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo')]//div[1]":
        print('StrongBuy')
    else:
        print('not strong buy')

1 Answer

User

#1 · Posted 2024-05-14 06:06:13

You can use BeautifulSoup4 (bs4), a Python library for extracting data from HTML and XML files, combined with regular expressions (RegEx). In this example I use Python's re library for the regular expressions.

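What you want is shown in the linked source.

In that example, soup.find_all(class_=re.compile("itle")) returns every tag whose class attribute contains the string "itle", for example a tag with class = "title" in the example's HTML document.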
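Here is a minimal sketch of that kind of lookup. The HTML document below is a made-up sample for illustration, not the one from the linked source; only BeautifulSoup, find_all and re.compile are real library names.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this illustration.
html_doc = """
<html><head><title>Sample page</title></head>
<body>
<p class="title"><b>A heading</b></p>
<p class="subtitle">A subheading</p>
<p class="story">Some body text.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ accepts a compiled regular expression; a tag is kept when the
# pattern is found in one of its class names.
matches = soup.find_all(class_=re.compile("itle"))

# Collect the class lists of the matched tags for inspection.
matched_classes = [tag["class"] for tag in matches]
print(matched_classes)
```

With this sample, the two paragraphs with class "title" and "subtitle" should be picked up, while the "story" paragraph and the class-less title tag are skipped.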

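For your case the regular expression would be something like "arrowTo*", or even just "arrowTo", since the pattern is matched anywhere in the class name: soup.find_all(class_=re.compile("arrowTo"))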
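To see why the bare pattern is enough, here is a small sketch that uses only the re module on the class names from the question. It assumes, as far as I know, that Beautiful Soup applies a compiled pattern with search(), so the pattern may match anywhere in a class name. Note that in regex syntax "arrowTo*" does not mean "arrowTo followed by anything"; it means "arrowT" followed by zero or more "o" characters, which happens to select the same names here.

```python
import re

# Class names taken from the <div> snippets in the question.
class_names = [
    "arrow-F-uE7IX8",
    "arrowToStrongBuy-1ydGKDOo",
    "arrowStrongBuyShudder-3xsGK8k5",
    "arrowToBuy-1R7d8UMJ",
    "arrowToStrongSell-3UWimXJs",
]

plain = re.compile("arrowTo")
glob_style = re.compile("arrowTo*")  # regex meaning: "arrowT" + zero or more "o"

# search() looks for the pattern anywhere in the string, so no trailing
# wildcard is needed to cover the random-looking suffix.
matching = [name for name in class_names if plain.search(name)]
also_matching = [name for name in class_names if glob_style.search(name)]

print(matching)
print(matching == also_matching)
```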

Your final code should look something like this:

import re
from bs4 import BeautifulSoup

# Assumption: result is the requests_html response from your question,
# so result.html.html holds the rendered HTML as a string.
# The first parameter of BeautifulSoup is that HTML document.
soup = BeautifulSoup(result.html.html, 'html.parser')
myArrowToList = soup.find_all(class_=re.compile("arrowTo"))

If you only want "arrowToStrongBuy", just use that as the regular expression passed to find_all:

soup.find_all(class_=re.compile("arrowToStrongBuy"))
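If you would rather read the rating out of the class name than test for each rating separately, something like the sketch below might work. It assumes the class names keep the arrowTo<Rating>-<suffix> shape seen in the question; extract_rating, RATING_PATTERN and SNIPPETS are hypothetical names made up here, and the sample markup is just the question's <div> tags with closing tags added.

```python
import re
from bs4 import BeautifulSoup

# Sample markup built from the <div> snippets in the question (closing tags added).
SNIPPETS = {
    "NASDAQ-MDB": '<div class="arrow-F-uE7IX8 arrowToStrongBuy-1ydGKDOo arrowStrongBuyShudder-3xsGK8k5"></div>',
    "NYSE-XOM": '<div class="arrow-F-uE7IX8 arrowToBuy-1R7d8UMJ arrowBuyShudder-3GMCnG5u"></div>',
    "NASDAQ-IDEX": '<div class="arrow-F-uE7IX8 arrowToStrongSell-3UWimXJs arrowStrongSellShudder-2UJhm0_C"></div>',
}

# Captures the letters between "arrowTo" and the dash before the suffix.
RATING_PATTERN = re.compile(r"^arrowTo([A-Za-z]+)-")


def extract_rating(html):
    """Return the rating encoded in an arrowTo... class, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(class_=RATING_PATTERN)
    if tag is None:
        return None
    # A tag can carry several classes; pick the one that matches the pattern.
    for class_name in tag.get("class", []):
        match = RATING_PATTERN.match(class_name)
        if match:
            return match.group(1)
    return None


ratings = {symbol: extract_rating(html) for symbol, html in SNIPPETS.items()}
print(ratings)
```

On the live page you would pass the rendered HTML string instead of these snippets. The generated suffixes in the class names look build-specific, so matching only on the arrowTo prefix is probably more robust than hard-coding the full class name.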
