Python Selenium: scraping an element that contains partial text

Published 2024-04-19 13:26:42


I want to extract a specific element from an HTML table. This is my current code:

>>> tabela = soup.find("div", {"class": "productDatatable"})
>>> tabela

<div class="productDatatable">\n<div>\r\n            Category:\r\n                        <span class="productDatatableValue">\n<a href="/en/market/mt5/utility">Utilities</a>\n</span>\n</div>\n<div title="Number of activations available for the buyers of this application. During the activation, software product is bound to the buyer's hardware, so that the copy of the application cannot work on another PC. The application should be re-activated and downloaded again in order to launch it on another computer. If the activation limit is exceeded, the buyer will have to purchase the product again.">\r\n            Activations:\r\n                        <span class="productDatatableValue">\r\n                            5\r\n                        </span>\n</div>\n<div style="padding:5px;"></div>\n<div>\r\n            Author:\r\n                        <span class="productDatatableValue">\n<span style="display: inline-block; vertical-align: middle; margin-top: -2px;"><span class="icoVerified small" title="Verified User"></span></span>\n<span title="Konstantin Chernov"><a class="author" href="/en/users/konstantin83" title="Konstantin83">Konstantin Chernov</a></span>\n</span>\n</div>\n<div>\r\n            Published:\r\n                        <span class="productDatatableValue">\r\n                            16 January 2013\r\n                        </span>\n</div>\n<div>\r\n            Current version:\r\n                        <span class="productDatatableValue">1.55</span>\n</div>\n<div>\r\n            Updated:\r\n                        <span class="productDatatableValue">\r\n                            23 March 2015\r\n                        </span>\n</div>\n</div>

How can I extract the category name from this HTML output? I tried the following, but it did not work:

tabela.find_element_by_xpath("//*[contains(text(), 'Category')]").find("span", {"class" : "productDatatable"}).text

How do I get the category from this HTML? The output I need is Utilities.


2 answers

Please try this:

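from selenium.webdriver.common.by import By  # find_element_by_xpath was removed in Selenium 4.3
browser.find_element(By.XPATH, "/html/body/div[1]/div[3]/div[2]/div[1]/div[2]/div[4]/div[1]/span/a").text  # browser: a Selenium WebDriver, not the BeautifulSoup tag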

This returns Utilities, which sits in the anchor tag inside the span. Try the code below.

Edit:

from bs4 import BeautifulSoup
import requests

# Download the product page and parse it
response = requests.get("https://www.mql5.com/en/market/product/635").text
soup = BeautifulSoup(response, 'html.parser')
# Category is the first value span in the table; its text sits in the <a> tag
tabela = soup.find("div", class_="productDatatable").find('span', class_="productDatatableValue").find('a')
print(tabela.text)
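
The snippet above takes the first value span in the table, which happens to be Category. Below is a minimal sketch of looking the row up by its label instead, assuming the table keeps the structure shown in the question. `value_for` and the shortened `HTML` string are illustrative names, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (assumption: same structure).
HTML = """
<div class="productDatatable">
<div>
    Category:
    <span class="productDatatableValue">
<a href="/en/market/mt5/utility">Utilities</a>
</span>
</div>
<div>
    Current version:
    <span class="productDatatableValue">1.55</span>
</div>
</div>
"""


def value_for(table, label):
    """Return the value text of the row whose own text contains `label`."""
    for span in table.find_all("span", class_="productDatatableValue"):
        row = span.parent
        # Only the row's direct text nodes (e.g. "Category:"), not the value's text.
        own_text = "".join(row.find_all(string=True, recursive=False))
        if label in own_text:
            return span.get_text(strip=True)
    return None  # no row carries this label


soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("div", class_="productDatatable")
print(value_for(table, "Category"))         # Utilities
print(value_for(table, "Current version"))  # 1.55
```

Matching on the label keeps the lookup working if the rows are reordered, and the same helper should also read rows without a link, such as Current version.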

Edit:

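If you want to use Selenium, use the following XPath, which locates the row by its Category label. find_element_by_xpath was removed in Selenium 4.3, so the call uses find_element(By.XPATH, ...):

from selenium.webdriver.common.by import By
print(browser.find_element(By.XPATH, "//div[contains(.,'Category')]/span[@class='productDatatableValue']/a").text)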
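
The XPath above hard-codes the Category label. A minimal sketch of parameterising it by label follows. `value_xpath` is a hypothetical helper, not part of Selenium, and it assumes each row is a div with a direct productDatatableValue span child, as in the question's HTML, and that the label does not appear in some other div with such a span. It selects the value span itself rather than the `<a>` inside it, so rows without a link (such as Published) would match too.

```python
def value_xpath(label):
    """Build an XPath for the value span of the row whose text contains `label`.

    Hypothetical helper: assumes each row is a <div> with a direct
    <span class="productDatatableValue"> child, as in the question's HTML.
    """
    if "'" in label:
        # The label is embedded in a single-quoted XPath string literal.
        raise ValueError("label must not contain a single quote")
    return f"//div[contains(., '{label}')]/span[@class='productDatatableValue']"


print(value_xpath("Category"))
# //div[contains(., 'Category')]/span[@class='productDatatableValue']
```

With a Selenium 4 driver named `browser`, the string would be passed as `browser.find_element(By.XPATH, value_xpath("Category")).text`, with `By` imported from `selenium.webdriver.common.by`.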
