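Get the text inside a "ul" tag using Selenium?

3 Answers

User

Answer 1 · edited 2024-06-16 10:13:22

To extract the texts with Selenium and Python, e.g. Contains Enzymatically Active B-Vitamins, Dietary Supplement, you can use either of the following Locator Strategies: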

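Using CSS_SELECTOR and printing the list:

from selenium.webdriver.common.by import By
driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
# find_elements_by_css_selector was removed in Selenium 4.3; use find_elements(By.CSS_SELECTOR, ...)
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div[itemprop='description']>ul li")])

Console output: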

['Contains Enzymatically Active B-Vitamins', 'Dietary Supplement', 'Non-GMO LE Certified ', 'Promotes healthy metabolism of glucose, fat & alcohol', 'Supports the healthy energy production your body needs', 'Encourages healthy organ function, cognitive health & more', 'Helps inhibit potential vitamin B deficiency']

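Using XPATH and printing the elements as a comma-separated string:

from selenium.webdriver.common.by import By
driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
# find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
print(', '.join([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//div[@itemprop='description']/ul//li")]))

Console output: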

Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified , Promotes healthy metabolism of glucose, fat & alcohol, Supports the healthy energy production your body needs, Encourages healthy organ function, cognitive health & more, Helps inhibit potential vitamin B deficiency
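One thing the outputs above show: 'Non-GMO LE Certified ' comes back with a trailing space, and one item ('Promotes healthy metabolism of glucose, fat & alcohol') contains a comma itself, so the comma-separated string cannot be split back into the original items. A minimal sketch of a cleanup step, assuming you already have the list of texts; the helper names clean_items and join_items and the " | " separator are my own choices, not part of Selenium:

```python
def clean_items(raw_texts):
    """Strip surrounding whitespace from each text and drop empty entries."""
    stripped = (text.strip() for text in raw_texts)
    return [text for text in stripped if text]


def join_items(items, sep=" | "):
    """Join items, refusing a separator that also occurs inside an item."""
    for item in items:
        if sep in item:
            raise ValueError(f"separator {sep!r} occurs inside item {item!r}")
    return sep.join(items)


# Sample values copied from the console output above.
raw = [
    "Non-GMO LE Certified ",
    "Promotes healthy metabolism of glucose, fat & alcohol",
]
print(join_items(clean_items(raw)))
```

If you need the individual items later, keeping the list (the first variant) is probably the safer choice; the joined string is mainly for display.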

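Ideally, to extract the texts, e.g. Contains Enzymatically Active B-Vitamins, Dietary Supplement, you should induce WebDriverWait for visibility_of_all_elements_located(), so the elements are not read before the page has rendered them. You can use either of the following Locator Strategies:

Using CSS_SELECTOR and printing the list: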

driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[itemprop='description']>ul li")))])

Console output:

['Contains Enzymatically Active B-Vitamins', 'Dietary Supplement', 'Non-GMO LE Certified ', 'Promotes healthy metabolism of glucose, fat & alcohol', 'Supports the healthy energy production your body needs', 'Encourages healthy organ function, cognitive health & more', 'Helps inhibit potential vitamin B deficiency']

Using XPATH and printing the elements as a comma-separated string:

driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
print(', '.join([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@itemprop='description']/ul//li")))]))

Console output:

Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified , Promotes healthy metabolism of glucose, fat & alcohol, Supports the healthy energy production your body needs, Encourages healthy organ function, cognitive health & more, Helps inhibit potential vitamin B deficiency

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
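If it helps to see what the explicit wait is doing, here is a rough, browser-free sketch of the polling idea. It is a simplified stand-in I wrote for illustration, not Selenium's implementation: the name wait_for_all_visible, the find_elements callable and the poll interval are assumptions, and it raises Python's built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_for_all_visible(find_elements, timeout=20.0, poll=0.5):
    """Poll find_elements() until it returns a non-empty list of displayed elements.

    find_elements is any zero-argument callable returning objects that have
    an is_displayed() method (hypothetical interface for this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = find_elements()
        # Succeed only when something matched and every match is displayed.
        if elements and all(el.is_displayed() for el in elements):
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(f"elements not visible after {timeout} seconds")
        time.sleep(poll)


# Possible use with a real driver (needs the imports listed above):
#   items = wait_for_all_visible(
#       lambda: driver.find_elements(By.CSS_SELECTOR, "div[itemprop='description']>ul li"))
```

In real scripts, prefer WebDriverWait with expected_conditions as shown in the answer; the sketch only illustrates why a wait can succeed where an immediate lookup returns an empty list.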

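User

Answer 2 · edited 2024-06-16 10:13:22

You can always get all the li elements, take the text from each of them, and combine the results with ", ".join(elements).

Code for your small example: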

text = '''
<ul>
  <li>Contains Enzymatically Active B-Vitamins
  </li>
  <li>Dietary Supplement
  </li>
  <li>Non-GMO LE Certified
  </li>
</ul>'''

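import selenium.webdriver
from selenium.webdriver.common.by import By

driver = selenium.webdriver.Firefox()

# load the HTML snippet directly through a data: URL
driver.get("data:text/html;charset=utf-8," + text)

# find_elements_by_tag_name was removed in Selenium 4.3; use find_elements(By.TAG_NAME, ...)
elements = driver.find_elements(By.TAG_NAME, 'li')

# replace each WebElement with its visible text
elements = [i.text for i in elements]

print(", ".join(elements))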
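A small caveat on the data: URL trick: it works for this snippet, but a raw # in the HTML would be read as the start of the URL fragment, and a literal % may be misread as an escape sequence. A minimal sketch that percent-encodes the payload first, using urllib.parse.quote from the standard library; the helper name to_data_url and the sample snippet are made up for illustration:

```python
from urllib.parse import quote


def to_data_url(html):
    """Build a data: URL whose payload is percent-encoded HTML."""
    return "data:text/html;charset=utf-8," + quote(html)


# Hypothetical snippet containing characters that are unsafe in a raw URL.
snippet = "<ul><li>Item #1</li><li>100% plant-based</li></ul>"
print(to_data_url(snippet))
```

You would then call driver.get(to_data_url(text)) instead of concatenating the raw string.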

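User

Answer 3 · edited 2024-06-16 10:13:22

This should do it:

from selenium import webdriver
from selenium.webdriver.common.by import By

link = 'https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051'

with webdriver.Chrome() as driver:
    driver.get(link)
    # find_elements_by_css_selector was removed in Selenium 4.3; use find_elements(By.CSS_SELECTOR, ...)
    # ul:nth-of-type(1) > li restricts the match to the direct items of the first list
    elements = ', '.join([item.text for item in driver.find_elements(By.CSS_SELECTOR, "[itemprop='description'] > ul:nth-of-type(1) > li")])
    print(elements)

Output: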

Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified
