Getting the text inside a "ul" tag with Selenium?

Published 2024-06-16 10:13:22

Please help me find a way to get the text inside a "ul" tag.

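I want the information as a comma-separated string, like: "Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified"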

Website link: https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051

Here is the HTML code:

<ul>
  <li>Contains Enzymatically Active B-Vitamins
  </li>
  <li>Dietary Supplement
  </li>
  <li>Non-GMO LE Certified
  </li>
</ul>

3 Answers

To extract the text with Selenium, e.g. Contains Enzymatically Active B-Vitamins, Dietary Supplement, you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and printing a list:

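    from selenium.webdriver.common.by import By
    driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
    # find_elements_by_css_selector() was removed in Selenium 4.3; find_elements(By.CSS_SELECTOR, ...) replaces it
    print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div[itemprop='description']>ul li")])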
    
  • Console output:

    ['Contains Enzymatically Active B-Vitamins', 'Dietary Supplement', 'Non-GMO LE Certified ', 'Promotes healthy metabolism of glucose, fat & alcohol', 'Supports the healthy energy production your body needs', 'Encourages healthy organ function, cognitive health & more', 'Helps inhibit potential vitamin B deficiency']
    
  • Using XPATH and printing the elements as a comma-separated string:

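    from selenium.webdriver.common.by import By
    driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
    # find_elements_by_xpath() was removed in Selenium 4.3; find_elements(By.XPATH, ...) replaces it
    print(', '.join([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//div[@itemprop='description']/ul//li")]))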
    
  • Console output:

    Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified , Promotes healthy metabolism of glucose, fat & alcohol, Supports the healthy energy production your body needs, Encourages healthy organ function, cognitive health & more, Helps inhibit potential vitamin B deficiency
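The trailing space in 'Non-GMO LE Certified ' carries over into the joined string (note the " , " before "Promotes"). A minimal sketch of one way to tidy this, assuming a hypothetical helper named `join_texts` (not part of Selenium) that accepts any objects exposing a `.text` attribute, such as the WebElements returned by `find_elements`:

```python
from types import SimpleNamespace


def join_texts(elements, sep=", "):
    """Join the stripped, non-empty .text values of the given elements."""
    texts = (element.text.strip() for element in elements)
    return sep.join(text for text in texts if text)


# Stand-ins for WebElements so the sketch runs without a browser.
fake_items = [
    SimpleNamespace(text="Contains Enzymatically Active B-Vitamins"),
    SimpleNamespace(text="Dietary Supplement"),
    SimpleNamespace(text="Non-GMO LE Certified "),
]
print(join_texts(fake_items))
```

With a live session, the same helper could presumably be called as `join_texts(driver.find_elements(By.XPATH, "//div[@itemprop='description']/ul//li"))`.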
    

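Ideally, to extract the text, e.g. Contains Enzymatically Active B-Vitamins, Dietary Supplement, you should induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies: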

  • Using CSS_SELECTOR and printing a list:

    driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[itemprop='description']>ul li")))])
    
  • Console output:

    ['Contains Enzymatically Active B-Vitamins', 'Dietary Supplement', 'Non-GMO LE Certified ', 'Promotes healthy metabolism of glucose, fat & alcohol', 'Supports the healthy energy production your body needs', 'Encourages healthy organ function, cognitive health & more', 'Helps inhibit potential vitamin B deficiency']
    
  • Using XPATH and printing the elements as a comma-separated string:

    driver.get('https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051')
    print(', '.join([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@itemprop='description']/ul//li")))]))
    
  • Console output:

    Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified , Promotes healthy metabolism of glucose, fat & alcohol, Supports the healthy energy production your body needs, Encourages healthy organ function, cognitive health & more, Helps inhibit potential vitamin B deficiency
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
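Putting the wait and the imports together, a rough sketch of a reusable function might look like the following. The names `visible_texts` and `DESCRIPTION_ITEMS` and the `timeout` default are assumptions for illustration; the Selenium imports sit inside the function only so that the snippet can be loaded without Selenium installed.

```python
def visible_texts(driver, locator, timeout=20):
    """Wait until all elements matching locator are visible, then return their texts."""
    # Imported here so that merely defining the function does not require Selenium.
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    elements = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator)
    )
    # Strip surrounding whitespace such as the trailing space after "Certified".
    return [element.text.strip() for element in elements]


# Locator as a (strategy, value) tuple; "css selector" is the value of By.CSS_SELECTOR.
DESCRIPTION_ITEMS = ("css selector", "div[itemprop='description']>ul li")
```

Given an existing `driver` that has already loaded the page, `print(', '.join(visible_texts(driver, DESCRIPTION_ITEMS)))` would print the comma-separated string. If no matching element becomes visible within `timeout` seconds, `WebDriverWait.until` raises `TimeoutException`.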
    

You can always get all the li elements, take the text from each of them, and use ", ".join(elements)


Code for your small example:

text = '''
<ul>
  <li>Contains Enzymatically Active B-Vitamins
  </li>
  <li>Dietary Supplement
  </li>
  <li>Non-GMO LE Certified
  </li>
</ul>'''

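import selenium.webdriver
from selenium.webdriver.common.by import By

driver = selenium.webdriver.Firefox()

# Load the HTML snippet directly through a data: URL
driver.get("data:text/html;charset=utf-8," + text)

# find_elements_by_tag_name() was removed in Selenium 4.3; use find_elements(By.TAG_NAME, ...)
elements = driver.find_elements(By.TAG_NAME, 'li')

elements = [i.text for i in elements]

print(", ".join(elements))

# Close the browser when done
driver.quit()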
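Passing raw HTML in a `data:` URL works for this snippet, but characters such as `#` or `%` in the markup would be interpreted as URL syntax. A small sketch of one way around that, assuming a hypothetical helper `html_to_data_url` that percent-encodes the markup with the standard library's `urllib.parse.quote` (the sample markup is made up for illustration):

```python
from urllib.parse import quote


def html_to_data_url(html):
    """Build a data: URL whose payload is the percent-encoded HTML."""
    return "data:text/html;charset=utf-8," + quote(html)


# Made-up markup containing characters that are unsafe in a raw data: URL.
snippet = "<ul><li>Vitamin B #1</li><li>100% Vegetarian</li></ul>"
url = html_to_data_url(snippet)
print(url)
```

In the example above, `driver.get(html_to_data_url(text))` could then stand in for the string concatenation.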

This should do it:

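from selenium import webdriver
from selenium.webdriver.common.by import By

link = 'https://ca.iherb.com/pr/Life-Extension-BioActive-Complete-B-Complex-60-Vegetarian-Capsules/67051'

# The context manager quits the browser when the block exits
with webdriver.Chrome() as driver:
    driver.get(link)
    # Only the first <ul> inside the description block is selected
    items = driver.find_elements(By.CSS_SELECTOR, "[itemprop='description'] > ul:nth-of-type(1) > li")
    elements = ', '.join([item.text for item in items])
    print(elements)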

Output:

Contains Enzymatically Active B-Vitamins, Dietary Supplement, Non-GMO LE Certified 
