How to extract a value from outerHTML in Python

Posted 2024-04-27 03:27:42


<a id="ctl00_ctl00_ctl00_c_hdetail_lblPat2" href="javascript:popupPatient(218809, '0');">CHATARPAL, LALITA</a>

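I am trying to get the number (218809) from this element's outerHTML. I used to do this with AHK, but now I am learning Python and want to do the same thing there.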

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
import re
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select

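# Assumption: the post never creates the driver; Chrome is used here only as an example.
driver = webdriver.Chrome()
driver.get("https://brightree.net/F1/0375/MBSNI/Receipts/Invoices/Invoice_Invoice.aspx?InvoiceKey=3729668")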

wait=WebDriverWait(driver,10)
wait.until(EC.element_to_be_clickable((By.XPATH,"//*[@id='ctl00_ctl00_ctl00_c_hdetail_lblSalesOrder2']")))

# find_element_by_id was removed in Selenium 4; find_element(By.ID, ...) is the current form.
Target=driver.find_element(By.ID, "ctl00_ctl00_ctl00_c_hdetail_lblPat2")
Get_Value=Target.get_attribute("outerHTML")
print(Get_Value)

2 Answers
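# Read the href attribute directly instead of parsing the whole outerHTML
Get_Value=Target.get_attribute("href")
# Raw string keeps \d from being treated as a string escape; [0] takes the first run of digits
Get_Value=re.findall(r'\d+', Get_Value)[0]
print(Get_Value)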

This uses a regex: \d matches a single digit, and \d+ matches one or more consecutive digits.
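To make the difference between \d and \d+ concrete, here is a minimal sketch that runs on the href value from the question as a plain string, so no browser is needed. The variable names (href, single_digits, digit_runs) are illustrative and not part of the original answer.

```python
import re

# The href value from the question, hard-coded so no browser is needed.
href = "javascript:popupPatient(218809, '0');"

# \d matches one digit at a time, so every digit comes back separately.
single_digits = re.findall(r"\d", href)

# \d+ matches each run of consecutive digits as a single string.
digit_runs = re.findall(r"\d+", href)

print(single_digits)  # ['2', '1', '8', '8', '0', '9', '0']
print(digit_runs)     # ['218809', '0']
print(digit_runs[0])  # 218809
```

Note that findall also picks up the '0' argument, which is why the answer takes index [0] to get only the first number.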

# re is short for [r]egular [e]xpression
from re import match

# This is your example string from the question. The string contains both single
# and double quotes, so I used triple quotes to avoid needing to escape them.
string = """<a id="ctl00_ctl00_ctl00_c_hdetail_lblPat2" href="javascript:popupPatient(218809, '0');">CHATARPAL, LALITA</a>"""

# match any number of characters, then the method name, then capture the number.
pattern = r'.*?popupPatient\((\d+)'

# Get the first capture group from the regex and print it
print(match(pattern, string).group(1))
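One thing to watch for with the approach above: match returns None when the pattern is not found, so calling .group(1) on it raises AttributeError. A possible way to guard against that is sketched below. The helper name extract_patient_id is hypothetical, and it uses re.search rather than match so the leading .*? is not needed.

```python
import re
from typing import Optional


def extract_patient_id(html: str) -> Optional[str]:
    """Return the first argument of popupPatient(...) as a string, or None if absent."""
    # search scans the whole string, so no leading wildcard is required.
    found = re.search(r"popupPatient\((\d+)", html)
    return found.group(1) if found else None


# The example element from the question.
outer_html = """<a id="ctl00_ctl00_ctl00_c_hdetail_lblPat2" href="javascript:popupPatient(218809, '0');">CHATARPAL, LALITA</a>"""

print(extract_patient_id(outer_html))        # 218809
print(extract_patient_id("<a>no link</a>"))  # None
```

Returning None leaves it to the caller to decide what a missing ID should mean, for example skipping the record or logging a warning.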
