How to scrape ::before elements on a website using Selenium with Python

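from selenium import webdriver
from selenium.webdriver.common.by import By

# chrome_path is the path to the chromedriver executable (defined elsewhere)
wd = webdriver.Chrome(chrome_path)
url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
wd.get(url)
# Read the visible text of the phone-number link
phone = wd.find_element(By.XPATH, '//a[@class="tel ttel"]').text
print(phone)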

2 answers

User

#1 · Edited 2024-05-19 01:05:00

You can also read the :before content from the computed style:

chars = driver.execute_script("return [...document.querySelectorAll('.telCntct a.tel span')].map(span => window.getComputedStyle(span,':before').content)")

In this case, though, you are left with odd Unicode content, which you then have to map to digits.
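A minimal sketch of that mapping step, assuming the rule described in the other answer holds: each code point, written in hex, looks like 9d0NN, where NN read as a decimal number is the displayed digit plus 1, and a decoded value of 10 stands for "+". The helper names `decode_before_content` and `decode_phone` are hypothetical, and the real code points should be checked against the page's stylesheet.

```python
def decode_before_content(content):
    """Turn one computed `content` value into the character it displays.

    Assumption: the code point, written in hex, is '9d0' followed by
    (displayed value + 1) in decimal digits; a decoded value of 10 means '+'.
    """
    # Computed content usually arrives wrapped in quotes; strip them.
    glyph = content.strip('"\'')
    if len(glyph) != 1:
        return None  # e.g. 'none', or anything that is not a single glyph
    code = format(ord(glyph), 'x')  # e.g. '9d003'
    if not code.startswith('9d0') or not code[3:].isdigit():
        return None  # outside the assumed range
    value = int(code[3:]) - 1
    return '+' if value == 10 else str(value)


def decode_phone(chars):
    """Join the decoded characters, skipping anything unrecognised."""
    decoded = (decode_before_content(c) for c in chars)
    return ''.join(d for d in decoded if d is not None)
```

With Selenium, the list returned by the `execute_script` call above could be passed directly as `decode_phone(chars)`.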

User

#2 · Edited 2024-05-19 01:05:00

You don't need Selenium. The CSS style rules in the page specify the content applied to each ::before pseudo-element:

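Here, the two- or three-letter string after .icon-, for example acb, maps to the span element that holds the before content. The value after \9d0 is the displayed value plus 1. You can build a dictionary from these pairs (with that adjustment) and use it to decode the digit at each before from the span's class value.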
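The dictionary idea can be sketched offline as follows. The CSS text and class names below are invented for illustration (they are not copied from the site), and `build_cipher` and `decode_classes` are hypothetical helper names. Capturing each class together with its code in a single regex is a choice made here so that keys and values cannot drift out of step.

```python
import re

# Made-up stylesheet fragment in the shape described above.
SAMPLE_CSS = r'''
.icon-acb:before{content:"\9d002"}
.icon-yz:before{content:"\9d010"}
.icon-dc:before{content:"\9d011"}
.icon-ji:before{content:"\9d001"}
'''

# Captures the suffix after "icon-" and the decimal digits after "\9d0".
RULE = re.compile(r'\.icon-(\w+):before\s*\{\s*content:\s*"\\9d0(\d+)"')


def build_cipher(css_text):
    """Map each class suffix to the character its :before content displays."""
    cipher = {}
    for suffix, digits in RULE.findall(css_text):
        value = int(digits) - 1  # stored value is the displayed value plus 1
        cipher[suffix] = '+' if value == 10 else str(value)
    return cipher


def decode_classes(class_names, cipher):
    """Decode span class names such as 'icon-acb'; unknown ones become '?'."""
    return ''.join(cipher.get(name.replace('icon-', ''), '?') for name in class_names)


cipher = build_cipher(SAMPLE_CSS)
print(decode_classes(['icon-dc', 'icon-yz', 'icon-acb', 'icon-ji'], cipher))
```

On a real page, `css_text` would be the text of the relevant style block and `class_names` would come from the span elements inside the phone-number container.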

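An example of how the two- or three-letter strings map to content:

My approach may be a bit verbose because I'm not very familiar with Python, but the logic should be clear.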

import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.justdial.com/Bangalore/Spardha-Mithra-IAS-KAS-Coaching-Centre-Opposite-Maruthi-Medicals-Vijayanagar/080PXX80-XX80-140120184741-R6P8_BZDET?xid=QmFuZ2Fsb3JlIEJhbmsgRXhhbSBUdXRvcmlhbHM='
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(res.content, 'lxml')

# The second <style type="text/css"> block is used as the source of the :before rules
cipherKey = str(soup.select('style[type="text/css"]')[1])
# Class suffixes (e.g. "acb") that precede ":before", in document order
keys = re.findall(r'-(\w+):before', cipherKey)
# The value after 9d0 is the displayed value plus 1, so subtract 1
values = [int(item) - 1 for item in re.findall(r'9d0(\d+)', cipherKey)]
cipherDict = dict(zip(keys, values))
# The entry that decodes to 10 stands for the "+" sign
plusKey = next(key for key, value in cipherDict.items() if value == 10)
cipherDict[plusKey] = '+'
# Second class of each icon span, with the "icon-" prefix removed
decodeElements = [item['class'][1].replace('icon-', '') for item in soup.select('.telCntct span[class*="icon"]')]

telephoneNumber = ''.join(str(cipherDict.get(i)) for i in decodeElements)
print(telephoneNumber)
