Situation
I want to scrape data from this website: http://www.dpm.tn/dpm_pharm/medicament/listmedicparnomspec.php
My code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import requests
from bs4 import BeautifulSoup
# agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"
# headless driver
options = webdriver.ChromeOptions()
options.headless = True
options.add_argument(f'user-agent={user_agent}')
options.add_argument("--window-size=1920,1080")
options.add_argument('--ignore-certificate-errors')
options.add_argument('--allow-running-insecure-content')
options.add_argument("--disable-extensions")
options.add_argument("--proxy-server='direct://'")
options.add_argument("--proxy-bypass-list=*")
options.add_argument("--start-maximized")
options.add_argument('--disable-gpu')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--no-sandbox')
# raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(executable_path=r"D:\Downloads\chromedriver.exe", options=options)
# request test
medecine = 'doliprane'
# submitting a search
driver.get('http://www.dpm.tn/dpm_pharm/medicament/listmedicparnomspec.php')
e = driver.find_element_by_name('id')
e.send_keys(medecine)
e.submit()
# getting the result table
try:
    table = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody')
    print('success')
except Exception:
    print('failed')
Code to get the links:
print('bs4 turn \n')
result = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')
rows = result.find_all('tr')
links = []
real_link = []
for row in rows:
    links.append(row.find('a', href=True))
for each in links:
    print(each['href'])
Problem:
Whenever I run this, I get the following error:
'NoneType' object is not subscriptable
Question:
How can I get this information and look up the href attributes I need?
When accessing the href, try this:
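The error most likely comes from `row.find('a', href=True)` returning `None` for every row that contains no link (a header row, for example), so `each['href']` fails on those entries. A minimal sketch of a guard follows. The HTML fragment and its href values are invented for illustration and are not taken from the real site, and `extract_hrefs` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the result table; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th></tr>
  <tr><td><a href="fiche.php?codeamm=1">DOLIPRANE 500</a></td></tr>
  <tr><td>no link in this row</td></tr>
  <tr><td><a href="fiche.php?codeamm=2">DOLIPRANE 1000</a></td></tr>
</table>
"""


def extract_hrefs(html):
    """Return the href of the first link in each row, skipping rows without one."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for row in soup.find_all("tr"):
        link = row.find("a", href=True)
        if link is not None:  # find() returns None when the row has no <a href=...>
            hrefs.append(link["href"])
    return hrefs


print(extract_hrefs(SAMPLE_HTML))
```

In the question's code, the same check would go inside the loop that fills `links`, so that only real tags are stored.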
Also, instead of using Selenium, use the requests library to fetch the data and then parse it.
Code:
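A rough sketch of what the requests-based approach could look like. It rests on assumptions that should be checked in the browser's network tab first: that the search form is submitted with POST, that it posts back to the same URL, and that the only field required is `id` (the field name used in the question). `parse_links` and `search` are hypothetical helper names.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

URL = "http://www.dpm.tn/dpm_pharm/medicament/listmedicparnomspec.php"


def parse_links(html, base_url=URL):
    """Return every link in the page as an absolute URL.

    Note: this collects all <a href=...> tags, not only those in the result table.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def search(medicine):
    """Submit the search form and return the links found in the response."""
    # ASSUMPTION: POST to the same URL with a single 'id' field.
    # If the form uses GET, requests.get(URL, params={"id": medicine}) would be the analogue.
    response = requests.post(URL, data={"id": medicine}, timeout=30)
    response.raise_for_status()
    return parse_links(response.text)
```

Calling `search('doliprane')` would then return a list of URLs to print or follow, provided the assumptions about the form hold.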
Let me know if you have any questions :)
I solved the problem, but with Selenium instead of Beautiful Soup:
This works for me.