I am scraping a Google Scholar profile page. So far I have Python code using the Beautiful Soup library that collects data from the page:
import requests
from bs4 import BeautifulSoup

url = "https://scholar.google.com/citations?user=VjJm3zYAAAAJ&hl=en"
# Fetch the page once; the original wrapped this in `while True:` with no exit
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'html.parser')
# Each publication is a table row with class gsc_a_tr
research_article = soup.find_all('tr', {'class': 'gsc_a_tr'})
for research in research_article:
    title = research.find('a', {'class': 'gsc_a_at'}).text
    authors = research.find('div', {'class': 'gs_gray'}).text
    print('Title:', title, '\n', '\nAuthors:', authors)
I also have Python code using the selenium library that automatically opens the profile page and clicks the "Show more" button:
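from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 3 style; recent Selenium 4 releases take a Service object instead of executable_path
driver = webdriver.Chrome(executable_path="/Applications/chromedriver84")
driver.get(url)
# Wait up to 10s until the element is loaded on the page
element = WebDriverWait(driver, 10).until(
    # Locate element by id
    EC.presence_of_element_located((By.ID, 'gsc_bpf_more'))
)
# Click only after the wait succeeds; clicking inside `finally` raised
# NameError whenever the wait timed out and `element` was never assigned
element.click()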
How can I combine these two pieces of code so that the "Show more" button is clicked and the whole page is scraped? Thanks in advance.
This script will print all the titles and authors on the page:
Prints:
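The two snippets from the question can be combined roughly as follows. This is a minimal sketch, not necessarily the script referred to above. It assumes Selenium 4 (hence `Service` rather than `executable_path`) and assumes that Google Scholar disables the `gsc_bpf_more` button once every row has been loaded. The helper names `click_until_disabled`, `parse_publications`, and `scrape_profile`, as well as the `max_clicks` and `pause` parameters, are made up for illustration.

```python
import time

from bs4 import BeautifulSoup


def click_until_disabled(find_button, max_clicks=50, pause=1.0):
    """Click the button returned by find_button() until it reports disabled.

    Returns the number of clicks performed. max_clicks is a safety cap so the
    loop cannot run forever if the button never becomes disabled.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        # Assumption: the "Show more" button is disabled when nothing is left.
        if not button.is_enabled():
            break
        button.click()
        clicks += 1
        time.sleep(pause)  # crude pause while the next batch of rows loads
    return clicks


def parse_publications(html):
    """Return a list of (title, authors) pairs found in the page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr", {"class": "gsc_a_tr"}):
        title = row.find("a", {"class": "gsc_a_at"})
        authors = row.find("div", {"class": "gs_gray"})
        if title is None or authors is None:
            continue  # skip rows that lack the expected markup
        results.append((title.text, authors.text))
    return results


def scrape_profile(url, driver_path):
    """Open the profile in Chrome, expand it fully, then parse the HTML."""
    # Imported here so the two helpers above also work without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(url)
        # Wait up to 10s until the button is present on the page
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "gsc_bpf_more"))
        )
        click_until_disabled(lambda: driver.find_element(By.ID, "gsc_bpf_more"))
        # Hand the fully expanded page to Beautiful Soup
        return parse_publications(driver.page_source)
    finally:
        driver.quit()
```

Calling `scrape_profile(url, "/Applications/chromedriver84")` with the `url` from the question should return the pairs, which can then be printed in a loop as in the original snippet. The key point is that Beautiful Soup parses `driver.page_source` rather than a fresh `requests.get(url)` response, since the extra rows only exist in the browser after the clicks.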