Get all hidden links

Posted 2024-04-25 17:01:44


I'm new here. I want to get all of the hidden links from this markup:

<div class="page-body">
  <div class="page-title"></div>
  <div class="page cursorPointer">
    <a title="" data-placement="top" data-toggle="tooltip" href="#" data-original-title="Verified"></a></div>
</div>

Here is my script:

#!/usr/bin/python3
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.chrome.options import Options
import requests
import re
from openpyxl import Workbook

options = Options()
driver = webdriver.Chrome(options=options)

driver.get("https://someurl.com")

pagelist = []

content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
for a in soup.findAll('div', attrs={'class': 'page cursorPointer'}):
    page = a.find_element_by_xpath("//a[@href]")

pagelist.append(page.get_attribute("href"))

df = pd.DataFrame({'Page': pagelist})
df.to_excel('pagelist.xlsx', index=False, encoding='utf-8')

I get this error:

page = a.find_element_by_xpath("//a[@href]")
TypeError: 'NoneType' object is not callable


1 answer

User

#1 · Posted 2024-04-25 17:01:44

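This happens because you are calling a Selenium method on a BeautifulSoup object. BeautifulSoup treats an unknown attribute such as find_element_by_xpath as a search for a child tag with that name, finds nothing, and returns None, so calling it raises the TypeError. Try this instead, which asks the browser for the links directly: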

pagelist = driver.execute_script("""
  return [...document.querySelectorAll('a[href]')].map(a => a.href)
""")
