Python GET request returns different HTML than view source

from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.PhantomJS()
browser.get("http://archiveofourown.org/works/6846694")
soup = BeautifulSoup(browser.page_source, "html.parser")
soup.prettify()

2 answers

User

Answer 1 · edited 2024-06-10 18:08:06

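If you need the complete page source (after all JavaScript has run and all asynchronous requests have finished), then your last approach is a step in the right direction. You are missing only one thing: you need to give PhantomJS time to load the page before reading the source (pun intended).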

You also need to click "Proceed" to agree to view adult content:

from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


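# PhantomJS is deprecated in Selenium and missing from current Selenium 4 releases;
# a headless Chrome or Firefox driver is the usual replacement there.
driver = webdriver.PhantomJS()
driver.get("http://archiveofourown.org/works/6846694")

# wait up to 10 seconds for each condition below
wait = WebDriverWait(driver, 10)

# click "Proceed" on the adult-content warning page
proceed = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Proceed")))
proceed.click()

# wait for the work's content container to be present
wait.until(EC.presence_of_element_located((By.ID, "workskin")))

soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.prettify())
driver.quit()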
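WebDriverWait(...).until(...) is what gives the page time to load: it repeatedly checks a condition until the condition returns a truthy value or the timeout expires. The polling idea can be sketched with only the standard library. This is an illustration, not Selenium's implementation; wait_until and fake_page_ready are hypothetical names, and the fake page simply becomes "ready" on its third check. (Selenium itself raises its own TimeoutException and polls every 0.5 seconds by default.)

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition met: hand back whatever it returned
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)  # give the "page" time before checking again


# Hypothetical stand-in for a page whose content appears after a delay.
calls = {"n": 0}


def fake_page_ready():
    calls["n"] += 1
    return "<div id='workskin'>...</div>" if calls["n"] >= 3 else None


html = wait_until(fake_page_ready, timeout=5, poll=0.01)
print(html)
```

Reading page_source only after such a wait succeeds is what makes the scraped HTML match what the browser eventually shows.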

User

Answer 2 · edited 2024-06-10 18:08:06

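Alexce explained why your code does not give you what you want. If all you want is the text from the source, you can get it with a plain request by adding the parameter view_adult=true: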

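import requests
from bs4 import BeautifulSoup

# view_adult=true skips the adult-content warning page
url = "http://archiveofourown.org/works/6846694?view_adult=true"

r = requests.get(url)
r.raise_for_status()  # fail loudly on HTTP errors
soup = BeautifulSoup(r.content, "lxml")  # requires the lxml package

# the first chapter's content and the work's preface block
chap = soup.select_one("#chapter-1")
preface = soup.select_one("div.preface.group")

print(preface)
print(chap)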

This prints the work's preface block followed by the chapter-1 element.

Hopefully that is what you need.
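If you start from a plain work URL, the view_adult=true parameter from this answer can be added programmatically. A minimal standard-library sketch, assuming only that this parameter name and value are what the site expects; with_view_adult is a hypothetical helper name:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_view_adult(url: str) -> str:
    """Return url with view_adult=true set in its query string.

    Other parameters and any #fragment are kept. Repeated keys collapse
    to their last value because the query is parsed into a dict.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["view_adult"] = "true"  # overrides any existing view_adult value
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_view_adult("http://archiveofourown.org/works/6846694"))
```

With requests, passing params={"view_adult": "true"} to requests.get should have the same effect, since requests appends params to the URL's query string.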
