Selenium page source is truncated

Posted 2024-04-28 03:03:23


I want the source code of this page, for example: https://paris-sportifs.pmu.fr/event/699032

Here is my code snippet:

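import time
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.options import Options

event_url = "https://paris-sportifs.pmu.fr/event/699032"

opts = Options()
opts.add_argument("Host=[paris-sportifs.pmu.fr]")
capabilities = DesiredCapabilities.FIREFOX
capabilities["marionette"] = True
# Selenium 3 style API: `capabilities=` and find_elements_by_* were removed in later Selenium 4 releases
browser = webdriver.Firefox(options=opts, capabilities=capabilities)
browser.get(event_url)
time.sleep(3)  # give the page time to render
# Expand every collapsed section with a JavaScript click
div_list = browser.find_elements_by_class_name('table--header--inner.collapsed')
for item in div_list:
    browser.execute_script("arguments[0].click();", item)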

The clicks work fine, and if I choose "View Page Source" in the Selenium-driven browser, the source is what I expect. But if I do this:

html = browser.page_source
print(len(browser.page_source))

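The resulting length is 1472551, rather than the 4 million-plus I expect.

I have tried waiting for a while (I even polled in a while True loop to see whether the result changed), but nothing helped.

Any ideas?

Thanks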


1 answer
Anonymous user
Answer #1 · Posted 2024-04-28 03:03:23

From the documentation (https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/remote/RemoteWebDriver.html#getPageSource()):

Description copied from interface: WebDriver Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.

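So page_source may not give you the full HTML of the DOM you expect. To get that, you may have to retrieve it some other way, as described in this answer: Python Selenium accessing HTML source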
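
One commonly used alternative, sketched here as an assumption because the linked answer is not reproduced on this page, is to ask the browser to serialize the live DOM through JavaScript instead of relying on page_source. The helper name get_rendered_html is hypothetical; execute_script is the same WebDriver call the question already uses for the clicks.

```python
def get_rendered_html(driver):
    """Return the browser's current DOM serialized as an HTML string.

    `driver` is any Selenium WebDriver instance, for example the `browser`
    object from the question. The helper name is hypothetical.
    """
    # outerHTML serializes the live <html> element, including nodes that
    # JavaScript added or changed after the initial page load.
    return driver.execute_script("return document.documentElement.outerHTML;")
```

After the clicks in the question have run, html = get_rendered_html(browser) followed by print(len(html)) would show whether the serialized DOM is closer to the expected size. Whether this recovers the full 4 million characters on this particular page is not guaranteed.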
