Scraping: unable to access information from the web

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.rockethub.com/projects/34210-lunar-lion-the-first-ever-university-led-mission-to-the-moon")
tree = BeautifulSoup(html, "lxml")
description = tree.find('div', {'id': 'description_section', 'class': 'description-section'})

2 Answers

User

Answer 1 · edited 2024-06-06 04:13:59

I figured out how to scrape it with R:

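library("rvest")

# The description is served from a separate /description URL
url <- "https://www.rockethub.com/projects/34210-lunar-lion-the-first-ever-university-led-mission-to-the-moon/description"

url %>%
  read_html() %>%  # html() is deprecated in rvest; read_html() replaces it
  html_nodes(xpath = '//div[@id="description_section"]') %>%
  html_text()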

User

Answer 2 · edited 2024-06-06 04:13:59

You need to make an additional request to get the description. Here is a complete working example using requests + BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = "https://www.rockethub.com/projects/34210-lunar-lion-the-first-ever-university-led-mission-to-the-moon/"
with requests.Session() as session:
    session.headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
    }

    # get the token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("meta", {"name": "csrf-token"})["content"]

    # get the description
    description_url = url + "description"
    response = session.get(description_url, headers={"X-CSRF-Token": token, "X-Requested-With": "XMLHttpRequest"})

    soup = BeautifulSoup(response.content, "html.parser")
    description = soup.find('div', {'id':'description_section', 'class': 'description-section'})
    print(description.get_text(strip=True))
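
One fragile spot in the example above is `url + "description"`, which only gives the right address when `url` ends with a slash (the URL in the question does not). Below is a minimal sketch of a helper that normalizes this, assuming the same `/description` endpoint and the same two request headers as the example. The name `build_description_request` is hypothetical and not part of requests or BeautifulSoup.

```python
from urllib.parse import urljoin


def build_description_request(project_url, csrf_token):
    """Return (description_url, headers) for the follow-up request.

    Hypothetical helper: assumes the description lives at
    <project_url>/description and is fetched as an XHR request
    carrying the page's CSRF token, as in the example above.
    """
    # urljoin replaces the last path segment unless the base ends with "/",
    # so make sure the trailing slash is present first.
    base = project_url if project_url.endswith("/") else project_url + "/"
    description_url = urljoin(base, "description")
    headers = {
        "X-CSRF-Token": csrf_token,
        "X-Requested-With": "XMLHttpRequest",
    }
    return description_url, headers
```

With this in place, the second request could be written as `session.get(description_url, headers=headers)` after unpacking the returned tuple.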
