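I need to fetch the entire HTML from journal_url, which in this example is http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues. I have followed the requests examples shown in several questions on this site, but neither the .text attribute nor the .json() method on the response from requests.get returns the HTML I expect. My goal is to show the entire HTML, including the ordered list under each year and the volume drop-down list.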
import requests
import pandas as pd
import http.cookiejar

# df is assumed to be a pandas DataFrame loaded earlier, with the columns used below.
for i in range(len(df)):
    journal_name = df.loc[i, "Journal Full Title"]
    journal_url = df.loc[i, "URL"] + "/issues"
    access_start = df.loc[i, "Content Start Date"]
    access_end = df.loc[i, "Content End Date"]
    # Cookie handling that was tried and is currently disabled:
    #cj = http.cookiejar.CookieJar()
    #opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    }
    r = requests.get(journal_url, headers=headers)
    response = r.text  # response body decoded as text
    print(response)
If your end goal is to parse the content you mentioned above from that page, then it is:
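As a minimal sketch of such parsing (not the answerer's code: the tag names `ol`, `li`, and `option` are assumptions about the page markup, and `IssueListParser` and `extract_issue_lists` are hypothetical names), the standard-library `html.parser` module can collect the ordered-list entries and the drop-down options from an HTML string:

```python
from html.parser import HTMLParser


class IssueListParser(HTMLParser):
    """Collect text from <li> elements inside <ol>, and from <option> elements.

    The tag names are assumptions about the page layout, not taken from
    the actual Wiley markup.
    """

    def __init__(self):
        super().__init__()
        self.ol_depth = 0      # number of <ol> elements currently open
        self.current = None    # "li" or "option" while text is being collected
        self.buffer = []       # text fragments of the element being collected
        self.list_items = []
        self.options = []

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self.ol_depth += 1
        elif tag == "li" and self.ol_depth > 0:
            self.current, self.buffer = "li", []
        elif tag == "option":
            self.current, self.buffer = "option", []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "ol" and self.ol_depth > 0:
            self.ol_depth -= 1
        elif tag == self.current:
            # Collapse runs of whitespace and store non-empty text.
            text = " ".join("".join(self.buffer).split())
            target = self.list_items if tag == "li" else self.options
            if text:
                target.append(text)
            self.current = None


def extract_issue_lists(html):
    """Return (ordered-list entries, drop-down options) found in html."""
    parser = IssueListParser()
    parser.feed(html)
    parser.close()
    return parser.list_items, parser.options
```

With the question's code, this could be called as `items, options = extract_issue_lists(r.text)`. requests does not execute JavaScript, so if the page fills in these lists with scripts after loading, they will not appear in `r.text` and both results will be empty.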