When I crawl a website with urllib2, the response has no tags such as html or body

Posted 2024-05-12 23:57:23


import urllib2

url = 'http://www.bilibili.com/video/av1669338'
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
headers = {"User-Agent": user_agent}
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
text = response.read()
text[:100]

Output, raw bytes instead of HTML markup:

'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 ... \xe5\xaf\xf0~Y\xd5\xd5\xa8\xeeF\x83\xa7'
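A common cause of output like this (an assumption here, not confirmed in the thread) is that the server sends a compressed body, for example gzip, even though the request did not ask for it. urllib2 hands back those bytes unchanged, whereas requests, which both answers below use, automatically decodes gzip and deflate bodies. The sketch below is Python 3 (gzip.decompress needs 3.2+); the names decode_body and GZIP_MAGIC are hypothetical. It checks the Content-Encoding header and also the gzip magic number, in case the header is missing.

```python
import gzip
import zlib

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw, content_encoding=None):
    """Return the response body as uncompressed bytes (hypothetical helper).

    Assumption: the server may compress the body without being asked,
    so we look at the Content-Encoding header *and* the gzip magic bytes.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip" or raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # "deflate" is supposed to be zlib-wrapped data...
            return zlib.decompress(raw)
        except zlib.error:
            # ...but some servers send raw deflate without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # Not compressed (or an encoding this sketch does not handle).
    return raw
```

With Python 3's urllib.request (the successor of urllib2) this could be used as `decode_body(response.read(), response.getheader("Content-Encoding"))`, followed by `.decode("utf-8")` if the page is UTF-8 (also an assumption; check the Content-Type charset).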


2 answers

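import requests
from bs4 import BeautifulSoup

def data():
    url = 'http://www.bilibili.com/video/av1669338'
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    # response.content is the body as bytes, already decompressed by requests
    data = response.content
    _html = BeautifulSoup(data, "html.parser")
    # Select <meta name="keywords"> inside <head> and print its content attribute
    _meta = _html.head.select('meta[name=keywords]')
    print _meta[0]['content']

data()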

Try this:

import bs4, requests
res = requests.get("http://www.bilibili.com/video/av1669338")
soup = bs4.BeautifulSoup(res.content, "lxml")
result = soup.find("meta", attrs = {"name":"keywords"}).get("content")
print result
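Both answers rely on BeautifulSoup to pull out the keywords meta tag. As an alternative that needs no third-party parser, the same lookup can be sketched with the standard library's html.parser.HTMLParser (Python 3). This is a swapped-in technique, not what either answer uses; the names MetaKeywordsParser and meta_keywords are hypothetical, and the input must already be decompressed and decoded to a str.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Remember the content of the first <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list
        # of (name, value) pairs, where value may be None.
        if tag == "meta" and self.keywords is None:
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "keywords":
                self.keywords = attr_map.get("content")


def meta_keywords(html_text):
    """Return the keywords string, or None if the page has no such tag."""
    parser = MetaKeywordsParser()
    parser.feed(html_text)
    parser.close()
    return parser.keywords
```

Self-closing tags such as `<meta ... />` are also covered, because HTMLParser's default handle_startendtag calls handle_starttag.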
