HTTP banner grabbing with Python

1 vote

2 answers

2298 views

数据工程师

Asked 2025-04-16 00:11

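I want to build an HTTP banner grabber, but when I connect to a server on port 80 and send a request (for example "HEAD / HTTP/1.1"), the receive call does not return anything the way it does when I use netcat.

What should I do?

Thanks!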

Tags: http, receiving data, web scraping, server, request

2 answers

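Did you send a "\r\n\r\n" to mark the end of the request? If you did not, the server is still waiting for the rest of the request, so it never sends a response.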
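A minimal sketch of what that looks like with a raw socket, written for Python 3. The function names (build_head_request, read_headers, grab_banner) are made up for this illustration and do not come from the question. The sketch also assumes the server wants a Host header, which HTTP/1.1 requires, and it sends "Connection: close" so the server closes the connection once it has replied.

```python
import socket


def build_head_request(host):
    """Return a complete HEAD request, including the terminating blank line."""
    lines = [
        "HEAD / HTTP/1.1",
        "Host: " + host,      # HTTP/1.1 requires a Host header
        "Connection: close",  # ask the server to close after responding
        "",                   # these two empty items make the join end
        "",                   # with "\r\n\r\n", the end-of-request marker
    ]
    return "\r\n".join(lines).encode("ascii")


def read_headers(sock):
    """Read from sock until the blank line that ends the response headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        data += chunk
    # Keep only the status line and headers, drop anything after them.
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")


def grab_banner(host, port=80, timeout=5.0):
    """Connect, send a HEAD request and return the raw response headers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host))
        return read_headers(sock)
```

Calling grab_banner("www.example.com") should return the status line and the response headers as one string; the Server header, when the server sends one, is the banner. The timeout is there so that a server that never answers raises an error rather than blocking forever.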

Answered 2025-04-16 by Python大师


Try the urllib2 module (Python 2; in Python 3 the same functionality lives in urllib.request).

>>> import urllib2
>>> data = urllib2.urlopen('http://www.example.com').read()
>>> print data
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=utf-8">
  <TITLE>Example Web Page</TITLE>
</HEAD> 
<body>  
<p>You have reached this web page by typing &quot;example.com&quot;,
&quot;example.net&quot;,
  or &quot;example.org&quot; into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available 
  for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 
  2606</a>, Section 3.</p>
</BODY>
</HTML>

>>>

If you only read the body, as in the example, you will miss some details. To look at the content-type header:

>>> stream = urllib2.urlopen('http://www.example.com')
>>> stream.headers['content-type']
'text/html; charset=UTF-8'
>>> data = stream.read()
>>> print data[:100]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv=
>>>
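For banner grabbing, the body is not needed, so a HEAD request is enough. The following is a rough Python 3 sketch of the same idea as the session above, using urllib.request. The helper names (make_head_request, fetch_headers, server_banner) are invented for this example, and the sketch assumes the server reports its banner in the Server header, which not every server does.

```python
import urllib.request


def make_head_request(url):
    """Build a Request object that uses the HEAD method instead of GET."""
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url, timeout=5.0):
    """Send a HEAD request and return the response headers as a dict."""
    with urllib.request.urlopen(make_head_request(url), timeout=timeout) as resp:
        return dict(resp.headers.items())


def server_banner(headers):
    """Return the Server header value (case-insensitive lookup), or None."""
    for name, value in headers.items():
        if name.lower() == "server":
            return value
    return None
```

A call such as server_banner(fetch_headers("http://www.example.com")) would return the banner string if the server sends one. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses; the headers are still available on that exception object.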

Answered 2025-04-16 by Python大师
