Reading a URL socket in reverse with Python

Posted 2024-03-29 09:31:54


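I am trying to extract information from a log file that is published online and read the output. The only information I really need is at the end of the file. The files are quite large, and storing the entire socket output in a variable and then reading it consumes a lot of memory. Is there a way to read the socket from the bottom up?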

What I currently have:

socket = urllib.urlopen(urlString)
OUTPUT = socket.read()
socket.close()
OUTPUT = OUTPUT.split("\n")
for line in OUTPUT:
    if "xxxx" in line:
        print line

I am using Python 2.7. I would really like to read only about the last 30 lines of the socket's output.


1 answer
User
#1 · posted 2024-03-29 09:31:54

For this use case, what you need is an HTTP Range request. Here is a tutorial I found:

http://stuff-things.net/2015/05/13/web-scale-http-tail/

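To clarify: the benefit of first sending a HEAD request to get the size and then making a Range request is that you do not have to transfer the entire content. You mentioned that your files are fairly large, so this should be the best solution :)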

Edit: added the code below.

Below is a demo (a simplified version) of that blog post, translated into Python. Note that this will not work with every HTTP server! More comments inline:

"""
illustration of how to 'tail' a file using http. this will not work on all
webservers! if you need an http server to test with you can try the
rangehttpserver module:

$ pip install requests
$ pip install rangehttpserver
$ python -m RangeHTTPServer
"""
import requests

TAIL_SIZE = 1024

url = 'http://localhost:8000/lorem-ipsum.txt'
response = requests.head(url)

# not all servers return content-length in head, for some reason
assert 'content-length' in response.headers, 'Content length unknown- out of luck!'

# check the resource length and construct a request header for that range;
# byte ranges are inclusive, so the last byte of the file is full_length - 1
full_length = int(response.headers['content-length'])
assert full_length > TAIL_SIZE
headers = {
    'range': 'bytes={}-{}'.format(full_length - TAIL_SIZE, full_length - 1)
}

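# Make a get request, with the range header
response = requests.get(url, headers=headers)
# 206 Partial Content means the server honored the range
assert response.status_code == 206, 'Server ignored the range header'
assert 'accept-ranges' in response.headers, 'Accept-ranges response header missing'
assert response.headers['accept-ranges'] == 'bytes'
# compare raw bytes: len(response.text) counts decoded characters, not bytes
assert len(response.content) == TAIL_SIZE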

# Without the range header you get the entire file
response = requests.get(url)
assert len(response.content) == full_length
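
To get closer to what the question asks for (roughly the last 30 lines), here is a minimal sketch of a variation that is not taken from the blog post: it uses a suffix byte range (`bytes=-N`, defined in RFC 7233), which asks for the final N bytes directly and so skips the HEAD request. It is written for Python 3 with only the standard library. The helper names (`suffix_range_header`, `last_lines`, `tail_url`), the 8192-byte default, and the UTF-8 decoding are my own assumptions, and the same caveat applies: a server that ignores the range will answer 200 and send the whole file.

```python
import urllib.request


def suffix_range_header(num_bytes):
    """Build a Range header asking for the final num_bytes bytes of a resource."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return {"Range": "bytes=-{}".format(num_bytes)}


def last_lines(data, count, starts_mid_line=True):
    """Return up to `count` trailing lines of `data` (bytes) as text.

    When the chunk was cut out of the middle of a file, its first line is
    probably incomplete, so it is dropped if starts_mid_line is True.
    UTF-8 is an assumption; undecodable bytes are replaced, not raised on.
    """
    if count <= 0:
        return []
    lines = data.decode("utf-8", errors="replace").splitlines()
    if starts_mid_line and len(lines) > 1:
        lines = lines[1:]
    return lines[-count:]


def tail_url(url, count=30, num_bytes=8192):
    """Fetch roughly the last num_bytes bytes of url and return the last lines."""
    request = urllib.request.Request(url, headers=suffix_range_header(num_bytes))
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honored; 200 means the whole file was sent,
        # which is exactly the memory cost this approach tries to avoid.
        partial = response.status == 206
        content_range = response.headers.get("Content-Range", "")
        data = response.read()
    # If the file is shorter than num_bytes the chunk starts at byte 0,
    # so its first line is complete and should be kept.
    starts_mid_line = partial and not content_range.startswith("bytes 0-")
    return last_lines(data, count, starts_mid_line)
```

Calling `tail_url(url)` and keeping only the lines where `"xxxx" in line` reproduces the filter from the question. If 8192 bytes turn out to hold fewer than 30 lines, raise `num_bytes`; the right value depends on how long the log lines are.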
