How to capture curl output from a Python script

Posted 2024-05-15 11:21:54


I want to get information about a web page using curl, but from within Python. So far I have:

os.system("curl --head www.google.com")

When I run it, it prints:

HTTP/1.1 200 OK
Date: Sun, 15 Apr 2012 00:50:13 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=3e39ad65c9fa03f3:FF=0:TM=1334451013:LM=1334451013:S=IyFnmKZh0Ck4xfJ4; expires=Tue, 15-Apr-2014 00:50:13 GMT; path=/; domain=.google.com
Set-Cookie: NID=58=Giz8e5-6p4cDNmx9j9QLwCbqhRksc907LDDO6WYeeV-hRbugTLTLvyjswf6Vk1xd6FPAGi8VOPaJVXm14TBm-0Seu1_331zS6gPHfFp4u4rRkXtSR9Un0hg-smEqByZO; expires=Mon, 15-Oct-2012 00:50:13 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked

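What I want to do is match the 200 in that output with a regex (I don't need help with that part), but I can't find a way to get all of the text above into a string. How do I do that?

I tried: info = os.system("curl --head www.google.com") but info is just 0.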


3 answers

Try this:

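import http.client  # this module was named httplib in Python 2

conn = http.client.HTTPConnection("www.python.org")
conn.request("GET", "/index.html")
r1 = conn.getresponse()
# status is the numeric code, reason is the phrase that accompanies it
print(r1.status, r1.reason)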

Try this, using subprocess.Popen:

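import subprocess

# Run curl and collect its standard output through a pipe
proc = subprocess.Popen(["curl", "--head", "www.google.com"], stdout=subprocess.PIPE)
(out, err) = proc.communicate()  # err is None because stderr is not piped
print(out.decode())  # out is bytes in Python 3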

As the documentation states:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as:

os.system
os.spawn*
os.popen*
popen2.*
commands.*
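As for why info was 0 in the question: os.system returns the command's exit status, not its text, and the output goes straight to the terminal. A minimal sketch of capturing the text and pulling out the status code follows. The helper names capture and status_code are hypothetical, and a child Python process stands in for curl so the sketch runs without network access.

```python
import re
import subprocess
import sys


def capture(cmd):
    """Run cmd (a list of arguments) and return everything it wrote to stdout."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


def status_code(headers):
    """Return the code from the first HTTP status line found, or None."""
    match = re.search(r"^HTTP/\d+(?:\.\d+)?\s+(\d{3})", headers, re.MULTILINE)
    return int(match.group(1)) if match else None


# Stand-in for curl (an assumption for illustration): a child Python
# process that prints a single status line.
fake_curl = [sys.executable, "-c", "print('HTTP/1.1 200 OK')"]
print(status_code(capture(fake_curl)))
```

With curl installed, the real call would be along the lines of capture(["curl", "--silent", "--head", "www.google.com"]); --silent keeps the progress meter out of the way.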

I don't know why... but I needed to use curl (no pycurl, httplib2, ...), so maybe this can help someone:

import os

# os.popen returns a file-like object connected to the command's stdout
result = os.popen("curl http://google.es").read()
print(result)
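For comparison, roughly the same one-call style is available through subprocess.check_output, which takes the command as a list and so avoids going through a shell. This is only a sketch: read_output is a hypothetical helper name, and a child Python process stands in for curl so it runs offline.

```python
import subprocess
import sys


def read_output(cmd):
    """Run cmd (a list of arguments) and return its stdout as a string."""
    # check_output raises CalledProcessError if the command exits non-zero
    return subprocess.check_output(cmd, text=True)


# Stand-in command (an assumption for illustration), used in place of
# ["curl", "http://google.es"].
print(read_output([sys.executable, "-c", "print('hello from child')"]))
```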
