pycurl and curl behave differently when requesting the same resource; curl correctly returns a JSON object, while PycURL returns HTML

Posted 2024-05-23 21:47:02


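ipinfo.io provides information about the website/server behind an IP address, either when you enter the address on its website or when you send it a request with the curl command-line utility, for example: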

$ curl  https://ipinfo.io/172.217.169.6

Output in JSON format:

{
  "ip": "172.217.169.68",
  "hostname": "lhr48s09-in-f4.1e100.net",
  "city": "London",
  "region": "England",
  "country": "GB",
  "loc": "51.5085,-0.1257",
  "org": "AS15169 Google LLC",
  "postal": "EC1A",
  "timezone": "Europe/London",
  "readme": "https://ipinfo.io/missingauth"
}

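What I ultimately want to do is perform this request in Python and store the result as a JSON object. I believed the following code using PycURL should produce the same output: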

import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, "https://ipinfo.io/172.217.169.6")
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()  # close() must be called; a bare c.close does nothing

body = buffer.getvalue()
print(body.decode('iso-8859-1'))

That is, it should write the same JSON string into the buffer.

However, it prints a large amount of HTML instead; I suspect PycURL is receiving the HTML of the actual web page rather than the JSON data. For example:

<!DOCTYPE html>
<html>
<head>
    <title>
    172.217.169.6 IP Address Details
 - IPinfo.io</title>
    <meta charset="utf-8">
    <meta name="apple-itunes-app" content="app-id=917634022">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, user-scalable=no">
    <meta name="description" content="Full IP address details for 172.217.169.6 (AS15169 Google LLC) including geolocation and map, hostname, and API details.">

    <link rel="manifest" href="/static/manifest.json">
    <link rel="icon" sizes="48x48" href="/static/deviceicons/android-icon-48x48.png">


...
    

</html>

Essentially, how do I get PycURL to receive this JSON data as well?



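I tried comparing the verbose output of the two, but I could not work out why they behave differently. The only difference I found is the Content-Type field: "application/json" for curl and "text/html" for PycURL, which explains the different output. At the risk of making this post overly long, I have included both traces below: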

curl (command line) verbose output:

$ curl -v https://ipinfo.io/172.217.169.6
*   Trying 34.117.59.81:443...
* TCP_NODELAY set
* Connected to ipinfo.io (34.117.59.81) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=ipinfo.io
*  start date: Jul 10 20:18:59 2021 GMT
*  expire date: Oct  8 21:18:59 2021 GMT
*  subjectAltName: host "ipinfo.io" matched cert's "ipinfo.io"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55a887a40e10)
> GET /172.217.169.6 HTTP/2
> Host: ipinfo.io
> user-agent: curl/7.68.0
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200 
< access-control-allow-origin: *
< x-frame-options: DENY
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< content-type: application/json; charset=utf-8
< content-length: 286
< date: Tue, 27 Jul 2021 21:03:50 GMT
< x-envoy-upstream-service-time: 1
< via: 1.1 google
< alt-svc: clear
< 
{
  "ip": "172.217.169.6",
  "hostname": "lhr25s26-in-f6.1e100.net",
  "city": "London",
  "region": "England",
  "country": "GB",
  "loc": "51.5085,-0.1257",
  "org": "AS15169 Google LLC",
  "postal": "EC1A",
  "timezone": "Europe/London",
  "readme": "https://ipinfo.io/missingauth"
* Connection #0 to host ipinfo.io left intact
}

PycURL verbose output:

$ python3 ip_helper.py
*   Trying 34.117.59.81:443...
* TCP_NODELAY set
* Connected to ipinfo.io (34.117.59.81) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=ipinfo.io
*  start date: Jul 10 20:18:59 2021 GMT
*  expire date: Oct  8 21:18:59 2021 GMT
*  subjectAltName: host "ipinfo.io" matched cert's "ipinfo.io"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x19d65c0)
> GET /172.217.169.6 HTTP/2
Host: ipinfo.io
user-agent: PycURL/7.43.0.6 libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
accept: */*

* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 200 
< access-control-allow-origin: *
< x-frame-options: DENY
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< referrer-policy: strict-origin-when-cross-origin
< content-type: text/html; charset=utf-8
< content-length: 44645
< date: Tue, 27 Jul 2021 21:07:50 GMT
< x-envoy-upstream-service-time: 13
< via: 1.1 google
< alt-svc: clear
< 
* Connection #0 to host ipinfo.io left intact
<!DOCTYPE html>
<html>
<head>
    <title>
    172.217.169.6 IP Address Details
 - IPinfo.io</title>
    <meta charset="utf-8">
    <meta name="apple-itunes-app" content="app-id=917634022">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, user-scalable=no">
    <meta name="description" content="
    
        Full IP address details for 172.217.169.6 (AS15169 Google LLC) including geolocation and map, hostname, and API details.
    
">

    <link rel="manifest" href="/static/manifest.json">
    <link rel="icon" sizes="48x48" href="/static/deviceicons/android-icon-48x48.png">


...

</html>
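
A trace like the one above requires the VERBOSE option, which the snippet shown earlier does not set. The following is a minimal sketch, assuming that is how the trace was produced; it also collects the response headers through HEADERFUNCTION so that the Content-Type can be checked in code rather than by eye. The helper names `parse_header_lines` and `fetch_with_headers` are illustrative and not part of PycURL.

```python
from io import BytesIO


def parse_header_lines(raw_lines):
    """Turn raw header lines (bytes) into a dict keyed by lower-cased name."""
    headers = {}
    for raw in raw_lines:
        line = raw.decode("iso-8859-1").strip()
        if ":" not in line:
            continue  # status line (e.g. "HTTP/2 200") or blank separator
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return headers


def fetch_with_headers(url, verbose=False):
    """Fetch url with PycURL and return (headers dict, body bytes)."""
    import pycurl  # imported here so the parsing helper works without PycURL

    body = BytesIO()
    raw_lines = []

    def collect(header_line):
        # libcurl calls this once per response header line.
        raw_lines.append(header_line)

    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.WRITEDATA, body)
        c.setopt(c.HEADERFUNCTION, collect)
        c.setopt(c.VERBOSE, verbose)  # True prints a trace to stderr
        c.perform()
    finally:
        c.close()
    return parse_header_lines(raw_lines), body.getvalue()
```

With this, `headers.get("content-type")` after a call to `fetch_with_headers` shows directly whether the server answered with JSON or HTML.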

Thank you for your time.


1 answer
User
Answer #1 · posted 2024-05-23 21:47:02

From the docs:

We try to automatically detect when someone wants to call our API versus view our website, and then we send back the appropriate JSON response rather than HTML. We do this based on the user agent for known popular programming languages, tools, and frameworks. However, there are a couple of other ways to force a JSON response when it doesn't happen automatically. One is to add /json to the URL, and the other is to set an Accept header to application/json
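
The quoted rules can be read as a small decision procedure. The following is only an illustrative sketch of that logic as described in the quote, not ipinfo.io's actual implementation; the function name `wants_json` and the tuple of recognised user-agent prefixes are assumptions made for the example (the docs do not list which agents are recognised).

```python
# Hypothetical: the real list of recognised user agents is not published.
KNOWN_API_AGENT_PREFIXES = ("curl",)


def wants_json(path, headers):
    """Return True if, under the quoted rules, a request would get JSON."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Rule 1: the URL path ends in /json.
    if path.rstrip("/").endswith("/json"):
        return True
    # Rule 2: the Accept header asks for application/json.
    if "application/json" in lowered.get("accept", ""):
        return True
    # Rule 3: the user agent looks like a known tool.
    agent = lowered.get("user-agent", "").lower()
    return agent.startswith(KNOWN_API_AGENT_PREFIXES)
```

Under these assumed rules, the command-line request (`user-agent: curl/7.68.0`) would qualify for JSON, whereas the PycURL request, whose user agent begins with `PycURL/`, would fall through to HTML unless the path or Accept header says otherwise.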

So there appear to be three different ways to get JSON back with PycURL:

  1. Append /json to your URL:
c.setopt(c.URL, "https://ipinfo.io/172.217.169.6/json")
  2. Set the Accept header so that only JSON responses are accepted:
c.setopt(c.HTTPHEADER, ["Accept: application/json"])
  3. Set the User-Agent header so the site thinks it is talking to curl rather than PycURL:
c.setopt(c.HTTPHEADER, ["User-Agent: curl"])
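
Combining the first two options with the original goal of storing the result as a JSON object, a minimal sketch might look like the following. The function names `build_json_url`, `decode_json_body`, and `fetch_ip_info` are illustrative and not part of PycURL, and the sketch assumes the body is UTF-8 encoded, as the `charset=utf-8` in the traces suggests.

```python
import json
from io import BytesIO


def build_json_url(ip):
    """Option 1: ask for JSON explicitly by appending /json to the URL."""
    return f"https://ipinfo.io/{ip}/json"


def decode_json_body(body):
    """Decode a response body (bytes) and parse it into a Python object."""
    return json.loads(body.decode("utf-8"))


def fetch_ip_info(ip):
    """Fetch details for ip with PycURL and return them as a dict."""
    import pycurl  # imported lazily so the helpers above work without PycURL

    buffer = BytesIO()
    c = pycurl.Curl()
    try:
        c.setopt(c.URL, build_json_url(ip))
        # Option 2: also state that only JSON is acceptable.
        c.setopt(c.HTTPHEADER, ["Accept: application/json"])
        c.setopt(c.WRITEDATA, buffer)
        c.perform()
    finally:
        c.close()  # always release the handle, even if perform() raises
    return decode_json_body(buffer.getvalue())
```

Calling `fetch_ip_info("172.217.169.6")` should then return a dict with keys such as `"city"` and `"org"`. Note that each `setopt(c.HTTPHEADER, ...)` call replaces the previous header list, so to send both an Accept and a User-Agent header they need to go in the same list.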
