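Handling the hash sign (#) in Python requests

3 answers

User

Answer 1 · edited 2024-04-18 21:16:10

Basically, any text after the hash sign in a URL is not sent to the server. This applies to both browsers and requests.

The format of the URL suggests that the type=#results part is actually meant to be a query parameter.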
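To see how a URL parser treats the two forms discussed in this answer, the following sketch uses the standard library's urllib.parse.urlsplit. The helper name split_url is only an illustration and is not part of requests or of the original answer.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (path, query, fragment) for a URL."""
    parts = urlsplit(url)
    return parts.path, parts.query, parts.fragment

# The fragment (everything after '#') stays on the client side; it is
# not part of the path or the query string.
print(split_url("https://httpbin.org/anything/type=#results"))
print(split_url("https://httpbin.org/anything?type=#results"))
```

In both cases "results" ends up in the fragment, so only the path and query would reach the server.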

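requests automatically encodes query parameters, whereas a browser does not. Below are various queries and what the server receives in each case:

URL parameter in the browser

When you use a hash sign in a browser, anything after the hash sign is not sent to the server: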

https://httpbin.org/anything/type=#results


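The URL received by the server is https://httpbin.org/anything/type=.
The requested page is called type=, which does not seem right.

Query parameter in the browser

The <key>=<value> format suggests that it may be a query parameter you are passing. Even so, anything after the hash sign is not sent to the server:

https://httpbin.org/anything?type=#results

Returns: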

{
  "args": {
    "type": ""
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,de;q=0.7", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "*redacted*"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "*redacted*", 
  "url": "https://httpbin.org/anything?type="
}

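The URL received by the server is https://httpbin.org/anything?type=.
The requested page is called anything.
The parameter type is received with no value.

Encoded query parameter in the browser

https://httpbin.org/anything?type=%23results

Returns: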

{
  "args": {
    "type": "#results"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,de;q=0.7", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "*redacted*"
  }, 
  "json": null, 
  "method": "GET", 
  "origin": "*redacted*", 
  "url": "https://httpbin.org/anything?type=%23results"
}

The URL received by the server is https://httpbin.org/anything?type=%23results.
The requested page is called anything.
The parameter type is received with the value #results.
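The difference between the unencoded and the encoded query can be approximated locally with the standard library. This is only a sketch of what a server might decode, not httpbin's actual implementation; server_side_args is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

def server_side_args(url):
    """Approximate the query arguments a server would decode from a URL."""
    query = urlsplit(url).query  # the fragment is already excluded here
    # keep_blank_values=True keeps parameters such as "type=" with an empty value
    return parse_qs(query, keep_blank_values=True)

print(server_side_args("https://httpbin.org/anything?type=#results"))
print(server_side_args("https://httpbin.org/anything?type=%23results"))
```

Note that parse_qs returns a list of values per key, whereas httpbin shows a single string in "args".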

Python requests with a URL parameter

requests does not send anything after the hash sign to the server either:

import requests

r = requests.get('https://httpbin.org/anything/type=#results')
print(r.url)
print(r.json())

Returns:

https://httpbin.org/anything/type=#results
{
    "args": {},
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.21.0"
    },
    "json": null,
    "method": "GET",
    "origin": "*redacted*",
    "url": "https://httpbin.org/anything/type="
}

The URL received by the server is https://httpbin.org/anything/type=.
The requested page is called type=.
No query parameters are received.
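One way to check this without contacting a server is to prepare the request and inspect it. The sketch below assumes the requests library is installed; request_target is a hypothetical helper name, while Request, prepare(), url, and path_url are part of requests' public API.

```python
import requests

def request_target(url, params=None):
    """Build a request without sending it; return (full URL, request target)."""
    prepared = requests.Request("GET", url, params=params).prepare()
    # path_url is the path plus query string used for the request line;
    # it does not include the fragment.
    return prepared.url, prepared.path_url

print(request_target("https://httpbin.org/anything/type=#results"))
print(request_target("https://httpbin.org/anything", params={"type": "#results"}))
```

The first element still shows the fragment because it is the client-side URL; the second element is what would actually be requested.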

Python requests with a query parameter

requests automatically encodes query parameters:

import requests

r = requests.get('https://httpbin.org/anything', params={'type': '#results'})
print(r.url)
print(r.json())

Returns:

https://httpbin.org/anything?type=%23results
{
    "args": {
        "type": "#results"
    },
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.21.0"
    },
    "json": null,
    "method": "GET",
    "origin": "*redacted*",
    "url": "https://httpbin.org/anything?type=%23results"
}

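The URL received by the server is https://httpbin.org/anything?type=%23results.
The requested page is called anything.
The parameter type is received with the value #results.

Python requests with a double-encoded query parameter

If you encode the query parameter manually and then pass it to requests, requests encodes the already-encoded parameter a second time: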

import requests

r = requests.get('https://httpbin.org/anything', params={'type': '%23results'})
print(r.url)
print(r.json())

Returns:

https://httpbin.org/anything?type=%2523results
{
    "args": {
        "type": "%23results"
    },
    "data": "",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.21.0"
    },
    "json": null,
    "method": "GET",
    "origin": "*redacted*",
    "url": "https://httpbin.org/anything?type=%2523results"
}

The URL received by the server is https://httpbin.org/anything?type=%2523results.
The requested page is called anything.
The parameter type is received with the value %23results.
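The double encoding can be reproduced with the standard library's urlencode, which applies the same kind of percent-encoding to a params dict. The following is a minimal sketch of one possible way to avoid it; encode_once is a hypothetical helper, and it assumes that no value is supposed to contain a literal percent-escape such as "%23".

```python
from urllib.parse import unquote, urlencode

def encode_once(params):
    """Encode a params dict once, even if values are already percent-encoded."""
    # unquote() turns '%23results' back into '#results'; values without
    # percent-escapes pass through unchanged.
    decoded = {key: unquote(value) for key, value in params.items()}
    return urlencode(decoded)

print(urlencode({"type": "%23results"}))    # the '%' itself gets encoded
print(encode_once({"type": "%23results"}))
print(encode_once({"type": "#results"}))
```

With requests, the simpler option is usually to pass the raw value ('#results') in params and let the library do the encoding.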

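User

Answer 2 · edited 2024-04-18 21:16:10

The answer by Cloudomation provides a lot of interesting information, but I think it may not be what you are looking for. Assuming this identical thread on the Python forum was also written by you, read on:

From the information you provided, type=#results appears to be used to filter the original CSV and return only part of the data.
If that is the case, the type= part is not really necessary (try the URL without it and make sure you get the same result).

Let me explain:

The # symbol in a URL is called a fragment identifier, and it serves different purposes on different types of pages. On a text/csv page, its role is to filter the CSV table by column, row, or a combination of both. You can read more about it here.

In your case, results may be a query parameter used to filter the CSV table in a custom way.

Unfortunately, as Cloudomation's answer shows, fragment data is not available on the server side, so you will not be able to access it through Python requests parameters the way you are trying to.

You can try accessing it in JavaScript, as suggested here, or simply download the entire (unfiltered) CSV table and filter it yourself.

There are many ways to do this easily and efficiently in Python. Look here for more information or, if you need more control, import the CSV into a pandas DataFrame.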
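As a rough illustration of filtering the table yourself, here is a minimal sketch using only the standard csv module. The sample data, the column names, and the helper filter_rows are all hypothetical; in practice csv_text would be the text of the downloaded CSV, and the column to filter on depends on the real file.

```python
import csv
import io

def filter_rows(csv_text, column, wanted):
    """Return the rows (as dicts) whose `column` equals `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == wanted]

# Hypothetical sample data standing in for the downloaded CSV.
sample = "player,type\nA,results\nB,details\nC,results\n"
print(filter_rows(sample, "type", "results"))
```

For larger files or more complex conditions, loading the data into a pandas DataFrame, as mentioned above, is likely more convenient.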

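Edit:

I see you found a workaround by concatenating strings and issuing a second request. Since that works, you might get away with simply converting the parameters to a string (as suggested here). If it achieves what you are after, this would be a more efficient and elegant solution: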

from requests import session

# Base URL taken from the asker's own code
statcast_url = 'https://baseballsavant.mlb.com/statcast_search/csv?'

params = {'key1': 'value1', 'key2': 'value2'}  # sample params dict

def _get_statcast_results(params):

    # convert params to a string - alternatively you can use %-formatting
    params_str = "&".join(f"{k}={v}" for k, v in params.items())

    s = session()

    data = s.get(statcast_url, params=params_str, timeout=30)

    return data.content

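User

Answer 3 · edited 2024-04-18 21:16:10

I have only run a single test, but I hope I have found a solution. Instead of passing "#results" through params, I start a session with the base URL plus all the other parameters, append "#results" to the resulting URL, and then run that through a second get.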

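from requests import session

statcast_url = 'https://baseballsavant.mlb.com/statcast_search/csv?'
results_url = '&type=#results&'

def _get_statcast_results(params):

    # First request: let requests build the URL from the base URL and params
    s = session()
    _get = s.get(statcast_url, params=params, timeout=30, allow_redirects=True)

    # Second request: append the '#results' part to the URL built above
    new_url = _get.url + results_url
    data = s.get(new_url, timeout=30)

    return data.content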

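More testing is needed, but I think this should work. Thanks to everyone for the support. Although I did not get a direct answer, the answers still helped me a lot.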
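When testing this further, it may help to check which part of the concatenated URL can actually reach the server. The sketch below splits a URL of the same shape with the standard library; the example URL and the helper sent_and_kept are hypothetical, with key1=value1 standing in for the real parameters.

```python
from urllib.parse import urlsplit

def sent_and_kept(url):
    """Split a URL into the part sent to the server and the local fragment."""
    parts = urlsplit(url)
    sent = parts.path + ("?" + parts.query if parts.query else "")
    return sent, parts.fragment

# Hypothetical URL of the shape produced by `_get.url + results_url`.
example = "https://baseballsavant.mlb.com/statcast_search/csv?key1=value1&type=#results&"
print(sent_and_kept(example))
```

If the fragment turns out to hold "results&", the second request is effectively the first one with an empty type parameter added, which is consistent with the first answer.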
