How do I build a list of requests in Scrapy based on the response to the first request?

Posted 2024-04-16 07:52:40


I want to scrape an API. The API returns some data together with the total number of records. I want to:

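  1. First call the API once to get the total number of records
  2. Then divide that total by the page size to get the total number of pages
  3. Finally, build the list of requests to send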
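
The division in step 2 has to round up, because a partially filled last page still counts as a page. A minimal sketch of that arithmetic, assuming the total is a non-negative integer; the function name `count_pages` is made up for illustration:

```python
def count_pages(total_items: int, page_size: int) -> int:
    """Return how many pages are needed to hold total_items records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division using integers only: -(-a // b) rounds up.
    return -(-total_items // page_size)


# 34037 records at 100 per page: 340 full pages plus one partial page.
print(count_pages(34037, 100))
```

Integer ceiling division avoids any floating-point rounding, which is why it is used here instead of dividing and rounding afterwards.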

But I don't know how to do this in Scrapy. This is my start_requests:

    def start_requests(self):
        url = "https://hkapi.centanet.com/api/Transaction/Map.json" 

        page = 1

        headers = {
            'lang': 'tc',
            'Content-Type': 'application/json; charset=UTF-8',
            'Connection': 'Keep-Alive',
            'User-Agent': 'okhttp/4.7.2' 
        }

        payload = {
            "daterange": 180,
            "postType": "s",
            "refdate": "20200701",
            "order": "desc",
            "page": f"{page}",
            "pageSize": 100,
            "pixelHeight": 2220,
            "pixelWidth": 1080,
            "points[0].lat": 22.695053063373795,
            "points[0].lng": 113.85844465345144,
            "points[1].lat": 22.695053063373795,
            "points[1].lng": 114.38281349837781,
            "points[2].lat": 21.993328259196705,
            "points[2].lng": 114.38281349837781,
            "points[3].lat": 21.993328259196705,
            "points[3].lng": 113.85844465345144,
            "sort": "score",
            "zoom": 9.745128631591797,
            "platform": "android"
        }

        yield scrapy.Request(url, callback=self.parse, method="POST", headers=headers, body=json.dumps(payload))

This is my parse:

    def parse(self, response):
        json_response = json.loads(response.text)
        yield json_response

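I think I can extract the total record count in the parse function and work out the number of pages. But how do I take that number and build a list of payloads?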

For example, if there are 3 pages in total, I would build a list of 3 payloads and then loop over it.
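
A list like that can be built by copying one base payload and changing only its page field. The sketch below is plain Python with no Scrapy dependency; `build_payloads` and the shortened `base_payload` are illustrative names, not part of the original code:

```python
import json


def build_payloads(base_payload: dict, total_pages: int) -> list:
    """Return one payload per page, identical except for the "page" field."""
    # {**base_payload, ...} makes a shallow copy, so the base is never mutated.
    return [
        {**base_payload, "page": str(page)}
        for page in range(1, total_pages + 1)
    ]


# Shortened payload for illustration; the real one has more fields.
base_payload = {"daterange": 180, "postType": "s", "page": "1", "pageSize": 100}

payloads = build_payloads(base_payload, 3)
for payload in payloads:
    # Each serialized string could serve as the body of one POST request.
    print(json.dumps(payload))
```

In a spider, the list does not need to exist up front: the same copy-and-override step can run inside the loop that yields each request.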

Sample JSON response:

    {
    "DITems":[],
    "TransactionCount": 34037,
    "Count": 34037,
    "MinPoint": {
        "Lat": 22.2390387561,
        "Lng": 113.9203349215
    },
    "MaxPoint": {
        "Lat": 22.5454478015,
        "Lng": 114.2243478859
    },
    "RoundTripNeeded": false
    }

Thanks! This is my first project using Scrapy.


1 answer
Anonymous user
#1 · Posted 2024-04-16 07:52:40

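If I understand correctly, all you need to do is loop over the payloads with a for loop, sending one request per payload once the first request has given you the total number of pages.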

Updated based on the comments

As an example, I use total_pages = json.loads(response.text)['total_pages'] to read the total number of pages from the JSON inside the parse function.
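
The sample response in the question has no total_pages key, so in that case the page count would have to be derived. A hedged sketch, assuming Count holds the total number of records (the question does not confirm which field does) and that the page size is the 100 sent in the payload; `pages_from_response` is a hypothetical helper:

```python
import json
import math


def pages_from_response(text: str, page_size: int) -> int:
    """Derive the page count from a JSON body that carries a record total."""
    data = json.loads(text)
    # Assumption: "Count" is the total; swap in the right key for your API.
    return math.ceil(data["Count"] / page_size)


# Trimmed copy of the sample response from the question.
sample = '{"DITems": [], "TransactionCount": 34037, "Count": 34037, "RoundTripNeeded": false}'

total_pages = pages_from_response(sample, 100)
# Pages still to request after the first one.
remaining = list(range(2, total_pages + 1))
print(total_pages, len(remaining))
```

Inside a Scrapy callback, `response.text` would take the place of `sample`.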

Code example

import json

import scrapy


class TransactionSpider(scrapy.Spider):
    # The class wrapper and spider name are illustrative; the original
    # snippet used self.url and self.headers without showing a class.
    name = "transactions"

    url = "https://hkapi.centanet.com/api/Transaction/Map.json"
    headers = {
        'lang': 'tc',
        'Content-Type': 'application/json; charset=UTF-8',
        'Connection': 'Keep-Alive',
        'User-Agent': 'okhttp/4.7.2'
    }

    first_payload = {
        "daterange": 180,
        "postType": "s",
        "refdate": "20200701",
        "order": "desc",
        "page": "1",
        "pageSize": 100,
        "pixelHeight": 2220,
        "pixelWidth": 1080,
        "points[0].lat": 22.695053063373795,
        "points[0].lng": 113.85844465345144,
        "points[1].lat": 22.695053063373795,
        "points[1].lng": 114.38281349837781,
        "points[2].lat": 21.993328259196705,
        "points[2].lng": 114.38281349837781,
        "points[3].lat": 21.993328259196705,
        "points[3].lng": 113.85844465345144,
        "sort": "score",
        "zoom": 9.745128631591797,
        "platform": "android"
    }

    def start_requests(self):
        # The first request fetches page 1 and tells us how many pages exist.
        yield scrapy.Request(url=self.url, callback=self.parse, method="POST", headers=self.headers, body=json.dumps(self.first_payload))

    def parse(self, response):
        # 'total_pages' is a placeholder key; use whatever your API returns.
        # Note that this callback does not yield page 1's own data.
        total_pages = json.loads(response.text)['total_pages']
        # Page 1 has already been requested, so start from page 2.
        for page in range(2, total_pages + 1):
            # Copy the first payload and change only the page number.
            payload = {**self.first_payload, "page": f"{page}"}
            yield scrapy.Request(url=self.url, callback=self.parse_new_requests, method="POST", headers=self.headers, body=json.dumps(payload))

    def parse_new_requests(self, response):
        json_response = json.loads(response.text)
        yield json_response

Explanation

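We make a first request to get the total_pages value, which we read inside the parse function. We then loop over range(2, total_pages + 1), starting at 2 because page 1 has already been requested. Each iteration builds the payload for one page and sends a request whose response is handled by parse_new_requests.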
