Scraping an ASP page with Python and Scrapy

Published 2024-05-14 19:18:04


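I am new to Python and Scrapy.

For my current project, I am trying to build a scraper that submits a query to an ASP page via the POST method and parses the <td> elements of the resulting page.

I have written the following code: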

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        start_urls = ['https://www.bseindia.com/corporates/Forth_Results.aspx']
        download_delay = 1.5

        scrapy.FormRequest.from_response(
            response,
            formdata={
                'ContentPlaceHolder1_SmartSearch_smartSearch': 'TORRENT PHARMACEUTICALS LTD',
                'ctl00$ContentPlaceHolder1$SmartSearch$hdnCode': 500420,
                'ctl00$ContentPlaceHolder1$hf_scripcode': 500420,
                'ctl00$ContentPlaceHolder1$hidCurrentDate': '7/20/2020 12:00:00 AM',
                '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
                '__VIEWSTATEGENERATOR': response.css('input#__VIEWSTATEGENERATOR::attr(value)').extract.first(),
                '__EVENTVALIDATION': response.css('input#__EVENTVALIDATION::attr(value)').extract.first()
            },
            callback=self.parse,
        )

    def parse(self, response):
        return response.css('tr.TTrow td[2] ::text').extract()

It gives me the following error:

NameError: name 'response' is not defined

I want to run this scraper as a cron job, with the search field (ContentPlaceHolder1_SmartSearch_smartSearch) filled in from a list of names.


1 answer
User
Answer 1 · posted 2024-05-14 19:18:04

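There is no response object available inside start_requests: at that point no request has been sent yet, which is why Python raises the NameError.

If you move the code into the parse function, it should work: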

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    # Class attributes: Scrapy requests start_urls itself and hands each response to parse()
    start_urls = ['https://www.bseindia.com/corporates/Forth_Results.aspx']
    download_delay = 1.5

    def parse(self, response):
        # The response exists here, so the hidden ASP.NET fields can be read from it
        formdata = {
            'ContentPlaceHolder1_SmartSearch_smartSearch': 'TORRENT PHARMACEUTICALS LTD',
            'ctl00$ContentPlaceHolder1$SmartSearch$hdnCode': "500420",
            'ctl00$ContentPlaceHolder1$hf_scripcode': "500420",
            'ctl00$ContentPlaceHolder1$hidCurrentDate': '7/20/2020 12:00:00 AM',
            '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
            '__VIEWSTATEGENERATOR': response.css('input#__VIEWSTATEGENERATOR::attr(value)').extract_first(),
            '__EVENTVALIDATION': response.css('input#__EVENTVALIDATION::attr(value)').extract_first()
        }

        # Submit the form via POST; the result page is handled by parse_post
        return scrapy.FormRequest.from_response(
            response,
            formdata=formdata,
            callback=self.parse_post,
        )

    def parse_post(self, response):
        data = ...  # extraction logic is elided in the original answer
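
The question also asks how to feed the search field from a list of names. A minimal sketch of one way to prepare that, assuming each company is given as a (name, scrip code) pair: the helper build_formdata and the COMPANIES list are hypothetical names, not part of the original answer. The field names are copied from the spider above, and the scrip code is converted to a string, as the answer does by quoting "500420". The hidden fields (__VIEWSTATE and friends) are left out here on the assumption that FormRequest.from_response pre-populates them from the page's form.

```python
def build_formdata(companies, current_date):
    """Return one formdata dict per (name, scrip_code) pair.

    Hypothetical helper: field names are copied from the spider above;
    hidden ASP.NET fields are assumed to be filled in by from_response.
    """
    forms = []
    for name, scrip_code in companies:
        code = str(scrip_code)  # form values are sent as text, so cast ints
        forms.append({
            'ContentPlaceHolder1_SmartSearch_smartSearch': name,
            'ctl00$ContentPlaceHolder1$SmartSearch$hdnCode': code,
            'ctl00$ContentPlaceHolder1$hf_scripcode': code,
            'ctl00$ContentPlaceHolder1$hidCurrentDate': current_date,
        })
    return forms


# Hypothetical input list; only this one entry appears in the question.
COMPANIES = [('TORRENT PHARMACEUTICALS LTD', 500420)]

for form in build_formdata(COMPANIES, '7/20/2020 12:00:00 AM'):
    print(form['ContentPlaceHolder1_SmartSearch_smartSearch'],
          form['ctl00$ContentPlaceHolder1$hf_scripcode'])
```

Inside parse, each of these dicts could then be passed to scrapy.FormRequest.from_response(response, formdata=form, callback=self.parse_post) and yielded rather than returned, so that one POST request is issued per company.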
