Web scraping with nested frames and JavaScript

2 answers

User

Answer 1 · edited 2024-04-19 08:57:53

For a task like this, I would use Requests.

import requests

r = requests.get("http://talkingbox.dyndns.org:49495/in?id=3B9054BC032E53EF691A9A1803040F1C&msg=" + your_question)

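For pages without dynamic content, r.text is what you want.

Since you haven't given any more information about the dynamic page, there isn't much more I can say.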
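One detail the snippet above skips is URL-encoding: a question containing spaces, `?` or `&` would corrupt the query string if concatenated as-is. Below is a minimal sketch of one way to handle that, assuming the endpoint accepts standard form-style encoding (not verified against the real server). The helper names `build_url` and `ask` are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://talkingbox.dyndns.org:49495/in"
SESSION_ID = "3B9054BC032E53EF691A9A1803040F1C"  # id taken from the URL in the answer


def build_url(question):
    """Return the request URL with the question safely percent-encoded."""
    return BASE_URL + "?" + urlencode({"id": SESSION_ID, "msg": question})


def ask(question, timeout=10):
    """Fetch the page for a question and return its raw HTML (hypothetical helper)."""
    import requests  # third-party; imported here so build_url works without it

    r = requests.get(build_url(question), timeout=timeout)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return r.text
```

Requests can do the same encoding itself if the values are passed as `params={"id": ..., "msg": ...}` to `requests.get`.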

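User

Answer 2 · edited 2024-04-19 08:57:53

Whether you use BeautifulSoup, mechanize, Requests, or even Scrapy, loading the dynamic page is an extra step that you will have to write yourself.

For example, with Scrapy this might look like: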

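from scrapy import Request, Selector, Spider


class TheBotSpider(Spider):  # called BaseSpider in old Scrapy releases
    name = 'thebot'
    allowed_domains = ['thebot.de', 'talkingbox.dyndns.org']

    def __init__(self, *a, **kw):
        # Spider arguments (here: question) are set as attributes by the base class.
        super(TheBotSpider, self).__init__(*a, **kw)
        self.domain = 'http://talkingbox.dyndns.org:49495/'
        self.start_urls = [self.domain +
                           'in?id=3B9054BC032E53EF691A9A1803040F1C&msg=' +
                           self.question]

    def parse(self, response):
        # The outer page is a frameset; the answer is loaded into the "frout" frame.
        sel = Selector(response)
        url = sel.xpath('//frame[@name="frout"]/@src').extract()[0]
        # The src may be relative, so resolve it against the page URL.
        yield Request(url=response.urljoin(url), callback=self.dynamic_page)

    def dynamic_page(self, response):
        # .... xpath to scrape answer
        pass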

Run it with the question as an argument:

scrapy crawl thebot -a question="your question"

For details on how to use Scrapy, see the Scrapy tutorial.
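The extra step itself is not specific to Scrapy. Here is a minimal sketch using only the standard library, assuming the outer page is a frameset whose answer frame is named `frout`, as in the XPath above. The names `FrameSrcParser` and `frame_url` and the sample HTML in the usage note are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src of every <frame>/<iframe>, keyed by its name attribute."""

    def __init__(self):
        super().__init__()
        self.frames = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attrs = dict(attrs)
            if attrs.get("src"):
                self.frames[attrs.get("name")] = attrs["src"]


def frame_url(page_url, html, name="frout"):
    """Return the absolute URL of the named frame, or None if it is absent."""
    parser = FrameSrcParser()
    parser.feed(html)
    src = parser.frames.get(name)
    # Frame sources are often relative, so resolve them against the page URL.
    return urljoin(page_url, src) if src else None
```

For example, given a hypothetical outer page `<frameset><frame name="frin" src="in.html"><frame name="frout" src="out?id=1"></frameset>` fetched from `http://talkingbox.dyndns.org:49495/in?id=X`, `frame_url` resolves the `frout` source against that address. The result can then be fetched like any other page. If the answer is only filled in by JavaScript after the frame loads, a plain HTTP fetch will still not see it.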
