django-easy-scraper: Python package on PyPI

A Django application for scraping web pages.

django-easy-scraper project description

Django Easy Scraper

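A standalone Django application that can easily be used and initialized with both Django and non-Django applications. Scraping is based on regular expressions and XPath, so you can work with whichever one you already know.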

It requires the Python requests module to be installed.

Installation

pip install django-easy-scraper

Basic usage

If you use regular expressions:

from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):

    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want, following the pattern above
    }
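To make the placeholder strings above more concrete, here is a minimal sketch of what such patterns could look like, checked with Python's standard `re` module instead of the package itself. The sample HTML and both patterns are hypothetical, and capturing the value in group 1 is an assumption; the package may expect a different pattern shape.

```python
import re

# Hypothetical page fragment; a real page will look different.
SAMPLE_HTML = '<h1 class="title">Blue Mug</h1><span class="price">4</span>'

# Illustrative patterns (assumption: the wanted value is captured in group 1).
regex_fields = {
    'price': r'<span class="price">(\d+)</span>',
    'title': r'<h1 class="title">([^<]+)</h1>',
}

# Try each pattern against the sample before using it in a scraper class.
price_match = re.search(regex_fields['price'], SAMPLE_HTML)
title_match = re.search(regex_fields['title'], SAMPLE_HTML)

print(price_match.group(1))
print(title_match.group(1))
```

Testing patterns against a saved copy of the page like this makes it easier to see which field fails before any request is sent.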

If you use XPath:

^{pr2}$

Scrape now

url = 'www.example.com/bla-bla-details-page/'
data = ScrapeExampleDotCom.regex_url_scraper(url)

print(data)

If your regex patterns are correct, the response should look like this:

{
    'price': 4,
    'title': 'an scraped title',
}

The regex_url_scraper method always gives you a JSON response.

So if you add many regex patterns to regex_fields, the response contains as many keys as you added to the dictionary.
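The one-key-per-pattern behaviour can be sketched with the standard library alone. The `extract_fields` helper below is hypothetical and is not the package's implementation; returning `None` for a pattern that does not match is also an assumption.

```python
import json
import re


def extract_fields(html, regex_fields):
    """Apply every pattern to html and return one entry per key."""
    result = {}
    for name, pattern in regex_fields.items():
        match = re.search(pattern, html)
        # Assumption: an unmatched pattern yields None instead of an error.
        result[name] = match.group(1) if match else None
    return result


# Hypothetical page fragment and patterns, for illustration only.
sample_html = '<h1>Blue Mug</h1><b>4</b><i>in stock</i>'
fields = {
    'price': r'<b>(\d+)</b>',
    'title': r'<h1>([^<]+)</h1>',
    'stock': r'<i>([^<]+)</i>',
}

data = extract_fields(sample_html, fields)
print(json.dumps(data))  # three patterns in, three keys out
```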

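Scraping multiple sites together

You don't need to call a different method for each site every time! Just make one call and have fun, right?

Say you want to scrape three sites: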

www.example.com

www.exampletwo.com

www.examplethree.com

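But how will products from those sites be scraped automatically? Does that worry you?

Write regex patterns for all of the sites above, covering the fields you want to scrape: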

from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):
    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want, following the pattern above
    }

class ScrapeExampleTwo(scraper.Scraper):
    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want, following the pattern above
    }

class ScrapeExampleThree(scraper.Scraper):
    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Add as many fields/keys as you want, following the pattern above
    }

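You have now written regex patterns for every site you want to scrape.

Now it is time to use our Switch class, which routes to the right script/class according to the site you want to scrape. Cool, right!

This is where the magic really begins:

Put all of your classes into the switcher dictionary.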

Important Note:

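The key must be the domain name, and only the domain: no www, no http, no slashes, and no other prefix or suffix. The value must be the class you wrote for that domain, followed by its `regex_url_scraper` method.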

from django_easy_scraper import switch

class Switch(switch.BaseSwitch):
    switcher = {
        'example.com': ScrapeExampleDotCom.regex_url_scraper,
        'exampletwo.com': ScrapeExampleTwo.regex_url_scraper,
        'examplethree.com': ScrapeExampleThree.regex_url_scraper,
    }

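If you use XPath, you have to pass xpath_scraper instead of regex_url_scraper.

You have now set up routing to the right script/class based on the URL it receives.

Get the response data as a Python dictionary for any of the sites above: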
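Since the switcher keys must be bare domains, it can help to check which key a given URL corresponds to. The `bare_domain` helper below is a hypothetical sketch built on `urllib.parse`; it shows the key convention only and is not how the package itself resolves domains.

```python
from urllib.parse import urlparse


def bare_domain(url):
    """Return the host of url without scheme, 'www.' prefix, or path."""
    # urlparse only recognises the host when a scheme or '//' is present.
    if '//' not in url:
        url = '//' + url
    host = urlparse(url).hostname or ''
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host


print(bare_domain('www.example.com/bla-bla-details-page/'))
print(bare_domain('https://www.exampletwo.com/item/1'))
print(bare_domain('examplethree.com'))
```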

url = 'URL of any site you have written a class for and added to the Switch class'

response = Switch.get_data(url=url, raise_exception=False)

print(response)  # The data you are trying to scrape

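The Switch class automatically routes to the right scraping class based on the site link passed to its get_data method.

The raise_exception argument of get_data controls whether an exception is raised when an expected field is not found.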
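The routing idea can be illustrated with a plain dictionary dispatch. Everything in this sketch is a stand-in written for illustration: the two scraper functions return fixed data instead of fetching pages, and this `get_data` is not the package's method. Treating a `None` value as a "field not found" and raising `ValueError` are assumptions about how `raise_exception` might behave.

```python
from urllib.parse import urlparse


def scrape_example(url):
    # Stand-in scraper: returns fixed data instead of fetching the page.
    return {'price': 4, 'title': 'a scraped title'}


def scrape_example_two(url):
    # Stand-in scraper that simulates a price pattern that did not match.
    return {'price': None, 'title': 'another title'}


# Bare domain -> callable, mirroring the shape of the switcher dictionary.
SWITCHER = {
    'example.com': scrape_example,
    'exampletwo.com': scrape_example_two,
}


def get_data(url, raise_exception=False):
    """Pick the scraper registered for the URL's domain and run it."""
    host = urlparse(url if '//' in url else '//' + url).hostname or ''
    domain = host[4:] if host.startswith('www.') else host
    data = SWITCHER[domain](url)  # an unregistered domain raises KeyError
    missing = [name for name, value in data.items() if value is None]
    if missing and raise_exception:
        # Assumption: a missing field is reported as a ValueError.
        raise ValueError('fields not found: ' + ', '.join(missing))
    return data


print(get_data('www.example.com/item/1'))
print(get_data('https://exampletwo.com/item/2'))
```

With `raise_exception=True`, the second call in this sketch would raise instead of returning a dictionary with an empty price.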

Have an issue?

Please open an issue on our GitHub repo: https://github.com/dearopen/django-easy-scraper

If you like the project, don't forget to contribute to it.

Happy scraping!!


django-easy-scraper 1.0.6
