scrapy-rss Python package - PyPI

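RSS tools for the Scrapy framework

Project description for scrapy-rss

Tools to easily generate an RSS feed that contains each scraped item, using the Scrapy framework.

The package works with Python 2.7, 3.3, 3.4, 3.5, 3.6 and 3.7.

If you use Python 3.3, you must use Scrapy<1.5.0.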

Installation

Install scrapy_rss using pip:
```
pip install scrapy_rss
```
or use pip for a specific interpreter, e.g.:
```
pip3 install scrapy_rss
```

or use setuptools directly:

    cd path/to/root/of/scrapy_rss
    python setup.py install

or use setuptools for a specific interpreter, e.g.:

    cd path/to/root/of/scrapy_rss
    python3 setup.py install

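How To Use

Configuration

Add parameters to the Scrapy project settings (the settings.py file) or to the custom_settings attribute of the spider:

Add an item pipeline that exports items to the RSS feed:

    ITEM_PIPELINES = {
        # ...
        'scrapy_rss.pipelines.RssExportPipeline': 900,  # or another priority
        # ...
    }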

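Add the required feed parameters:
FEED_FILE
the absolute or relative file path where the resulting RSS feed will be saved, for example feed.rss or output/feed.rss,
FEED_TITLE
the name of the channel (feed),
FEED_DESCRIPTION
a phrase or sentence that describes the channel (feed),
FEED_LINK
the URL of the HTML website corresponding to the channel (feed).
```
FEED_FILE = 'path/to/feed.rss'
FEED_TITLE = 'Some title of the channel'
FEED_LINK = 'http://example.com/rss'
FEED_DESCRIPTION = 'About channel'
```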
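
The per-spider variant mentioned above can be sketched as a plain dictionary. This is only an illustration: the dictionary would be assigned to the custom_settings class attribute of a spider, and missing_feed_settings is a hypothetical helper (not part of scrapy_rss) that checks whether the four required feed parameters are present before a crawl is started.

```python
# Hypothetical per-spider settings. In a real project this dict would be the
# `custom_settings` class attribute of a scrapy.Spider subclass.
custom_settings = {
    'ITEM_PIPELINES': {'scrapy_rss.pipelines.RssExportPipeline': 900},
    'FEED_FILE': 'output/feed.rss',
    'FEED_TITLE': 'Some title of the channel',
    'FEED_LINK': 'http://example.com/rss',
    'FEED_DESCRIPTION': 'About channel',
}

# The four feed parameters this README lists as required.
REQUIRED_FEED_SETTINGS = ('FEED_FILE', 'FEED_TITLE', 'FEED_LINK', 'FEED_DESCRIPTION')


def missing_feed_settings(settings):
    """Return the names of required feed settings that are absent or empty.

    Illustrative helper only; scrapy_rss does not provide this function.
    """
    return [name for name in REQUIRED_FEED_SETTINGS if not settings.get(name)]


print(missing_feed_settings(custom_settings))
```

An empty list means all four required parameters are set; any names returned are the ones still to be added.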

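Feed (Channel) Elements Customization [optional]

If you want to change other channel parameters (such as language, copyright, managing editor, webmaster, publication date, last build date, category, generator, docs, TTL), declare your own exporter that inherits from the RssItemExporter class, for example:

    from scrapy_rss.exporters import RssItemExporter

    class MyRssItemExporter(RssItemExporter):
        def __init__(self, *args, **kwargs):
            # keep a caller-supplied value, otherwise fall back to a default
            kwargs['generator'] = kwargs.get('generator', 'Special generator')
            kwargs['language'] = kwargs.get('language', 'en-us')
            super(MyRssItemExporter, self).__init__(*args, **kwargs)

And add the FEED_EXPORTER parameter to the Scrapy project settings or to the custom_settings attribute of the spider:

    FEED_EXPORTER = 'myproject.exporters.MyRssItemExporter'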
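
The keyword-default pattern used by such an exporter subclass can be tried without Scrapy installed. The sketch below uses a stand-in base class, BaseExporter, which is an assumption made for illustration and is not the real RssItemExporter; it only records the keyword arguments it receives, so the effect of the defaults is visible.

```python
class BaseExporter(object):
    """Stand-in for RssItemExporter (NOT the real class); it just stores kwargs."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.options = kwargs


class DemoExporter(BaseExporter):
    """Hypothetical subclass that supplies defaults for two channel parameters."""

    def __init__(self, *args, **kwargs):
        # A value passed by the caller wins; otherwise the default is used.
        kwargs['generator'] = kwargs.get('generator', 'Special generator')
        kwargs['language'] = kwargs.get('language', 'en-us')
        super(DemoExporter, self).__init__(*args, **kwargs)


defaults = DemoExporter()
overridden = DemoExporter(language='de')
print(defaults.options)
print(overridden.options['language'])
```

Using kwargs.get with a fallback, rather than assigning the value unconditionally, keeps the defaults overridable by whoever constructs the exporter.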

Usage

Declare your item directly as RssItem():

    import scrapy_rss

    item1 = scrapy_rss.RssItem()

Or use the predefined item class RssedItem, which has an RSS field named rss that is an instance of RssItem:

    import scrapy
    import scrapy_rss

    class MyItem(scrapy_rss.RssedItem):
        field1 = scrapy.Field()
        field2 = scrapy.Field()
        # ...

    item2 = MyItem()

Set/get item fields. The case-sensitive attributes of RssItem() correspond to RSS elements, and the attributes of RSS elements are case-sensitive too. If your editor supports autocompletion, it suggests attributes for instances of RssedItem and RssItem. Any subset of RSS elements may be set (e.g. the title only). For example:

    from datetime import datetime

    item1.title = 'RSS item title'  # set value of <title> element
    title = item1.title.title  # get value of <title> element
    item1.description = 'description'

    item1.guid = 'item identifier'
    item1.guid.isPermaLink = True  # set value of attribute isPermaLink of <guid> element,
                                   # isPermaLink is False by default
    is_permalink = item1.guid.isPermaLink  # get value of attribute isPermaLink of <guid> element
    guid = item1.guid.guid  # get value of element <guid>

    item1.category = 'single category'
    category = item1.category
    item1.category = ['first category', 'second category']
    first_category = item1.category[0].category  # get value of the element <category> with multiple values
    all_categories = [cat.category for cat in item1.category]

    # direct attributes setting
    item1.enclosure.url = 'http://example.com/file'
    item1.enclosure.length = 0
    item1.enclosure.type = 'text/plain'

    # or dict based attributes setting
    item1.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
    item1.guid = {'guid': 'item identifier', 'isPermaLink': True}

    item1.pubDate = datetime.now()  # correctly works with Python's datetimes

    item2.rss.title = 'Item title'
    item2.rss.guid = 'identifier'
    item2.rss.enclosure = {'url': 'http://example.com/file', 'length': 0, 'type': 'text/plain'}
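
To make the mapping between item attributes and RSS markup more concrete, here is a rough sketch that builds a comparable <item> element by hand with the standard library. It is an illustration of the RSS 2.0 structure only: it does not use scrapy_rss, and the exact serialization produced by the package may differ. The values mirror the example above.

```python
import xml.etree.ElementTree as ET

# Build an RSS <item> by hand; element and attribute names are case-sensitive.
item = ET.Element('item')
ET.SubElement(item, 'title').text = 'RSS item title'
ET.SubElement(item, 'description').text = 'description'

# <guid> carries its value as text and isPermaLink as an XML attribute.
guid = ET.SubElement(item, 'guid', {'isPermaLink': 'true'})
guid.text = 'item identifier'

# Multiple categories become repeated <category> elements.
for name in ('first category', 'second category'):
    ET.SubElement(item, 'category').text = name

# <enclosure> has no text, only the url, length and type attributes.
ET.SubElement(item, 'enclosure', {
    'url': 'http://example.com/file',
    'length': '0',
    'type': 'text/plain',
})

xml_text = ET.tostring(item, encoding='unicode')
print(xml_text)
```

This is why guid and enclosure expose sub-attributes (isPermaLink, url, length, type) on the item, while elements such as title hold a single value.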

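All allowed elements are listed in scrapy_rss/items.py. All allowed attributes of each element, with their constraints and default values, are listed in scrapy_rss/elements.py. You can also read the RSS specification for more details.

Scrapy Project Examples

The examples directory contains several Scrapy projects that demonstrate the use of scrapy_rss. They crawl a sample website whose source code is also published.

Just go to the Scrapy project directory and run the commands:

    scrapy crawl first_spider
    scrapy crawl second_spider

After that, the feed.rss and feed2.rss files will be created in the same directory.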
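
One way to sanity-check a generated feed is to parse it and list the item titles. The sketch below uses only the standard library and assumes the file follows the usual RSS 2.0 layout (rss > channel > item > title); the function names titles_from_xml and feed_item_titles are hypothetical helpers, not part of scrapy_rss.

```python
import os
import xml.etree.ElementTree as ET


def titles_from_xml(xml_data):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)  # the <rss> root element
    return [item.findtext('title') for item in root.findall('./channel/item')]


def feed_item_titles(path):
    """Read an RSS file from disk and return its item titles."""
    with open(path, 'rb') as feed_file:
        return titles_from_xml(feed_file.read())


# Only inspect the feed if the crawl has already produced it.
if os.path.exists('feed.rss'):
    print(feed_item_titles('feed.rss'))
```

An empty list for an existing file suggests the spider yielded no items that carried RSS data.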


scrapy-rss 0.1.7
