scrapy-sqlitem Python package - PyPI

A Scrapy extension for saving items to an SQL database

Detailed description of the scrapy-sqlitem Python project

scrapy sqlitem

scrapy-sqlitem allows you to define Scrapy items using SQLAlchemy models or tables. It also provides an easy way to save items to the database in chunks.

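This project is in beta. Pull requests and feedback are welcome. The usual caveats about using an SQL database backend for write-heavy applications still apply.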

Quick start

pip install scrapy_sqlitem

Define items using the SQLAlchemy ORM

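from scrapy_sqlitem import SqlItem
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyModel(Base):
    __tablename__ = 'mytable'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class MyItem(SqlItem):
    sqlmodel = MyModel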

Or define items using SQLAlchemy Core

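from scrapy_sqlitem import SqlItem
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()

class MyItem(SqlItem):
    sqlmodel = Table('mytable', metadata,
                     Column('id', Integer, primary_key=True),
                     Column('name', String, nullable=False))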

If you have not created your tables yet, make sure to create them. See the SQLAlchemy docs and the example spider.
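With SQLAlchemy itself, `Base.metadata.create_all(engine)` (ORM) or `metadata.create_all(engine)` (Core) creates any tables that are missing. As a dependency-free illustration, the sketch below does the same thing for SQLite with the standard-library `sqlite3` module and `CREATE TABLE IF NOT EXISTS`; the schema mirrors the `mytable` example above, and `ensure_mytable` and `table_exists` are hypothetical helper names.

```python
import sqlite3


def ensure_mytable(conn: sqlite3.Connection) -> None:
    """Create the example table only if it does not exist yet (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mytable ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    conn.commit()


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    """Look the table up in SQLite's schema catalog."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
ensure_mytable(conn)
ensure_mytable(conn)  # safe to call again thanks to IF NOT EXISTS
print(table_exists(conn, "mytable"))  # True
```

Making the creation step idempotent means it can run every time the crawler starts without checking first.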

Use SqlSpider to easily save scraped items to the database

settings.py

DATABASE_URI="sqlite:///"

Define your spider

from scrapy.selector import Selector
from scrapy_sqlitem import SqlSpider

class MySpider(SqlSpider):
    name = 'myspider'

    start_urls = ('http://dmoz.org',)

    def parse(self, response):
        selector = Selector(response)
        item = MyItem()  # the SqlItem defined above
        item['name'] = selector.xpath('//title[1]/text()').extract_first()
        yield item

Run the spider

scrapy crawl myspider

Query the database

select * from mytable;

 id |               name                |
----+-----------------------------------+
  1 | DMOZ - the Open Directory Project |

Other information

Don't want to use SqlSpider? Write a pipeline instead.

from sqlalchemy import create_engine

class CommitSqlPipeline(object):

    def __init__(self):
        self.engine = create_engine("sqlite:///")

    def process_item(self, item, spider):
        item.commit_item(engine=self.engine)
        return item  # pass the item on to any later pipelines
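Scrapy item pipelines may also define `open_spider(self, spider)` and `close_spider(self, spider)`, which are a natural place to open and close a database connection instead of doing it in `__init__`. The sketch below is hypothetical: it uses the standard-library `sqlite3` module and plain dict items in place of `SqlItem.commit_item`, and `SqliteInsertPipeline` and its `table` attribute are made-up names.

```python
import sqlite3


class SqliteInsertPipeline:
    """Hypothetical pipeline: inserts dict-like items into one SQLite table."""

    table = "mytable"  # assumed target table

    def open_spider(self, spider):
        # Scrapy calls this once when the spider is opened.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def process_item(self, item, spider):
        # Column names come from the item's (trusted) field names;
        # values are passed as query parameters.
        columns = list(item.keys())
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {self.table} ({', '.join(columns)}) VALUES ({placeholders})"
        self.conn.execute(sql, [item[c] for c in columns])
        self.conn.commit()
        return item  # later pipelines still receive the item

    def close_spider(self, spider):
        # Scrapy calls this once when the spider is closed.
        self.conn.close()
```

Outside Scrapy the hooks can be called by hand, e.g. `p = SqliteInsertPipeline(); p.open_spider(None); p.process_item({"id": 1, "name": "ryan"}, None)`, after which `p.conn.execute("SELECT * FROM mytable").fetchall()` returns `[(1, 'ryan')]`.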

Drop items that are missing required primary key data before saving them to the database

from scrapy.exceptions import DropItem

class DropMissingDataPipeline(object):
    def process_item(self, item, spider):
        if item.null_required_fields:
            raise DropItem
        else:
            return item

# Watch out for Serial primary keys that are considered null.
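A check like `null_required_fields` boils down to comparing column metadata with the item's values. The hypothetical sketch below uses a small `ColumnInfo` dataclass in place of SQLAlchemy columns and treats a missing key or `None` as null (an assumption about how "null" is defined). It also shows why the Serial warning matters: SQLite's `INTEGER PRIMARY KEY`, like a PostgreSQL `SERIAL` column, is filled in by the database when `NULL` is inserted, so dropping every item with a null id would discard rows the database could have stored. `ColumnInfo`, `autoincrement`, and `should_drop` are made-up names, not part of scrapy-sqlitem.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical stand-in for a SQLAlchemy Column's metadata."""
    name: str
    nullable: bool = True
    primary_key: bool = False
    autoincrement: bool = False  # value is generated by the database


def null_required_fields(columns, item):
    """NOT NULL columns whose value in the item is missing or None."""
    return {c.name for c in columns if not c.nullable and item.get(c.name) is None}


def null_primary_key_fields(columns, item):
    """Primary key columns whose value in the item is missing or None."""
    return {c.name for c in columns if c.primary_key and item.get(c.name) is None}


def should_drop(columns, item):
    """Drop only if a required field is null and the database won't fill it in."""
    generated = {c.name for c in columns if c.autoincrement}
    return bool(null_required_fields(columns, item) - generated)


mytable = [
    ColumnInfo("id", nullable=False, primary_key=True, autoincrement=True),
    ColumnInfo("name", nullable=False),
]
item = {"id": None, "name": "ryan"}
print(null_required_fields(mytable, item))     # {'id'}
print(null_primary_key_fields(mytable, item))  # {'id'}
print(should_drop(mytable, item))              # False: the database assigns id

# SQLite fills in an INTEGER PRIMARY KEY when NULL is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO mytable (id, name) VALUES (?, ?)", (item["id"], item["name"]))
print(conn.execute("SELECT id, name FROM mytable").fetchall())  # [(1, 'ryan')]
```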

Save to the database in chunks rather than item by item

Inherit from SqlSpider and..

In settings.py

DEFAULT_CHUNKSIZE = 500

CHUNKSIZE_BY_TABLE = {'mytable': 1000, 'othertable': 250}

If an error occurs while saving a chunk to the database, it will try to save each item one at a time.
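The chunking behaviour described above (buffer items per table, flush once the table's chunk size is reached, and fall back to one-at-a-time inserts if a chunk fails) can be sketched roughly as follows. This is a hypothetical illustration with the standard-library `sqlite3` module, not scrapy-sqlitem's implementation; `ChunkedWriter`, `add`, `flush`, and `flush_all` are made-up names, and the two settings are passed in as plain arguments.

```python
import sqlite3
from collections import defaultdict


class ChunkedWriter:
    """Hypothetical buffered writer mimicking DEFAULT_CHUNKSIZE / CHUNKSIZE_BY_TABLE."""

    def __init__(self, conn, default_chunksize=500, chunksize_by_table=None):
        self.conn = conn
        self.default_chunksize = default_chunksize
        self.chunksize_by_table = chunksize_by_table or {}
        self.buffers = defaultdict(list)  # table name -> pending rows
        self.failed = []  # (table, row) pairs that could not be saved

    def chunksize(self, table):
        # Per-table override, otherwise the default.
        return self.chunksize_by_table.get(table, self.default_chunksize)

    def add(self, table, row):
        self.buffers[table].append(row)
        if len(self.buffers[table]) >= self.chunksize(table):
            self.flush(table)

    def flush(self, table):
        rows, self.buffers[table] = self.buffers[table], []
        if not rows:
            return
        # Assumes every row in a chunk has the same keys.
        columns = list(rows[0].keys())
        sql = "INSERT INTO {} ({}) VALUES ({})".format(
            table, ", ".join(columns), ", ".join("?" for _ in columns)
        )
        try:
            with self.conn:  # one transaction for the whole chunk
                self.conn.executemany(sql, [[r[c] for c in columns] for r in rows])
        except sqlite3.DatabaseError:
            # The chunk was rolled back; retry each row in its own transaction.
            for row in rows:
                try:
                    with self.conn:
                        self.conn.execute(sql, [row[c] for c in columns])
                except sqlite3.DatabaseError:
                    self.failed.append((table, row))

    def flush_all(self):
        # Save whatever is left, e.g. when the spider closes.
        for table in list(self.buffers):
            self.flush(table)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
writer = ChunkedWriter(conn, chunksize_by_table={"mytable": 2})
writer.add("mytable", {"id": 1, "name": "a"})
writer.add("mytable", {"id": 1, "name": "dup"})  # chunk fails on the duplicate id
print(conn.execute("SELECT id, name FROM mytable").fetchall())  # [(1, 'a')]
print(writer.failed)  # [('mytable', {'id': 1, 'name': 'dup'})]
```

Running each chunk in a single transaction is what makes the fallback safe: a failed chunk leaves no partial rows behind, so the per-item retry does not insert anything twice.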

Access the underlying SQLAlchemy table to query the database

INSERT INTO mytable (id, name) VALUES ('1', 'ryan');

myitem = MyItem()

# bind the table to an engine (I could have done this when I created the table too)
# note: bound metadata and .execute() on a select were removed in SQLAlchemy 2.0
myitem.table.metadata.bind = self.engine

myitem.table.select().where(myitem.table.c.id == 1).execute().fetchone()
# returns (1, 'ryan')

Which row in the database matches the data in my item?

myitem = MyItem()
myitem['id'] = 1
myitem.get_matching_dbrow(bind=self.engine)
# returns (1, 'ryan')

This runs the same query as above!

Gotchas

If you override either item_scraped or spider_closed in a subclass, make sure to call super!

class MySpider(SqlSpider):

    def parse(self, response):
        pass

    def spider_closed(self, spider, reason):
        super(MySpider, self).spider_closed(spider, reason)
        self.log("Log some really important custom stats")

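Be careful with other mixins. The inheritance structure can get a little messy. If a class earlier in the MRO overrides item_scraped and does not call super, SqlSpider's item_scraped method will never be called.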
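The mixin problem can be reproduced with plain Python classes. In the hypothetical sketch below, `BaseSpider` stands in for SqlSpider (its `item_scraped(item, response, spider)` signature is an assumption), `QuietMixin` overrides `item_scraped` without calling `super()`, and `PoliteMixin` shows the fix. Because each mixin comes before `BaseSpider` in the MRO, it decides whether the base method is ever reached.

```python
class BaseSpider:
    """Stand-in for SqlSpider: records every scraped item."""

    def __init__(self):
        self.saved = []

    def item_scraped(self, item, response, spider):
        self.saved.append(item)


class QuietMixin:
    def item_scraped(self, item, response, spider):
        pass  # no super() call, so BaseSpider.item_scraped never runs


class PoliteMixin:
    def item_scraped(self, item, response, spider):
        # ... custom work would go here ...
        super().item_scraped(item, response, spider)  # continue along the MRO


class BrokenSpider(QuietMixin, BaseSpider):
    pass


class WorkingSpider(PoliteMixin, BaseSpider):
    pass


for cls in (BrokenSpider, WorkingSpider):
    spider = cls()
    spider.item_scraped({"name": "x"}, response=None, spider=spider)
    print(cls.__name__, [c.__name__ for c in cls.__mro__], spider.saved)
```

`BrokenSpider` ends up with an empty `saved` list, while `WorkingSpider` records the item; inspecting `__mro__` is a quick way to see which class's method runs first.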

Other SqlItem methods

sqlitem.table

Returns the SQLAlchemy Core table corresponding to the item.

sqlitem.null_required_fields

Returns a set of database key names that are marked NOT nullable and whose corresponding data in the item is null.

sqlitem.null_primary_key_fields

Returns a set of primary key names whose corresponding data in the item is null.

sqlitem.primary_keys

sqlitem.required_keys

sqlitem.get_matching_dbrow(bind=None, use_cache=True)

Finds the row in the database that matches the primary key data in the item.
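Looking up the matching row amounts to a SELECT filtered on every primary key column. The hypothetical sketch below uses the standard-library `sqlite3` module, with `conn` playing the role of `bind`; the `use_cache` flag is modelled as a caller-supplied dict keyed by table and key values, which is an assumption about what caching means here, not a description of the library's cache.

```python
import sqlite3


def get_matching_dbrow(conn, table, primary_keys, item, cache, use_cache=True):
    """Hypothetical: return the row whose primary key columns equal the item's values.

    `cache` is a plain dict supplied by the caller. Only found rows are cached,
    so a row inserted later is still picked up by the next lookup.
    """
    key_values = tuple(item[k] for k in primary_keys)
    cache_key = (table, key_values)
    if use_cache and cache_key in cache:
        return cache[cache_key]
    # Table and column names are interpolated, so they must come from trusted
    # metadata; the key values themselves are passed as query parameters.
    where = " AND ".join(f"{k} = ?" for k in primary_keys)
    row = conn.execute(f"SELECT * FROM {table} WHERE {where}", key_values).fetchone()
    if use_cache and row is not None:
        cache[cache_key] = row
    return row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO mytable (id, name) VALUES (1, 'ryan')")
cache = {}
print(get_matching_dbrow(conn, "mytable", ["id"], {"id": 1}, cache))  # (1, 'ryan')
```

The trade-off of caching is staleness: once a row is cached, later changes in the database are not seen until the lookup is repeated with `use_cache=False`.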

TODO

Continuous integration tests


scrapy-sqlitem 0.1.2
