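A scraper for statistical data from Vantetider.se, built on top of Statscraper.
Detailed description of the vantetider-scraper Python project
This is a scraper for statistical data from http://www.vantetider.se, built on top of the Statscraper package <https://github.com/jplusplus/statscraper>.
Install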
pip install -r requirements.txt
The scraper has to make a lot of requests, and uses requests-cache <https://pypi.python.org/pypi/requests-cache> to store queries.
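To illustrate why caching pays off when the same query is repeated, here is a minimal standard-library sketch of a response cache keyed by URL and query parameters. It does not show how requests-cache is implemented or configured, and it is not part of this package; `QueryCache`, `cache_key` and `fake_fetch` are hypothetical names used only for illustration.

```python
import hashlib
import json
import sqlite3


def cache_key(url, params):
    """Build a stable key from a URL and its query parameters."""
    payload = json.dumps({"url": url, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryCache:
    """Hypothetical SQLite-backed cache (an assumption, not the scraper's own)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (key TEXT PRIMARY KEY, body TEXT)"
        )

    def get_or_fetch(self, url, params, fetch):
        """Return the cached body, calling fetch(url, params) only on a miss."""
        key = cache_key(url, params)
        row = self.conn.execute(
            "SELECT body FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]
        body = fetch(url, params)
        self.conn.execute(
            "INSERT INTO responses (key, body) VALUES (?, ?)", (key, body)
        )
        self.conn.commit()
        return body


# Stand-in for a real HTTP request, so the sketch runs offline.
calls = []


def fake_fetch(url, params):
    calls.append((url, params))
    return "response for {}".format(sorted(params.items()))


cache = QueryCache()
query = {"region": "Blekinge", "year": "2016"}
first = cache.get_or_fetch("http://www.vantetider.se", query, fake_fetch)
second = cache.get_or_fetch("http://www.vantetider.se", query, fake_fetch)
```

The second call is served from the SQLite table, so `fake_fetch` runs only once. Passing a file path instead of `:memory:` would keep the cache between runs.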
Example usage
from vantetider import VantetiderScraper

scraper = VantetiderScraper()
scraper.items  # List implemented datasets
# [<VantetiderDataset: VantatKortareAn60Dagar (Väntat kortare än 60 dagar )>,
#  <VantetiderDataset: Overbelaggning (Överbeläggningar)>,
#  <VantetiderDataset: PrimarvardTelefon (Telefontillgänglighet)>,
#  <VantetiderDataset: PrimarvardBesok (Läkarbesök)>,
#  <VantetiderDataset: SpecialiseradBesok (Förstabesök)>,
#  <VantetiderDataset: SpecialiseradOperation (Operation/åtgärd)>]

dataset = scraper.get("Overbelaggning")  # Get a specific dataset

# List all available dimensions
print(dataset.dimensions)
print(dataset.regions)  # List available regions
print(dataset.years)  # List available years

# Make a query. You have to explicitly define all dimension values you want
# to query. By default the scraper will fetch default values.
res = dataset.fetch({
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    # Currently we can only query by id of dimension value
    "type_of_overbelaggning": ["0", "1"],  # "Somatik" and "Psykiatri"
})

# Do something with the result
df = res.pandas
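Because queries currently take dimension value ids rather than labels, a small lookup can translate labels to ids before calling fetch. The sketch below assumes allowed values are stored as (id, label) pairs. The list is a hypothetical stand-in built from the two ids and labels noted in the example, and `ids_for_labels` is not part of the vantetider package.

```python
# Hypothetical (id, label) pairs, based on the two values noted in the example.
TYPE_OF_OVERBELAGGNING = [("0", "Somatik"), ("1", "Psykiatri")]


def ids_for_labels(allowed_values, labels):
    """Translate labels to ids; raises KeyError for an unknown label."""
    label_to_id = {label: value_id for value_id, label in allowed_values}
    return [label_to_id[label] for label in labels]


# Build the query dict with ids looked up from human-readable labels.
query = {
    "region": "Blekinge",
    "year": "2016",
    "period": "Februari",
    "type_of_overbelaggning": ids_for_labels(
        TYPE_OF_OVERBELAGGNING, ["Somatik", "Psykiatri"]
    ),
}
```

The resulting `query` dict has the same shape as the one passed to `dataset.fetch` in the example.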
A real-world example, using dataset.py for storage.
from vantetider import VantetiderScraper
from vantetider.allowed_values import TYPE_OF_OVERBELAGGNING, PERIODS
import dataset

db = dataset.connect('sqlite:///vantetider.db')

TOPIC = "Overbelaggning"

# Set up local db
table = db.create_table(TOPIC)

scraper = VantetiderScraper()
dataset = scraper.get(TOPIC)

# Get all available regions and years for query
years = [x.value for x in dataset.years]
regions = [x.value for x in dataset.regions]

# Query in chunks to be able to store to database on the run
for region in regions:
    for year in years:
        res = dataset.fetch({
            "year": year,
            "type_of_overbelaggning": [x[0] for x in TYPE_OF_OVERBELAGGNING],
            "period": PERIODS,
            "region": region,
        })
        df = res.pandas
        data = res.list_of_dicts
        table.insert_many(data)
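The chunked pattern above, fetching one region and year at a time and writing each result immediately, can be tried offline with the standard library alone. In this sketch `fake_fetch` stands in for `dataset.fetch`, the row fields are invented, and sqlite3 replaces the dataset library. None of it reflects the real scraper's output format.

```python
import itertools
import sqlite3


def fake_fetch(query):
    """Stand-in for dataset.fetch: returns one invented row per period."""
    return [
        {"region": query["region"], "year": query["year"], "period": p, "value": 0}
        for p in query["period"]
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE overbelaggning (region TEXT, year TEXT, period TEXT, value INTEGER)"
)

# Hypothetical dimension values, chosen only to keep the sketch small.
regions = ["Blekinge", "Kalmar"]
years = ["2015", "2016"]
periods = ["Januari", "Februari"]

# One query per (region, year) chunk; rows are committed as soon as they
# arrive, so an interrupted run keeps everything fetched so far.
for region, year in itertools.product(regions, years):
    rows = fake_fetch({"region": region, "year": year, "period": periods})
    conn.executemany(
        "INSERT INTO overbelaggning VALUES (:region, :year, :period, :value)", rows
    )
    conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM overbelaggning").fetchone()[0]
```

Committing per chunk trades a little speed for the ability to resume a long scrape without losing earlier results.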
TODO
- Scrape "aterbesok", "undersokningar", "bupdetalj" and "bup".
- Enable querying by label names on all dimensions.
- Add more allowed values to vantetider/allowed_values.py.
- Make request caching optional.
Develop
Run tests:
make tests