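While building a Flask website, I use an external JSON feed to populate a local MongoDB. The feed is parsed and inserted, with the keys from the JSON remapped to keys in Mongo.
One of the keys available in the feed is called "img_url"; it contains a URL pointing to an image.
Is there a way in Python to mimic a PHP-style cURL? I would like to grab this key, download the image, and store it somewhere locally while keeping the other associated keys, and have all of that end up as one entry in the database.
Here is my script so far: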
import json
import sys
import urllib2
from datetime import datetime

import pymongo
import pytz

from utils import slugify
# from utils import logger

client = pymongo.MongoClient()
db = client.artlogic


def fetch_artworks():
    # logger.debug("downloading artwork data from Artlogic")
    AL_artworks = []
    AL_artists = []
    url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"
    while True:
        f = urllib2.urlopen(url)
        data = json.load(f)
        AL_artworks += data['rows']
        # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))
        # Stop if we are at the last page
        if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
            break
        url = data['feed_data']['next_page_link']

    # Now we have a list called 'AL_artworks' in which all the descriptions are stored
    # We are going to put them into the mongoDB database,
    # making sure that if the artwork is already encoded (an object with the same id
    # already is in the database) we update the existing description instead of
    # inserting a new one ('upsert').
    # logger.debug("updating local mongodb database with %s entries" % len(AL_artworks))
    for artwork in AL_artworks:
        # Mongo does not like keys that have a dot in their name,
        # this property does not seem to be used anyway so let us
        # delete it:
        if 'artworks.description2' in artwork:
            del artwork['artworks.description2']
        # upsert into the database:
        db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)
        # artwork['artist_id'] is not functioning properly
        db.AL_artists.update({"artist": artwork['artist']},
                             {"artist_sort": artwork['artist_sort'],
                              "artist": artwork['artist'],
                              "slug": slugify(artwork['artist'])},
                             upsert=True)
    # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
    return AL_artworks


if __name__ == "__main__":
    fetch_artworks()
First of all, you might like the requests library.
Otherwise, if you want to stick to the stdlib, it would be something along these lines:
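A minimal sketch of such a stdlib download. The `fetchfile(url, dst)` signature, the argument names, and the chunked copy are assumptions made for illustration; the import fallback keeps it usable with the `urllib2` from the question as well as with Python 3:

```python
import shutil
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, as used in the question


def fetchfile(url, dst):
    """Download url and write the raw bytes to the local path dst."""
    # closing() lets the response be used in a with-block on Python 2 too
    with closing(urlopen(url)) as response:
        with open(dst, "wb") as out:
            # Copy in chunks so a large image is never held fully in memory
            shutil.copyfileobj(response, out)
    return dst
```
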
That leaves out proper exception handling (I can expand on it if you need, but I believe the documentation is clear enough).
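To keep the downloaded image together with the other keys of the entry, one option is to record the local path on the artwork dict before it is upserted. The sketch below is only an illustration: the helper name `attach_local_image`, the `img_dir` argument, and the `img_local` key are hypothetical, and `fetch` stands for any callable taking `(url, dst)`, such as a `fetchfile()` helper.

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def attach_local_image(artwork, img_dir, fetch):
    """Download artwork['img_url'] into img_dir and record where it landed.

    fetch is any callable taking (url, dst).
    'img_local' is a made-up key name for this sketch.
    """
    url = artwork.get('img_url')
    if not url:
        return artwork  # nothing to download for this entry
    # Prefix with the feed id so two artworks never share a file name
    name = "%s_%s" % (artwork['id'], os.path.basename(urlparse(url).path))
    dst = os.path.join(img_dir, name)
    fetch(url, dst)
    artwork['img_local'] = dst
    return artwork
```

Called inside the `for artwork in AL_artworks:` loop just before the upsert, this would store the local path in the same document as the rest of the feed data.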
You can put fetchfile() into a pool of asynchronous jobs to fetch several files at once.
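One way to sketch such a pool is with `multiprocessing.pool.ThreadPool` from the stdlib. The `fetch_all` helper, its `(url, dst)` job tuples, and the default of 4 workers are assumptions for illustration; the download function is passed in as `fetch`, so any callable with a `(url, dst)` signature can be used:

```python
from multiprocessing.pool import ThreadPool


def fetch_all(jobs, fetch, workers=4):
    """Run fetch(url, dst) for every (url, dst) pair in jobs on a thread pool.

    Returns the results in the same order as jobs.
    """
    pool = ThreadPool(workers)
    try:
        # map() blocks until every job has finished; it re-raises if one failed
        return pool.map(lambda job: fetch(*job), jobs)
    finally:
        pool.close()
        pool.join()
```

Threads suit this case because the work is dominated by waiting on the network rather than by CPU time.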