A cURL-like method for JSON feeds in Python

Posted 2024-06-01 05:19:48


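While building a Flask website, I use an external JSON feed to populate a local MongoDB with content. The feed is parsed and loaded, and its keys are remapped from the JSON names to the keys used in Mongo.

One of the keys available in the feed is called "img_url"; it contains a URL pointing to an image.

Is there a way in Python to mimic PHP-style cURL? I would like to grab that key, download the image, and store it somewhere locally, while keeping the other associated keys and saving everything as one entry in my database.

Here is my script so far: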

    import json
    import sys
    import urllib2
    from datetime import datetime

    import pymongo
    import pytz

    from utils import slugify
    # from utils import logger

    client = pymongo.MongoClient()
    db = client.artlogic

    def fetch_artworks():
        # logger.debug("downloading artwork data from Artlogic")

        AL_artworks = []
        AL_artists = []
        url = "http://feeds.artlogic.net/artworks/artlogiconline/json/"

        while True:
            f = urllib2.urlopen(url)
            data = json.load(f)

            AL_artworks += data['rows']

            # logger.debug("retrieved page %s of %s of artwork data" % (data['feed_data']['page'], data['feed_data']['no_of_pages']))

            # Stop when we are at the last page
            if data['feed_data']['page'] == data['feed_data']['no_of_pages']:
                break

            url = data['feed_data']['next_page_link']

        # Now we have a list called 'AL_artworks' in which all the descriptions are stored.
        # We are going to put them into the MongoDB database,
        # making sure that if the artwork is already stored (an object with the same id
        # is already in the database) we update the existing description instead of
        # inserting a new one ('upsert').

        # logger.debug("updating local mongodb database with %s entries" % len(AL_artworks))

        for artwork in AL_artworks:
            # Mongo does not like keys that have a dot in their name;
            # this property does not seem to be used anyway, so let us
            # delete it:
            if 'artworks.description2' in artwork:
                del artwork['artworks.description2']
            # upsert into the database:
            db.AL_artworks.update({"id": artwork['id']}, artwork, upsert=True)

            # artwork['artist_id'] is not functioning properly
            db.AL_artists.update({"artist": artwork['artist']},
                                 {"artist_sort": artwork['artist_sort'],
                                  "artist": artwork['artist'],
                                  "slug": slugify(artwork['artist'])},
                                 upsert=True)

        # db.meta.update({"subject": "artworks"}, {"updated": datetime.now(pytz.utc), "subject": "artworks"}, upsert=True)
        return AL_artworks

    if __name__ == "__main__":
        fetch_artworks()

1 answer
User
#1 · Posted 2024-06-01 05:19:48

First of all, you might like the requests library.
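
As a rough illustration of the requests route, the sketch below streams an image to disk. It assumes Python 3 and that the third-party requests package is installed. The function name `download_image`, the `get` parameter (there only so a stand-in can be passed in for testing), and the 10-second timeout are illustrative choices, not part of the original answer.

```python
def download_image(url, dst, get=None, chunk_size=4096):
    """Stream the resource at url into the local file dst and return dst."""
    if get is None:
        # Imported lazily so the function can be exercised with a stand-in
        # even where requests is not installed (assumption: it normally is).
        import requests
        get = requests.get

    # stream=True avoids loading the whole image into memory at once
    with get(url, stream=True, timeout=10) as resp:
        resp.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        with open(dst, "wb") as fo:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fo.write(chunk)
    return dst
```

In the script from the question this would be called with `artwork['img_url']` as the URL and a local path of your choosing as `dst`.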

Otherwise, if you want to stick with the stdlib, it would look something like this:

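    import os
    import urllib2
    import uuid


    def fetchfile(url, dst):
        """Download url and write it to the local path dst in 4 KB chunks."""
        fi = urllib2.urlopen(url)
        try:
            with open(dst, 'wb') as fo:
                while True:
                    chunk = fi.read(4096)
                    if not chunk:
                        break
                    fo.write(chunk)
        finally:
            fi.close()


    # artwork is one entry from data['rows']; its 'img_url' key holds the image URL
    fetchfile(
        artwork['img_url'],
        os.path.join('/var/www/static', uuid.uuid1().hex)
    )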
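
To tie this back to the question, one possible way to keep the downloaded file associated with the rest of the entry is to record the local path in the same dict before it is upserted. This is only a sketch for Python 3: the helper names `local_image_path` and `attach_local_image` and the key `img_local` are hypothetical, and `fetch` is any callable taking `(url, dst)`, such as `fetchfile`.

```python
import os
import uuid
from urllib.parse import urlparse


def local_image_path(img_url, static_dir):
    """Build a unique local filename that keeps the remote image's extension."""
    ext = os.path.splitext(urlparse(img_url).path)[1]
    return os.path.join(static_dir, uuid.uuid4().hex + ext)


def attach_local_image(artwork, static_dir, fetch):
    """Download artwork['img_url'] via fetch(url, dst) and record where it was saved."""
    img_url = artwork.get("img_url")
    if img_url:
        dst = local_image_path(img_url, static_dir)
        fetch(img_url, dst)
        artwork["img_local"] = dst  # hypothetical key name, pick your own
    return artwork
```

Calling `attach_local_image(artwork, '/var/www/static', fetchfile)` just before the upsert in the loop would store the local path alongside the other keys of that artwork.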

This leaves out proper exception handling (I can elaborate if you need it, but I'm sure the documentation is clear enough).
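
For what it's worth, a minimal sketch of what that exception handling might look like is given below. It assumes Python 3, where `urllib.request` replaces `urllib2`. The `timeout` parameter and the choice to return a boolean rather than re-raise are illustrative assumptions.

```python
import logging
import shutil
import urllib.error
import urllib.request


def fetchfile(url, dst, timeout=10):
    """Download url to dst; return True on success and False on failure."""
    try:
        # Both handles are closed automatically, even if the copy fails
        with urllib.request.urlopen(url, timeout=timeout) as fi, open(dst, "wb") as fo:
            shutil.copyfileobj(fi, fo, 4096)
    except (urllib.error.URLError, OSError) as exc:
        # URLError covers network and HTTP errors, OSError covers local file problems
        logging.warning("could not fetch %s: %s", url, exc)
        return False
    return True
```

Note that a failure partway through the transfer can leave a partial file at `dst`; whether to delete it is left to the caller.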

You can put fetchfile() into a pool of asynchronous jobs to fetch several files at once.
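
One way to do that with the standard library is a thread pool. The sketch below assumes Python 3 and uses `concurrent.futures`; the helper name `fetch_all`, the `(url, dst)` job format, and the default of 4 workers are illustrative assumptions, and the `fetchfile` here is a compact Python 3 stand-in for the function above.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetchfile(url, dst):
    """Python 3 stand-in for fetchfile(): copy url to dst in 4 KB chunks."""
    with urllib.request.urlopen(url, timeout=10) as fi, open(dst, "wb") as fo:
        shutil.copyfileobj(fi, fo, 4096)


def fetch_all(jobs, max_workers=4):
    """Fetch (url, dst) pairs concurrently.

    Returns a dict mapping each dst to None on success,
    or to the exception that the download raised.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {dst: pool.submit(fetchfile, url, dst) for url, dst in jobs}
        # exception() waits for the job and returns None if it succeeded
        return {dst: future.exception() for dst, future in futures.items()}
```

Threads suit this job because the work is dominated by waiting on the network rather than by CPU time.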
