How to save an image to MongoDB from its image URL?

Asked 2025-04-18 17:27

I need to save images to MongoDB while scraping a website. I have the URL of each image. I tried this:

import urllib

images_binaries = []  # this will store all images data before saving it to mongodb
# save as file on hard disc
photo_path = self.album_path + '/' + photo_file_name
urllib.urlretrieve(url, photo_path)
with open(photo_path, 'rb') as f:  # read the downloaded image back as raw bytes
    images_binaries.append(f.read())
....
# after that I append this array of images raw data to Item
post = WaralbumPost()
post['images_binary'] = images_binaries
....

Here is the code of the WaralbumPost item:

from scrapy.item import Item, Field

class WaralbumPost(Item):
    images_binary = Field()

But saving to Mongo fails with this error: bson.errors.InvalidStringData: strings in documents must be valid UTF-8: '\xff\.....
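The InvalidStringData error comes from how the raw file contents are encoded: on Python 2, PyMongo writes a plain `str` as a BSON string, and BSON strings must be valid UTF-8, while image files are arbitrary bytes (a JPEG, for example, starts with 0xFF 0xD8). A minimal standard-library sketch of the check that fails; `is_valid_utf8` is a hypothetical helper, not part of PyMongo:

```python
# Hypothetical helper: mimics the UTF-8 requirement a BSON string must meet.
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data could be stored as a BSON (UTF-8) string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The first bytes of a JPEG file (SOI marker followed by an APP0 marker).
jpeg_header = b"\xff\xd8\xff\xe0"
print(is_valid_utf8(jpeg_header))       # False: the byte 0xFF never occurs in UTF-8
print(is_valid_utf8("album".encode()))  # True: ordinary text is fine
```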

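Is there a better way? Would converting the raw image data fix this? Or does Scrapy perhaps offer a better way to save images? Thanks in advance for any answers.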
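Converting the raw data does help, as long as it ends up as a type BSON can hold. With PyMongo, wrapping the bytes in `bson.binary.Binary` (or, on Python 3, simply passing a `bytes` object) stores them as BSON binary data. If the field has to be a string, base64 is a standard-library option at the cost of roughly 33% more space. Either way, a single MongoDB document is capped at 16 MB, which is why larger images usually go to GridFS instead. A minimal base64 round-trip sketch; the helper names are hypothetical:

```python
import base64


def image_to_field(raw: bytes) -> str:
    """Encode raw image bytes as ASCII text, which is always valid UTF-8."""
    return base64.b64encode(raw).decode("ascii")


def field_to_image(text: str) -> bytes:
    """Decode the stored text back into the original image bytes."""
    return base64.b64decode(text)


raw = b"\xff\xd8\xff\xe0" + bytes(range(256))  # stand-in for real image data
stored = image_to_field(raw)
assert field_to_image(stored) == raw
# base64 output is 4/3 of the input size, rounded up to a multiple of 4
print(len(raw), len(stored))  # prints: 260 348
```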

Solution: I removed these lines:

images_binaries.append(open(self.album_path + '/' + photo_file, 'r').read())
post['images_binary'] = images_binaries

In my WaralbumPost I now also store the image URLs. Then, in pipelines.py, I take those URLs, download each image and save it to Mongo with GridFS. The pipelines.py code:

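import mimetypes

import gridfs
import pymongo
import requests


class WarAlbum(object):
    def __init__(self, settings):
        # MongoClient replaces pymongo.Connection, which PyMongo 3+ no longer provides
        client = pymongo.MongoClient(settings['MONGODB_SERVER'], settings['MONGODB_PORT'])
        db = client[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]
        self.grid_fs = gridfs.GridFS(db)  # image files go into GridFS in the same database
        self.settings = settings

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds the pipeline through this hook and passes the project settings
        return cls(crawler.settings)

    def process_item(self, item, spider):
        ids = []
        for i, link in enumerate(item['img_links']):
            mime_type = mimetypes.guess_type(link)[0]
            with requests.get(link, stream=True) as response:
                response.raise_for_status()
                response.raw.decode_content = True  # undo any gzip/deflate content encoding
                # stream the HTTP body straight into GridFS without a temporary file
                _id = self.grid_fs.put(response.raw, contentType=mime_type,
                                       filename=item['local_images'][i])
            ids.append(_id)
        item['data_chunk_id'] = ids
        self.collection.insert_one(dict(item))  # insert() was removed in PyMongo 4
        spider.logger.debug("Item written to MongoDB database %s/%s",
                            self.settings['MONGODB_DB'], self.settings['MONGODB_COLLECTION'])
        return item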
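In the pipeline, `mimetypes.guess_type(link)` sees the whole URL, and because it guesses from the text after the last dot, a link such as `photo.jpg?size=large` can come back as `None`. A sketch of a hypothetical helper that guesses from the URL path only, and also derives a filename for items that have no `local_images` entry:

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit


def image_meta(link):
    """Return (filename, content_type) for an image URL.

    Only the URL path is used, so query strings and fragments do not
    confuse mimetypes; content_type is None if the extension is unknown.
    """
    path = urlsplit(link).path
    filename = posixpath.basename(path) or "image"  # fallback name when the path ends in '/'
    content_type, _encoding = mimetypes.guess_type(filename)
    return filename, content_type


print(image_meta("http://example.com/albums/42/photo.jpg?size=large"))
# prints: ('photo.jpg', 'image/jpeg')
```

Inside `process_item`, the two values could be passed as `filename=` and `contentType=` to `self.grid_fs.put`.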

Hope this helps someone.

1 Answer


Use GridFS. For example, with the Java driver:

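// Legacy MongoDB Java driver API; db is an existing com.mongodb.DB
import java.io.File;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSInputFile;

String newFileName = "my-image";
File imageFile = new File("/users/victor/images/image.png");
GridFS gfsPhoto = new GridFS(db, "photo");                 // "photo" is the GridFS bucket name
GridFSInputFile gfsFile = gfsPhoto.createFile(imageFile);  // throws IOException
gfsFile.setFilename(newFileName);
gfsFile.save();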
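GridFS gets around the 16 MB document limit by splitting a file into chunks (255 KiB each by default) stored as separate documents in a `<bucket>.chunks` collection, plus one metadata document in `<bucket>.files`; with the bucket name "photo" used here, those are `photo.files` and `photo.chunks`. The sketch below only models that layout with plain dicts to illustrate the idea. It is not how any driver is implemented, and `split_into_chunks` is a hypothetical name:

```python
import uuid

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def split_into_chunks(data: bytes, filename: str, chunk_size: int = CHUNK_SIZE):
    """Model the documents GridFS would create for one file (conceptual only)."""
    file_id = uuid.uuid4().hex  # drivers use an ObjectId here by default
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


image = bytes(600 * 1024)  # a fake 600 KiB image
files_doc, chunks = split_into_chunks(image, "my-image")
print(files_doc["length"], len(chunks))  # prints: 614400 3
# Reassembling the chunks in order of n yields the original bytes.
assert b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"])) == image
```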
