在Python中维护网页的更新文件缓存？

2条回答

网友

1楼 · 编辑于 2024-06-16 12:30:58

我不认为有一个库可以做到这一点，但它并不难实现。以下是我在请求中使用的一些函数，可能会对您有所帮助：

import os
import os.path as op
import requests
import urlparse

CACHE = 'path/to/cache'

def _file_from_url(url):
    return op.basename(urlparse.urlsplit(url).path)


def is_cached(*args, **kwargs):
    url = kwargs.get('url', args[0] if args else None)
    path = op.join(CACHE, _file_from_url(url))

    if not op.isfile(path):
        return False

    res = requests.head(*args, **kwargs)

    if not res.ok:
        return False

    # Check if cache is stale. For me, checking content-length fitted my use case.
    # You can use modification date or etag here:

    if not 'content-length' in res.headers:
        return False

    return op.getsize(path) == int(res.headers['content-length'])


def update_cache(*args, **kwargs):
    url = kwargs.get('url', args[0] if args else None)
    path = op.join(CACHE, _file_from_url(url))

    res = requests.get(*args, **kwargs)

    if res.ok:
        with open(path, 'wb') as handle:
            for block in res.iter_content(1024):
                if block:
                    handle.write(block)
                    handle.flush()

用法：

^{pr2}$

网友

2楼 · 编辑于 2024-06-16 12:30:58

我有同样的要求，找到了requests-cache。添加它非常容易，因为它扩展了requests。您可以将缓存保存在内存中并在脚本结束后消失，也可以使用sqlite、mongodb或redis使其持久化。这是我写的两行字，和广告上说的一样：

import requests, requests_cache
requests_cache.install_cache('scraper_cache', backend='sqlite', expire_after=3600)

相关问题更多 >

编程相关推荐

热门问题

热门文章

在Python中维护网页的更新文件缓存？

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >