Python: testing whether a value is a URL or an image type

5 votes

4 answers

7720 views

Asked 2025-04-16 04:05

In the code below, how can I check whether type is a URL, or whether it is an image?

import logging

for dictionaries in d_dict:
    type = dictionaries.get('type') or ''
    # str.startswith/endswith accept a tuple of alternatives
    if type.startswith(('http://', 'https://')):
        logging.debug("type is url")
    elif type.endswith(('.jpg', '.png', '.gif')):
        logging.debug("type is image")
    else:
        logging.debug("invalid type")

Tags: url-validation · data-type-checking · image-type-checking

4 Answers

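I wrote a Python script based on the earlier comments. It first checks the content type with a HEAD request; if that check fails, it falls back to guessing the type from the file extension in the URL. I hope this helps.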

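    # Python 2 code (urllib2 and basestring do not exist in Python 3)
    import mimetypes
    import urllib2


    class HeadRequest(urllib2.Request):
        """A urllib2 request that sends HEAD instead of GET."""
        def get_method(self):
            return 'HEAD'

    def get_contenttype(image_url):
        """Return the lower-cased main Content-Type from a HEAD request, or None."""
        try:
            response = urllib2.urlopen(HeadRequest(image_url))
        except (urllib2.URLError, ValueError) as e:
            # URLError also covers HTTPError; ValueError means "not a URL at all".
            print(e)
            return None
        content_type = response.headers.get('Content-Type', '')
        return content_type.split(';')[0].strip().lower() or None

    def get_mimetype(image_url):
        """Guess the MIME type from the file extension in the URL."""
        mimetype, encoding = mimetypes.guess_type(image_url)
        return mimetype

    def get_extension_from_type(type_string):
        """Map 'image/png' to 'png'; return None for anything that is not a string."""
        if isinstance(type_string, basestring):
            parts = type_string.split('/')
            return parts[1] if len(parts) >= 2 else parts[0]
        return None

    def get_type(image_url):
        valid_types = ('image/png', 'image/jpeg', 'image/gif', 'image/jpg')
        # 1. Ask the server first: its Content-Type is authoritative.
        content_type = get_contenttype(image_url)
        if content_type in valid_types:
            return get_extension_from_type(content_type)
        # 2. Fall back to guessing from the extension (local name avoids
        #    shadowing the mimetypes module).
        guessed_type = get_mimetype(image_url)
        if guessed_type in valid_types:
            return get_extension_from_type(guessed_type)
        return None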
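For readers on Python 3, where urllib2 no longer exists, the same two-step idea (HEAD request first, extension guess second) might look roughly like the sketch below. It assumes urllib.request.Request(..., method='HEAD'), available since Python 3.3; the names VALID_TYPES and main_type are illustrative rather than part of the answer above, and image/jpg is dropped because the registered type is image/jpeg.

```python
import logging
import mimetypes
import urllib.request

# Assumption: the same image types as the answer above, minus the
# unregistered 'image/jpg'.
VALID_TYPES = ('image/png', 'image/jpeg', 'image/gif')


def main_type(header_value):
    """Reduce a Content-Type value such as 'image/PNG; q=1' to 'image/png'."""
    return header_value.split(';')[0].strip().lower()


def get_contenttype(image_url, timeout=10):
    """Send a HEAD request and return the main Content-Type, or None on failure."""
    try:
        # Building the Request raises ValueError for strings that are not URLs.
        request = urllib.request.Request(image_url, method='HEAD')
        with urllib.request.urlopen(request, timeout=timeout) as response:
            header = response.headers.get('Content-Type')
    except (OSError, ValueError) as e:
        # OSError covers URLError, HTTPError and socket timeouts.
        logging.debug('HEAD request failed: %s', e)
        return None
    return main_type(header) if header else None


def get_type(image_url):
    """Return an extension-like name ('png', 'jpeg', 'gif') or None."""
    # 1. Trust the server's Content-Type when the HEAD request works.
    content_type = get_contenttype(image_url)
    if content_type in VALID_TYPES:
        return content_type.split('/')[1]
    # 2. Otherwise fall back to guessing from the file extension.
    guessed, _encoding = mimetypes.guess_type(image_url)
    if guessed in VALID_TYPES:
        return guessed.split('/')[1]
    return None
```

Because a bare file name such as picture.png is not a valid URL, the HEAD step fails with ValueError and get_type falls back to the extension, returning 'png'.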

Answered 2025-04-16 by Python大师


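You can't tell what type a resource is from its URL alone. For example, the address of a GIF file might not end in .gif, or it might end in a misleading .txt. In fact, with URL rewriting so popular these days, many image URLs have no file extension at all.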

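What actually determines the type of a web resource is the Content-Type HTTP response header, so the only way to know the type for certain is to fetch the resource and see what comes back. You could do this by inspecting the headers returned by urllib.urlopen(url).headers, but that sends a GET request, so the file itself is downloaded as well. To be more efficient, you can send a HEAD request instead, which returns the headers without transferring the whole file: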

import logging
import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return 'HEAD'

# url is the address you want to check
response = urllib2.urlopen(HeadRequest(url))
# .get() avoids a KeyError if the server sends no Content-Type at all
maintype = response.headers.get('Content-Type', '').split(';')[0].strip().lower()
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
    logging.debug('invalid type')
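To see the point about extensionless URLs in practice without touching a real site, here is a hedged Python 3 sketch that starts a throwaway local server (the handler class ImageHandler and the path /no-extension are made up for the demo) which labels every response image/png. A HEAD request to a URL with no file extension still reveals the type through Content-Type, and the response carries no body.

```python
import http.server
import threading
import urllib.request


class ImageHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical demo server: every path is served as a tiny PNG."""

    body = b'\x89PNG\r\n\x1a\n'  # only the 8-byte PNG signature, for illustration

    def _send_headers(self):
        self.send_response(200)
        self.send_header('Content-Type', 'image/png')
        self.send_header('Content-Length', str(len(self.body)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()  # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def head_main_type(url, timeout=5):
    """Return (main Content-Type, body bytes) for a HEAD request to url."""
    # An empty ProxyHandler ignores proxy environment variables,
    # so the request really goes to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, method='HEAD')
    with opener.open(request, timeout=timeout) as response:
        header = response.headers.get('Content-Type', '')
        body = response.read()
    return header.split(';')[0].strip().lower(), body


server = http.server.HTTPServer(('127.0.0.1', 0), ImageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = 'http://127.0.0.1:%d/no-extension' % server.server_address[1]
    main_type, body = head_main_type(url)
finally:
    server.shutdown()
    server.server_close()

print(main_type, len(body))  # image/png 0
```

A GET to the same URL would also return the 8 body bytes; with a real image that body is the whole file, which is exactly the transfer the HEAD request avoids.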

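If you have to guess the type from the file extension in the URL (for example because you have no network connection), first parse the URL with urlparse and discard any ?query or #fragment part, so that an address like http://www.example.com/image.png?blah=blah&foo=.txt doesn't fool you. You should also consider using mimetypes to map the file name to a Content-Type, so you can take advantage of its knowledge of file extensions: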

import logging
import mimetypes
import urlparse

# Only the path is used, so ?query and #fragment cannot affect the guess.
maintype = mimetypes.guess_type(urlparse.urlparse(url).path)[0]
if maintype not in ('image/png', 'image/jpeg', 'image/gif'):
    logging.debug('invalid type')

(This approach also handles other extensions. At the very least you want .jpeg to be recognized as image/jpeg, along with the three-letter Windows variant .jpg.)
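In Python 3, urlparse lives in urllib.parse. A minimal sketch of the offline guess under that assumption, using urllib.parse.urlsplit and the example URL from the answer (the helper names guess_type_from_url and looks_like_image are illustrative):

```python
import mimetypes
from urllib.parse import urlsplit

IMAGE_TYPES = ('image/png', 'image/jpeg', 'image/gif')


def guess_type_from_url(url):
    """Guess a MIME type from the URL path alone (query and fragment dropped)."""
    return mimetypes.guess_type(urlsplit(url).path)[0]


def looks_like_image(url):
    """True if the path's extension maps to one of IMAGE_TYPES."""
    return guess_type_from_url(url) in IMAGE_TYPES


print(guess_type_from_url('http://www.example.com/image.png?blah=blah&foo=.txt'))
# image/png
```

mimetypes maps both .jpg and .jpeg to image/jpeg, so the three-letter variant is covered without listing it separately.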

Answered 2025-04-16 by Python大师


Use regular expressions.

import logging
import re

r_url = re.compile(r"^https?:")
r_image = re.compile(r".*\.(jpg|png|gif)$")

for dictionaries in d_dict:
    type_ = dictionaries.get('type') or ''  # avoid shadowing the built-in type()
    if r_url.match(type_):
        logging.debug("type is url")
    elif r_image.match(type_):
        logging.debug("type is image")
    else:
        logging.debug("invalid type")

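Two things to note: type is a built-in function, so it is best not to use it as a variable name; and images can also be loaded from URLs, so a value such as http://example.com/a.png matches the URL test first and is never reported as an image.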
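Both notes can be addressed together. The sketch below is one possible extension of the regex approach, not the answer's own code: it assumes a case-insensitive match, accepts .jpeg as well as .jpg, requires :// after the scheme, and for URLs checks the extension on the path only (via urllib.parse.urlsplit), reporting a hypothetical 'image url' category when a URL points at an image.

```python
import re
from urllib.parse import urlsplit

# Assumptions: case-insensitive, '.jpeg' accepted, '://' required after the scheme.
r_url = re.compile(r"^https?://", re.IGNORECASE)
r_image = re.compile(r"\.(jpe?g|png|gif)$", re.IGNORECASE)


def classify(value):
    """Return 'image url', 'url', 'image' or 'invalid' for a 'type' value."""
    if not isinstance(value, str):
        return 'invalid'  # e.g. the key was missing and .get() returned None
    if r_url.match(value):
        # An image can live at a URL: look at the path, not the query string.
        if r_image.search(urlsplit(value).path):
            return 'image url'
        return 'url'
    if r_image.search(value):
        return 'image'
    return 'invalid'
```

Whether a URL to an image should count as a URL or as an image is a design choice; giving it its own label lets the caller decide.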

Answered 2025-04-16 by Python大师

