Check whether an image identical to the input image exists

1 Answer

Anonymous user

#1 · Posted on 2024-04-26 17:51:45

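Comparing RGB pixel values

You can use the pillow module to access the pixel data of a particular image. Keep in mind that pillow only handles the image formats listed in its documentation.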

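Based on your description, if we make some assumptions about what it means for two images to be identical, the two images must:

Have the same dimensions (height and width)
Have the same RGB pixel values (the RGB value of pixel [x, y] in the input image must equal the RGB value of pixel [x, y] in the output image)
Have the same orientation (this follows from the previous assumption: an image is not considered identical to the same image rotated by 90 degrees)

So if we have two images opened with the pillow module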

from PIL import Image

original = Image.open("input.jpg")
possible_duplicate = Image.open("output.jpg")

the following code compares the two images to see whether they are identical

(The code block defining compare_images(input_image, output_image) is missing at this point.)
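A minimal sketch of what such a function might look like, written from the three assumptions listed above. This is a reconstruction, not necessarily the answer's original code. It assumes both arguments are pillow Image objects and uses only their size attribute and getpixel() method.

```python
def compare_images(input_image, output_image):
    """Return True if both images have the same size and identical pixels."""
    # Assumption 1: same dimensions. Image.size is a (width, height) tuple.
    if input_image.size != output_image.size:
        return False

    width, height = input_image.size

    # Assumptions 2 and 3: the pixel at every (x, y) position must match.
    # Because positions are compared one to one, a rotated copy generally fails.
    for x in range(width):
        for y in range(height):
            if input_image.getpixel((x, y)) != output_image.getpixel((x, y)):
                return False

    return True
```

The function returns as soon as it finds the first mismatch, so two clearly different images are rejected quickly. Two identical images are the slowest case, since every pixel has to be visited.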

by calling

compare_images(original, possible_duplicate)

Using this function, we can go through a set of images

from PIL import Image

def find_duplicate_image(input_image, output_images):
  # only open the input image once
  input_image = Image.open(input_image)

  for image in output_images:
    if compare_images(input_image, Image.open(image)):
      return image

  # no duplicate was found
  return None

Putting it all together, we can simply call

original = "input.jpg"
possible_duplicates = ["output.jpg", "output2.jpg", ...]

duplicate = find_duplicate_image(original, possible_duplicates)

Note that the implementation above only finds the first duplicate and returns it. If no duplicate is found, None is returned.
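If every duplicate is needed rather than just the first, the same loop can collect matches instead of returning early. A minimal sketch: the name find_all_duplicates and its compare and open_image parameters are made up for illustration. The comparison function and the image opener are passed in so the sketch does not depend on any particular definition of them.

```python
def find_all_duplicates(input_path, candidate_paths, compare, open_image):
    """Return every path in candidate_paths whose image matches input_path.

    compare    -- function (image, image) -> bool, e.g. compare_images
    open_image -- function path -> image, e.g. PIL's Image.open
    """
    # Open the input image only once, as in find_duplicate_image.
    input_image = open_image(input_path)

    # Keep every candidate that compares equal, in the original order.
    return [path for path in candidate_paths
            if compare(input_image, open_image(path))]
```

With pillow this could be called as find_all_duplicates("input.jpg", possible_duplicates, compare_images, Image.open). It returns an empty list when nothing matches, rather than None.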

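One thing to keep in mind is that comparing every pixel like this can be expensive. I used a single test image as both the input and the output, ran the comparison 100 times with the timeit module, and took the average of all those runs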

import timeit

num_trials = 100
trials = timeit.repeat(
    repeat=num_trials,
    number=1,
    stmt="compare_images(Image.open('input.jpg'), Image.open('input.jpg'))",
    setup="from __main__ import compare_images; from PIL import Image"
)
avg = sum(trials) / num_trials

print("Average time taken per comparison was:", avg, "seconds")

# Average time taken per comparison was 1.3337286046380177 seconds

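Note that this was done on an image of only 600×600 pixels. If you did this with a "large" set of possible duplicates, where I take "large" to mean at least 1M images of similar dimensions, it could take about 15 days (1,000,000 × 1.28 s ÷ 60 ÷ 60 ÷ 24 ≈ 14.8 days; with the 1.33 s average printed above, about 15.4 days) to go through and compare every output image with the input image, which is not ideal.

Also keep in mind that these figures will vary with the machine and operating system you use. The numbers I provided are more for illustration.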
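If the Python-level pixel loop turns out to be the bottleneck, one option is to compare the raw pixel buffers in a single step. A minimal sketch, assuming both arguments are pillow Image objects. The name images_equal_fast is made up for illustration, while size, mode and tobytes() are part of pillow's Image API. Requiring equal modes is an extra assumption that is not among the conditions listed earlier.

```python
def images_equal_fast(input_image, output_image):
    """Return True if both images have the same size, mode and pixel bytes."""
    # Cheap checks first: images with different dimensions or pixel layouts
    # (for example "RGB" vs "RGBA") are treated as not identical.
    if input_image.size != output_image.size:
        return False
    if input_image.mode != output_image.mode:
        return False

    # tobytes() returns the raw pixel data. Comparing two bytes objects
    # happens in C rather than in a Python loop over individual pixels.
    return input_image.tobytes() == output_image.tobytes()
```

How much time this saves depends on the images and the machine, so it is worth timing it the same way as above before relying on it.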

Alternative implementation

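Although I haven't fully explored this implementation myself, one approach you could try is to use a hash function to precompute a hash of the pixel data of every image in your collection. If you store these in a database, with each hash linked to the original image or the image name, then all you have to do is compute the hash of the input image with the same hash function and compare the hashes. This would save a lot of computation time and make the algorithm much more efficient.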

A blog post linked from the original answer describes one way to implement it.
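The hashing idea can be sketched with the standard hashlib module. This is an illustration under stated assumptions, not the method from the blog post. The function names are made up, SHA-256 is an arbitrary choice of hash function, a plain dict stands in for the database, and the images are assumed to be pillow Image objects exposing mode, size and tobytes().

```python
import hashlib


def pixel_hash(image):
    """Return a SHA-256 hex digest of an image's mode, size and pixel bytes."""
    digest = hashlib.sha256()
    # Mix in mode and size so that images with the same raw bytes but a
    # different shape or pixel layout get different digests.
    digest.update(image.mode.encode("ascii"))
    digest.update("{}x{}".format(*image.size).encode("ascii"))
    digest.update(image.tobytes())
    return digest.hexdigest()


def build_hash_index(named_images):
    """Map each digest to the names of the images that produce it.

    named_images -- iterable of (name, image) pairs
    """
    index = {}
    for name, image in named_images:
        index.setdefault(pixel_hash(image), []).append(name)
    return index


def lookup_duplicates(index, input_image):
    """Return the names stored under the input image's digest (may be empty)."""
    return index.get(pixel_hash(input_image), [])
```

The index is built once, for example with build_hash_index((path, Image.open(path)) for path in possible_duplicates). After that, each lookup hashes only the input image instead of comparing it pixel by pixel against the whole collection.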

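Update - 2018-08-06

At the OP's request: if you are given a directory of possible duplicate images rather than the explicit image paths themselves, you can use the os and ntpath modules as follows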

import ntpath
import os

def get_all_images(directory):
  image_paths = []

  for filename in os.listdir(directory):
    # to be as careful as possible, you might check to make sure that
    # the file is in fact an image, for instance using
    # filename.endswith(".jpg") to check for .jpg files for instance
    image_paths.append(os.path.join(directory, filename))

  return image_paths

def get_filename(path):
  return ntpath.basename(path)

Using these functions, the updated program might look like

possible_duplicates = get_all_images("/path/to/images")
duplicate_path = find_duplicate_image("/path/to/input.jpg", possible_duplicates)
if duplicate_path:
  print(get_filename(duplicate_path))

If a duplicate exists, the code above prints only the name of the duplicate image; otherwise it prints nothing.
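The comment inside get_all_images suggests checking that each file really is an image. A minimal sketch of that filter, using only the standard os module. The name get_image_paths and the extension list are assumptions for illustration. The check looks only at the file name, so it does not verify that the file contents are a valid image.

```python
import os

# Hypothetical extension list; adjust it to the formats you expect.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def get_image_paths(directory, extensions=IMAGE_EXTENSIONS):
    """Return paths of regular files in directory with an image-like extension."""
    image_paths = []

    # Sort the names so the result order does not depend on the file system.
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        # Skip subdirectories and files with other extensions; lower() makes
        # the extension check case-insensitive.
        if os.path.isfile(path) and filename.lower().endswith(extensions):
            image_paths.append(path)

    return image_paths
```

It can be used in place of get_all_images in the program above, so that stray non-image files in the directory are never passed to Image.open.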
