Converting a CGImage to a Python image (PIL/OpenCV)

1 vote
5 answers
1316 views
Asked 2025-04-18 01:52

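I want to do some pattern recognition on my screen, and I plan to use the Quartz/PyObjC libraries to grab the screenshots.

The screenshot I get is a CGImage. I want to search it for a pattern with the OpenCV library, but I can't figure out how to convert the data into a format OpenCV understands.

This is what I'd like to do: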

#get screenshot and reference pattern
screen = getScreenshot() # returns CGImage instance, custom function, using Quartz
reference = cv2.imread('ref/reference_start.png') #get the reference pattern

#search for the pattern using the opencv library
result = cv2.matchTemplate(screen, reference, cv2.TM_CCOEFF_NORMED)

#this is what I need
minVal,maxVal,minLoc,maxLoc = cv2.minMaxLoc(result)

I have no idea how to do this, and I couldn't find anything on Google.

5 answers

0

This is an improved version of Arqu's answer. PIL (or at least Pillow) can load BGRA data directly, without splitting and merging the channels.

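import Quartz
from PIL import Image

# cgimg is a CGImage, e.g. the result of Quartz.CGWindowListCreateImage
width = Quartz.CGImageGetWidth(cgimg)
height = Quartz.CGImageGetHeight(cgimg)
pixeldata = Quartz.CGDataProviderCopyData(Quartz.CGImageGetDataProvider(cgimg))
bpr = Quartz.CGImageGetBytesPerRow(cgimg)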
# Convert to PIL Image.  Note: CGImage's pixeldata is BGRA
image = Image.frombuffer("RGBA", (width, height), pixeldata, "raw", "BGRA", bpr, 1)
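To make the roles of the "BGRA" raw mode and the bytes-per-row argument more concrete, here is a rough pure-Python sketch of the repacking that the call above asks the decoder to do. The function name `bgra_to_rgba_bytes` and the synthetic buffer are made up for illustration; this is not how Pillow is implemented, only a model of the result.

```python
def bgra_to_rgba_bytes(data, width, height, bytes_per_row):
    """Repack a possibly row-padded BGRA buffer into tightly packed RGBA bytes."""
    out = bytearray()
    for y in range(height):
        row_start = y * bytes_per_row
        # Only the first width * 4 bytes of each row are pixels; the rest is padding.
        row = data[row_start:row_start + width * 4]
        for x in range(0, len(row), 4):
            b, g, r, a = row[x:x + 4]
            out += bytes((r, g, b, a))
    return bytes(out)


# Synthetic 2x1 image (one blue pixel, one red pixel, stored as BGRA)
# whose single row is padded from 8 to 16 bytes.
padded = bytes([255, 0, 0, 255, 0, 0, 255, 255]) + bytes(8)
rgba = bgra_to_rgba_bytes(padded, width=2, height=1, bytes_per_row=16)
print(list(rgba))
```

If the padding were ignored (treating every row as exactly width * 4 bytes), each row after the first would start at the wrong offset, which is what produces skewed images.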
0

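Here is some code that takes a screenshot and saves it to a file. To read the screenshot into PIL, just use the standard Image.open(path). The code is very fast when the captured region is small: an 800x800 pixel region takes under 50 ms per capture, whereas a full-resolution dual-screen capture (2880x1800 + 2560x1440) takes about 1.9 s per capture.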

Source: https://github.com/troq/flappy-bird-player/blob/master/screenshot.py

import Quartz
import LaunchServices
from Cocoa import NSURL
import Quartz.CoreGraphics as CG

def screenshot(path, region=None):
    """saves screenshot of given region to path
    :path: string path to save to
    :region: CGRect to capture, e.g. CG.CGRectMake(x, y, width, height)
    :returns: nothing
    """
    if region is None:
        region = CG.CGRectInfinite

    # Create screenshot as CGImage
    image = CG.CGWindowListCreateImage(
        region,
        CG.kCGWindowListOptionOnScreenOnly,
        CG.kCGNullWindowID,
        CG.kCGWindowImageDefault)

    dpi = 72 # FIXME: Should query this from somewhere, e.g for retina displays

    url = NSURL.fileURLWithPath_(path)

    dest = Quartz.CGImageDestinationCreateWithURL(
        url,
        LaunchServices.kUTTypePNG, # file type
        1, # 1 image in file
        None
        )

    properties = {
        Quartz.kCGImagePropertyDPIWidth: dpi,
        Quartz.kCGImagePropertyDPIHeight: dpi,
        }

    # Add the image to the destination, characterizing the image with
    # the properties dictionary.
    Quartz.CGImageDestinationAddImage(dest, image, properties)

    # When all the images (only 1 in this example) are added to the destination,
    # finalize the CGImageDestination object.
    Quartz.CGImageDestinationFinalize(dest)


if __name__ == '__main__':
    # Capture full screen
    screenshot("testscreenshot_full.png")

    # Capture region (100x100 box from top-left)
    region = CG.CGRectMake(0, 0, 100, 100)
    screenshot("testscreenshot_partial.png", region=region)
2

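I've been trying this too, but I need better performance, and saving to a file and reading it back is rather slow. After a lot of searching and debugging, I ended up with this: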

# get_pixels returns an image reference from CG.CGWindowListCreateImage
imageRef = self.get_pixels()
pixeldata = CG.CGDataProviderCopyData(CG.CGImageGetDataProvider(imageRef))
image = Image.frombuffer("RGBA", (self.width, self.height), pixeldata, "raw", "RGBA", self.stride, 1)
#Color correction from BGRA to RGBA
b, g, r, a = image.split()
image = Image.merge("RGBA", (r, g, b, a))

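One more thing to note: because my image isn't a standard size (its rows need padding), it behaved strangely and I had to adjust the stride of the buffer. If you take a full-screen capture at a standard screen width, you can simply pass a stride of 0 and it will be calculated automatically.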

You can now convert the PIL image to a numpy array, which is more convenient to work with in OpenCV:

image = np.array(image)
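One caveat when feeding this array to OpenCV: cv2.imread returns 3-channel BGR arrays by default, and cv2.matchTemplate expects the image and the template to have the same dtype and number of channels, whereas the array above is 4-channel RGBA. A minimal sketch of the channel conversion, using a hypothetical helper `rgba_to_bgr` and a synthetic array in place of a real screenshot:

```python
import numpy as np


def rgba_to_bgr(rgba):
    """Drop alpha and reverse the channel order: (H, W, 4) RGBA -> (H, W, 3) BGR."""
    # 2::-1 selects channels 2, 1, 0 (B, G, R); the copy makes the result contiguous.
    return np.ascontiguousarray(rgba[:, :, 2::-1])


# Synthetic 1x2 RGBA image: one red pixel, one blue pixel.
rgba = np.array([[[255, 0, 0, 255], [0, 0, 255, 255]]], dtype=np.uint8)
bgr = rgba_to_bgr(rgba)
print(bgr.shape, bgr[0, 0], bgr[0, 1])
```

With OpenCV installed, cv2.cvtColor(image, cv2.COLOR_RGBA2BGR) should give the same result.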
3

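To add to Arqu's answer: if your end goal is to use OpenCV or numpy, calling np.frombuffer directly is faster than creating a PIL image first. np.frombuffer takes about as long as Image.frombuffer, but it skips the conversion from image to numpy array (on my machine that step takes about 100 ms, while everything else takes only about 50 ms).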

import Quartz.CoreGraphics as CG
import time
import numpy as np

ct = time.time()
region = CG.CGRectInfinite

# Create screenshot as CGImage
image = CG.CGWindowListCreateImage(
    region,
    CG.kCGWindowListOptionOnScreenOnly,
    CG.kCGNullWindowID,
    CG.kCGWindowImageDefault)

width = CG.CGImageGetWidth(image)
height = CG.CGImageGetHeight(image)
bytesperrow = CG.CGImageGetBytesPerRow(image)

pixeldata = CG.CGDataProviderCopyData(CG.CGImageGetDataProvider(image))
image = np.frombuffer(pixeldata, dtype=np.uint8)
image = image.reshape((height, bytesperrow//4, 4))
image = image[:,:width,:]

print('elapsed:', time.time() - ct)
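The reshape-and-crop step can be wrapped in a small helper that also drops the alpha channel, so the result has the same 3-channel BGR layout as an image loaded with cv2.imread. The sketch below is one possible way to do it; `bgra_buffer_to_bgr` is a hypothetical name, and a synthetic padded buffer stands in for the data returned by CGDataProviderCopyData so that it runs without Quartz.

```python
import numpy as np


def bgra_buffer_to_bgr(pixeldata, width, height, bytes_per_row):
    """Turn a row-padded BGRA byte buffer into a contiguous (height, width, 3) BGR array."""
    flat = np.frombuffer(pixeldata, dtype=np.uint8)
    rows = flat[:height * bytes_per_row].reshape(height, bytes_per_row)
    # Keep only the real pixels of each row, then split them into 4 channels.
    pixels = rows[:, :width * 4].reshape(height, width, 4)
    return np.ascontiguousarray(pixels[:, :, :3])


# Synthetic 3x2 image with rows padded from 12 to 16 bytes.
width, height, bytes_per_row = 3, 2, 16
buf = bytearray(height * bytes_per_row)
for y in range(height):
    for x in range(width):
        offset = y * bytes_per_row + x * 4
        buf[offset:offset + 4] = bytes((10 * y + x, 100, 200, 255))  # B, G, R, A

screen_bgr = bgra_buffer_to_bgr(bytes(buf), width, height, bytes_per_row)
print(screen_bgr.shape, screen_bgr[1, 2])
```

An array produced this way can then be passed to cv2.matchTemplate together with the reference image, as in the question.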
2

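All of these answers overlook Tom Gangemi's comment on this answer. Images whose width is not a multiple of 64 cause problems. I used np strides to implement an efficient approach: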

cg_img = CG.CGWindowListCreateImage(
    CG.CGRectNull,
    CG.kCGWindowListOptionIncludingWindow,
    wnd_id,
    CG.kCGWindowImageBoundsIgnoreFraming | CG.kCGWindowImageNominalResolution
)

bpr = CG.CGImageGetBytesPerRow(cg_img)
width = CG.CGImageGetWidth(cg_img)
height = CG.CGImageGetHeight(cg_img)

cg_dataprovider = CG.CGImageGetDataProvider(cg_img)
cg_data = CG.CGDataProviderCopyData(cg_dataprovider)

np_raw_data = np.frombuffer(cg_data, dtype=np.uint8)

np_data = np.lib.stride_tricks.as_strided(np_raw_data,
                                          shape=(height, width, 3),
                                          strides=(bpr, 4, 1),
                                          writeable=False)
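As a sanity check on the stride arithmetic, the following sketch compares the as_strided view with an explicit reshape-and-crop on a synthetic padded buffer. The sizes are chosen arbitrarily, and `raw` stands in for np.frombuffer(cg_data, dtype=np.uint8).

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# 3 pixels * 4 bytes = 12 data bytes per row, plus 4 bytes of padding.
width, height, bpr = 3, 2, 16
raw = np.arange(height * bpr, dtype=np.uint8)

# Step bpr bytes per row, 4 bytes per pixel, 1 byte per channel; keep only B, G, R.
strided = as_strided(raw, shape=(height, width, 3), strides=(bpr, 4, 1),
                     writeable=False)

# The same result via explicit reshaping and slicing.
sliced = raw.reshape(height, bpr)[:, :width * 4].reshape(height, width, 4)[:, :, :3]

print(np.array_equal(strided, sliced))
print(strided[1, 2])
```

Note that the strided result is a view onto the original buffer and is not C-contiguous. If a downstream call turns out to need contiguous memory, np.ascontiguousarray(strided) makes a packed copy.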
