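Python simages package - PyPI

Similar images in a dataset

Detailed description of the simages Python project

:monkey: simages :monkey:

Find similar images in a dataset.

Useful for removing duplicate images from a dataset after scraping images with google-images-download.

The Python API returns pairs, distances, where pairs are the (ordered) closest pairs and distances are the corresponding embedding distances.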

Installation

See the installation docs for all details.

pip install simages

or install from source:

git clone https://github.com/justinshenk/simages
cd simages
pip install .

To install the interactive interface, install mongodb and use pip install "simages[all]".

Demo

Minimal command-line interface with simages-show:

[Figure: simages_demo]

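Interactive image deletion with simages add/find:

Usage

Two interfaces exist:

matplotlib interface, for plotting duplicates for visual inspection
MongoDB + Flask interface, which allows interactive deletion [optional]

Minimal interface

In your console, enter a directory with images and use simages-show: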

$ simages-show --data-dir .

usage: simages-show [-h] [--data-dir DATA_DIR] [--show-train]
                    [--epochs EPOCHS] [--num-channels NUM_CHANNELS]
                    [--pairs PAIRS] [--zdim ZDIM] [-s]

  -h, --help            show this help message and exit
  --data-dir DATA_DIR, -d DATA_DIR
                        Folder containing image data
  --show-train, -t      Show training of embedding extractor every epoch
  --epochs EPOCHS, -e EPOCHS
                        Number of passes of dataset through model for
                        training. More is better but takes more time.
  --num-channels NUM_CHANNELS, -c NUM_CHANNELS
                        Number of channels for data (1 for grayscale, 3 for
                        color)
  --pairs PAIRS, -p PAIRS
                        Number of pairs of images to show
  --zdim ZDIM, -z ZDIM  Compression bits (bigger generally performs better but
                        takes more time)
  -s, --show            Show closest pairs

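Web interface [optional]

Note: To install the web interface API, install and run mongodb, then use pip install "simages[all]" to install the optional dependencies.

Add your pictures to the database (this will take some time, depending on the number of pictures)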

simages add <images_folder_path>

A web page will show all of the similar or duplicate pictures:

simages find <images_folder_path>

Usage:
    simages add <path> ... [--db=<db_path>] [--parallel=<num_processes>]
    simages remove <path> ... [--db=<db_path>]
    simages clear [--db=<db_path>]
    simages show [--db=<db_path>]
    simages find <path> [--print] [--delete] [--match-time] [--trash=<trash_path>] [--db=<db_path>] [--epochs=<epochs>]
    simages -h | --help
Options:
    -h, --help                Show this screen
    --db=<db_path>            The location of the database or a MongoDB URI. (default: ./db)
    --parallel=<num_processes> The number of parallel processes to run to hash the image
                               files (default: number of CPUs).
    find:
        --print               Only print duplicate files rather than displaying HTML file
        --delete              Move all found duplicate pictures to the trash. This option takes priority over --print.
        --match-time          Adds the extra constraint that duplicate images must have the
                              same capture times in order to be considered.
        --trash=<trash_path>  Where files will be put when they are deleted (default: ./Trash)
        --epochs=<epochs>     Epochs for training [default: 2]
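
When scripting the web interface, it can help to assemble the simages find invocation programmatically. The sketch below only builds an argument list from the options documented above; the helper name build_find_command is hypothetical and not part of simages, and nothing is executed unless you pass the list to subprocess.run yourself.

```python
from typing import List, Optional


def build_find_command(
    path: str,
    db: Optional[str] = None,
    epochs: Optional[int] = None,
    print_only: bool = False,
    delete: bool = False,
    match_time: bool = False,
    trash: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `simages find` (hypothetical helper)."""
    cmd = ["simages", "find", path]
    if print_only:
        cmd.append("--print")
    if delete:
        cmd.append("--delete")
    if match_time:
        cmd.append("--match-time")
    if trash is not None:
        cmd.append(f"--trash={trash}")
    if db is not None:
        cmd.append(f"--db={db}")
    if epochs is not None:
        cmd.append(f"--epochs={epochs}")
    return cmd


# Only print duplicates for a folder, training for 2 epochs.
command = build_find_command("my_images_folder", print_only=True, epochs=2)
print(" ".join(command))
```

Once simages[all] is installed and MongoDB is running, the resulting list could be handed to subprocess.run(command, check=True).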

Python API

Numpy array

    from simages import find_duplicates
    import numpy as np

    array_data = np.random.random((100, 3, 48, 48))  # N x C x H x W
    pairs, distances = find_duplicates(array_data)

Folder

    from simages import find_duplicates

    data_dir = "my_images_folder"
    pairs, distances = find_duplicates(data_dir)

Default options for find_duplicates are:

    def find_duplicates(
        input: Union[str, np.ndarray],
        n: int = 5,
        num_epochs: int = 2,
        num_channels: int = 3,
        show: bool = False,
        show_train: bool = False,
        **kwargs
    ):
        """Find duplicates in dataset. Either `array` or `data_dir` must be specified.

        Args:
            input (str or np.ndarray): folder directory or N x C x H x W array
            n (int): number of closest pairs to identify
            num_epochs (int): how long to train the autoencoder (more is generally better)
            show (bool): display the closest pairs
            show_train (bool): show output every epoch
            z_dim (int): size of compression (more is generally better, but slower)
            kwargs (dict): etc, passed to `EmbeddingExtractor`

        Returns:
            pairs (np.ndarray): indices for closest pairs of images, n x 2 array
            distances (np.ndarray): distances of each pair to each other
        """

`Embeddings` API

    from simages import Embeddings
    import numpy as np

    N = 1000
    data = np.random.random((N, 28, 28))
    embeddings = Embeddings(data)

    # Access the array
    array = embeddings.array  # N x z (compression size)

    # Get 5 closest pairs of images
    pairs, distances = embeddings.duplicates(n=5)

    In [0]: pairs
    Out[0]: array([[912, 990], [716, 790], [907, 943], [483, 492], [806, 883]])

    In [1]: distances
    Out[1]: array([0.00148035, 0.00150703, 0.00158789, 0.00168699, 0.00168721])
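
The returned index pairs and distances can be post-processed with ordinary Python. As a rough sketch, the snippet below takes the sample output shown above, keeps only the pairs whose distance falls under a cutoff, and collects the second index of each kept pair as a removal candidate. The threshold value and the helper name select_duplicates are assumptions for illustration, not part of simages; a suitable cutoff depends on your data and training settings.

```python
from typing import List, Sequence, Tuple


def select_duplicates(
    pairs: Sequence[Sequence[int]],
    distances: Sequence[float],
    threshold: float,
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Keep pairs closer than `threshold`; return them plus indices to drop.

    Hypothetical helper, not part of simages.
    """
    kept = [
        (int(a), int(b))
        for (a, b), d in zip(pairs, distances)
        if d < threshold
    ]
    # Treat the first image of each kept pair as the original, the second as the copy.
    to_drop = sorted({b for _, b in kept})
    return kept, to_drop


# Sample values copied from the output above.
pairs = [[912, 990], [716, 790], [907, 943], [483, 492], [806, 883]]
distances = [0.00148035, 0.00150703, 0.00158789, 0.00168699, 0.00168721]

close_pairs, drop_indices = select_duplicates(pairs, distances, threshold=0.0016)
print(close_pairs)   # pairs under the assumed cutoff
print(drop_indices)  # candidate indices to remove
```

Choosing which member of a pair to keep is a policy decision; keeping the first index is only one simple option.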

`EmbeddingExtractor` API

    from simages import EmbeddingExtractor
    import numpy as np

    N = 1000
    data = np.random.random((N, 28, 28))
    extractor = EmbeddingExtractor(data, num_channels=1)  # grayscale

    # Show 10 closest pairs of images
    pairs, distances = extractor.show_duplicates(n=10)

Class attributes and parameters:

    class EmbeddingExtractor:
        """Extract embeddings from data with models and allow visualization.

        Attributes:
            trainloader (torch loader)
            evalloader (torch loader)
            model (torch.nn.Module)
            embeddings (np.ndarray)
        """

        def __init__(
            self,
            input: Union[str, np.ndarray],
            num_channels=None,
            num_epochs=2,
            batch_size=32,
            show_train=True,
            show=False,
            z_dim=8,
            **kwargs,
        ):
            """Inits EmbeddingExtractor with input, either `str` or `np.ndarray`, performs training and validation.

            Args:
                input (np.ndarray or str): data
                num_channels (int): grayscale = 1, color = 3
                num_epochs (int): more is better (generally)
                batch_size (int): number of images per batch
                show_train (bool): show intermediate training results
                show (bool): show closest pairs
                z_dim (int): compression size
                kwargs (dict)
            """

Specify the number of pairs to identify with the parameter n.

How it works

simages uses a convolutional autoencoder with PyTorch and compares the latent representations with closely :triangular_ruler:.
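
To make the comparison step concrete, here is a minimal sketch that finds the n closest pairs among a set of embedding vectors by brute force. It stands in for the pair search only: it does not train an autoencoder, and it is not a description of how the closely package works internally. The function name closest_pairs and the toy vectors are assumptions for illustration.

```python
import heapq
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def closest_pairs(
    embeddings: Sequence[Sequence[float]], n: int = 5
) -> Tuple[List[Tuple[int, int]], List[float]]:
    """Return the n closest index pairs and their Euclidean distances."""
    # Score every unordered pair (i, j) with i < j; O(N^2) comparisons.
    scored = (
        (math.dist(embeddings[i], embeddings[j]), (i, j))
        for i, j in combinations(range(len(embeddings)), 2)
    )
    best = heapq.nsmallest(n, scored)  # sorted by ascending distance
    pairs = [pair for _, pair in best]
    distances = [dist for dist, _ in best]
    return pairs, distances


# Toy 2-D "embeddings": items 0/1 and 2/3 are near-duplicates.
toy_embeddings = [
    [0.0, 0.0],
    [0.0, 0.1],
    [5.0, 5.0],
    [5.0, 5.3],
    [9.0, 1.0],
]
pairs, distances = closest_pairs(toy_embeddings, n=2)
print(pairs)
```

The brute-force search compares every pair, so its cost grows quadratically with the number of images; it is meant to illustrate the idea rather than to scale.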

Dependencies

simages depends on the following packages:

closely
torch
torchvision
scikit-learn
matplotlib

The optional dependencies, installed with pip install "simages[all]", include:

pymongodb
fastcluster
flask
jinja
dnspython
python-magic
termcolor

Cite

If you use simages, please cite it:

    @misc{justin_shenk_2019_3237830,
      author       = {Justin Shenk},
      title        = {justinshenk/simages: v19.0.1},
      month        = jun,
      year         = 2019,
      doi          = {10.5281/zenodo.3237830},
      url          = {https://doi.org/10.5281/zenodo.3237830}
    }


simages 19.0.2.post1
