Spectral library search engine optimized for fast open modification searching



ANN-SoLo

For more information, see the official code website: https://github.com/bittremieux/ANN-SoLo

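ANN-SoLo (Approximate Nearest Neighbor Spectral Library) is a spectral library search engine for fast and accurate open modification searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up open modification searching by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This is combined with a cascade search strategy that maximizes the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, and with a shifted dot product score that sensitively matches modified spectra to their unmodified counterparts.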
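To make the idea of a shifted dot product more concrete, here is a minimal, simplified sketch. It is not ANN-SoLo's implementation: it assumes singly charged fragment ions, takes the mass shift to be the precursor mass difference between the query and the library spectrum, and assigns peaks greedily one-to-one by highest intensity product. The function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _normalize(peaks: List[Peak]) -> List[Peak]:
    """Scale intensities to unit Euclidean norm so scores fall in [0, 1]."""
    norm = math.sqrt(sum(intensity ** 2 for _, intensity in peaks))
    if norm == 0:
        return list(peaks)
    return [(mz, intensity / norm) for mz, intensity in peaks]


def shifted_dot_product(
    query: List[Peak],
    library: List[Peak],
    mass_shift: float,
    fragment_tol: float,
) -> float:
    """Dot product where library peaks may match at their own m/z or at m/z + mass_shift.

    Assumption: singly charged fragments. With mass_shift == 0 this reduces to a
    standard dot product with greedy one-to-one peak matching.
    """
    q = _normalize(query)
    lib = _normalize(library)
    # Collect every query/library peak pair that matches unshifted or shifted.
    pairs = []
    for qi, (q_mz, q_int) in enumerate(q):
        for li, (l_mz, l_int) in enumerate(lib):
            unshifted = abs(q_mz - l_mz) <= fragment_tol
            shifted = abs(q_mz - (l_mz + mass_shift)) <= fragment_tol
            if unshifted or shifted:
                pairs.append((q_int * l_int, qi, li))
    # Greedy one-to-one assignment: highest intensity products first.
    pairs.sort(reverse=True)
    used_q, used_lib = set(), set()
    score = 0.0
    for product, qi, li in pairs:
        if qi not in used_q and li not in used_lib:
            used_q.add(qi)
            used_lib.add(li)
            score += product
    return score
```

For example, with library peaks at 100 and 200 m/z and query peaks at 100 and 216 m/z, a mass_shift of 16.0 lets the 216 m/z query peak match the shifted 200 m/z library peak, so the score rises from about 0.5 (standard dot product) to about 1.0.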

The software is available as open source under the Apache 2.0 license.

Installation

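ANN-SoLo requires Python 3.6 or higher. The GPU-powered version of ANN-SoLo can be used on Linux systems, while the CPU-only version supports both the Linux and OSX platforms. Please refer to the Faiss installation instructions for more information on OS support.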

The recommended way to install ANN-SoLo is using pip:

pip install ann_solo

Dependencies

ANN-SoLo has the following dependencies:

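We recommend installing these dependencies using conda. Any missing dependencies will be automatically installed when you install ANN-SoLo, with two exceptions: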

ANN-SoLo search

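Run ANN-SoLo to search your spectral data directly on the command line using ann_solo, or as a named Python module using python -m ann_solo.ann_solo (if you do not have sufficient rights to install command-line scripts).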
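A minimal sketch of choosing between the two invocation styles from a Python script. The helper name ann_solo_command is hypothetical; it only assembles the argument list, and it assumes that the ann_solo script, when installed, is on your PATH.

```python
import shutil
import sys
from typing import List


def ann_solo_command(args: List[str]) -> List[str]:
    """Build an argument list for running ANN-SoLo (hypothetical helper).

    Prefers the installed ann_solo console script and falls back to
    running the package as a module when the script is not on PATH.
    """
    if shutil.which("ann_solo") is not None:
        return ["ann_solo", *args]
    # Module form, for when command-line scripts could not be installed.
    return [sys.executable, "-m", "ann_solo.ann_solo", *args]
```

The result can be passed to subprocess.run(..., check=True), for example with ["library.splib", "queries.mgf", "results.mztab", "--precursor_tolerance_mass", "20", "--precursor_tolerance_mode", "ppm", "--fragment_mz_tolerance", "0.02"]; these file names and tolerance values are placeholders, not recommendations.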

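ANN-SoLo parameters can be specified as command-line arguments or in a configuration file. Command-line arguments take precedence over the configuration file, which in turn takes precedence over the default settings.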
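This precedence rule can be illustrated with a small standard-library sketch. The config-file syntax described in the help text below matches that of the ConfigArgParse library; the code here does not use it and only mimics the documented precedence for a hypothetical subset of options (fdr, mode, min_peaks, with the defaults listed by ann_solo -h), keeping all values as strings for simplicity.

```python
import argparse
from typing import Dict, List

# Hypothetical subset of options, with defaults taken from `ann_solo -h`.
DEFAULTS: Dict[str, str] = {"fdr": "0.01", "mode": "ann", "min_peaks": "10"}


def read_config(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(cli_args: List[str], config_text: str) -> Dict[str, str]:
    """Merge settings: command line > config file > defaults."""
    parser = argparse.ArgumentParser()
    for key in DEFAULTS:
        # default=None distinguishes "not given on the command line".
        parser.add_argument(f"--{key}", default=None)
    cli = vars(parser.parse_args(cli_args))
    config = read_config(config_text)
    settings = dict(DEFAULTS)  # lowest priority
    settings.update({k: v for k, v in config.items() if k in DEFAULTS})
    settings.update({k: v for k, v in cli.items() if v is not None})
    return settings
```

For example, resolve(["--fdr", "0.05"], "fdr = 0.02\nmode = bf\n") yields fdr 0.05 from the command line, mode bf from the config file, and min_peaks 10 from the defaults.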

For more details about the available parameters and their default values, run ann_solo -h:

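Most options have sensible default values. The positional arguments specify which input and output files to use. Additionally, the precursor and fragment mass tolerances have no default values, because suitable values depend on the data set.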

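Note that to run ANN-SoLo in cascade search mode, two different precursor mass tolerances need to be specified, one for each level of the cascade search (precursor_tolerance_(mass|mode) and precursor_tolerance_(mass|mode)_open).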
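To illustrate what the two tolerances control, here is a hypothetical sketch of precursor-based candidate filtering for each cascade level. It assumes precursor values are compared as m/z of a single charge state and converts a ppm tolerance into an absolute window relative to the query precursor; ANN-SoLo's own candidate selection (including its ANN index) is more involved. In a cascade search, only queries that are not identified at the first, narrow level proceed to the second, open level.

```python
from typing import List


def tolerance_window(precursor_mz: float, tolerance: float, mode: str) -> float:
    """Convert a precursor tolerance to an absolute half-window (assumed m/z units)."""
    if mode == "Da":
        return tolerance
    if mode == "ppm":
        # ppm is relative to the query precursor: 20 ppm at 500 m/z is 0.01.
        return precursor_mz * tolerance / 1e6
    raise ValueError(f"unknown tolerance mode: {mode}")


def precursor_candidates(
    query_mz: float, library_mzs: List[float], tolerance: float, mode: str
) -> List[int]:
    """Indices of library spectra whose precursor falls inside the window."""
    window = tolerance_window(query_mz, tolerance, mode)
    return [i for i, mz in enumerate(library_mzs) if abs(mz - query_mz) <= window]
```

For a query at 500.0 and library precursors [500.005, 516.0, 480.0, 700.0], a narrow 20 ppm window keeps only index 0, while a placeholder open window of 50 Da keeps indices 0, 1 and 2, so modified variants become candidates only at the second level.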

usage: ann_solo [-h] [-c CONFIG_FILE] [--resolution RESOLUTION]
                [--min_mz MIN_MZ] [--max_mz MAX_MZ] [--remove_precursor]
                [--remove_precursor_tolerance REMOVE_PRECURSOR_TOLERANCE]
                [--min_intensity MIN_INTENSITY] [--min_peaks MIN_PEAKS]
                [--min_mz_range MIN_MZ_RANGE]
                [--max_peaks_used MAX_PEAKS_USED]
                [--max_peaks_used_library MAX_PEAKS_USED_LIBRARY]
                [--scaling {sqrt,rank}] --precursor_tolerance_mass
                PRECURSOR_TOLERANCE_MASS --precursor_tolerance_mode {Da,ppm}
                [--precursor_tolerance_mass_open PRECURSOR_TOLERANCE_MASS_OPEN]
                [--precursor_tolerance_mode_open {Da,ppm}]
                --fragment_mz_tolerance FRAGMENT_MZ_TOLERANCE
                [--allow_peak_shifts] [--fdr FDR]
                [--fdr_tolerance_mass FDR_TOLERANCE_MASS]
                [--fdr_tolerance_mode {Da,ppm}]
                [--fdr_min_group_size FDR_MIN_GROUP_SIZE] [--mode {ann,bf}]
                [--bin_size BIN_SIZE] [--hash_len HASH_LEN]
                [--num_candidates NUM_CANDIDATES] [--batch_size BATCH_SIZE]
                [--num_list NUM_LIST] [--num_probe NUM_PROBE] [--no_gpu]
                spectral_library_filename query_filename out_filename

ANN-SoLo: Approximate nearest neighbor spectral library searching
=================================================================

Bittremieux et al. Fast open modification spectral library searching through
approximate nearest neighbor indexing. Journal of Proteome Research 17,
3464-3474 (2018).

Bittremieux et al. Extremely fast and accurate open modification spectral
library searching of high-resolution mass spectra using feature hashing and
graphics processing units. bioRxiv (2019).

Official code website: https://github.com/bittremieux/ANN-SoLo

Args that start with '--' (eg. --resolution) can also be set in a config file
(config.ini or specified via -c). Config file syntax allows: key=value,
flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi).
If an arg is specified in more than one place, then commandline values
override config file values which override defaults.

positional arguments:
  spectral_library_filename
                        spectral library file (supported formats: splib)
  query_filename        query file (supported formats: mgf)
  out_filename          name of the mzTab output file containing the search
                        results

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config CONFIG_FILE
                        config file path
  --resolution RESOLUTION
                        spectral library resolution; masses will be rounded to
                        the given number of decimals (default: no rounding)
  --min_mz MIN_MZ       minimum m/z value (inclusive, default: 11 m/z)
  --max_mz MAX_MZ       maximum m/z value (inclusive, default: 2010 m/z)
  --remove_precursor    remove peaks around the precursor mass (default: no
                        peaks are removed)
  --remove_precursor_tolerance REMOVE_PRECURSOR_TOLERANCE
                        the window (in m/z) around the precursor mass to
                        remove peaks (default: 0 m/z)
  --min_intensity MIN_INTENSITY
                        remove peaks with a lower intensity relative to the
                        maximum intensity (default: 0.01)
  --min_peaks MIN_PEAKS
                        discard spectra with less peaks (default: 10)
  --min_mz_range MIN_MZ_RANGE
                        discard spectra with a smaller mass range (default:
                        250 m/z)
  --max_peaks_used MAX_PEAKS_USED
                        only use the specified most intense peaks for the
                        query spectra (default: 50)
  --max_peaks_used_library MAX_PEAKS_USED_LIBRARY
                        only use the specified most intense peaks for the
                        library spectra (default: 50)
  --scaling {sqrt,rank}
                        to reduce the influence of very intense peaks, scale
                        the peaks by their square root or by their rank
                        (default: rank)
  --precursor_tolerance_mass PRECURSOR_TOLERANCE_MASS
                        precursor mass tolerance (small window for the first
                        level of the cascade search)
  --precursor_tolerance_mode {Da,ppm}
                        precursor mass tolerance unit (options: Da, ppm)
  --precursor_tolerance_mass_open PRECURSOR_TOLERANCE_MASS_OPEN
                        precursor mass tolerance (wide window for the second
                        level of the cascade search)
  --precursor_tolerance_mode_open {Da,ppm}
                        precursor mass tolerance unit (options: Da, ppm)
  --fragment_mz_tolerance FRAGMENT_MZ_TOLERANCE
                        fragment mass tolerance (m/z)
  --allow_peak_shifts   use the shifted dot product instead of the standard
                        dot product
  --fdr FDR             FDR threshold to accept identifications during the
                        cascade search (default: 0.01)
  --fdr_tolerance_mass FDR_TOLERANCE_MASS
                        mass difference bin width for the group FDR
                        calculation during the second cascade level (default:
                        0.1 Da)
  --fdr_tolerance_mode {Da,ppm}
                        mass difference bin unit for the group FDR calculation
                        during the second cascade level (default: Da)
  --fdr_min_group_size FDR_MIN_GROUP_SIZE
                        minimum group size for the group FDR calculation
                        during the second cascade level (default: 20)
  --mode {ann,bf}       search using an approximate nearest neighbors or the
                        traditional (brute-force) mode; 'bf': brute-force,
                        'ann': approximate nearest neighbors (default: ann)
  --bin_size BIN_SIZE   ANN vector bin width (default: 0.04 Da)
  --hash_len HASH_LEN   ANN vector length (default: 800)
  --num_candidates NUM_CANDIDATES
                        number of candidates to retrieve from the ANN index
                        for each query (default: 1024), maximum 1024 when
                        using GPU indexing
  --batch_size BATCH_SIZE
                        number of query spectra to process simultaneously
                        (default: 16384)
  --num_list NUM_LIST   number of partitions in the ANN index (default: 256)
  --num_probe NUM_PROBE
                        number of partitions in the ANN index to inspect
                        during querying (default: 128), maximum 1024 when
                        using GPU indexing
  --no_gpu              don't use the GPU for ANN searching (default: GPU is
                        used if available)
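The spectrum preprocessing options above can be read as a pipeline. Below is a simplified sketch that applies them in one plausible order with the documented defaults. The function name, the order of the steps, the precursor-window handling (a single m/z value, ignoring charge), and the exact form of rank scaling are assumptions rather than ANN-SoLo's code; any subsequent normalization is omitted.

```python
import math
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def preprocess(
    peaks: List[Peak],
    precursor_mz: float,
    min_mz: float = 11.0,
    max_mz: float = 2010.0,
    remove_precursor: bool = False,
    remove_precursor_tolerance: float = 0.0,
    min_intensity: float = 0.01,
    min_peaks: int = 10,
    min_mz_range: float = 250.0,
    max_peaks_used: int = 50,
    scaling: str = "rank",
) -> Optional[List[Peak]]:
    """Return processed peaks sorted by m/z, or None if the spectrum is discarded."""
    # Keep peaks inside the inclusive m/z range.
    peaks = [(mz, i) for mz, i in peaks if min_mz <= mz <= max_mz]
    # Optionally drop peaks within the window around the precursor m/z.
    if remove_precursor:
        peaks = [(mz, i) for mz, i in peaks
                 if abs(mz - precursor_mz) > remove_precursor_tolerance]
    if not peaks:
        return None
    # Drop peaks below min_intensity relative to the most intense peak.
    max_intensity = max(i for _, i in peaks)
    peaks = [(mz, i) for mz, i in peaks if i >= min_intensity * max_intensity]
    # Discard spectra with too few peaks or too narrow an m/z range.
    mzs = [mz for mz, _ in peaks]
    if len(peaks) < min_peaks or max(mzs) - min(mzs) < min_mz_range:
        return None
    # Keep only the most intense peaks.
    peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks_used]
    # Reduce the influence of very intense peaks.
    if scaling == "sqrt":
        peaks = [(mz, math.sqrt(i)) for mz, i in peaks]
    elif scaling == "rank":
        # Most intense peak gets len(peaks), least intense gets 1.
        n = len(peaks)
        peaks = [(mz, float(n - rank)) for rank, (mz, _) in enumerate(peaks)]
    else:
        raise ValueError(f"unknown scaling: {scaling}")
    return sorted(peaks)
```

With max_peaks_used=3 and rank scaling, for instance, only the three most intense peaks survive and receive the values 3.0, 2.0 and 1.0 in order of decreasing intensity.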

Spectrum–spectrum match viewer

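Use the ANN-SoLo plotter to visualize spectrum–spectrum matches from your search results. The plotter can be run directly on the command line using ann_solo_plot, or as a named Python module using python -m ann_solo.plot_ssm (if you do not have sufficient rights to install command-line scripts).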

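The plotter requires as command-line arguments an mzTab identification file produced by ANN-SoLo and the identifier of the query to visualize. Note that the spectral library used to perform the search needs to be present at the exact location specified in the mzTab file.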
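To find an identifier to pass as query_id, one option is to read the PSM section of the mzTab file. In mzTab, PSM rows start with PSM and their column names are given by the PSH header row. Which column ann_solo_plot expects as the query identifier is not stated here, so the column name you pass to this hypothetical helper is an assumption to check against your own output.

```python
from typing import Iterable, List


def psm_column(lines: Iterable[str], column: str) -> List[str]:
    """Return the values of one PSM-section column from mzTab lines (hypothetical helper)."""
    col_idx = None
    values: List[str] = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "PSH":
            # Header row: locate the requested column (ValueError if absent).
            col_idx = fields.index(column)
        elif fields[0] == "PSM" and col_idx is not None:
            values.append(fields[col_idx])
    return values
```

An open file handle can be passed directly, e.g. psm_column(open("results.mztab"), "PSM_ID"), where the file name is a placeholder.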

The plotter will create a PNG file with a mirror plot to visualize the specified spectrum–spectrum match.

usage: ann_solo_plot [-h] mztab_filename query_id

Visualize spectrum–spectrum matches from your ANN-SoLo identification results

positional arguments:
  mztab_filename  Identifications in mzTab format
  query_id        The identifier of the query to visualize

optional arguments:
  -h, --help      show this help message and exit

Contact

For more information, you can visit the official code website (https://github.com/bittremieux/ANN-SoLo) or send an email to wout.bittremieux@uantwerpen.be.
