Spectral library search engine optimized for fast open modification searching
ANN-SoLo
For more information, see the official code website: https://github.com/bittremieux/ANN-SoLo
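ANN-SoLo (Approximate Nearest Neighbor Spectral Library) is a spectral library search engine for fast and accurate open modification searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up open modification searching by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, and with a shifted dot product score to sensitively match modified spectra to their unmodified counterparts.
The software is available as open source under the Apache 2.0 license.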
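To make the idea of a shifted dot product more concrete, here is a minimal sketch in plain Python. It is not ANN-SoLo's implementation: the function names, the greedy peak matching, and the assumption that every fragment is singly charged (so a modified fragment moves by the full precursor mass difference) are simplifications chosen for illustration only.

```python
import math


def normalize(peaks):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = math.sqrt(sum(intensity ** 2 for _, intensity in peaks))
    return [(mz, intensity / norm) for mz, intensity in peaks]


def dot_product(query, library, tol, mass_shift=0.0, allow_shift=False):
    """Greedy dot product; every peak is matched at most once.

    A query peak may match a library peak at the same m/z (within tol) or,
    if allow_shift is set, at an m/z offset by mass_shift (hypothetical
    simplification: all fragments are assumed to be singly charged).
    """
    candidates = []
    for qi, (q_mz, q_int) in enumerate(query):
        for li, (l_mz, l_int) in enumerate(library):
            diff = q_mz - l_mz
            direct = abs(diff) <= tol
            shifted = allow_shift and abs(diff - mass_shift) <= tol
            if direct or shifted:
                candidates.append((q_int * l_int, qi, li))
    # Assign the strongest peak pairs first.
    candidates.sort(reverse=True)
    used_query, used_library = set(), set()
    score = 0.0
    for product, qi, li in candidates:
        if qi not in used_query and li not in used_library:
            used_query.add(qi)
            used_library.add(li)
            score += product
    return score


# Toy spectra: two of the three query peaks carry a +16 Da modification.
library = normalize([(200.0, 3.0), (300.0, 4.0), (500.0, 12.0)])
query = normalize([(200.0, 3.0), (316.0, 4.0), (516.0, 12.0)])

standard = dot_product(query, library, tol=0.02)
shifted = dot_product(query, library, tol=0.02, mass_shift=16.0,
                      allow_shift=True)
print(f"standard: {standard:.3f}, shifted: {shifted:.3f}")
```

In this toy example the standard dot product only sees the one unmodified peak, while the shifted variant also pairs the two peaks displaced by the modification mass, which is why it scores modified spectra more sensitively.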
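Installation
ANN-SoLo requires Python 3.6 or higher. The GPU version of ANN-SoLo can be used on Linux systems, while the CPU version supports both Linux and OSX platforms. For details on operating system support, refer to the Faiss installation instructions linked below.
The recommended way to install ANN-SoLo is using pip: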
pip install ann_solo
Dependencies
ANN-SoLo has the following dependencies:
- ConfigArgParse
- Cython
- Faiss
- Joblib
- Matplotlib
- mmh3
- Numba
- NumExpr
- NumPy
- Pandas
- Pyteomics
- SciPy
- spectrum_utils
- tqdm
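We recommend installing these dependencies using conda. Any missing dependencies will be installed automatically when you install ANN-SoLo. There are two exceptions:
- NumPy needs to be available before ANN-SoLo is installed.
- The Faiss installation depends on a specific GPU version. For details, refer to the Faiss installation instructions.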
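As a quick sanity check before or after installation, the following sketch reports which of the dependencies listed above cannot be found in the current Python environment. The import names are assumptions based on the usual module names of these packages (for example `configargparse` for ConfigArgParse); the helper `missing_modules` is hypothetical and not part of ANN-SoLo.

```python
import importlib.util

# Assumed import names for the dependencies listed above.
DEPENDENCIES = [
    "configargparse", "Cython", "faiss", "joblib", "matplotlib", "mmh3",
    "numba", "numexpr", "numpy", "pandas", "pyteomics", "scipy",
    "spectrum_utils", "tqdm",
]


def missing_modules(names):
    """Return the names for which no importable module can be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

If `numpy` or `faiss` shows up as missing, install it first, since these are the two packages the ANN-SoLo installation does not pull in for you.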
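ANN-SoLo search
Run ANN-SoLo to search your spectral data directly on the command line using ann_solo, or as a named Python module using python -m ann_solo.ann_solo (if you do not have sufficient rights to install command-line scripts).
ANN-SoLo arguments can be specified as command-line arguments or in a configuration file. The argument precedence is command-line arguments > configuration file > default settings.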
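The precedence rule can be illustrated with a small standard-library sketch. This is not how ANN-SoLo resolves its arguments internally (it relies on ConfigArgParse); it only mimics the lookup order. The defaults are taken from the help text, while the configuration file and command-line values are made up for the example.

```python
from collections import ChainMap

# Defaults as documented in the ann_solo help text.
defaults = {"fdr": 0.01, "mode": "ann", "num_candidates": 1024}
# Hypothetical values read from a configuration file.
config_file = {"fdr": 0.05, "mode": "bf"}
# Hypothetical values given on the command line.
command_line = {"fdr": 0.001}

# ChainMap searches its mappings from left to right, so the command line
# wins over the configuration file, which wins over the defaults.
settings = ChainMap(command_line, config_file, defaults)

for name in ("fdr", "mode", "num_candidates"):
    print(name, "=", settings[name])
```

Here `fdr` comes from the command line, `mode` from the configuration file, and `num_candidates` falls back to its default.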
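For details on the available arguments and their default values, run ann_solo -h.
Most options have sensible defaults. A few positional arguments specify which input and output files to use. In addition, the precursor and fragment mass tolerances have no default values, because these depend on the data set.
Note that to run ANN-SoLo in cascade search mode, two different precursor mass tolerances need to be specified, one for each level of the cascade search (precursor_tolerance_(mass|mode) and precursor_tolerance_(mass|mode)_open).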
usage: ann_solo [-h] [-c CONFIG_FILE] [--resolution RESOLUTION]
[--min_mz MIN_MZ] [--max_mz MAX_MZ] [--remove_precursor]
[--remove_precursor_tolerance REMOVE_PRECURSOR_TOLERANCE]
[--min_intensity MIN_INTENSITY] [--min_peaks MIN_PEAKS]
[--min_mz_range MIN_MZ_RANGE]
[--max_peaks_used MAX_PEAKS_USED]
[--max_peaks_used_library MAX_PEAKS_USED_LIBRARY]
[--scaling {sqrt,rank}] --precursor_tolerance_mass
PRECURSOR_TOLERANCE_MASS --precursor_tolerance_mode {Da,ppm}
[--precursor_tolerance_mass_open PRECURSOR_TOLERANCE_MASS_OPEN]
[--precursor_tolerance_mode_open {Da,ppm}]
--fragment_mz_tolerance FRAGMENT_MZ_TOLERANCE
[--allow_peak_shifts] [--fdr FDR]
[--fdr_tolerance_mass FDR_TOLERANCE_MASS]
[--fdr_tolerance_mode {Da,ppm}]
[--fdr_min_group_size FDR_MIN_GROUP_SIZE] [--mode {ann,bf}]
[--bin_size BIN_SIZE] [--hash_len HASH_LEN]
[--num_candidates NUM_CANDIDATES] [--batch_size BATCH_SIZE]
[--num_list NUM_LIST] [--num_probe NUM_PROBE] [--no_gpu]
spectral_library_filename query_filename out_filename
ANN-SoLo: Approximate nearest neighbor spectral library searching
=================================================================
Bittremieux et al. Fast open modification spectral library searching through
approximate nearest neighbor indexing. Journal of Proteome Research 17,
3464-3474 (2018).
Bittremieux et al. Extremely fast and accurate open modification spectral
library searching of high-resolution mass spectra using feature hashing and
graphics processing units. bioRxiv (2019).
Official code website: https://github.com/bittremieux/ANN-SoLo
Args that start with '--' (eg. --resolution) can also be set in a config file
(config.ini or specified via -c). Config file syntax allows: key=value,
flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi).
If an arg is specified in more than one place, then commandline values
override config file values which override defaults.
positional arguments:
spectral_library_filename
spectral library file (supported formats: splib)
query_filename query file (supported formats: mgf)
out_filename name of the mzTab output file containing the search
results
optional arguments:
-h, --help show this help message and exit
-c CONFIG_FILE, --config CONFIG_FILE
config file path
--resolution RESOLUTION
spectral library resolution; masses will be rounded to
the given number of decimals (default: no rounding)
--min_mz MIN_MZ minimum m/z value (inclusive, default: 11 m/z)
--max_mz MAX_MZ maximum m/z value (inclusive, default: 2010 m/z)
--remove_precursor remove peaks around the precursor mass (default: no
peaks are removed)
--remove_precursor_tolerance REMOVE_PRECURSOR_TOLERANCE
the window (in m/z) around the precursor mass to
remove peaks (default: 0 m/z)
--min_intensity MIN_INTENSITY
remove peaks with a lower intensity relative to the
maximum intensity (default: 0.01)
--min_peaks MIN_PEAKS
discard spectra with less peaks (default: 10)
--min_mz_range MIN_MZ_RANGE
discard spectra with a smaller mass range (default:
250 m/z)
--max_peaks_used MAX_PEAKS_USED
only use the specified most intense peaks for the
query spectra (default: 50)
--max_peaks_used_library MAX_PEAKS_USED_LIBRARY
only use the specified most intense peaks for the
library spectra (default: 50)
--scaling {sqrt,rank}
to reduce the influence of very intense peaks, scale
the peaks by their square root or by their rank
(default: rank)
--precursor_tolerance_mass PRECURSOR_TOLERANCE_MASS
precursor mass tolerance (small window for the first
level of the cascade search)
--precursor_tolerance_mode {Da,ppm}
precursor mass tolerance unit (options: Da, ppm)
--precursor_tolerance_mass_open PRECURSOR_TOLERANCE_MASS_OPEN
precursor mass tolerance (wide window for the second
level of the cascade search)
--precursor_tolerance_mode_open {Da,ppm}
precursor mass tolerance unit (options: Da, ppm)
--fragment_mz_tolerance FRAGMENT_MZ_TOLERANCE
fragment mass tolerance (m/z)
--allow_peak_shifts use the shifted dot product instead of the standard
dot product
--fdr FDR FDR threshold to accept identifications during the
cascade search (default: 0.01)
--fdr_tolerance_mass FDR_TOLERANCE_MASS
mass difference bin width for the group FDR
calculation during the second cascade level (default:
0.1 Da)
--fdr_tolerance_mode {Da,ppm}
mass difference bin unit for the group FDR calculation
during the second cascade level (default: Da)
--fdr_min_group_size FDR_MIN_GROUP_SIZE
minimum group size for the group FDR calculation
during the second cascade level (default: 20)
--mode {ann,bf} search using an approximate nearest neighbors or the
traditional (brute-force) mode; 'bf': brute-force,
'ann': approximate nearest neighbors (default: ann)
--bin_size BIN_SIZE ANN vector bin width (default: 0.04 Da)
--hash_len HASH_LEN ANN vector length (default: 800)
--num_candidates NUM_CANDIDATES
number of candidates to retrieve from the ANN index
for each query (default: 1024), maximum 1024 when
using GPU indexing
--batch_size BATCH_SIZE
number of query spectra to process simultaneously
(default: 16384)
--num_list NUM_LIST number of partitions in the ANN index (default: 256)
--num_probe NUM_PROBE
number of partitions in the ANN index to inspect
during querying (default: 128), maximum 1024 when
using GPU indexing
--no_gpu don't use the GPU for ANN searching (default: GPU is
used if available)
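The spectrum preprocessing options in the help text above (min_mz, max_mz, min_intensity, max_peaks_used, min_peaks, min_mz_range, scaling) can be read as a small filtering pipeline. The following is a simplified sketch of that reading, using the documented defaults. The order of the steps and the exact rank scaling (least intense peak gets 1, most intense gets n) are assumptions, and the function `preprocess` is hypothetical rather than ANN-SoLo's own code.

```python
import math


def preprocess(peaks, min_mz=11.0, max_mz=2010.0, min_intensity=0.01,
               max_peaks_used=50, min_peaks=10, min_mz_range=250.0,
               scaling="rank"):
    """Return the processed (m/z, intensity) peaks, or None if discarded."""
    # Keep peaks inside the inclusive m/z window.
    peaks = [(mz, inten) for mz, inten in peaks if min_mz <= mz <= max_mz]
    if not peaks:
        return None
    # Remove peaks below the threshold relative to the most intense peak.
    threshold = min_intensity * max(inten for _, inten in peaks)
    peaks = [(mz, inten) for mz, inten in peaks if inten >= threshold]
    # Keep only the most intense peaks, then restore m/z order.
    peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks_used]
    peaks.sort()
    # Discard spectra with too few peaks or too narrow an m/z range.
    if len(peaks) < min_peaks or peaks[-1][0] - peaks[0][0] < min_mz_range:
        return None
    if scaling == "sqrt":
        return [(mz, math.sqrt(inten)) for mz, inten in peaks]
    if scaling == "rank":
        # Assumed variant: replace each intensity by its ascending rank.
        order = sorted(range(len(peaks)), key=lambda k: peaks[k][1])
        ranks = [0] * len(peaks)
        for rank, k in enumerate(order, start=1):
            ranks[k] = rank
        return [(mz, float(r)) for (mz, _), r in zip(peaks, ranks)]
    raise ValueError(f"unknown scaling: {scaling}")


# Toy spectrum: 12 real peaks plus one out-of-range and one noise peak.
spectrum = [(100.0 + 50 * k, 100.0 * (k + 1)) for k in range(12)]
spectrum += [(5.0, 999.0), (120.0, 0.5)]
processed = preprocess(spectrum)
print(len(processed), "peaks kept")
```

Both scaling modes reduce the influence of very intense peaks; rank scaling discards the absolute intensities entirely, whereas square root scaling only compresses them.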
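Spectrum–spectrum match viewer
Use the ANN-SoLo plotter to visualize spectrum–spectrum matches from your search results. The plotter can be run directly on the command line using ann_solo_plot, or as a named Python module using python -m ann_solo.plot_ssm (if you do not have sufficient rights to install command-line scripts).
The plotter requires two command-line arguments: an mzTab identification file produced by ANN-SoLo and the identifier of the query to visualize. Note that the spectral library used to perform the search needs to be present at the exact location specified in the mzTab file.
The plotter will create a PNG file with a mirror plot to visualize the specified spectrum–spectrum match.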
usage: ann_solo_plot [-h] mztab_filename query_id
Visualize spectrum–spectrum matches from your ANN-SoLo identification results
positional arguments:
mztab_filename Identifications in mzTab format
query_id The identifier of the query to visualize
optional arguments:
-h, --help show this help message and exit
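To find identifiers you could pass as query_id, one option is to list the PSM rows of the mzTab file. The sketch below relies on the general mzTab layout (a tab-separated PSH header line followed by PSM rows) and assumes that the query identifier is stored in the PSM_ID column, which this description does not confirm. The helper names and the sample text are hypothetical.

```python
def list_psm_ids(mztab_text):
    """Collect the PSM_ID values from the PSM section of an mzTab document."""
    header = None
    ids = []
    for line in mztab_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "PSH":
            # Column names of the PSM section.
            header = fields
        elif fields[0] == "PSM" and header is not None:
            row = dict(zip(header, fields))
            ids.append(row["PSM_ID"])
    return ids


def plot_command(mztab_filename, query_id):
    """Build the ann_solo_plot argument list for one query."""
    return ["ann_solo_plot", mztab_filename, query_id]


# Synthetic mzTab fragment for illustration only.
sample = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PSH\tsequence\tPSM_ID\tsearch_engine_score[1]\n"
    "PSM\tPEPTIDEK\tquery_1\t0.87\n"
    "PSM\tANOTHERK\tquery_7\t0.65\n"
)
query_ids = list_psm_ids(sample)
print(plot_command("results.mztab", query_ids[0]))
```

For a real file, read its contents with `open(path).read()` and pass the text to `list_psm_ids`.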
Contact
For more information, you can visit the official code website or send an email to wout.bittremieux@uantwerpen.be.