A protein summarization method for shotgun proteomics experiments

Detailed description of the diffacto Python project


Requirements

Anaconda Python 3.5+

Required packages:

  • numpy 1.10+
  • scipy 0.17+
  • pandas 0.18+
  • networkx 1.10+
  • scikit-learn 0.17+
  • pyteomics 3.3+

Installation via pip:
pip install numpy scipy pandas networkx scikit-learn pyteomics

Installation via conda:
conda env create -f environment.yml
source activate diffacto_35
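
After either installation route, the minimum versions listed under Requirements can be checked from Python. The following is a minimal sketch and not part of diffacto: the helper names `as_tuple` and `check_requirements` are hypothetical, and it assumes Python 3.8 or newer because it relies on the standard-library `importlib.metadata` module.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requirements list.
REQUIREMENTS = {
    "numpy": "1.10",
    "scipy": "0.17",
    "pandas": "0.18",
    "networkx": "1.10",
    "scikit-learn": "0.17",
    "pyteomics": "3.3",
}


def as_tuple(version_string):
    """Turn '1.10.4' into (1, 10, 4), stopping at the first non-numeric piece."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(requirements):
    """Return {package: status} where status is 'missing', 'too old' or 'ok'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if as_tuple(installed) >= as_tuple(minimum):
            report[name] = "ok ({})".format(installed)
        else:
            report[name] = "too old ({})".format(installed)
    return report


if __name__ == "__main__":
    for name, status in check_requirements(REQUIREMENTS).items():
        print("{:<14}{}".format(name, status))
```

Versions are compared as integer tuples so that 1.10 sorts after 1.9, which a plain string comparison would get wrong.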

Usage

run_diffacto.py [-h] -i I [-db [DB]] [-samples [SAMPLES]] [-log2 LOG2]
                     [-normalize {average,median,GMM,None}]
                     [-farms_mu FARMS_MU] [-farms_alpha FARMS_ALPHA]
                     [-reference REFERENCE] [-min_samples MIN_SAMPLES]
                     [-use_unique USE_UNIQUE]
                     [-impute_threshold IMPUTE_THRESHOLD]
                     [-cutoff_weight CUTOFF_WEIGHT] [-fast FAST] [-out OUT]
                     [-mc_out MC_OUT]
optional arguments:
-h, --help            show this help message and exit
-i I                  Peptides abundances in CSV format. The first row
                      should contain names for all samples. The first column
                      should contain unique peptide sequences. Missing
                      values should be empty instead of zeros. (default:
                      None)
-db [DB]              Protein database in FASTA format. If None, the peptide
                      file must have protein ID(s) in the second column.
                      (default: None)
-samples [SAMPLES]    File of the sample list. One run and its sample group
                      per line, separated by tab. If None, read from peptide
                      file headings, then each run will be summarized as a
                      group. (default: None)
-log2 LOG2            Input abundances are in log scale (True) or linear
                      scale (False) (default: False)
-normalize {average,median,GMM,None}
                      Method for sample-wise normalization. (default: None)
-farms_mu FARMS_MU    Hyperparameter mu (default: 0.1)
-farms_alpha FARMS_ALPHA
                      Hyperparameter weight of prior probability (default:
                      0.1)
-reference REFERENCE  Names of reference sample groups (separated by
                      semicolon) (default: average)
-min_samples MIN_SAMPLES
                      Minimum number of samples peptides needed to be
                      quantified in (default: 1)
-use_unique USE_UNIQUE
                      Use unique peptides only (default: False)
-impute_threshold IMPUTE_THRESHOLD
                      Minimum fraction of missing values in the group.
                      Impute missing values if missing fraction is larger
                      than the threshold. (default: 0.99)
-cutoff_weight CUTOFF_WEIGHT
                      Peptides weighted lower than the cutoff will be
                      excluded (default: 0.5)
-fast FAST            Allow early termination in EM calculation when noise
                      is sufficiently small. (default: False)
-out OUT              Path to output file (writing in TSV format).
-mc_out MC_OUT        Path to MCFDR output (writing in TSV format).
                      (default: None)
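
The input layout described for -i and -samples can be produced with the standard csv module. This is a sketch under stated assumptions, not code from diffacto: the function names are hypothetical, the header labels "sequence" and "protein" for the first two columns are placeholders (the help text only specifies what the columns contain), and the protein ID column is included on the assumption that no -db file is given.

```python
import csv


def write_peptide_table(path, runs, peptides):
    """Write a peptide abundance CSV.

    runs: ordered run (sample) names for the first row.
    peptides: iterable of (sequence, protein_id, {run: abundance}).
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        # Placeholder labels for the first two columns (assumption).
        writer.writerow(["sequence", "protein"] + list(runs))
        for sequence, protein_id, abundances in peptides:
            # Missing values are written as empty fields, not zeros.
            row = [abundances.get(run, "") for run in runs]
            writer.writerow([sequence, protein_id] + row)


def write_sample_list(path, run_groups):
    """Write one 'run<TAB>group' pair per line.

    run_groups: ordered list of (run name, sample group) pairs.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for run, group in run_groups:
            writer.writerow([run, group])
```

For example, `write_peptide_table("peptides.csv", ["run1", "run2"], [("AAGLTR", "P1", {"run1": 10.5})])` leaves the run2 field empty for that peptide, and `write_sample_list("sampleLables.txt", [("run1", "Sample1"), ("run2", "Sample2")])` assigns each run to its group. The sequence and protein ID here are made-up illustration values.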

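Examples

  • Peptide abundances are recorded in log scale. Map peptides to the protein database HUMAN.fa, use GMM (Gaussian mixture model) for per-sample normalization, read sample groups from the file sampleLables.txt, and write protein quantification results to protein.txt. Peptide abundances are scaled by comparison with the average abundances across all samples.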
python run_diffacto.py -i peptides.csv -log2 True -db HUMAN.fa -normalize GMM -samples sampleLables.txt -out protein.txt
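  • Peptide abundances are recorded in linear scale. Use median abundances for per-sample normalization, read sample groups from the file sampleLables.txt, and write protein quantification results to protein.txt. Peptide abundances are scaled by comparison with the average abundances of the samples labeled Sample1 and Sample3 in the sample list. Use only peptides that are unique to a protein and quantified in at least 20 samples. For a given sample group, impute missing values at half of the minimum non-missing abundance if missing values account for more than 70% of the results. Apply sequential Monte Carlo permutation tests and estimate the MCFDR for differentially expressed proteins, writing it to protein.MCFDR.txt.
python run_diffacto.py -i peptides.csv -out protein.txt -normalize median -samples sampleLables.txt -reference "Sample1;Sample3" -use_unique True -min_samples 20 -impute_threshold 0.7 -mc_out protein.MCFDR.txt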
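
The imputation rule in the second example can be illustrated for a single peptide. This is only a sketch of the rule as worded above, not diffacto's actual implementation: the function name `impute_peptide` is hypothetical, abundances are assumed to be in linear scale, and the "minimum non-missing abundance" is assumed to be taken over all runs of that peptide.

```python
def impute_peptide(abundances, groups, threshold=0.7):
    """Impute missing values for one peptide, group by group.

    abundances: one value per run, None where the peptide was not quantified.
    groups: the sample group label of each run, in the same order.
    threshold: impute a group only if its missing fraction exceeds this.
    """
    observed = [a for a in abundances if a is not None]
    if not observed:
        # Nothing to base an imputed value on.
        return list(abundances)
    # Assumption: half of the peptide's minimum observed abundance.
    fill = min(observed) / 2.0
    result = list(abundances)
    for label in set(groups):
        indices = [i for i, g in enumerate(groups) if g == label]
        missing = [i for i in indices if abundances[i] is None]
        if len(missing) / len(indices) > threshold:
            for i in missing:
                result[i] = fill
    return result
```

With four runs in group A where two are missing (50%) and four runs in group B where three are missing (75%), only group B is imputed at a threshold of 0.7; group A keeps its gaps.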
