Diffacto: A protein summarization method for shotgun proteomics experiments
Requirements
Anaconda Python 3.5+
Required packages:
- numpy 1.10+
- scipy 0.17+
- pandas 0.18+
- networkx 1.10+
- scikit-learn 0.17+
- pyteomics 3.3+
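Before running diffacto in an existing environment, you could compare the installed package versions against these minimums with a small check like the one below. It is a sketch and not part of diffacto. It assumes Python 3.8+ for importlib.metadata, which is newer than the 3.5 minimum above. The helper names MINIMUM_VERSIONS, parse_version, meets_minimum and check_environment are hypothetical. The version parser is deliberately simplified and only compares leading numeric components, so a pre-release such as 1.10.0rc1 counts as 1.10.0.

```python
import re
from importlib import metadata  # Python 3.8+; not available on 3.5-3.7

# Minimum versions from the requirements list, keyed by PyPI distribution name.
MINIMUM_VERSIONS = {
    "numpy": "1.10",
    "scipy": "0.17",
    "pandas": "0.18",
    "networkx": "1.10",
    "scikit-learn": "0.17",
    "pyteomics": "3.3",
}


def parse_version(version):
    """Turn '1.10.4' into (1, 10, 4); stop at the first non-numeric part."""
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break  # e.g. 'dev0': ignore it and everything after it
        parts.append(int(match.group()))
        if match.end() != len(component):
            break  # e.g. '0rc1': keep the 0, drop the suffix and the rest
    return tuple(parts)


def meets_minimum(installed, required):
    """True if the installed version is at least the required version."""
    return parse_version(installed) >= parse_version(required)


def check_environment():
    """Return (package, installed_version_or_None, ok) for each requirement."""
    report = []
    for package, required in MINIMUM_VERSIONS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report.append((package, None, False))
            continue
        report.append((package, installed, meets_minimum(installed, required)))
    return report


if __name__ == "__main__":
    for package, installed, ok in check_environment():
        status = "OK" if ok else "missing or too old"
        print(f"{package:<14} {installed or '-':<12} {status}")
```

Tuple comparison handles versions of different lengths: (1, 10, 4) >= (1, 10) holds, while (1, 9, 3) >= (1, 10) does not.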
Installation via pip
pip install numpy scipy pandas networkx scikit-learn pyteomics
Installation via conda
conda env create -f environment.yml
source activate diffacto_35
Usage
run_diffacto.py [-h] -i I [-db [DB]] [-samples [SAMPLES]] [-log2 LOG2]
                [-normalize {average,median,GMM,None}] [-farms_mu FARMS_MU]
                [-farms_alpha FARMS_ALPHA] [-reference REFERENCE]
                [-min_samples MIN_SAMPLES] [-use_unique USE_UNIQUE]
                [-impute_threshold IMPUTE_THRESHOLD]
                [-cutoff_weight CUTOFF_WEIGHT] [-fast FAST] [-out OUT]
                [-mc_out MC_OUT]

optional arguments:
  -h, --help            show this help message and exit
  -i I                  Peptides abundances in CSV format. The first row should contain names for all samples. The first column should contain unique peptide sequences. Missing values should be empty instead of zeros. (default: None)
  -db [DB]              Protein database in FASTA format. If None, the peptide file must have protein ID(s) in the second column. (default: None)
  -samples [SAMPLES]    File of the sample list. One run and its sample group per line, separated by tab. If None, read from peptide file headings, then each run will be summarized as a group. (default: None)
  -log2 LOG2            Input abundances are in log scale (True) or linear scale (False) (default: False)
  -normalize {average,median,GMM,None}
                        Method for sample-wise normalization. (default: None)
  -farms_mu FARMS_MU    Hyperparameter mu (default: 0.1)
  -farms_alpha FARMS_ALPHA
                        Hyperparameter weight of prior probability (default: 0.1)
  -reference REFERENCE  Names of reference sample groups (separated by semicolon) (default: average)
  -min_samples MIN_SAMPLES
                        Minimum number of samples peptides needed to be quantified in (default: 1)
  -use_unique USE_UNIQUE
                        Use unique peptides only (default: False)
  -impute_threshold IMPUTE_THRESHOLD
                        Minimum fraction of missing values in the group. Impute missing values if missing fraction is larger than the threshold. (default: 0.99)
  -cutoff_weight CUTOFF_WEIGHT
                        Peptides weighted lower than the cutoff will be excluded (default: 0.5)
  -fast FAST            Allow early termination in EM calculation when noise is sufficiently small. (default: False)
  -out OUT              Path to output file (writing in TSV format).
  -mc_out MC_OUT        Path to MCFDR output (writing in TSV format). (default: None)
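The -i and -samples descriptions above define the input layout. The peptide file is a CSV whose first row names the samples and whose first column holds unique peptide sequences, with missing values left empty rather than written as zeros. Without -db, protein IDs go in the second column. The sample list has one run and its sample group per line, separated by a tab.

The sketch below writes both files using only the standard library. Everything in it is a placeholder: the peptide sequences, protein IDs, run and group names, the header labels "peptide" and "protein", and the helper names write_peptide_csv and write_sample_list. The help text does not say how several protein IDs in one cell should be separated, so the example uses a single ID per peptide.

```python
import csv

# Hypothetical example data: peptide -> (protein ID, abundance per run).
# None marks a missing measurement; it is written as an empty cell, never 0.
RUNS = ["run1", "run2", "run3", "run4"]
PEPTIDES = {
    "AAAPEPTIDEK": ("PROT_A", [1.2e6, 1.1e6, None, 1.3e6]),
    "LLLPEPTIDER": ("PROT_A", [4.0e5, None, 3.8e5, 4.2e5]),
    "GGGPEPTIDEK": ("PROT_B", [2.5e5, 2.7e5, 2.6e5, None]),
}
# Hypothetical run -> sample group assignment for the -samples file.
GROUPS = {"run1": "Sample1", "run2": "Sample1", "run3": "Sample2", "run4": "Sample2"}


def write_peptide_csv(path, runs, peptides, with_protein_column=True):
    """Write the -i file: a header row with sample names, one row per peptide.

    Set with_protein_column=False when a FASTA database is passed via -db.
    """
    header = ["peptide"] + (["protein"] if with_protein_column else []) + runs
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for sequence, (protein, values) in peptides.items():
            row = [sequence] + ([protein] if with_protein_column else [])
            # Empty string for missing values, as the -i help text requires.
            row += ["" if value is None else value for value in values]
            writer.writerow(row)


def write_sample_list(path, groups):
    """Write the -samples file: one 'run<TAB>group' pair per line."""
    with open(path, "w", newline="") as handle:
        for run, group in groups.items():
            handle.write(f"{run}\t{group}\n")
```

For example, write_peptide_csv("peptides.csv", RUNS, PEPTIDES) and write_sample_list("sampleLables.txt", GROUPS) produce files under the names used in the examples. The run names in the sample list match the CSV header.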
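Examples
- Peptide abundances recorded in log scale. Map peptides to the protein database HUMAN.fa, normalize each sample using GMM (Gaussian Mixture Model), read sample groups from the file sampleLables.txt, and write the protein quantification results to the file protein.txt. Peptide abundances will be scaled by comparing them to the average abundances of all samples.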
python run_diffacto.py -i peptides.csv -log2 True -db HUMAN.fa -normalize GMM -samples sampleLables.txt -out protein.txt
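- Peptide abundances recorded in linear scale. Normalize each sample using median abundances, read sample groups from the file sampleLables.txt, and write the protein quantification results to the file protein.txt. Peptide abundances will be scaled by comparing them to the average abundances of the samples labeled Sample1 and Sample3 in the sample list. Use only peptides that are unique to a protein and quantified in at least 20 samples. For a given sample group, if more than 70% of the values are missing, impute the missing values at half of the minimum non-missing abundance. Apply sequential Monte Carlo permutation tests and estimate the MCFDR for differentially expressed proteins. Quote the semicolon-separated reference list so that the shell does not treat the semicolon as a command separator.
python run_diffacto.py -i peptides.csv -out protein.txt -normalize median -samples sampleLables.txt -ref "Sample1;Sample3" -use_unique True -min_samples 20 -impute_threshold 0.7 -mc_out protein.MCFDR.txt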
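The imputation rule in the second example can be illustrated for a single peptide. With -impute_threshold 0.7, a sample group whose missing fraction exceeds 70% gets its missing values filled with half of the minimum non-missing abundance.

This is a hypothetical sketch and is not diffacto's code. It makes two assumptions. First, abundances are on a linear scale, as in that example. Second, the minimum is taken over all non-missing values of the peptide across every run. The text does not say which values the minimum covers, and with the default threshold of 0.99 a group is usually imputed only when it is entirely missing. The function name impute_peptide is made up.

```python
from typing import Dict, List, Optional


def impute_peptide(
    values: List[Optional[float]],
    run_groups: List[str],
    impute_threshold: float = 0.99,
) -> List[Optional[float]]:
    """Fill missing values (None) of one peptide, sample group by sample group.

    Assumptions (not taken from diffacto's source): abundances are linear, and
    the fill value is half of this peptide's smallest non-missing abundance
    across all runs.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # no observed value to derive a fill value from
    fill_value = min(observed) / 2.0

    # Group run positions by their sample group label.
    positions: Dict[str, List[int]] = {}
    for index, group in enumerate(run_groups):
        positions.setdefault(group, []).append(index)

    result = list(values)
    for indices in positions.values():
        missing = [i for i in indices if values[i] is None]
        # Impute only when the missing fraction is larger than the threshold.
        if len(missing) / len(indices) > impute_threshold:
            for i in missing:
                result[i] = fill_value
    return result


if __name__ == "__main__":
    abundances = [None, None, None, None, 4.0e5, 3.0e5, None, 5.0e5]
    groups = ["Sample1"] * 4 + ["Sample2"] * 4
    # Sample1 is 100% missing (> 0.7), so its runs get 3.0e5 / 2 = 1.5e5;
    # Sample2 is only 25% missing, so its gap stays None.
    print(impute_peptide(abundances, groups, impute_threshold=0.7))
```

A group with three of four runs missing (75%) would also be imputed at 0.7, but not at the default of 0.99.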