A package from the NCI-CPTAC DREAM Proteogenomics Challenge
Proteo Estimator
Overview
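We present the first data science competition aimed at predicting protein levels from copy number and transcript levels, and phosphorylation levels from protein levels. The winning models outperform standard baseline machine learning methods, and they also outperform the simple use of transcript levels as a proxy for protein levels when predicting on new patient samples. An in-depth analysis reveals a link between genes that are generally predictable and their importance. We make all submitted models available to the community for reuse and provide a web application for exploring the results of the challenge, to support improved large-scale proteogenomic characterization of tumor samples and a better understanding of signaling deregulation.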
Installation
```shell
pip install proteo_estimator
```
Requires Python 3.
Usage
```python
import proteo_estimator as pr

# Subchallenge 2: predicting protein levels from copy number and transcript levels
prediction_file_protein = pr.predict_protein_abundances(tumor, rna, cna, output_dir, logging=True)

# Subchallenge 3: predicting phospho levels from protein abundance and transcript levels
prediction_file_phospho = pr.predict_phospho(tumor, rna, protein, output_dir, logging=True)
```
predict_protein_abundances
Parameters
Parameter | Default | Type | Description |
---|---|---|---|
tumor | | str | Tumor type, options are 'breast' and 'ovarian' |
rna | | str | Absolute file path for rna table. Table must be in TSV format of genes x samples |
cna | | str | Absolute file path for cna table. Table must be in TSV format of genes x samples |
output_dir | | str | Absolute file path for output directory. Prediction table and confidence scores will be saved under this directory as prediction.tsv and confidence.tsv |
logging | True | bool | Print progress to stdout |
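
The constraints in the table above (two allowed tumor types, absolute paths) can be checked before a prediction call. The following is a minimal sketch, not part of proteo_estimator: the helper name `check_inputs` and the `VALID_TUMORS` constant are hypothetical and exist only for illustration.

```python
import os

# The two tumor types listed in the parameter table.
VALID_TUMORS = {"breast", "ovarian"}


def check_inputs(tumor, rna, cna, output_dir):
    """Hypothetical pre-flight check; returns a list of problem descriptions.

    An empty list means the arguments satisfy the documented constraints.
    """
    problems = []
    if tumor not in VALID_TUMORS:
        problems.append(f"tumor must be one of {sorted(VALID_TUMORS)}, got {tumor!r}")
    # Every path argument is documented as an absolute path.
    for name, path in (("rna", rna), ("cna", cna), ("output_dir", output_dir)):
        if not os.path.isabs(path):
            problems.append(f"{name} must be an absolute path, got {path!r}")
    # Only look for the input tables on disk once the path itself is absolute.
    for name, path in (("rna", rna), ("cna", cna)):
        if os.path.isabs(path) and not os.path.isfile(path):
            problems.append(f"{name} file not found: {path}")
    return problems
```

Calling `check_inputs(tumor, rna, cna, output_dir)` and stopping when the list is non-empty gives an early, readable error instead of a failure partway through a run.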
Returns
Output | Type | Description |
---|---|---|
prediction_file | str | Path to tab-separated file of predicted protein levels in the shape of genes x samples. This file will be saved in the directory passed to the parameter "output_dir" as prediction.tsv |
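
The returned path points to a tab-separated genes x samples table, so it can be loaded with the standard library alone. The sketch below assumes that the first row holds sample IDs and the first column holds gene names, and that missing values appear as empty cells or `NA`; neither detail is confirmed by the description above. The function name `read_prediction_table` is hypothetical.

```python
import csv


def _to_float(text):
    # Assumption: missing values are written as an empty cell or "NA".
    return None if text in ("", "NA") else float(text)


def read_prediction_table(path):
    """Read a genes x samples TSV into {gene: {sample: value}}.

    Assumes a header row of sample IDs and gene names in the first column.
    """
    table = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        samples = header[1:]  # skip the label of the gene column
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            gene, values = row[0], row[1:]
            table[gene] = {s: _to_float(v) for s, v in zip(samples, values)}
    return table
```

For example, `read_prediction_table(prediction_file_protein)["TP53"]` would return the per-sample predictions for that gene, provided the gene is present in the table.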
predict_phospho
Parameters
Parameter | Default | Type | Description |
---|---|---|---|
tumor | | str | Tumor type, options are 'breast' and 'ovarian' |
rna | | str | Absolute file path for rna table. Table must be in TSV format of genes x samples |
protein | | str | Absolute file path for protein abundance table. Table must be in TSV format of genes x samples |
output_dir | | str | Absolute file path for output directory. Prediction table and confidence scores will be saved under this directory as prediction.tsv and confidence.tsv |
logging | True | bool | Print progress to stdout |
Returns
Output | Type | Description |
---|---|---|
prediction_file | str | Path to tab-separated file of predicted phospho levels. This file will be saved in the directory passed to the parameter "output_dir" as prediction.tsv |
Note
Make sure your Docker daemon is running in the background. All file paths must be absolute.
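
Whether a Docker daemon is reachable can be tested from Python before starting a prediction. This is a minimal sketch that shells out to the standard `docker info` command, which exits with a non-zero status when the client cannot reach a daemon. The helper name `docker_daemon_running` is hypothetical and not part of proteo_estimator.

```python
import shutil
import subprocess


def docker_daemon_running(timeout=10):
    """Return True if `docker info` succeeds, i.e. the CLI can reach a daemon."""
    # No docker executable on PATH means there is nothing to talk to.
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable client is treated as "not running".
        return False
    return result.returncode == 0
```

Relative paths can be converted with `os.path.abspath(path)` before they are passed to the prediction functions.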