<p>Suppose we have three example input files:</p>
<p>Test_95_target_1334_assay_Detail3.csv</p>
<pre><code>A,accession,result_id,cpd_number,lot_no,assay_id,alt_assay_id,version_no,result_type,type_desc,operator,result_value,unit_id,unit_value,unit_desc,batch_no,experiment_date,discipine,assay_name,activity_flag
95,PKC,123456,cpd-0123456,1,1334,5678,1,1,IC50,>,26.21,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Enzymatic,PBA,
95,PKC,123456,cpd-0123456,1,1334,4600,1,1,IC50,,17.1,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Enzymatic,PBA,
95,PKC,123456,cpd-1234567,1,1334,2995,1,1,Ki,,30,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Enzymatic,PBA,
95,PKC,123456,cpd-1234567,1,1334,2900,1,1,IC50,,30,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Enzymatic,PBA,
</code></pre>
<p>Test_95_target_1338_assay_Detail3.csv</p>
<pre><code>A,accession,result_id,cpd_number,lot_no,assay_id,alt_assay_id,version_no,result_type,type_desc,operator,result_value,unit_id,unit_value,unit_desc,batch_no,experiment_date,discipine,assay_name,activity_flag
95,PKC,123456,cpd-0123456,1,1338,3999,1,1,IC50,,55,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Biochemical,PBA,
95,PKC,123456,cpd-0123456,1,1338,1985,1,1,IC50,,66,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Biochemical,PBA,
95,PKC,123456,cpd-1234007,1,1338,2995,1,1,Ki,,18,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Biochemical,PBA,
95,PKC,123456,cpd-1239867,1,1338,2900,1,1,IC50,,20,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Biochemical,PBA,
95,PKC,123456,cpd-1234567,1,1338,2900,1,1,IC50,,20,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Biochemical,PBA,
</code></pre>
<p>Test_95_target_2888_assay_Detail3.csv</p>
<pre><code>Test,accession,result_id,cpd_number,lot_no,assay_id,alt_assay_id,version_no,result_type,type_desc,operator,result_value,unit_id,unit_value,unit_desc,batch_no,experiment_date,discipine,assay_name,activity_flag
95,PKC,123456,cpd-0123456,1,2888,3830,1,1,IC50,>,24.49,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Cell,PBA,
95,PKC,123456,cpd-0123456,1,2888,4600,1,1,IC50,,19.6799,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Cell,PBA,
95,PKC,123456,cpd-1234567,1,2888,3830,1,1,IC50,,30,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Cell,PBA,
95,PKC,123456,cpd-5566778,1,2888,3830,1,1,IC50,,30,1,uM,micromolar,67682,1/24/2007 12:00:00AM,Cell,PBA,
</code></pre>
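<p>Is there a way, using bash/awk (Python is welcome too!), to group these files by whether column 18 ($18) is "Enzymatic", "Biochemical", or "Cell"? The goal is to select, among the files whose $18 is "Biochemical" or "Enzymatic", the file with the largest number of unique compounds ($4), and, separately, among the files whose $18 is "Cell", the file with the largest number of unique compounds.</p>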
<p>In this example, from the files whose column 18 is "Enzymatic" or "Biochemical", we would select Test_95_target_1338_assay_Detail3.csv, because it has 4 unique compounds in $4 (cpd-0123456, cpd-1234007, cpd-1239867, cpd-1234567), while Test_95_target_1334_assay_Detail3.csv has only 2 (4 &gt; 2). For the Cell category we would select Test_95_target_2888_assay_Detail3.csv, since it is the only Cell file.</p>
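<p>A minimal awk sketch of the per-file count described above, assuming comma-separated files with exactly one header line, no quoted fields containing commas, $4 = cpd_number and $18 = discipine. The function name count_unique_cpds is hypothetical. For each input file it prints the discipline, the number of distinct compounds, and the file name:</p>
<pre><code>#!/usr/bin/env bash
# Hypothetical helper: for each CSV argument, print "discipline<TAB>unique_cpds<TAB>file".
# Assumes a comma delimiter, one header line per file, no quoted commas,
# $4 = cpd_number and $18 = discipine (Enzymatic / Biochemical / Cell).
count_unique_cpds() {
    awk -F',' '
        FNR == 1 { next }                     # skip the header line of every file
        {
            disc[FILENAME] = $18              # discipline of this file (last data row wins)
            if (!seen[FILENAME, $4]++) n[FILENAME]++   # count each compound once per file
        }
        END { for (f in n) printf "%s\t%d\t%s\n", disc[f], n[f], f }
    ' "$@"
}
# Usage: count_unique_cpds Test_95_target_*_assay_Detail3.csv
</code></pre>
<p>For the three example files this should report 2 unique compounds for the Enzymatic file, 4 for the Biochemical file and 3 for the Cell file. awk's for-in loop does not guarantee any order, so pipe the output through sort if the order matters.</p>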
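<p>What I have tried: the script below finds the csv file with the most lines and stores its name in a variable for the subsequent steps. I have another script that finds the csv file with the most unique compounds ($4), but it is on another laptop that I won't have access to until tomorrow morning; I will post it below then.</p>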
<pre><code>#!/bin/bash
for A in 95
do
    # wc -l prints "count filename" for each file, plus a final "total" line
    wc -l Test_${A}_target_*_assay_Detail3.csv > Test_${A}_target.csv
    ### For the example files this makes:
    #   5 Test_95_target_1334_assay_Detail3.csv
    #   6 Test_95_target_1338_assay_Detail3.csv
    #   5 Test_95_target_2888_assay_Detail3.csv
    #  16 total
    head -n -1 Test_${A}_target.csv > Test_${A}_target2.csv          # remove the last line ("total"); GNU head only
    sort -k1,1 -rn Test_${A}_target2.csv > Test_${A}_target3.csv     # sort by the count column, largest first
    awk '{print $2}' Test_${A}_target3.csv > Test_${A}_target4.csv   # keep only the file name (2nd column of the wc -l output)
    max=$(head -n 1 Test_${A}_target4.csv)                           # top file name becomes "max" for the following steps
    echo "$max"
    rm Test_${A}_target.csv Test_${A}_target2.csv Test_${A}_target3.csv Test_${A}_target4.csv
done
</code></pre>
<p>Output:</p>
<pre><code>echo $max
Test_95_target_1338_assay_Detail3.csv
</code></pre>
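<p>As a side note, the line-count selection above can be sketched without the four temporary files. This is a rewrite under assumptions, not a drop-in replacement: the function name most_lines_file is hypothetical, and it counts each file separately with wc -l &lt; file, which avoids the "total" line altogether (wc only prints that line when given more than one file, so head -n -1 would delete the only result if a single file matched).</p>
<pre><code>#!/usr/bin/env bash
# Hypothetical helper: print the name of the argument file with the most lines.
most_lines_file() {
    local f
    for f in "$@"; do
        # one "count<TAB>filename" line per file; no "total" line is produced
        printf '%s\t%s\n' "$(wc -l < "$f")" "$f"
    done | sort -k1,1nr | head -n 1 | cut -f2   # largest count first, keep only the name
}
# Usage: max=$(most_lines_file Test_95_target_*_assay_Detail3.csv)
</code></pre>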
<p>However, I don't quite see how to group the csv files based on the value in $18. Could anyone offer some advice or a solution? Thanks.</p>
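<p>One possible approach is a single awk pass that merges "Enzymatic" and "Biochemical" into one group, keeps "Cell" as its own group, and remembers the file with the most distinct $4 values in each group. This is a sketch that assumes comma-separated files with one header line, no quoted fields containing commas, and a single discipline per file in $18; the function name pick_best_per_group and the group label Enzymatic_Biochemical are made up for illustration, and ties are broken arbitrarily.</p>
<pre><code>#!/usr/bin/env bash
# Hypothetical helper: print "group<TAB>file" for the file with the most unique $4 per group.
# Groups: Enzymatic + Biochemical -> "Enzymatic_Biochemical", Cell -> "Cell"; other values are ignored.
pick_best_per_group() {
    awk -F',' '
        FNR == 1 { next }                                  # skip the header line of every file
        {
            if ($18 == "Enzymatic" || $18 == "Biochemical") grp = "Enzymatic_Biochemical"
            else if ($18 == "Cell") grp = "Cell"
            else next                                      # ignore any other discipline
            group[FILENAME] = grp
            if (!seen[FILENAME, $4]++) n[FILENAME]++       # count each cpd_number once per file
        }
        END {
            for (f in n) {
                g = group[f]
                if (!(g in bestn) || n[f] > bestn[g]) { bestn[g] = n[f]; best[g] = f }
            }
            for (g in best) printf "%s\t%s\n", g, best[g]
        }
    ' "$@"
}
# Usage (stores one winner in a shell variable, like "max" in the script above):
#   best=$(pick_best_per_group Test_95_target_*_assay_Detail3.csv)
#   cell_file=$(printf '%s\n' "$best" | awk -F'\t' '$1 == "Cell" { print $2 }')
</code></pre>
<p>On the three example files this should select Test_95_target_1338_assay_Detail3.csv for the Enzymatic/Biochemical group and Test_95_target_2888_assay_Detail3.csv for the Cell group, matching the expected result described in the question.</p>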