Python list, csv, removing duplicates

f = open("/CSV-sorted.csv")
gene_prev = ""
hit_list = []
csv_f = csv.reader(f)
for lines in csv_f:
    #time.sleep(0.1)
    gene = lines[0]
    sample = lines[11].split(",")
    repeat = lines[8]
    for samples in sample:
        hit_list.append(samples)
    if gene == gene_prev:
        for samples in sample:
            hit_list.append(samples)
        print gene
        print hit_list
        print set(hit_list)
        print "samples:", len(set(hit_list))
    hit_list = []
    gene_prev = gene

1 answer

User

#1 · Posted 2024-05-23 20:56:36

The standard way to remove duplicates is to convert to a set.
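As a minimal sketch of that idea (the sample names below are made up for illustration and are not taken from your file):

```python
# Hypothetical sample names, for illustration only
hit_list = ["Sample1", "Sample2", "Sample2", "Sample7", "Sample1"]

# Converting to a set drops duplicates (order is not preserved)
unique_samples = set(hit_list)
print("samples:", len(unique_samples))

# If the original order matters, dict.fromkeys keeps the first
# occurrence of each item (dicts preserve insertion order in Python 3.7+)
ordered_unique = list(dict.fromkeys(hit_list))
print(ordered_unique)
```

Note that a set has no defined order, so use the second form if you need the samples in the order they first appeared.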

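However, I think there are some problems with the way you read the file. First, it is not really a CSV file (there is a colon between the first two fields). Second, what are these lines

gene = lines[0]
sample = lines[11].split(",")
repeat = lines[8]

supposed to do?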

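If I were writing this, I would replace the ":" with another ",". With that modification, and using a dictionary of sets, your code would look like this: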

# Read in csv file and convert to list of list of entries. Use with so that 
# the file is automatically closed when we are done with it
csvlines = []
with open("CSV-sorted.csv") as f:
    for line in f:
        # Use strip() to clean up trailing whitespace, use split() to split
        # on commas.
        a = [entry.strip() for entry in line.split(',')]
        csvlines.append(a)

# I'll print it here so you can see what it looks like:
print(csvlines)



# Next up: converting our list of lists to a dict of sets.

# Create empty dict
sample_dict = {}

# Fill in the dict
for line in csvlines:
    gene = line[0] # gene is first entry
    samples = set(line[1:]) # rest of the entries are samples

    # If this gene is in the dict already then join the two sets of samples
    if gene in sample_dict:
        sample_dict[gene] = sample_dict[gene].union(samples)

    # otherwise just put it in
    else:
        sample_dict[gene] = samples


# Now you can print the dictionary:
print(sample_dict)

The output is:

[['AHCTF1', 'Sample1', 'Sample2', 'Sample4'], ['AHCTF1', 'Sample2', 'Sample7', 'Sample12'], ['AHCTF1', 'Sample5', 'Sample6', 'Sample7']]
{'AHCTF1': {'Sample12', 'Sample1', 'Sample2', 'Sample5', 'Sample4', 'Sample7', 'Sample6'}}

The second line is your dictionary.
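If you would rather not edit the file by hand, the colon can also be handled in code. The following is only a sketch under an assumption: the line layout (a gene name, a colon, then comma-separated samples) is reconstructed from the output above, and collect_samples is a hypothetical helper name. An in-memory io.StringIO stands in for the real file so the example runs on its own.

```python
import io
from collections import defaultdict

# Assumed layout of CSV-sorted.csv, reconstructed from the output above:
# a gene name, a colon, then comma-separated sample names.
example_file = io.StringIO(
    "AHCTF1: Sample1, Sample2, Sample4\n"
    "AHCTF1: Sample2, Sample7, Sample12\n"
    "AHCTF1: Sample5, Sample6, Sample7\n"
)


def collect_samples(lines):
    """Map each gene to the set of all samples seen for it."""
    sample_dict = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Treat the first ':' as a separator, like the remaining commas
        fields = [entry.strip() for entry in line.replace(":", ",", 1).split(",")]
        gene, samples = fields[0], fields[1:]
        # update() adds the new samples; duplicates are dropped automatically
        sample_dict[gene].update(samples)
    return dict(sample_dict)


result = collect_samples(example_file)
for gene, samples in result.items():
    print(gene, "samples:", len(samples))
```

With the real file you would pass the open file object instead of example_file, for instance inside a with open("CSV-sorted.csv") as f: block. Using defaultdict(set) removes the need for the explicit "is this gene already in the dict" check.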
