Print the unique elements of rows to a separate .txt file

Posted 2024-04-24 04:30:05

I have a huge input file:

con1    P1  140 602
con1    P2  140 602
con2    P5  642 732
con3    P8  17  348
con3    P9  17  348

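I want to iterate within each con, drop the rows whose third and fourth columns (indices [2] and [3]) repeat, and print the result to a new .txt file, so that my output file looks like the following (note: the second column can differ within each con):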

^{pr2}$

The script I tried (I'm not sure how to finish it):

from collections import defaultdict
start = defaultdict(int)
end = defaultdict(int)
o1=open('result1.txt','w')
o2=open('result2.txt','w')
with open('example.txt') as f:
    for line in f:
        line = line.split()
        start[line[2]]
        end[line[3]]
        if start.keys() == 1 and end.keys() ==1:
            o1.writelines(line)
        else:
            o2.write(line)

Update: an additional example

con20   EMT20540    951 1580
con20   EMT14935    975 1655
con20   EMT24081    975 1655
con20   EMT19916    975 1652
con20   EMT23831    975 1655
con20   EMT19915    975 1652
con20   EMT09010    975 1649
con20   EMT29525    975 1655
con20   EMT19914    975 1652
con20   EMT19913    975 1652
con20   EMT23832    975 1652
con20   EMT09009    975 1637
con20   EMT16812    975 1649

Expected output:

con20   EMT20540    951 1580
con20   EMT14935    975 1655
con20   EMT19916    975 1652
con20   EMT09010    975 1649
con20   EMT09009    975 1637

3 Answers

You can use itertools.groupby here:

from itertools import groupby

with open('input.txt') as f1, open('f_out', 'w') as f2:
    #Firstly group the data by the first column
    for k, g in groupby(f1, key=lambda x:x.split()[0]):
        # Now during the iteration over each group, we need to store only
        # those lines that have unique 3rd and 4th column. For that we can
        # use a `set()`, we store all the seen columns in the set as tuples and
        # ignore the repeated columns.   

        seen = set()
        for line in g:
            columns = tuple(line.rsplit(None, 2)[-2:])
            if columns not in seen:
                #The 3rd and 4th column were unique here, so
                # store this as seen column and also write it to the file.
                seen.add(columns)
                f2.write(line.rstrip() + '\n') 
                print(line.rstrip())

Output:

^{pr2}$
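
The same idea can be wrapped in a function so that it is easy to check against the first sample from the question. This is only a minimal sketch, not the answerer's code: `unique_rows` is a hypothetical helper name, and it assumes whitespace-separated lines with at least four columns and that all rows for one con are adjacent (which `groupby` needs in order to see them as one group).

```python
from itertools import groupby


def unique_rows(lines):
    """Yield the first line seen for each (3rd, 4th) column pair within a con.

    Hypothetical helper: assumes rows sharing a first column are adjacent.
    """
    rows = (line for line in lines if line.strip())  # skip blank lines
    for _, group in groupby(rows, key=lambda s: s.split()[0]):
        seen = set()  # (3rd, 4th) column pairs already written for this con
        for line in group:
            key = tuple(line.split()[2:4])
            if key not in seen:
                seen.add(key)
                yield line.rstrip()


# First sample from the question
sample = [
    "con1    P1  140 602\n",
    "con1    P2  140 602\n",
    "con2    P5  642 732\n",
    "con3    P8  17  348\n",
    "con3    P9  17  348\n",
]

result = list(unique_rows(sample))
for row in result:
    print(row)
```

Because the function is a generator, it can also be fed an open file object directly, so a huge input never has to be held in memory in full.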

I would do it like this:

# Read every row into a list of column lists; `with` closes the file afterwards
with open('example.txt') as f:
  array = [line.split() for line in f]


def func(array, j):
  offset = []
  if j < len(array):
    firstRow = array[j-1]
    for i in range(j, len(array)):
      if (firstRow[3] == array[i][3] and firstRow[2] == array[i][2]
        and firstRow[0] == array[i][0]):
        offset.append(i)

    for item in offset[::-1]:# Q. Why offset[::-1] and not offset?
      del array[item]

    return func(array, j=j+1)

func(array, 1)

for e in array:
  print('%s\t\t%s\t\t%s\t%s' % (e[0], e[1], e[2], e[3]))

This prints:

^{pr2}$
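
On the question left in the code comment (why `offset[::-1]` and not `offset`): deleting by index in ascending order shifts every later element one position to the left, so the remaining indices no longer point at the rows they were collected for. A small sketch with made-up data shows the difference; `delete_indices` is a hypothetical helper name.

```python
def delete_indices(items, indices):
    """Return a copy of items with the given positions removed."""
    result = list(items)
    # Highest index first, so a deletion never shifts a position still pending.
    for i in sorted(indices, reverse=True):
        del result[i]
    return result


rows = ["a", "b", "c", "d", "e"]

# Intended: remove "b" (index 1) and "d" (index 3).
removed_desc = delete_indices(rows, [1, 3])

# Ascending order for comparison: after deleting index 1,
# index 3 refers to "e" instead of "d".
wrong = list(rows)
for i in [1, 3]:
    del wrong[i]

print(removed_desc)
print(wrong)
```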

You can simply do the following:

my_list = list(set(open(file_name, 'r')))

Then write it out to your other file.

A simple example:

^{pr2}$
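
One caveat worth checking: a `set` of whole lines only removes lines that are identical character for character, and it does not keep the input order. In the question the second column differs between rows, so the lines are never identical. The sketch below is one possible alternative, not part of the answer above: it keys on chosen columns and relies on dictionaries preserving insertion order (Python 3.7+). `dedupe_by_columns` and `key_columns` are hypothetical names.

```python
def dedupe_by_columns(lines, key_columns=(0, 2, 3)):
    """Keep the first line for each combination of the key columns, in input order."""
    first_seen = {}
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        key = tuple(fields[i] for i in key_columns)
        first_seen.setdefault(key, line.rstrip())  # keep only the first occurrence
    return list(first_seen.values())


# The con20 sample from the question's update
sample = [
    "con20   EMT20540    951 1580\n",
    "con20   EMT14935    975 1655\n",
    "con20   EMT24081    975 1655\n",
    "con20   EMT19916    975 1652\n",
    "con20   EMT23831    975 1655\n",
    "con20   EMT19915    975 1652\n",
    "con20   EMT09010    975 1649\n",
    "con20   EMT29525    975 1655\n",
    "con20   EMT19914    975 1652\n",
    "con20   EMT19913    975 1652\n",
    "con20   EMT23832    975 1652\n",
    "con20   EMT09009    975 1637\n",
    "con20   EMT16812    975 1649\n",
]

whole_line_count = len(set(sample))  # every line is distinct, so nothing is removed
deduped = dedupe_by_columns(sample)
for row in deduped:
    print(row)
```

Unlike the `groupby` approach, this does not need the rows of one con to be adjacent, but it keeps one key per unique row in memory, which may matter for a very large file.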
