Python: replacing repeated information in text

Posted 2024-04-26 23:27:08


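I'm currently writing a program that puts every word of a text file into a spreadsheet with xlsxwriter, which means I have to split each line into separate words.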

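My problem is that, on each line, I have to remove the information that repeats the line above it, up to the first element that differs. I can't think of a way to solve this.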

Sample text:

Dave likes fresh green apples 
Dave likes fresh green peppers 
Dave hates fresh green apples 
Dave hates rotten green apples 
Jane likes fresh green apples

Desired result in xlsxwriter:

    C1    C2    C3    C4    C5 
R1 Dave likes fresh  green apples 
R2  X     X     X      X   peppers 
R3  X   hates fresh  green apples 
R4  X     X   rotten green apples 
R5 Jane likes fresh  green apples
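Read as "compare each line with the line directly above it", the table follows one rule: replace words with X for as long as they match the word in the same position of the previous line, and keep the first differing word and everything after it (which is why R4 keeps "green apples"). A minimal sketch of that rule on lists of words, assuming the comparison is always against the unmodified previous line; the function name mask_repeated_prefix is hypothetical:

```python
def mask_repeated_prefix(previous, current, placeholder="X"):
    """Return a copy of `current` in which every leading word equal to the
    word at the same position in `previous` is replaced by `placeholder`.
    Replacement stops at the first word that differs."""
    result = list(current)
    # zip stops at the shorter list, so the lines may differ in length
    for i, (prev_word, word) in enumerate(zip(previous, current)):
        if prev_word != word:
            break
        result[i] = placeholder
    return result


lines = [
    "Dave likes fresh green apples",
    "Dave likes fresh green peppers",
    "Dave hates fresh green apples",
    "Dave hates rotten green apples",
    "Jane likes fresh green apples",
]

previous = []
for line in lines:
    words = line.split()
    print(*mask_repeated_prefix(previous, words))
    previous = words  # compare the next line against the unmodified words
```

Run on the sample text, this prints the same five rows as the desired table, with X in place of the repeated words.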

Thanks


1 answer
User reply
Answer #1 · posted 2024-04-26 23:27:08

Challenge accepted.

How about this:

test.txt

Dave likes fresh green apples 
Dave likes fresh green peppers 
Dave hates fresh green apples 
Dave hates rotten green apples 
Jane likes fresh green apples
Dave likes fresh green watermelon
Jane likes fresh green peppers 

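This was my first idea (made to work and documented, based on my original post). It remembers, for each first word, the most recent line that started with that word: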

def read_lines_with_duplicate_replace_v1(path, replace_char="X"):
    """Generator that reads the lines of the file at path. When a line starts
       with the same word as an earlier line, compare it with the most recent
       such line and replace each leading word that matches with replace_char,
       stopping at the first word that differs. Yield the resulting word list."""
    record = dict()  # first word -> remaining words of the last line that began with it
    with open(path) as file:
        for line in file:
            words = line.split()
            if not words:  # skip blank lines instead of raising IndexError
                continue
            result = list(words)
            key = words[0]
            if key in record:
                result[0] = replace_char
                # zip stops at the shorter sequence, so lines may differ in length
                for i, (word, prev) in enumerate(zip(words[1:], record[key]), start=1):
                    if word == prev:
                        result[i] = replace_char
                    else:
                        break
            record[key] = words[1:]  # store the unmodified words
            yield result

This is the second idea, which only remembers the previous line:

def read_lines_with_duplicate_replace_v2(path, replace_char="X"):
    """Generator that reads the lines of the file at path and replaces each
       leading word that matches the previous line with replace_char,
       stopping at the first word that differs. Yield the resulting word list."""
    previous_line = list()
    with open(path) as file:
        for line in file:
            words = line.split()
            if not words:  # skip blank lines instead of raising IndexError
                continue
            result = list(words)
            # zip stops at the shorter line, so lines may differ in length
            for i, (word, prev) in enumerate(zip(words, previous_line)):
                if word == prev:
                    result[i] = replace_char
                else:
                    break
            previous_line = words  # remember the unmodified words, not the X's
            yield result

Output:

>>> for x in read_lines_with_duplicate_replace_v1("test.txt"):
...     print(*x)
...
Dave likes fresh green apples
X X X X peppers
X hates fresh green apples
X X rotten green apples
Jane likes fresh green apples
X likes fresh green watermelon
X X X X peppers
>>> for x in read_lines_with_duplicate_replace_v2("test.txt"):
...     print(*x)
...
Dave likes fresh green apples
X X X X peppers
X hates fresh green apples
X X rotten green apples
Jane likes fresh green apples
Dave likes fresh green watermelon
Jane likes fresh green peppers
>>>
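The output above stops at printing. To fill the spreadsheet from the question, each yielded list can go into one worksheet row. A sketch, assuming the third-party XlsxWriter package is installed (pip install XlsxWriter); XlsxWriter rows and columns are zero-based, so the question's R1 C1 is cell (0, 0). The helper names cell_positions and write_rows_to_xlsx, and the output filename, are hypothetical:

```python
def cell_positions(rows):
    """Yield (row, col, value) for every word, using the zero-based
    row/column indices that XlsxWriter expects (R1 C1 -> 0, 0)."""
    for row_index, words in enumerate(rows):
        for col_index, word in enumerate(words):
            yield row_index, col_index, word


def write_rows_to_xlsx(rows, filename):
    """Write each list of words to its own worksheet row."""
    import xlsxwriter  # third-party package, imported only when actually writing

    workbook = xlsxwriter.Workbook(filename)
    worksheet = workbook.add_worksheet()
    for row_index, col_index, word in cell_positions(rows):
        worksheet.write(row_index, col_index, word)
    workbook.close()  # XlsxWriter writes the file to disk on close()


# The desired result from the question, as lists of words
rows = [
    ["Dave", "likes", "fresh", "green", "apples"],
    ["X", "X", "X", "X", "peppers"],
    ["X", "hates", "fresh", "green", "apples"],
    ["X", "X", "rotten", "green", "apples"],
    ["Jane", "likes", "fresh", "green", "apples"],
]
```

With the answer's generator this would be called as write_rows_to_xlsx(read_lines_with_duplicate_replace_v2("test.txt"), "output.xlsx"). Writing cell by cell keeps the (row, col) mapping explicit; worksheet.write_row(row_index, 0, words) would write a whole row in one call instead.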
