Intersection of two text files

Posted 2024-06-16 16:57:57

I have two different text files, for example:

    text1 = Movie1 interact comedi seri featur ...
            Movie2 each week audienc write ...
            Movie3 struggl make success relationship ....

    text2 = Movie2 Action
            Movie3 Drama
            Movie4 Sci-fi

What I want is:

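    text3 = Movie2 each week audienc write ...
            Movie3 struggl make success relationship ....

    text4 = Movie2 Action
            Movie3 Drama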

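text1 and text2 are only illustrative; the real files are much larger. text1 contains plot summaries for many movies, and text2 contains genre information for even more movies. I only want to extract 10,000 of the intersecting entries, matched on movie title alone, into text3 and text4. Given that I am new to Python, how can I do this in Python?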


1 answer
Anonymous user
#1 · Posted 2024-06-16 16:57:57

Assuming each text file has already been opened (as file1 and file2):

def process_file(f):
    # Keep only the non-blank lines (each line keeps its trailing newline).
    return [line for line in f if line.strip()]

def get_word(string):
    # Return (first word, whole line); the first word is None for a blank line.
    # split() with no argument also copes with leading spaces and tabs.
    words = string.split()
    return (words[0] if words else None), string

def insert_line(string, mapping):
    # Store the line in the dict, with its first word as the key.
    word, line = get_word(string)
    if word:
        mapping[word] = line

lines1 = process_file(file1)
lines2 = process_file(file2)
dict1 = {}
for line in lines1:
    insert_line(line, dict1)
dict2 = {}
for line in lines2:
    insert_line(line, dict2)  # build dicts
set1 = set(dict1.keys())      # build sets from the keys
set2 = set(dict2.keys())
intersection = set1 & set2    # set intersection
intersection_lines = []
for key in intersection:      # collect the file1 lines whose key is in both files
    intersection_lines.append(dict1[key])

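At the end of this script, intersection_lines contains the lines you want from file1. To do the same for file2, just replace dict1 with dict2 in the final loop. This works because intersection is already implemented as an operator on the set class. Note that it only works correctly if the first word of each line is unique within its file: when a first word repeats, the later line overwrites the earlier one in the dict.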
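For what it's worth, here is a minimal sketch that puts the whole thing together, including the 10,000 limit from the question. The names index_by_title, intersect_by_title and limit are made up for illustration, and the sample lists stand in for the real files. It assumes the movie title is the first whitespace-separated word of each line. Unlike the script above, it keeps the first line seen for a repeated title, and it returns matches in text1 order rather than in arbitrary set order.

```python
def index_by_title(lines):
    """Map the first word of each non-blank line (assumed to be the title) to the line."""
    index = {}
    for line in lines:
        words = line.split()
        if words:
            # setdefault keeps the first line seen when a title repeats
            index.setdefault(words[0], line.rstrip("\n"))
    return index


def intersect_by_title(lines1, lines2, limit=10000):
    """Return (text3, text4): lines whose title is in both inputs, at most `limit` each."""
    index1 = index_by_title(lines1)
    index2 = index_by_title(lines2)
    # dicts preserve insertion order (Python 3.7+), so this follows the order of lines1
    common = [title for title in index1 if title in index2][:limit]
    return [index1[t] for t in common], [index2[t] for t in common]


# Sample data standing in for the two files
text1 = [
    "Movie1 interact comedi seri featur\n",
    "Movie2 each week audienc write\n",
    "\n",
    "Movie3 struggl make success relationship\n",
]
text2 = ["Movie2 Action\n", "Movie3 Drama\n", "Movie4 Sci-fi\n"]

text3, text4 = intersect_by_title(text1, text2)
print(text3)
print(text4)
```

With real files you can pass the open file objects directly, since the functions only iterate over lines, e.g. `with open("text1.txt") as f1, open("text2.txt") as f2: text3, text4 = intersect_by_title(f1, f2)` (the file names here are placeholders). Writing the results out is then just `out.write("\n".join(text3) + "\n")` on a file opened for writing.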
