Python: Compare strings in two text files and print the surrounding context
I have two text files:
1) cities.txt
San Francisco
Los Angeles
Seattle
Dallas
2) master.txt
Atlanta is chill and laid-back.
I love Los Angeles.
Coming to Dallas was the right choice.
New York is so busy!
San Francisco is fun.
Moving to Boston soon!
Go to Seattle in the summer.
I want to produce a file called output.txt:
<main><beg>I love</beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to</beg><key>Dallas</key><end>was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end>is fun</end></main>
<main><beg>Go to</beg><key>Seattle</key><end>in the summer</end></main>
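Each entry in cities.txt is a <key>. The master.txt file is much longer, and every line that does not contain one of the cities should be ignored. The lines are in no particular order. The output should list each city from cities.txt together with its <beg> and <end> context from master.txt (if there is any).
This is my current code: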
with open('master.txt') as f:
    master = f.read()
working = []
with open('cities.txt') as f:
    for i in (word.strip() for word in f):
        if i in master:
            print "<key>", i, "</key>"
I know how to check the two text files against each other (finding a 'city' in master.txt), but I'm stuck on how to print the <beg> and <end> context from master.txt once a city is found!
2 Answers
1
This should also work; it was tested on Python 2.6:
cities_dict = {}
with open('master.txt', 'r') as master_in:
    with open('cities.txt') as city_in:
        # Map each city to the markup that will replace it inside a line
        for city in city_in:
            cities_dict[city.strip()] = '</beg><key>'+city.strip()+'</key><end>'
    for line in master_in:
        for key,val in cities_dict.iteritems():
            if key in line:
                # Note: the chained replace calls remove every '!' and '.' in the line
                line_out= '<main><beg>'+line.replace(key,val).replace('!','.').replace('.','').strip('\n')+'</end></main>'
                print line_out
Output:
<main><beg>I love </beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to </beg><key>Dallas</key><end> was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end> is fun</end></main>
<main><beg>Go to </beg><key>Seattle</key><end> in the summer</end></main>
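The output above keeps the spaces next to the tags, and the chained replace calls drop every period in a line rather than only the final one. As a rough sketch of one way to get closer to the output requested in the question, the line can be split around the city with str.partition and each side trimmed separately. The helper name format_line and the choice of '.', '!' and '?' as the trailing punctuation to drop are assumptions made for illustration; they are not part of the answer above.

```python
def format_line(line, city):
    """Wrap `city` and its surrounding context in the tags from the question.

    Hypothetical helper: assumes `city` occurs in `line`; only the first
    occurrence is used as the split point.
    """
    before, _, after = line.partition(city)
    beg = before.strip()
    # Assumption: only trailing '.', '!' and '?' count as punctuation to drop.
    end = after.strip().rstrip('.!?').rstrip()
    return ('<main><beg>' + beg + '</beg><key>' + city + '</key><end>'
            + end + '</end></main>')


print(format_line('I love Los Angeles.', 'Los Angeles'))
print(format_line('Coming to Dallas was the right choice.', 'Dallas'))
```

Because only the end of the trailing context is stripped, a period in the middle of a sentence would survive.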
1
The following should help you get what you want. This code runs on both Python 2 and Python 3.
#!/usr/bin/python
import os

def parse(line, city):
    start = line.find(city)
    end = start + len(city)
    # Following is a simple implementation. I haven't parsed for spaces
    # and punctuation around tags.
    return '<main><beg>' + line[:start] + '</beg><key>' + city + '</key><end>' \
        + line[end:] + '</end></main>'

master = [line.strip() for line in open(os.getcwd() + '/master.txt', 'r')]
cities = [line.strip() for line in open(os.getcwd() + '/cities.txt', 'r')]
data = []
for line in master:
    for city in cities:
        if city in line:
            data.append(parse(line, city))

# Following would overwrite output.txt file in the current working directory
with open(os.getcwd() + '/output.txt', 'w') as foo:
    for output in data:
        foo.write(output + '\n')
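The nested loops above test every city against every line with a plain substring check, so a city name would also match inside a longer word, and each line is scanned once per city. Since master.txt is described as much longer than cities.txt, one possible alternative is to compile all city names into a single regular expression with word boundaries and search each line once. This is only a sketch under stated assumptions: the names build_pattern and tag_lines are hypothetical, the sample data is copied from the question, only the first city found in a line is reported, and trailing punctuation is left untouched.

```python
import re


def build_pattern(cities):
    # Skip blank entries, and put longer names first so that a longer
    # name wins over a shorter one that starts the same way.
    names = sorted((c for c in cities if c), key=len, reverse=True)
    # re.escape guards against characters that are special in a regex.
    return re.compile(r'\b(' + '|'.join(re.escape(n) for n in names) + r')\b')


def tag_lines(lines, cities):
    pattern = build_pattern(cities)
    tagged = []
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # no city on this line, so it is ignored
        beg = line[:match.start()].strip()
        end = line[match.end():].strip()
        tagged.append('<main><beg>' + beg + '</beg><key>' + match.group(1)
                      + '</key><end>' + end + '</end></main>')
    return tagged


cities = ['San Francisco', 'Los Angeles', 'Seattle', 'Dallas']
master = ['Atlanta is chill and laid-back.', 'I love Los Angeles.',
          'New York is so busy!', 'San Francisco is fun.']
for row in tag_lines(master, cities):
    print(row)
```

The word boundaries mean that a line such as "Dallasite here" would not count as a match for Dallas, which the plain `in` test would accept.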