How do I compare values in two text files by a common name?

-1 votes | 3 answers | 3121 views

Asked by 数据工程师 on 2025-04-18 01:13

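I'm trying to write a Python script that compares specific columns in two text files, where the rows are linked by a common ID.

The first text file contains road names, begin miles and end miles, in this format: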

Name Begin End
0045 0 45
0190 0 3
0006 0 190

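The second text file has many columns, three of which I'm interested in. Each name repeats many times, and I want to compare every occurrence of a name against the corresponding miles in the first text file. The data is not sorted by name or by mile.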

Name Mile
0045 0.05
0045 1.0
0045 5.3
0006 74.6
0006 32.1
etc.

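I want to check whether each mile in the second text file is greater than the begin mile and less than the end mile for the same name in the first text file. Every row that falls between the begin and end miles should be written to a third text file created by the script. I know how to write IF statements and how to read and write text files, but I'm stuck on how to match the names and compare the specific columns.

Any help would be greatly appreciated!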
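A minimal sketch of the task described in the question, assuming whitespace-separated columns, one header line at the top of each file, and the name and mile in the first two columns of the second file. The function names `load_ranges` and `filter_rows` are hypothetical. The first file is loaded into a dict keyed by name, and a line of the second file is kept only when its mile is strictly greater than begin and strictly less than end.

```python
from typing import Dict, Iterable, List, Tuple


def load_ranges(lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Map each road name to its (begin, end) miles; skips the header line."""
    ranges: Dict[str, Tuple[float, float]] = {}
    it = iter(lines)
    next(it, None)  # skip the "Name Begin End" header
    for line in it:
        parts = line.split()
        if len(parts) >= 3:  # ignore blank or short lines
            ranges[parts[0]] = (float(parts[1]), float(parts[2]))
    return ranges


def filter_rows(ranges: Dict[str, Tuple[float, float]],
                lines: Iterable[str]) -> List[str]:
    """Return the lines whose mile lies strictly between begin and end for their name."""
    kept: List[str] = []
    it = iter(lines)
    next(it, None)  # skip the "Name Mile" header
    for line in it:
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ranges:
            continue  # blank line, or a name that is missing from the first file
        begin, end = ranges[parts[0]]
        if begin < float(parts[1]) < end:
            kept.append(line.rstrip("\n"))
    return kept
```

Both functions accept any iterable of lines, so open file objects work directly, e.g. `load_ranges(open("file1.txt"))`, and each returned row can then be written to the third file with `out.write(row + "\n")` (the file names here are placeholders).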

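Tags: conditional filtering, file operations, data processing, text comparison, data matching, text analysis, column comparison, mileage calculation

3 Answers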

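Here is a program that finds the numbers common to two text files:

from sys import argv

# Usage: python script.py filename1 filename2
script, filename1, filename2 = argv

# Read every number in the first file into a set for fast membership tests.
with open(filename1, 'r') as txt1:
    r = {int(line) for line in txt1 if line.strip()}

# Print each line of the second file whose number also appears in the first.
with open(filename2, 'r') as txt2:
    for l in txt2:
        if l.strip() and int(l) in r:
            print(l, end="")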

Open a command prompt, change to the folder where you keep these three files, and type:

python your_script.py filename1 filename2

Answered on 2025-04-18 by Python大师
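The same set-membership idea can be pointed at the question's files by comparing only the first column (the name) instead of the whole line. A minimal sketch, assuming whitespace-separated columns; `common_name_rows` is a hypothetical helper. It only checks that a name exists in the first file and does not compare miles.

```python
from typing import Iterable, List


def common_name_rows(lines1: Iterable[str], lines2: Iterable[str]) -> List[str]:
    """Return lines of the second file whose first column also appears in the first file."""
    # Collect the first-column value of every non-blank line in file 1.
    names = {line.split()[0] for line in lines1 if line.strip()}
    # Keep the non-blank file-2 lines whose first column is one of those names.
    return [line.rstrip("\n") for line in lines2
            if line.strip() and line.split()[0] in names]
```

If both files start with a header row whose first word is `Name`, that header is matched as well, so skip it first if it is unwanted.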


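You can store the data from both files in dictionaries, keyed by the "name" field. That way you can use each name to look up the matching entry in each dictionary.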
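Because the same name appears many times in the second file, storing its miles with plain assignment (`d[key] = mile`) keeps only the last one. A short illustration using the miles from the question, assuming the values for each name are collected in a list (here via `dict.setdefault`):

```python
rows = [("0045", 0.05), ("0045", 1.0), ("0045", 5.3), ("0006", 74.6), ("0006", 32.1)]

# Plain assignment: each repeated name overwrites the previous mile.
last_only = {}
for name, mile in rows:
    last_only[name] = mile

# One list per name keeps every mile.
all_miles = {}
for name, mile in rows:
    all_miles.setdefault(name, []).append(mile)

print(last_only)  # {'0045': 5.3, '0006': 32.1}
print(all_miles)  # {'0045': [0.05, 1.0, 5.3], '0006': [74.6, 32.1]}
```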

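The code below is only a rough outline. I haven't actually run it, so treat it as a starting point:

from collections import defaultdict

d1 = {}  # we are going to put all our file 1 data into a dict: name -> (begin, end)
with open("file1") as f:  # open file 1
    next(f)  # skip the "Name Begin End" header line
    for line in f:  # read each line
        parts = line.split()
        if len(parts) != 3:  # this only works if there are ALWAYS three columns
            continue
        key, begin, end = parts
        d1[key] = (float(begin), float(end))
# file1 automatically closes after the "with" block

# Same for file 2, but names repeat many times there, so keep a list of
# miles per name instead of letting later lines overwrite earlier ones.
d2 = defaultdict(list)
with open("file2") as f:
    next(f)  # skip the header line
    for line in f:
        parts = line.split()
        if len(parts) != 2:  # assumes exactly two columns: name and mile
            continue
        key, mile = parts
        d2[key].append(float(mile))

common_keys = set(d1.keys()) & set(d2.keys())

# Here we are going to ignore all keys that are not in both
# datasets, but you can use other set operations to work with those entries.

# Iterate through the common keys and fish the records out of the dictionaries.
for key in common_keys:
    begin, end = d1[key]
    for mile in d2[key]:
        # Now do any calculation you like with `mile`, `begin` and `end`,
        # for example print the miles that fall strictly inside the range.
        if begin < mile < end:
            print(key, mile)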

Answered on 2025-04-18 by Python大师


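Here's what you need to do.

First, read the contents of the first file into a `defaultdict`, using the name as the key and a list of (begin, end) pairs as the value. Then, while reading the second file, compare the mile on each line with the begin and end values stored for that name. If the mile is not between begin and end, write the line to output.txt: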

from collections import defaultdict


# name -> list of (begin, end) ranges read from the first file
data = defaultdict(list)
with open('input1.txt', 'r') as f:
    next(f)  # skip first line (header)
    for line in f:
        line = line.strip()
        if line:
            items = line.split()
            # tuple() so the pair can be unpacked more than once;
            # in Python 3, map() returns a one-shot iterator.
            data[items[0]].append(tuple(map(float, items[1:3])))

with open('input2.txt', 'r') as f, open('output.txt', 'w') as output_file:
    next(f)  # skip first line (header)
    for line in f:
        line = line.strip()
        if line:
            name, mile = line.split()
            mile = float(mile)
            for begin, end in data.get(name, []):
                # write the line when mile lies outside [begin, end]
                if not (begin <= mile <= end):
                    output_file.write(line + '\n')
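A small illustration of the `defaultdict(list)` behaviour the code above relies on. The second range for `0045` is hypothetical and only shows that several (begin, end) pairs can share one name. It also shows why the lookup uses `data.get(name, [])`: `get` does not create an entry for an unknown name, while indexing does.

```python
from collections import defaultdict

data = defaultdict(list)
data["0045"].append((0.0, 45.0))   # missing key: an empty list is created, then appended to
data["0045"].append((50.0, 60.0))  # hypothetical second range for the same name

print(data["0045"])          # [(0.0, 45.0), (50.0, 60.0)]
print(data.get("9999", []))  # [] and "9999" is NOT added to data
print("9999" in data)        # False
data["9999"]                 # indexing a missing key DOES insert an empty list
print("9999" in data)        # True
```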

For example:

input1.txt:

Name Begin End
0045 0 45
0190 0 3
0006 0 190

input2.txt (note that some of the 0045 and 0006 values fall outside the begin and end range):

This script produces output.txt:

0045 1000
0006 3000
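The code above assumes the second file has exactly two columns, while the question says it has many. A sketch of a variant that picks the name and mile by column index; `NAME_COL`, `MILE_COL` and `rows_in_range` are hypothetical and the indices must be set to the real column positions. `ranges` has the same shape as `data` above (name mapped to a list of (begin, end) pairs), and this variant keeps rows strictly between begin and end, which is what the question asks for.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical zero-based column positions in the second file; adjust to your data.
NAME_COL = 0
MILE_COL = 1


def rows_in_range(ranges: Dict[str, List[Tuple[float, float]]],
                  lines: Iterable[str],
                  name_col: int = NAME_COL,
                  mile_col: int = MILE_COL) -> Iterator[str]:
    """Yield each line whose mile lies strictly inside any range for its name."""
    for line in lines:
        parts = line.split()
        if len(parts) <= max(name_col, mile_col):
            continue  # blank or short line
        try:
            mile = float(parts[mile_col])
        except ValueError:
            continue  # header row or non-numeric value
        if any(begin < mile < end for begin, end in ranges.get(parts[name_col], [])):
            yield line.rstrip("\n")
```

Because it is a generator, it can stream a large file straight into the output file without holding every matching row in memory.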

Hope this helps.

Answered on 2025-04-18 by Python大师
