Comparing two CSV files in Python when rows have multiple values

2 Answers

User

#1 · edited 2024-05-13 09:38:15

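Having duplicate numbers seems illogical, but if you want a count of the matching numbers in each row, regardless of index, make nums a set and sum how many of the row's numbers are in it: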

from itertools import islice
import csv

with open("in.txt") as f, open("numbers.txt") as nums:
    # make a set of all winning nums (built-in map works on Python 2 and 3)
    nums = set(map(str.rstrip, nums))
    r = csv.reader(f)
    # iterate over each row and sum how many matches we get
    for row in r:
        # row[0] is the row id, the remaining columns are the numbers
        print("{} matched {}".format(row[0], sum(n in nums
                                                 for n in islice(row, 1, None))))

With your input, this prints one line per row in the form "<row id> matched <count>".

This assumes your file is comma-separated and your numbers file has one number per line.
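To try the counting logic without creating files, here is a minimal sketch that reads the same two formats from strings. The helper name count_matches and the sample data are made up for illustration; they are not the asker's input.

```python
import csv
import io

def count_matches(rows_text, numbers_text):
    """Return (row_id, match_count) pairs; both arguments are file contents as strings."""
    # One winning number per line, as assumed above.
    winning = {line.strip() for line in numbers_text.splitlines() if line.strip()}
    results = []
    for row in csv.reader(io.StringIO(rows_text)):
        if not row:
            continue  # skip blank lines
        # row[0] is the row id; the rest are the numbers to check.
        results.append((row[0], sum(n in winning for n in row[1:])))
    return results

# Hypothetical sample data.
sample_rows = "a,1,2,3\nb,4,5,5\nc,7,8,9\n"
sample_numbers = "5\n7\n"
print(count_matches(sample_rows, sample_numbers))
```

Row b contains 5 twice, so it is counted twice: the sum counts occurrences, not distinct values.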

If you actually want to know which numbers, if any, are present, iterate over the numbers in each row and print every one that is in our set:

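from itertools import islice
import csv

with open("in.txt") as f, open("numbers.txt") as nums:
    # set of winning numbers, one per line in numbers.txt
    nums = set(map(str.rstrip, nums))
    r = csv.reader(f)
    for row in r:
        # skip the row id in column 0 and test every remaining value
        for n in islice(row, 1, None):
            if n in nums:
                print("{} is in row {}".format(n, row[0]))
        print("")  # blank line between rows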

That said, I am not sure duplicate numbers make sense here either.
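If duplicates within a row should count only once, one option (not part of the answer above) is a set intersection. The helper name distinct_matches is hypothetical.

```python
def distinct_matches(row_numbers, winning):
    """Return the sorted distinct values of row_numbers that also appear in winning."""
    return sorted(set(row_numbers) & set(winning))

# "5" appears twice in the row but is reported once.
print(distinct_matches(["4", "5", "5"], {"5", "7"}))
```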

To group rows by their number of matches, you can use a dict with the sum as the key and append the first-column value:

from itertools import islice
import csv
from collections import defaultdict

with open("in.txt") as f, open("numbers.txt") as nums:
    # make a set of all winning nums
    nums = set(map(str.rstrip, nums))
    r = csv.reader(f)
    results = defaultdict(list)
    # iterate over each row and sum how many matches we get
    for row in r:
        results[sum(n in nums for n in islice(row, 1, None))].append(row[0])

print(results)

Result (as printed by Python 2; Python 3 shows <class 'list'> instead):

defaultdict(<type 'list'>,
 {0: ['a', 'e', 'g'], 1: ['b', 'd', 'h', 'i'], 
 2: ['c', 'f', 'j']})

The keys are the match counts, and each value is the list of ids of the rows that matched that many numbers.
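The grouping step can also be tried on in-memory rows. This is a sketch with made-up data; the function name group_by_match_count is an assumption for illustration.

```python
from collections import defaultdict

def group_by_match_count(rows, winning):
    """Map match count -> list of row ids. Each row is [row_id, n1, n2, ...]."""
    groups = defaultdict(list)
    for row in rows:
        groups[sum(n in winning for n in row[1:])].append(row[0])
    return dict(groups)

# Hypothetical rows, already split into fields.
rows = [["a", "1", "2"], ["b", "5", "2"], ["c", "5", "7"], ["d", "7", "9"]]
groups = group_by_match_count(rows, {"5", "7"})
# Show the best-matching rows first.
for count in sorted(groups, reverse=True):
    print(count, groups[count])
```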

User

#2 · edited 2024-05-13 09:38:15

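If I understand correctly, you want to find the index of the first entry (or of all entries) whose numbers are the winning ones. If so, you can do it like this: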

import csv

with open('winningnumbers.csv', newline='') as wn:
    reader = csv.reader(wn)
    # Flatten into one list of values (assumes the file holds a single set of winning numbers)
    winningnumbers = [value for row in reader for value in row]

with open('Entries#x.csv', newline='') as en:
    readere = csv.reader(en)
    winning_number_index = -1 # Default value which we will print if nothing is found
    current_index = 0 # Initial index
    for line in readere: # Iterate over entries file
        # Default value; rows of a different length can never match
        all_numbers_match = len(line) == len(winningnumbers)
        if all_numbers_match:
            for i in range(len(line)):
                if line[i] != winningnumbers[i]: # If values of current line and winningnumbers with matching indexes are not equal
                    all_numbers_match = False # Our default value is set to False
                    break # Exit "for" without finishing

        if all_numbers_match: # If our default value is still True (which indicates that all numbers match)
            winning_number_index = current_index # Current index is written to winning_number_index
            break # Exit "for" without finishing
        else: # Not all numbers match
            current_index += 1

print(winning_number_index)

This prints the index of the first winning entry (if you need all indexes, say so in the comments).
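For the "all indexes" case mentioned above, a minimal sketch could collect every matching position instead of stopping at the first. The function name and the sample entries are assumptions for illustration, not part of the answer.

```python
def all_winning_indexes(entries, winning_numbers):
    """Return the indexes of every entry whose values equal winning_numbers."""
    winning = list(winning_numbers)
    return [index for index, line in enumerate(entries) if list(line) == winning]

# Hypothetical entries, already split into fields.
entries = [["1", "2", "3"], ["4", "5", "6"], ["1", "2", "3"]]
print(all_winning_indexes(entries, ["1", "2", "3"]))
```

An empty list means no entry matched.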

Note: this is not the best code for the problem. But if you are not familiar with Python's more advanced features, it is easier to follow and debug.

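You should probably consider not abbreviating your variable names. entries_reader takes one second longer to write than readere and five seconds less to understand.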

A faster, shorter, more memory-efficient variant is possible, though it may be harder to understand.
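The snippet for this variant did not survive on the page. A sketch consistent with the explanation that follows (enumerate(), all(), and for with else) might look like this; the function name and sample data are assumptions, not the answerer's code.

```python
def first_winning_index(entries, winning_numbers):
    """Return the index of the first entry equal to winning_numbers, or -1."""
    for line_index, line in enumerate(entries):
        # all() stops at the first mismatch, so no intermediate list is built.
        if len(line) == len(winning_numbers) and all(
            a == b for a, b in zip(line, winning_numbers)
        ):
            winning_index = line_index
            break
    else:
        # Runs only if the loop finished without hitting break.
        winning_index = -1
    return winning_index

# Hypothetical entries; a csv.reader object can be passed the same way.
sample_entries = [["1", "2"], ["3", "4"], ["3", "4"]]
print(first_winning_index(sample_entries, ["3", "4"]))
```

Because entries is consumed one row at a time, the whole file never has to be held in memory.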

The features that may be unclear are enumerate(), all(), and the use of else with for rather than with if. Let's go through them one by one.

To understand this use of enumerate, you first need to understand the following syntax:

a, b = [1, 2]

The variables a and b are assigned from the values in the list. In this case, a will be 1 and b will be 2. Using this syntax, we can do the following:

for a, b in [[1, 2], [2, 3], ['spam', 'eggs']]:
    pass  # do something with a and b

On each iteration, a and b will be 1 and 2, then 2 and 3, then 'spam' and 'eggs'.
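A runnable version of the unpacking loop, as a small sketch (the names pairs and seen are only for illustration):

```python
pairs = [[1, 2], [2, 3], ["spam", "eggs"]]
seen = []
for a, b in pairs:
    # Each two-element inner list is unpacked into a and b.
    seen.append((a, b))
    print(a, b)
```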

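Suppose we have a list a = ['spam', 'eggs', 'potatoes']. enumerate(a) yields pairs equivalent to this "list": [(0, 'spam'), (1, 'eggs'), (2, 'potatoes')]. Counting starts at 0 unless you pass a start argument. So, when we use it like this

for line_index, line in enumerate(readere):
    pass  # Do something with line_index and line

line_index will be 0, 1, 2, etc.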
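A quick check of what enumerate() produces. Note that it counts from 0 unless a start value is given; the variable names here are only for illustration.

```python
a = ["spam", "eggs", "potatoes"]

# enumerate() returns an iterator; list() materializes it for display.
zero_based = list(enumerate(a))
one_based = list(enumerate(a, start=1))

print(zero_based)
print(one_based)
```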

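The behavior needed here is that of the built-in all(), not any(): all() accepts an iterable (list, tuple, etc.) and returns True only if every element in it is truthy, whereas any() returns True as soon as at least one element is truthy.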

The list comprehension mylist = [line[i] == winningnumbers[i] for i in range(len(line))] builds a list, equivalent to the following:

mylist = []
for i in range(len(line)):
    mylist.append(line[i] == winningnumbers[i]) # a == b will return True if a is equal to b

So all() (not any()) returns True only when every number in the entry matches the corresponding winning number.
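The difference between all() and any() is easy to see on a small example. The values below are made up for illustration.

```python
line = ["3", "14", "15"]
winningnumbers = ["3", "14", "92"]

# Position-by-position comparison, as in the comprehension above.
matches = [line[i] == winningnumbers[i] for i in range(len(line))]
every_number_matches = all(matches)
some_number_matches = any(matches)

print(matches, every_number_matches, some_number_matches)
```

Only all() gives the "whole entry wins" answer; any() would report a win for a single matching position.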

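The code in the else block of a for loop runs only if the loop was not interrupted by break, so in our case it is a good place to set the default index to return.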
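A minimal illustration of for with else, on an unrelated toy problem so the control flow stands out. The function name first_negative is hypothetical.

```python
def first_negative(values):
    """Return the first negative value, or None if there is none."""
    for value in values:
        if value < 0:
            found = value
            break
    else:
        # Reached only when the loop ran to completion without break,
        # including when values is empty.
        found = None
    return found

print(first_negative([3, -1, -5]))
print(first_negative([1, 2]))
```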
