Searching for strings in a CSV file with nested for loops in Python

Posted 2024-05-19 19:18:07

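I'm new to Stack Overflow and to Python.

I'm having trouble using Python to check whether several strings (taken from an input file) appear in a CSV file.

Basically, my Python code takes strings one at a time from an input file (inputfile.csv) and searches for each of them in another file named mainfile.csv. It compares only against the first column of mainfile.csv, which contains the relevant data I'm looking for.

Note: the files are quite big. mainfile.csv has more than 1 million rows (and is still growing), and inputfile.csv usually has around 30,000 rows.

Here is the code.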

#!/usr/bin/python
import csv

mainfile = open('mainfile.csv', 'rb')
inputfile = open('inputfile.csv', 'rb')

mfreader = csv.reader(mainfile, delimiter=',') # mainfile reader
ifreader = csv.reader(inputfile) # inputfile reader, just one column, no delimiter

for ifrow in ifreader:
    for mfrow in mfreader:
        if ifrow[0] == mfrow[0]:
            print ifrow[0], mfrow[0] # This line is a print for debugging purpose
            print "Found a match for : %s " % ifrow[0]
            perform_some_operations()
        else:
            print ifrow[0], mfrow[0] # This line is a print for debugging purpose
            continue

mainfile.close()
inputfile.close()

Problem: the nested for loops only step through the first row of the inputfile. The remaining rows of inputfile.csv are "ignored".

EDIT
In fact, my understanding of the problem was wrong. The first for loop does step through all the rows of the inputfile. It is the second, nested for loop that goes through its iteration only once. Since it has already reached the end of the mainfile, it performs no further iterations when the first for loop advances.

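Sample files

Here are some sample files. In this example, the row values are simplified.

Basically, we work with two files:

MainFile: contains a list of product information (serial number, model, text information)
InputFile: contains the list of serial numbers I am trying to find in MainFile

MainFile (mainfile.csv, file size: > 1,000,000 (1M) rows)

Types: SerialNumber [varchar(64)], Model [varchar(64)], Information [varchar(2048)]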

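[The sample rows of mainfile.csv are missing from this copy. Per the debug output further down, its first column reads: SerialNumber (header), SN111aaa, SN222bbb, SN333ccc, SN444ddd, SN555eee.]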

InputFile (inputfile.csv, file size: ~30,000 (30K) rows)

Type: SerialNumber [varchar(64)]

SN000xyz
SN111xyz
SN222xyz
SN333xyz
SN444ddd

In the example above, since SN444ddd is the only string found in both inputfile and mainfile, my Python code should return (with the debug lines removed):

Found a match for SN444ddd

And then I could perform some operations.

But that is not what happens. Here is what I get from the debug print lines:

$ ./myprogram.py
SN000xyz SerialNumber
SN000xyz SN111aaa
SN000xyz SN222bbb
SN000xyz SN333ccc
SN000xyz SN444ddd
SN000xyz SN555eee
$

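Only the first row of the input file is processed.

EDIT: the statement above is wrong, cf. the previous edit.

It also compares against the header row of mainfile.csv, but that "issue" is not really important.

Where did I go wrong?

Thanks for your help.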

1 answer

Answer 1 · posted 2024-05-19 19:18:07

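The main problem seems to be that ifreader and mfreader are iterators, which means that once they have run through their available items, they do not start over.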
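A small sketch of that behavior, written for Python 3 (the thread's own code is Python 2) and using made-up in-memory rows instead of the real files: a csv.reader yields each row only once, so a second loop over the same reader sees nothing until the underlying file is rewound and a new reader is created.

```python
import csv
import io

# Hypothetical in-memory stand-in for mainfile.csv (rows invented for illustration)
mainfile = io.StringIO("SN111aaa,model-a\nSN222bbb,model-b\n")
mfreader = csv.reader(mainfile)

first_pass = [row[0] for row in mfreader]   # consumes every row
second_pass = [row[0] for row in mfreader]  # reader is exhausted, yields nothing

# Rewinding the underlying file and creating a fresh reader starts over
mainfile.seek(0)
third_pass = [row[0] for row in csv.reader(mainfile)]

print(first_pass)
print(second_pass)
print(third_pass)
```

Rewinding the main file inside the outer loop would make the nested version work, but it would reread the whole main file once per input row.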

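The second problem is that your approach is very inefficient. I suggest building a set from the serial numbers in inputfile.csv rather than iterating through the reader over and over in an inner loop. Sets cannot contain duplicate values, and they are very efficient at checking whether a value is present.

Your code could look like this: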

#!/usr/bin/python
import csv

def perform_some_operations():
    # ...
    pass

with open('inputfile.csv', 'rb') as inputfile:
    ifreader = csv.reader(inputfile) # inputfile reader, just one column, no delimiter
    serial_numbers = {row[0] for row in ifreader}

with open('mainfile.csv', 'rb') as mainfile:
    mfreader = csv.reader(mainfile, delimiter=',') # mainfile reader

    for row in mfreader:
        if row[0] in serial_numbers:
            print "match for    : %s " % row[0]
            perform_some_operations()
        else:
            print "NO MATCH for : %s " % row[0]

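Here I use a set comprehension (the curly braces) to fill the set with values from ifreader. After that, the in operator makes it easy to check whether a particular value is in the set.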
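To see the set lookup on the question's sample data, here is a minimal Python 3 sketch that reads from in-memory strings. The serial numbers come from the question; the Model and Information values are placeholders invented for this sketch, since the original sample rows are not available.

```python
import csv
import io

# Serial numbers from the question's inputfile.csv sample
input_data = "SN000xyz\nSN111xyz\nSN222xyz\nSN333xyz\nSN444ddd\n"

# First column taken from the question's debug output; the other two
# columns are placeholder values made up for this sketch
main_data = (
    "SerialNumber,Model,Information\n"
    "SN111aaa,model-a,info-a\n"
    "SN222bbb,model-b,info-b\n"
    "SN333ccc,model-c,info-c\n"
    "SN444ddd,model-d,info-d\n"
    "SN555eee,model-e,info-e\n"
)

# Build the set once; "if row" skips blank lines
serial_numbers = {row[0] for row in csv.reader(io.StringIO(input_data)) if row}

# A single pass over the main data, one set lookup per row
matches = []
for row in csv.reader(io.StringIO(main_data)):
    if row and row[0] in serial_numbers:
        matches.append(row[0])

print(matches)
```

The header row needs no special handling here, because the string SerialNumber is not in the set of serial numbers.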

Note: instead of reading the files in 'rb' mode, you should probably use the codecs module and specify the file encoding when opening the file.

[Code sample missing from this copy.]

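Use an encoding argument that matches your source data. In Python 3, the open() function supports the encoding argument natively, while in Python 2 the codecs module can help.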
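As a rough Python 3 illustration of opening the files with an explicit encoding, the sketch below wraps the set-based search in a helper. The function name find_matches, the utf-8 default, and the throwaway demo files are assumptions made for this example, not part of the original answer.

```python
import csv
import os
import tempfile

def find_matches(input_path, main_path, encoding="utf-8"):
    """Return first-column values of main_path that also appear in input_path."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(input_path, newline="", encoding=encoding) as inputfile:
        serial_numbers = {row[0] for row in csv.reader(inputfile) if row}
    with open(main_path, newline="", encoding=encoding) as mainfile:
        return [row[0] for row in csv.reader(mainfile)
                if row and row[0] in serial_numbers]

# Tiny demo with temporary files; the contents are illustrative only
with tempfile.TemporaryDirectory() as tmp:
    input_path = os.path.join(tmp, "inputfile.csv")
    main_path = os.path.join(tmp, "mainfile.csv")
    with open(input_path, "w", newline="", encoding="utf-8") as f:
        f.write("SN000xyz\nSN444ddd\n")
    with open(main_path, "w", newline="", encoding="utf-8") as f:
        f.write("SerialNumber,Model\nSN444ddd,model-d\nSN555eee,model-e\n")
    found = find_matches(input_path, main_path)

print(found)
```

Pass whatever encoding your real files use (for example "latin-1") as the third argument.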
