How can I efficiently replace strings in a large CSV-based array using a dictionary?

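# newline='' lets the csv module handle line endings itself
# (the old 'rU' mode was removed in Python 3.11).
with open(self.nounDef["Noun Source File Name"], 'r', newline='') as csvFile:
    for idx, row in enumerate(csv.reader(csvFile, delimiter=',')):
        if idx == 0:
            # The first row is the header.
            self.csvHeader = row
        self.csvFileArray.append(row)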

2 answers

User

#1 · edited 2024-04-20 13:22:31

The following works with the code above and has been fully tested:

  def m_globalSearchAndReplace(self, dataMap):
    replacements = dataMap.m_getMappingDictionary()
    keys = replacements.keys()
    for row in self.csvFileArray: # Loop through each row/list
      for idx, w in enumerate(row): # Loop through each word in the row/list
        for key in keys: # For every key in the dictionary...
          if key != 'NULL' and key != '-' and key != '.' and key != '':
            w = w.replace(key, replacements[key])
        row[idx] = w

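In short, it loops over every row in csvFileArray and takes each word in turn.
Then, for each word in the row, it loops over the keys of the dictionary (called "replacements") to look up and apply each mapping.
Finally, provided the key passes the filter, every occurrence of the key in the word is replaced with its mapped value from the dictionary.

Note: although this works, I don't think three nested loops are the most efficient way to solve the problem, and I believe there must be a better way, perhaps using regular expressions. So I will leave the question open for a while to see whether anyone can improve on this answer.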
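One way the regular-expression idea mentioned above could look is sketched below. It is only a sketch under assumptions: the names `build_replacer`, `replace_in_rows` and `SKIPPED_KEYS` are made up for illustration, and the sample data is invented. It compiles all keys into a single alternation so each cell is scanned once instead of once per key. Note that this is not strictly equivalent to chained `str.replace` calls: a single pass never re-replaces text produced by an earlier mapping, and longer keys are tried first.

```python
import re

# Keys the answer above skips; replacing '' or '.' would corrupt the data.
SKIPPED_KEYS = {'NULL', '-', '.', ''}


def build_replacer(replacements):
    """Return a function that applies every mapping in one regex pass."""
    keys = [k for k in replacements if k not in SKIPPED_KEYS]
    if not keys:
        # Nothing to replace: return the text unchanged.
        return lambda text: text
    # Longest keys first so 'dark red' wins over 'red' when both match.
    keys.sort(key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)


def replace_in_rows(rows, replacements):
    """Rewrite each cell of a list of lists in place."""
    replace = build_replacer(replacements)
    for row in rows:
        row[:] = [replace(cell) for cell in row]


# Hypothetical sample data standing in for self.csvFileArray.
rows = [['id', 'colour'], ['1', 'dark red'], ['2', 'NULL']]
replace_in_rows(rows, {'red': 'R', 'dark red': 'DR', 'NULL': 'x'})
```

Whether this is actually faster depends on the number of keys and the size of the data, so it is worth timing against the nested-loop version on a real file.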

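User

#2 · edited 2024-04-20 13:22:31

In one big loop? You could load the csv file as a single string, so that you only go through your list of replacements once rather than once for every item. That is still not very efficient, though: Python strings are immutable, so every replacement builds a new string, and you are facing the same problem.

According to this answer, Optimizing find and replace over large files in Python (regarding efficiency), it may be better to work line by line, so that you don't hold one giant string in memory if that ever becomes a problem.

Edit: something like this: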

# open original and new file.
with open(old_file, 'r') as old_f, open(new_file, 'w') as new_f:
    # loop through each line of the original file (old file)
    for old_line in old_f:
        new_line = old_line
        # loop through your dictionary of replacements and make them.
        for r in replacements:
            new_line = new_line.replace(r, replacements[r])
        # write each line to the new file.
        new_f.write(new_line)

Either way, I would forget that the file is a csv file and just treat it as a big collection of lines or characters.
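As a rough, self-contained version of the line-by-line idea, the sketch below wraps the loop in a function that accepts any file-like objects, so it can be tried without touching the disk. The function name `replace_lines`, the `SKIPPED_KEYS` filter (borrowed from the first answer) and the sample data are assumptions for illustration, not part of either answer.

```python
import io

# Same filter as the first answer: these keys are never used as search terms.
SKIPPED_KEYS = {'NULL', '-', '.', ''}


def replace_lines(src, dst, replacements):
    """Copy src to dst line by line, applying each mapping in turn.

    Returns the number of lines written.
    """
    pairs = [(k, v) for k, v in replacements.items() if k not in SKIPPED_KEYS]
    count = 0
    for line in src:
        for old, new in pairs:
            line = line.replace(old, new)
        dst.write(line)
        count += 1
    return count


# In-memory stand-ins for the old and new files.
src = io.StringIO('id,colour\n1,red\n2,blue\n')
dst = io.StringIO()
lines_written = replace_lines(src, dst, {'red': 'R', 'blue': 'B', '': '?'})
result = dst.getvalue()
```

With real files, the same function could be called with the two handles opened by `with open(old_file, 'r') as old_f, open(new_file, 'w') as new_f:`. Only one line is held in memory at a time, which is the point of the approach.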
