What is the fastest-performing tuple for large data sets in Python?

3 Answers

User

Answer 1 · edited 2024-05-15 20:56:36

There are a number of things that could be making your program slow.

String concatenation in Python is very inefficient when used with large strings.

Strings in Python are immutable. This fact frequently sneaks up and bites novice Python programmers on the rump. Immutability confers some advantages and disadvantages. In the plus column, strings can be used as keys in dictionaries and individual copies can be shared among multiple variable bindings. (Python automatically shares one- and two-character strings.) In the minus column, you can't say something like, "change all the 'a's to 'b's" in any given string. Instead, you have to create a new string with the desired properties. This continual copying can lead to significant inefficiencies in Python programs.

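Given that each string in your example may contain thousands of characters, every concatenation forces Python to copy that huge string in memory to create a new object.

This would be more efficient: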

strings = []
strings.append('string')
strings.append('other_string')
...
','.join(strings)

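In your case, instead of storing one big string per dictionary key, each key should hold a list. Append every match to that list, and only at the very end do the string concatenation with str.join.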
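A minimal sketch of that idea, assuming the matches arrive as (name, value) pairs. The `rows` sample data and the `collect_matches` helper are hypothetical and not taken from the question:

```python
from collections import defaultdict


def collect_matches(rows):
    """Group values by name in lists, joining each list into a string only once."""
    matches = defaultdict(list)          # name -> list of matched values
    for name, value in rows:
        matches[name].append(value)      # appending to a list avoids copying a big string
    # One join per key at the end, instead of one string copy per match.
    return {name: ','.join(values) for name, values in matches.items()}


# Hypothetical sample data standing in for the real rows.
rows = [('John', '1345'), ('Mary', '77'), ('John', '345'), ('John', '135')]
result = collect_matches(rows)
print(result['John'])
```

Joining at the end also means there is no trailing comma to clean up.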

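Also, printing to stdout is notoriously slow. If you print to stdout on every iteration of a 50,000-item loop, each iteration is held up by the unbuffered write to stdout. Consider printing only every nth iteration, or write to a file instead (file writes are normally buffered) and tail that file from another terminal.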
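Both suggestions can be combined. This is only a sketch: the `process_items` function, the `report_every` parameter, and the log file name are made up for illustration:

```python
import os
import tempfile


def process_items(items, log_path, report_every=1000):
    """Write per-item output to a buffered file; print progress only occasionally."""
    printed = 0
    with open(log_path, 'w') as log:          # text files opened this way are buffered
        for i, item in enumerate(items, start=1):
            log.write(f'{item}\n')            # the detailed output goes to the file
            if i % report_every == 0:         # only every nth iteration touches stdout
                print(f'processed {i} items')
                printed += 1
    return printed


# Hypothetical log location in the system temp directory.
log_path = os.path.join(tempfile.gettempdir(), 'progress_sketch.log')
progress_lines = process_items(range(50000), log_path, report_every=10000)
```

While it runs you can follow the file from another terminal, for example with `tail -f` on the log path.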

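User

Answer 2 · edited 2024-05-15 20:56:36

This answer is based on the OP's reply to my comment. I asked what he would do with the dict, hinting that maybe he doesn't need to build it in the first place. @Simon replied: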

i add it to an excel sheet, so I take the KEY, which is the name, and put it in A1, then I take the VALUE, which is 1345,345,135,346,3451,35.. etc etc, and put that into A2. then I do the rest of my programming with that information...... but i need those values seperated by commas and acessible inside that excel sheet like that!

So it looks like the dict doesn't need to be built at all. Here is an alternative: create one file per name, and store the file objects in a dict:

files = {}
name = 'John'  # let's say
if name not in files:
    files[name] = open(name, 'w')

Then, as you loop over the 50k-row Excel sheet, you do something like this (pseudocode):

(pseudocode block missing from the source)
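A rough sketch of what such a loop might look like. This is not the original pseudocode: the `rows` data, the scratch directory, and the trailing-comma cleanup are all assumptions made for illustration:

```python
import os
import tempfile

# Hypothetical (name, value_string) rows standing in for the spreadsheet.
rows = [('John', '1345'), ('Mary', '77'), ('John', '345')]

out_dir = tempfile.mkdtemp()      # assumption: a scratch directory for the files
files = {}                        # name -> open file object
try:
    for name, value_string in rows:
        if name not in files:
            files[name] = open(os.path.join(out_dir, name), 'w')
        files[name].write(value_string + ',')   # append the value, comma-terminated
finally:
    for f in files.values():      # close everything so the data is flushed to disk
        f.close()

# Read one person's values back and drop the trailing comma.
with open(os.path.join(out_dir, 'John')) as f:
    john_values = f.read().rstrip(',')
print(john_values)
```

One thing to watch: with a lot of distinct names, the number of files open at the same time can run into the operating system's limit.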

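Since your value_string is already comma-separated, your files will be in CSV format without any further tweaking on your part (except that you may want to strip the last comma when you're done). Then, when you need John's values, just do value = open('John').read().

Now, I've never worked with 50k-row Excel files, but I'd be very surprised if this isn't quite a bit faster than what you have now. Having persistent data is also a plus.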

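Edit:

The above is a solution aimed at saving memory. Writing to files is much slower than appending to lists (but probably still faster than recreating many large strings). But if the lists are huge (which seems likely) and you run into memory problems (not saying you will), you can try the file approach.

An alternative, similar in performance to lists (at least in the toy test I tried), is to use StringIO: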

from io import StringIO  # Python 2: from StringIO import StringIO

string_ios = {'John': StringIO()}  # a dict to store StringIO objects
for value in ['ab', 'cd', 'ef']:
    string_ios['John'].write(value + ',')
print(string_ios['John'].getvalue())

This will output 'ab,cd,ef,'

User

Answer 3 · edited 2024-05-15 20:56:36

Instead of building a string that looks like a list, use an actual list and produce the string representation you need once you're done.
