Split the string "aabbcc" into ["aa", "bb", "cc"] without using re.split

-3 votes

3 answers

1523 views

Asked 2025-04-17 12:34

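As the title says, I want to split a string into pairs in one go. I am looking for a simple expression, ideally a list comprehension, but I have not found a suitable one yet: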

s = "123456"

The result should look like this:

["12", "34", "56"]

What I do not want is any of these:

re.split('(?i)([0-9a-f]{2})', s)
s[0:2], s[2:4], s[4:6]
[s[i*2:i*2+2] for i in range(len(s) // 2)]

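Edit:

OK, what I actually want is to parse a hexadecimal RGB[A] color (and possibly other color or component formats) and extract all of its components. The fastest approach appears to be the last one suggested by sven-marnach:

sven-marnach's xrange: 0.883 usec per loop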

python -m timeit -s 's="aabbcc";' '[int(s[i:i+2], 16) / 255. for i in xrange(0, len(s), 2)]'

pair/iter: 1.38 usec per loop

python -m timeit -s 's="aabbcc"' '["%c%c" % pair for pair in zip(* 2 * [iter(s)])]'

Regular expression: 2.55 usec per loop

python -m timeit -s 'import re; s="aabbcc"; c=re.compile("(?i)([0-9a-f]{2})"); split=re.split' '[int(x, 16) / 255. for x in split(c, s) if x != ""]'
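The commands above target Python 2 (note xrange). As a minimal sketch, assuming Python 3, the slicing approach could be written as below. The function names split_pairs and hex_to_unit_floats are made up for illustration and do not come from the thread.

```python
def split_pairs(s):
    """Split s into consecutive two-character chunks using slicing."""
    return [s[i:i + 2] for i in range(0, len(s), 2)]


def hex_to_unit_floats(s):
    """Parse each two-digit hex pair and scale it to the range 0.0 to 1.0."""
    return [int(s[i:i + 2], 16) / 255 for i in range(0, len(s), 2)]


print(split_pairs("aabbcc"))   # ['aa', 'bb', 'cc']
print(split_pairs("123456"))   # ['12', '34', '56']
print(hex_to_unit_floats("ff0080"))
```

With an odd-length input, the slicing version keeps the trailing single character as a final short chunk rather than dropping it.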


3 Answers

For a single lookup, NumPy does worse than your preferred solution:

$ python -m timeit -s 'import numpy as np; s="aabbccdd"' 'a = np.fromstring(s.decode("hex"), dtype="uint32"); a.dtype = "uint8"; list(a)'
100000 loops, best of 3: 5.14 usec per loop
$ python -m timeit -s 's="aabbcc";' '[int(s[i:i+2], 16) / 255. for i in xrange(0, len(s), 2)]'
100000 loops, best of 3: 2.41 usec per loop

But if you need to do many conversions at once, NumPy is much faster:

$ python -m timeit -s 'import numpy as np; s="aabbccdd" * 100' 'a = np.fromstring(s.decode("hex"), dtype="uint32"); a.dtype = "uint8"; a.tolist()'
10000 loops, best of 3: 59.6 usec per loop
$ python -m timeit -s 's="aabbccdd" * 100;' '[int(s[i:i+2], 16) / 255. for i in xrange(0, len(s), 2)]'
1000 loops, best of 3: 240 usec per loop

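On my machine, NumPy is faster for batch sizes above 2. You can easily group the values by setting a.shape to (number_of_colors, 4), although that makes the tolist method about 50% slower.

In fact, most of the time is spent converting the array to a list. Depending on what you want to do with the result, you may be able to skip this intermediate step and gain some speed: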

$ python -m timeit -s 'import numpy as np; s="aabbccdd" * 100' 'a = np.fromstring(s.decode("hex"), dtype="uint32"); a.dtype = "uint8"; a.shape = (100,4)'
100000 loops, best of 3: 6.76 usec per loop

Answered 2025-04-17 by Python大师


In [4]: ["".join(pair) for pair in zip(* 2 * [iter(s)])]
Out[4]: ['aa', 'bb', 'cc']

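See "How does zip(*[iter(s)]*n) work in Python?" for an explanation of that odd "two iterators over the same str" syntax.

You mentioned in the comments that you want "the fastest execution speed". I cannot promise that this implementation delivers it, but you can measure it with timeit. Keep in mind what Donald Knuth said about premature optimization, of course. For the problem you have now described, I think you will find r, g, b = s[0:2], s[2:4], s[4:6] hard to beat.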
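To make the idiom concrete, here is a small Python 3 sketch of what happens when zip receives the same iterator object twice. The variable names are only illustrative.

```python
s = "aabbcc"

it = iter(s)
# Both arguments are the *same* iterator object, so zip pulls from it twice
# per tuple and each tuple holds two consecutive characters.
pairs = list(zip(it, it))
print(pairs)   # [('a', 'a'), ('b', 'b'), ('c', 'c')]

# [iter(s)] * 2 builds a list containing that one iterator twice;
# the leading * unpacks the list into zip's positional arguments.
chunks = ["".join(pair) for pair in zip(*[iter(s)] * 2)]
print(chunks)  # ['aa', 'bb', 'cc']

# Caveat: zip stops as soon as the iterator runs dry, so a trailing
# unpaired character is silently dropped.
odd = ["".join(pair) for pair in zip(*[iter("aabbc")] * 2)]
print(odd)     # ['aa', 'bb']
```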

$ python3.2 -m timeit -c '
s = "aabbcc"
["".join(pair) for pair in zip(* 2 * [iter(s)])]
'
100000 loops, best of 3: 4.49 usec per loop

Compare:

python3.2 -m timeit -c '
s = "aabbcc"
r, g, b = s[0:2], s[2:4], s[4:6]
'
1000000 loops, best of 3: 1.2 usec per loop
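The same comparison can also be run from inside Python with the timeit module instead of the command line. This is only a sketch, assuming Python 3; the dictionary keys and the run count are arbitrary choices, and the numbers it prints will differ from the figures quoted above.

```python
import timeit

setup = 's = "aabbcc"'
candidates = {
    "zip/iter": '["".join(pair) for pair in zip(*[iter(s)] * 2)]',
    "slices": "r, g, b = s[0:2], s[2:4], s[4:6]",
}

runs = 100_000
timings = {}
for name, stmt in candidates.items():
    # timeit.timeit returns the total time in seconds for `runs` executions.
    total = timeit.timeit(stmt, setup=setup, number=runs)
    timings[name] = total / runs * 1e6  # microseconds per loop
    print(f"{name}: {timings[name]:.3f} usec per loop")
```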

Answered 2025-04-17 by Python大师


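Reading the comments, it turns out the actual question is: what is the fastest way to parse a color definition string in the hexadecimal format RRGGBBAA? Here are some options:

import struct

def rgba1(s, unpack=struct.unpack):
    # Decode the hex string to 4 raw bytes, then unpack them as unsigned chars.
    return unpack("BBBB", s.decode("hex"))

def rgba2(s, int=int, xrange=xrange):
    # Parse each two-character slice as a base-16 integer.
    return [int(s[i:i+2], 16) for i in xrange(0, 8, 2)]

def rgba3(s, int=int, xrange=xrange):
    # Parse the whole string once, then shift from the high byte down
    # so the components come out in R, G, B, A order.
    x = int(s, 16)
    return [(x >> i) & 255 for i in xrange(24, -1, -8)]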

As expected, the first version is the fastest:

In [6]: timeit rgba1("aabbccdd")
1000000 loops, best of 3: 1.44 us per loop

In [7]: timeit rgba2("aabbccdd")
100000 loops, best of 3: 2.43 us per loop

In [8]: timeit rgba3("aabbccdd")
100000 loops, best of 3: 2.44 us per loop
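The functions above rely on Python 2 features (str.decode("hex") and xrange). A rough Python 3 sketch of the same three ideas follows. The function names are hypothetical, and rgba_bytes is an extra variant that is not part of the answer. No timings are claimed for any of them.

```python
import struct


def rgba_struct(s):
    """Decode the hex string to raw bytes and unpack four unsigned chars."""
    return struct.unpack("BBBB", bytes.fromhex(s))


def rgba_slices(s):
    """Parse each two-character slice as a base-16 integer."""
    return [int(s[i:i + 2], 16) for i in range(0, 8, 2)]


def rgba_shifts(s):
    """Parse the whole string once and shift out one byte at a time,
    starting from the most significant byte to keep RRGGBBAA order."""
    x = int(s, 16)
    return [(x >> shift) & 255 for shift in (24, 16, 8, 0)]


def rgba_bytes(s):
    """Iterating over a bytes object yields ints, so list() is enough."""
    return list(bytes.fromhex(s))


print(rgba_struct("aabbccdd"))  # (170, 187, 204, 221)
print(rgba_slices("aabbccdd"))  # [170, 187, 204, 221]
```

Note that rgba_struct returns a tuple while the other variants return lists, which matters only if the caller compares or mutates the result.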

Answered 2025-04-17 by Python大师

