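Splitting a list in Python
If we have a Python list containing several strings and want to create sublists by splitting on one particular string, how should we go about it?
For example: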
l = ["data","more data","","data 2","more data 2","danger","","date3","lll"]
p = split_special(l,"")
This would produce:
p = [["data","more data"],["data 2","more data 2","danger"],["date3","lll"]]
7 Answers
2
This version uses reduce, a function that folds a sequence of items into a single result by repeatedly applying a combining function.
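from functools import reduce  # builtin in Python 2, must be imported in Python 3

def split(iterable, where):
    def splitter(acc, item, where=where):
        if item == where:
            # separator found: start a new sublist
            acc.append([])
        else:
            # regular item: add it to the current sublist
            acc[-1].append(item)
        return acc
    return reduce(splitter, iterable, [[]])

data = ["data","more data","","data 2","more data 2","danger","","date3","lll"]
print(split(data, ''))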
Result:
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
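For readers less used to reduce, the same accumulation can be written as a plain loop. This is only a minimal sketch and not part of the answer above; the name split_loop is made up for illustration. Like the reduce version, it yields an empty sublist wherever separators are adjacent or sit at either end of the input.

```python
def split_loop(iterable, where):
    """Split iterable into sublists at every item equal to `where`."""
    result = [[]]  # one empty sublist to start, like reduce's initial value
    for item in iterable:
        if item == where:
            result.append([])        # separator: open a new sublist
        else:
            result[-1].append(item)  # regular item: extend the current sublist
    return result

data = ["data", "more data", "", "data 2", "more data 2", "danger", "", "date3", "lll"]
print(split_loop(data, ""))
```

Whether those empty sublists are wanted depends on the use case; filtering them out afterwards is a one-line list comprehension.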
5
Here is one possible implementation using itertools.
>>> l
['data', 'more data', '', 'data 2', 'more data 2', 'danger', '', 'date3', 'lll']
>>> it_l = iter(l)
>>> from itertools import takewhile, dropwhile
>>> [[e] + list(takewhile(lambda e: e != "", it_l)) for e in it_l if e != ""]
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
Note: this approach is about as fast as the groupby one; in the timings below, groupby comes out slightly ahead.
>>> import timeit
>>> from itertools import groupby
>>> stmt_dsm = """
[list(group) for k, group in groupby(l, lambda x: x == "") if not k]
"""
>>> stmt_ab = """
it_l = iter(l)
[[e] + list(takewhile(lambda e: e != "", it_l)) for e in it_l if e != ""]
"""
>>> t_ab = timeit.Timer(stmt = stmt_ab, setup = "from __main__ import l, dropwhile, takewhile")
>>> t_dsm = timeit.Timer(stmt = stmt_dsm, setup = "from __main__ import l, groupby")
>>> t_ab.timeit(100000)
1.6863486541265047
>>> t_dsm.timeit(100000)
1.5298066765462863
>>> t_ab.timeit(100000)
1.735611326163962
>>>
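The comparison above can also be sketched as a standalone script. The wrapper names with_groupby and with_takewhile are made up for this example, and the absolute numbers will vary by machine and Python version, so treat any output as indicative only.

```python
import timeit
from itertools import groupby, takewhile

l = ["data", "more data", "", "data 2", "more data 2", "danger", "", "date3", "lll"]

def with_groupby(items):
    # Group consecutive items by "is this the separator?" and keep the non-separator runs.
    return [list(group) for k, group in groupby(items, lambda x: x == "") if not k]

def with_takewhile(items):
    it = iter(items)
    # The inner takewhile pulls from the same iterator as the outer loop,
    # so each sublist consumes items up to and including the next separator.
    return [[e] + list(takewhile(lambda x: x != "", it)) for e in it if e != ""]

# Both approaches should agree on this input before we bother timing them.
assert with_groupby(l) == with_takewhile(l)

for fn in (with_groupby, with_takewhile):
    seconds = timeit.timeit(lambda: fn(l), number=100000)
    print(fn.__name__, seconds)
```

Wrapping each approach in a function adds a small, equal call overhead to both, which should not change which one comes out ahead.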
42
itertools.groupby is one approach (as it so often is):
>>> l = ["data","more data","","data 2","more data 2","danger","","date3","lll"]
>>> from itertools import groupby
>>> groupby(l, lambda x: x == "")
<itertools.groupby object at 0x9ce06bc>
>>> [list(group) for k, group in groupby(l, lambda x: x == "") if not k]
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
In this particular case, since the separator is the empty string (which is falsy), we can even "cheat" a little:
>>> [list(group) for k, group in groupby(l, bool) if k]
[['data', 'more data'], ['data 2', 'more data 2', 'danger'], ['date3', 'lll']]
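To get the split_special(l, "") call from the question, the groupby expression could be wrapped in a function along these lines. This is a sketch under the assumption that empty runs should be dropped; the parameter names items and sep are my own.

```python
from itertools import groupby

def split_special(items, sep):
    """Return the runs of items found between occurrences of sep."""
    # groupby yields (key, group) pairs; key is True for runs of separators,
    # which are skipped, and False for the runs we want to keep.
    return [list(group)
            for is_sep, group in groupby(items, lambda x: x == sep)
            if not is_sep]

l = ["data", "more data", "", "data 2", "more data 2", "danger", "", "date3", "lll"]
p = split_special(l, "")
print(p)
```

Note that, unlike the reduce-based answer, this never produces empty sublists: consecutive separators collapse into a single group that is discarded.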