Splitting a column of lists of tuples into new columns

Posted 2024-04-18 23:59:51


I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(
    {'tod': [[('a', 10), ('b', 6), ('h', 3), ('p', 2)],
             [('x', 11), ('e', 2), ('l', 2)],
             [('r', 5), ('l', 5)],
             [('n', 15)]]})

                                 tod
0  [(a, 10), (b, 6), (h, 3), (p, 2)]
1          [(x, 11), (e, 2), (l, 2)]
2                   [(r, 5), (l, 5)]
3                          [(n, 15)]

I want to expand the lists of tuples into new columns to get:

[expected output not preserved in the source]

If a tuple does not exist, I want missing values in the corresponding columns.

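I'm having trouble because the length of each list (the number of tuples) differs from row to row, so I want the new columns to be allocated dynamically as they appear. Each cell also contains a list of tuple pairs rather than a single tuple.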

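I tried an approach similar to the one in this question, but it only expands a single tuple into multiple columns (when you know the columns in advance).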

Then I looked at this and this, but the number of columns is unknown, so I got:

pd.DataFrame.from_records([{k: v for v, k in row} for row in df.tod])
Out[171]: 
    2    3    5    6    10   11   15
0    p    h  NaN    b    a  NaN  NaN
1    l  NaN  NaN  NaN  NaN    x  NaN
2  NaN  NaN    l  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN    n

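I then looked at two other questions about splitting cells that contain tuples. They focus on converting a tuple into a Series, but again this does not work, because those examples only handle a single tuple of known length, not a list of tuples.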

How can I solve this?

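Note: I realize I haven't included much code for "what have you tried"; my console is a mess of attempts that keep raising errors. I left it out to keep the question clean.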


2 answers
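import numpy as np

# Largest number of tuples in any row
n = max(len(row) for row in df.tod)
# Flatten each row's tuples and pad with ('-', NaN) pairs up to n tuples
f = lambda l: sum(l, ()) + ('-', np.nan) * (n - len(l))
l = [list(f(row)) for row in df.tod]

ndf = pd.DataFrame(l, columns='l1 n1 l2 n2 l3 n3 l4 n4'.split())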
#  l1  n1 l2   n2 l3   n3 l4   n4
#0  a  10  b  6.0  h  3.0  p  2.0
#1  x  11  e  2.0  l  2.0  -  NaN
#2  r   5  l  5.0  -  NaN  -  NaN
#3  n  15  -  NaN  -  NaN  -  NaN

df.join(ndf)
#
#                                 tod l1  n1 l2   n2 l3   n3 l4   n4
#0  [(a, 10), (b, 6), (h, 3), (p, 2)]  a  10  b  6.0  h  3.0  p  2.0
#1          [(x, 11), (e, 2), (l, 2)]  x  11  e  2.0  l  2.0  -  NaN
#2                   [(r, 5), (l, 5)]  r   5  l  5.0  -  NaN  -  NaN
#3                          [(n, 15)]  n  15  -  NaN  -  NaN  -  NaN
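
The column names in the snippet above are hard-coded for at most four tuples per row. As a minimal sketch, assuming the same `l`/`n` naming scheme and assuming NaN (rather than '-') is wanted for missing letters, the names can be derived from the longest row instead. The names `rows`, `columns`, and `result` are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'tod': [[('a', 10), ('b', 6), ('h', 3), ('p', 2)],
             [('x', 11), ('e', 2), ('l', 2)],
             [('r', 5), ('l', 5)],
             [('n', 15)]]})

# Largest number of tuples found in any row
n = max(len(row) for row in df['tod'])

# Flatten each row's tuples, then pad with NaN pairs up to n tuples
rows = [list(sum(row, ())) + [np.nan, np.nan] * (n - len(row))
        for row in df['tod']]

# Build l1, n1, l2, n2, ... from n instead of typing the names by hand
columns = [f'{prefix}{i}' for i in range(1, n + 1) for prefix in ('l', 'n')]

ndf = pd.DataFrame(rows, columns=columns, index=df.index)
result = df.join(ndf)
print(result)
```

Passing `index=df.index` keeps the join aligned even if the original frame does not use the default 0..n-1 index.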

You can flatten the tuples, then create the column names with a generator, and finally join the result to the original DataFrame:

#https://stackoverflow.com/a/45122198/2901002
def mygen(lst):
    for item in lst:
        yield 'l{}'.format(item)
        yield 'n{}'.format(item)

df1 = pd.DataFrame([[b for a in row for b in a] for row in df.tod])
df1.columns = list(mygen(range(1, len(df1.columns) // 2 + 1)))
print(df1)
  l1  n1    l2   n2    l3   n3    l4   n4
0  a  10     b  6.0     h  3.0     p  2.0
1  x  11     e  2.0     l  2.0  None  NaN
2  r   5     l  5.0  None  NaN  None  NaN
3  n  15  None  NaN  None  NaN  None  NaN

df = df.join(df1)
print(df)
                                 tod l1  n1    l2   n2    l3   n3    l4   n4
0  [(a, 10), (b, 6), (h, 3), (p, 2)]  a  10     b  6.0     h  3.0     p  2.0
1          [(x, 11), (e, 2), (l, 2)]  x  11     e  2.0     l  2.0  None  NaN
2                   [(r, 5), (l, 5)]  r   5     l  5.0  None  NaN  None  NaN
3                          [(n, 15)]  n  15  None  NaN  None  NaN  None  NaN
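
The dict-based attempt from the question keyed each record by the tuple's number, which is why the numbers ended up as column headers. A hedged sketch of the same idea, keyed by the tuple's position in the list instead, is shown below. It assumes the `l`/`n` naming used in the answers; `records` and `wide` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {'tod': [[('a', 10), ('b', 6), ('h', 3), ('p', 2)],
             [('x', 11), ('e', 2), ('l', 2)],
             [('r', 5), ('l', 5)],
             [('n', 15)]]})

# One dict per row: the i-th tuple contributes keys 'l<i>' and 'n<i>'
records = [
    {f'{prefix}{i}': value
     for i, pair in enumerate(row, start=1)
     for prefix, value in zip(('l', 'n'), pair)}
    for row in df['tod']
]

# Keys absent from a row's dict become NaN in that row
wide = pd.DataFrame(records, index=df.index)
result = df.join(wide)
print(result)
```

Because the columns come from the dict keys, no padding step is needed and the number of columns follows the longest list.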
