Including missing column values between two index slices when running GroupBy on a DataFrame

Posted 2024-06-17 13:28:29


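I have a DataFrame that is grouped by lineOfBusiness, stream and status, with a count column created from the grouping. I am trying to make the status column consistent, in number of rows, between the two line-of-business slices (Commodities and Equity Derivatives below).

For example, the Commodities slice has a status called "Prioritized for Analysis" that does not exist in Equity Derivatives.

Another example: Equity Derivatives has a status "Prioritized for Development" that does not exist in Commodities. Is there a way to programmatically create the missing statuses and assign them a count of 0 or NaN?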

lineOfBusiness        stream         status                         count
Commodities           BOW/Project    Closed                             2
                                     In Analysis                        4
                                     In Solution                        3
                                     Open                              28
                                     Prioritized for Analysis           1
                                     Tech Execution                     7
Equity Derivatives    BOW/Project    In Analysis                        2
                                     In Solution                        1
                                     Open                               4
                                     Prioritized for Development        1
                                     Tech Execution                     1

1 answer

Answer 1 · posted 2024-06-17 13:28:29

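Using unstack produces a column for every unique value in the index level being unstacked, so the number of columns will be greater than or equal to the number of unique values in that level. Any level value that does not exist for one or more of the other levels is filled with np.nan, unless you specify otherwise with the fill_value parameter.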

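Using stack reshapes the column levels by appending them to the index levels. To save space, stack drops np.nan rows by default unless you pass dropna=False. (In pandas 2.1 and later, stack(future_stack=True) keeps those rows and does not accept dropna.)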
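To make the intermediate step concrete, here is a minimal sketch of what unstack alone does to a missing combination. The two-level toy Series and its values are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical toy data: group "B" has no "Closed" entry
s = pd.Series(
    [2, 1, 4],
    index=pd.MultiIndex.from_tuples(
        [("A", "Closed"), ("A", "Open"), ("B", "Open")],
        names=["group", "status"],
    ),
    name="count",
)

# One column per unique status; the missing (B, Closed) cell becomes NaN
wide = s.unstack("status")

# Same reshape, but the missing cell is filled with 0 instead
wide_zero = s.unstack("status", fill_value=0)

print(wide)
print(wide_zero)
```

Stacking either wide table back on status is what brings the missing row into the long result.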


df.unstack('status').stack('status', dropna=False)
# equivalent code if `status` is in last level
# df.unstack().stack(dropna=False)

                                                            count
lineOfBusiness     stream      status                            
Commodities        BOW/Project Closed                         2.0
                               In Analysis                    4.0
                               In Solution                    3.0
                               Open                          28.0
                               Prioritized for Analysis       1.0
                               Prioritized for Development    NaN
                               Tech Execution                 7.0
Equity Derivatives BOW/Project Closed                         NaN
                               In Analysis                    2.0
                               In Solution                    1.0
                               Open                           4.0
                               Prioritized for Analysis       NaN
                               Prioritized for Development    1.0
                               Tech Execution                 1.0

df.unstack('status', fill_value=0).stack('status')
# equivalent code if `status` is in last level
# df.unstack(fill_value=0).stack()
                                                            count
lineOfBusiness     stream      status                            
Commodities        BOW/Project Closed                           2
                               In Analysis                      4
                               In Solution                      3
                               Open                            28
                               Prioritized for Analysis         1
                               Prioritized for Development      0
                               Tech Execution                   7
Equity Derivatives BOW/Project Closed                           0
                               In Analysis                      2
                               In Solution                      1
                               Open                             4
                               Prioritized for Analysis         0
                               Prioritized for Development      1
                               Tech Execution                   1
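As an alternative that avoids the stack/unstack round trip, the missing rows can also be created by reindexing against the full product of the index levels. This is a different technique from the one in the answer, sketched here on a small made-up frame shaped like the question's result. Note that MultiIndex.from_product builds every lineOfBusiness × stream × status combination, so with more than one stream it may add rows that the unstack/stack approach would not.

```python
import pandas as pd

# Hypothetical toy counts shaped like the grouped result in the question
index = pd.MultiIndex.from_tuples(
    [
        ("Commodities", "BOW/Project", "Closed"),
        ("Commodities", "BOW/Project", "Open"),
        ("Equity Derivatives", "BOW/Project", "Open"),
    ],
    names=["lineOfBusiness", "stream", "status"],
)
df = pd.DataFrame({"count": [2, 28, 4]}, index=index)

# Every combination of the values present in each index level
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Rows that did not exist before get a count of 0
filled = df.reindex(full_index, fill_value=0)
print(filled)
```

Omitting fill_value leaves the new rows as NaN instead, which matches the first variant of the answer.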

Setup code
To make it easier for others to try

import pandas as pd
from io import StringIO

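# Sample data; fields are separated by two or more spaces.
# Leading and trailing whitespace is removed so the regex separator
# does not produce empty fields.
txt = """lineOfBusiness      stream       status                       count
Commodities         BOW/Project  Closed                       2
Commodities         BOW/Project  In Analysis                  4
Commodities         BOW/Project  In Solution                  3
Commodities         BOW/Project  Open                         28
Commodities         BOW/Project  Prioritized for Analysis     1
Commodities         BOW/Project  Tech Execution               7
Equity Derivatives  BOW/Project  In Analysis                  2
Equity Derivatives  BOW/Project  In Solution                  1
Equity Derivatives  BOW/Project  Open                         4
Equity Derivatives  BOW/Project  Prioritized for Development  1
Equity Derivatives  BOW/Project  Tech Execution               1
"""

# Read from the string rather than the clipboard; the first three
# columns become the MultiIndex (lineOfBusiness, stream, status).
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python', index_col=[0, 1, 2])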
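If the counts come straight from a groupby on raw records, as the title suggests, the same fill can probably be chained onto the aggregation. The sketch below assumes raw rows with one record per item; the `raw` frame and its values are hypothetical and only illustrate the shape.

```python
import pandas as pd

# Hypothetical raw records, one row per item
raw = pd.DataFrame({
    "lineOfBusiness": ["Commodities", "Commodities", "Commodities", "Equity Derivatives"],
    "stream": ["BOW/Project"] * 4,
    "status": ["Open", "Open", "Closed", "Open"],
})

counts = (
    raw.groupby(["lineOfBusiness", "stream", "status"])
       .size()                           # count rows per group
       .unstack("status", fill_value=0)  # missing statuses become 0
       .stack()                          # back to one row per status
       .rename("count")
)
print(counts)
```

Because the gaps are filled with 0 before stacking, there are no NaN rows for stack to drop, so the result does not depend on the dropna setting.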
