Filtering a multi-level groupby by its result

Person  WL   File                   Threshold
AEM     440  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
        452  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
        464  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
        476  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
        488  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
AGC     440  AGC-2018-05-25_12_440  0            6
                                    1            6
             AGC-2018-05-25_50_440  0            6
                                    1            6
        452  AGC-2018-05-25_12_440  0            6
                                    1            6
             AGC-2018-05-25_50_440  0            6
                                    1            6
        464  AGC-2018-05-25_12_440  0            6
                                    1            6
..
TRW     620  TRW-2017-04-08_60_572  0            6
                                    1            6
        632  TRW-2017-04-25_60_584  0            6
                                    1            6
        644  TRW-2017-04-08_60_572  0            6
                                    1            6
        656  TRW-2017-04-25_60_584  0            5
                                    1            6
             TRW-2017-04-25_60_656  0            6
                                    1            6

    File                   Threshold  StepSize  RevNum   WL  RevPos  BkgdLt  Person        Date  AbRevPos  ExpNum  EarlyEnd
48  AEM-2018-05-23_11_440          1      1.50     7.0  464   -2.07      11     AEM  2018-05-23      2.07     Two       NaN
49  AEM-2018-05-23_11_440          1      0.82     8.0  464   -3.57      11     AEM  2018-05-23      3.57     Two       NaN
50  AEM-2018-05-23_11_440          1      1.50     7.0  488   -2.58      11     AEM  2018-05-23      2.58     Two       NaN
54  AEM-2018-05-23_11_440          1      0.82     8.0  488   -5.58      11     AEM  2018-05-23      5.58     Two       NaN
55  AEM-2018-05-23_11_440          1      1.50     7.0  440   -3.00      11     AEM  2018-05-23      3.00     Two       NaN

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3286 entries, 48 to 7839
Data columns (total 12 columns):
File         3286 non-null object
Threshold    3286 non-null int64
StepSize     3286 non-null float64
RevNum       3286 non-null float64
WL           3286 non-null int64
RevPos       3286 non-null float64
BkgdLt       3286 non-null int32
Person       3286 non-null object
Date         3286 non-null datetime64[ns]
AbRevPos     3286 non-null float64
ExpNum       3286 non-null object
EarlyEnd     0 non-null float64
dtypes: datetime64[ns](1), float64(5), int32(1), int64(2), object(3)
memory usage: 320.9+ KB

Person  WL   File                   Threshold
AEM     440  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
        452  AEM-2018-05-23_11_440  0            6
                                    1            6
             AEM-2018-05-23_50_440  0            6
                                    1            6
        464  AEM-2018-05-23_11_440  0            6
                                    1            6
Name: RevNum, dtype: int64
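The output above is a Series with a four-level MultiIndex (Person, WL, File, Threshold) whose values are counts of RevNum. A minimal sketch of how such a Series arises, using a tiny hypothetical frame with only the four grouping columns plus RevNum (the values are made up, not taken from the real data):

```python
import pandas as pd

# Hypothetical miniature of df_new: one file, two thresholds, six reversals each
rows = [('AEM', 440, 'AEM-2018-05-23_11_440', t, float(r))
        for t in (0, 1) for r in range(1, 7)]
df_new = pd.DataFrame(rows, columns=['Person', 'WL', 'File', 'Threshold', 'RevNum'])

# Counting RevNum per group yields a Series named 'RevNum'
counts = df_new.groupby(['Person', 'WL', 'File', 'Threshold'])['RevNum'].count()
print(counts)              # one row per Threshold, each with a count of 6
print(counts.index.names)  # the four grouping levels
```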

2 answers

User

Answer 1 · edited 2024-05-13 18:08:40

I've put together a quick, complete, and verifiable example for you:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'Letter':['a', 'b']*2, 'Number':[1]*3+[2]})

In [3]: df
Out[3]: 
  Letter  Number
0      a       1
1      b       1
2      a       1
3      b       2

In [4]: df.groupby(['Letter', 'Number'])['Number'].count()
Out[4]: 
Letter  Number
a       1         2
b       1         1
        2         1
Name: Number, dtype: int64

In [5]: grouped_counts = df.groupby(['Letter', 'Number'])['Number'].count()

In [6]: type(grouped_counts)
Out[6]: pandas.core.series.Series

As you can see, the maximum count is 2, so let's select all the groups whose count is less than 2:

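A minimal sketch of that filter, assuming plain boolean indexing on the grouped_counts Series from In [5] (the variable name small_groups is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Letter': ['a', 'b'] * 2, 'Number': [1] * 3 + [2]})
grouped_counts = df.groupby(['Letter', 'Number'])['Number'].count()

# A boolean mask on the Series keeps only the groups whose count is below 2
small_groups = grouped_counts[grouped_counts < 2]
print(small_groups)  # the two 'b' groups, each with a count of 1
```

To drop those groups instead, invert the condition: grouped_counts[grouped_counts >= 2].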

User

Answer 2 · edited 2024-05-13 18:08:40

I figured it out! It was a super simple syntax issue: going from a Series to a DataFrame!

df_new.groupby('Person')['WL'].count()    # produces a pandas Series
df_new.groupby('Person')[['WL']].count()  # produces a pandas DataFrame
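A quick check of the difference between the two lines above, on a tiny hypothetical frame that only reuses the Person and WL column names:

```python
import pandas as pd

# Hypothetical toy data reusing the Person/WL column names
df_new = pd.DataFrame({'Person': ['AEM', 'AEM', 'AGC'], 'WL': [440, 452, 440]})

as_series = df_new.groupby('Person')['WL'].count()   # single brackets
as_frame = df_new.groupby('Person')[['WL']].count()  # double brackets

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.columns.tolist())  # ['WL']
```

The counts are identical; only the container differs, with the DataFrame keeping WL as a named column.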

Found here: https://shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/

My code now looks like this, and I can return only the entries whose reversal count (RevNum) is not 6.

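A minimal sketch of that filter, assuming the same four grouping columns. The rows are hypothetical, modeled loosely on the TRW / 656 group in the question whose threshold-0 count is 5, and the mask != 6 is my reading of "RevNum is not 6":

```python
import pandas as pd

# Hypothetical toy data: threshold 0 has only 5 reversals, threshold 1 has 6
rows = ([('TRW', 656, 'TRW-2017-04-25_60_584', 0, float(i)) for i in range(5)]
        + [('TRW', 656, 'TRW-2017-04-25_60_584', 1, float(i)) for i in range(6)])
df_new = pd.DataFrame(rows, columns=['Person', 'WL', 'File', 'Threshold', 'RevNum'])

keys = ['Person', 'WL', 'File', 'Threshold']
# Double brackets -> a DataFrame with a single 'RevNum' column of counts
df_counts_grouped = df_new.groupby(keys)[['RevNum']].count()

# Keep only the groups whose reversal count is not 6
incomplete = df_counts_grouped[df_counts_grouped['RevNum'] != 6]
print(incomplete)
```

With single brackets the result is a Series, which can be filtered the same way with counts[counts != 6].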

The simple change was going from single brackets around 'RevNum':

df_counts_grouped = df_new.groupby([df_new['Person'], df_new['WL'], df_new['File'], df_new['Threshold']])['RevNum'].count()

to double brackets around the column label 'RevNum':

df_counts_grouped = df_new.groupby([df_new['Person'], df_new['WL'], df_new['File'], df_new['Threshold']])[['RevNum']].count()
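As a side note, the call above passes each grouping column as a Series; passing the column labels is shorter and should group identically. A small sketch with hypothetical data that checks this:

```python
import pandas as pd

# Hypothetical toy data with the same column names as df_new
df_new = pd.DataFrame({
    'Person': ['AEM'] * 3 + ['TRW'] * 2,
    'WL': [440] * 3 + [656] * 2,
    'File': ['AEM-2018-05-23_11_440'] * 3 + ['TRW-2017-04-25_60_584'] * 2,
    'Threshold': [0, 0, 1, 0, 1],
    'RevNum': [1.0, 2.0, 1.0, 1.0, 1.0],
})

# Grouping by the Series objects, as in the line above
by_series = df_new.groupby([df_new['Person'], df_new['WL'],
                            df_new['File'], df_new['Threshold']])[['RevNum']].count()
# Grouping by column labels instead
by_label = df_new.groupby(['Person', 'WL', 'File', 'Threshold'])[['RevNum']].count()

print(by_series.equals(by_label))  # True
```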

fixed everything!
