How do I count the unique values across all columns and show them, with their names, in a separate DataFrame?

| 1st Most Common Value | 2nd Most Common Value | 3rd Most Common Value | 4th Most Common Value | 5th Most Common Value |
|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Grocery Store | Pub | Coffee Shop | Clothing Store | Park |
| Pub | Grocery Store | Clothing Store | Park | Coffee Shop |
| Hotel | Theatre | Bookstore | Plaza | Park |
| Supermarket | Coffee Shop | Pub | Park | Cafe |
| Pub | Supermarket | Coffee Shop | Cafe | Park |

| Venues | Count |
|----------------|-------|
| Bookstore | 1 |
| Cafe | 2 |
| Coffee Shop | 4 |
| Clothing Store | 2 |
| Grocery Store | 2 |
| Hotel | 1 |
| Park | 5 |
| Plaza | 1 |
| Pub | 4 |
| Supermarket | 2 |
| Theatre | 1 |

3 Answers

User

Answer 1 · Edited 2024-06-02 07:25:32

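Edit: I got ahead of myself in my original answer (and thanks to the OP for adding the edit and expected output). What you want is covered in this post, and I think the simplest answer is: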

new_df = pd.DataFrame(df0.stack().value_counts())

If you don't care which column a value comes from and only need the counts, use value_counts() as described in this post (as @Celius Stingher said in the comments).
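A minimal sketch of the one-liner applied to the question's table. The short column names (`1st` to `5th`) and the `Venues`/`Count` labels are assumptions chosen to match the expected output; `rename_axis` and `reset_index(name=...)` are only there to produce the two-column layout.

```python
import pandas as pd

# Data copied from the question's table; the short column names are an assumption.
df0 = pd.DataFrame({
    '1st': ['Grocery Store', 'Pub', 'Hotel', 'Supermarket', 'Pub'],
    '2nd': ['Pub', 'Grocery Store', 'Theatre', 'Coffee Shop', 'Supermarket'],
    '3rd': ['Coffee Shop', 'Clothing Store', 'Bookstore', 'Pub', 'Coffee Shop'],
    '4th': ['Clothing Store', 'Park', 'Plaza', 'Park', 'Cafe'],
    '5th': ['Park', 'Coffee Shop', 'Park', 'Cafe', 'Park'],
})

# stack() turns the 5x5 table into one long Series of 25 values;
# value_counts() then counts each distinct value.
counts = df0.stack().value_counts()

# Name the index and the count column to get the two-column layout.
new_df = counts.rename_axis('Venues').reset_index(name='Count')
print(new_df)
```

Rows come back ordered by count, largest first; the order among ties may vary.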

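If you do want to report the frequency of each value per column, you can call value_counts() on each column, but the results may have uneven entries (to get back to a DataFrame, you would need some kind of join).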
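The per-column route with a join could look roughly like the sketch below. The tiny frame and its values are hypothetical, and `pd.concat` with `axis=1` stands in for "some kind of join" (it performs an outer join on the index).

```python
import pandas as pd

# Small hypothetical frame; values and column names are for illustration only.
df = pd.DataFrame({
    '1st': ['Pub', 'Hotel', 'Pub'],
    '2nd': ['Park', 'Pub', 'Cafe'],
})

# One value_counts() Series per column. Their indexes differ,
# because not every value appears in every column.
per_column = [df[col].value_counts() for col in df.columns]

# concat along axis=1 does an outer join on the index; keys= labels the columns.
joined = pd.concat(per_column, axis=1, keys=list(df.columns))

# Values absent from a column come out as NaN; treat them as zero counts.
joined = joined.fillna(0).astype(int)
print(joined)
```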

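Instead, I wrote a small function that counts how often each value occurs in each column of a df and returns the result as a new DataFrame: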

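import pandas as pd
import numpy as np

def counted_entries(df, array):
    # One row per value in `array`, one column per column of df
    output = pd.DataFrame(columns=df.columns, index=array)
    for i in array:
        # (df == i) is a boolean frame; summing it counts the matches per column
        output.loc[i] = (df == i).sum()
    return output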

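Here it is applied to a df filled with random animal names. You only need to pass in the unique entries of the df, obtained by taking the set of its values: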

columns = ['Column ' + str(i+1) for i in range(10)]
index = ['Row ' + str(i+1) for i in range(5)]

df = pd.DataFrame(np.random.choice(['pig','cow','sheep','horse','dog'],size=(5,10)), columns=columns, index=index)

unique_vals = list(set(df.stack())) #this is all the possible entries in the df

df2 = counted_entries(df, unique_vals)

The df before:

      Column 1 Column 2 Column 3 Column 4  ... Column 7 Column 8 Column 9 Column 10
Row 1      pig      pig      cow      cow  ...      cow      pig      dog       pig
Row 2    sheep      cow      pig    sheep  ...      dog      pig      pig       cow
Row 3      cow      cow      cow    sheep  ...    horse      dog    sheep     sheep
Row 4    sheep      cow    sheep      cow  ...      cow    horse      pig       pig
Row 5      dog      pig    sheep    sheep  ...    sheep    sheep    horse     horse

Output of counted_entries():

       Column 1  Column 2  Column 3  ...  Column 8  Column 9  Column 10
pig           1         2         1  ...         2         2          2
horse         0         0         0  ...         1         1          1
sheep         2         0         2  ...         1         1          1
dog           1         0         0  ...         1         1          0
cow           1         3         2  ...         0         0          1
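For comparison, the same per-column table can probably be built without an explicit loop. The sketch below is an alternative, not the function above: it reshapes the frame with `melt()` and counts with `pd.crosstab`. The seed and the four-column frame are assumptions made so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Reproducible random frame in the spirit of the example above; the seed is an assumption.
rng = np.random.default_rng(0)
animals = ['pig', 'cow', 'sheep', 'horse', 'dog']
df = pd.DataFrame(rng.choice(animals, size=(5, 4)),
                  columns=['Column ' + str(i + 1) for i in range(4)])

# melt() gives one row per cell, with the column label in 'variable'
# and the cell content in 'value' (pandas' default names).
long = df.melt()

# crosstab counts how often each value occurs under each column label.
table = pd.crosstab(long['value'], long['variable'])
print(table)
```

Only values that actually occur in the frame appear as rows, which matches passing `set(df.stack())` to `counted_entries()`.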

User

Answer 2 · Edited 2024-06-02 07:25:32

Thanks for the edit. Perhaps this is what you want: apply value_counts to the whole DataFrame, then aggregate the output:

df0 = pd.DataFrame({'1st':['Grocery','Pub','Hotel','Supermarket','Pub'],
                    '2nd':['Pub','Grocery','Theatre','Coffee','Supermarket'],
                    '3rd':['Coffee','Clothing','Supermarket','Pub','Coffee'],
                    '4th':['Clothing','Park','Plaza','Park','Cafe'],
                    '5th':['Park','Coffee','Park','Cafe','Park']})

df1 = df0.apply(pd.Series.value_counts)
df1['Count'] = df1.sum(axis=1)
df1 = df1.reset_index().rename(columns={'index':'Venues'}).drop(columns=list(df0))
print(df1)

Output (rows shown sorted by Count, descending):

        Venues  Count
5         Park    5.0
2       Coffee    4.0
7          Pub    4.0
8  Supermarket    3.0
0         Cafe    2.0
1     Clothing    2.0
3      Grocery    2.0
4        Hotel    1.0
6        Plaza    1.0
9      Theatre    1.0
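The counts above print as floats (5.0) because `apply(pd.Series.value_counts)` leaves NaN wherever a value is missing from a column, which forces a float dtype. A possible variant, assuming integer counts and the descending order shown in the output are wanted:

```python
import pandas as pd

df0 = pd.DataFrame({'1st': ['Grocery', 'Pub', 'Hotel', 'Supermarket', 'Pub'],
                    '2nd': ['Pub', 'Grocery', 'Theatre', 'Coffee', 'Supermarket'],
                    '3rd': ['Coffee', 'Clothing', 'Supermarket', 'Pub', 'Coffee'],
                    '4th': ['Clothing', 'Park', 'Plaza', 'Park', 'Cafe'],
                    '5th': ['Park', 'Coffee', 'Park', 'Cafe', 'Park']})

# Per-column counts; NaN marks values that never occur in a column.
per_column = df0.apply(pd.Series.value_counts)

# sum(axis=1) skips NaN, so the totals are whole numbers stored as floats.
totals = per_column.sum(axis=1).astype(int)

df1 = totals.rename_axis('Venues').reset_index(name='Count')
# Sort by count (descending), breaking ties alphabetically.
df1 = df1.sort_values(['Count', 'Venues'], ascending=[False, True], ignore_index=True)
print(df1)
```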

User

Answer 3 · Edited 2024-06-02 07:25:32

You can also do it this way:

import pandas as pd

df = pd.read_csv('test.csv', sep=',')
# Flatten the table into a single list of values
list_of_list = df.values.tolist()
t_list = sum(list_of_list, [])
df = pd.DataFrame(t_list)
df.columns = ['Columns']
# Count the rows in each group; reset_index(name=...) names the count column
df = df.groupby(by=['Columns']).size().reset_index(name='Count')
print(df)

           Columns  Count
0        Bookstore      1
1             Cafe      2
2   Clothing Store      2
3      Coffee Shop      4
4    Grocery Store      2
5            Hotel      1
6             Park      5
7            Plaza      1
8              Pub      4
9      Supermarket      2
10         Theatre      1
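The flatten-then-count idea can also be sketched with the standard library. This is an alternative under assumptions, not the answer's code: the small inline frame is a hypothetical stand-in for `test.csv`, and `itertools.chain` plus `collections.Counter` replace `sum(list_of_list, [])` and `groupby`.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Stand-in for the CSV contents; a hypothetical 2x3 table for illustration.
df = pd.DataFrame({'a': ['Pub', 'Park'], 'b': ['Park', 'Cafe'], 'c': ['Pub', 'Park']})

# chain.from_iterable flattens the list of rows in linear time,
# whereas sum(list_of_list, []) copies the growing list on every step.
flat = chain.from_iterable(df.values.tolist())

# Counter tallies each value; sorting the items gives alphabetical order.
counts = Counter(flat)
result = pd.DataFrame(sorted(counts.items()), columns=['Columns', 'Count'])
print(result)
```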
