Aggregating by date and condition gives column totals that do not match

Posted 2024-06-16 10:12:21


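I have a pandas DataFrame with two columns: date and sentiment. I need to group it by date and count how many rows of each sentiment type (positive, neutral, negative) fall on each day.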

Original DataFrame:

Original df

After running my code, the total count per day does not match the sum of the individual sentiment counts:

df_diario = df_com_sentiment.groupby( df_com_sentiment.date.dt.floor('d')).size().reset_index(name='n_tweets')
df_diario['TB_POSITIVE'] = df_com_sentiment.groupby( df_com_sentiment[df_com_sentiment['TextBlob_sentiment_type']=='POSITIVE'].date.dt.floor('d')).size().reset_index(name='TB_POSITIVE').TB_POSITIVE.astype(int)
df_diario['TB_NEGATIVE'] = df_com_sentiment.groupby( df_com_sentiment[df_com_sentiment['TextBlob_sentiment_type']=='NEGATIVE'].date.dt.floor('d')).size().reset_index(name='TB_NEGATIVE').TB_NEGATIVE.astype(int)
df_diario['TB_NEUTRAL'] = df_com_sentiment.groupby( df_com_sentiment[df_com_sentiment['TextBlob_sentiment_type']=='NEUTRAL'].date.dt.floor('d')).size().reset_index(name='TB_NEUTRAL').TB_NEUTRAL.astype(int)

Number of sentiment types per day, as columns:

Number of sentiment types by day columns

If you look at the date 2020-02-15, the total is 12, but positive + negative + neutral adds up to 14.


1 answer
Answer 1 · posted 2024-06-16 10:12:21

Are you looking for something like this:

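import pandas as pd
import numpy as np

# 104 timestamps at 3-hour steps (8 per day, 13 days starting 2021-01-01).
# periods=104 keeps the length equal to the number of random labels below.
df = pd.DataFrame({'date': pd.date_range(start='2021-01-01', periods=104, freq='3h'),
                   'sentiment': np.random.choice(['POSITIVE', 'NEGATIVE', 'NEUTRAL'], 104)})

# Count the rows for each (calendar day, sentiment) pair.
df1 = df.groupby([df.date.dt.date, df.sentiment])['sentiment'].count()

# Move the sentiment level into columns; pairs that never occur become NaN.
df1 = df1.unstack()
print(df1)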

The output will look like this (the sentiments are random, so the exact counts vary from run to run):

sentiment   NEGATIVE  NEUTRAL  POSITIVE
date                                   
2021-01-01       4.0      3.0       1.0
2021-01-02       2.0      2.0       4.0
2021-01-03       4.0      3.0       1.0
2021-01-04       3.0      2.0       3.0
2021-01-05       4.0      1.0       3.0
2021-01-06       1.0      3.0       4.0
2021-01-07       3.0      3.0       2.0
2021-01-08       4.0      3.0       1.0
2021-01-09       4.0      1.0       3.0
2021-01-10       2.0      2.0       4.0
2021-01-11       5.0      3.0       NaN
2021-01-12       3.0      2.0       3.0
2021-01-13       1.0      3.0       4.0
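
The NaN in the 2021-01-11 row means that no POSITIVE rows occurred on that day. If zeros and a per-day total are preferred, the following is a minimal sketch using pd.crosstab. The small hand-made sample and the n_tweets column name (borrowed from the question) are assumptions for illustration only:

```python
import pandas as pd

# Small hand-made sample; column names follow the example above.
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2021-01-10 00:00', '2021-01-10 03:00', '2021-01-10 06:00',
        '2021-01-11 00:00', '2021-01-11 03:00',
    ]),
    'sentiment': ['POSITIVE', 'NEGATIVE', 'NEUTRAL', 'NEGATIVE', 'NEUTRAL'],
})

# crosstab counts each (day, sentiment) pair; pairs that never occur get 0.
counts = pd.crosstab(df['date'].dt.floor('D'), df['sentiment'])

# Row-wise total, computed from the sentiment columns themselves,
# so it cannot disagree with their sum.
counts['n_tweets'] = counts.sum(axis=1)
print(counts)
```

Because the total is derived from the same table, the per-day total and the sum of the sentiment columns stay consistent by construction.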

The input DataFrame for the example above was:

                   date sentiment
0   2021-01-01 00:00:00   NEUTRAL
1   2021-01-01 03:00:00  NEGATIVE
2   2021-01-01 06:00:00  NEGATIVE
3   2021-01-01 09:00:00  NEGATIVE
4   2021-01-01 12:00:00  NEGATIVE
5   2021-01-01 15:00:00   NEUTRAL
6   2021-01-01 18:00:00   NEUTRAL
7   2021-01-01 21:00:00  POSITIVE
8   2021-01-02 00:00:00  POSITIVE
9   2021-01-02 03:00:00  POSITIVE
10  2021-01-02 06:00:00  POSITIVE
11  2021-01-02 09:00:00   NEUTRAL
12  2021-01-02 12:00:00  NEGATIVE
13  2021-01-02 15:00:00  POSITIVE
14  2021-01-02 18:00:00  NEGATIVE
15  2021-01-02 21:00:00   NEUTRAL
16  2021-01-03 00:00:00   NEUTRAL
17  2021-01-03 03:00:00  NEGATIVE
18  2021-01-03 06:00:00  NEGATIVE
19  2021-01-03 09:00:00   NEUTRAL
20  2021-01-03 12:00:00  POSITIVE
21  2021-01-03 15:00:00  NEGATIVE
22  2021-01-03 18:00:00  NEGATIVE
23  2021-01-03 21:00:00   NEUTRAL
24  2021-01-04 00:00:00  NEGATIVE
25  2021-01-04 03:00:00  POSITIVE
26  2021-01-04 06:00:00  NEGATIVE
27  2021-01-04 09:00:00  POSITIVE
28  2021-01-04 12:00:00   NEUTRAL
29  2021-01-04 15:00:00   NEUTRAL
30  2021-01-04 18:00:00  NEGATIVE
31  2021-01-04 21:00:00  POSITIVE
32  2021-01-05 00:00:00  NEGATIVE
33  2021-01-05 03:00:00  NEGATIVE
34  2021-01-05 06:00:00  NEGATIVE
35  2021-01-05 09:00:00   NEUTRAL
36  2021-01-05 12:00:00  POSITIVE
37  2021-01-05 15:00:00  POSITIVE
38  2021-01-05 18:00:00  NEGATIVE
39  2021-01-05 21:00:00  POSITIVE
40  2021-01-06 00:00:00  POSITIVE
41  2021-01-06 03:00:00  POSITIVE
42  2021-01-06 06:00:00   NEUTRAL
43  2021-01-06 09:00:00  POSITIVE
44  2021-01-06 12:00:00   NEUTRAL
45  2021-01-06 15:00:00   NEUTRAL
46  2021-01-06 18:00:00  NEGATIVE
47  2021-01-06 21:00:00  POSITIVE
48  2021-01-07 00:00:00  POSITIVE
49  2021-01-07 03:00:00   NEUTRAL
50  2021-01-07 06:00:00  NEGATIVE
51  2021-01-07 09:00:00  NEGATIVE
52  2021-01-07 12:00:00  NEGATIVE
53  2021-01-07 15:00:00   NEUTRAL
54  2021-01-07 18:00:00  POSITIVE
55  2021-01-07 21:00:00   NEUTRAL
56  2021-01-08 00:00:00  NEGATIVE
57  2021-01-08 03:00:00  NEGATIVE
58  2021-01-08 06:00:00   NEUTRAL
59  2021-01-08 09:00:00   NEUTRAL
60  2021-01-08 12:00:00  POSITIVE
61  2021-01-08 15:00:00  NEGATIVE
62  2021-01-08 18:00:00   NEUTRAL
63  2021-01-08 21:00:00  NEGATIVE
64  2021-01-09 00:00:00  NEGATIVE
65  2021-01-09 03:00:00  POSITIVE
66  2021-01-09 06:00:00  NEGATIVE
67  2021-01-09 09:00:00  POSITIVE
68  2021-01-09 12:00:00  NEGATIVE
69  2021-01-09 15:00:00  NEGATIVE
70  2021-01-09 18:00:00   NEUTRAL
71  2021-01-09 21:00:00  POSITIVE
72  2021-01-10 00:00:00   NEUTRAL
73  2021-01-10 03:00:00  POSITIVE
74  2021-01-10 06:00:00  POSITIVE
75  2021-01-10 09:00:00  NEGATIVE
76  2021-01-10 12:00:00  POSITIVE
77  2021-01-10 15:00:00  NEGATIVE
78  2021-01-10 18:00:00   NEUTRAL
79  2021-01-10 21:00:00  POSITIVE
80  2021-01-11 00:00:00  NEGATIVE
81  2021-01-11 03:00:00  NEGATIVE
82  2021-01-11 06:00:00   NEUTRAL
83  2021-01-11 09:00:00   NEUTRAL
84  2021-01-11 12:00:00   NEUTRAL
85  2021-01-11 15:00:00  NEGATIVE
86  2021-01-11 18:00:00  NEGATIVE
87  2021-01-11 21:00:00  NEGATIVE
88  2021-01-12 00:00:00  NEGATIVE
89  2021-01-12 03:00:00   NEUTRAL
90  2021-01-12 06:00:00  NEGATIVE
91  2021-01-12 09:00:00  POSITIVE
92  2021-01-12 12:00:00  POSITIVE
93  2021-01-12 15:00:00  NEGATIVE
94  2021-01-12 18:00:00  POSITIVE
95  2021-01-12 21:00:00   NEUTRAL
96  2021-01-13 00:00:00   NEUTRAL
97  2021-01-13 03:00:00  POSITIVE
98  2021-01-13 06:00:00  POSITIVE
99  2021-01-13 09:00:00  NEGATIVE
100 2021-01-13 12:00:00   NEUTRAL
101 2021-01-13 15:00:00  POSITIVE
102 2021-01-13 18:00:00   NEUTRAL
103 2021-01-13 21:00:00  POSITIVE
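
As for why the totals in the question disagree: one likely cause (an assumption, since the original data is not shown) is that each per-sentiment result is passed through reset_index() before being assigned, so the values are matched by row position 0, 1, 2, ... and not by date. If a day has no rows of one sentiment, every later count shifts up by one row. A minimal sketch with made-up data, reusing the question's column names:

```python
import pandas as pd

# Made-up data: 2020-02-15 has no POSITIVE rows.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-14 09:00', '2020-02-14 10:00',
                            '2020-02-15 09:00',
                            '2020-02-16 09:00', '2020-02-16 10:00']),
    'TextBlob_sentiment_type': ['POSITIVE', 'NEGATIVE', 'NEGATIVE',
                                'POSITIVE', 'POSITIVE'],
})

# Total rows per day, indexed by day.
total = df.groupby(df['date'].dt.floor('D')).size().rename('n_tweets')

# POSITIVE rows per day: only two days appear in the result.
pos_rows = df[df['TextBlob_sentiment_type'] == 'POSITIVE']
pos = pos_rows.groupby(pos_rows['date'].dt.floor('D')).size()

# Positional assignment, as in the question: labels 0, 1 are matched
# against rows 0, 1, 2, so the count for 2020-02-16 lands on 2020-02-15.
wrong = total.reset_index()
wrong['TB_POSITIVE'] = pos.reset_index(name='TB_POSITIVE')['TB_POSITIVE']

# Date-aligned assignment: keep the day as the index and fill gaps with 0.
right = total.to_frame()
right['TB_POSITIVE'] = pos.reindex(right.index, fill_value=0)

print(wrong)
print(right)
```

In the positional version the 2020-02-15 row reports more POSITIVE rows than its total, which is the same kind of mismatch described in the question. Keeping the date as the index until all columns are combined avoids it.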
