How to count "NEUTRAL", "POSITIVE" and "NEGATIVE" for each date with Pandas?

Posted 2024-05-16 09:23:13


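How can I count the "NEUTRAL", "POSITIVE" and "NEGATIVE" values for each date? I first group by date and then call value_counts(), but I don't get the expected result. Which part of my code should I fix?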

df01['date'] = pd.to_datetime(df01['date'], dayfirst=True)
df01_sentiment = df01.groupby("date")["new_sentiment"].value_counts()

Current output

date                 new_sentiment
2020-12-01 08:18:29  NEUTRAL          1
2020-12-01 14:53:17  NEUTRAL          1
2020-12-01 17:29:13  NEUTRAL          1
2020-12-02 17:00:01  NEUTRAL          1
2020-12-02 18:09:52  NEUTRAL          1
                                     ..
2020-12-30 22:19:22  NEUTRAL          2
2020-12-30 22:48:58  NEGATIVE         1
2020-12-31 01:00:00  POSITIVE         1
2020-12-31 03:27:44  NEUTRAL          1
2020-12-31 06:38:52  NEUTRAL          1

Expected output

date       new_sentiment
2020-12-01   NEUTRAL          3
2020-12-02   NEUTRAL          2
                              ..
2020-12-30  NEUTRAL           2
2020-12-31  POSITIVE          1
2020-12-31  NEUTRAL           2

3 answers

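In your current implementation you group by the full timestamp (date plus time of day), which is why the counts in your output are per timestamp rather than per date.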

Take the first record as an example:

date                 new_sentiment
2020-12-01 08:18:29  NEUTRAL          1

There is only one entry with new_sentiment = NEUTRAL at 2020-12-01 08:18:29, which is why its count is 1.

To get the expected counts, convert the values in the date column from YYYY-MM-DD HH:MM:SS to YYYY-MM-DD and then run the same code.
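A minimal sketch of this fix, using a few hypothetical rows modelled on the question's output. It uses Series.dt.normalize(), which sets each timestamp's time to midnight while keeping the datetime64 dtype; this is one way to do the YYYY-MM-DD conversion, not necessarily the exact one the answerer had in mind.

```python
import pandas as pd

# Hypothetical sample rows modelled on the question's output
df01 = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-12-01 08:18:29", "2020-12-01 14:53:17", "2020-12-01 17:29:13",
        "2020-12-02 17:00:01", "2020-12-02 18:09:52",
    ]),
    "new_sentiment": ["NEUTRAL", "NEUTRAL", "NEUTRAL", "NEUTRAL", "POSITIVE"],
})

# Grouping by the full timestamp: every timestamp is its own group, so each count is 1
per_timestamp = df01.groupby("date")["new_sentiment"].value_counts()

# Truncate each timestamp to midnight (dtype stays datetime64), then group by day
per_day = df01.groupby(df01["date"].dt.normalize())["new_sentiment"].value_counts()
print(per_day)
```

With this data, per_day reports 3 NEUTRAL rows for 2020-12-01 and one NEUTRAL plus one POSITIVE row for 2020-12-02.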

First make sure the column is datetime64:

import pandas as pd

df = pd.DataFrame({
    "date":["2020-12-01 08:18:29", "2020-12-01 14:53:17", "2020-12-01 17:29:13", "2020-12-02 17:00:01", "2020-12-02 18:09:52"],
    "new_sentiment":["NEUTRAL", "NEUTRAL", "NEGATIVE", "NEUTRAL", "POSITIVE"],
    "unit":[1, 2, 1, 1, 1]
})

print(df.dtypes)

date                     object
new_sentiment            object
unit                      int64
dtype: object

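Convert the column to datetime64 and drop the time of day, so that all rows from the same day share one key:

df["date"] = pd.to_datetime(df["date"]).dt.normalize()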

print(df.dtypes)

date             datetime64[ns]
new_sentiment            object
unit                      int64
dtype: object

df

date    new_sentiment   unit
0   2020-12-01  NEUTRAL     1
1   2020-12-01  NEUTRAL     2
2   2020-12-01  NEGATIVE    1
3   2020-12-02  NEUTRAL     1
4   2020-12-02  POSITIVE    1

So, if you need to count new_sentiment per date:

df.groupby("date")["new_sentiment"].value_counts()

date        new_sentiment
2020-12-01  NEUTRAL          2
            NEGATIVE         1
2020-12-02  NEUTRAL          1
            POSITIVE         1
Name: new_sentiment, dtype: int64
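If one row per date with a separate column for each sentiment is easier to read, the counts can be reshaped with unstack. The sketch below reuses the five-row example with the time of day already stripped; pd.crosstab is shown as an equivalent alternative. In the wide layout, a day without a given sentiment gets 0 instead of being omitted.

```python
import pandas as pd

# Same five rows as above, with the time of day already removed
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-12-01", "2020-12-01",
                            "2020-12-02", "2020-12-02"]),
    "new_sentiment": ["NEUTRAL", "NEUTRAL", "NEGATIVE", "NEUTRAL", "POSITIVE"],
})

# Long result -> wide table: one row per date, one column per sentiment
wide = df.groupby("date")["new_sentiment"].value_counts().unstack(fill_value=0)

# Equivalent frequency table built directly from the two columns
ct = pd.crosstab(df["date"], df["new_sentiment"])
print(wide)
```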

On the other hand, if you need to sum the unit column per date and sentiment:

df.groupby(["date", "new_sentiment"])["unit"].sum()

date        new_sentiment
2020-12-01  NEGATIVE         1
            NEUTRAL          3
2020-12-02  NEUTRAL          1
            POSITIVE         1
Name: unit, dtype: int64
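The same wide layout also works for summing unit. Here is a sketch using DataFrame.pivot_table on the five-row example, with fill_value=0 for date/sentiment pairs that have no rows:

```python
import pandas as pd

# Same five rows as above, with the time of day already removed
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-12-01", "2020-12-01",
                            "2020-12-02", "2020-12-02"]),
    "new_sentiment": ["NEUTRAL", "NEUTRAL", "NEGATIVE", "NEUTRAL", "POSITIVE"],
    "unit": [1, 2, 1, 1, 1],
})

# Sum of unit for each (date, sentiment) pair, sentiments spread across columns
unit_totals = df.pivot_table(index="date", columns="new_sentiment",
                             values="unit", aggfunc="sum", fill_value=0)
print(unit_totals)
```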

Use Series.dt.date:

df01_sentiment = df01.groupby(df01["date"].dt.date)["new_sentiment"].value_counts()

Or:

df01['date'] = pd.to_datetime(df01['date'], dayfirst=True).dt.date
df01_sentiment = df01.groupby("date")["new_sentiment"].value_counts()
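The two snippets above leave different types behind. .dt.date produces Python datetime.date objects (object dtype), whereas .dt.normalize() keeps a datetime64 column. That matters for datetime-only operations such as resample. Below is a small check with hypothetical day-first strings, parsed with dayfirst=True as in the question:

```python
import pandas as pd

# Hypothetical day-first strings (DD/MM/YYYY), matching dayfirst=True in the question
raw = pd.Series(["01/12/2020 08:18:29", "01/12/2020 14:53:17", "02/12/2020 17:00:01"])
ts = pd.to_datetime(raw, dayfirst=True)

as_date = ts.dt.date        # Python datetime.date objects, not a datetime64 column
as_day = ts.dt.normalize()  # still datetime64, time of day set to 00:00:00

print(as_date.dtype, as_day.dtype)
```

Both versions can be used as a groupby key for per-day counts; only the normalized one can still be used with the .dt accessor or resample.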

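If you also need every day in the range, including days that do not appear in the original DataFrame (this requires date to still be a datetime64 column, not the .dt.date values from the previous snippet):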

df01_sentiment = df01.resample('d', on="date")["new_sentiment"].value_counts()
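The following alternative does not go through the resampler. It counts with groupby and then reindexes onto every calendar day, so days with no rows appear explicitly with zero counts. The sample data is hypothetical, with 2020-12-02 deliberately missing:

```python
import pandas as pd

# Hypothetical sample data: 2020-12-02 has no rows at all
df01 = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-12-01 08:18:29", "2020-12-01 14:53:17",
        "2020-12-03 22:48:58", "2020-12-03 23:10:00",
    ]),
    "new_sentiment": ["NEUTRAL", "NEUTRAL", "NEGATIVE", "POSITIVE"],
})

# Count rows per calendar day and sentiment, one column per sentiment
counts = (
    df01.groupby([df01["date"].dt.normalize(), "new_sentiment"])
        .size()
        .unstack(fill_value=0)
)

# Reindex onto every day between the first and last date so gaps show up as zeros
all_days = pd.date_range(counts.index.min(), counts.index.max(), freq="D")
counts = counts.reindex(all_days, fill_value=0).rename_axis("date")
print(counts)
```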
