Iterating over a date column in a CSV every 30 days to compute a variable's growth rate

Posted 2024-04-28 05:01:38


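I have a CSV file with one column of dates and another column with the number of Twitter followers. I want to calculate the month-over-month growth rate of the follower count, but the dates may not be exactly 30 days apart. So, if I have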

  • 2016-03-10: 200 followers
  • 2016-02-08: 195 followers
  • 2016-01-01: 105 followers

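How can I iterate over these rows to produce a month-over-month growth rate? I tried using dateutil's rrule together with pandas, but ran into difficulties. I considered using R for this, but I would rather do it in Python, because I will be writing the data from Python to a new CSV.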


Tags: file, csv, data, count, twitter, followers, dateutil, rrule
3 answers

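Thank you very much for the replies. I came up with the following code, which does what I wanted (I did not expect to get there, but I happened to find the right functions at the right time):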

import pandas as pd

df = pd.read_csv('file_name.csv', sep=',')
# This converts our date strings to date_time objects
df['Date'] = pd.to_datetime(df['Date'])
# But we only want the date, so we strip the time part
df['Date'] = df['Date'].dt.date

sep = ' '

# This allows us to iterate through the rows in a pandas dataframe
for index, row in df.iterrows():
    if index == 0:
        start_date = df.iloc[0]['Date']
        Present = df.iloc[0]['Count']
        continue
    # This assigns the date of the row to the variable end_date
    end_date = df.iloc[index]['Date']
    delta = start_date - end_date

    # If the gap is at least 30 days (delta is a timedelta, so compare its .days)
    if delta.days >= 30:
        print("Start Date: {}, End Date: {}, delta is {}".format(start_date, end_date, delta))
        Past = df.iloc[index]['Count']
        percent_change = ((Present-Past)/Past)*100

        # DataFrame.set_value was removed in pandas 1.0; assign by label instead
        df.loc[index, 'MoM'] = percent_change
        # Sets a new start date and new TW FW count
        start_date = df.iloc[index]['Date']
        Present = df.iloc[index]['Count']
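
The same idea can be sketched without pandas, which makes the 30-day rule easier to test in isolation. This is only an illustration under stated assumptions: the function name `growth_over_30_day_gaps` and the `min_days` parameter are hypothetical, the rows are assumed to be sorted newest first (as in the question), and each count is assumed to be a total follower count on that date.

```python
from datetime import date


def growth_over_30_day_gaps(rows, min_days=30):
    """Return (newer_date, older_date, percent_change) tuples.

    rows: list of (date, count) tuples sorted newest first (assumption).
    A result is produced only when the gap since the last reference
    row is at least min_days days.
    """
    results = []
    if not rows:
        return results
    # The newest row is the first reference point
    start_date, present = rows[0]
    for end_date, past in rows[1:]:
        if (start_date - end_date).days >= min_days:
            change = (present - past) / past * 100
            results.append((start_date, end_date, round(change, 2)))
            # The older row becomes the new reference point
            start_date, present = end_date, past
    return results


rows = [
    (date(2016, 3, 10), 200),
    (date(2016, 2, 8), 195),
    (date(2016, 1, 1), 105),
]
for newer, older, pct in growth_over_30_day_gaps(rows):
    print(newer, older, pct)
```

With the three rows from the question, the gaps are 31 and 38 days, so both pairs qualify. Rows closer than `min_days` to the current reference row are skipped rather than averaged, which mirrors the loop above.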

Here is an approach that uses defaultdict:

import csv
from collections import defaultdict
from datetime import datetime

path = "C:\\Users\\USER\\Desktop\\YOUR_FILE_HERE.csv"
with open(path, "r") as f:
    d = defaultdict(int)
    rows = csv.reader(f)
    for dte, followers in rows:
        dte = datetime.strptime(dte, "%Y-%m-%d")
        d[dte.year, dte.month] += int(followers)
print(d)

to_date_followers = 0
for (year, month) in sorted(d):
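    # NOTE: d is keyed by (year, month), but this tuple is (month, year),
    # so the lookup below never matches and old_followers is always 0.
    last_month_and_year = (12, year-1) if month == 1 else (month-1, year)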
    old_followers = d.get(last_month_and_year, 0)
    new_followers = d[year, month]
    to_date_followers += new_followers
    print("%d followers gained in %s, %s resulting in a %.2f%% increase from %s (%s followers to date)" % (
        new_followers-old_followers, month, year, new_followers*100.0/to_date_followers, ', '.join(str(x) for x in last_month_and_year), to_date_followers
    ))

For the following input:

^{pr2}$

It prints (under Python 2):

defaultdict(<type 'int'>, {(2015, 12): 20, (2016, 1): 105, (2016, 3): 400, (2017, 3): 200, (2016, 2): 195})
20 followers gained in 12, 2015 resulting in a 100.00% increase from 11, 2015 (20 followers to date)
105 followers gained in 1, 2016 resulting in a 84.00% increase from 12, 2015 (125 followers to date)
195 followers gained in 2, 2016 resulting in a 60.94% increase from 1, 2016 (320 followers to date)
400 followers gained in 3, 2016 resulting in a 55.56% increase from 2, 2016 (720 followers to date)
200 followers gained in 3, 2017 resulting in a 21.74% increase from 2, 2017 (920 followers to date)
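
A calendar-month variant can also be sketched with plain dictionaries keyed by `(year, month)`. This is a hedged illustration, not the answer's method: the function name `month_over_month` is hypothetical, each row is assumed to hold a total follower count (as in the question) rather than a gain, and when a month has several rows only the latest one is used.

```python
from datetime import datetime


def month_over_month(rows):
    """Return {(year, month): percent_change} for consecutive months.

    rows: iterable of (date_string, follower_count) pairs in any order,
    with dates formatted as YYYY-MM-DD (assumption).
    """
    latest = {}  # (year, month) -> (date, count) of the latest row in that month
    for date_string, count in rows:
        day = datetime.strptime(date_string, "%Y-%m-%d").date()
        key = (day.year, day.month)
        if key not in latest or day > latest[key][0]:
            latest[key] = (day, int(count))

    growth = {}
    for (year, month), (_, count) in sorted(latest.items()):
        # Look up the previous calendar month using the same (year, month) key order
        previous = (year - 1, 12) if month == 1 else (year, month - 1)
        if previous in latest:
            old = latest[previous][1]
            growth[(year, month)] = round((count - old) / old * 100, 2)
    return growth


rows = [("2016-03-10", 200), ("2016-02-08", 195), ("2016-01-01", 105)]
print(month_over_month(rows))
```

Months whose previous calendar month has no data are left out of the result instead of being reported against a zero baseline.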

My team and I use the following function to solve this problem. The code is as follows:

def compute_mom(data_list):
    list_tuple = zip(data_list[1:], data_list)
    raw_mom_growth_rate = [((float(nxt) - float(prev)) / float(prev)) * 100 for nxt, prev in list_tuple]
    return [round(mom, 2) for mom in raw_mom_growth_rate]
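
As a quick usage sketch, here is the function applied to the three counts from the question. It assumes the list is ordered oldest first, so the question's rows (listed newest first) are reversed; the function does not look at dates, so it treats consecutive entries as consecutive periods.

```python
def compute_mom(data_list):
    # Pair each value with the one before it: (next, previous)
    pairs = zip(data_list[1:], data_list)
    raw = [(float(nxt) - float(prev)) / float(prev) * 100 for nxt, prev in pairs]
    return [round(mom, 2) for mom in raw]


counts = [105, 195, 200]  # follower counts, oldest first (assumption)
print(compute_mom(counts))  # [85.71, 2.56]
```

A list with fewer than two entries yields an empty result, since there is no previous value to compare against.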

Hope this helps.
