Update each row with the number of minutes since the first row

Posted 2024-03-29 00:15:33


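I have a file with one million tweets. The first tweet was posted at 2013-04-15 20:17:18 UTC. I want to update each tweet row with the number of minutes elapsed since the first tweet was posted.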

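I found help with datetime here and with converting time here, but when I put the two together I don't get the correct times. It may have something to do with the UTC string at the end of each published_at value.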

The error it throws is:

tweets['minsSince'] = tweets.apply(timesince,axis=1)
...
TypeError: ('string indices must be integers, not str', u'occurred at index 0')

Thanks for your help.

#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('BostonTWEETS.csv')
tweets.head()

#The first tweet's published_at time
starttime = datetime (2013, 04, 15, 20, 17, 18)

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    minsSince = int()
    tweetTime = row['published_at']
    ts = time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(tweetTime['published_at'], '%Y-%m-%d %H:%M:%S %UTC'))
    timediff = (tweetTime - starttime)
    minsSince.append("timediff")
    return ",".join(minsSince)

tweets['minsSince'] = tweets.apply(timesince,axis=1)

df = DataFrame(tweets)

print(df)

Sample of the first 5 rows of the CSV file.


1 Answer
User
#1 · Posted 2024-03-29 00:15:33
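As background, the TypeError in the question most likely comes from indexing twice: row['published_at'] already returns a string, and tweetTime['published_at'] then indexes that string with another string. Separately, the format '%UTC' would be read as the %U week-number directive followed by the literal text TC, so the suffix is better written as plain ' UTC'. A minimal sketch that reproduces both points, using a plain dict as a stand-in for one DataFrame row (the dict and the variable names are illustrative assumptions, not part of the original data):

```python
from datetime import datetime

# A plain dict stands in for one DataFrame row (illustrative assumption).
row = {"published_at": "2013-04-15 20:17:18 UTC"}

tweet_time = row["published_at"]  # this is already a str

# Indexing a str with a str key raises TypeError, as in the question.
try:
    tweet_time["published_at"]
    raised = False
except TypeError:
    raised = True

# Parsing the string directly works when " UTC" is literal text in the format.
parsed = datetime.strptime(tweet_time, "%Y-%m-%d %H:%M:%S UTC")
print(raised, parsed)
```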
#Import stuff
from datetime import datetime
import time
import pandas as pd
from pandas import DataFrame

#Read the csv file
tweets = pd.read_csv('sample.csv')
tweets.head()

#The first tweet's published_at time
starttime = tweets.published_at.values[0]
starttime = datetime.strptime(starttime, '%Y-%m-%d %H:%M:%S UTC')

#Run through the document and calculate the minutes since the first tweet
def timesince(row):
    ts = datetime.strptime(row, '%Y-%m-%d %H:%M:%S UTC')
    timediff = (ts- starttime)
    timediff = divmod(timediff.days * 86400 + timediff.seconds, 60)
    return timediff[0]

tweets['minSince'] = 0
tweets.minSince = tweets.published_at.map(timesince)

df = DataFrame(tweets)

print(df)

I hope this is what you were looking for.
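With a million rows, a vectorized variant may be noticeably faster than calling strptime once per row. The following is only a sketch under assumptions: the three sample timestamps are made up for illustration (the real file is not shown), and it assumes every published_at value follows the same 'YYYY-MM-DD HH:MM:SS UTC' pattern and that the first row is the first tweet.

```python
import pandas as pd

# Hypothetical sample data; the real file's contents are not shown in the post.
tweets = pd.DataFrame({
    "published_at": [
        "2013-04-15 20:17:18 UTC",
        "2013-04-15 20:18:48 UTC",
        "2013-04-16 20:17:18 UTC",
    ]
})

# Parse the whole column at once; " UTC" is matched as literal text.
published = pd.to_datetime(tweets["published_at"],
                           format="%Y-%m-%d %H:%M:%S UTC")

# Subtract the first row's timestamp and keep whole minutes only.
elapsed = published - published.iloc[0]
tweets["minSince"] = (elapsed.dt.total_seconds() // 60).astype(int)

print(tweets)
```

Floor division drops any leftover seconds, which matches the divmod(...)[0] result in the code above.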
