How to use Python to split datetime objects in a CSV file into 30-minute arrays and export them as new CSV files

Asked 2025-04-18 12:38

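I have a 10 Hz .csv data file in which the date and time share a single column, formatted like this: 2014-07-05 12:01:00.0 to 2014-07-06 12:00:59.9, so the file holds one day of data. I need to split the data into 30-minute blocks, 48 blocks per 24 hours, for example 12:01:00.0 to 12:29:59.9, 12:30:00.0 to 12:59:59.9, and so on. Ideally each 30-minute block would be exported as its own text file.

I have only just started learning Python (today is day three) and I am doing my best, but I don't really know where to begin. I am on an internship and really need help. I am not a programmer, just a chemist who wants to get this task done in Python. I tried splitting by rows (30 minutes is roughly 18000 rows), but because my data does not start at 12:00:01 the row arithmetic goes wrong and I don't get exact 30-minute splits. I have heard that I need datetime objects to convert to and from strings.

Any guidance or help would be greatly appreciated. Thanks in advance. Below is the code I started modifying to use datetime objects, but I really need some direction: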

import csv
import re
import os
import datetime
import numpy as np
filename = 'C:\Users\Jason\Documents\Flux Data Files\HL14_175.csv'
f = open('C:\Users\Jason\Documents\Flux Data Files\Output Flux Split 30 mins Data    Files\HL14_175_split0.csv','wb')
writer = csv.writer(f,delimiter = ',')
with open(filename,"r") as datafile:
    r = csv.reader(datafile,delimiter = ",")
    timestamp = datetime.datetime.strptime("2014-07-05", "%Y-%m-%d %H:%M:%S:%f")
    recordnumber = []
    sonic1 = []
    sonic2 = []
    sonic3 = []
    temperature = []
    for row in r:
        timestamp.append((row[0]))
        recordnumber.append(float(row[1]))
        sonic1.append(float(row[2]))
        sonic2.append(float(row[3]))
        sonic3.append(float(row[4]))
        temperature.append(float(row[5]))
    timestamp = np.array(timestamp)
    recordnumber = np.array(recordnumber)
    sonic1 = np.array(sonic1)
    sonic2 = np.array(sonic2)
    sonic3 = np.array(sonic3)
    temperature = np.array(temperature)
datetime.strptime(date_string, format) 
#row_count = 863998
row_count = sum(1 for row in csv.reader(open(filename)))
lines = row_count/18001.0
timestamp_split = np.array_split(timestamp,lines)
recordnumber_split = np.array_split(recordnumber,lines)
sonic1_split = np.array_split(sonic1,lines)
sonic2_split = np.array_split(sonic2,lines)
sonic3_split = np.array_split(sonic3,lines)
temperature_split = np.array_split(temperature,lines)
dataout =  np.column_stack((timestamp_split[0],recordnumber_split[0],sonic1_split[0],sonic2_split[0],sonic3_split[0],temperature_split[0]))
writer.writerows(dataout)
f.close()
print('Flux Data Split Complete')
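
For the format described in the question (a full `YYYY-MM-DD HH:MM:SS.f` stamp in the first column of every row), the split could probably be done with the standard library alone, without counting rows. The following is only a sketch under that assumption; the function names `half_hour_start` and `split_into_half_hours` and the output file naming are made up for illustration and are not from the original post.

```python
import csv
import itertools
from datetime import datetime
from pathlib import Path


def half_hour_start(stamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS.f' string and floor it to :00 or :30."""
    t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return t.replace(minute=0 if t.minute < 30 else 30, second=0, microsecond=0)


def split_into_half_hours(src, out_dir):
    """Read src once and write each run of rows in the same half hour to its own CSV.

    Assumes the rows are in time order. A half hour that reappears later in the
    file would overwrite the earlier output, because files are opened with "w".
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, newline="") as f:
        rows = csv.reader(f)
        # itertools.groupby only merges adjacent rows with the same key
        for start, block in itertools.groupby(rows, key=lambda r: half_hour_start(r[0])):
            target = out_dir / start.strftime("%Y-%m-%d_%H%M.csv")
            with open(target, "w", newline="") as out:
                csv.writer(out).writerows(block)
            written.append(target)
    return written
```

Because the key includes the date, a file running from 2014-07-05 12:01:00.0 to 2014-07-06 12:00:59.9 would yield 49 output files rather than 48: the last minute falls into the 12:00 block of the following day.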

Here is a sample of the data file:

6/24/2014 0:01,3583014,-59,-62,-9,296.51
01:00.1,3583015,-69,-68,16,296.54
01:00.2,3583016,-62,-59,36,296.56
01:00.3,3583017,-77,-45,26,296.56
01:00.4,3583018,-47,-50,36,296.56
01:00.5,3583019,-48,-70,27,296.51
01:00.6,3583020,-71,-60,28,296.54
01:00.7,3583021,-69,-73,24,296.52
01:00.8,3583022,-61,-69,15,296.49
01:00.9,3583023,-56,-68,8,296.52
6/24/2014 0:01,3583024,-65,-42,-5,296.56
01:01.1,3583025,-71,-33,-11,296.56

1 Answer

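This is not a complete solution, because converting the dates correctly is still a problem.

I use text data to simulate reading from a csv file.

I am still new to pandas, so someone else can probably do better.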

import pandas as pd
from io import StringIO  # Python 3 location of StringIO

data = '''6/24/2014 0:01,3583014,-59,-62,-9,296.51
01:00.1,3583015,-69,-68,16,296.54
01:00.2,3583016,-62,-59,36,296.56
01:00.3,3583017,-77,-45,26,296.56
01:00.4,3583018,-47,-50,36,296.56
01:00.5,3583019,-48,-70,27,296.51
01:00.6,3583020,-71,-60,28,296.54
01:00.7,3583021,-69,-73,24,296.52
01:00.8,3583022,-61,-69,15,296.49
01:00.9,3583023,-56,-68,8,296.52
6/24/2014 0:01,3583024,-65,-42,-5,296.56
01:01.1,3583025,-71,-33,-11,296.56
6/24/2014 0:31,3583014,-59,-62,-9,296.51
31:00.1,3583015,-69,-68,16,296.54
31:00.2,3583016,-62,-59,36,296.56
31:00.3,3583017,-77,-45,26,296.56
31:00.4,3583018,-47,-50,36,296.56
31:00.5,3583019,-48,-70,27,296.51
31:00.6,3583020,-71,-60,28,296.54
31:00.7,3583021,-69,-73,24,296.52
31:00.8,3583022,-61,-69,15,296.49
31:00.9,3583023,-56,-68,8,296.52
6/24/2014 0:31,3583024,-65,-42,-5,296.56
31:01.1,3583025,-71,-33,-11,296.56'''

# reading from CSV (the data has no header row)

df = pd.read_csv(StringIO(data), header=None)

#print(df)

# converting the "weird" date format - this can still be a problem

date = None    # last seen "M/D/YYYY H" prefix
minut = None   # last seen minute
second = 59    # wraps to 0 on the first whole-minute row

def change_date(line):
    global date, minut, second
    a = line.split(':')
    if len(a[0]) > 2:
        # full stamp such as "6/24/2014 0:01", which has lost its seconds
        if a[0] != date or a[1] != minut:
            second = 59  # new minute: restart the seconds counter
        date = a[0]
        minut = a[1]
        second = (second + 1) % 60
        return "%s:%02d.0" % (line, second)
    else:
        # short stamp such as "01:00.1" (minute:second.fraction): prepend date and hour
        return date + ":" + line

df[0] = df[0].map(change_date)

#print(df)
#print(df.dtypes)

# converting the date/time strings to datetime objects

df[0] = pd.to_datetime(df[0], format="%m/%d/%Y %H:%M:%S.%f")

#print(df.dtypes)

# group by date (year, month, day, hour) and half hour ("00" if minute < 30, else "30")

g = df.groupby(df[0].map(lambda t: t.strftime("%Y_%m_%d_%H_") + ("00" if t.minute < 30 else "30")))

# print groups

for name, group in g:
    print('name:', name)
    print(group)
    group.to_csv(name + ".csv") # write groups to files

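Result (output as originally posted; with the code above the group names come out as 2014_06_24_00_00 and 2014_06_24_00_30, and the whole-minute rows get seconds :00 and :01):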

 name: 2014_01_24_00_00
                            0        1   2   3   4       5
0         2014-06-24 00:01:01  3583014 -59 -62  -9  296.51
1  2014-06-24 00:01:00.100000  3583015 -69 -68  16  296.54
2  2014-06-24 00:01:00.200000  3583016 -62 -59  36  296.56
3  2014-06-24 00:01:00.300000  3583017 -77 -45  26  296.56
4  2014-06-24 00:01:00.400000  3583018 -47 -50  36  296.56
5  2014-06-24 00:01:00.500000  3583019 -48 -70  27  296.51
6  2014-06-24 00:01:00.600000  3583020 -71 -60  28  296.54
7  2014-06-24 00:01:00.700000  3583021 -69 -73  24  296.52
8  2014-06-24 00:01:00.800000  3583022 -61 -69  15  296.49
9  2014-06-24 00:01:00.900000  3583023 -56 -68   8  296.52
10        2014-06-24 00:01:02  3583024 -65 -42  -5  296.56
11 2014-06-24 00:01:01.100000  3583025 -71 -33 -11  296.56

 name: 2014_01_24_00_30
                            0        1   2   3   4       5
12        2014-06-24 00:31:03  3583014 -59 -62  -9  296.51
13 2014-06-24 00:31:00.100000  3583015 -69 -68  16  296.54
14 2014-06-24 00:31:00.200000  3583016 -62 -59  36  296.56
15 2014-06-24 00:31:00.300000  3583017 -77 -45  26  296.56
16 2014-06-24 00:31:00.400000  3583018 -47 -50  36  296.56
17 2014-06-24 00:31:00.500000  3583019 -48 -70  27  296.51
18 2014-06-24 00:31:00.600000  3583020 -71 -60  28  296.54
19 2014-06-24 00:31:00.700000  3583021 -69 -73  24  296.52
20 2014-06-24 00:31:00.800000  3583022 -61 -69  15  296.49
21 2014-06-24 00:31:00.900000  3583023 -56 -68   8  296.52
22        2014-06-24 00:31:04  3583024 -65 -42  -5  296.56
23 2014-06-24 00:31:01.100000  3583025 -71 -33 -11  296.56
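
Once column 0 really holds datetimes, the grouping key could also be computed by flooring each timestamp to 30 minutes instead of building a string with `strftime`. This is a minimal sketch of that alternative, not part of the original answer; the small `df` below is hypothetical stand-in data, and the `to_csv` call is commented out so that the sketch writes no files.

```python
import pandas as pd

# Hypothetical frame standing in for df after the date conversion step
df = pd.DataFrame({
    0: pd.to_datetime([
        "2014-06-24 00:01:00.0",
        "2014-06-24 00:29:59.9",
        "2014-06-24 00:30:00.0",
        "2014-06-24 00:31:00.1",
    ]),
    1: [3583014, 3583015, 3583016, 3583017],
})

# Floor every timestamp to the start of its half hour and group on that value
block_start = df[0].dt.floor("30min")
blocks = {start: group for start, group in df.groupby(block_start)}

for start, group in blocks.items():
    name = start.strftime("%Y_%m_%d_%H_%M")
    print(name, len(group))
    # group.to_csv(name + ".csv", index=False, header=False)
```

Flooring keeps the key as a real timestamp, so the groups sort chronologically and the same line works for other block lengths by changing the frequency string.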

Answered 2025-04-18 by Python大师
