ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

import numpy as np
import pandas as pd
from datetime import datetime, date, time, timedelta
from dateutil import parser
from sqlalchemy import create_engine
# Query all15
engine = create_engine('postgresql://user:passwd@localhost:5432/stocks')
new15Df = (pd.read_sql_query("SELECT dt, o, h, l, c, v FROM all15 WHERE (instr = 'SPY') AND (date(dt) BETWEEN '2016-06-27' AND '2016-07-15');", engine)).sort_values('dt')
# Correct for Time Zone.
new15Df['dt'] = (new15Df['dt'].copy()).apply(lambda d: d + timedelta(hours=-4))
# spy0030Df contains the 15-minute data at 00 & 30 minute time points
# spy1545Df contains the 15-minute data at 15 & 45 minute time points
spy0030Df = (new15Df[new15Df['dt'].apply(lambda d: d.minute % 30) == 0]).reset_index(drop=True)
spy1545Df = (new15Df[new15Df['dt'].apply(lambda d: d.minute % 30) == 15]).reset_index(drop=True)
high = pd.concat([spy1545Df['h'], spy0030Df['h']], axis=1).max(axis=1)
low = pd.concat([spy1545Df['l'], spy0030Df['l']], axis=1).min(axis=1)
volume = spy1545Df['v'] + spy0030Df['v']
# spy30Df assembled and pushed to PostgreSQL as table spy30new
spy30Df = pd.concat([spy0030Df['dt'], spy1545Df['o'], high, low, spy0030Df['c'], volume], ignore_index=True, axis=1)
spy30Df.columns = ['dt', 'o', 'h', 'l', 'c', 'v']
spy30Df.set_index(['dt'], inplace=True)
spy30Df.to_sql('spy30new', engine, if_exists='append', index_label='dt')
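A minimal sketch of what the "Correct for Time Zone" step does, assuming `dt` is a PostgreSQL `timestamptz` column that comes back from `read_sql_query` as tz-aware UTC values (the two sample timestamps are made up). Adding a `timedelta` moves the clock reading but leaves the column tz-aware; `dt.tz_convert` followed by `dt.tz_localize(None)` instead yields naive local wall times, which is the kind of column the answers below end up with.

```python
import pandas as pd

# Hypothetical stand-in for new15Df['dt']: tz-aware UTC timestamps,
# as a timestamptz column would typically be returned.
dt = pd.Series(pd.to_datetime(["2016-06-27 13:45", "2016-06-27 14:00"], utc=True))

# The question's approach: shifting by -4 hours keeps the time zone attached,
# so the values now read 09:45 / 10:00 but are still labeled UTC.
shifted = dt + pd.Timedelta(hours=-4)
print(shifted.dt.tz)  # UTC

# Alternative: convert to US Eastern time, then drop the tz label entirely.
local_naive = dt.dt.tz_convert("America/New_York").dt.tz_localize(None)
print(local_naive.dt.tz)  # None
print(local_naive.dt.strftime("%H:%M").tolist())  # ['09:45', '10:00']
```

Using a named zone rather than a fixed -4 hours also keeps the conversion correct across daylight-saving changes.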

u = (spy0030Df['dt']).tolist()
timesAsPyDt = np.asarray([d.to_pydatetime() for d in u])
spy30Df = pd.concat([spy1545Df['o'], high, low, spy0030Df['c'], volume], ignore_index=True, axis=1)
newArray = np.c_[timesAsPyDt, spy30Df.values]
colNames = ['dt', 'o', 'h', 'l', 'c', 'v']
newDf = pd.DataFrame(newArray, columns=colNames)
newDf.set_index(['dt'], inplace=True)
newDf.to_sql('spy30new', engine, if_exists='append', index_label='dt')
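A quick check, assuming the values in `spy0030Df['dt']` are tz-aware (the sample timestamp is made up), suggests why converting to Python `datetime` objects does not by itself get rid of the time zone: `Timestamp.to_pydatetime()` keeps the `tzinfo`. Calling `tz_localize(None)` first produces a naive `datetime`.

```python
import pandas as pd

# Hypothetical tz-aware value standing in for one element of spy0030Df['dt'].
ts = pd.Timestamp("2016-06-27 10:00", tz="UTC")

aware = ts.to_pydatetime()
print(aware.tzinfo is not None)  # True: the Python datetime is still tz-aware

# Drop the zone on the pandas side, then convert.
naive = ts.tz_localize(None).to_pydatetime()
print(naive.tzinfo)  # None
print(naive)  # 2016-06-27 10:00:00
```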

3 answers

Answer 1 · edited 2024-05-23 07:10:24

Actually, this is my DataFrame:

                              Biomass  Fossil Brown coal/Lignite  Fossil Coal-derived gas  Fossil Gas  Fossil Hard coal  Fossil Oil  Geothermal  Hydro Pumped Storage  Hydro Run-of-river and poundage  Hydro Water Reservoir  Nuclear   Other  Other renewable    Solar  Waste  Wind Offshore  Wind Onshore
2018-02-02 00:00:00+01:00   4835.0                    16275.0                    446.0      1013.0            4071.0       155.0         5.0                   7.0                           1906.0                   35.0   8924.0  3643.0            142.0      0.0  595.0         2517.0       19999.0
2018-02-02 00:15:00+01:00   4834.0                    16272.0                    446.0      1010.0            3983.0       155.0         5.0                   7.0                           1908.0                   71.0   8996.0  3878.0            142.0      0.0  594.0         2364.0       19854.0
2018-02-02 00:30:00+01:00   4828.0                    16393.0                    446.0      1019.0            4015.0       155.0         5.0

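I tried to insert it into an SQL database but got the same error as in the question above. What I did was move the DataFrame's index into a column, which gets the label "index":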

df.reset_index(level=0, inplace=True)

Rename the "index" column to "DateTime":

df = df.rename(columns={'index': 'DateTime'})

Change the dtype to "datetime64":

df['DateTime'] = df['DateTime'].astype('datetime64')

Store it in the SQL database:

engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data', echo=True)
df.to_sql(con=engine, name='test', if_exists='replace')
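For reference, a sketch of the same reset_index / rename / strip-timezone steps on a tiny made-up frame with a +01:00 index. It swaps `astype('datetime64')` for an explicit `dt.tz_convert('UTC').dt.tz_localize(None)`, because recent pandas releases refuse the unit-less `'datetime64'` dtype and refuse to cast tz-aware values to naive via `astype`. Use `dt.tz_localize(None)` alone if you want to keep the local +01:00 wall times instead of UTC. The `to_sql` call is left out so the sketch runs without a database.

```python
import pandas as pd

# Tiny made-up frame shaped like the one above: a tz-aware (+01:00 in February) index.
idx = pd.date_range("2018-02-02 00:00", periods=3, freq="15min", tz="Europe/Berlin")
df = pd.DataFrame({"Biomass": [4835.0, 4834.0, 4828.0]}, index=idx)

# Move the index into a column; an unnamed index becomes a column called "index".
df.reset_index(level=0, inplace=True)
df = df.rename(columns={"index": "DateTime"})

# Drop the time zone explicitly: convert to UTC first, then remove the tz label.
df["DateTime"] = df["DateTime"].dt.tz_convert("UTC").dt.tz_localize(None)

print(df["DateTime"].dt.tz)  # None
print(df["DateTime"].iloc[0])  # 2018-02-01 23:00:00
```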

Answer 2

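Using pd.to_datetime() on each element works. Option 4, which applies pd.to_datetime() to the whole Series, does not. Perhaps the Postgres driver understands Python's datetime but not the datetime64 used by pandas and NumPy. Option 4 produced the correct output, but I got the ValueError (see title) when sending the DataFrame to Postgres. The element-wise version: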

timesAsPyDt = (spy0030Df['dt']).apply(lambda d: pd.to_datetime(str(d)))

Answer 3

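I had the same problem, and applying pd.to_datetime() to each element works for me too. But it is orders of magnitude slower than running pd.to_datetime() on the whole Series. For a DataFrame with over 1 million rows,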

(df['Time']).apply(lambda d: pd.to_datetime(str(d)))

takes about 70 seconds, while

pd.to_datetime(df['Time'])

takes about 0.01 seconds.
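A rough way to reproduce that comparison on your own machine, using synthetic naive timestamp strings (the row count and string format are assumptions, and the timings will vary): both paths produce the same values, only the speed differs.

```python
import time

import pandas as pd

# Synthetic data: 20,000 naive timestamp strings at 15-minute spacing.
times = pd.date_range("2018-01-01", periods=20_000, freq="15min").strftime("%Y-%m-%d %H:%M:%S")
df = pd.DataFrame({"Time": times})

start = time.perf_counter()
slow = df["Time"].apply(lambda d: pd.to_datetime(str(d)))  # one parse per element
slow_secs = time.perf_counter() - start

start = time.perf_counter()
fast = pd.to_datetime(df["Time"])  # one vectorized parse of the whole column
fast_secs = time.perf_counter() - start

print(f"element-wise: {slow_secs:.2f} s, vectorized: {fast_secs:.4f} s")
print((slow == fast).all())  # True
```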

The actual problem is that the time zone information is included. To remove it:

t = pd.to_datetime(df['Time'])
t = t.dt.tz_localize(None)

This should be much faster!
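A small sketch of removing the time zone from a parsed column, on made-up offset-bearing strings (the sample values are assumptions). It also shows why the `.dt` accessor is needed: `Series.tz_localize` operates on the index, which for an ordinary column is a `RangeIndex`, so calling it directly raises `TypeError`.

```python
import pandas as pd

# Made-up sample column: timestamp strings that carry a +01:00 offset.
df = pd.DataFrame({"Time": ["2018-02-02 00:00:00+01:00", "2018-02-02 00:15:00+01:00"]})

t = pd.to_datetime(df["Time"])  # one vectorized parse; the result is tz-aware
print(t.dt.tz is not None)  # True

# Series.tz_localize acts on the index (a RangeIndex here), not on the values.
try:
    t.tz_localize(None)
    index_call_failed = False
except TypeError:
    index_call_failed = True
print(index_call_failed)  # True

# The .dt accessor drops the offset and keeps the local wall-clock times.
naive = t.dt.tz_localize(None)
print(naive.dt.tz)  # None
print(naive.iloc[1])  # 2018-02-02 00:15:00
```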
