How to resample non-standard CFTimeIndex calendars (360-day, no-leap-year) with xarray for use in pandas

<xarray.Dataset>
Dimensions:       (bnds: 2, rlat: 412, rlon: 424, time: 1800)
Coordinates:
    lat           (rlat, rlon) float64 ...
    lon           (rlat, rlon) float64 ...
  * rlat          (rlat) float64 -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
  * rlon          (rlon) float64 -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * time          (time) object 2011-01-01 12:00:00 ... 2015-12-30 12:00:00
Dimensions without coordinates: bnds
Data variables:
    pr            (time, rlat, rlon) float32 ...
    rotated_pole  |S1 ...
    time_bnds     (time, bnds) object ...
Attributes:
    CDI:                            Climate Data Interface version 1.3.2
    Conventions:                    CF-1.6
    NCO:                            4.4.2
    CDO:                            Climate Data Operators version 1.3.2 (htt...
    contact:                        Fredrik Boberg, Danish Meteorological Ins...
    creation_date:                  2019-11-16 14:39:25
    experiment:                     Scenario experiment using HadGEM as drivi...
    experiment_id:                  rcp45
    driving_experiment:             MOHC-HadGEM2-ES,rcp45,r1i1p1
    driving_model_id:               MOHC-HadGEM2-ES
    driving_model_ensemble_member:  r1i1p1
    driving_experiment_name:        rcp45
    frequency:                      day
    institution:                    Danish Meteorological Institute
    institute_id:                   DMI
    model_id:                       DMI-HIRHAM5
    rcm_version_id:                 v2
    project_id:                     CORDEX
    CORDEX_domain:                  EUR-11
    product:                        output
    tracking_id:                    hdl:21.14103/158e462e-499c-4d6e-8462-ac3e...
    c3s_disclaimer:                 This data has been produced in the contex...

<xarray.DataArray 'time' (time: 1800)>
array([cftime.Datetime360Day(2011-01-01 12:00:00),
       cftime.Datetime360Day(2011-01-02 12:00:00),
       cftime.Datetime360Day(2011-01-03 12:00:00), ...,
       cftime.Datetime360Day(2015-12-28 12:00:00),
       cftime.Datetime360Day(2015-12-29 12:00:00),
       cftime.Datetime360Day(2015-12-30 12:00:00)], dtype=object)
Coordinates:
  * time     (time) object 2011-01-01 12:00:00 ... 2015-12-30 12:00:00
Attributes:
    standard_name:  time
    long_name:      time
    bounds:         time_bnds

import pandas as pd
import xarray

ds = xarray.open_dataset('data/mohc_hadgem2_es.nc')

def cft_to_string(cfttime_obj):
    # Zero-pad month and day so the result reads as YYYY-MM-DD
    return f'{cfttime_obj.year}-{cfttime_obj.month:02d}-{cfttime_obj.day:02d}'

# Apply the function above to the raw cftime objects
ds_time_strings = list(map(cft_to_string, ds['time'].values))

# Get precipitation values only (to use in a pandas DataFrame).
# The data cover many pixels (the whole of Europe),
# hence the mean over axes 1 and 2.
precipitation = ds['pr'].values.mean(axis=(1, 2))

# To DataFrame
df = pd.DataFrame(index=ds_time_strings, data={'precipitation': precipitation})

# Coerce erroneous dates
df.index = pd.to_datetime(df.index, errors='coerce')
# Now, dates such as 2011-02-30 become NaT

            precipitation
2011-01-01       0.000049
2011-01-02       0.000042
2011-01-03       0.000031
2011-01-04       0.000030
2011-01-05       0.000038
...                   ...
2011-02-28       0.000041
NaT              0.000055
NaT              0.000046
2011-03-01       0.000031
...                   ...
2015-12-26       0.000028
2015-12-27       0.000034
2015-12-28       0.000028
2015-12-29       0.000025
2015-12-30       0.000024

1800 rows × 1 columns
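The NaT rows come from 360_day dates that have no counterpart in the standard calendar. As a rough illustration, the standard-library sketch below enumerates every 360_day date from 2011 through 2015 (assuming 12 months of 30 days each, matching the time axis shown earlier) and collects those that `datetime.date` rejects. The helper name `gregorian_or_none` is hypothetical and not part of xarray, cftime, or pandas.

```python
from datetime import date


def gregorian_or_none(year, month, day):
    """Return a datetime.date, or None if the date does not exist
    in the standard (Gregorian) calendar. Hypothetical helper."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


# Every 360_day date from 2011 through 2015: 12 months of 30 days each.
dates_360 = [
    (year, month, day)
    for year in range(2011, 2016)
    for month in range(1, 13)
    for day in range(1, 31)
]

# Dates that exist in the 360_day calendar but not in the standard one.
invalid = [ymd for ymd in dates_360 if gregorian_or_none(*ymd) is None]
```

Under these assumptions `invalid` contains only February 29 of non-leap years and February 30 of every year, which are the dates that `pd.to_datetime(..., errors='coerce')` would turn into NaT.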

1 answer

User
#1 · posted 2024-05-16 03:07:04

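Thanks for the detailed example! If a time series of monthly means is acceptable for your analysis, I think the cleanest approach is to resample to "month start" frequency and then reconcile the date type. For a dataset indexed by a CFTimeIndex, that would look something like:

resampled = ds.resample(time="MS").mean()
resampled["time"] = resampled.indexes["time"].to_datetimeindex()

This is basically your second bullet point, with one small change. Resampling to month-start frequency avoids the problem that a 360-day calendar contains month ends that do not exist in the standard calendar (for example, February 30).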
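To see why month-start labels are safe, here is a plain-Python sketch of the same idea. It does not use xarray and is not how `resample` is implemented; it only groups daily values by (year, month) and labels each group with the first day of the month. The function name `monthly_means` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_means(records):
    """Average daily values per (year, month), labeled by the first
    day of the month.

    `records` is an iterable of ((year, month, day), value) pairs, where
    the day may come from a 360_day calendar (e.g. February 30).
    """
    groups = defaultdict(list)
    for (year, month, _day), value in records:
        groups[(year, month)].append(value)
    # Day 1 exists in every month of the standard calendar,
    # so building the label never raises ValueError.
    return {
        date(year, month, 1): sum(values) / len(values)
        for (year, month), values in sorted(groups.items())
    }


# Hypothetical data: all 30 days of February 2011 in a 360_day calendar,
# including February 29 and 30, which the standard calendar lacks.
feb = [((2011, 2, day), float(day)) for day in range(1, 31)]
result = monthly_means(feb)
```

Here the values for February 29 and 30 still contribute to the mean, while the label `date(2011, 2, 1)` is valid in both calendars. A month-end label would instead be February 30, which cannot be converted.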
