合并具有相同范围但不同空间分辨率的xarray数据集

2024-06-06 18:00:20 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个基于卫星的太阳诱导荧光(SIF)数据集和一个模拟降水数据集。我想在我研究区域的每像素基础上比较降水量和SIF。我的两个数据集属于同一个区域,但空间分辨率略有不同。当我取整个区域的平均值时,我可以成功地跨时间绘制这些值并相互比较,但是我很难在每个像素的基础上创建散点图。你知道吗

老实说,我不确定这是否是最好的方法来比较这两个价值观时,寻找对SIF的影响precip,所以我对不同的方法的想法持开放态度。至于合并当前我正在使用的数据^{},但它给了我一个错误,我已经在下面描述过了。我也可以通过将netcdf转换成geotiff,然后使用rasterio对它们进行扭曲来实现这一点,但这似乎是一种效率低下的比较方法。到目前为止,我掌握的情况如下:

import netCDF4
import numpy as np
import dask
import xarray as xr

rainy_bbox = np.array([
    [-69.29519955115512,-13.861261028444734],
    [-69.29519955115512,-12.384786628185896],
    [-71.19583431678012,-12.384786628185896],
    [-71.19583431678012,-13.861261028444734]])
max_lon_lat = np.max(rainy_bbox, axis=0)
min_lon_lat = np.min(rainy_bbox, axis=0)

# this dataset is available here: ftp://fluo.gps.caltech.edu/data/tropomi/gridded/
sif = xr.open_dataset('../data/TROPO_SIF_03-2018.nc')

# the dataset is global so subset to my study area in the Amazon
rainy_sif_xds = sif.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))  

# this data can all be downloaded from NASA Goddard here either manually or with wget but you'll need an account on https://disc.gsfc.nasa.gov/: https://pastebin.com/viZckVdn
imerg_xds = xr.open_mfdataset('../data/3B-DAY.MS.MRG.3IMERG.201803*.nc4')

# spatial subset
rainy_imerg_xds = imerg_xds.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))  

# I'm not sure the best way to combine these datasets but am trying this
combo_xds = xr.combine_by_coords([rainy_imerg_xds, rainy_xds])

目前我在最后一行得到了一个似乎毫无帮助的RecursionError: maximum recursion depth exceeded in comparison。当我添加参数join='left'时,来自rainy_imerg_xds数据集的数据在combo_xds中,当我做join='right'时,rainy_xds数据存在,如果我做join='inner'则不存在数据。我假设这个函数有一些内部插值,但似乎没有。你知道吗


Tags: 数据import区域datanpsliceminmax
1条回答
网友
1楼 · 发布于 2024-06-06 18:00:20

这个documentation from xarray简单地概述了这个问题的解决方案。xarray允许您在多个维度中进行插值,并指定另一个数据集的x和y维度作为输出维度。所以在这种情况下,它是用

# interpolation based on http://xarray.pydata.org/en/stable/interpolation.html
# interpolation can't be done across the chunked dimension so we have to load it all into memory
rainy_sif_xds.load()

#interpolate into the higher resolution grid from IMERG
interp_rainy_sif_xds = rainy_sif_xds.interp(lat=rainy_imerg_xds["lat"], lon=rainy_imerg_xds["lon"])

# visualize the output
rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Initial') +\
interp_rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Interpolated')

enter image description here

# now that our coordinates match, in order to actually merge we need to convert the default CFTimeIndex to datetime to merge dataset with SIF data because the IMERG rainfall dataset was CFTime and the SIF was datetime
rainy_imerg_xds['time'] = rainy_imerg_xds.indexes['time'].to_datetimeindex()

# now the merge can easily be done with
merged_xds = xr.combine_by_coords([rainy_imerg_xds, interp_rainy_sif_xds], coords=['lat', 'lon', 'time'], join="inner")

# now visualize the two datasets together // multiply SIF by 30 because values are so ow
merged_xds.HQprecipitation.rolling(time=7, center=True).sum().mean(dim=('lat', 'lon')).hvplot().relabel('Precip') * \
(merged_xds.dcSIF.mean(dim=('lat', 'lon'))*30).hvplot().relabel('SIF')

enter image description here

相关问题 更多 >