How to "stretch" a DataFrame and interpolate between the existing values

In[1]: import pandas as pd
       t = pd.DataFrame({'D18O': [-0.47, -0.12, 0.55, 0.72, 1.8, 1.1, 0.43, -0.29,
                                  -0.55, -0.6, -0.32, 0.28, 0.72, 1.1, 1.34, 1.32,
                                  1.11, 0.46, 0.09, 0.02]})
Out[2]:
1    -0.47
2    -0.12
3     0.55
4     0.72
5     1.80
6     1.10
7     0.43
8    -0.29
9    -0.55
10   -0.60
11   -0.32
12    0.28
13    0.72
14    1.10
15    1.34
16    1.32
17    1.11
18    0.46
19    0.09
20    0.02
Name: D18O, dtype: float64

In[3]: t['D18O']
Out[3]:
0           0.47
2.13157     NaN
2.26315     NaN
...         ...
21.5       -0.12
22.63157    NaN
23.76315    NaN
...         ...
...         ...
430         0.02
Name: D18O, dtype: float64

2 Answers

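User

Answer 1 · edited 2024-04-18 16:48:40

The ffill method can be used together with limit in reindex, but the first value then gets duplicated. One possible solution is to add a helper first value with an index close to 0, reindex, remove the helper row again with iloc, and finally interpolate: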

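import numpy as np

r = pd.RangeIndex(0, 430, 1)  # target index; only its length (430) is used

# Helper row with an index close to 0 (removed again with iloc below)
t.loc[-0.001] = 0
t = t.sort_index()
new_idx = np.linspace(t.index[0], t.index[-1], len(r))
# Forward-fill at most one step, drop the helper row, then interpolate linearly
print(t.reindex(new_idx, method='ffill', limit=1).iloc[1:].interpolate())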

               D18O
0.043291  -0.470000
0.087583  -0.454091
0.131874  -0.438182
0.176166  -0.422273
0.220457  -0.406364
0.264748  -0.390455
0.309040  -0.374545
0.353331  -0.358636
0.397622  -0.342727
0.441914  -0.326818
0.486205  -0.310909
0.530497  -0.295000
0.574788  -0.279091
0.619079  -0.263182
0.663371  -0.247273
0.707662  -0.231364
0.751953  -0.215455
...
...
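
As a variant of the same idea, the helper row can be avoided by reindexing onto the union of the old and new positions and interpolating on the index values. This is only a sketch and not part of the answer above; it assumes a sorted numeric index and rebuilds t from the question's data so that it runs on its own.

```python
import numpy as np
import pandas as pd

values = [-0.47, -0.12, 0.55, 0.72, 1.8, 1.1, 0.43, -0.29, -0.55, -0.6,
          -0.32, 0.28, 0.72, 1.1, 1.34, 1.32, 1.11, 0.46, 0.09, 0.02]
t = pd.DataFrame({'D18O': values})
t.index = t.index.astype(float)  # float index, so fractional positions fit in

# 430 evenly spaced positions across the original index range
new_idx = np.linspace(t.index[0], t.index[-1], 430)

# Combine old and new positions, interpolate using the index values as x,
# then keep only the new positions
combined = t.reindex(t.index.union(new_idx))
stretched = combined.interpolate(method='index').reindex(new_idx)

print(stretched.head())
```

Because method='index' uses the actual index values as x coordinates, the original points keep their positions along the curve even though the new grid does not land on them exactly.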

User

Answer 2 · edited 2024-04-18 16:48:40

I now use a more general approach to interpolate data onto a given index. I am listing my approach here for future reference:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# Example data: three identical numeric columns
i = pd.RangeIndex(0, 430, 1)  # target index; its length sets the number of rows
values = [-0.47, -0.12, 0.55, 0.72, 1.8, 1.1, 0.43, -0.29,
          -0.55, -0.6, -0.32, 0.28, 0.72, 1.1, 1.34, 1.32,
          1.11, 0.46, 0.09, 0.02]
df1 = pd.DataFrame({0: values, 1: values, 2: values})

# Select numeric columns
nums = df1.select_dtypes([np.number])
old_idx = df1.index
# Calculate new index (number of target rows, e.g. the length of the
# environmental data; here the length of i)
len_idx = len(i)
mi, ma = old_idx.min(), old_idx.max()
new_idx = np.linspace(mi, ma, len_idx)

# Plot to compare interpolation to original values
fig, ax = plt.subplots(1, 1)
ax.plot(old_idx, df1.iloc[:, 0], 'k+')

def interpol(column):
    """Interpolate one column from old_idx onto new_idx."""
    interpolant = interp1d(old_idx, column)
    interpolated = interpolant(new_idx)
    return interpolated

# Interpolate data to match index length of environmental data
inter_nums = pd.DataFrame(index=new_idx)
for col in nums:
    inter = interpol(nums[col])
    inter_nums[col] = inter

# Plot after interpolation. Same curve? Good!
ax.plot(inter_nums.iloc[:, 0], c='r')
plt.show()
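
The same linear resampling can also be wrapped in a small function that needs only NumPy. The sketch below is an illustration rather than part of the answer: the function name stretch_frame and the column names 'a' and 'b' are hypothetical, and it assumes a sorted numeric index.

```python
import numpy as np
import pandas as pd

def stretch_frame(df, n_points):
    """Linearly resample every numeric column of df onto n_points positions."""
    nums = df.select_dtypes([np.number])
    old_idx = nums.index.to_numpy(dtype=float)
    new_idx = np.linspace(old_idx.min(), old_idx.max(), n_points)
    # np.interp needs increasing x values, hence the sorted-index assumption
    data = {col: np.interp(new_idx, old_idx, nums[col].to_numpy()) for col in nums}
    return pd.DataFrame(data, index=new_idx)

values = [-0.47, -0.12, 0.55, 0.72, 1.8, 1.1, 0.43, -0.29, -0.55, -0.6,
          -0.32, 0.28, 0.72, 1.1, 1.34, 1.32, 1.11, 0.46, 0.09, 0.02]
# Hypothetical example frame with two numeric columns
df1 = pd.DataFrame({'a': values, 'b': [2 * v for v in values]})

stretched = stretch_frame(df1, 430)
print(stretched.shape)
```

Linear interpolation never overshoots, so every resampled value stays between the minimum and maximum of its original column. That gives a quick numeric sanity check alongside the plot.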
