单变量样条插值数据存在明显差异

0 投票
2 回答
860 浏览
提问于 2025-04-18 18:37

我正在尝试在Python中创建一个单变量样条插值,以便拟合一大组数据。但是当我把结果画出来时,发现两者之间差异很大。我尝试了很多不同的平滑因子值(包括零,这样就必须经过每一个数据点),但在绘图时,结果依然差异很大。

##
# Univariate Spline Interpolation
##

## This function interpolates the data by creating multiple times the amount of points in the data set and fitting a spline to it
## Input:
# dataX - X axis that you corresponds to dataset
# dataY - Y axis of data to fit spline on (must be same size as dataX)
# multiple - the multiplication factor, default is 2 ( <1 - Less points, 1 - same amount of points, >1 - more points)
# order - order of spline, default is 4 (3 - Cubic, 4 - Quartic)
## Output
# splinedDataX - splined X Axis
# splinedDataY - splined Y Axis

def univariate_spline_interpolation( dataX, dataY, multiple=2, order=4):

    #Libraries
    from numpy import linspace,exp
    from numpy.random import randn
    import matplotlib.pyplot as plt
    from scipy.interpolate import UnivariateSpline, LSQUnivariateSpline


    #Find sizes of x and y axis for comparison and multiple
    sizeX = len(dataX)
    sizeY = len(dataY)


    #Error catching
    if(sizeX != sizeY):
        print "Data X axis and Y axis must have same size"
        return

    if(multiple <= 0):
        print "Order must be greater than 0"
        return

    if(order < 1 or order >5):
        print "Order must be 1 <= order <= 5"
        return

    #Create Spline
    s = UnivariateSpline(dataX, dataY, k=3, s=0)   
    # s is smoothing factor so spline doesn't shoot off in between data points
    #Positive smoothing factor used to choose the number of knots.
    #Number of knots will be increased until the smoothing condition is satisfied:
    # sum((w[i]*(y[i]-s(x[i])))**2,axis=0) <= s
    #If None (default), s=len(w) which should be a good value if 1/w[i] is an estimate of the standard deviation of y[i].
    #If 0, spline will interpolate through all data points.

    #Create new axis based on numPoints
    numPoints = sizeX * multiple   #Find mumber of points for spline
    startPt = dataX[1]   #find value of first point on x axis
    endPt = dataX[-1]   #find value of last point on x axis
    splinedDataX = linspace(startPt, endPt, numPoints)   #create evenly spaced points on axis based on start, end, and number of desired data points

    #Create Y axis of splined Data
    splinedDataY = s(splinedDataX)   #Create new Y axis with numPoints etnries of data splined to fit the original data

    return splinedDataX, splinedDataY

##
# Text Cubic Spline
##

splinedX, splinedY = univariate_spline_interpolation(sensorTimestamp, filteredData[1], multiple=1)

print "old x data"
print "length", len(sensorTimestamp)
print "Starts", sensorTimestamp[0]
print "Ends", sensorTimestamp[-1]
print ""

print "new x data"
print "length", len(splinedX)
print "multiple", len(splinedX)/len(filteredData[1])
print "Starts", splinedX[0]
print "Ends", splinedX[-1]
print ""

print "old y data"
print "length", len(filteredData[1])
print "Starts", filteredData[1][0]
print "Ends", filteredData[1][-1]
print ""

print "new y data"
print "length", len(splinedY)
print "Starts", splinedY[0]
print "Ends", splinedY[-1]


difference = []
for i in splinedY:
    difference.append(splinedY[i] - filteredData[1][i])

plt.figure(figsize=(20,3))
plt.plot(sensorTimestamp, filteredData[1], label='Non-Splined', marker='*')
plt.plot(splinedX, splinedY, label='Splined')
plt.plot(sensorTimestamp, difference, label='Difference', ls='--')
plt.title(' BW Filtered Data from LED 1')
plt.axis([19, 30, -300, 300])
plt.legend(loc='best')
plt.show()

输出打印:

old x data
length 14690
Starts 0.0
Ends 497.178565979

new x data
length 14690
multiple 1.0
Starts 0.0555429458618
Ends 497.178565979

old y data
length 14690
Starts 50.2707843894
Ends 341.661410048

new y data
length 14690
Starts 416.803282313
Ends 341.661410048

这是输出图表

正如你所看到的,差异非常明显,但在图表上,数据看起来几乎是完全相同的点(或者非常接近)。

2 个回答

0

我觉得通过

splinedY[i] - filteredData[1][i]

来计算差值的方式是有问题的。'splinedY[i]' 是通过在 univariate_spline_interpolation() 函数中使用均匀间隔的 X 值(也就是 splinedDataX)计算出来的。而 splinedDataX 和 'sensorTimestamp' 是不一样的,所以比较它们对应的 Y 值是没有意义的。

我在网上查了查关于如何在 Python 中写 for 循环的内容,我觉得问题出在这句

for i in splinedY:

这里的 'i' 实际上是数组元素的值,而不是索引。正确的写法应该是

for i in range(len(splinedY)):

0
splinedY[i] - filteredData[1][i]

看起来当你用下面的代码计算差值时,数据在x轴上没有对齐,因此你减去的值并不在x轴的同一个点上。经过插值处理的数据时间戳稍微向右偏移,因为在 univariate_spline_interpolation 函数中,

startPt = dataX[1]

虽然点的数量和输入的x数据是一样的。这行代码可能需要改成

startPt = dataX[0]

撰写回答