Unable to understand the forecasting code in Pandas

import math

import pandas as pd
import quandl

df = quandl.get('WIKI/GOOGL')
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'

# filling the NaN data
df.fillna(-99999, inplace=True)  # this line I am unable to understand
forecast_out = int(math.ceil(0.02 * len(df)))  # this line I am unable to understand
df['label'] = df[forecast_col].shift(-forecast_out)
df.dropna(inplace=True)
print(df.head())

2 Answers

User

Answer 1 · edited 2024-04-23 13:51:54

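In machine learning you typically have data samples, each with features and a label (many APIs expect this layout, scikit-learn for example). In your case, each sample is one row of the DataFrame, and the value to predict is forecast_col. Since you are looking at stock data, you want to predict what will happen in the future. "Predicting" what is happening right now makes no sense, because you can simply observe it. The forecast_out value is an arbitrary choice; here it says how many rows ahead you will be predicting 'Adj. Close'.

The shift method aligns each observation with the future value to be predicted. Once the DataFrame is in that shape, you can easily use scikit-learn to fit a model.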

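import sklearn.linear_model
lr = sklearn.linear_model.LinearRegression()
# Fit against the shifted 'label' column so the target is the future price
lr.fit(df[['HL_PCT', 'PCT_change', 'Adj. Volume']], df['label'])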

This model will use the current observations to predict what will happen forecast_out days later.
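As a rough illustration of the alignment described above, here is a minimal sketch on made-up numbers. The toy DataFrame, the horizon of 2 rows, and the single feature are assumptions for the example, not the GOOGL data. Each row's label is the 'Adj. Close' value two rows later, and the model is fit against that label:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data (assumed): the price rises by exactly 1 per row.
toy = pd.DataFrame({'Adj. Close': [float(i) for i in range(10)]})
horizon = 2  # hypothetical stand-in for forecast_out

# Label = the value `horizon` rows in the future.
toy['label'] = toy['Adj. Close'].shift(-horizon)
train = toy.dropna()  # the last `horizon` rows have no future value yet

model = LinearRegression()
model.fit(train[['Adj. Close']], train['label'])

# The rows that had no label are the ones we can forecast from.
latest = toy[['Adj. Close']].tail(horizon)
predictions = model.predict(latest)
print(predictions)
```

Because the toy series is perfectly linear, the fitted model simply learns "future price = current price + 2"; real price data will of course not be this clean.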

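User

Answer 2 · edited 2024-04-23 13:51:54

I was following the same tutorial and got stuck on the same problem. Here is how I worked it out. math.ceil() rounds a number up to the next whole number, for example:

  math.ceil(4.5)

rounds up to:

  5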

The code then takes the product:

(0.02*len(df))

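len(df) is basically the size of the dataset, which in this case is 3424:

print(len(df))

In other words, we have 3424 days of data and now want to predict the future. Obviously we will not forecast over a horizon of 3424 days; instead we look a short distance ahead. In our case that is 69 days (2% of the total data) beyond the last data point given to the classifier, and we see what the price is at that point.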

So, to summarize:

 forecast_out = int(math.ceil(0.02*len(df)))

equals 69.
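The arithmetic can be checked on its own, without downloading any data. This is just a sketch that plugs in the row count quoted above (3424); your own dataset may have a different length:

```python
import math

n_rows = 3424                 # dataset size quoted above
raw = 0.02 * n_rows           # 2% of the rows, still a float (about 68.48)
forecast_out = int(math.ceil(raw))  # ceil always rounds up to a whole number
rounded = round(raw)          # ordinary rounding would give a smaller horizon

print(raw, forecast_out, rounded)
```

Using math.ceil rather than round guarantees the horizon is at least 1 row even for very small datasets.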

Now we use the variable forecast_out to build the label:

  df['label'] = df[forecast_col].shift(-forecast_out)

This expression shifts the column upward by forecast_out rows, so the label on each row is the stock price 69 days later.
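To see what a negative shift does, here is a small sketch on a made-up five-row Series (the values and the shift of 2 are assumptions chosen to keep the output short):

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name='Adj. Close')

# shift(-2) moves every value up two rows; the last two rows become NaN
label = prices.shift(-2)

aligned = pd.DataFrame({'Adj. Close': prices, 'label': label})
print(aligned)
```

Row 0 now pairs the price 10.0 with the label 12.0, the price two rows later. The final two rows have no future value available, which is why they come out as NaN.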

Below is the code with a little more detail; you can try it out.

forecast_col = 'Adj. Close'
df.fillna(-99999, inplace=True)

forecast_out = int(math.ceil(0.02 * len(df)))
print("Dataset= " + str(len(df)))
print("Forecasting after how many days = " + str(forecast_out))
df['label'] = df[forecast_col].shift(-forecast_out)
df.dropna(inplace=True)
print(df.tail())
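The order of fillna and dropna in that snippet matters. A small sketch on made-up data (the toy values and the horizon of 2 are assumptions) shows the effect: fillna(-99999) replaces missing inputs with an extreme sentinel value so those rows are kept, and the later dropna() therefore removes only the trailing rows whose label became NaN after the shift.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Adj. Close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    'Adj. Volume': [100.0, np.nan, 120.0, 130.0, 140.0, 150.0],  # one gap
})

# Replace missing inputs with a sentinel so the row survives dropna() later.
toy = toy.fillna(-99999)

horizon = 2  # hypothetical stand-in for forecast_out
toy['label'] = toy['Adj. Close'].shift(-horizon)

n_before = len(toy)
toy = toy.dropna()  # only the NaN labels created by shift are left to drop
print(n_before, len(toy))
print(toy)
```

The row with the missing volume is still present, carrying -99999, while the last two rows are gone because there is no price two rows after them.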
