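<p>First of all, I have checked the other posts about this error, and none of them solved my problem.</p>
<p>I am using RandomForest. I can build the forest and make predictions, but sometimes, while the forest is being built, I get the following error.</p>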
<blockquote>
<p>ValueError: Input contains NaN, infinity or a value too large for dtype('float32').</p>
</blockquote>
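<p>The error occurs with the same dataset every time. Sometimes that dataset triggers the error during training, but most of the time it does not. The error sometimes appears at the start of training and sometimes in the middle.</p>
<p>Here is my code:</p>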
<pre><code>import pandas as pd
from sklearn import ensemble
import numpy as np

def azureml_main(dataframe1 = None, dataframe2 = None):
    # Execution logic goes here
    Input = dataframe1.values[:,:]
    InputData = Input[:,:15]
    InputTarget = Input[:,16:]
    limitTrain = 2175
    clf = ensemble.RandomForestClassifier(n_estimators = 10000, n_jobs = 4)
    features = np.empty([len(InputData),10])
    j = 0
    for i in range(0,14):
        if (i == 1 or i == 4 or i == 5 or i == 6 or i == 8 or i == 9 or i == 10 or i == 11 or i == 13 or i == 14):
            features[:,j] = (InputData[:, i])
            j += 1
    clf.fit(features[:limitTrain,:], np.asarray(InputTarget[:limitTrain,1], dtype = np.float32))
    res = clf.predict_proba(features[limitTrain+1:,:])
    listreu = np.empty([len(res),5])
    for i in range(len(res)):
        if (res[i,0] > 0.5):
            listreu[i,4] = 0
        elif (res[i,1] > 0.5):
            listreu[i,4] = 1
        elif (res[i,2] > 0.5):
            listreu[i,4] = 2
        else:
            listreu[i,4] = 3
    listreu[:,0] = features[limitTrain+1:,0]
    listreu[:,1] = InputData[limitTrain+1:,2]
    listreu[:,2] = InputData[limitTrain+1:,3]
    listreu[:,3] = features[limitTrain+1:,1]
    # Return value must be of a sequence of pandas.DataFrame
    return pd.DataFrame(listreu),
</code></pre>
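<p>I ran the code both locally and in <code>Azure ML</code> Studio, and the error occurs in both cases.</p>
<p>I am sure the problem is not my dataset, because most of the time I do not get the error, and I generate the dataset myself from a different input.</p>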
<p>Here is <a href="https://pastebin.com/mC76uq8P" rel="nofollow noreferrer">part of the dataset I use</a>.</p>
<p><strong>Edit:</strong> I may have found that some of my values look like 0 but are not really 0. The values look like this:</p>
<blockquote>
<p>3.0x10^-314</p>
</blockquote>
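<p>Values like the one above may come from uninitialized memory rather than from the data: <code>np.empty</code> does not initialize the array it returns, and in the code above <code>range(0,14)</code> stops at 13, so the <code>i == 14</code> branch never runs and only nine of the ten columns of <code>features</code> are filled. The following is a minimal diagnostic sketch, not a confirmed fix. The helper name <code>report_non_float32_safe</code> and the toy array are hypothetical and stand in for the real <code>features</code> matrix. It lists cells that are NaN, infinite, or too large for <code>float32</code>, and uses <code>np.zeros</code> so that unfilled columns show up as exact zeros.</p>

```python
import numpy as np

def report_non_float32_safe(X):
    """Return (row, col) indices of cells that are NaN, infinite, or overflow float32."""
    X = np.asarray(X, dtype=np.float64)
    limit = np.finfo(np.float32).max
    # Silence NaN-comparison warnings that some NumPy versions emit.
    with np.errstate(invalid="ignore"):
        bad = ~np.isfinite(X) | (np.abs(X) > limit)
    return np.argwhere(bad)

# Toy stand-in for `features` (assumption: the real matrix is built the same way).
# np.zeros is used instead of np.empty so unfilled columns are exactly 0.0.
features = np.zeros((4, 3))
features[:, 0] = [1.0, 2.0, 3.0, 4.0]
features[:, 1] = [0.5, np.nan, 1e40, -np.inf]
# Column 2 is deliberately left unfilled, like the last column in the post's loop.

bad_cells = report_non_float32_safe(features)
unfilled_columns = np.flatnonzero(~features.any(axis=0))

print("cells that would fail the float32 check:", bad_cells.tolist())
print("columns that were never filled:", unfilled_columns.tolist())
```

<p>Running a check of this kind on <code>features</code> just before <code>clf.fit</code> would show whether the failing cells sit in a column the loop never wrote to, which would be consistent with the error appearing only on some runs.</p>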