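<p>I am training a Python (2.7.11) classifier for text classification. When I run it, I get a deprecation warning, and I cannot tell which line of my code triggers it. The code still runs fine and gives results.</p>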
<blockquote>
<p>\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.</p>
</blockquote>
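<p>One general way to answer "which line?" is to make Python raise the warning as an exception, so that the traceback names the calling line. Below is a minimal standard-library sketch of the idea. <code>legacy_predict</code> and <code>accuracy_like</code> are hypothetical stand-ins: the first plays a library call that warns on 1-D input, and the second plays a helper that calls it once per sample.</p>

```python
import traceback
import warnings


def legacy_predict(sample):
    """Hypothetical library call: warns when given one flat (1-D) sample."""
    if sample and not isinstance(sample[0], (list, tuple)):
        warnings.warn("Passing 1d arrays as data is deprecated",
                      DeprecationWarning)
        sample = [sample]  # treat the flat list as a single row
    return [sum(row) for row in sample]


def accuracy_like(rows):
    """Hypothetical helper that calls legacy_predict once per row."""
    return [legacy_predict(row) for row in rows]


def find_warning_origin(func, *args):
    """Run func with DeprecationWarning escalated to an error.

    Returns the formatted traceback, or None if nothing warned."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func(*args)
        except DeprecationWarning:
            return traceback.format_exc()
    return None


# The traceback lists the accuracy_like frame, i.e. the caller that passed 1-D data
print(find_warning_origin(accuracy_like, [[1, 2], [3, 4]]))
```

<p>In the real script, the equivalent would be calling <code>warnings.simplefilter("error", DeprecationWarning)</code> near the top, after the scikit-learn imports. scikit-learn may install its own filter for its deprecation warnings on import, so the order can matter. The resulting traceback should point at the line of your own code that passed the 1-D data.</p>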
<p>My code:</p>
<pre><code>import csv
import sys

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def main():
    data = []
    folds = 10
    ex = [[] for _ in range(folds)]
    # open every tab-separated file given on the command line
    for f in sys.argv[1:]:
        data.append(csv.DictReader(open(f, 'r'), delimiter='\t'))
    # distribute the rows round-robin over the folds
    for f in data:
        for i, datum in enumerate(f):
            ex[i % folds].append(datum)
    for held_out in range(folds):
        l = []
        cor = []
        l_test = []
        cor_test = []
        vec = []
        vec_test = []
        # split labels/texts into the training folds and the held-out fold
        for i, fold in enumerate(ex):
            for line in fold:
                if i == held_out:
                    l_test.append(line['label'].rstrip("\n"))
                    cor_test.append(line['text'].rstrip("\n"))
                else:
                    l.append(line['label'].rstrip("\n"))
                    cor.append(line['text'].rstrip("\n"))
        vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=1)
        X = vectorizer.fit_transform(cor)
        # each tmp[0] is a 1-D array holding one document's word counts
        for c in cor:
            tmp = vectorizer.transform([c]).toarray()
            vec.append(tmp[0])
        for c in cor_test:
            tmp = vectorizer.transform([c]).toarray()
            vec_test.append(tmp[0])
        clf = MultinomialNB()
        clf.fit(vec, l)
        # accuracy() is my own helper, defined elsewhere (not shown)
        result = accuracy(l_test, vec_test, clf)
        print(result)


if __name__ == "__main__":
    main()
</code></pre>
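<p>Each element of <code>vec_test</code> is a 1-D array (<code>tmp[0]</code>). That makes the <code>accuracy</code> helper, which is not shown, a likely source: if it passes those rows to <code>clf.predict</code> one at a time, each call receives 1-D data. Below is a minimal sketch that keeps everything 2-D. It assumes the helper can be replaced by scikit-learn's <code>accuracy_score</code>. <code>train_and_score</code> and <code>predict_one</code> are hypothetical names, and the toy texts are made up.</p>

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB


def train_and_score(train_texts, train_labels, test_texts, test_labels):
    """Fit on the training folds and score on the held-out fold, all 2-D."""
    vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=1)
    # fit_transform already returns an (n_samples, n_features) sparse matrix,
    # so re-transforming each document separately is unnecessary
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    clf = MultinomialNB()
    clf.fit(X_train, train_labels)
    predicted = clf.predict(X_test)  # 2-D input, so no 1-D deprecation warning
    return vectorizer, clf, accuracy_score(test_labels, predicted)


def predict_one(vectorizer, clf, text):
    """Predict a single document; wrapping it in a list keeps the input 2-D."""
    return clf.predict(vectorizer.transform([text]))[0]


train_texts = ["good great fun", "bad awful boring", "great fun film", "awful bad plot"]
train_labels = ["pos", "neg", "pos", "neg"]
vectorizer, clf, acc = train_and_score(train_texts, train_labels,
                                       ["fun great", "boring awful"], ["pos", "neg"])
print(acc)
print(predict_one(vectorizer, clf, "bad plot"))
```

<p>The same rule covers a single sample held as a NumPy row: <code>row.reshape(1, -1)</code>, as the warning text suggests, turns it into a one-row 2-D array.</p>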
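<p>Do you know which line issues this warning?</p>
<p>Another problem: running this code on different datasets gives the same accuracy, and I cannot figure out what is causing it.</p>
<p>I also want to use this model in another Python process. I looked at the documentation and found an example that uses the pickle library, but none for joblib. So I tried to follow the same code, but it gives me an error:</p>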
<pre><code>clf = joblib.load('model.pkl')
pred = clf.predict(vec)
</code></pre>
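<p>The actual error message is not shown, so the following is only one possible cause. In a fresh process, <code>vec</code> does not exist. New text has to be transformed by the same fitted <code>CountVectorizer</code> before <code>predict</code>, so the vectorizer must be saved along with the classifier, and the model has to be written with <code>joblib.dump</code> before it can be loaded. The sketch below swaps in a scikit-learn <code>Pipeline</code> so that a single file holds both steps. The helper names and texts are hypothetical. The import fallback assumes either the standalone <code>joblib</code> package or the copy bundled with older scikit-learn releases.</p>

```python
import os
import tempfile

try:
    import joblib  # standalone package, used by newer scikit-learn
except ImportError:
    from sklearn.externals import joblib  # copy bundled with older scikit-learn

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def save_model(texts, labels, path):
    """Fit vectorizer and classifier as one pipeline, then write it to path."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    joblib.dump(model, path)
    return model


def load_and_predict(path, new_texts):
    """Load the saved pipeline (e.g. in another process) and predict raw text."""
    model = joblib.load(path)
    # the pipeline vectorizes the list of raw strings itself before predicting
    return list(model.predict(new_texts))


path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(["good great fun", "bad awful boring", "great fun film", "awful bad plot"],
           ["pos", "neg", "pos", "neg"], path)
print(load_and_predict(path, ["fun great", "bad plot"]))
```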
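<p>The identical-accuracy question has no single answer from the code shown, but one quick check is possible. Compare the reported accuracy with the majority-class baseline, and see whether the predictions contain only one label. If both hold, the classifier may be predicting a single class on every dataset. This is a standard-library sketch with hypothetical helper names, not a diagnosis.</p>

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for y in test_labels if y == most_common)
    return float(hits) / len(test_labels)


def prediction_spread(predictions):
    """Count how often each label was predicted."""
    return dict(Counter(predictions))


print(majority_baseline(["pos", "pos", "neg"], ["pos", "neg", "pos", "pos"]))
print(prediction_spread(["pos", "pos", "pos"]))
```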
<p>Also, my data is a CSV (tab-separated) file in the format "label\t text\n". What should the label column in the test data contain?</p>
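<p>Labels in a test file are only used to score the predictions. For evaluation, the test rows need their true labels, in the same <code>label\ttext</code> layout as the training file. <code>predict</code> itself only needs the texts. Below is a minimal standard-library sketch of reading such a file. It assumes a header row with <code>label</code> and <code>text</code> columns, as the <code>DictReader</code> code in the question expects. <code>read_tsv</code> is a hypothetical name, and rows with an empty or missing label get <code>None</code>.</p>

```python
import csv


def read_tsv(lines):
    """Split rows of a tab-separated file into texts and labels.

    lines can be an open file or any iterable of strings; a header row naming
    the 'label' and 'text' columns is assumed. Missing or empty labels become
    None, so unlabeled texts can still be passed to predict()."""
    texts, labels = [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        texts.append(row["text"].rstrip("\n"))
        labels.append(row.get("label") or None)
    return texts, labels


texts, labels = read_tsv(["label\ttext\n", "pos\tgreat fun\n", "\tboring plot\n"])
print(texts)
print(labels)
```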
<p>Thanks in advance.</p>