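I am trying a different approach to this dataset: https://www.kaggle.com/aaron7sun/stocknews. The code below gives this error: "ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0')"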
import pandas as pd
from bs4 import BeautifulSoup
import re
import nltk
nltk.download("stopwords")  # fetch only the stop word corpus used below
from nltk.corpus import stopwords
data = pd.read_csv("/Users/s7c/Documents/Untitled Folder/Combined_News_DJIA.csv/Combined_News_DJIA.csv")
data.info()
def news_to_words(reddit_news):
    # 1. Remove HTML (name the parser explicitly to avoid the bs4 warning)
    newstxt = BeautifulSoup(reddit_news, "html.parser").get_text()
    # 2. Remove non-letters
    ltrs = re.sub("[^a-zA-Z]", ' ', newstxt)
    # 3. Convert to lower case and split into individual words
    wrd = ltrs.lower().split()
    # 4. In Python, searching a set is much faster than searching
    #    a list, so convert the stop words to a set
    st = set(stopwords.words("english"))
    # 5. Remove stop words
    meaningful_words = [w for w in wrd if w not in st]
    # 6. Join the words back into one string separated by spaces
    #    and return the result
    return " ".join(meaningful_words)
train = data[data['Date'] < '2015-01-01']
test = data[data['Date'] > '2014-12-31']
#method of combining all headlines
train_comb=train.iloc[:,2:27].apply(lambda row: ''.join(str(row.values)), axis=1)
test_comb=test.iloc[:,2:27].apply(lambda row: ''.join(str(row.values)), axis=1)
new_train_comb = []
for i in range(0, len(train_comb)):
    new_train_comb.append(news_to_words(train_comb))
The error appears when I try to call the function in the loop. Can you help me?
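Silly mistake. I forgot the index when iterating over the train_comb Series: the loop passes the whole Series to news_to_words on every pass instead of a single element such as train_comb[i].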
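For anyone hitting the same message, here is a minimal sketch of the fix. It is not the original script: the two toy headlines stand in for train_comb, and STOP_WORDS is a hypothetical replacement for the nltk stop word list so the snippet runs without bs4 or any downloads. It shows that a whole Series has no single truth value, and then two ways of handing news_to_words one string at a time.

```python
import re
import pandas as pd

# Hypothetical stand-in for nltk's English stop word list (assumption,
# used only so this sketch runs without downloading corpora).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is"}


def news_to_words(text):
    """Keep letters only, lowercase, and drop stop words from one string."""
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    return " ".join(w for w in words if w not in STOP_WORDS)


# Toy data standing in for train_comb: one combined headline string per row.
train_comb = pd.Series(["The markets rally on Monday", "Oil prices fall in Asia"])

# A multi-element Series cannot be reduced to a single True/False;
# this is the ambiguity the ValueError complains about.
try:
    bool(train_comb)
    ambiguous = False
except ValueError:
    ambiguous = True

# Fix 1: index into the Series so each call receives a single string.
new_train_comb = []
for i in range(len(train_comb)):
    new_train_comb.append(news_to_words(train_comb.iloc[i]))

# Fix 2: let pandas do the iteration with Series.apply.
new_train_comb_apply = train_comb.apply(news_to_words).tolist()
```

Using .iloc[i] keeps the loop positional, so it still works if the Series index does not start at 0 (as can happen with the test split after filtering by date).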