When I run the program below, I can print how often each word occurs. How can I save this as a DataFrame instead, with each token and its count?
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# min_df=1 keeps every term (recent scikit-learn rejects an integer min_df of 0)
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             min_df=1,
                             max_features=50)
text = ["Hello I am going to I with hello am"]
# Count
train_data_features = vectorizer.fit_transform(text)
# get_feature_names() on older scikit-learn releases
vocab = vectorizer.get_feature_names_out()
# Sum up the counts of each vocabulary word
dist = np.sum(train_data_features.toarray(), axis=0)
# For each, print the vocabulary word and the number of times it
# appears in the training set
for tag, count in zip(vocab, dist):
    print(count, tag)
Output:
2 am
1 going
2 hello
1 to
1 with
Just combine vocab and dist and use pandas to convert them into a DataFrame.
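A minimal sketch of that suggestion, assuming pandas is installed. To keep it self-contained, vocab and dist here are stand-in values copied from the printed output above; in the real script they would come from the vectorizer and the np.sum call. The column names "word" and "count" are arbitrary choices, not something the question prescribes.

```python
import pandas as pd

# Stand-in values copied from the printed output; in the original script
# vocab comes from the vectorizer and dist from np.sum(..., axis=0).
vocab = ["am", "going", "hello", "to", "with"]
dist = [2, 1, 2, 1, 1]

# Build one row per token, with its count in a second column.
df = pd.DataFrame({"word": vocab, "count": dist})

# Optional: put the most frequent tokens first and renumber the rows.
df = df.sort_values("count", ascending=False).reset_index(drop=True)

print(df)
```

From here the frame can be written out with df.to_csv("counts.csv", index=False) if a file is wanted; the filename is only an example.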