Converting to a DataFrame

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# min_df=1 (the default) keeps every term; newer scikit-learn releases
# validate parameters and may reject an integer min_df of 0
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             min_df=1,
                             max_features=50)

text = ["Hello I am going to I with hello am"]

# Count
train_data_features = vectorizer.fit_transform(text)
# scikit-learn >= 1.0; older releases use get_feature_names()
vocab = vectorizer.get_feature_names_out()

# Sum up the counts of each vocabulary word
dist = np.sum(train_data_features.toarray(), axis=0)

# For each, print the vocabulary word and the number of times it
# appears in the training set
for tag, count in zip(vocab, dist):
    print(count, tag)

1 answer

Anonymous user

#1 · Posted on 2024-05-19 23:25:31

Just combine vocab and dist, then use pandas to turn them into a DataFrame.

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# min_df=1 (the default) keeps every term; newer scikit-learn releases
# validate parameters and may reject an integer min_df of 0
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             min_df=1,
                             max_features=50)

text = ["Hello I am going to I with hello am"]

# Count
train_data_features = vectorizer.fit_transform(text)
# scikit-learn >= 1.0; older releases use get_feature_names()
vocab = vectorizer.get_feature_names_out()

# Sum up the counts of each vocabulary word
dist = np.sum(train_data_features.toarray(), axis=0)

# Pair each word with its count; the column names follow the (vocab, dist) order
pairs = list(zip(vocab, dist))
df = pd.DataFrame(pairs, columns=['tag', 'count'])
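
As a variation on the approach above, here is a minimal sketch that builds the DataFrame from a dict of columns, so the column names cannot get out of step with the data. The `get_names` lookup, the `counts` variable and the sort by frequency are my own additions for illustration, not part of the original answer; the lookup only exists so the sketch runs on both older and newer scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

text = ["Hello I am going to I with hello am"]

vectorizer = CountVectorizer(analyzer="word", max_features=50)
counts = vectorizer.fit_transform(text)  # sparse document-term matrix

# Assumption: prefer get_feature_names_out() and fall back to the older
# get_feature_names() when the newer method is not available.
get_names = getattr(vectorizer, "get_feature_names_out", None) or vectorizer.get_feature_names
vocab = list(get_names())

# Column-wise sum without densifying the whole matrix first
dist = np.asarray(counts.sum(axis=0)).ravel()

# Naming each column explicitly keeps words and counts correctly labelled
df = pd.DataFrame({"tag": vocab, "count": dist})

# Optional: most frequent words first
df = df.sort_values("count", ascending=False).reset_index(drop=True)
print(df)
```

Note that the default tokenizer lowercases the text and only keeps tokens of two or more characters, so "Hello" and "hello" are counted together and "I" does not appear in the result.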
