How do I load a CountVectorizer from a list of n-grams?

Posted 2024-04-25 19:21:34


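I made a silly mistake: I didn't pickle my CountVectorizer. What I have instead is a list of all the n-grams it produced, about 3,500 features.

My problem now is that I need to load a CountVectorizer model from this list of n-grams. Is there any way to do that? The list is currently stored in a pd.DataFrame.

I was hoping I could do something like this:

CV = CountVectorizer("loadMyListofnGrams")

Any help would be greatly appreciated.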


Tags: model, dataframe, list, vectorizer, error, feature, vector, cv
1 answer
Anonymous user
Answer 1 · Posted 2024-04-25 19:21:34

You can do this by fitting a CountVectorizer on your list of n-grams, treating each n-gram as if it were a short document:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

ngrams = ['coffee', 'darkly', 'darkly colored', 'bitter', 'stimulating',
          'drinks', 'stimulating drinks']

new_docs = [
           'Coffee is darkly colored, bitter, slightly acidic and \
            has a stimulating effect in humans, primarily due to its \
            caffeine content.[3] ',
            'It is one of the most popular drinks \
            in the world,[4] and it can be prepared and presented in a \
            variety of ways (e.g., espresso, French press, caffè latte). '
            ]

# Instantiate CountVectorizer and train it with your ngrams
cv = CountVectorizer(ngram_range=(1, 2))
cv.fit(ngrams)
cv.vocabulary_

# Apply the vectorizer to new documents and display the dense matrix
counts = cv.transform(new_docs)
counts.toarray()

# Turn the results into a data frame
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
counts_df = pd.DataFrame(counts.toarray(), columns=cv.get_feature_names_out())
counts_df

Output

cv.vocabulary_
Out[10]: 
{'coffee': 1,
 'darkly': 3,
 'colored': 2,
 'darkly colored': 4,
 'bitter': 0,
 'stimulating': 6,
 'drinks': 5,
 'stimulating drinks': 7}

counts_df
Out[12]: 
   bitter  coffee  colored  darkly  darkly colored  drinks  stimulating  \
0       1       1        1       1               1       0            1   
1       0       0        0       0               0       1            0   

   stimulating drinks  
0                   0  
1                   0  
