How do I create an inverted index for columns in a DataFrame?

Posted 2024-04-16 17:15:14


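I built a DataFrame from scraped data, removed punctuation and stop words, and tokenized the text. How do I create an inverted index for the name and brand columns?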

import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer
import pandas as pd
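# macys_df is the DataFrame built from the scraped data (construction not shown).
# Tokenize on runs of word characters; this already drops punctuation.
tokens = RegexpTokenizer(r'\w+')
macys_df['name'] = macys_df['name'].apply(lambda x: tokens.tokenize(x.lower()))
macys_df.head()
# Needs the NLTK stop word corpus: nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
stop_words = stop_words.union(",","(",")","[","]","{","}","#","@","!",":",";",".","?")

# Keep only the tokens that are not stop words.
macys_df['name'] = macys_df['name'].apply(lambda x: [item for item in x if item not in stop_words])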
Output - 
macys_df['name'].head()
0    [versa, 2, black, elastomer, strap, touchscree...
1    [men, digital, black, resin, strap, watch, 50,...
2      [versa, lite, white, strap, smart, watch, 39mm]
3    [access, mkgo, black, silicone, strap, touchsc...
4    [inspire, black, strap, activity, tracker, 19,...
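
One possible way to build the index, given token lists like the ones above, is to map each token to the set of row labels where it occurs. The sketch below is only an illustration: `build_inverted_index` is a hypothetical helper name, and the `sample` rows are made up for the demo (only row 2 is copied from the output, since the other rows are truncated there). It assumes every cell already holds a list of tokens.

```python
from collections import defaultdict

def build_inverted_index(token_lists):
    """Map each token to the sorted list of row labels whose token list contains it.

    token_lists: any object whose .items() yields (row_label, list_of_tokens),
    for example a pandas Series such as macys_df['name'] or a plain dict.
    """
    index = defaultdict(set)
    for row_label, tokens in token_lists.items():
        for token in tokens:
            # A set avoids recording the same row twice for a repeated token.
            index[token].add(row_label)
    return {token: sorted(rows) for token, rows in index.items()}

# Made-up sample data standing in for macys_df['name'] (assumption, not real rows).
sample = {
    0: ["versa", "black", "strap"],
    1: ["men", "black", "strap", "watch"],
    2: ["versa", "lite", "white", "strap", "smart", "watch", "39mm"],
}

name_index = build_inverted_index(sample)
print(name_index["versa"])  # [0, 2]
print(name_index["watch"])  # [1, 2]
```

With the real data this would presumably be called as `build_inverted_index(macys_df['name'])`. The brand column would first need the same tokenizing and stop word filtering as name, because the helper expects a list of tokens in every cell.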
