How to create a CSV file containing words and their frequencies in Python
I removed the stop words, then tokenized and count-vectorized the text data.
My code:
import string
from sklearn.feature_extraction.text import CountVectorizer
# Lowercase every whitespace-separated token
data['Clean_addr'] = data['Adj_Addr'].apply(lambda x: ' '.join(item.lower() for item in x.split()))
# Drop digit characters
data['Clean_addr'] = data['Clean_addr'].apply(lambda x: ''.join(ch for ch in x if not ch.isdigit()))
# Drop punctuation characters
data['Clean_addr'] = data['Clean_addr'].apply(lambda x: ''.join(ch for ch in x if ch not in string.punctuation))
# Drop stop words (new_stop_words is defined elsewhere)
data['Clean_addr'] = data['Clean_addr'].apply(lambda x: ' '.join(item for item in x.split() if item not in new_stop_words))
cv = CountVectorizer(max_features=200, analyzer='word')
cv_addr = cv.fit_transform(data.pop('Clean_addr'))
A sample dump of the file I am using:
https://www.dropbox.com/s/allhfdxni0kfyn6/Test.csv?dl=0
**Expected output**
Word Freq
Industry 40
Limited 23
House 45
flat 56
You can first create a DataFrame from the vectorized counts and then take the sum of each column.