I have 2 CSV files. One of them contains sentences like the following:
how are you
I want to die
I was home
I went to sleep at work
he has a bad reputation
it was me who went to him
have a good sleep home
The other CSV file contains words with their frequencies, like this:
word freq
and 500
you 450
me 300
have 250
your 240
sleep 200
work 150
home 100
die 50
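I am trying to extract sentences into a new CSV file based on the words whose frequency is between 300 and 100, and to delete each sentence from the main CSV file once it has been extracted, because duplicates sometimes appear when I search for new keywords or words. This is the code I managed to build, but it does not give the output I want: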
import sys
import pandas as pd
import re
import string

if len(sys.argv) == 1:
    print("please provide a CSV file to analys")
else:
    fileinput = sys.argv[1]
    dic = sys.argv[2]
    wdata = pd.read_csv(fileinput, nrows=0).columns[0]
    skip = int(wdata.count(' ') == 0)
    wdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip)
    data = wdata['sentences'].str.replace('[^\w\s]', ' ')
    keywords=pd.read_csv(dic)
    keywords=keywords.loc[keywords['freq'].between(100, 300, inclusive=False), 'word']
    df1 = data[data['sentences'].str.split(expand=True).isin(keywords).any(axis=1)]
    #deleted rows by keywords
    df2 = data[~data['sentences'].str.split(expand=True).isin(keywords).any(axis=1)]
    print(df1)
I don't know how to delete the sentences from the main file after extracting them. My expected output looks like this:
I think you first need to select the keywords by their frequency.
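The keyword selection step might look something like the sketch below. It is only an illustration under a few assumptions: the frequency table is inlined as whitespace-separated text instead of being read from the second CSV file, the helper name `select_keywords` is made up for this example, and both bounds (100 and 300) are treated as inclusive. The question's code used exclusive bounds, which would drop `me` and `home`. Plain comparisons are used rather than `between` because the accepted values of its `inclusive` argument differ between pandas versions.

```python
import io
import pandas as pd

# Inline copy of the frequency table from the question (assumed whitespace-separated).
FREQ_TEXT = """word freq
and 500
you 450
me 300
have 250
your 240
sleep 200
work 150
home 100
die 50
"""


def select_keywords(freq_df, low=100, high=300):
    """Return the words whose frequency lies in [low, high], both ends included.

    Hypothetical helper for this example; not part of pandas.
    """
    mask = (freq_df["freq"] >= low) & (freq_df["freq"] <= high)
    return freq_df.loc[mask, "word"].tolist()


freq_df = pd.read_csv(io.StringIO(FREQ_TEXT), sep=r"\s+")
keywords = select_keywords(freq_df)
print(keywords)
# ['me', 'have', 'your', 'sleep', 'work', 'home']
```

If the boundary words should be excluded, calling `select_keywords(freq_df, 101, 299)` gives the same result as strict bounds for integer frequencies.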
Then select the sentences that contain any of those keywords, and keep the rest as the remaining main data.
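As a rough sketch of that second step, one boolean mask can produce both the extracted sentences and the ones left over, so nothing needs to be deleted row by row. Several things here are assumptions rather than part of the original post: the helper name `split_by_keywords`, the hard-coded keyword list (the words with frequency from 100 to 300 inclusive), the lowercasing before matching, and the output file names in the commented-out lines.

```python
import pandas as pd


def split_by_keywords(sentences, keywords):
    """Split a Series of sentences into (matched, remaining).

    A sentence is matched when any of its whole words is in `keywords`.
    Hypothetical helper for this example; not part of pandas.
    """
    # Replace punctuation with spaces, lowercase, then put one word per column.
    words = (
        sentences.str.replace(r"[^\w\s]", " ", regex=True)
        .str.lower()
        .str.split(expand=True)
    )
    # True for rows where at least one word is a keyword.
    mask = words.isin(keywords).any(axis=1)
    return sentences[mask], sentences[~mask]


sentences = pd.Series(
    [
        "how are you",
        "I want to die",
        "I was home",
        "I went to sleep at work",
        "he has a bad reputation",
        "it was me who went to him",
        "have a good sleep home",
    ],
    name="sentences",
)
# Assumed keyword list: words with frequency between 100 and 300, inclusive.
keywords = ["me", "have", "your", "sleep", "work", "home"]

matched, remaining = split_by_keywords(sentences, keywords)
print(matched.tolist())
# ['I was home', 'I went to sleep at work', 'it was me who went to him', 'have a good sleep home']
print(remaining.tolist())
# ['how are you', 'I want to die', 'he has a bad reputation']

# To persist the result, the matched rows could go to a new file and the
# remaining rows could overwrite the main file (file names are placeholders):
# matched.to_csv("extracted.csv", index=False, header=False)
# remaining.to_csv("sentences.csv", index=False, header=False)
```

Overwriting the main file with `remaining` is what removes the extracted sentences, so a later search with new keywords cannot pick them up again.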