Filtering a DataFrame by searching for a keyword in a text column

0    [ #bbcqt Remoaners on about post Brexit racial...
1    [@sarahwollaston Shut up, you like all remoane...
2    [ what have the Brextremists ever done for us ...
3                     [ Remoaner in bizarre outburst ]
4    [ Anyone who disagrees with brexit is called n...
5    [ @SkyNewsBreak They forecasted if the vote wa...
6    [ but we ARE LEAVING THE #EU, even the #TORIES...
7    [ Can unelected Remoaner peers not see how abs...
8    [@sizjam68 @LeaveEUOfficial @johnredwood It wo...
9    [ Hey @BBC have you explained why when award w...
Name: text, dtype: object

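2 Answers

User

#1 · edited 2024-06-16 12:48:44

The problem is that the string you want to search for the substring remoaners is wrapped in a list in each cell. Before calling str.contains, you need to pull that string out of the list with str[0], for example: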

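import pandas as pd

# input: each cell holds a list containing a single string
time_plus_text = pd.DataFrame({'text':[['#bbcqt Remoaners on about post Brexit racial...'], 
                                       ['@sarahwollaston Shut up, you like all remoaners...'],
                                       ['what have the Brextremists ever done for us ...']]})
# .str[0] takes the first element of each list, then .str.contains searches that string
print(time_plus_text["text"].str[0].str.contains("remoaners", case=False, na=False))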
0     True
1     True
2    False
Name: text, dtype: bool

So you should use:

remoaners_only = time_plus_text[time_plus_text["text"].str[0]\
                                             .str.contains("remoaners", case=False, na=False)]
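The str[0] approach above only looks at the first element of each list. If some cells might hold several strings, or an empty list, a more general check is to test every element. The following is a minimal sketch under that assumption; the sample rows and the helper name any_item_contains are hypothetical and not part of the original question.

```python
import pandas as pd

# Hypothetical sample data: each cell holds a list of zero or more strings.
time_plus_text = pd.DataFrame({
    "text": [
        ["#bbcqt Remoaners on about post Brexit racial..."],
        ["first part", "you like all remoaners..."],
        ["what have the Brextremists ever done for us ..."],
        [],
    ]
})


def any_item_contains(items, keyword):
    """Return True if any string in `items` contains `keyword`, ignoring case."""
    if not isinstance(items, (list, tuple)):
        return False
    needle = keyword.lower()
    return any(isinstance(s, str) and needle in s.lower() for s in items)


# Build a boolean mask row by row, then use it to filter the frame.
mask = time_plus_text["text"].apply(lambda items: any_item_contains(items, "remoaners"))
remoaners_only = time_plus_text[mask]
print(remoaners_only)
```

This does a plain substring match rather than a regular-expression match, so it is slower than the vectorised str.contains on large frames but does not depend on where the string sits in the list.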

User

#2 · edited 2024-06-16 12:48:44

Your code works. So you need to check your input data, or which pandas bug-fix release you are running, e.g. 0.24.1 vs 0.24.2.

0.24.2
   index                                               text
0      0     [ #bbcqt Remoaners on about post Brexit rac...

import pandas as pd
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

print(pd.__version__)

csvdata = StringIO("""0,   [ #bbcqt Remoaners on about post Brexit racial...
1,   [@sarahwollaston Shut up, you like all remoane...
2,   [ what have the Brextremists ever done for us ...
3,                    [ Remoaner in bizarre outburst ]
4,   [ Anyone who disagrees with brexit is called n...
5,   [ @SkyNewsBreak They forecasted if the vote wa...
6,   [ but we ARE LEAVING THE #EU, even the #TORIES...
7,   [ Can unelected Remoaner peers not see how abs...
8,   [@sizjam68 @LeaveEUOfficial @johnredwood It wo...
9,   [ Hey @BBC have you explained why when award w...""")

df = pd.read_csv(csvdata, names=["index", "text"], sep=",")

result = df[df["text"].str.contains("remoaners", case=False, na=False)]

# results
print(result)
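One way to check the input data, as suggested above, is to look at the Python type stored in each cell: a column of plain strings can be passed to str.contains directly, while a column of lists cannot. This is only a sketch; the two small frames and the helper cell_types are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: one holds plain strings, the other holds lists of strings.
df_strings = pd.DataFrame({"text": ["[ #bbcqt Remoaners on about ...",
                                    "[ Remoaner in bizarre outburst ]"]})
df_lists = pd.DataFrame({"text": [["#bbcqt Remoaners on about ..."],
                                  ["Remoaner in bizarre outburst"]]})


def cell_types(frame, column="text"):
    """Count how many cells of each Python type the column contains."""
    return frame[column].map(lambda v: type(v).__name__).value_counts().to_dict()


print(cell_types(df_strings))  # cells are str
print(cell_types(df_lists))    # cells are list
```

If the count shows list rather than str, the strings need to be extracted from the lists before filtering.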
