Python Pandas filter and groupby

Posted 2024-04-25 04:26:54

I have a CSV loaded in pandas. Here are the first ten rows:

print(frame1.head(10))

      alert         Subject    filetype type      country   status
0  33965790    44676 aba     Attachment  doc  RU,RU,RU,RU  deleted
1  33965786    44676 rcrump  Attachment  zip          NaN  deleted
2  33965771            3aba  Attachment  zip          NaN  deleted
3  33965770             NaN  Attachment   js           ,,  deleted
4  33965766             NaN  Attachment   js           ,,  deleted
5  33965761             NaN  Attachment  zip          NaN  deleted
6  33965760             NaN  Attachment  zip          NaN  deleted
7  33965757             NaN  Attachment  zip          NaN  deleted
8  33965751  35200     3aba  Attachment  doc     RU,RU,RU  deleted
9  33965747  35200   INVaba  Attachment  zip          NaN  deleted

I need to take the Subject column and count all rows that contain "aba" as a substring.

^{pr2}$

Or even a result like this:

^{3}$

Here is my code:

targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject')
print (targeted.to_string(header=False))

This raises an AttributeError: to_string cannot be called on the groupby object.

Note: earlier I counted the different file types, and that works:

filetype = frame1.groupby('filetype').size()
# clean up the printing
print("Delivered in Email")
print (filetype.to_string(header=False))

which gives me:

Delivered in Email
Attachment    32647
Header          131
URL            9236

3 Answers

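To get the overall count, build a boolean mask with str.contains and then call sum() on it; each True counts as 1. (count() would instead return the number of non-missing entries in the mask, True and False alike.) On the ten sample rows above:

>>> print(df.Subject.str.contains('aba', case=False, na=False).sum())
4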
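As a minimal sketch of the difference between summing and counting a boolean mask, the snippet below rebuilds the Subject column from the ten sample rows in the question. The variable names (subjects, mask, matches, entries) are illustrative only and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Subject values transcribed from the ten sample rows in the question.
subjects = pd.Series([
    "44676 aba", "44676 rcrump", "3aba", np.nan, np.nan,
    np.nan, np.nan, np.nan, "35200 3aba", "35200 INVaba",
])

# na=False turns missing subjects into False instead of NaN.
mask = subjects.str.contains("aba", case=False, na=False)

matches = int(mask.sum())    # number of True entries, i.e. rows containing "aba"
entries = int(mask.count())  # number of non-missing entries in the mask

print(matches, entries)
```

Because na=False leaves no missing values in the mask, counting it simply returns the number of rows, while summing it returns the number of matches.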

Then, to get a count for each unique string that contains 'aba', select the values that contains found and then use ^{}.

^{pr2}$
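One way to count each distinct matching string is sketched below. It assumes value_counts is a suitable method for this step (the post does not confirm which method was meant) and reuses Subject values transcribed from the question's sample rows. The names subjects and per_subject are hypothetical.

```python
import numpy as np
import pandas as pd

# Subject values transcribed from the ten sample rows in the question.
subjects = pd.Series([
    "44676 aba", "44676 rcrump", "3aba", np.nan, np.nan,
    np.nan, np.nan, np.nan, "35200 3aba", "35200 INVaba",
], name="Subject")

mask = subjects.str.contains("aba", case=False, na=False)

# Assumption: value_counts() is used to tally each distinct matching subject.
per_subject = subjects[mask].value_counts()

print(per_subject.to_string(header=False))
```

In the sample every matching subject is distinct, so each tally is 1. On the full file, repeated subjects would show larger counts.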

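For the first output you suggested, you can do the following:

contains_aba = frame1[frame1['Subject'].str.contains('aba', case=False, na=False)]
print("Occurrences of aba-", len(contains_aba))

This creates another DataFrame filtered by your condition; its length is the number of occurrences, which you can then print.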

targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject').size()
print (targeted.to_string(header=False))

gives

^{pr2}$
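The sketch below suggests why adding size() makes the difference: groupby on its own returns a GroupBy object, and only an aggregation such as size() produces a Series that has to_string. The frame holds two columns transcribed from the question's sample rows. The intermediate names filtered and grouped are illustrative.

```python
import numpy as np
import pandas as pd

# Two columns transcribed from the ten sample rows in the question.
frame1 = pd.DataFrame({
    "Subject": ["44676 aba", "44676 rcrump", "3aba", np.nan, np.nan,
                np.nan, np.nan, np.nan, "35200 3aba", "35200 INVaba"],
    "type": ["doc", "zip", "zip", "js", "js", "zip", "zip", "zip", "doc", "zip"],
})

# Keep only rows whose Subject contains "aba" (case-insensitive, NaN -> False).
filtered = frame1[frame1["Subject"].str.contains("aba", case=False, na=False)]

grouped = filtered.groupby("Subject")  # GroupBy object, nothing aggregated yet
targeted = grouped.size()              # Series: row count per distinct Subject

print(type(grouped).__name__)
print(targeted.to_string(header=False))
```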
