Adding a DataFrame column when a condition is met in a dict

Posted 2024-04-26 04:38:41


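I am trying to add a column to a pandas.DataFrame: if the string in a row contains one or more words that are keys in a dict, the new column should hold the corresponding value. My code raises an error, though, and I can't tell where it goes wrong. Can anyone help?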

DataFrame:

tw_test.head()

    tweet   
0   living the dream. #cameraman #camera #camerac...    
1   justin #trudeau's reasons for thanksgiving. to...   
2   @themadape butt…..butt…..we’re allergic to l... 
3   2 massive explosions at peace march in #turkey...   
4   #mulcair suggests there’s bad blood between hi...   

Dict:

party={}
{'#mulcair': 'NDP', '#cdnleft': 'liberal', '#LiberalExpress': 'liberal', '#ThankYouStephenHarper': 'Conservative ', '#pmjt': 'liberal'...}

My code:

tw_test["party"]=tw_test["tweet"].apply(lambda x: party[x.split(' ')[1].startswith("#")[0]])

1 Answer
User
Answer #1 · posted 2024-04-26 04:38:41

I believe your trouble comes from trying to cram too much into a single lambda. A function that performs the lookup is quite simple:

Code:

party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal'
}

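def party(tweet):
    # Check each hashtag in the tweet; return the party of the first one found in party_tags.
    for tag in [t for t in tweet.split() if t.startswith('#')]:
        if tag in party_tags:
            return party_tags[tag]
    return None  # no known hashtag in this tweet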
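
To see why the one-line lambda from the question fails, it can help to evaluate its pieces one at a time. The sketch below does that for one of the sample tweets; the variable names (`second_word`, `is_tag`, `error_message`) are made up for illustration and are not part of the original code.

```python
tweet = "#mulcair suggests there’s bad blood between"

# The lambda only ever looks at the second word of the tweet.
second_word = tweet.split(' ')[1]        # 'suggests'

# startswith() returns a bool, not the word itself.
is_tag = second_word.startswith("#")     # False

# Indexing that bool with [0] is what raises the error.
try:
    is_tag[0]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the expression fails before the dict is ever consulted, and even without the `[0]` it would look up `True` or `False` in the dict rather than a hashtag. Looping over all hashtags in a named function, as above, avoids both problems.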

Test code:

import pandas as pd
tw_test = pd.DataFrame([x.strip() for x in u"""
    living the dream. #cameraman #camera #camerac
    justin #trudeau's reasons for thanksgiving. to
    @themadape butt…..butt…..we’re allergic to
    2 massive explosions at peace march in #turkey
    #mulcair suggests there’s bad blood between
""".split('\n')[1:-1]], columns=['tweet'])

tw_test["party"] = tw_test["tweet"].apply(party)
print(tw_test)

Result:

                                            tweet party
0  living the dream. #cameraman #camera #camerac  None
1  justin #trudeau's reasons for thanksgiving. to  None
2      @themadape butt…..butt…..we’re allergic to  None
3  2 massive explosions at peace march in #turkey  None
4     #mulcair suggests there’s bad blood between   NDP
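
Note that the lookup is an exact, case-sensitive match. Keys such as '#LiberalExpress' are mixed case while the sample tweets are lowercase, and a hashtag followed by punctuation would not match either. If that matters for your data, one possible variant (a sketch only; `party_tags_lower` and `party_normalized` are hypothetical names, and the set of punctuation to strip is an assumption) normalizes both sides before comparing:

```python
party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal',
}

# Hypothetical helper table: lowercase keys, values with stray whitespace removed.
party_tags_lower = {tag.lower(): name.strip() for tag, name in party_tags.items()}

def party_normalized(tweet):
    """Return the party for the first known hashtag, ignoring case and trailing punctuation."""
    for word in tweet.split():
        if word.startswith('#'):
            # Assumption: only these trailing characters need to be stripped.
            tag = word.lower().rstrip('.,!?:;')
            if tag in party_tags_lower:
                return party_tags_lower[tag]
    return None  # no known hashtag in this tweet

results = [party_normalized(t) for t in [
    'thanks #LIBERALEXPRESS!',
    '#mulcair suggests',
    'no tags here',
]]
```

It can be passed to apply in the same way as the original function: tw_test["tweet"].apply(party_normalized).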
