Modifying a list of strings with regex in Python

Posted 2024-03-29 04:49:30


I have a list of strings:

query_var = ["VENUE_CITY_NAME == 'Bangalore' & EVENT_GENRE == 'ROMANCE' & count_EVENT_GENRE >= 1","VENUE_CITY_NAME == 'Jamshedpur' & EVENT_GENRE == 'HORROR' & count_EVENT_GENRE >= 1"]
len(query_var)   # output: 2

I want to modify this list to get:

 query_var = ["df['VENUE_CITY_NAME'] == 'Bangalore' & df['EVENT_GENRE'] == 'ROMANCE' & df['count_EVENT_GENRE'] >= 1","df['VENUE_CITY_NAME'] == 'Jamshedpur' & df['EVENT_GENRE'] == 'HORROR' & df['count_EVENT_GENRE'] >= 1"]

Here is my attempt:

import re

for res in query_var:
    res = [x for x in re.split('[&)]',res)]
    print(res)
    res =  [x.strip() for x in res]
    print(res)
    res = [d.replace(d.split(' ', 1)[0], "df['"+d.split(' ', 1)[0]+"']") for d in res]
    print(res)

This produces the output:

 ["VENUE_CITY_NAME == 'Bangalore' ", " EVENT_GENRE == 'ROMANCE' ", ' count_EVENT_GENRE >= 1']
 ["VENUE_CITY_NAME == 'Bangalore'", "EVENT_GENRE == 'ROMANCE'", 'count_EVENT_GENRE >= 1']
 ["df['VENUE_CITY_NAME'] == 'Bangalore'", "df['EVENT_GENRE'] == 'ROMANCE'", "df['count_EVENT_GENRE'] >= 1"]
 ["VENUE_CITY_NAME == 'Jamshedpur' ", " EVENT_GENRE == 'HORROR' ", ' count_EVENT_GENRE >= 1']
 ["VENUE_CITY_NAME == 'Jamshedpur'", "EVENT_GENRE == 'HORROR'", 'count_EVENT_GENRE >= 1']
 ["df['VENUE_CITY_NAME'] == 'Jamshedpur'", "df['EVENT_GENRE'] == 'HORROR'", "df['count_EVENT_GENRE'] >= 1"]

That is what I expected, but when I print query_var, it has not changed:

query_var
Out[47]: 
   ["VENUE_CITY_NAME == 'Bangalore' & EVENT_GENRE == 'ROMANCE' & count_EVENT_GENRE >= 1","VENUE_CITY_NAME == 'Jamshedpur' & EVENT_GENRE == 'HORROR' & count_EVENT_GENRE >= 1"]

As you can see, my code does not produce the desired result in query_var. Is there a better way, for example with a list comprehension?


1 answer
User
#1 · Posted 2024-03-29 04:49:30

Here is a regex / list comprehension solution:

>>> import re
>>> [re.sub(r'(\w+)\s*(==|>=)', r"df['\1'] \2", s) for s in query_var]
["df['VENUE_CITY_NAME'] == 'Bangalore' & df['EVENT_GENRE'] == 'ROMANCE' & df['count_EVENT_GENRE'] >= 1", "df['VENUE_CITY_NAME'] == 'Jamshedpur' & df['EVENT_GENRE'] == 'HORROR' & df['count_EVENT_GENRE'] >= 1"]
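The loop in the question leaves query_var unchanged because `res = ...` only rebinds the loop variable to a new object; nothing is ever stored back into the list. The comprehension above likewise returns a new list, so its result has to be assigned somewhere. A minimal sketch of writing it back, assuming the same pattern as above (slice assignment is just one option; a plain `query_var = [...]` works as well):

```python
import re

query_var = [
    "VENUE_CITY_NAME == 'Bangalore' & EVENT_GENRE == 'ROMANCE' & count_EVENT_GENRE >= 1",
    "VENUE_CITY_NAME == 'Jamshedpur' & EVENT_GENRE == 'HORROR' & count_EVENT_GENRE >= 1",
]

# Rebinding the loop variable does not touch the list itself.
for res in query_var:
    res = res.upper()  # query_var is still unchanged after this loop

# Slice assignment replaces the contents of the existing list object,
# so every other reference to the same list sees the new strings too.
query_var[:] = [re.sub(r"(\w+)\s*(==|>=)", r"df['\1'] \2", s) for s in query_var]
print(query_var[0])
```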

Adjust the pattern as needed for more general data, for example to also allow "<=".
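One possible adjustment, as a sketch only: list every comparison operator you expect in the alternation, with the two-character operators first so that ">=" is not matched as ">". The names `OPERATORS`, `PATTERN`, `add_df_prefix` and the `PRICE` column are made up for this illustration. The pattern still assumes that a column name always sits directly to the left of the operator and that quoted values do not themselves contain comparisons.

```python
import re

# Assumed operator set; longer operators come first so ">=" wins over ">".
OPERATORS = r"==|!=|>=|<=|>|<"
PATTERN = re.compile(rf"(\w+)\s*({OPERATORS})")


def add_df_prefix(expr):
    """Wrap each name that directly precedes a comparison operator in df['...']."""
    return PATTERN.sub(r"df['\1'] \2", expr)


# Hypothetical query string, not taken from the question.
print(add_df_prefix("PRICE <= 500 & EVENT_GENRE != 'HORROR'"))
```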

Edit, in response to a comment:

[re.sub(r'(\w+)(\s*(==|>=).*?)(\s*&|$)', r"(df['\1']\2)\4", s) for s in query_var]
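The comment itself is not shown here, but this variant wraps each comparison in parentheses. That matters if the string is later evaluated as a Python expression, because `&` binds more tightly than `==` and `>=`. A small sketch of what it produces for the data in the question (the variable names `pattern` and `wrapped` are only for illustration):

```python
import re

query_var = [
    "VENUE_CITY_NAME == 'Bangalore' & EVENT_GENRE == 'ROMANCE' & count_EVENT_GENRE >= 1",
    "VENUE_CITY_NAME == 'Jamshedpur' & EVENT_GENRE == 'HORROR' & count_EVENT_GENRE >= 1",
]

# Group 1: column name; group 2: operator plus value (matched lazily);
# group 4: the trailing " &" separator, or empty at the end of the string.
pattern = r"(\w+)(\s*(==|>=).*?)(\s*&|$)"
wrapped = [re.sub(pattern, r"(df['\1']\2)\4", s) for s in query_var]

print(wrapped[0])
# (df['VENUE_CITY_NAME'] == 'Bangalore') & (df['EVENT_GENRE'] == 'ROMANCE') & (df['count_EVENT_GENRE'] >= 1)
```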
