How can I use regular expression syntax to remove an ellipsis ("...") from the text in a given column?

Posted 2024-06-16 11:56:34


I am using this code, but it does not remove the ellipsis:

The Reviews column contains 1,500 rows of text.

Df["Reviews"] = Df['Reviews'].apply(lambda x : " ".join(re.findall('[\w\.]+',x)))

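A sample text would be: "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"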
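One likely reason the code above leaves the ellipsis in place: the character class `[\w\.]` explicitly includes the dot, so `re.findall` returns tokens such as `rentals...` unchanged. A minimal sketch of the difference, using a shortened version of the sample text (the variable names are illustrative only):

```python
import re

text = "loaners or rentals... so why even be a dealership"

# The original class [\w\.] matches dots, so the ellipsis stays attached to "rentals".
kept = " ".join(re.findall(r"[\w.]+", text))

# Matching word characters only drops every dot, including the ellipsis.
dropped = " ".join(re.findall(r"\w+", text))

print(kept)     # loaners or rentals... so why even be a dealership
print(dropped)  # loaners or rentals so why even be a dealership
```

Note that `\w+` also discards all other punctuation, such as apostrophes and sentence-ending periods, which may or may not be wanted.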


3 Answers

You can try any of the following approaches:

REGEX

import pandas as pd
pd.set_option('max_colwidth', 400)
df = pd.DataFrame({'Reviews':['dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers']})
# Collapse any run of dots into a single dot (raw string avoids an invalid-escape warning)
df['Reviews'] = df['Reviews'].replace(r'\.+', '.', regex=True)
print(df)
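The snippet above collapses a run of dots into a single dot rather than deleting it. If the goal is to drop the ellipsis entirely, a minimal sketch with the standard `re` module might look like the following. The helper name `remove_ellipsis` is hypothetical, and treating the Unicode ellipsis character `…` and the full-width period `。` as part of an ellipsis is an assumption, not something stated in the question.

```python
import re

# Two or more ASCII dots or full-width periods, or a single Unicode ellipsis character.
ELLIPSIS = re.compile(r"[.。]{2,}|…")

def remove_ellipsis(text: str) -> str:
    """Replace each ellipsis with a space, then squeeze repeated whitespace."""
    without = ELLIPSIS.sub(" ", text)
    return re.sub(r"\s+", " ", without).strip()

sample = "loaners or rentals... so why even be a dealership"
print(remove_ellipsis(sample))  # loaners or rentals so why even be a dealership
```

A function like this could be applied to the column with `df['Reviews'].apply(remove_ellipsis)`. A single sentence-ending period is left untouched because the pattern requires at least two dots.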

REGEX

import re
regex = r"(\W)\1+"
test_str = "dealer said it does not reimburse dealers for loaners or rentals... so why even be a dealership if they make faulty cars and you re on the line to help customers"
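subst = r"\1"
# Collapse each run of a repeated non-word character (such as "...") into one occurrence
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)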
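The pattern `(\W)\1+` relies on a backreference: `(\W)` captures one non-word character and `\1+` requires that same character to repeat. As a rough illustration (the function name `collapse_repeats` and the test string are made up for this sketch), it collapses any repeated punctuation or whitespace, not only dots:

```python
import re

# (\W) captures one non-word character; \1+ requires that same character to repeat.
REPEATED = re.compile(r"(\W)\1+")

def collapse_repeats(text: str) -> str:
    """Reduce each run of an identical non-word character to a single occurrence."""
    return REPEATED.sub(r"\1", text)

print(collapse_repeats("rentals... so why!!  really"))  # rentals. so why! really
```

So this approach shortens the ellipsis to one dot and also touches doubled spaces and other repeated symbols, which is worth keeping in mind if only the ellipsis should change.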

If you want to remove this specific word from every row, you do not need RegEx. You can use str.replace, as described in: How to strip a specific word from a string?

Df["Reviews"] = Df['Reviews'].apply(lambda x:x.replace("ellipsis",""))

Series.str.replace works for simple expressions:

df.Reviews.str.replace("...", "", regex=False)
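Whether `"..."` is read literally or as a regular expression matters here: as a regex, each dot matches any character, so an unescaped `"..."` would delete text three characters at a time. Older pandas releases treated the pattern as a regex by default, which is why passing `regex` explicitly is the safer choice. A small sketch with a one-row Series (the name `reviews` is illustrative):

```python
import pandas as pd

reviews = pd.Series(["loaners or rentals... so why even be a dealership"])

# regex=False treats "..." as three literal dots.
literal = reviews.str.replace("...", "", regex=False)

# With regex=True the dots must be escaped to match literal dots.
escaped = reviews.str.replace(r"\.{3}", "", regex=True)

print(literal[0])  # loaners or rentals so why even be a dealership
```

Both calls return a new Series, so the result has to be assigned back, for example `df['Reviews'] = df['Reviews'].str.replace("...", "", regex=False)`.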
