Remove specific strings from a DataFrame

2 answers

User

Answer 1 · Edited 2024-04-19 14:00:53

Just use df.replace with a regex pattern that matches the unit text (letters and slashes) and replaces it with an empty string:

cols = ['Mileage', 'Engine', 'Power']
df[cols] = df[cols].replace(to_replace=r'\s*[A-Za-z/]+', value='', regex=True)  # also drop the space before the unit

Prints:

   Year  Fuel_Type  Mileage  Engine  Power
0  2010        LPG     26.6     998  58.16
1  2011     Diesel    19.67    1582  126.2
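For a self-contained check of the approach above, here is a minimal sketch. The two-row DataFrame is hypothetical, rebuilt from the printed output; the unit strings are assumed from the sample values quoted in the second answer (the 2011 row's "CC" and "bhp" are guesses). Note that after the replace the columns still hold strings, so the sketch adds a conversion with pd.to_numeric, which the answer itself does not do.

```python
import pandas as pd

# Hypothetical sample data: units assumed from the values quoted in the second answer
df = pd.DataFrame({
    "Year": [2010, 2011],
    "Fuel_Type": ["LPG", "Diesel"],
    "Mileage": ["26.6 km/kg", "19.67 kmpl"],
    "Engine": ["998 CC", "1582 CC"],
    "Power": ["58.16 bhp", "126.2 bhp"],
})

cols = ["Mileage", "Engine", "Power"]
# Remove the unit (a run of letters and '/') plus any whitespace before it
df[cols] = df[cols].replace(r"\s*[A-Za-z/]+", "", regex=True)
# The cleaned columns are still strings; convert each one to a numeric dtype
df[cols] = df[cols].apply(pd.to_numeric)
print(df.dtypes)
```

Converting explicitly matters if you plan to sort, filter, or do arithmetic on these columns, because string comparison would order "1582" before "998".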

User

Answer 2 · Edited 2024-04-19 14:00:53

You can try using a regular expression to do this.

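Here is a quick example of how to do it. I'm assuming you already know how to read in your dataset correctly, so all you need to do is take each column, iterate over its values, and apply the regex from the sample code I've provided.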

When it comes to reading datasets, I personally like to use pandas.

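import re

values = ["26.6 km/kg", "19.67 kmpl", "998 CC", "58.16 bhp"]

for value in values:
    # Strip the trailing run of non-digit characters (the unit and the space before it)
    number = re.sub(r'\D+$', '', value)
    print(number)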

Output:

26.6
19.67
998
58.16
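The answer suggests iterating over the values of each column; with pandas, the same \D+$ pattern can be applied to a whole column at once through Series.str.replace. This is a minimal sketch, assuming the columns hold strings shaped like the sample list; the DataFrame is hypothetical and stands in for whatever you would load with something like pd.read_csv.

```python
import pandas as pd

# Hypothetical data standing in for a dataset read with pandas
df = pd.DataFrame({
    "Mileage": ["26.6 km/kg", "19.67 kmpl"],
    "Engine": ["998 CC", "1582 CC"],
    "Power": ["58.16 bhp", "126.2 bhp"],
})

for col in ["Mileage", "Engine", "Power"]:
    # Drop the trailing non-digit characters, then parse what is left as a float
    df[col] = df[col].str.replace(r"\D+$", "", regex=True).astype(float)

print(df)
```

The loop runs once per column rather than once per value, so pandas does the per-row work internally.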

If you're curious what \D+$ means, here is the breakdown:

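\D matches any single character that is not a digit.

+ means one or more occurrences of the preceding token, so \D+ matches a run of non-digit characters.

$ anchors the match to the end of the string, so only the non-digit run at the very end is removed.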
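To see the pieces together, here is a small sketch using only the standard library re module. It prints what \D+$ actually matches in a few strings and contrasts it with \D+ without the anchor. The first two strings come from the sample list; "58.16" is a hypothetical value with no unit.

```python
import re

pattern = re.compile(r"\D+$")  # one or more non-digits, anchored at the end

for s in ["26.6 km/kg", "998 CC", "58.16"]:
    m = pattern.search(s)
    # search() returns None when the string already ends in a digit
    print(repr(s), "->", repr(m.group()) if m else None)

# Without the $ anchor, every non-digit run is removed, including the decimal point
print(re.sub(r"\D+", "", "26.6 km/kg"))  # 266
```

The anchor is what keeps the "." inside 26.6 intact, since that dot is a non-digit that is not at the end of the string.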

You can read more about regular expressions here.