I have a dataset like this:
import pandas as pd
df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})
I want to get:
            word
0  abs elearning
1  abs elearning
2  abs elearning
3  abs elearning
I tried the following:
re_map = {r'\be learning\b': 'elearning', r'\be-learning\b': 'elearning', r'\be&learning\b': 'elearning'}
import re
for r, map in re_map.items():
    df['word'] = re.sub(r, map, df['word'])
and got this error:
TypeError Traceback (most recent call last)
<ipython-input-42-fbf00d9a0cba> in <module>()
3 s = df['word']
4 for r, map in re_map.items():
----> 5 df['word'] = re.sub(r, map, df['word'])
C:\Users\Edward\Anaconda3\lib\re.py in sub(pattern, repl, string, count, flags)
180 a callable, it's passed the match object and must return
181 a replacement string to be used."""
--> 182 return _compile(pattern, flags).sub(repl, string, count)
183
184 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object
I can do this instead:
for r, map in re_map.items():
    df['word'] = re.sub(r, map, str(df['word']))
This raises no error, but I don't get the pd.DataFrame I was hoping for:
word
0 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
1 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
2 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
3 0 0 0 abs elearning \n1 abs elearning\...\n1 0 0 abs elearning \n1 abs elearning\...\n2 0 0 abs elearning \n1 abs ele...
How can I fix this?
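df['word'] is a list (more precisely, a list-like pandas Series), while re.sub expects a single string. Converting the whole thing to a string just destroys your list. You need to apply the regex to each member: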
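A minimal sketch of that idea, assuming the df and re_map from the question (the names pattern, repl and w are only illustrative):

```python
import re
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})
re_map = {r'\be learning\b': 'elearning', r'\be-learning\b': 'elearning', r'\be&learning\b': 'elearning'}

for pattern, repl in re_map.items():
    # re.sub handles one string at a time, so call it once per element
    df['word'] = [re.sub(pattern, repl, w) for w in df['word']]

print(df)
```

Note that the first row keeps its trailing space, since none of the patterns touch it; df['word'].str.strip() would remove it if needed.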
The classical alternative, without a list comprehension:
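One way this might look with an explicit loop, again assuming the df and re_map from the question (cleaned is a hypothetical helper name):

```python
import re
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})
re_map = {r'\be learning\b': 'elearning', r'\be-learning\b': 'elearning', r'\be&learning\b': 'elearning'}

for pattern, repl in re_map.items():
    cleaned = []  # results for the current pattern
    for w in df['word']:
        cleaned.append(re.sub(pattern, repl, w))
    # assign the whole column at once instead of writing element by element
    df['word'] = cleaned

print(df)
```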
By the way, you can simplify your regex list a great deal:
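A sketch of one possible simplification: the three patterns differ only in the character between "e" and "learning", so a character class can cover them all. The samples list below just repeats the question's data for illustration.

```python
import re

# One pattern for "e learning", "e-learning" and "e&learning".
# Inside the character class, a leading "-" is a literal hyphen.
re_map = {r'\be[-& ]learning\b': 'elearning'}

samples = ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']
pattern, repl = next(iter(re_map.items()))
cleaned = [re.sub(pattern, repl, s) for s in samples]
print(cleaned)
```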
By doing this, you have only one regular expression, and the whole thing becomes a one-liner:
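For instance, assuming a single combined pattern such as \be[-& ]learning\b, the replacement could be written as:

```python
import re
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})

# Single pass: one pattern, one list comprehension, no dictionary loop
df['word'] = [re.sub(r'\be[-& ]learning\b', 'elearning', w) for w in df['word']]

print(df)
```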
You could even speed this up by precompiling the regex once for all substitutions:
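A minimal sketch using re.compile, with the same assumed combined pattern (regex is an illustrative variable name):

```python
import re
import pandas as pd

df = pd.DataFrame({'word': ['abs e learning ', 'abs e-learning', 'abs e&learning', 'abs elearning']})

# Compile once, then reuse the compiled pattern for every row
regex = re.compile(r'\be[-& ]learning\b')
df['word'] = [regex.sub('elearning', w) for w in df['word']]

print(df)
```

The re module already caches recently used patterns internally, so the gain is likely modest for a small frame like this one; it may matter more on large columns.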