How do I use a regular expression to extract specific content from a pandas DataFrame?

In [114]: df['movie_title'].head()
Out[114]:
0     Toy Story (1995)
1     GoldenEye (1995)
2    Four Rooms (1995)
3    Get Shorty (1995)
4       Copycat (1995)
...
Name: movie_title, dtype: object

2 Answers

User

Answer 1 · edited 2024-06-06 05:33:11

Wrap the part of the text you want in parentheses () so that it becomes a capture group, like this:

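# Raw string keeps \( a literal parenthesis; expand=False returns a Series.
# new_df is assumed to be an existing DataFrame aligned with df.
new_df['just_movie_titles'] = df['movie_title'].str.extract(r'(.+?) \(', expand=False)
new_df['just_movie_titles']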

pandas.core.strings.StringMethods.extract
StringMethods.extract(pat, flags=0, **kwargs)
Find groups in each string using passed regular expression
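
As a self-contained illustration of the capture-group approach above, here is a minimal sketch. The sample DataFrame is an assumption rebuilt from the output shown in the question, not the asker's full dataset.

```python
import pandas as pd

# Sample data modelled on the question's output (assumed, not the full dataset)
df = pd.DataFrame({
    "movie_title": [
        "Toy Story (1995)",
        "GoldenEye (1995)",
        "Four Rooms (1995)",
        "Get Shorty (1995)",
        "Copycat (1995)",
    ]
})

# One capture group: with expand=False the result is a Series.
# (.+?) lazily matches the title and " \(" stops it at the first " (".
titles = df["movie_title"].str.extract(r"(.+?) \(", expand=False)
print(titles.tolist())
```

Rows that do not contain " (" at all would come back as NaN, so it may be worth checking for missing values after the extraction.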

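User

Answer 2 · edited 2024-06-06 05:33:11

You can try str.extract together with str.strip, but str.split is the better choice because movie titles can also contain digits. Another solution is to use str.replace with a regex to remove the content of the parentheses, then str.strip to drop the leading and trailing whitespace: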

# convert the column to strings
df['movie_title'] = df['movie_title'].astype(str)

# note: this pattern also drops digits that are part of a movie title
df['titles'] = df['movie_title'].str.extract('([a-zA-Z ]+)', expand=False).str.strip()
# n=1 splits only at the first '('
df['titles1'] = df['movie_title'].str.split('(', n=1).str[0].str.strip()
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df['titles2'] = df['movie_title'].str.replace(r'\([^)]*\)', '', regex=True).str.strip()
print(df)
          movie_title      titles      titles1      titles2
0  Toy Story 2 (1995)   Toy Story  Toy Story 2  Toy Story 2
1    GoldenEye (1995)   GoldenEye    GoldenEye    GoldenEye
2   Four Rooms (1995)  Four Rooms   Four Rooms   Four Rooms
3   Get Shorty (1995)  Get Shorty   Get Shorty   Get Shorty
4      Copycat (1995)     Copycat      Copycat      Copycat
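
If the year is needed as well, one possible variation (not part of either answer) is to anchor the pattern to the trailing four-digit year and use named groups. The sketch below assumes every title ends in "(YYYY)"; the names movie_titles, pattern and parts are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the rows used in the answer above
movie_titles = pd.Series(
    ["Toy Story 2 (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
    name="movie_title",
)

# Named groups become the column names of the returned DataFrame.
# Digits inside the title are kept because only the final "(YYYY)" is matched.
pattern = r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$"
parts = movie_titles.str.extract(pattern)
print(parts)
```

The year column holds strings; converting it with astype(int) is only safe once you have confirmed that no row failed to match.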
