Extract substrings from a list of patterned strings and convert them to a DataFrame in Python

Posted 2024-05-23 16:19:26


I have a list containing strings that follow this pattern:

['"Bandcamp" (2014)\t\t\t\t\ttv-mini-series',
'"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title',
'"Elmira" (2014)\t\t\t\t\telmira-new-york',
'"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend',
...]

Now I am trying to extract the substrings from each line and build a DataFrame like this:

Movie    Year Keyword
Bandcamp 2014 tv-mini-series
ByMySide 2012 twitter-hashtag-in-title
Elmira   2014 elmira-new-york
Elmira   2014 friend
...

1 Answer
Anonymous user
#1 · Posted 2024-05-23 16:19:26

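Here you go. The regular expression captures the quoted title, the year in parentheses, and the keyword that follows the run of tabs: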

>>> a
['"Bandcamp" (2014)\t\t\t\t\ttv-mini-series', '"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title', '"Elmira" (2014)\t\t\t\t\telmira-new-york', '"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend']
>>> import re
>>> data = []
>>> for x in a:
...     # groups: title in quotes, year in parentheses, keyword after the tabs
...     data.append(re.findall(r'"(\w+)" \((\d+)\).*\t{2,5}(\S+)', x)[0])
... 
>>> import pandas as pd
>>> pd.DataFrame(data, columns=['Movie', 'Year', 'Keyword'])
      Movie  Year                   Keyword
0  Bandcamp  2014            tv-mini-series
1  ByMySide  2012  twitter-hashtag-in-title
2    Elmira  2014           elmira-new-york
3    Elmira  2014                    friend    
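
If some lines might not fit the pattern, the `[0]` index above would raise an IndexError. Below is a minimal sketch of a more defensive variant, not part of the original answer. The names `LINE_RE` and `parse_lines` are hypothetical, and the pattern rests on assumptions about the data: titles may contain anything except a double quote, the year has four digits, and the keyword is the last whitespace-free token after one or more tabs.

```python
import re

# Hypothetical pattern: quoted title, 4-digit year, then the keyword after a tab run.
# Assumption: the keyword is the final token on the line and contains no whitespace.
LINE_RE = re.compile(r'"([^"]+)" \((\d{4})\).*?\t+(\S+)$')


def parse_lines(lines):
    """Return (movie, year, keyword) tuples, skipping lines that do not match."""
    rows = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            # Assumption: malformed lines are dropped instead of raising an error.
            continue
        rows.append(m.groups())
    return rows


a = [
    '"Bandcamp" (2014)\t\t\t\t\ttv-mini-series',
    '"ByMySide" (2012){The Happening (#1.3)}\t\t\t\t\ttwitter-hashtag-in-title',
    '"Elmira" (2014)\t\t\t\t\telmira-new-york',
    '"Elmira" (2014){The Happening (#1.3)}\t\t\tfriend',
]
rows = parse_lines(a)
```

The resulting `rows` list can be passed to `pd.DataFrame(rows, columns=['Movie', 'Year', 'Keyword'])` exactly as in the session above. The year stays a string here, as it does in the original answer.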
