How to extract information from a DataFrame column

Posted 2024-04-25 08:28:18

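I have the DataFrame below. I want to extract pieces of information from column A and create new columns that hold each piece according to its type. The example below illustrates this.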

In [0]: df
Out[0]: 
          A                  
0 1258GA 25/01/20 TABLE 090626  038272
1 GOODIES 762088 A714816
2 TABLE AA88547 734963 GOODIES
3 WATER 02/450 FROM TOMORROW 48246
4 02H12 ALSCA 00548246B GOODIES

I would like to get the following result:

In [1]: df
Out[1]: 
          A                               Category             Date      Hour
0 1258GA 25/01/20 TABLE 090626  038272    TABLE           25/01/20
1 GOODIES 762088 A714816                  GOODIES 
2 TABLE AA88547 734963 GOODIES            TABLE GOODIES
3 WATER 02/450 FROM TOMORROW 48246        WATER 
4 02H12 ALSCA 00548246B GOODIES           GOODIES                        02H12

I have tried many approaches, but none of them produced this result.


2 answers

You can do this with the Series.str methods.

From the documentation for Series.str.extract():

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.


From the documentation for Series.str.findall():

Find all occurrences of pattern or regular expression in the Series/Index.
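
To make the difference between the two methods concrete, here is a minimal sketch on a two-row Series taken from the question's data. The names `s`, `first_word` and `all_words` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Two rows copied from column A of the question.
s = pd.Series([
    "TABLE AA88547 734963 GOODIES",
    "02H12 ALSCA 00548246B GOODIES",
])

# extract: only the FIRST match per row; expand=False returns a Series
# because the pattern has a single capture group.
first_word = s.str.extract(r"(\b[A-Za-z]+\b)", expand=False)

# findall: EVERY match per row, returned as a list for each row.
all_words = s.str.findall(r"\b[A-Za-z]+\b")

print(first_word.tolist())  # ['TABLE', 'ALSCA']
print(all_words.tolist())   # [['TABLE', 'GOODIES'], ['ALSCA', 'GOODIES']]
```

This is why the Category column, which may hold several words, is a better fit for findall, while Date and Hour, which hold at most one value per row, fit extract.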

Here is the code snippet.

Edit:

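# Category: every purely alphabetic word in A, joined with spaces
df["Category"] = df['A'].str.findall(r"(\b[A-Za-z]+\b)").str.join(' ')
# Date: first token shaped like digits/digits/digits (NaN when absent)
df["Date"] = df['A'].str.extract(r"(\b[0-9]+/[0-9]+/[0-9]+\b)", expand=False)
# Hour: first token shaped like digits-H-digits, e.g. 02H12 (NaN when absent)
df["Hour"] = df['A'].str.extract(r"(\b[0-9]+H[0-9]+\b)", expand=False)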

The output will be:

                                      A             Category      Date   Hour
0  1258GA 25/01/20 TABLE 090626  038272                TABLE  25/01/20    NaN
1                GOODIES 762088 A714816              GOODIES       NaN    NaN
2          TABLE AA88547 734963 GOODIES        TABLE GOODIES       NaN    NaN
3      WATER 02/450 FROM TOMORROW 48246  WATER FROM TOMORROW       NaN    NaN
4         02H12 ALSCA 00548246B GOODIES        ALSCA GOODIES       NaN  02H12
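
Note that this output differs from the result requested in the question for rows 3 and 4: it gives WATER FROM TOMORROW and ALSCA GOODIES, whereas the question asks for WATER and GOODIES. One possible way to close that gap, assuming the valid categories are known in advance, is to match only those words. The `categories` list below is an assumption inferred from the question's expected output, not something the asker stated, and the stricter two-digit Date and Hour patterns are likewise a guess at the intended formats.

```python
import re
import pandas as pd

df = pd.DataFrame({"A": [
    "1258GA 25/01/20 TABLE 090626  038272",
    "GOODIES 762088 A714816",
    "TABLE AA88547 734963 GOODIES",
    "WATER 02/450 FROM TOMORROW 48246",
    "02H12 ALSCA 00548246B GOODIES",
]})

# ASSUMPTION: the set of valid category words is known beforehand.
categories = ["TABLE", "GOODIES", "WATER"]
pattern = r"\b(?:" + "|".join(map(re.escape, categories)) + r")\b"

# Keep only whole words that appear in the category list.
df["Category"] = df["A"].str.findall(pattern).str.join(" ")
# ASSUMPTION: dates look like dd/mm/yy and hours like hhHmm.
df["Date"] = df["A"].str.extract(r"\b(\d{2}/\d{2}/\d{2})\b", expand=False)
df["Hour"] = df["A"].str.extract(r"\b(\d{2}H\d{2})\b", expand=False)

print(df)
```

If empty cells are preferred over NaN, as in the question's expected table, calling `fillna("")` on the Date and Hour columns afterwards should do it.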

Maybe this helps:

df['A'].str.findall(r'\b[A-Z]+\b').str.join(' ')

0                  TABLE
1                GOODIES
2          TABLE GOODIES
3    WATER FROM TOMORROW
4          ALSCA GOODIES
