如何使用几个条件(regex?)修改pandas中的字符串列

2024-03-28 22:19:44 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个df看起来像这样

    Identifier      Ticker      ISIN        Relationship Country .......
0   5.037663e+09    BTGGg.F     D10080162   Supplier     Germany .......
1   4.295870e+09    IVXG.DE     NaN         Supplier     Germany .......
2   5.043321e+09    SAPG.DE     D66992104   Customer     Germany
3   4.295869e+09    BMWG.DE     D12096109   Customer     Germany    
4   4.295870e+09    DTEGn.DE    D2035M136   Customer     Germany
5   4.295870e+09    IFXGn.DE    D35415104   Supplier     Germany
6   4.295869e+09    NSUG.DE     D04312100   Customer     Germany
7   5.000074e+09    EVKn.DE     D2R90Y117   Customer     Germany
8   4.295869e+09    LHAG.DE     D1908N106   Customer     Germany
9   4.295869e+09    MTXGn.DE    D5565H104   Supplier     Germany
10  4.295869e+09    SIEGn.DE    D69671218   Supplier     Germany
11  4.295870e+09    TKAG.DE     D8398Q119   Supplier     Germany
12  5.059963e+09    BNRGn.DE    D12459117   Customer     Germany
13  4.295869e+09    RHMG.DE     D65111102   Supplier     Germany
14  5.001195e+09    GBFG.DE     D11648108   Supplier     Germany
15  4.295869e+09    NXUG.DE     D5650J106   Customer     Germany
16  4.295870e+09    DPWGn.DE    D19225107   Supplier     Germany
17  4.295870e+09    ILM1k.DE    D22430116   Supplier     Germany
18  4.295869e+09    ADSGn.DE    D0066B185   Customer     Germany
19. 5.125125e+12    DBS.SG      D12300523.  Supplier     SG
........................................................................

在df['Country']=='Germany'所在的行中

我想执行两个功能

职能1: 我想隔离有小写字母的行,“.”之前的任何小写字母,我想删除它,这样BTGGg.F将变成BTGG.F,1LM1k.DE将变成1LM1.DE,但NXUG.DE将不受影响

在第一个函数之后使用新的数据帧

职能2: 然后对于“.”前面有大写字母G的行,我想删除G,这样RHMG.DE将变成RHM.DE,但1LM1.DE将不受影响

如果我只想删除“.”前面的字母,那么函数将非常简单,如x = x.replace(x[x.find(".")-1],"")

但我不知道如何在数据帧中实现这一点,也不知道如何应用我提到的条件。可以这样做吗?如果可以,怎么做

我想它可能看起来像这样,但这显然不起作用,我已经试过了

df.loc[df['Country'].eq('Germany'),'Ticker'] = df.loc[df['Country'].eq('Germany'),'Ticker'].str.replace((df['Ticker'][df['Ticker'].find(".")-1],"") if df['Ticker'][df['Ticker'].find(".")-1] == '([a-z])')

df.loc[df['Country'].eq('Germany'),'Ticker'] = df.loc[df['Country'].eq('Germany'),'Ticker'].str.replace((df['Ticker'][df['Ticker'].find(".")-1],"") if df['Ticker'][df['Ticker'].find(".")-1] == 'G')

以下是第一轮后的输出结果,从Country==Germany,删除“.”前面的第一个小写字母:

    Identifier      Ticker      ISIN        Relationship Country .......
0   5.037663e+09    BTGG.F      D10080162   Supplier     Germany .......
1   4.295870e+09    IVXG.DE     NaN         Supplier     Germany .......
2   5.043321e+09    SAPG.DE     D66992104   Customer     Germany
3   4.295869e+09    BMWG.DE     D12096109   Customer     Germany    
4   4.295870e+09    DTEG.DE     D2035M136   Customer     Germany
5   4.295870e+09    IFXG.DE     D35415104   Supplier     Germany
6   4.295869e+09    NSUG.DE     D04312100   Customer     Germany
7   5.000074e+09    EVK.DE      D2R90Y117   Customer     Germany
8   4.295869e+09    LHAG.DE     D1908N106   Customer     Germany
9   4.295869e+09    MTXG.DE     D5565H104   Supplier     Germany
10  4.295869e+09    SIEG.DE     D69671218   Supplier     Germany
11  4.295870e+09    TKAG.DE     D8398Q119   Supplier     Germany
12  5.059963e+09    BNRG.DE     D12459117   Customer     Germany
13  4.295869e+09    RHMG.DE     D65111102   Supplier     Germany
14  5.001195e+09    GBFG.DE     D11648108   Supplier     Germany
15  4.295869e+09    NXUG.DE     D5650J106   Customer     Germany
16  4.295870e+09    DPWG.DE     D19225107   Supplier     Germany
17  4.295870e+09    ILM1.DE     D22430116   Supplier     Germany
18  4.295869e+09    ADSG.DE     D0066B185   Customer     Germany
19. 5.125125e+12    DBS.SG      D12300523   Supplier     SG
........................................................................

这是在第二轮之后,从国家==德国,删除“.”前面的第一个大写字母“G”:

    Identifier      Ticker      ISIN        Relationship Country .......
0   5.037663e+09    BTG.F       D10080162   Supplier     Germany .......
1   4.295870e+09    IVX.DE      NaN         Supplier     Germany .......
2   5.043321e+09    SAP.DE      D66992104   Customer     Germany
3   4.295869e+09    BMW.DE      D12096109   Customer     Germany    
4   4.295870e+09    DTE.DE      D2035M136   Customer     Germany
5   4.295870e+09    IFX.DE      D35415104   Supplier     Germany
6   4.295869e+09    NSU.DE      D04312100   Customer     Germany
7   5.000074e+09    EVK.DE      D2R90Y117   Customer     Germany
8   4.295869e+09    LHA.DE      D1908N106   Customer     Germany
9   4.295869e+09    MTX.DE      D5565H104   Supplier     Germany
10  4.295869e+09    SIE.DE      D69671218   Supplier     Germany
11  4.295870e+09    TKA.DE      D8398Q119   Supplier     Germany
12  5.059963e+09    BNR.DE      D12459117   Customer     Germany
13  4.295869e+09    RHM.DE      D65111102   Supplier     Germany
14  5.001195e+09    GBF.DE      D11648108   Supplier     Germany
15  4.295869e+09    NXU.DE      D5650J106   Customer     Germany
16  4.295870e+09    DPW.DE      D19225107   Supplier     Germany
17  4.295870e+09    ILM1.DE     D22430116   Supplier     Germany
18  4.295869e+09    ADS.DE      D0066B185   Customer     Germany
19. 5.125125e+12    DBS.SG      D12300523   Supplier     SG
........................................................................


Tags: dfdecustomerfindsgcountryloceq
2条回答
df.loc[df.Country == 'Germany', 'Ticker'] = df[df.Country == 'Germany']['Ticker'].str.replace(r'[a-z](?=\.)', '').str.replace(r'G(?=\.)', '')
print(df)

印刷品:

      Identifier   Ticker        ISIN Relationship  Country
0   5.037663e+09    BTG.F   D10080162     Supplier  Germany
1   4.295870e+09   IVX.DE         NaN     Supplier  Germany
2   5.043321e+09   SAP.DE   D66992104     Customer  Germany
3   4.295869e+09   BMW.DE   D12096109     Customer  Germany
4   4.295870e+09   DTE.DE   D2035M136     Customer  Germany
5   4.295870e+09   IFX.DE   D35415104     Supplier  Germany
6   4.295869e+09   NSU.DE   D04312100     Customer  Germany
7   5.000074e+09   EVK.DE   D2R90Y117     Customer  Germany
8   4.295869e+09   LHA.DE   D1908N106     Customer  Germany
9   4.295869e+09   MTX.DE   D5565H104     Supplier  Germany
10  4.295869e+09   SIE.DE   D69671218     Supplier  Germany
11  4.295870e+09   TKA.DE   D8398Q119     Supplier  Germany
12  5.059963e+09   BNR.DE   D12459117     Customer  Germany
13  4.295869e+09   RHM.DE   D65111102     Supplier  Germany
14  5.001195e+09   GBF.DE   D11648108     Supplier  Germany
15  4.295869e+09   NXU.DE   D5650J106     Customer  Germany
16  4.295870e+09   DPW.DE   D19225107     Supplier  Germany
17  4.295870e+09  ILM1.DE   D22430116     Supplier  Germany
18  4.295869e+09   ADS.DE   D0066B185     Customer  Germany
19  5.125125e+12   DBS.SG  D12300523.     Supplier       SG

为了处理你提到的功能, 您可以使用一些正则表达式来执行它们

re.sub(r'G?[a-z]?(?=\.)', '', 'HERE_YOUR_COUNTRY_THINGY')

你没有给更多的规则

相关问题 更多 >