Split one column into several columns based on commas

Posted 2024-06-01 00:11:21


I want to split one column into specific columns, such as city and province.

I have a DataFrame that looks like this:

df:
+----------------------------------------------------------------------------------------------------------+
|location                                                                                          
+----------------------------------------------------------------------------------------------------------+
| Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia   
| Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia                 
| Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. Pancoran Mas, Kota Depok, Jawa Barat 16434, Indonesia 
| Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia
| 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec. Cibiru, Kota Bandung, Jawa Barat 40614, Indonesia
+----------------------------------------------------------------------------------------------------------+

I want to extract the city and the province into separate new columns.

The output might look like this:

df:

+-------------+-------------------+------------+
| location    |  Cities           |  province  | 
+-------------+-------------------+------------+
|  .....      |  Kota Depok       | Jawa Barat |    
|  .....      |  Bogor            | Jawa Barat |      
|  .....      |  Kota Depok       | Jawa Barat |     
|  .....      |  Kota Bandung     | Jawa Barat |    
|  .....      |  Kota Bandung     | Jawa Barat |   
+-------------+-------------------+------------+

I tried this approach:

def extract_city_state(a):
    asplit = a.split(",")
    city = asplit[-3].split()
    state = asplit[-2].split()[0:1]
    return city, state

df.join(
    df['location'].apply(
        lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
    )
)

But it returns:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-29-64a945be5d02> in <module>
      1 df.join(
      2     df['location'].apply(
----> 3         lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
      4     )
      5 )

~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   4043             else:
   4044                 values = self.astype(object).values
-> 4045                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4046 
   4047         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-29-64a945be5d02> in <lambda>(x)
      1 df.join(
      2     df['location'].apply(
----> 3         lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
      4     )
      5 )

<ipython-input-22-f1d63ccd82dc> in extract_city_state(a)
      1 def extract_city_state(a):
      2     asplit = a.split(",")
----> 3     city = asplit[-3].split()
      4     state = asplit[-2].split()[0:1]
      5     return city, state

IndexError: list index out of range

How can I get around this?
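The traceback shows asplit[-3] failing, which means at least one location value has fewer than three comma-separated parts. The five rows shown all have at least six parts, so the offending row is presumably elsewhere in the full data. A minimal diagnostic sketch for finding such rows; the short address "Kota Depok, Indonesia" is a hypothetical example, not taken from the question's data:

```python
import pandas as pd

df = pd.DataFrame({"location": [
    "Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia",
    "Kota Depok, Indonesia",  # hypothetical short address (two parts)
]})

# Number of comma-separated parts in each row; asplit[-3] needs at least three.
n_parts = df["location"].str.split(",").str.len()
bad_rows = df[n_parts < 3]
print(bad_rows)
```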


2 answers

If you want to keep it as a function, just store the result of the lambda in a variable and then join it to df:

city_state_split = df['location'].apply(
    lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
)
# join returns a new DataFrame, so assign it back to keep the new columns
df = df.join(city_state_split)
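Storing the result in a variable does not change what extract_city_state itself does, so a location with fewer than three comma-separated parts would still raise IndexError. A minimal sketch of a guarded variant, assuming it is acceptable to return missing values for such rows; extract_city_state_safe is a hypothetical name. It also returns plain strings instead of lists of words, and assumes (as in the sample rows) that the second-to-last part ends with a postal code, which it drops by splitting once from the right:

```python
import pandas as pd


def extract_city_state_safe(a):
    """Return (city, province), or (None, None) if the address is too short."""
    # Split on commas and trim the space that follows each comma.
    asplit = [part.strip() for part in a.split(",")]
    if len(asplit) < 3:
        # Not enough parts for asplit[-3]; return missing values instead of raising.
        return None, None
    city = asplit[-3]
    # Assumes the province part ends with a postal code:
    # "Jawa Barat 16965" -> ["Jawa Barat", "16965"] -> "Jawa Barat"
    parts = asplit[-2].rsplit(maxsplit=1)
    province = parts[0] if parts else None
    return city, province


df = pd.DataFrame({"location": [
    "Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia",
    "Kota Depok, Indonesia",  # hypothetical short address (two parts)
]})

city_state_split = df["location"].apply(
    lambda x: pd.Series(extract_city_state_safe(x), index=["City", "State"])
)
df = df.join(city_state_split)
print(df[["City", "State"]])
```

Rows that are too short end up with missing City and State values, which can be filtered or inspected afterwards instead of stopping the whole apply.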

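Use only pandas .str methods: indexing with str[] returns NaN instead of raising an error when a row has too few parts. To drop the postal code, rsplit with the n=1 parameter splits the province part once from the right, and .str[0] keeps the first piece: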

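s = df['location'].str.split(',')

# str[] gives NaN instead of raising IndexError when a row has too few parts
df['city'] = s.str[-3]
# rsplit(n=1) splits once from the right: "Jawa Barat 16519" -> ["Jawa Barat", "16519"]
df['province'] = s.str[-2].str.rsplit(n=1).str[0]
print(df)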
                                            location           city  \
0  Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Pu...     Kota Depok   
1  Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri...          Bogor   
2  Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. P...     Kota Depok   
3  Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. B...   Kota Bandung   
4  1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec....   Kota Bandung   

      province  
0   Jawa Barat  
1   Jawa Barat  
2   Jawa Barat  
3   Jawa Barat  
4   Jawa Barat  
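To see the str[] behaviour on a row that is too short, here is a small sketch; the one-part address "Indonesia" is a hypothetical row added for illustration. It also adds .str.strip(), which is not in the answer above, because splitting on ',' leaves a leading space on each part:

```python
import pandas as pd

df = pd.DataFrame({"location": [
    "Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia",
    "Indonesia",  # hypothetical one-part address
]})

s = df["location"].str.split(",")
# Out-of-range str[] positions give NaN instead of raising IndexError.
df["city"] = s.str[-3].str.strip()
df["province"] = s.str[-2].str.rsplit(n=1).str[0].str.strip()
print(df[["city", "province"]])
```

The short row gets NaN in both new columns, and NaN passes through the chained .str calls unchanged.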
