基于逗号将一列拆分为几列

df: +----------------------------------------------------------------------------------------------------------+ |location +----------------------------------------------------------------------------------------------------------+ | Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia | Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia | Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. Pancoran Mas, Kota Depok, Jawa Barat 16434, Indonesia | Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia | 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec. Cibiru, Kota Bandung, Jawa Barat 40614, Indonesia +----------------------------------------------------------------------------------------------------------+

def extract_city_state(a): asplit = a.split(",") city = asplit[-3].split() state = asplit[-2].split()[0:1] return city, state df.join( df['location'].apply( lambda x: pd.Series(extract_city_state(x), index=["City", "State"]) ) )

--------------------------------------------------------------------------- IndexError Traceback (most recent call last) <ipython-input-29-64a945be5d02> in <module> 1 df.join( 2 df['location'].apply( ----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"]) 4 ) 5 ) ~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds) 4043 else: 4044 values = self.astype(object).values -> 4045 mapped = lib.map_infer(values, f, convert=convert_dtype) 4046 4047 if len(mapped) and isinstance(mapped[0], Series): pandas/_libs/lib.pyx in pandas._libs.lib.map_infer() <ipython-input-29-64a945be5d02> in <lambda>(x) 1 df.join( 2 df['location'].apply( ----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"]) 4 ) 5 ) <ipython-input-22-f1d63ccd82dc> in extract_city_state(a) 1 def extract_city_state(a): 2 asplit = a.split(",") ----> 3 city = asplit[-3].split() 4 state = asplit[-2].split()[0:1] 5 return city, state IndexError: list index out of range

2条回答

网友

1楼 · 编辑于 2024-06-01 00:11:21

如果希望将其作为函数保留，只需将lambda函数结果存储为变量，然后再将其加入df即可：

city_state_split = df['location'].apply(
        lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
    )
df.join(city_state_split)

网友

2楼 · 编辑于 2024-06-01 00:11:21

如果str[]索引-first^{}选择的值不匹配，则仅使用pandasstr函数来避免error的值，因为n=1参数：

s = df['location'].str.split(',')

df['city'] = s.str[-3]
df['province'] = s.str[-2].str.rsplit(n=1).str[0]
print (df)
                                            location           city  \
0  Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Pu...     Kota Depok   
1  Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri...          Bogor   
2  Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. P...     Kota Depok   
3  Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. B...   Kota Bandung   
4  1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec....   Kota Bandung   

      province  
0   Jawa Barat  
1   Jawa Barat  
2   Jawa Barat  
3   Jawa Barat  
4   Jawa Barat

相关问题更多 >

编程相关推荐

热门问题

热门文章