I want to split one column into specific columns, such as city and province
I have a DataFrame that looks like this:
df:
+----------------------------------------------------------------------------------------------------------+
|location
+----------------------------------------------------------------------------------------------------------+
| Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia
| Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia
| Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. Pancoran Mas, Kota Depok, Jawa Barat 16434, Indonesia
| Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia
| 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec. Cibiru, Kota Bandung, Jawa Barat 40614, Indonesia
+----------------------------------------------------------------------------------------------------------+
I want to extract it into new columns named Cities and province.
The output might look like this:
df:
+----------+--------------+------------+
| location | Cities       | province   |
+----------+--------------+------------+
| .....    | Kota Depok   | Jawa Barat |
| .....    | Bogor        | Jawa Barat |
| .....    | Kota Depok   | Jawa Barat |
| .....    | Kota Bandung | Jawa Barat |
| .....    | Kota Bandung | Jawa Barat |
+----------+--------------+------------+
I tried this approach:
def extract_city_state(a):
    asplit = a.split(",")
    city = asplit[-3].split()
    state = asplit[-2].split()[0:1]
    return city, state

df.join(
    df['location'].apply(
        lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
    )
)
But it returns:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-29-64a945be5d02> in <module>
1 df.join(
2 df['location'].apply(
----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
4 )
5 )
~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
4043 else:
4044 values = self.astype(object).values
-> 4045 mapped = lib.map_infer(values, f, convert=convert_dtype)
4046
4047 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-29-64a945be5d02> in <lambda>(x)
1 df.join(
2 df['location'].apply(
----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
4 )
5 )
<ipython-input-22-f1d63ccd82dc> in extract_city_state(a)
1 def extract_city_state(a):
2 asplit = a.split(",")
----> 3 city = asplit[-3].split()
4 state = asplit[-2].split()[0:1]
5 return city, state
IndexError: list index out of range
How can I overcome this?
If you want to keep it as a function, simply store the result of the apply call with the lambda in a variable, and then join it to df.
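A minimal sketch of that idea, not the answerer's original code. The IndexError comes from rows with fewer than three comma-separated parts, so this version adds a length guard and returns plain strings instead of lists. The guard, the regex that strips a trailing postal code, the names `extracted` and `result`, and the short third row are all assumptions added for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({"location": [
    "Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia",
    "Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia",
    "Indonesia",  # hypothetical short row that would trigger the IndexError
]})


def extract_city_state(a):
    # Split on commas and trim the whitespace around each part.
    parts = [p.strip() for p in str(a).split(",")]
    # Assumed guard: too few parts means the city/province cannot be located.
    if len(parts) < 3:
        return None, None
    city = parts[-3]
    # Drop a trailing postal code such as " 16519", if one is present.
    state = re.sub(r"\s*\d+$", "", parts[-2])
    return city, state


# Store the apply result in a variable first, then join it to df.
extracted = df["location"].apply(
    lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
)
result = df.join(extracted)
```

Rows that are too short end up with missing values in City and State rather than raising an exception.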
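Alternatively, use only pandas str functions, which avoid the error when a value does not match: select the pieces with str[] indexing after a first str split, with the n=1 parameter limiting the split.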
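The answer's own code did not survive on this page, so the following is only a sketch of what a pure pandas str approach could look like. It assumes Series.str.split for the comma split and Series.str.rsplit with n=1 to cut off the postal code, and it assumes every province value ends with a postal code, as in the sample rows. The short third row is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"location": [
    "Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia",
    "Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia",
    "Indonesia",  # hypothetical short row with too few comma-separated parts
]})

# Split each address into a list of comma-separated parts.
parts = df["location"].str.split(",")

# str[] indexing yields NaN instead of raising when the index is out of range.
df["City"] = parts.str[-3].str.strip()

# rsplit with n=1 splits once from the right, separating the postal code;
# str[0] keeps what comes before it.
df["State"] = parts.str[-2].str.strip().str.rsplit(n=1).str[0]
```

Because out-of-range str[] lookups produce NaN, the short row simply gets missing values in both new columns.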