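I am trying to split the following DataFrame into separate columns. I want all of the text in a single column, with the numbers split on whitespace.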
df[0].head(10)
0 []
1 [Andaman and Nicobar, 194, 52, 142, 0]
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534]
3 [Arunachal Pradesh, 609, 431, 175, 3]
4 [Assam, 20,646, 6,490, 14,105, 51]
5 [Bihar, 23,589, 8,767, 14,621, 201]
6 [Chandigarh, 660, 169, 480, 11]
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23]
8 [Dadra and Nagar Haveli and Daman, 585, 182, 4...
9 [Daman and Diu, 0, 0, 0, 0]
Name: 0, dtype: object
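If I simply split on whitespace and expand, the numbers are split correctly, but the text is also split across multiple columns. Because the text spans a different number of columns for each observation, I cannot reliably join it back together afterwards. The obvious solution is to write the right regex and split on that. I have not been able to work out the regex I need, so I am asking for input.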
df1 = df[0].str.split(' ', expand= True)
df1.head(10)
0 1 2 3 4 5 6 7 8 9
0 [] None None None None None None None None None
1 [Andaman and Nicobar, 194, 52, 142, 0] None None None
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534] None None None None
3 [Arunachal Pradesh, 609, 431, 175, 3] None None None None
4 [Assam, 20,646, 6,490, 14,105, 51] None None None None None
5 [Bihar, 23,589, 8,767, 14,621, 201] None None None None None
6 [Chandigarh, 660, 169, 480, 11] None None None None None
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23] None None None None None
8 [Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]
9 [Daman and Diu, 0, 0, 0, 0] None None None
The result I expect is as follows:
0 1 2 3 4 5 6 7 8 9
0 [] None None None None None None None None None
1 [Andaman and Nicobar, 194, 52, 142, 0] None None None None None
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534] None None None None None
3 [Arunachal Pradesh, 609, 431, 175, 3] None None None None None
4 [Assam, 20,646, 6,490, 14,105, 51] None None None None None
5 [Bihar, 23,589, 8,767, 14,621, 201] None None None None None
6 [Chandigarh, 660, 169, 480, 11] None None None None None
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23] None None None None None
8 [Dadra and Nagar Haveli and Daman, 585, 182, 401, 2] None None None None None
9 [Daman and Diu, 0, 0, 0, 0] None None None None None
You can use str.replace and str.extract to reshape your DataFrame.
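As a minimal sketch of that suggestion, the snippet below assumes each cell of df[0] is a plain string shaped like the printed rows (for example "[Assam, 20,646, 6,490, 14,105, 51]"), with fields separated by a comma plus a space and thousands separators written as a bare comma. The sample data is abbreviated from the question, and the column names name, n1 to n4 are hypothetical, since the question does not name them. If the cells are actually Python lists rather than strings, the .str methods will not apply as shown.

```python
import pandas as pd

# Assumed sample: each cell is a string shaped like the printed rows.
df = pd.DataFrame({0: [
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Assam, 20,646, 6,490, 14,105, 51]",
    "[Daman and Diu, 0, 0, 0, 0]",
]})

# Step 1: strip the surrounding square brackets with str.replace.
cleaned = df[0].str.replace(r"^\[|\]$", "", regex=True)

# Step 2: capture one lazy text group followed by exactly four numbers.
# A number is digits optionally followed by ",ddd" groups (thousands separators),
# so "40,646" stays together while ", " still acts as the field separator.
num = r"(\d+(?:,\d{3})*)"
pattern = rf"^(.+?),\s+{num},\s+{num},\s+{num},\s+{num}$"
result = cleaned.str.extract(pattern)
result.columns = ["name", "n1", "n2", "n3", "n4"]  # hypothetical column names

# Step 3: drop rows that did not match (such as the empty "[]" row),
# then remove the thousands separators and convert the counts to integers.
result = result.dropna().reset_index(drop=True)
for col in ["n1", "n2", "n3", "n4"]:
    result[col] = result[col].str.replace(",", "", regex=False).astype(int)

print(result)
```

Anchoring the pattern at both ends and making the text group lazy is what keeps multi-word names such as "Andaman and Nicobar" in one column, regardless of how many words they contain.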