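# rows holds the table's row elements; rows[0] is the first row.
# Strip each line, drop blank ones, and glue "Key:" lines to their value.
text = '\n'.join([x.strip() for x in rows[0].get_text().split('\n')
                  if x.strip()]).replace(':\n', ': ')
# Split every line on the first ':' into a key/value pair.
data_dict = {k.strip(): v.strip() for k, v in
             [x.split(':', 1) for x in text.split('\n')]}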
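How does it work?
This takes the text and:
splits it on newlines
removes all blank lines
strips leading/trailing whitespace from each line
joins the lines back into a single text
joins any line ending in : with the line after it
Then:
splits the text on newlines again
splits each line on the first :
strips any whitespace from the text on either side of the :
uses the resulting pieces as the keys and values of a dict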
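The steps above can be traced on a small sample. This is only a minimal sketch: raw is a made-up stand-in for what rows[0].get_text() might return (the real page text is not shown in this answer), and parse_row_text is a hypothetical helper name that wraps the two statements from the code.

```python
def parse_row_text(raw):
    """Turn 'Key: value' text (possibly split over lines) into a dict."""
    # Strip every line and drop the ones that end up empty.
    lines = [x.strip() for x in raw.split('\n') if x.strip()]
    # Re-join, then glue any line ending in ':' to the line after it.
    text = '\n'.join(lines).replace(':\n', ': ')
    # Split each line on the first ':' only, so values may contain colons.
    pairs = [x.split(':', 1) for x in text.split('\n')]
    return {k.strip(): v.strip() for k, v in pairs}


# Hypothetical sample text, shaped like a table row's get_text() output.
raw = """
   Project Country:
       Mainland China

   Project Region: Asia
   Year:   2014
"""

data_dict = parse_row_text(raw)
print(data_dict)
```

Note that a line containing no colon at all would make the dict comprehension raise a ValueError, because split(':', 1) then yields a single item that cannot be unpacked into a key and a value.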
Test code (the original listing was not preserved):
Results:
   Award Amount                                        Description  \
0       $84,907  To strengthen the capacity of China's rights d...
1      $204,973  To provide an effective forum for free express...
2       $48,000  To promote religious freedom in China. The org...
3       $89,000  To educate and train civil society activists o...
4       $65,000  To encourage greater public discussion, transp...

            Organization Name  Project Country                Project Focus  \
0                         NaN   Mainland China                  Rule of Law
1  Princeton China Initiative   Mainland China       Freedom of Information
2                         NaN   Mainland China                  Rule of Law
3                         NaN   Mainland China  Democratic Ideas and Values
4                         NaN   Mainland China                  Rule of Law

   Project Region                                      Project Title  Year
0            Asia             Empowering the Chinese Legal Community  2014
1            Asia  Supporting Free Expression and Open Debate for...  2014
2            Asia  Religious Freedom, Rights Defense and Rule of ...  2014
3            Asia     Education on Civil Society and Democratization  2014
4            Asia        Promoting Democratic Policy Change in China  2014
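The NaN entries under Organization Name are what one would expect if some tables simply had no such field: when a list of dicts with differing keys is turned into a pandas DataFrame, absent keys are filled with NaN. The test code is not preserved here, so the following is only a standard-library sketch of that alignment, not the code that produced the output. The names align_records and records are hypothetical, and the two records are sample values modelled on the results.

```python
def align_records(records, missing=None):
    """Give every record the same keys, filling gaps with `missing`."""
    # Collect column names in first-seen order across all records.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Rebuild each record with the full column set.
    rows = [{col: rec.get(col, missing) for col in columns} for rec in records]
    return columns, rows


# Sample records (assumed): the first one lacks 'Organization Name'.
records = [
    {'Project Country': 'Mainland China', 'Year': '2014'},
    {'Organization Name': 'Princeton China Initiative',
     'Project Country': 'Mainland China', 'Year': '2014'},
]

columns, rows = align_records(records)
for row in rows:
    print(row)
```

Here the first record ends up with None for Organization Name, which plays the role NaN plays in the DataFrame output.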
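In short, I suggest approaching the table parsing differently. All of the information is available in the first row of each table, so you can parse that row's text directly, as the code at the top of this answer does.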