BeautifulSoup: parsing text after <b> and before </br>

1 answer

User

#1 · Posted 2024-06-10 17:14:29

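I would suggest handling the table parsing differently. All of the information is available in the first row of each table, so you can parse the text of that row like this: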

Code:

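# rows is assumed to hold the <tr> tags of one table, e.g. table.find_all('tr')
text = '\n'.join([x.strip() for x in rows[0].get_text().split('\n')
                  if x.strip()]).replace(':\n', ': ')
# lines without a ':' are skipped instead of raising ValueError on unpacking
data_dict = {k.strip(): v.strip() for k, v in
             [x.split(':', 1) for x in text.split('\n') if ':' in x]}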

How does it work?

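This takes the text and:

splits it into lines
removes all empty lines
strips all leading/trailing whitespace
rejoins the lines into a single text
joins any line that ends with ':' onto the following line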
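To make the first phase concrete, here is a small sketch that runs the same steps one at a time on a hypothetical string, shaped the way `get_text()` might return a first row with labels and values on separate lines. The sample text is an assumption for illustration only.

```python
# Hypothetical output of get_text() for one first row: labels and values
# on separate lines, with blank lines and stray indentation mixed in.
raw = """
  Award Amount:
    $84,907

  Project Country:
    Mainland China
  Year:   2014
"""

# Split into lines and strip leading/trailing whitespace from each one.
stripped = [line.strip() for line in raw.split('\n')]
# Drop the lines that are now empty.
non_empty = [line for line in stripped if line]
# Rejoin into one text, then glue every "Label:" line to the value below it.
text = '\n'.join(non_empty).replace(':\n', ': ')

print(text)
# Award Amount: $84,907
# Project Country: Mainland China
# Year:   2014
```

The `.replace(':\n', ': ')` trick only works because blank lines were removed first; otherwise a label and its value could be separated by an empty line and would not be merged.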

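Then it:

splits the text on newlines again
splits each line at the first ':'
strips any whitespace from the text on both sides of the ':'
uses the two halves as the key and value of the dict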
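The second phase can be sketched the same way. The input string below is a hypothetical "one `Label: value` per line" text like the first phase produces; the extra line shows why `split(':', 1)` (split at the first colon only) matters when a value itself contains a colon.

```python
# Hypothetical first-phase output: one "Label: value" pair per line.
text = "Award Amount: $84,907\nProject Country: Mainland China\nYear:   2014"

# Split into lines, split each line at the FIRST colon only (maxsplit=1),
# and strip the whitespace around both halves before using them as key/value.
data_dict = {k.strip(): v.strip() for k, v in
             [line.split(':', 1) for line in text.split('\n')]}

print(data_dict)
# {'Award Amount': '$84,907', 'Project Country': 'Mainland China', 'Year': '2014'}

# maxsplit=1 keeps any later colons inside the value (hypothetical title).
print("Project Title: Law: A Primer".split(':', 1))
# ['Project Title', ' Law: A Primer']
```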

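To try the snippet end to end, here is a minimal sketch that applies it to the first row of every table on a page. The sample HTML (each table's first row holding `<b>Label:</b> value<br/>` pairs) and the `parse_first_row` helper are hypothetical; the real page's markup may differ. It needs the `beautifulsoup4` package.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <table> per record, all fields in the first row.
html = """
<table>
  <tr><td>
    <b>Project Title:</b>
      Example Project A<br/>
    <b>Award Amount:</b> $1,000<br/>
    <b>Year:</b> 2014<br/>
  </td></tr>
  <tr><td>A later row that this approach ignores</td></tr>
</table>
<table>
  <tr><td>
    <b>Project Title:</b> Example Project B<br/>
    <b>Organization Name:</b> Example Org<br/>
    <b>Year:</b> 2014<br/>
  </td></tr>
</table>
"""


def parse_first_row(table):
    """Hypothetical helper: run the answer's parsing on a table's first <tr>."""
    rows = table.find_all('tr')
    text = '\n'.join([x.strip() for x in rows[0].get_text().split('\n')
                      if x.strip()]).replace(':\n', ': ')
    return {k.strip(): v.strip() for k, v in
            [x.split(':', 1) for x in text.split('\n') if ':' in x]}


soup = BeautifulSoup(html, 'html.parser')
records = [parse_first_row(t) for t in soup.find_all('table')]
for record in records:
    print(record)
```

Because only `rows[0]` is read, later rows in a table never affect the result, and the label text itself becomes the dict key, so no field names have to be hard-coded.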

Results:

  Award Amount                                        Description  \
0      $84,907  To strengthen the capacity of China's rights d...   
1     $204,973  To provide an effective forum for free express...   
2      $48,000  To promote religious freedom in China. The org...   
3      $89,000  To educate and train civil society activists o...   
4      $65,000  To encourage greater public discussion, transp...   

            Organization Name Project Country                Project Focus  \
0                         NaN  Mainland China                  Rule of Law   
1  Princeton China Initiative  Mainland China       Freedom of Information   
2                         NaN  Mainland China                  Rule of Law   
3                         NaN  Mainland China  Democratic Ideas and Values   
4                         NaN  Mainland China                  Rule of Law   

  Project Region                                      Project Title  Year  
0           Asia             Empowering the Chinese Legal Community  2014  
1           Asia  Supporting Free Expression and Open Debate for...  2014  
2           Asia  Religious Freedom, Rights Defense and Rule of ...  2014  
3           Asia     Education on Civil Society and Democratization  2014  
4           Asia        Promoting Democratic Policy Change in China  2014
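In the output above, Organization Name is NaN for most rows. One plausible explanation, which is an assumption since the answer does not say, is that those tables' first rows had no `Organization Name:` label at all, so that key was simply missing from their dicts. When a list of dicts is passed to `pandas.DataFrame`, rows are aligned by key and any missing key becomes NaN, as this sketch with hypothetical records shows:

```python
import pandas as pd

# Hypothetical per-table dicts: the first has no "Organization Name",
# the second has no "Award Amount".
records = [
    {'Project Title': 'Example Project A', 'Award Amount': '$1,000', 'Year': '2014'},
    {'Project Title': 'Example Project B', 'Organization Name': 'Example Org', 'Year': '2014'},
]

# DataFrame takes the union of all keys as columns; absent keys become NaN.
df = pd.DataFrame(records)
print(df)
```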
