Selecting text data with Beautiful Soup

Posted 2024-05-13 23:44:23

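I'm trying to use Python's Beautiful Soup to select text data from the HTML below, but I'm running into trouble. Each entry has a label inside a <b> tag, but what I want is the data outside that tag. For example, the first entry is labelled "Assessment Type", but I only want "Capacity curve". Here is what I have so far: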

modelinginfo = soup.find( "div", {"id":"genInfo"} ) # this is my raw data
rows=modelinginfo.findChildren(['p']) # this is the data displayed below
for row in rows:
    print(row)
    print('\n')
    cells = row.findChildren('p')
    for cell in cells:
         value = cell.string
         print("The value in this cell is %s" % value)


[<p><b>Assessment Type: </b>Capacity curve</p>,
 <p><b>Name: </b>Borzi et al (2008) - Capacity-Xdir 4Storeys InfilledFrame NonSismicallyDesigned</p>,
 <p><b>Category: </b>Structure specific - Building</p>,
 <p><b>Taxonomy: </b>CR/LFINF+DNO/HEX:4 (GEM)</p>,
 <p><b>Reference: </b>The influence of infill panels on vulnerability curves for RC buildings (Borzi B., Crowley H., Pinho R., 2008) - Proceedings of the 14th World Conference on Earthquake Engineering, Beijing, China</p>,
 <p><b>Web Link: </b><a href="http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF" style="color:blue" target="_blank"> http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF</a></p>,
 <p><b>Methodology: </b>Analytical</p>,
 <p><b>General Comments: </b>Sample Data: A 4-storey building designed according to the 1992 Italian design code (DM, 1992), considering gravity loads only, and the Decreto Ministeriale 1996 (DM, 1996) when considering seismic action (the seismically designed building has been designed assuming a lateral force equal to 10% of the seismic weight, c=10%, and with a triangular distribution shape).

 The Y axis in the capacity curve represent the collapse multiplier: Base shear resistance over seismic weight.</p>,
 <p><b>Geographical Applicability: </b> Italy</p>]

1 answer

Answer #1 · posted 2024-05-13 23:44:23

1.) You can iterate over the children of each p element and print everything except the b tag:

for row in rows:
    # rows already holds the <p> elements, so walk each one's direct children
    for element in row.children:
        if element.name != 'b':
            print("The value in this cell is %s" % element)
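For anyone who wants to try option 1 end to end, here is a minimal self-contained sketch. The sample HTML is a shortened copy of the markup in the question, and the helper name `values_outside_bold` and the use of the built-in `html.parser` are my own assumptions, not something from the original post. It also calls `get_text()` on non-`b` child tags so the Web Link entry comes out as plain text rather than as an `<a>` element.

```python
from bs4 import BeautifulSoup

# Shortened sample of the markup shown in the question.
html = """
<div id="genInfo">
<p><b>Assessment Type: </b>Capacity curve</p>
<p><b>Category: </b>Structure specific - Building</p>
<p><b>Web Link: </b><a href="http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF" target="_blank"> http://www.iitk.ac.in/nicee/wcee/article/14_09-01-0111.PDF</a></p>
</div>
"""


def values_outside_bold(html_text):
    """Return the text of each <p> in #genInfo, skipping the <b> label."""
    soup = BeautifulSoup(html_text, "html.parser")
    info = soup.find("div", {"id": "genInfo"})
    values = []
    for p in info.find_all("p"):
        parts = []
        for element in p.children:
            if element.name == "b":
                continue  # skip the label
            # Bare strings are str subclasses; other tags (e.g. <a>) need get_text().
            parts.append(element if isinstance(element, str) else element.get_text())
        values.append("".join(parts).strip())
    return values


values = values_outside_bold(html)
for value in values:
    print("The value in this cell is %s" % value)
```

Bare text nodes report `name` as `None`, which is why comparing against `'b'` is enough to separate the label from the value.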

2.) You can use the ^{} method to strip out the unwanted b tag content:

^{pr2}$
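The method name and snippet for option 2 are missing from the post. As one possibility, and this is an assumption rather than necessarily what the answerer used, Beautiful Soup's `decompose()` removes a tag and its contents from the tree, after which `get_text()` on the `p` returns only the value. A rough sketch with a hypothetical helper `strip_labels` and a two-entry sample of the question's markup:

```python
from bs4 import BeautifulSoup

# Two entries taken from the markup in the question.
html = (
    '<div id="genInfo">'
    "<p><b>Assessment Type: </b>Capacity curve</p>"
    "<p><b>Geographical Applicability: </b> Italy</p>"
    "</div>"
)


def strip_labels(html_text):
    """Delete every <b> inside each <p>, then read the remaining text."""
    soup = BeautifulSoup(html_text, "html.parser")
    info = soup.find("div", {"id": "genInfo"})
    values = []
    for p in info.find_all("p"):
        for b in p.find_all("b"):
            b.decompose()  # removes the <b> tag and its text from the tree in place
        values.append(p.get_text(strip=True))
    return values


cleaned = strip_labels(html)
print(cleaned)
```

Note that this modifies the parsed tree, so if the labels are needed later, read them before calling `decompose()` or parse the HTML again.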
