Skipping specific text in Python

Posted 2024-04-26 15:12:33


This is my HTML file:

<tr>
<td>1</td>
<td style="font-weight: bold;"><a href="#" onclick="javascript:TollPlazaPopup(272);"> Kherki Daula </a></td> 
<td style="font-weight: bold;">60 <a onclick="return popitup(" https:="" www.google.co.in="" maps="" @28.395604,76.98176,17.52z="" data="!5m1!1e1?hl=en')'" href="https://www.google.co.in/maps/@28.395604,76.98176,17.52z/data=!3m1!1e3!5m1!1e1?hl=en" target="_Blank"> (Live Traffic)</a> &nbsp;&nbsp; - &nbsp;&nbsp; <a href="#" title="Click here to get estimated travel time." id="0-232X" onclick="javascript:TollPlazaTrafficTime(272,this);">ET</a>
</td>
</tr>
<tr>
<td>2</td>
<td style="font-weight: bold;"><a href="#" onclick="javascript:TollPlazaPopup(213);"> Shahjahanpur </a></td>
<td style="font-weight: bold;">125 <a onclick="return popitup(" https:="" www.google.co.in="" maps="" @27.99978,76.430522,17.52z="" data="!5m1!1e1?hl=en')'" href="https://www.google.co.in/maps/@27.99978,76.430522,17.52z/data=!3m1!1e3!5m1!1e1?hl=en" target="_Blank"> (Live Traffic)</a> &nbsp;&nbsp; - &nbsp;&nbsp; <a href="#" title="Click here to get estimated travel time." id="1-179X" onclick="javascript:TollPlazaTrafficTime(213,this);">ET</a>
</td>
</tr>

Now when I scrape it, the result looks like this:

[scraped output not preserved]

I want to skip the "(Live Traffic)" text.

My Python code is:

tbody = soup('table' ,{"class":"tollinfotbl"})[0].find_all('tr')[3:]
for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text.contents[0] for ele in cols]
    if cols:
        sno = str(cols[0])
        Toll_plaza = str(cols[1])
        cost = str(cols[2])

        query = "INSERT INTO tryroute (sno,Toll_plaza, cost) VALUES (%s, %s, %s);"

When I use .contents[0], I get this error:

cols = [ele.text.content[0] for ele in cols]
AttributeError: 'str' object has no attribute 'content'

Any help would be appreciated.


2 answers

You get this error because you are calling contents on a str object, namely ele.text:

ele.text # returns a string object (which in your case contains the whole text in that particular tag)

To get the contents of the tag, you have to do this:

[code sample not preserved]
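A minimal sketch of reading the cells through .contents instead of .text, assuming BeautifulSoup 4 with Python's built-in "html.parser" and a trimmed copy of the first row from the question (the garbled onclick and title attributes dropped). ROW_HTML and parse_row are hypothetical names. A tag's .contents is a list of its child nodes. In the cost cell the first child is the text node "60 " that comes before any <a>, so .contents[0] skips "(Live Traffic)" without any string matching. The toll-plaza name sits inside an <a>, so for that cell .contents[0] would be the <a> tag itself; get_text(strip=True) is used there instead.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first row from the question (some attributes dropped).
ROW_HTML = """
<table><tr>
<td>1</td>
<td style="font-weight: bold;"><a href="#" onclick="javascript:TollPlazaPopup(272);"> Kherki Daula </a></td>
<td style="font-weight: bold;">60 <a href="https://www.google.co.in/maps/@28.395604,76.98176,17.52z/data=!3m1!1e3!5m1!1e1?hl=en" target="_Blank"> (Live Traffic)</a> &nbsp;&nbsp; - &nbsp;&nbsp; <a href="#" id="0-232X">ET</a>
</td>
</tr></table>
"""

def parse_row(row):
    """Return (sno, toll_plaza, cost) for one <tr>, skipping "(Live Traffic)"."""
    cols = row.find_all("td", recursive=False)  # Tag children only, no text nodes
    sno = cols[0].get_text(strip=True)
    toll_plaza = cols[1].get_text(strip=True)  # the name lives inside an <a>
    # .contents[0] is the text node "60 " that precedes the first <a> in the cell
    cost = cols[2].contents[0].strip()
    return sno, toll_plaza, cost

soup = BeautifulSoup(ROW_HTML, "html.parser")
print(parse_row(soup.find("tr")))  # ('1', 'Kherki Daula', '60')
```

This relies on the number always being the first child of the cost cell; if the markup can vary, matching the text with a regular expression is more forgiving.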

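You can use re to extract the number from the raw text instead. You don't need contents[]: it is error-prone because you hard-code an index, which isn't flexible.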

The code below needs import re at the top of your script.

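import re

# Compile once, outside the loop; raw string so \d and \( reach the regex engine intact.
# Matches the leading digits of the cost cell, up to the "(" of "(Live Traffic)".
cost_pattern = re.compile(r'^\s*(\d+)\s*\(')

for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text for ele in cols]  # .text is already a plain str
    if cols:
        sno = cols[0].strip()
        Toll_plaza = cols[1].strip()
        cost_raw = cols[2]

        match = cost_pattern.search(cost_raw)
        cost = match.group(1) if match else None  # None if the cell has no leading number

        query = "INSERT INTO tryroute (sno,Toll_plaza, cost) VALUES (%s, %s, %s);"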
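The regex step can also be tried on its own, without BeautifulSoup or a database. A minimal sketch, assuming ele.text for the cost cell comes out roughly like the sample string below (html.parser turns &nbsp; into \xa0); COST_RE and extract_cost are hypothetical names. Returning None when there is no match avoids leaving cost unset, which would otherwise raise a NameError on the first row or silently reuse the previous row's value.

```python
import re

# Leading digits, optional whitespace, then the "(" that opens "(Live Traffic)".
# Raw string so "\d" and "\(" reach the regex engine unchanged.
COST_RE = re.compile(r"^\s*(\d+)\s*\(")

def extract_cost(cost_raw):
    """Return the toll amount as a string, or None if the text doesn't match."""
    match = COST_RE.search(cost_raw)
    return match.group(1) if match else None

print(extract_cost("60  (Live Traffic) \xa0\xa0 - \xa0\xa0 ET\n"))  # 60
print(extract_cost("no cost here"))  # None
```

Most DB-API drivers pass None through as NULL, so the INSERT still receives a value for rows without a cost.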

Let me know if you need any clarification.
