Retrieving text from HTML does not work in Python

for companyLIST in result[0:]:
    try:
        companyname = companyLIST.find('h3').contents[0]
        print("Company Name ", str(companyname))
    except Exception as e:
        print("errror", str(e))
    try:
        companySt = companyLIST.find_all('li')[1].contents[0]
        print("Company St ", str(companySt))
    except Exception as e:
        print("errror", str(e))
    try:
        companyCity = companyLIST.find_all('li')[2].contents[0]
        print("Company City ", str(companyCity))
    except Exception as e:
        print("errror", str(e))
    try:
        companyPhone = companyLIST.find('li')[3].contents[0]
        print("Company Phone ", companyPhone)
    except Exception as e:
        print("errror", str(e))
    try:
        companyWeb = companyLIST.find('a')['href']
        print("Company Web ", str(companyWeb))
        print(" ")
    except Exception as e:
        print("errror", str(e))

2 Answers

User

#1 · Edited 2024-06-16 11:40:37

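Replace

companyPhone = companyLIST.find('li')[3].contents[0]
print("Company Phone ", companyPhone)

with

# Work on the fourth <li> as a plain string so that split() and replace() apply
phone_li = str(companyLIST.find_all('li')[3])
if "Phone" in phone_li:
    companyPhone = phone_li.split(':')[-1].replace(' ', '').replace('</li>', '')
    print("Company Phone ", companyPhone)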

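The code above splits the string on the ":" character, takes the last element, and strips out the leftover markup. What remains is the phone number as a single string. You can do the same for the other lines: choose the split character or string carefully, then use replace to clean up the resulting pieces.

Hope this helps.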
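The split-and-clean idea from this answer can be sketched as a small helper. This is only an illustration: the function name `value_after_label` and the sample strings are hypothetical, and it assumes each list item's text has the form "Label: value". It uses `str.partition` rather than `split(':')[-1]` so that values containing a colon themselves, such as URLs, are not cut short.

```python
def value_after_label(text, sep=":"):
    """Return the part of `text` after the first `sep`, trimmed of whitespace."""
    _label, found, value = text.partition(sep)
    # If the separator is missing, fall back to the whole (trimmed) string.
    return value.strip() if found else text.strip()


# Hypothetical item texts, e.g. obtained with li.get_text() in BeautifulSoup
print(value_after_label("Phone: 555 0100"))
print(value_after_label("Web: http://example.com"))
```

Splitting on the first separator only keeps everything after the label intact, which matters once a field other than the phone number is handled the same way.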

User

#2 · Edited 2024-06-16 11:40:37

I assume you are parsing the HTML with the BeautifulSoup4 library. If so, you can get the phone number from the HTML like this:

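import re

# contents[1] is taken to be the text node that follows the label, e.g. ": 555 0100"
text = str(soup.find_all('li')[3].contents[1])
phone_number = re.sub(": ", "", text)

print(phone_number)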
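Because both answers rely on the phone number always being the fourth list item, a label-based lookup may be more robust. The following is a rough sketch under assumptions not confirmed by the question: the names `LABELLED` and `fields_from_items` are hypothetical, and each item's text is assumed to look like "Label: value" (for example, as collected with `[li.get_text() for li in soup.find_all('li')]`).

```python
import re

# Matches "Label: value"; the label is everything before the first colon.
LABELLED = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>.+?)\s*$")


def fields_from_items(items):
    """Map each label to its value, skipping items without a "Label: value" shape."""
    fields = {}
    for text in items:
        match = LABELLED.match(text)
        if match:
            fields[match.group("label")] = match.group("value")
    return fields


# Hypothetical list-item texts for one company
items = ["Acme Ltd", "12 Main St", "Springfield", "Phone: 555 0100"]
fields = fields_from_items(items)
print(fields.get("Phone"))
```

Looking the value up by its label means a missing or reordered list item yields `None` instead of the wrong field or an IndexError.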
