AttributeError: 'NoneType' object has no attribute 'text' when a class is not found

Posted 2024-04-25 13:32:42


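I have the code below, which successfully scrapes business categories from https://www.canpages.ca/business/AB/edmonton/restaurants/183-720200-p41.html, but on page 42 there is one company with no category (the class inside each "result-id" container is "result__business-category").

In the HTML, this particular company simply does not have that class, while the other results do. I'm not sure what the best approach is, because my program crashes as soon as it hits a result where the class is missing. The error is "AttributeError: 'NoneType' object has no attribute 'text'", and the code is as follows: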


import re #regex
import requests #fetches html page content
from requests import get 
from bs4 import BeautifulSoup #parses html page content
import pandas as pd
import numpy as np

#initialize empty list where we can store data
categories = []

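#Get the contents of the page we're looking at by requesting the URL
#NOTE: `headers` was not defined in the original post; this User-Agent value is an assumed placeholder
headers = {"User-Agent": "Mozilla/5.0"}
results = requests.get("https://www.canpages.ca/business/AB/edmonton/restaurants/183-720200-p42.html", headers=headers)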

soup = BeautifulSoup(results.text, "html.parser")

#grab the container of each company by result id
companies_div = soup.find_all('div', {'id': re.compile('result-id-.*')})

for x in companies_div:

    # Extract category class and split by white space.  Category should follow [City Category] but sometimes typos result in [Category]
    categoryChunk = x.find('div', class_='result__business-category').text.split()

    # if list does not have [City Category] format and therefore list length of 2, mark as "-"
    category = categoryChunk[1] if len(categoryChunk) == 2 else '-'
    categories.append(category)

#initialize pd dataframe
companies = pd.DataFrame({
    'category': categories,
    })

print(companies)

companies.to_csv('companiestest6.csv')


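I don't know how to tell the program, in essence, "if the class isn't found, mark the category as '-'", and I would greatly appreciate any help.

Update

I have updated the code as follows: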

categoryDiv = x.find('div', class_='result__business-category')

if categoryDiv:
    categoryChunk = categoryDiv.text.split()
    # expect [City Category]; anything else is marked as "-"
    if len(categoryChunk) == 2:
        category = categoryChunk[1]
        categories.append(category)

    else:
        category = '-'
        categories.append(category)

else:
    category = '-'
    categories.append(category)

This seems to work well.
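
The three branches in the update all end in the same `categories.append(category)` call, so the logic can be condensed by starting from the default '-' and overriding it only when the text exists and has exactly two words. The following is a minimal sketch of that idea; the helper name `pick_category` and the sample strings are hypothetical and not part of the original post.

```python
from typing import Optional


def pick_category(text: Optional[str]) -> str:
    """Return the second word of a "[City Category]" string, or '-' otherwise.

    `text` is None when the category element was not found on the page.
    """
    category = '-'  # default for a missing element or an unexpected format
    if text is not None:
        chunk = text.split()
        if len(chunk) == 2:
            category = chunk[1]
    return category


# Hypothetical inputs: a normal entry, a missing element, a one-word entry
categories = []
for text in ["Edmonton Restaurants", None, "Restaurants"]:
    categories.append(pick_category(text))

print(categories)
```

Inside the scraping loop, this could be called as `pick_category(categoryDiv.text if categoryDiv else None)`, which keeps a single `append` per company.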


1 answer
User
#1 · Posted 2024-04-25 13:32:42

It seems you should be able to test what .find returns fairly simply:

div = x.find('div', class_='result__business-category')

if div:
    categoryChunk = div.text.split()

    category = categoryChunk[1]

else:
    category = '-'

This does not explicitly test for the length-2 case, but I assume that check was only there as an attempt to handle the not-found case.
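
If the length check from the question is still wanted (for entries that contain only a category and no city), both tests can be combined. The sketch below runs against a small inline HTML string instead of the live page; the markup in `SAMPLE_HTML` and the helper name `extract_category` are assumptions made for illustration and only imitate the structure described in the question.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the second result has no category div at all,
# and the third has a one-word category (no city).
SAMPLE_HTML = """
<div id="result-id-1"><div class="result__business-category">Edmonton Restaurants</div></div>
<div id="result-id-2"><div class="result__name">No category here</div></div>
<div id="result-id-3"><div class="result__business-category">Restaurants</div></div>
"""


def extract_category(container):
    """Return the category word for one result container, or '-' if unavailable."""
    div = container.find('div', class_='result__business-category')
    if div is None:  # .find returns None when no matching element exists
        return '-'
    chunk = div.text.split()
    # Only the [City Category] format (exactly two words) is accepted
    return chunk[1] if len(chunk) == 2 else '-'


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
companies_div = soup.find_all('div', {'id': re.compile('result-id-.*')})
categories = [extract_category(x) for x in companies_div]

print(categories)
```

Comparing against `None` explicitly, rather than relying on the truthiness of the tag, makes it clear that the branch is about a missing element.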
