Trying to extract data from nested divs in Selenium

Asked 2025-04-12 10:14

Here is my HTML:

<div class="AcuityResearchTerminalWidget__padding__S4eul undefined">
   <div class="EventInfo__wrapper__Gkjqn" style="background-color: rgb(23, 32, 42); color: rgb(255, 255, 255);">
      <div class="EventInfo__title__JBkYo">Consumer Price Index (MoM </div>
      <div class="EventInfo__detailsWrapper__vBTGX" style="border-color: rgb(13, 24, 33);">
      <div class="EventInfo__affectedAssetsWrapper__gjMXZ">
         <div class="EventInfo__title__JBkYo">What Assets does this event affect?</div><div class="EventInfo__affectedAssets__UNDjB">
         <div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(255, 255, 255); color: rgb(23, 32, 42);"><span>EURUSD</span></div>
         <div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURAUD</span></div>
         <div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURCAD</span></div>
         <div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURCHF</span></div>
         <div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURCZK</span></div>

Something seems to be wrong with my code or my logic, because I can't extract the affected assets (for example, EURUSD).

Here is my code:

from bs4 import BeautifulSoup

def extract_name_and_assets(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    events = soup.find_all(class_='EventTile__wrapper__iAmCe')

    extracted_data = []

    for event in events:
        event_name = event.find(class_='EventTile__eventName__OBN72').text.strip()
        event_id = event['data-id']  # Extracting the data-id attribute
        extracted_data.append({'name': event_name, 'id': event_id})
        
        widget = event.find(class_="AcuityResearchTerminalWidget__padding__S4eul undefined")
        if widget:
            wrapper = widget.find(class_="EventInfo__affectedAssestsWrapper__gjMXZ")
            if wrapper:
                assets_wrapper = wrapper.find(class_="EventInfo__affectedAssets__UNDjB")
                if assets_wrapper:
                    expand_button_div = assets_wrapper.find('div', class_='EventInfo__expandButton__cDK9n')
                    if expand_button_div:
                        expand_button_div.div.next_sibling.string = "View All Assets"
                    assets = [asset.span.text.strip() for asset in assets_wrapper.find_all(class_="EventInfo__affectedAsset__iQlBc")]
                else:
                    assets = []
                extracted_data.append({'name': event_name, 'id': event_id, 'assets': assets})

    return extracted_data

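Each event has multiple assets, but my code does not seem to extract any of them. In addition, as soon as I add the lines starting from widget = event.find..., the output CSV file comes out empty (before that, event_name and event_id were extracted correctly).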

I also tried accessing the assets directly with the code below, but that did not work either.

    for event in events:
        event_name = event.find(class_='EventTile__eventName__OBN72').text.strip()
        event_id = event['data-id']  # Extracting the data-id attribute
        assets_wrapper = event.find(class_="EventInfo__affectedAssets__UNDjB")
        if assets_wrapper:
            assets = [asset.span.text.strip() for asset in assets_wrapper.find_all(class_="EventInfo__affectedAsset__iQlBc")]
        else:
            assets = []
        extracted_data.append({'name': event_name, 'id': event_id, 'assets': assets})

    return extracted_data
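
For reference, here is a minimal sketch that pulls the asset names out of a trimmed copy of the HTML above. It assumes the widget markup is actually present in html_content when BeautifulSoup parses it (for example, page source captured after the event details have rendered), which I have not confirmed. SAMPLE_HTML and extract_assets are illustrative names, not part of my real script. Note that my first attempt spells the wrapper class EventInfo__affectedAssestsWrapper__gjMXZ, whereas the HTML has EventInfo__affectedAssetsWrapper__gjMXZ; the sketch sidesteps the wrapper lookups by searching for the asset buttons directly.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (illustrative sample input).
SAMPLE_HTML = """
<div class="AcuityResearchTerminalWidget__padding__S4eul undefined">
  <div class="EventInfo__affectedAssetsWrapper__gjMXZ">
    <div class="EventInfo__affectedAssets__UNDjB">
      <div role="button" class="EventInfo__affectedAsset__iQlBc"><span>EURUSD</span></div>
      <div role="button" class="EventInfo__affectedAsset__iQlBc"><span>EURAUD</span></div>
    </div>
  </div>
</div>
"""


def extract_assets(html_content):
    """Return the text of every affected-asset button found in html_content."""
    soup = BeautifulSoup(html_content, "html.parser")
    # Search for the asset buttons directly; no intermediate wrapper lookups,
    # so a misspelled wrapper class cannot silently skip the whole branch.
    buttons = soup.find_all("div", class_="EventInfo__affectedAsset__iQlBc")
    # get_text(strip=True) also works if a button has no <span> child.
    return [button.get_text(strip=True) for button in buttons]


print(extract_assets(SAMPLE_HTML))  # ['EURUSD', 'EURAUD']
```

If this returns an empty list on the real page source, the asset markup is probably not in the HTML that was handed to BeautifulSoup in the first place.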
