Trying to extract data from nested divs in Selenium
Here is my HTML:
<div class="AcuityResearchTerminalWidget__padding__S4eul undefined">
<div class="EventInfo__wrapper__Gkjqn" style="background-color: rgb(23, 32, 42); color: rgb(255, 255, 255);">
<div class="EventInfo__title__JBkYo">Consumer Price Index (MoM </div>
<div class="EventInfo__detailsWrapper__vBTGX" style="border-color: rgb(13, 24, 33);">
<div class="EventInfo__affectedAssetsWrapper__gjMXZ">
<div class="EventInfo__title__JBkYo">What Assets does this event affect?</div><div class="EventInfo__affectedAssets__UNDjB">
<div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(255, 255, 255); color: rgb(23, 32, 42);"><span>EURUSD</span></div>
<div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURAUD</span></div>
<div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURCAD</span></div>
<div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURCHF</span></div>
<div role="button" class="EventInfo__affectedAsset__iQlBc" style="background: rgb(23, 32, 42); color: rgb(255, 255, 255);"><span>EURCZK</span></div>
Something seems to be wrong with my code or my logic, because I can't extract the affected assets (for example, EURUSD).
Here is my code:
from bs4 import BeautifulSoup

def extract_name_and_assets(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    events = soup.find_all(class_='EventTile__wrapper__iAmCe')
    extracted_data = []
    for event in events:
        event_name = event.find(class_='EventTile__eventName__OBN72').text.strip()
        event_id = event['data-id']  # Extracting the data-id attribute
        extracted_data.append({'name': event_name, 'id': event_id})
        widget = event.find(class_="AcuityResearchTerminalWidget__padding__S4eul undefined")
        if widget:
            wrapper = widget.find(class_="EventInfo__affectedAssestsWrapper__gjMXZ")
            if wrapper:
                assets_wrapper = wrapper.find(class_="EventInfo__affectedAssets__UNDjB")
                if assets_wrapper:
                    expand_button_div = assets_wrapper.find('div', class_='EventInfo__expandButton__cDK9n')
                    if expand_button_div:
                        expand_button_div.div.next_sibling.string = "View All Assets"
                    assets = [asset.span.text.strip() for asset in assets_wrapper.find_all(class_="EventInfo__affectedAsset__iQlBc")]
                else:
                    assets = []
                extracted_data.append({'name': event_name, 'id': event_id, 'assets': assets})
    return extracted_data
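Each event has several assets, but my code doesn't seem to extract any of them. Also, as soon as I add the lines starting at widget = event.find..., the output CSV file comes out empty (before that, event_name and event_id were working fine).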
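The CSV-writing code isn't shown, so the following is only one guess at why the file ends up empty. If the rows are written with csv.DictWriter and its fieldnames still list only name and id, any row that carries an extra assets key raises ValueError, and the script stops before useful output is written. The sketch below reproduces that and shows one way around it; the to_csv helper, the "evt-1" id, and the ";" separator are made up for illustration.

```python
import csv
import io

# Hypothetical row shaped like the dictionaries built in the question.
rows = [
    {"name": "Consumer Price Index (MoM", "id": "evt-1", "assets": ["EURUSD", "EURAUD"]},
]


def to_csv(rows, fieldnames):
    """Write rows to an in-memory CSV and return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# With the old header, the extra "assets" key makes DictWriter raise.
try:
    to_csv(rows, ["name", "id"])
    header_error = None
except ValueError as exc:
    header_error = exc

# Flatten the asset list into one cell and add the column to the header.
flat_rows = [{**row, "assets": ";".join(row["assets"])} for row in rows]
csv_text = to_csv(flat_rows, ["name", "id", "assets"])
print(csv_text)
```

If the real script uses a different writer, the same check still applies: every key in the row dictionaries should have a matching column.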
I also tried to access the assets directly with the code below, but that doesn't work either.
    for event in events:
        event_name = event.find(class_='EventTile__eventName__OBN72').text.strip()
        event_id = event['data-id']  # Extracting the data-id attribute
        assets_wrapper = event.find(class_="EventInfo__affectedAssets__UNDjB")
        if assets_wrapper:
            assets = [asset.span.text.strip() for asset in assets_wrapper.find_all(class_="EventInfo__affectedAsset__iQlBc")]
        else:
            assets = []
        extracted_data.append({'name': event_name, 'id': event_id, 'assets': assets})
    return extracted_data
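Two things stand out when the code is compared with the HTML above. The first function looks up EventInfo__affectedAssestsWrapper__gjMXZ, while the markup uses EventInfo__affectedAssetsWrapper__gjMXZ, so that find() call returns None. In addition, the HTML shown places the EventInfo panel under the AcuityResearchTerminalWidget div, and nothing in the snippet shows it nested inside an EventTile__wrapper__iAmCe element, so event.find(...) may have nothing to match. The following is a minimal sketch under those assumptions: it searches from the document root instead of from each event tile. SAMPLE_HTML is a trimmed copy of the snippet with closing tags added, and extract_event_info is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question, with closing tags added.
SAMPLE_HTML = """
<div class="AcuityResearchTerminalWidget__padding__S4eul undefined">
  <div class="EventInfo__wrapper__Gkjqn">
    <div class="EventInfo__title__JBkYo">Consumer Price Index (MoM </div>
    <div class="EventInfo__detailsWrapper__vBTGX">
      <div class="EventInfo__affectedAssetsWrapper__gjMXZ">
        <div class="EventInfo__title__JBkYo">What Assets does this event affect?</div>
        <div class="EventInfo__affectedAssets__UNDjB">
          <div role="button" class="EventInfo__affectedAsset__iQlBc"><span>EURUSD</span></div>
          <div role="button" class="EventInfo__affectedAsset__iQlBc"><span>EURAUD</span></div>
          <div role="button" class="EventInfo__affectedAsset__iQlBc"><span>EURCAD</span></div>
        </div>
      </div>
    </div>
  </div>
</div>
"""


def extract_event_info(html_content):
    """Return the panel title and affected assets, or None if no panel is found."""
    soup = BeautifulSoup(html_content, "html.parser")
    # Search from the root: the panel is not assumed to sit inside an event tile.
    panel = soup.find(class_="EventInfo__wrapper__Gkjqn")
    if panel is None:
        return None
    # The first title div inside the panel holds the event name.
    title = panel.find(class_="EventInfo__title__JBkYo").get_text(strip=True)
    # class_ matches a single class token, so the "affectedAssets" container is skipped.
    assets = [
        asset.get_text(strip=True)
        for asset in panel.find_all(class_="EventInfo__affectedAsset__iQlBc")
    ]
    return {"name": title, "assets": assets}


info = extract_event_info(SAMPLE_HTML)
print(info)
```

With Selenium, this only helps if the page source handed to BeautifulSoup is captured after the panel has actually been rendered; if the panel appears only after an event is clicked, the source has to be read after that click.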