Some LinkedIn profiles do not load completely in Selenium with Python

Posted 2024-04-26 14:22:11


I have written code to fetch LinkedIn profile details, but for some user profiles the full HTML does not load.

I have tried the usual wait mechanisms:

driver.implicitly_wait(10)

time.sleep(10)

element_present = EC.presence_of_element_located((By.CLASS_NAME, '.pv-profile-section__card-item-v2.pv-profile-section.pv-position-entity.ember-view'))
WebDriverWait(driver, 300).until(element_present)

But none of them seems to help.

My code snippet:
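
One detail in the explicit wait above may be worth checking: `By.CLASS_NAME` is meant for a single class name, so a dotted compound selector passed to it is unlikely to match and would normally go through `By.CSS_SELECTOR` instead. A minimal sketch of building such a selector from a `class` attribute follows; the helper name `css_from_classes` is hypothetical and not part of Selenium.

```python
def css_from_classes(tag, class_attr):
    """Build a CSS selector that matches `tag` carrying every listed class."""
    classes = class_attr.split()
    return tag + "".join("." + name for name in classes)


# Class list copied from the <section> elements in the profile HTML
selector = css_from_classes(
    "section",
    "pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view",
)
print(selector)
```

The resulting string could then be passed as `EC.presence_of_element_located((By.CSS_SELECTOR, selector))`.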

import time
import urllib.parse

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# URL-encode the search terms before building the query string
firstName = urllib.parse.quote(userFirstName)
lastName = urllib.parse.quote(userLastName)
company = urllib.parse.quote(userCompany)

driver.get('https://www.linkedin.com/search/results/people/?company='+company+'&firstName='+firstName+'&lastName='+lastName+'&origin=FACETED_SEARCH')

results = len(driver.find_elements(By.CSS_SELECTOR, '.name.actor-name'))
for i in range(1):  # only the first search result is opened
    print(i)
    driver.find_elements(By.CSS_SELECTOR, '.name.actor-name')[i].click()
    time.sleep(10)
    print(driver.current_url)

    # Snapshot the rendered DOM and parse it with BeautifulSoup
    content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    driver.implicitly_wait(2)
    soup = BeautifulSoup(content, "html.parser")
    #print(soup)

    companyList = soup.find_all('section',{'class':'pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view'})
    print("Company list length: "+str(len(companyList)))

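This code does return the company list for many users, but for some it fails completely. I checked those profiles in a browser, and the elements the code looks for are present.

Any help or past experience would be greatly appreciated. I know solving this takes effort, so thanks in advance!

P.S. Part of the HTML (the Experience section I care about):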

<ul class="pv-profile-section__section-info section-info pv-profile-section__section-info--has-no-more">
<li class="pv-entity__position-group-pager pv-profile-section__list-item ember-view" id="ember394"> <section class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view" id="ember396"> <div class="display-flex justify-space-between full-width">
<a class="full-width ember-view" data-control-name="background_details_company" href="/search/results/index/?keywords=Aditya%20Birla%20Direct" id="ember397"> <div class="pv-entity__company-details">
<div class="pv-entity__logo company-logo">
<img alt="Aditya Birla Direct" class="pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 lazy-image ember-view" id="ember399"/>
</div>
<div class="pv-entity__company-summary-info">
<h3 class="t-16 t-black t-bold">
<span class="visually-hidden">Company Name</span>
<span>Aditya Birla Direct</span>
</h3>
<h4 class="t-14 t-black t-normal">
<span class="visually-hidden">Total Duration</span>
<span>2 yrs 6 mos</span>
</h4>
</div>
</div>
</a>
<!-- --> </div>
<ul class="pv-entity__position-group mt2 ember-view" id="ember400"><li class="pv-entity__position-group-role-item sortable-item ember-view" id="ember402"> <div class="ember-view" id="ember403"><div class="pv-entity__role-details">
<span class="pv-entity__timeline-node"></span>
<div class="display-flex justify-space-between full-width">
<div class="pv-entity__role-container">
<div class="pv-entity__role-details-container pv-entity__role-details-container--timeline pv-entity__role-details-container--bottom-margin">
<div class="pv-entity__summary-info-v2 pv-entity__summary-info--background-section pv-entity__summary-info-margin-top">
<h3 class="t-14 t-black t-bold">
<span class="visually-hidden">Title</span>
<span>Product Designer</span>
</h3>
<!-- --> <div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>Jun 2018 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">1 yr 5 mos</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Location</span>
<span>Mumbai, Maharashtra, India</span>
</h4>
</div>
<!-- --> </div>
</div>
<!-- --> </div>
</div>
</div>
</li><li class="pv-entity__position-group-role-item sortable-item ember-view" id="ember405"> <div class="ember-view" id="ember406"><div class="pv-entity__role-details">
<span class="pv-entity__timeline-node"></span>
<div class="display-flex justify-space-between full-width">
<div class="pv-entity__role-container">
<div class="pv-entity__role-details-container">
<div class="pv-entity__summary-info-v2 pv-entity__summary-info--background-section pv-entity__summary-info-margin-top">
<h3 class="t-14 t-black t-bold">
<span class="visually-hidden">Title</span>
<span>UI/UX Designer</span>
</h3>
<!-- --> <div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>May 2017 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">2 yrs 6 mos</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Location</span>
<span>Mumbai, Maharashtra, India</span>
</h4>
</div>
<!-- --> </div>
</div>
<!-- --> </div>
</div>
</div>
</li>
</ul>
<!-- --></section>
</li><li class="pv-entity__position-group-pager pv-profile-section__list-item ember-view" id="ember408"> <section class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view" id="1192970710"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a class="full-width ember-view" data-control-name="background_details_company" href="/search/results/index/?keywords=improove%20technology%20pvt%20ltd" id="ember411"> <div class="pv-entity__logo company-logo">
<img alt="improove technology pvt ltd" class="pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 lazy-image ghost-company ember-view" id="ember413"/>
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section">
<h3 class="t-16 t-black t-bold">UI/UX Designer</h3>
<p class="visually-hidden">Company Name</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">improove technology pvt ltd</p>
<!-- -->
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>May 2015 – May 2017</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">2 yrs 1 mo</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Location</span>
<span>Delhi</span>
</h4>
</div>
</a>
<!-- --> </div>
<!-- --> </div>
</section>
</li>

I basically need the company name, role, and dates employed.
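
In the HTML above, each value follows a `visually-hidden` label such as Company Name, Title, or Dates Employed, so one way to pull out those fields is to pair every label with the next text node. A minimal sketch under that assumption follows; the class `LabelValueParser` is a hypothetical name, and it uses the standard library's `html.parser` only so that it runs without extra packages (the same pairing could be written with BeautifulSoup).

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Pair each visually-hidden label with the next non-empty text node."""

    def __init__(self):
        super().__init__()
        self.pairs = []          # (label, value) tuples in document order
        self._in_label = False   # True while inside a visually-hidden element
        self._pending = None     # label still waiting for its value

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._in_label = "visually-hidden" in classes

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_label:
            self._pending = text
        elif self._pending is not None:
            self.pairs.append((self._pending, text))
            self._pending = None

    def handle_endtag(self, tag):
        self._in_label = False


# Fragment copied from the profile HTML shown above
sample = """
<h3 class="t-16 t-black t-bold">
<span class="visually-hidden">Company Name</span>
<span>Aditya Birla Direct</span>
</h3>
<h3 class="t-14 t-black t-bold">
<span class="visually-hidden">Title</span>
<span>Product Designer</span>
</h3>
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>Jun 2018 – Present</span>
</h4>
"""

parser = LabelValueParser()
parser.feed(sample)
parser.close()
print(parser.pairs)
```

Labels such as Total Duration or Location would come through the same way and can be filtered out by name. In the single-role layout the role sits in an unlabeled `h3`, so it would need separate handling.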


1 Answer
User
Answer 1 · Posted 2024-04-26 14:22:11

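Based on the updated HTML you posted, the section elements may have finished loading while their contents have not, which could leave companyList empty as you describe.

I would rather wait for something more specific than the section: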

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait until the profile sections are present in the DOM
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//section[contains(@class, 'pv-profile-section')]")))

# Wait until the "Company Name" labels inside the sections are present
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[contains(text(), 'Company Name')]")))

# Get company list (find_elements_by_xpath was removed in Selenium 4)
companyList = driver.find_elements(By.XPATH, "//section[contains(@class, 'pv-profile-section')]")

print(len(companyList))

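This code waits for the section elements to appear and also for the Company Name labels to appear. That may avoid the case where a section has loaded but its contents have not yet finished loading.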
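
If a fixed marker such as the Company Name label turns out not to be reliable enough, another option is to poll until the number of matched elements stops changing. A minimal sketch follows; `wait_until_stable` and its parameters are hypothetical names, not part of Selenium, and the counter is passed in as a callable so the helper does not depend on a browser.

```python
import time


def wait_until_stable(get_count, timeout=30.0, interval=0.5, settle=3):
    """Poll get_count() until it returns the same non-zero value `settle`
    times in a row, or until `timeout` seconds have passed.

    Returns the last observed count.
    """
    deadline = time.monotonic() + timeout
    last = get_count()
    repeats = 1
    while time.monotonic() < deadline:
        if last > 0 and repeats >= settle:
            break
        time.sleep(interval)
        current = get_count()
        if current == last:
            repeats += 1
        else:
            last, repeats = current, 1
    return last


# Simulated page whose section count grows over successive polls
readings = iter([0, 1, 3, 3, 3, 3])
result = wait_until_stable(lambda: next(readings), timeout=5, interval=0.01)
print(result)
```

With Selenium, the callable might be `lambda: len(driver.find_elements(By.XPATH, "//section[contains(@class, 'pv-profile-section')]"))`, so the loop ends once the section count has settled.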
