Python, Selenium: Isolating an item from a returned list

Posted 2024-04-27 22:37:43

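With the help of reading, videos, and the community, I was able to use Selenium and Python to scrape data from Tessco.com.

The site requires a username and password. I have included them in the code below; they are throwaway credentials created specifically for asking this question.

My end goal is to loop through an Excel list of part numbers and search for a set of parameters, including price. Before introducing the list to loop over, I would like to isolate the required information from everything that is scraped.

I am not sure how to filter this information.

Here is the code: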

    import time
    #Need Selenium for interacting with web elements
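    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC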
    #Need numpy/pandas to interact with large datasets
    import numpy as np
    import pandas as pd

    chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
    driver = webdriver.Chrome(chrome_path)
    driver.get("https://www.tessco.com/login")

    userName = "FirstName.SurName321123@gmail.com"
    password = "PasswordForThis123"

    #Set a wait, for elements to load into the DOM
    wait10 = WebDriverWait(driver, 10)
    wait20 = WebDriverWait(driver, 20)
    wait30 = WebDriverWait(driver, 30)

    elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID"))) 
    elem.send_keys(userName)

    elem = wait10.until(EC.element_to_be_clickable((By.ID, "password"))) 
    elem.send_keys(password)

    #Press the login button
    driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()

    #Expand the search bar
    searchIcon = wait10.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i"))) 
    searchIcon.click()

    searchBar = wait10.until(EC.element_to_be_clickable((By.XPATH, '/html/body/header/div[3]/input'))) 
    searchBar.click()

    #load in manufacture part number from a collection of components, via an Excel file

    #Enter information into the search bar
    searchBar.send_keys("HL4RPV-50" + '\n')

    # wait for the products information to be loaded
    products = wait30.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
    # create a dictionary to store product and price
    productInfo = {}
    # iterate through all products in the search result and add details to dictionary
    for product in products:
        # get product info such as OEM, Description and Part Number
        productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
        mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
        mfgName = product.find_element_by_tag_name("img").get_attribute("alt")

        # get price
        price = product.find_element_by_xpath(".//div[@class='price']").text.split('\n')[1]

        # add details to dictionary
        productInfo[mfgPart, mfgName, productDescr] = price

    # print products information   
    print(productInfo)

The output is:

{('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89', ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89', ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94', ('MFG PART #: L4HR-D', 'CommScope', '4.3-10M RA for 1/2" AL4RPV-50, LDF4-50A, HL4RPV-50'): '$39.26', ('MFG PART #: UPL-4MT-12', 'JMA Wireless', '4.3-10 Male Connector for 1/2” Plenum Cables'): '$32.99', ('MFG PART #: UPL-4F-12', 'JMA Wireless', '4.3-10 Female Connector for 1/2" Plenum'): '$33.33', ('MFG PART #: UPL-4RT-12', 'JMA Wireless', '4.3-10 R/A Male Connector for 1/2" Plenum'): '$42.82', ('MFG PART #: L4HF-D', 'CommScope', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A'): '$20.30'}

I only need the item that was referenced in the automated search, so for this example I would be looking for:

('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89'

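Eventually I plan to replace the HL4RPV-50 value with a list of items, but for now I believe I should filter for what is needed.

I doubt the logic is right, but I tried printing the product info for any part that equals the search term, as shown below.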

for item in mfgPart:
    if mfgPart == "HL4RPV-50":
        print(productInfo)

But the code above just printed the entire output, as before.

I then tried importing itertools and running the following:

print(dict(itertools.islice(productInfo.items(), 1)))

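This did return the item I was looking for, but there is no guarantee that the first item returned is the one I want. It would be best if I could filter the search results down to the exact match for a given part number.

Is there a way to filter the results based on the input?

Any hints are greatly appreciated.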


3 Answers

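The other answers seem to check whether the part number is contained in the mfg part string, but I see that some items can contain the same part number, for example HL4RPV-50 and HL4RPV-50B. If you want to isolate the part number so that you know exactly which part you are looking at, I suggest iterating over the dictionary and splitting the mfg part string at the colon to get the ID. You can also unpack the other parts of each item to print the information more cleanly, as in the example below.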

for (mfg_part, comm_scope, name), price in productInfo.items():
    mfg_id = mfg_part.split(': ')[1]
    if mfg_id == 'HL4RPV-50':
        print('Part #:', mfg_id)
        print('Company:', comm_scope)
        print('Name:', name)
        print('Price:', price)
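
If the same check is needed for more than one search, the loop above could be wrapped in a small helper. The following is only a sketch: `find_exact_part` is a hypothetical name, and it assumes the keys have the same `(part, manufacturer, description)` shape as the output shown in the question. It uses `str.partition` rather than `split` so that a key without a colon gives an empty ID instead of raising an error.

```python
def find_exact_part(product_info, part_number):
    """Return the entries whose part number equals part_number exactly.

    Hypothetical helper; assumes keys look like
    ('MFG PART #: HL4RPV-50', 'CommScope', '<description>').
    """
    matches = {}
    for (mfg_part, mfg_name, description), price in product_info.items():
        # 'MFG PART #: HL4RPV-50' -> 'HL4RPV-50' ('' if there is no ': ')
        mfg_id = mfg_part.partition(': ')[2].strip()
        if mfg_id == part_number:
            matches[(mfg_part, mfg_name, description)] = price
    return matches


# Two of the entries from the question's output, used as sample data
product_info = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
}

result = find_exact_part(product_info, 'HL4RPV-50')
print(result)
```

Returning a dictionary rather than printing keeps the result usable later, for example when writing the prices back to a spreadsheet.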

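You can use this filter code on a Python dictionary:

# item[0] is the key tuple (mfgPart, mfgName, productDescr);
# item[0][0] is the "MFG PART #: ..." string, so this is a substring test on it
searchedProduct = dict(filter(lambda item: "HL4RPV-50" in item[0][0], productInfo.items()))
print(searchedProduct)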
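
One thing to keep in mind with a substring test is that `"HL4RPV-50"` is also contained in `"HL4RPV-50B"`. The sketch below contrasts a substring match with an exact match on sample data copied from the question's output. The variable names `wanted`, `substring_hits`, and `exact_hits` are made up for illustration.

```python
# Three of the entries from the question's output, used as sample data
product_info = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
    ('MFG PART #: L4HM-D', 'CommScope', '4.3-10 Male for 1/2" AL4RPV-50,LDF4-50A,HL4RPV-50'): '$19.94',
}
wanted = "HL4RPV-50"

# Substring match on the part-number field: also picks up HL4RPV-50B
substring_hits = {key: price for key, price in product_info.items()
                  if wanted in key[0]}

# Exact match: compare only the text after the 'MFG PART #: ' label
exact_hits = {key: price for key, price in product_info.items()
              if key[0].split(': ', 1)[-1] == wanted}

print(len(substring_hits), "substring match(es)")
print(len(exact_hits), "exact match(es)")
```

Which of the two is appropriate depends on whether variants such as the `B` suffix should count as hits.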

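Your original example was very close. We just need to loop over each item and also over the strings inside the dictionary's key. If you don't mind the nesting, this will do it :) You only need to adjust the keyword as appropriate.

Note:

If you are on Python 2.x you may have to use productInfo.iteritems(); this example assumes 3.x.

Example: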

def main():
    # Each entry from items() is a (key, price) pair
    for key in productInfo.items():
        # key[0] is the tuple of strings used as the dictionary key
        for stringList in key[0]:
            # For every line in each of those strings
            for newline in stringList.splitlines():
                # Split the line into individual words
                for string in newline.split(' '):
                    # Check each word against our keyword
                    if string == "HL4RPV-50B":
                        print(key)

if __name__ == '__main__':
    main()
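
Since the question mentions eventually replacing the single part number with a list of items, the same idea could be extended to several part numbers at once. This is a rough sketch under the same assumption about the key layout. `filter_parts` and `wanted_parts` are hypothetical names, and loading the part numbers from Excel is left out.

```python
def filter_parts(product_info, wanted_parts):
    """Keep only the entries whose part number is in wanted_parts (exact match).

    Hypothetical helper; assumes keys look like
    ('MFG PART #: HL4RPV-50', 'CommScope', '<description>').
    """
    wanted = set(wanted_parts)  # set membership is cheap for long lists
    return {
        key: price
        for key, price in product_info.items()
        # Text after 'MFG PART #: ' is the bare part number
        if key[0].partition(': ')[2] in wanted
    }


# Three of the entries from the question's output, used as sample data
product_info = {
    ('MFG PART #: HL4RPV-50', 'CommScope', '1/2" Plenum Air Cable, Off White'): '$1.89',
    ('MFG PART #: HL4RPV-50B', 'CommScope', '1/2" Plenum Air Cable, Blue'): '$1.89',
    ('MFG PART #: L4HF-D', 'CommScope', '4.3-10 Female for 1/2 in AL4RPV-50, LDF4-50A'): '$20.30',
}

wanted_parts = ['HL4RPV-50', 'L4HF-D']  # placeholder for the Excel list
selected = filter_parts(product_info, wanted_parts)
for (mfg_part, mfg_name, description), price in selected.items():
    print(mfg_part, mfg_name, price)
```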
