How can I use Selenium and Python to extract information from a table while selecting each dropdown option?

Posted 2024-04-29 19:59:14

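I'm trying to help someone who works for a nonprofit. I'm currently trying to pull information from the STL County Boards/Commissions website (https://boards.stlouisco.com/).

I'm running into trouble for the following reasons:

I was going to use BeautifulSoup, but the actual data doesn't even appear until you choose a board/commission from the dropdown at the top of the page, so I've switched to Selenium, which I'm new to.

Is this task even possible? When I look at the site's HTML, I see that the information isn't stored in the page. It is pulled from another location and displayed based on the option selected in the dropdown menu.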

function ShowMemberList(selectedBoard) {
        ClearMeetingsAndMembers();
        var htmlString = "";
        var boardsList = [{"id":407,"name":"Aging Ahead","isActive":true,"description":"... ...1.","totalSeats":14}];
        var totalMembers = boardsList[$("select[name='BoardsList'] option:selected").index() - 1].totalSeats;
        $.get("/api/boards/" + selectedBoard + "/members", function (data) {
            if (data.length > 0) {
                htmlString += "<table id=\"MemberTable\" class=\"table table-hover\">";
                htmlString += "<thead><th>Member Name</th><th>Title</th><th>Position</th><th>Expiration Date</th></thead><tbody>";
                for (var i = 0; i < totalMembers; i++) {
                    if (i < data.length) {
                        htmlString += "<tr><td>" + FormatString(data[i].firstName) + " " + FormatString(data[i].lastName) + "</td><td>" + FormatString(data[i].title) + "</td><td>" + FormatString(data[i].position) + "</td><td>" + FormatString(data[i].expirationDate) + "</td></tr>";
                    } else {
                        htmlString += "<tr><td colspan=\"4\">---Vacant Seat---</td></tr>" 
                    }
                }
                htmlString += "</tbody></table>";
            } else {
                htmlString = "<span id=\"MemberTable\">There was no data found for this board.</span>";
            }
            $("#Results").append(htmlString);
        });
    }
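
To make the logic above easier to follow from Python, here is a minimal sketch that mirrors the loop in ShowMemberList: one row per seat, with unfilled seats padded as vacancies. The field names come from the script; `format_string` is only a guess at what the page's FormatString() does (assumed here to turn null into an empty string), and the sample record is hypothetical.

```python
from typing import Optional

VACANT = "---Vacant Seat---"


def format_string(value: Optional[str]) -> str:
    # Assumption: the page's FormatString() turns null into an empty string.
    return "" if value is None else str(value)


def member_rows(members: list, total_seats: int) -> list:
    """Mirror the loop in ShowMemberList: one row per seat, padded with vacancies."""
    if not members:
        # The page shows "There was no data found for this board." in this case.
        return []
    rows = []
    for i in range(total_seats):
        if i < len(members):
            m = members[i]
            name = f"{format_string(m.get('firstName'))} {format_string(m.get('lastName'))}"
            rows.append([
                name,
                format_string(m.get("title")),
                format_string(m.get("position")),
                format_string(m.get("expirationDate")),
            ])
        else:
            rows.append([VACANT])
    return rows


# Hypothetical sample shaped like the /api/boards/<id>/members response.
sample = [{"firstName": "Jane", "lastName": "Doe", "title": None,
           "position": "Appointee", "expirationDate": "10/1/2020"}]
for row in member_rows(sample, total_seats=3):
    print(row)
```

The point of the sketch is that the table is built entirely from the JSON returned by the members endpoint plus totalSeats, so the same rows can be reconstructed without rendering the page.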

So far I have this (not much), which goes to the page and selects each board from the list:

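from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.Chrome()
driver.get("https://boards.stlouisco.com/")
# Wait up to 10 seconds for the dropdown to be present in the DOM
select = Select(wait(driver, 10).until(EC.presence_of_element_located((By.ID, 'BoardsList'))))
options = select.options

for board in options:
    select.select_by_visible_text(board.text)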

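From here, I'd like to be able to get the information from MemberTable, but I don't know how to move forward, whether this is within my abilities, or even whether it's possible with Selenium.

I've tried clicking on the members table using several different find_element_by_* methods, but I ran into errors. I've also tried referencing MemberTable after each selection, but it can't find the element. Any tips, pointers, or advice would be greatly appreciated.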


2 Answers

To select each board/commission from the dropdown and scrape the page, you have to induce WebDriverWait for element_to_be_clickable(), and you can use the following Locator Strategies:

Code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
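# Note: Selenium 4.10+ removed executable_path; there, pass service=Service(r'C:\WebDrivers\chromedriver.exe')
# (from selenium.webdriver.chrome.service import Service) instead.
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')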
driver.get("https://boards.stlouisco.com/")
select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'BoardsList'))))
for option in select.options:
    option.click()
    print("Scrapping :"+option.text)

Console output:

Scrapping : -Choose a Board -
Scrapping :Aging Ahead
Scrapping :Aging Ahead Advisory Council
Scrapping :Air Pollution & Noise Control Appeal Board
Scrapping :Animal Care & Control Advisory Board
Scrapping :Bi-State Development Agency (Metro)
Scrapping :Board Of Examiners For Mechanical Licensing
Scrapping :Board of Freeholders
Scrapping :Boundary Commission
Scrapping :Building Code Review Committee
Scrapping :Building Commission & Board Of Building Appeals
Scrapping :Business Advisory Council
Scrapping :Center for Educational Media
Scrapping :Civil Service Commission
Scrapping :Commission On Disabilities
Scrapping :County Health Advisory Board
Scrapping :Domestic And Family Violence Council
Scrapping :East-West Gateway Council of Governments Board of Directors
Scrapping :Economic Development Collaborative Advisory Board
Scrapping :Economic Rescue Team
Scrapping :Electrical Code Review Committee
Scrapping :Electrical Examiners, Board Of
Scrapping :Emergency Communications System Commission
Scrapping :Equalization, Board Of
Scrapping :Fire Standards Commission
Scrapping :Friends of the Kathy J. Weinman Shelter for Battered Women, Inc.
Scrapping :Fund Investment Advisory Committee
Scrapping :Historic Building Commission
Scrapping :Housing Authority
Scrapping :Housing Resources Commission
Scrapping :Human Relations Commission
Scrapping :Industrial Development Authority Board
Scrapping :Justice Services Advisory Board
Scrapping :Lambert Airport Eastern Perimeter Joint Development Commission
Scrapping :Land Clearance For Redevelopment Authority
Scrapping :Lemay Community Improvement District
Scrapping :Library Board
Scrapping :Local Emergency Planning Committee
Scrapping :Mechanical Code Review Committee
Scrapping :Metropolitan Park And Recreation District Board Of Directors (Great Rivers Greenway)
Scrapping :Metropolitan St. Louis Sewer District
Scrapping :Metropolitan Taxicab Commission
Scrapping :Metropolitan Zoological Park and Museum District Board
Scrapping :Municipal Court Judges
Scrapping :Older Adult Commission
Scrapping :Parks And Recreation Advisory Board
Scrapping :Planning Commission
Scrapping :Plumbing Code Review Committee
Scrapping :Plumbing Examiners, Board Of
Scrapping :Police Commissioners, Board Of
Scrapping :Port Authority Board Of Commissioners
Scrapping :Private Security Advisory Committee
Scrapping :Productive Living Board
Scrapping :Public Transportation Commission of St. Louis County
Scrapping :Regional Arts Commission
Scrapping :Regional Convention & Sports Complex Authority
Scrapping :Regional Convention & Visitors Commission
Scrapping :REJIS Commission
Scrapping :Restaurant Commission
Scrapping :Retirement Board Of Trustees
Scrapping :St. Louis Airport Commission
Scrapping :St. Louis County Children's Service Fund Board
Scrapping :St. Louis County Clean Energy Development Board (PACE)
Scrapping :St. Louis County Workforce Development Board
Scrapping :St. Louis Economic Development Partnership
Scrapping :St. Louis Regional Health Commission
Scrapping :St. Louis-Jefferson Solid Waste Management District
Scrapping :Tax Increment Financing Commission of St. Louis County
Scrapping :Transportation Board
Scrapping :Waste Management Commission
Scrapping :World Trade Center - St. Louis
Scrapping :Zoning Adjustment,  Board of
Scrapping :Zoo-Museum District - Art Museum Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - Botanical Garden Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - Missouri History Museum Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - St. Louis Science Center Subdistrict Board of Commissioners
Scrapping :Zoo-Museum District - Zoological Park Subdistrict Board of Commissioners
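
The loop above only selects each option. A rough sketch of how the member table might then be read after each selection is shown below. It assumes the page clears the previous table before appending the new one (as ClearMeetingsAndMembers() in the question suggests) and that index 0 is the "-Choose a Board -" placeholder seen in the console output. The names `rows_to_records` and `scrape_all_boards` are hypothetical helpers, not part of Selenium.

```python
HEADERS = ["Member Name", "Title", "Position", "Expiration Date"]


def rows_to_records(board, rows):
    """Turn lists of cell text into dicts; a single-cell row is a vacant seat."""
    records = []
    for cells in rows:
        if len(cells) == len(HEADERS):
            record = dict(zip(HEADERS, cells))
        else:
            # The page renders a vacancy as one cell spanning four columns.
            record = {"Member Name": cells[0] if cells else ""}
        record["Board"] = board
        records.append(record)
    return records


def scrape_all_boards(url="https://boards.stlouisco.com/", timeout=10):
    # Selenium is imported here so rows_to_records can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select, WebDriverWait

    driver = webdriver.Chrome()
    records = []
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        select = Select(wait.until(EC.element_to_be_clickable((By.ID, "BoardsList"))))
        # Start at 1 to skip the placeholder option.
        for index in range(1, len(select.options)):
            select.select_by_index(index)
            board = select.first_selected_option.text
            # Assumption: the old table is gone by the time the new one appears.
            table = wait.until(EC.presence_of_element_located((By.ID, "MemberTable")))
            rows = [[td.text for td in tr.find_elements(By.TAG_NAME, "td")]
                    for tr in table.find_elements(By.CSS_SELECTOR, "tbody tr")]
            records.extend(rows_to_records(board, rows))
    finally:
        driver.quit()
    return records
```

Calling `scrape_all_boards()` should return a list of dicts, one per table row. Boards with no data get a span with the same id instead of a table, so they simply contribute no rows.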

You can use this script to save all members from all boards to a CSV file:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://boards.stlouisco.com/'
members_url = 'https://boards.stlouisco.com/api/boards/{}/members'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
for o in soup.select('#BoardsList option[value]'):
    print(o['value'], o.text)
    data = requests.get(members_url.format(o['value'])).json()
    for d in data:
        all_data.append(dict(board=o.text, **d))

df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv')

Prints:

                                                 board  boardMemberId  memberId boardName  ...   lastName                                  title                                           position expirationDate
0                                          Aging Ahead          39003     27007      None  ...   Anderson                                   None               ST. LOUIS COUNTY EXECUTIVE APPOINTEE      10/1/2020
1                                          Aging Ahead          38963     27797      None  ...     Bauers                                   None  St. Charles County Community Action Agency App...           None
2                                          Aging Ahead          39004     27815      None  ...  Berkowitz                                   None               ST. LOUIS COUNTY EXECUTIVE APPOINTEE      10/1/2020
3                                          Aging Ahead          38964     27798      None  ...     Biehle                                   None  Jefferson County Community Action Corp. Appointee           None
4                                          Aging Ahead          38581     27597      None  ...     Bowers                                   None               Franklin County Commission Appointee           None
..                                                 ...            ...       ...       ...  ...        ...                                    ...                                                ...            ...
725  Zoo-Museum District - Zoological Park Subdistr...          38863     26745      None  ...       Seat               (Robert R. Hermann, Jr.)                                   St. Louis County     12/31/2019
726  Zoo-Museum District - Zoological Park Subdistr...          38864     26745      None  ...       Seat                        (Winthrop Reed)                                   St. Louis County     12/31/2016
727  Zoo-Museum District - Zoological Park Subdistr...          38669     26745      None  ...       Seat                      (Lawrence Thomas)                                   St. Louis County     12/31/2018
728  Zoo-Museum District - Zoological Park Subdistr...          38670     26745      None  ...       Seat  (Peggy Ritter ) Advisory Commissioner                        Non-Voting St. Louis County     12/31/2019
729  Zoo-Museum District - Zoological Park Subdistr...          38394     27512      None  ...     Wilson                  Advisory Commissioner                       Non-Voting City of St. Louis           None

[730 rows x 9 columns]

It also saves data.csv containing all boards and their members.
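
The members endpoint returns only filled seats, while the page also shows vacancies based on totalSeats from the boardsList array embedded in its script. If vacancy counts matter, one possible approach is to decode that array from the page source. This is a sketch that assumes the assignment appears exactly as `var boardsList = [...]`, as in the question. `extract_boards_list` and `vacant_seats` are hypothetical helper names, and the page fragment is a made-up sample.

```python
import json


def extract_boards_list(html: str) -> list:
    """Decode the JSON array assigned to `var boardsList` in the page's script."""
    marker = "var boardsList = "
    start = html.find(marker)
    if start == -1:
        raise ValueError("boardsList not found in page source")
    # raw_decode parses one JSON value and ignores whatever follows it.
    boards, _ = json.JSONDecoder().raw_decode(html, start + len(marker))
    return boards


def vacant_seats(board: dict, members: list) -> int:
    """Number of seats not filled by a returned member (never negative)."""
    return max(board["totalSeats"] - len(members), 0)


# Hypothetical page fragment shaped like the script in the question.
page = 'var boardsList = [{"id":407,"name":"Aging Ahead","isActive":true,"totalSeats":14}];'
boards = extract_boards_list(page)
print(boards[0]["name"], vacant_seats(boards[0], members=[{}] * 12))
```

In the script above, `requests.get(url).text` could be passed to `extract_boards_list`, and each board's `id` could be used in place of the option value when calling the members endpoint.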
