Web scraping a table whose contents are changed dynamically by a selector

import requests
import pandas as pd
from bs4 import BeautifulSoup

page = requests.get('https://www.opentable.com/state-of-industry')
soup = BeautifulSoup(page.content, 'html.parser')
tables = soup.find_all("table")
table = tables[0]
tab_data = [[cell.text for cell in row.find_all(["th", "td"])] for row in table.find_all("tr")]
df = pd.DataFrame(tab_data)

1 answer

User

Answer #1 · Posted 2024-05-16 06:34:40

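Try the Selenium code below. The script opens the browser, waits until the dropdown list has been located, and then iterates over its options (country, state, city). However, the page also has a "download dataset" button, and clicking it gives you the data for all three options in a single CSV file. So the script selects the first option, simulates a click on that button to download the dataset, and stops. Since you asked for an example, you can adapt the same approach to other uses or needs.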
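The script below waits a fixed time.sleep(1) after clicking the download button, which may not be long enough for the CSV to finish downloading. A minimal sketch of a standard-library polling helper follows, assuming you know which folder Chrome downloads into (for example by setting the "download.default_directory" preference with chrome_options.add_experimental_option("prefs", {...})). The function wait_for_download and its parameters are hypothetical, not part of Selenium.

```python
import os
import time


def wait_for_download(directory, before, suffix=".csv", timeout=30.0, poll=0.5):
    """Poll `directory` until a new, fully written file ending in `suffix` appears.

    `before` is the set of file names that existed before the download started.
    Chrome writes in-progress downloads as "<name>.crdownload"; those names do
    not end in `suffix`, so they are skipped until Chrome renames them.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = [
            name for name in os.listdir(directory)
            if name not in before and name.endswith(suffix)
        ]
        if new_files:
            # Return the first new file (sorted for a deterministic choice)
            return os.path.join(directory, sorted(new_files)[0])
        time.sleep(poll)
    raise TimeoutError(f"no new {suffix} file in {directory} after {timeout} s")
```

Usage, with a hypothetical download_dir: take before = set(os.listdir(download_dir)) just before the click(), then call csv_path = wait_for_download(download_dir, before) in place of the sleep that follows the click.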

Make sure you download ChromeDriver and put the path to chromedriver.exe on your system in the webdriver.Chrome(...) call. (Selenium 4.6 and later can also locate a matching driver automatically through Selenium Manager.)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.webdriver.support import expected_conditions as EC
import time

SELECT_XPATH = '//*[@id="content"]/div/div/main/section[2]/div[4]/div[1]/select'
BUTTON_XPATH = '//*[@id="content"]/div/div/main/section[2]/div[4]/div[1]/button'

def scrape_all_data():
    url = 'https://www.opentable.com/state-of-industry'

    chrome_options = Options()
    chrome_options.add_argument("--start-maximized")
    print("Opening Chrome browser...")
    # Selenium 4 passes the driver path through Service; point it at your chromedriver.exe
    driver = webdriver.Chrome(service=Service('/chromedriver/chromedriver.exe'), options=chrome_options)

    driver.get(url)

    # Wait (up to 200 s) until the <select> element is present in the DOM
    WebDriverWait(driver, 200).until(EC.presence_of_element_located((By.XPATH, SELECT_XPATH)))

    # Wrap the clickable <select> in Select so its options can be looped over
    lst_to_traverse = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, SELECT_XPATH))))

    for option in lst_to_traverse.options:
        print(option.text)  # the option label, e.g. country, state or city
        lst_to_traverse.select_by_visible_text(option.text)
        time.sleep(1)
        driver.find_element(By.XPATH, BUTTON_XPATH).click()  # download the dataset
        time.sleep(1)
        # The downloaded CSV already holds all three options, so one click is enough
        break

scrape_all_data()
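If you would rather read the table for each option than download the CSV, keep in mind that requests only sees the HTML the server sends and never runs the page's JavaScript, so the parsing has to be pointed at the browser-rendered page instead, e.g. driver.page_source after each select_by_visible_text(). The question's BeautifulSoup comprehension works on that string as is; the sketch below is a dependency-free stand-in using the standard library's html.parser, assuming well-formed markup with explicit </th>, </td> and </tr> closing tags. FirstTableParser and first_table_rows are hypothetical names.

```python
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row, from the first <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_table = False
        self._finished = False  # True once the first </table> has been seen
        self._row = None        # cells of the <tr> currently being read
        self._cell = None       # text fragments of the <th>/<td> being read

    def handle_starttag(self, tag, attrs):
        if self._finished:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._finished or not self._in_table:
            return
        if tag in ("th", "td") and self._cell is not None:
            # Join all text inside the cell, like BeautifulSoup's cell.text
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._finished = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def first_table_rows(html):
    """Return the first table in `html` as a list of rows of cell text."""
    parser = FirstTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

With Selenium, rows = first_table_rows(driver.page_source) followed by pd.DataFrame(rows) should give the same layout as df in the question, but built from the table as it looks after the selected option has been applied.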
