Scraping a web page table whose data is changed dynamically by a selector

Posted 2024-04-29 13:06:22


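I want to scrape a table from a website. The problem is that the page has a selector that chooses whether the table is aggregated by country, state, or city, and this changes the data displayed. By default the table shows "country"-level data, but I want to scrape the data at the "state" level. This is the website: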

https://www.opentable.com/state-of-industry

The code I am using is as follows:

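import requests
import pandas as pd
from bs4 import BeautifulSoup

# Fetch the static HTML and take the first <table> on the page.
page = requests.get('https://www.opentable.com/state-of-industry')
soup = BeautifulSoup(page.content, 'html.parser')
tables = soup.find_all("table")
table = tables[0]
# One list of cell texts per table row (header and data cells alike).
tab_data = [[cell.text for cell in row.find_all(["th", "td"])]
            for row in table.find_all("tr")]
df = pd.DataFrame(tab_data)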

This gives me the "country" table. How can I get the "state" table instead?

Thanks


1 answer
User
Answer 1 · posted 2024-04-29 13:06:22

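Try the code below, which uses Selenium. The script first opens the browser, then waits for the dropdown to be located, and loops over all 3 dropdown options. However, the page also has a button for downloading the dataset, and clicking it provides the data for all 3 options in a single CSV file. The script below therefore simulates clicking that button to download the dataset, but you can adapt it for other uses or needs, since you asked for an example.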

Make sure you download the Chrome driver and set the path to chromedriver.exe on your system in the line that creates the driver with webdriver.Chrome(...).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support import expected_conditions as EC
import time

def scrape_all_data():
    url = 'https://www.opentable.com/state-of-industry'

    chrome_options = Options()
    chrome_options.add_argument("--start-maximized")
    print("Opening Chrome Browser..")
    # Download the Chrome driver and put the path to its .exe here.
    # Selenium 4 takes the driver path via Service and the options via `options=`.
    driver = webdriver.Chrome(service=Service('/chromedriver/chromedriver.exe'), options=chrome_options)

    driver.get(url)

    # Wait (up to 200 s) until the <select> dropdown is present in the page.
    wait1 = WebDriverWait(driver, 200)
    wait1.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div/div/main/section[2]/div[4]/div[1]/select')))

    # Wrap the dropdown in Select once it is clickable, so its options can be looped over.
    lst_to_traverse = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="content"]/div/div/main/section[2]/div[4]/div[1]/select'))))

    for option in lst_to_traverse.options:
        print(option.text)  # print the current option, e.g. country, state or city
        lst_to_traverse.select_by_visible_text(option.text)
        time.sleep(1)
        # Click the button that downloads the dataset.
        driver.find_element(By.XPATH, '//*[@id="content"]/div/div/main/section[2]/div[4]/div[1]/button').click()
        time.sleep(1)
        break  # stop after the first download: the CSV contains all 3 options
    

scrape_all_data()
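
If you would rather read the table shown for each dropdown option than download the CSV, one possible approach is to parse the rendered HTML after each selection. The sketch below uses only the standard library and does the same job as the BeautifulSoup list comprehension in the question. The names TableRows and table_rows are hypothetical, the sample HTML is made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of the first <table> in an HTML string, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._in_table = False  # True while inside the first table
        self._done = False      # True once the first table has been closed
        self._cell = None       # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self.rows.append([])
        elif self._in_table and tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            if self.rows:
                self.rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True


def table_rows(html):
    """Return the first table in `html` as a list of rows of cell text."""
    parser = TableRows()
    parser.feed(html)
    return parser.rows


# Made-up sample markup, only to show the shape of the result.
sample = """
<table>
  <tr><th>Name</th><th>4/1</th></tr>
  <tr><td>State A</td><td>-10%</td></tr>
</table>
"""
print(table_rows(sample))
```

Inside the loop of the script above, calling table_rows(driver.page_source) right after select_by_visible_text (and a short wait for the page to update) should give the rows for the option that is currently selected. The result can be passed to pd.DataFrame as in the question.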

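
Since the downloaded CSV is said to contain the data for all 3 options, the state-level rows still have to be picked out of it. The following is a minimal sketch of that step using only the standard library. The layout of the real file is not shown here, so the column name "Type", its values, and the sample rows are all assumptions, and rows_for_level is a hypothetical helper. Check the header of the actual file first and adjust level_column to match.

```python
import csv
import io


def rows_for_level(csv_text, level, level_column="Type"):
    """Return the rows whose `level_column` equals `level` (case-insensitive).

    `level_column="Type"` is an assumed column name, not taken from the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if level_column not in (reader.fieldnames or []):
        raise KeyError(f"column {level_column!r} not found; header is {reader.fieldnames}")
    wanted = level.strip().lower()
    return [row for row in reader
            if (row[level_column] or "").strip().lower() == wanted]


# Made-up sample data, only to show how the helper is called.
sample_csv = (
    "Type,Name,2020-04-01\n"
    "country,Country A,-90\n"
    "state,State A,-95\n"
    "state,State B,-80\n"
)
state_rows = rows_for_level(sample_csv, "state")
print([row["Name"] for row in state_rows])
```

For the real file, read its text with open(path, newline="", encoding="utf-8").read(), where path is wherever your browser saved the download, and pass that text to rows_for_level.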