How do I use Selenium with a dropdown list to scrape historical data?

from operator import itemgetter
from pprint import pprint

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Run Chrome headless so no browser window opens.
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(options=options)

# Load tomorrow's pricing page and grab the rendered price table.
driver.get('https://hourlypricing.comed.com/pricing-table-tomorrow/')
table = driver.find_element(By.CLASS_NAME, 'prices')
tablehtml = table.get_attribute('outerHTML')

soup = BeautifulSoup(tablehtml, 'html.parser')
table = soup.find("table", {"class": "prices"})
table_body = table.find('tbody')

data = []
for row in table_body.find_all('tr'):
    cols = [ele.text.strip() for ele in row.find_all('td')]
    cols[1] = cols[1][:-1]  # drop the trailing unit character from the price
    data.append([ele for ele in cols if ele])

# The prices are still strings here, so this sorts them as text.
sortedData = sorted(data, key=itemgetter(1))
pprint(sortedData)
driver.quit()

2 Answers

User

#1 · Edited 2024-04-26 14:14:51

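You don't have to click through the calendar and select each day, which would take a very long time. Instead, you can go straight to the source of the data, parse the output of fetch() with Beautiful Soup, and then pull out whatever information you want :)

We work out how many days each month has and pass that list of days into a GET request that retrieves each day, for all 12 months of the year. If you need to, you can adapt it to go back many years.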

import calendar

import requests

YEAR = 2018
FEED_URL = "https://hourlypricing.comed.com/rrtp/ServletFeed"


def getDays(month):
    # monthrange returns (weekday of the 1st, number of days in the month).
    lastDay = calendar.monthrange(YEAR, month)[1]
    return list(range(1, lastDay + 1))


def fetch(days, month):
    # Request the pricing table for each day; the date parameter is yyyyMMdd.
    for d in days:
        date = "%d%02d%02d" % (YEAR, month, d)
        response = requests.get(FEED_URL, params={"type": "pricingtabledual", "date": date})
        print(response.content)


# range(1, 13) covers January through December.
for month in range(1, 13):
    fetch(getDays(month), month)
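
To go back further than one year, the year can become a variable as well. What follows is a minimal sketch and not part of the original answer: `iter_date_strings` is a hypothetical helper that yields every calendar day between two years as a yyyyMMdd string, the form the `date` parameter takes in the feed URLs above. It only builds strings and makes no requests.

```python
from datetime import date, timedelta


def iter_date_strings(first_year, last_year):
    """Yield every day from Jan 1 of first_year to Dec 31 of last_year as yyyyMMdd."""
    current = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while current <= end:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


if __name__ == "__main__":
    # Print the date strings for two full years.
    for day in iter_date_strings(2017, 2018):
        print(day)
```

Each string can then be used as the `date` value in the feed request in place of the hard-coded 2018 date. Stepping through real dates also takes care of leap years without any month arithmetic.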

User

#2 · Edited 2024-04-26 14:14:51

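There is a free API for historical price information. It lets you specify the range you want to retrieve values for. These are the 5-minute prices, but there are several query types and a choice of return formats.

Example of a GET request that returns JSON for a date range: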

https://hourlypricing.comed.com/api?type=5minutefeed&datestart=201712310000&dateend=201812310000

The dates are given in the format yyyyMMddhhmm.
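
As a rough sketch of how such a URL could be assembled in Python: `build_feed_url` below is a hypothetical helper, and it assumes that "hh" in the date format means a 24-hour clock, which the `0000` in the example suggests but does not prove. It uses only the standard library and does not send the request.

```python
from datetime import datetime
from urllib.parse import urlencode

API_URL = "https://hourlypricing.comed.com/api"


def build_feed_url(start, end):
    """Return the 5-minute feed URL for the range from start to end."""
    params = {
        "type": "5minutefeed",
        # Assumption: "hh" in yyyyMMddhhmm means a 24-hour clock.
        "datestart": start.strftime("%Y%m%d%H%M"),
        "dateend": end.strftime("%Y%m%d%H%M"),
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_feed_url(datetime(2017, 12, 31), datetime(2018, 12, 31)))
```

The resulting string can be passed to any HTTP client, for example `requests.get`.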

The API documentation is here:

https://hourlypricing.comed.com/hp-api/

JSON: returns an array of json objects with elements UTC millis and price.

[
{"millisUTC":"1434686700000","price":"2.0"},
{"millisUTC":"1434686100000","price":"2.5"},
{"millisUTC":"1434685800000","price":"2.5"}
]
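
Both fields come back as strings, so they need converting before any sorting or arithmetic. A minimal sketch of one way to do that, assuming the response has exactly the shape of the sample above: `parse_feed` is a hypothetical helper, and the sample is embedded as a string so that nothing is fetched.

```python
import json
from datetime import datetime, timezone


def parse_feed(text):
    """Turn the feed's JSON text into (UTC datetime, price) tuples, oldest first."""
    records = []
    for item in json.loads(text):
        # millisUTC is milliseconds since the Unix epoch, sent as a string.
        when = datetime.fromtimestamp(int(item["millisUTC"]) / 1000, tz=timezone.utc)
        records.append((when, float(item["price"])))
    return sorted(records)


SAMPLE = '''[
{"millisUTC":"1434686700000","price":"2.0"},
{"millisUTC":"1434686100000","price":"2.5"},
{"millisUTC":"1434685800000","price":"2.5"}
]'''

if __name__ == "__main__":
    for when, price in parse_feed(SAMPLE):
        print(when.isoformat(), price)
```

With a live response, the same function could be given `response.text` from the GET request.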
