How do I use Selenium with a drop-down list to scrape historical data?

Posted 2024-04-26 14:14:51


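I am trying to scrape historical and forecast hourly energy prices from this URL: https://hourlypricing.comed.com/pricing-table-today/

I can already do this for another table, which holds tomorrow's forecast prices: https://hourlypricing.comed.com/pricing-table-tomorrow/

...but so far, handling the drop-down list has me stumped.

I don't quite understand how the date picker works. What I want to do is pull the data for all of 2018. When I use Selenium IDE to record the steps, the year does not advance at all in recording mode, yet changing the date works fine when I am not recording. Any suggestions on how to solve this would be much appreciated. As I understand it, I should be able to record the commands in the IDE rather than writing the same code in Python?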

from operator import itemgetter
from pprint import pprint

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')

# Selenium 4 takes the options object via the `options` keyword
driver = webdriver.Chrome(options=options)

driver.get('https://hourlypricing.comed.com/pricing-table-tomorrow/')

# Grab the rendered prices table and hand its HTML to BeautifulSoup
table = driver.find_element(By.CLASS_NAME, 'prices')
tablehtml = table.get_attribute('outerHTML')
soup = BeautifulSoup(tablehtml, 'html.parser')
table = soup.find("table", {"class": "prices"})
table_body = table.find('tbody')

data = []
rows = table_body.find_all('tr')
for row in rows:
    cols = [ele.text.strip() for ele in row.find_all('td')]
    # Drop the trailing unit character from the price column
    cols[1] = cols[1][:-1]
    data.append([ele for ele in cols if ele])

# Note: the prices are still strings here, so this sorts lexicographically
sortedData = sorted(data, key=itemgetter(1))

pprint(sortedData)

driver.quit()

2 Answers

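Rather than clicking through the calendar and selecting each day, which would take a very long time, you can go straight to the source of the data, parse the output of fetch() with Beautiful Soup, and then retrieve whatever information you want :)

We work out how many days each month has and pass that list of days to a GET request that retrieves each day, for all 12 months. You can adapt it to go back as many years as you need.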

import calendar

import requests

BASE_URL = "https://hourlypricing.comed.com/rrtp/ServletFeed?type=pricingtabledual&date="

def getDays(month, year=2018):
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return list(range(1, days_in_month + 1))

def fetch(days, month, year=2018):
    for day in days:
        # The feed takes the date as yyyyMMdd, with month and day zero-padded
        date = f"{year}{month:02d}{day:02d}"
        response = requests.get(BASE_URL + date)
        print(response.content)

# Months 1 through 12
for month in range(1, 13):
    fetch(getDays(month), month)
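
As an alternative way to enumerate the days, here is a minimal sketch that steps through a year with `datetime` and yields each date in the same yyyyMMdd form used in the URL above. The helper name `date_strings` is made up for illustration; it only builds the strings and makes no requests.

```python
from datetime import date, timedelta

def date_strings(year):
    """Yield every day of `year` as a yyyyMMdd string."""
    current = date(year, 1, 1)
    while current.year == year:
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)

# Hypothetical usage: all dates of 2018, ready to append to the feed URL
dates_2018 = list(date_strings(2018))
```

Each string could then be appended to the `date=` parameter in place of the manual zero-padding.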

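There is a free API for historical price information. It lets you specify the range you want to retrieve values for. These are 5-minute prices, but there are several query types and return formats to choose from.

Example GET request that returns JSON for a date range: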

https://hourlypricing.comed.com/api?type=5minutefeed&datestart=201712310000&dateend=201812310000

The dates are given in the format: yyyyMMddhhmm

The API documentation is here:
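
A rough sketch of building such a URL from Python `datetime` objects follows. The function name `build_feed_url` is hypothetical, and it assumes the hour field is on a 24-hour clock, which the example URL suggests but does not prove.

```python
from datetime import datetime

API_URL = "https://hourlypricing.comed.com/api"

def build_feed_url(start, end):
    """Build a 5-minute feed URL for the range [start, end]."""
    fmt = "%Y%m%d%H%M"  # yyyyMMddhhmm, assuming a 24-hour clock
    return (
        f"{API_URL}?type=5minutefeed"
        f"&datestart={start.strftime(fmt)}"
        f"&dateend={end.strftime(fmt)}"
    )

url = build_feed_url(datetime(2017, 12, 31), datetime(2018, 12, 31))
```

The resulting string can be passed to `requests.get(url)` to download the data.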

https://hourlypricing.comed.com/hp-api/


JSON: returns an array of json objects with elements UTC millis and price.

[
{"millisUTC":"1434686700000","price":"2.0"},
{"millisUTC":"1434686100000","price":"2.5"},
{"millisUTC":"1434685800000","price":"2.5"}
]
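
Both fields arrive as strings, so they need converting before use. The sketch below parses a response of the shape shown above into `(datetime, price)` pairs sorted oldest first. The name `parse_feed` is made up for illustration, and the sample text is copied from the documentation excerpt rather than fetched live.

```python
import json
from datetime import datetime, timezone

def parse_feed(text):
    """Turn the JSON feed text into a list of (UTC datetime, float price)."""
    rows = [
        (
            # millisUTC is milliseconds since the Unix epoch, as a string
            datetime.fromtimestamp(int(rec["millisUTC"]) / 1000, tz=timezone.utc),
            float(rec["price"]),
        )
        for rec in json.loads(text)
    ]
    rows.sort(key=lambda row: row[0])  # oldest first
    return rows

sample = """[
{"millisUTC":"1434686700000","price":"2.0"},
{"millisUTC":"1434686100000","price":"2.5"},
{"millisUTC":"1434685800000","price":"2.5"}
]"""
rows = parse_feed(sample)
```

With a live response, `parse_feed(requests.get(url).text)` would give the same structure.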
