How do I use this for loop with this web-scraping stock module?

Posted 2024-05-29 11:52:39


I have a small problem with a loop that I don't understand.

I have a list of stock tickers:

Stocks = ['ADVANC.BK', 'AOT.BK', 'AWC.BK', 'BAM.BK', 'BBL.BK', 'BDMS.BK', 'BEM.BK', 'BGRIM.BK']

I use the module parinya to fetch stock price data, e.g.:

data = parinya.getHistStock('ADVANC.BK', interval='1d', day_begin='01-01-2018', day_end='10-01-2018')
print(data)


Output >>
              Open   High    Low  Close   Adj Close      Volume
Date                                                          
2018-01-03  191.5  197.5  191.0  196.5  176.633011  15405400.0
2018-01-04  196.5  197.0  194.0  194.0  174.385773   8361700.0
2018-01-05  195.0  197.0  194.5  195.0  175.284637  10294800.0
2018-01-08  195.0  197.0  194.5  196.0  176.183548   8141500.0

So I decided to use a for loop to fetch the data for all of them:

Data_all_stock = []
for stock in Stocks:
    data = parinya.getHistStock(stock, interval='1d', day_begin='01-01-2018', day_end='10-01-2018')
    Data_all_stock.append(data)

print(Data_all_stock)

This error occurred:

/usr/local/lib/python3.6/dist-packages/parinya/stock.py in getHistStock(stock, interval, day_begin, day_end)
     89             data = website.text.split('\n')[:-1]
     90             data = [d.split(',') for d in data]
---> 91             col = data[0]
     92             #print(col)
     93     data = pd.DataFrame(data[1:])

IndexError: list index out of range

I also tried this:

Data_all_stock = []
for stock in Stocks:
    data = parinya.getHistStock(f"'{stock}'", interval='1d', day_begin='01-01-2018', day_end='10-01-2018')
    Data_all_stock.append(data)

print(Data_all_stock)

It still fails, this time with:

/usr/local/lib/python3.6/dist-packages/parinya/stock.py in _get_crumbs_and_cookies(stock)
     68         crumb = re.findall('"CrumbStore":{"crumb":"(.+?)"}', str(soup))
     69 
---> 70         return (header, crumb[0], website.cookies)
     71 
     72 

IndexError: list index out of range

So I decided to look inside the parinya module. This is its stock.py:

import requests
from bs4 import BeautifulSoup
import re
import string
from datetime import datetime
from time import mktime
import pandas as pd
from requests.packages.urllib3.exceptions import InsecureRequestWarning
from yahoo_fin import stock_info as si

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def getSETSymbols(sector = 'ALL'):
    if sector == 'ALL':
        symbols = []
        url = 'https://www.set.or.th/set/commonslookup.do?language=th&country=TH&prefix={{key}}'
        key = ['NUMBER']
        key.extend(list(string.ascii_uppercase))
        for k in key:
            r = requests.get(url.replace('{{key}}',k), verify=False)
            soup = BeautifulSoup(r.content, "html.parser")
            for i in soup.findAll('a', href=re.compile('.*companyprofile.*')):
                symbols.append(i.text)
        return symbols

    symbols = []
    url = 'https://marketdata.set.or.th/mkt/sectorquotation.do?sector=SET100&language=th&country=TH'
    if sector != 'SET100':
        url = url.replace('SET100', sector)
    r = requests.get(url, verify=False)
    soup = BeautifulSoup(r.content, "html.parser")
    for i in soup.findAll('a', href=re.compile('.*symbol.*')):
        symbols.append(i.text.strip())
    return symbols

def getSET100Price():
    url = 'https://marketdata.set.or.th/mkt/sectorquotation.do?sector=SET100&language=th&country=TH'
    r = requests.get(url, verify=False)
    soup = BeautifulSoup(r.content, "html.parser")
    for i in soup.findAll('caption'):
        timestamp = i.text
        timestamp = timestamp[timestamp.find('ข้อมูลล่าสุด'):].split()
        temp = timestamp[1].split('/')
        temp = temp[0] + '-' + temp[1] + '-' + str(int(temp[2]) - 543)
        timestamp = datetime.strptime(temp + ' ' + timestamp[2], '%d-%m-%Y %H:%M:%S')

    data = []
    for i in soup.findAll('a', href=re.compile('.*symbol.*')):
        children = i.parent.parent.findChildren("td", recursive=False)
        symbol = i.text.strip()
        price = float(children[9].text)
        data.append((symbol, price))

    return data, timestamp

def _get_crumbs_and_cookies(stock):
    url = 'https://finance.yahoo.com/quote/{}/history'.format(stock)
    with requests.session():
        header = {'Connection': 'keep-alive',
                  'Expires': '-1',
                  'Upgrade-Insecure-Requests': '1',
                  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) \
                   AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'
                  }

        website = requests.get(url, headers=header, verify=False)
        soup = BeautifulSoup(website.text, 'lxml')
        crumb = re.findall('"CrumbStore":{"crumb":"(.+?)"}', str(soup))

        return (header, crumb[0], website.cookies)


def convert_to_unix(date):
    datum = datetime.strptime(date, '%d-%m-%Y')
    return int(mktime(datum.timetuple()))


def getHistStock(stock, interval='1d', day_begin='01-03-2018', day_end='28-03-2018'):
    day_begin_unix = convert_to_unix(day_begin)
    day_end_unix = convert_to_unix(day_end)
    col = [1]
    while len(col)==1:
        header, crumb, cookies = _get_crumbs_and_cookies(stock)
        with requests.session():
            url = 'https://query1.finance.yahoo.com/v7/finance/download/' \
                  '{stock}?period1={day_begin}&period2={day_end}&interval={interval}&events=history&crumb={crumb}' \
                .format(stock=stock, day_begin=day_begin_unix, day_end=day_end_unix, interval=interval, crumb=crumb)
            website = requests.get(url, headers=header, cookies=cookies, verify=False)
            data = website.text.split('\n')[:-1]
            data = [d.split(',') for d in data]
            col = data[0]
            #print(col)
    data = pd.DataFrame(data[1:])
    data.columns = col
    data.set_index('Date', inplace=True)
    return data.apply(pd.to_numeric, downcast='float', errors='coerce')

def getLivePrice(stock):
    return si.get_live_price(stock)



1 answer
User
#1 · posted 2024-05-29 11:52:39

Cause:

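I suspect getHistStock does not have enough error handling for invalid date ranges. The dates you chose, day_begin='01-01-2018', day_end='10-01-2018', probably fall outside the period for which data is returned in the expected form, at least for some of your tickers, and that would cause your error.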
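To illustrate how that could end in IndexError: list index out of range, here is a minimal sketch that repeats the two parsing steps shown at lines 89-90 of stock.py in your traceback. The function name parse_csv_text and both sample strings are made up for the demonstration; I have not checked what Yahoo actually sends back for an out-of-range request.

```python
def parse_csv_text(text):
    # Same two steps as getHistStock: split into lines, drop the last
    # element, then split each remaining line on commas.
    rows = text.split('\n')[:-1]
    return [row.split(',') for row in rows]


# Hypothetical response bodies, for illustration only.
good_body = "Date,Open\n2018-01-03,191.5\n"
bad_body = "an error message with no newline in it"

good = parse_csv_text(good_body)
bad = parse_csv_text(bad_body)

print(good[0])   # the header row
print(len(bad))  # no rows survive, so bad[0] would raise IndexError
```

Any body without a newline is reduced to an empty list by the [:-1] slice, so data[0] has nothing to return.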

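If I look up each of your tickers on finance.yahoo.com, when I get to AWC.BK the site shows Date shouldn't be prior to '2019-10-10'. A quick check for the IPO date gives me '10/10/2019' as the float/listing date, i.e. the first date for which data exists. Repeating this for another ticker in your list gives Date shouldn't be prior to '2019-12-16'.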

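Disclaimer: use whichever IPO checker you prefer. I know nothing about stocks, so I simply used the company profile on Yahoo Finance to cross-check the dates that Yahoo Finance displays.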


Possible fixes:

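At the most basic level, you can add a try/except wrapper inside the for stock in Stocks: loop and handle the error however you see fit, e.g. append a "no data" item to the list in the except branch, then carry on with the loop.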
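A minimal sketch of that wrapper, untested against the real module. The helper fetch_all and the stand-in fake_fetch are names I made up so the snippet runs without parinya or network access; in your code you would pass parinya.getHistStock as the fetch argument instead.

```python
def fetch_all(stocks, fetch, **kwargs):
    """Call fetch(stock, **kwargs) per ticker; record failures instead of stopping."""
    results = {}
    failed = {}
    for stock in stocks:
        try:
            results[stock] = fetch(stock, **kwargs)
        except IndexError as exc:
            # This is the error raised in your tracebacks.
            failed[stock] = str(exc)
    return results, failed


def fake_fetch(stock, **kwargs):
    # Hypothetical stand-in: pretends AWC.BK has no data for the range.
    if stock == 'AWC.BK':
        raise IndexError('list index out of range')
    return 'data for ' + stock


results, failed = fetch_all(
    ['ADVANC.BK', 'AOT.BK', 'AWC.BK'],
    fake_fetch,
    interval='1d', day_begin='01-01-2018', day_end='10-01-2018',
)
print(list(results))
print(failed)
```

Keeping the results in a dict keyed by ticker, rather than a plain list, makes it easy to see afterwards which ticker each DataFrame belongs to and which ones failed.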

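Of course, you could also check each ticker's IPO date in advance and adjust the requested range accordingly, but you would probably still want error handling as good practice against unexpected results.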
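One way the range adjustment could look, as a sketch only. The helper adjust_range is hypothetical, and the listing date you pass in is something you would have to look up yourself; the '10-10-2019' used here is the date I saw for AWC.BK. Dates use the same DD-MM-YYYY format that getHistStock expects.

```python
from datetime import datetime

FMT = '%d-%m-%Y'  # date format used by getHistStock


def adjust_range(day_begin, day_end, first_date):
    """Move day_begin up to first_date if needed; return None if no overlap."""
    begin = datetime.strptime(day_begin, FMT)
    end = datetime.strptime(day_end, FMT)
    first = datetime.strptime(first_date, FMT)
    if end < first:
        # The whole requested range is before the first available date.
        return None
    return max(begin, first).strftime(FMT), day_end


no_overlap = adjust_range('01-01-2018', '10-01-2018', '10-10-2019')
clamped = adjust_range('01-01-2018', '20-10-2019', '10-10-2019')
print(no_overlap)
print(clamped)
```

With your original January 2018 range the helper returns None for AWC.BK, so the loop can skip that ticker without making a request at all.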


Hypothetical test cases (untested):

Expected to fail:

data = parinya.getHistStock('AWC.BK', interval='1d', day_begin='01-01-2018', day_end='10-01-2018')
print(data)

Expected to pass:

data = parinya.getHistStock('AWC.BK', interval='1d', day_begin='10-10-2019', day_end='20-10-2019')
print(data)
