Web scraping: back-in-stock notification

Posted 2024-04-24 09:20:27


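I want to build a Python script that tells me whether a product is in stock. So far it fetches the URL below and parses the relevant part of the site, but I can't work out how to store this output, which I call stock, in another variable named stock_history, and then run another line that asks whether stock equals stock_history.

I also get an "EOL while scanning string literal" error when I try to store the HTML data in stock_history. Is there a better way?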

import requests
from datetime import datetime 
from bs4 import BeautifulSoup
import csv
now = datetime.now()
#enter website address
url = requests.get('https://shop.bitmain.com/antminer_s9_asic_bitcoin_miner.htm')

soup = BeautifulSoup(url.content,'html')

stock = (soup.find("div", "buy-now-bar-con"))

stock_history = '<div class="buy-now-bar-con">
<a class="current" href="antminer_s9_asic_bitcoin_miner.htm?flag=overview">Overview</a>
<a href="antminer_s9_asic_bitcoin_miner.htm?flag=specifications">Specification</a>
<a href="antminer_s9_asic_bitcoin_miner.htm?flag=gallery">Gallery</a>
<a class="btn-buy-now" href="javascript:;" style="background:#a7a4a4; cursor:not-allowed;" target="_self" title="sold out!">Coming soon</a>
</div>'


print(stock)

if stock == stock_history 
    print("still not in stock")

1 answer
User
#1 · Posted 2024-04-24 09:20:27

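First, EOL stands for "end of line". Python raises this error when a string literal is not closed before the end of its line, for example when a string in single quotes spans several lines. To avoid it, you can triple-quote the string in your original code, like this: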

stock_history = '''<div class="buy-now-bar-con">
<a class="current" href="antminer_s9_asic_bitcoin_miner.htm?flag=overview">Overview</a>
<a href="antminer_s9_asic_bitcoin_miner.htm?flag=specifications">Specification</a>
<a href="antminer_s9_asic_bitcoin_miner.htm?flag=gallery">Gallery</a>
<a class="btn-buy-now" href="javascript:;" style="background:#a7a4a4; cursor:not-allowed;" target="_self" title="sold out!">Coming soon</a>
</div>'''
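
To make the difference concrete, here is a minimal sketch that compiles two tiny source snippets, one with a line break inside '...' and one inside '''...'''. The snippets and the compiles helper are made up for illustration and are not part of the original code.

```python
# Hypothetical snippets: each is Python source text containing a real newline
# inside a string literal.
single_quoted = "s = '<div>\n</div>'"        # line break inside '...'
triple_quoted = "s = '''<div>\n</div>'''"    # line break inside '''...'''

def compiles(source):
    """Return True if `source` is valid Python, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(single_quoted))  # False
print(compiles(triple_quoted))  # True
```

Only the triple-quoted form lets a literal run across several lines, which is why the multi-line HTML string above needs it.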

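That is ugly, so I removed the big string, because it isn't needed. All you need is to find out from the stock variable whether the product is sold out. To do that, you can convert the bs4.element.Tag to a str and use a regular expression to check for the "sold out!" substring. Regular expressions are useful whenever you are scraping, processing text data, or doing any kind of XML or HTML parsing, so I encourage you to read up on them.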

More information: https://www.regular-expressions.info/

You can easily test Python regex captures here: https://pythex.org/

Here is the modified code, which does what you were trying to do.

[code block not preserved in the source]
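
As a rough illustration of the check described above (not the answer's original code), the sketch below runs the regular expression against a hard-coded string that stands in for str(stock). The markup is abbreviated from the question, and the is_sold_out helper is a hypothetical name.

```python
import re

# Hard-coded stand-in for str(stock); abbreviated from the markup in the question.
stock_html = '''<div class="buy-now-bar-con">
<a class="btn-buy-now" href="javascript:;" title="sold out!">Coming soon</a>
</div>'''

def is_sold_out(html_text):
    """Return True if the 'sold out!' marker appears in the given HTML text."""
    return re.search(r"sold out!", html_text) is not None

if is_sold_out(stock_html):
    print("still not in stock")
else:
    print("in stock")
```

In the real script, html_text would be str(soup.find("div", "buy-now-bar-con")), so there is no need to keep a copy of the whole div for comparison.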

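Give it a try and let me know if you have any questions!

Edit: the OP asked how to check the page periodically and add e-mail notification. This needs a few changes from the original solution, such as setting User-Agent information in the requests headers field. I also switched to html.parser instead of lxml so that the BeautifulSoup object handles the JavaScript in the page content correctly.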

import re
import time
import smtplib
import requests
from datetime import datetime 
from bs4 import BeautifulSoup

def stock_check(page, headers):
    """Fetches the page and checks for 'sold out!' in buy-now-bar-con."""
    response = requests.get(page, headers=headers, timeout=30)  # re-fetch on every check
    soup = BeautifulSoup(response.content, "html.parser")  # html.parser rather than lxml
    stock = soup.find("div", "buy-now-bar-con")  # div holding the sold out/coming soon info
    stock_status = re.findall(r"sold out!", str(stock))  # list of matches, empty if none
    return stock_status  # ["sold out!"] while the product is unavailable

def send_email(address, password, message):
    """Send an e-mail to yourself!"""
    server = smtplib.SMTP("smtp.gmail.com", 587)  # e-mail server
    server.ehlo()
    server.starttls()
    server.login(address, password)  # login
    message = str(message)  # message to email yourself
    server.sendmail(address, address, message)  # send the email through the server
    server.quit()  # close the connection

def stock_check_listener(page, headers, address, password, run_hours):
    """Periodically checks stock information."""
    start = datetime.now()  # start time
    while True:
        now = datetime.now()  # timestamp of this check
        if "sold out!" in stock_check(page, headers):  # check page
            print(str(now) + ": Not in stock.")
        else:
            message = str(now) + ": NOW IN STOCK!"
            print(message)
            send_email(address, password, message)
            break  # stop once the notification has been sent

        hours = (now - start).total_seconds() / 3600
        if hours >= run_hours:  # check run time
            print("Finished.")
            break

        time.sleep(30 * 60)  # Wait N minutes (here N = 30) to check again.

if __name__ == "__main__":

    # Set url and User-Agent header for javascript issues.
    page = "https://shop.bitmain.com/antminer_s9_asic_bitcoin_miner.htm"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
               'Content-Type': 'text/html'}

    # Run listener to stream stock checks.
    address = "user@gmail.com"  # your email
    password = "user.password"  # your email password
    stock_check_listener(page=page,
                         headers=headers,
                         address=address,
                         password=password,
                         run_hours=1)

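The program now starts a while loop that periodically requests the information from the web page. You can set the timeout (in hours) by changing the run_hours variable. You can also set the sleep/wait time (in minutes) by changing N inside stock_check_listener. I used Gmail in this example; if you get an error when e-mailing yourself, follow this link: https://myaccount.google.com/lesssecureapps and allow less secure apps (your Python program) to access your Gmail account.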
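
The loop described above can also be written as a small reusable helper. The following is only a sketch under assumed names (poll, check, interval_seconds and timeout_seconds are not from the answer); the sleep and clock parameters are injectable so the loop can be tried without actually waiting.

```python
import time

def poll(check, interval_seconds, timeout_seconds,
         sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or the timeout elapses.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    start = clock()
    while True:
        if check():
            return True
        if clock() - start >= timeout_seconds:
            return False
        sleep(interval_seconds)  # wait before the next check

# Usage with a fake check that succeeds on the third call; sleep is a no-op here.
calls = []

def fake_check():
    calls.append(1)
    return len(calls) >= 3

result = poll(fake_check, interval_seconds=30 * 60, timeout_seconds=3600,
              sleep=lambda seconds: None)
print(result, len(calls))  # True 3
```

In the script above, check would wrap the call to stock_check and return True once "sold out!" is no longer found, interval_seconds corresponds to N minutes, and timeout_seconds to run_hours.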
