Web scraping, back-in-stock notification

1 answer

User

#1 · Posted 2024-04-24 09:20:27

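First, EOL stands for "end of line". This error usually appears when Python doesn't like the way you defined a string, or when the string contains stray characters; a string in single or double quotes cannot span more than one line. To avoid it, you can triple-quote the string in your original code, like this: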

stock_history = '''<div class="buy-now-bar-con">
<a class="current" href="antminer_s9_asic_bitcoin_miner.htm?
flag=overview">Overview</a>
<a href="antminer_s9_asic_bitcoin_miner.htm?
flag=specifications">Specification</a>
<a href="antminer_s9_asic_bitcoin_miner.htm?flag=gallery">Gallery</a>
<a class="btn-buy-now" href="javascript:;" style="background:#a7a4a4; 
cursor:not-allowed;" target="_self" title="sold out!">Coming soon</a>
</div>'''
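
To see the difference between the two quoting styles without running a scraper, here is a minimal sketch. The two source snippets and the helper name `compiles` are made up for illustration; the only thing it relies on is that Python refuses to parse a single-quoted string that is broken across lines.

```python
# Source text with a single-quoted string broken across two lines (hypothetical example).
broken_source = "html = '<div>\n</div>'"
# The same string wrapped in triple quotes, which are allowed to span lines.
fixed_source = "html = '''<div>\n</div>'''"

def compiles(source):
    """Return True if Python can parse the given source text."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        # Reported as an EOL / unterminated string literal error, depending on the Python version.
        return False

print(compiles(broken_source))
print(compiles(fixed_source))
```

The first call should report that the source does not compile, the second that it does.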

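The triple-quoted block is ugly, so I removed that big string, since it isn't needed. All you need from the stock variable is whether the product is sold out. To get that, convert the bs4.element.Tag to a str and use a regular expression to check for the "sold out!" substring. Regular expressions are useful everywhere, whether you are scraping, processing text data, or doing any kind of XML or HTML parsing, so I encourage you to read up on them.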

You can easily test Python regex captures here: https://pythex.org/
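
As a rough sketch of that check, using only the standard library: the `stock_html` string below stands in for `str(stock)`, and the helper name `is_sold_out` is an assumption made for this example, not part of the original code.

```python
import re

# Stand-in for str(stock), i.e. the buy-now-bar-con Tag converted to text.
stock_html = '''<div class="buy-now-bar-con">
<a class="btn-buy-now" href="javascript:;" title="sold out!">Coming soon</a>
</div>'''

def is_sold_out(html_text):
    """Return True if the 'sold out!' marker appears anywhere in the HTML text."""
    # re.findall returns a list of matches, which is empty when nothing matches.
    return bool(re.findall(r"sold out!", html_text))

print(is_sold_out(stock_html))
print(is_sold_out('<div class="buy-now-bar-con"><a title="buy now">Buy</a></div>'))
```

For a fixed substring like this, a plain `"sold out!" in html_text` test would do the same job; the regex form becomes more useful once the marker text can vary.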

Here is the modified code, which does what you were trying to get it to do.

^{pr2}$

Give it a try, and let me know if you run into any problems!

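Edit: OP asked how to check the page periodically and include an e-mail notification. Compared with the original solution, this needs a few changes, such as setting User-Agent information in the requests headers field. I also switched to html.parser instead of lxml so that the BeautifulSoup object handles the javascript in the page content correctly.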

import re
import time
import smtplib
import requests
from datetime import datetime 
from bs4 import BeautifulSoup

def stock_check(page, headers):
    """Fetches the page and checks for 'sold out!' substring in buy-now-bar-con"""
    response = requests.get(url=page, headers=headers) #Request the page again on every check.
    soup = BeautifulSoup(response.content, "html.parser") #html.parser instead of lxml
    stock = soup.find("div", "buy-now-bar-con") #Check the html tags for sold out/coming soon info.
    stock_status = re.findall(r"sold out!", str(stock)) #Returns list of captured substring if exists.
    return stock_status # ["sold out!"] if found, empty list otherwise.

def send_email(address, password, message):
    """Send an e-mail to yourself!"""
    server = smtplib.SMTP("smtp.gmail.com", 587) #e-mail server
    server.ehlo()
    server.starttls()
    server.login(address,password) #login
    message = str(message) #message to email yourself
    server.sendmail(address,address,message) #send the email through dedicated server
    server.quit() #close the connection
    return

def stock_check_listener(page, headers, address, password, run_hours):
    """Periodically checks stock information."""
    start = datetime.now() # start time
    while True:
        now = datetime.now() #set before either branch uses it
        if "sold out!" in stock_check(page, headers): #check page
            print(str(now) + ": Not in stock.")
        else:
            message = str(now) + ": NOW IN STOCK!"
            print(message)
            send_email(address, password, message)
            break #stop listening once the notification is sent

        duration = (now - start)
        seconds = duration.total_seconds()
        hours = int(seconds/3600)
        if hours >= run_hours: #check run time
            print("Finished.")
            break

        time.sleep(30*60) #Wait N minutes (here N = 30) to check again.
    return

if __name__=="__main__":

    #Set url and userAgent header for javascript issues.
    page = "https://shop.bitmain.com/antminer_s9_asic_bitcoin_miner.htm"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    'Content-Type': 'text/html'}

    #Run listener to stream stock checks.
    address = "user@gmail.com" #your email
    password = "user.password" #your email password
    stock_check_listener(page=page,
                         headers=headers,
                         address=address,
                         password=password,
                         run_hours=1)

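The program now starts a while loop that periodically requests information from the page. You can set the timeout (in hours) by changing the run_hours variable. You can also set the sleep/wait time (in minutes) by changing N inside stock_check_listener, that is, the 30 in time.sleep(30*60). In this example I used Gmail; if you get an error when e-mailing yourself, follow this link: https://myaccount.google.com/lesssecureapps, and allow less secure apps (your Python program) to access your Gmail account.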
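
To get a feel for how run_hours and the wait time interact, here is a small simulation of the listener's loop with a fake clock. It is only a sketch: `count_checks` and `wait_minutes` are names invented for this example, the product is assumed never to come back in stock, and the time spent on each request is ignored.

```python
def count_checks(run_hours, wait_minutes):
    """Count how many page checks happen before the run-time limit stops the loop."""
    elapsed_seconds = 0
    checks = 0
    while True:
        checks += 1  # one stock check per loop iteration
        # Same whole-hour comparison the listener uses for its timeout.
        if int(elapsed_seconds / 3600) >= run_hours:
            break
        elapsed_seconds += wait_minutes * 60  # stands in for time.sleep(N*60)
    return checks

print(count_checks(run_hours=1, wait_minutes=30))
print(count_checks(run_hours=1, wait_minutes=60))
```

Under these assumptions, run_hours=1 with a 30-minute wait gives three checks (at 0, 30 and 60 minutes), and a 60-minute wait gives two.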
