Python script works, but fails after compiling (Windows)

Posted 2024-04-26 04:15:14


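I'm writing a script to scrape a website. It works fine when I run it with the interpreter, but after compiling it with PyInstaller or py2exe it fails: it seems that neither mechanize nor requests can keep the session alive. Any ideas?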
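
Because the script only misbehaves when frozen, one way to start is to compare the two runtimes. The Python 3 sketch below (`describe_runtime` is a hypothetical helper, not part of any library) collects a few facts that commonly differ between an interpreter run and a PyInstaller or py2exe build: whether `sys.frozen` is set, which Python is running, and which HTML parsers BeautifulSoup could pick up. Running it both ways and comparing the output is a cheap first check; it does not diagnose the session problem by itself.

```python
import importlib.util
import platform
import sys


def describe_runtime():
    """Collect facts that often differ between a script run and a frozen build."""
    # PyInstaller sets sys.frozen = True; py2exe sets it to a string such as
    # "console_exe". A normal interpreter does not define it at all.
    frozen = bool(getattr(sys, "frozen", False))
    # Without an explicit parser, BeautifulSoup uses the best one it can
    # import: lxml first, then html5lib, then the stdlib html.parser.
    parsers = ["html.parser"]  # always present in the standard library
    for name in ("lxml", "html5lib"):
        if importlib.util.find_spec(name) is not None:
            parsers.append(name)
    return {
        "frozen": frozen,
        "executable": sys.executable,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "parsers": parsers,
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print("%s: %s" % (key, value))
```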

I have hidden my username and password here, but I do enter them correctly in the compiled code.

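import logging
import re
import sys

import requests
from bs4 import BeautifulSoup as bs

url = sys.argv[1]
payload = {"userName": "real_username", "password": "realpassword"}
session = requests.Session()  # keeps cookies between requests
resp = session.post("http://website.net/login.do", data=payload)
# resp.text is decoded text; resp.content is raw bytes
if "forgot" in resp.text:
    logging.error("Login failed")
    # exit() is added by the site module and may be missing in a frozen
    # executable; sys.exit() is always available
    sys.exit(1)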

resp = session.get(url)
# Name the parser explicitly: without one, BeautifulSoup picks the best parser
# it can import (lxml, then html5lib, then html.parser), so results can differ
# between environments
soup = bs(resp.content, "html.parser")
urlM = url[:url.find("?") + 1] + "page=(PLACEHOLDER)&" + \
    url[url.find("?") + 1:]
# Get number of pages
regex = re.compile(r"\|.*\|\sof\s(\d+)")
script = str(soup.find_all("script")[1])
epNum = int(re.findall(regex, script)[0])  # Number of EPs
pagesNum = (epNum + 49) // 50  # 50 EPs per page, rounded up
links = []
# Get list of links
# If number of EPs > 50, more than one page
if pagesNum <= 1:
    links = [url]
else:
    for i in range(1, pagesNum + 1):
        url = urlM.replace("(PLACEHOLDER)", str(i))
        links.append(url)
# Loop over the links and extract info: ID, NAME, START_DATE, END_DATE
raw_info = []
for pos, link in enumerate(links):
    print("Processing page %d" % (pos + 1))
    sp = bs(session.get(link).content, "html.parser")
    table = sp.table.table  # the data table is nested inside an outer table
    raw_info.extend(table.find_all("td"))
epURL = ("http://www.website.net/exchange/viewep.do?operation"
         "=executeAction&epId=")
# Final data extraction: every row has 4 cells (ID, NAME, START_DATE, END_DATE)
raw_info = [str(td) for td in raw_info]
ids = [re.findall(r"\d+", i)[0] for i in raw_info[::4]]
names = [re.findall(r"<td>(.*)</td", i)[0] for i in raw_info[1::4]]
start_dates = [re.findall(r"<td>(.*)</td", i)[0] for i in raw_info[2::4]]
end_dates = [re.findall(r"<td>(.*)</td", i)[0] for i in raw_info[3::4]]
emails = []
eplinks = [epURL + str(i) for i in ids]
print(names)
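
The script builds the page URLs by splicing strings around the `?`. A Python 3 sketch of the same step using `urllib.parse` might look like the following; `page_links` is a hypothetical name, and it assumes, as the script does, 50 EPs per page and a `page` query parameter. It rounds the page count up, so an exact multiple of 50 does not trigger an extra, empty page request.

```python
import math
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_links(url, ep_count, per_page=50):
    """Return one URL per results page, adding page=N to the query string."""
    # Ceiling division: 50 EPs fit on one page, 51 EPs need two.
    pages = max(1, math.ceil(ep_count / per_page))
    if pages == 1:
        return [url]  # single page: use the URL unchanged, as the script does
    parts = urlsplit(url)
    # Drop any existing page parameter so it is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page"]
    links = []
    for n in range(1, pages + 1):
        # Put page first, like the script's "page=N&..." string splicing.
        new_query = urlencode([("page", str(n))] + query)
        links.append(urlunsplit(parts._replace(query=new_query)))
    return links
```

Because `parse_qsl` decodes and `urlencode` re-encodes the query, values containing special characters may come back escaped differently (for example, spaces as `+`), which most servers accept but is worth checking against this site.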

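The error occurs at the epNum line, which tells me the HTML page I get back is not the one I requested. The script works fine on Linux both as a script and when compiled, and on Windows as a script, but it fails when compiled on Windows. Any ideas?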
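
Since the failure surfaces at `epNum`, it can help to make that step fail loudly with a hint about which page actually came back (for example, the login form again). Below is a minimal Python 3 sketch, assuming the same `|...| of N` pattern the script uses; `parse_ep_count` is a hypothetical helper, and reporting the page `<title>` is just one convenient choice of hint.

```python
import re

# Same pattern as the script: "|...| of N" inside a <script> block.
EP_COUNT_RE = re.compile(r"\|.*\|\sof\s(\d+)")


def parse_ep_count(script_text, page_html=""):
    """Return the EP count, or raise with a hint about which page came back."""
    match = EP_COUNT_RE.search(script_text)
    if match is None:
        # Probably a different page was served, so report its <title>
        # (or the first 200 characters if there is no title).
        title = re.search(r"<title>(.*?)</title>", page_html,
                          re.IGNORECASE | re.DOTALL)
        hint = title.group(1).strip() if title else page_html[:200]
        raise ValueError("EP count not found; page title/start: %r" % hint)
    return int(match.group(1))
```

In the script, something like `epNum = parse_ep_count(script, resp.text)` could replace the `re.findall(...)[0]` lookup; comparing the reported title between the interpreter run and the compiled run would show whether the compiled build is being sent back to the login page.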

