How to save a web page as an image using Python
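I'm building a "bookmarks" section for a website in Python. One of my goals is to grab an image to display next to the user's link. Specifically, the user enters a URL, I capture a screenshot of that page, and I show it beside the link. Sounds simple, right?
So far I've downloaded pywebshot, and it runs fine from a terminal on my local machine. But when I put it on the server, it crashes with a segmentation fault and the following output: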
/usr/lib/pymodules/python2.6/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
warnings.warn(str(e), _gtk.Warning)
./pywebshot.py:16: Warning: invalid (NULL) pointer instance
self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:16: Warning: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:49: GtkWarning: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_colormap_get_visual: assertion `GDK_IS_COLORMAP (colormap)' failed
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_window_new: assertion `GDK_IS_WINDOW (parent)' failed
self.parent.show_all()
Segmentation fault
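I know some things can't run in a pts environment, but honestly this is a bit over my head. If I need to somehow make my pts connection look like a tty, I can try that. Right now, though, I don't really understand what is going on. Any help would be greatly appreciated.
Also, a web service that takes a URL and returns an image would work too. I'm not tied to pywebshot.
I know the server I'm on is running X and has all the necessary Python modules installed.
Thanks in advance.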
4 Answers
1
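from selenium import webdriver
from xvfbwrapper import Xvfb

# Start a virtual X display so the browser can run without a physical screen
display = Xvfb(width=400, height=400)
display.start()
browser = webdriver.Firefox()
url = "http://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python"
browser.get(url)
# save_screenshot writes PNG data, so use a .png extension
destination = "screenshot_filename.png"
if browser.save_screenshot(destination):
    print("File saved in the destination filename")
browser.quit()
# Shut down the virtual display once the browser is closed
display.stop()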
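The "could not open display" warning in the question usually means the process has no X display to connect to, which is the gap Xvfb fills in the snippet above. Here is a minimal sketch of a check you could run first, assuming the display is located through the DISPLAY environment variable (the usual X11 convention). The helper name needs_virtual_display is hypothetical and is not part of selenium or xvfbwrapper.

```python
import os


def needs_virtual_display(environ=None):
    """Return True when no X display is configured for this process.

    Assumption: GTK/X11 clients locate the display through the DISPLAY
    environment variable, so an unset or empty value means a virtual
    display such as Xvfb has to be started first.
    """
    env = os.environ if environ is None else environ
    return not env.get("DISPLAY", "").strip()


if __name__ == "__main__":
    if needs_virtual_display():
        print("No DISPLAY set: start Xvfb before launching the browser.")
    else:
        print("DISPLAY is set to", os.environ["DISPLAY"])
```

Running this under the same account and launcher as the web application (for example from the web server process rather than an interactive shell) is the useful case, since the two environments often differ.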
2
This is the code I use to capture a screenshot of the entire scrollable page:
from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import logging
import os
import time
# Set default download folder for ChromeDriver
videos_folder = r"./download"
if not os.path.exists(videos_folder):
    os.makedirs(videos_folder)
prefs = {"download.default_directory": videos_folder}


def open_url(address):
    # SELENIUM SETUP
    logging.getLogger('WDM').setLevel(logging.WARNING)  # just to hide not-so-relevant webdriver-manager messages
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # run Chrome without a visible window
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    driver.implicitly_wait(1)
    driver.maximize_window()
    driver.get(address)
    driver.set_window_size(1920, 1080)  # to set the screenshot width
    save_screenshot(driver, '{}/Screenshot.png'.format(videos_folder))
    driver.quit()


def save_screenshot(driver, file_name):
    # Scroll through the page once, then resize the window to the full page size
    height, width = scroll_down(driver)
    driver.set_window_size(width, height)
    img_binary = driver.get_screenshot_as_png()
    img = Image.open(BytesIO(img_binary))
    img.save(file_name)
    # print(file_name)
    print("Screenshot saved!")


def scroll_down(driver):
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")
    # Split the page into viewport-sized rectangles (left, top, right, bottom)
    rectangles = []
    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height
        if top_height > total_height:
            top_height = total_height
        while ii < total_width:
            top_width = ii + viewport_width
            if top_width > total_width:
                top_width = total_width
            rectangles.append((ii, i, top_width, top_height))
            ii = ii + viewport_width
        i = i + viewport_height
    previous = None
    part = 0
    # Visit every rectangle so lazily loaded content gets a chance to render
    for rectangle in rectangles:
        if previous is not None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.5)
        # time.sleep(0.2)
        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])
        previous = rectangle
    return total_height, total_width


open_url("https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python")
This is the resulting screenshot: (image not included)
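The nested while loops inside scroll_down above can be pulled out into a pure function, which makes the tiling logic easy to check without launching a browser. This is only a sketch of that one step. The name tile_rectangles is hypothetical, and the page and viewport sizes in the usage lines are made-up numbers.

```python
def tile_rectangles(total_width, total_height, viewport_width, viewport_height):
    """Split a page into viewport-sized tiles, left to right, top to bottom.

    Each tile is (left, top, right, bottom) in page pixels; tiles on the
    right and bottom edges are clipped to the page size.
    """
    if viewport_width <= 0 or viewport_height <= 0:
        raise ValueError("viewport dimensions must be positive")
    rectangles = []
    for top in range(0, total_height, viewport_height):
        bottom = min(top + viewport_height, total_height)
        for left in range(0, total_width, viewport_width):
            right = min(left + viewport_width, total_width)
            rectangles.append((left, top, right, bottom))
    return rectangles


if __name__ == "__main__":
    # Hypothetical sizes: a 1000x2500 page seen through a 1000x1080 viewport.
    for rect in tile_rectangles(1000, 2500, 1000, 1080):
        print(rect)
```

With those sizes the page splits into three tiles, the last one clipped at the bottom edge of the page.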
1
I found a site, websnapr.com, a web service that gives you the image with very little effort.
import subprocess
subprocess.Popen(['wget', '-O', MYFILENAME+'.png', 'http://images.websnapr.com/?url='+MYURL+'&size=s&nocache=82']).wait()
Piece of cake.
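One caveat with concatenating MYURL straight into the query string: if the target address itself contains characters such as & or ?, the service would see them as part of its own parameters. Here is a minimal sketch of building the request URL with percent-encoding, using the endpoint and the size and nocache parameters from the wget command above. The function name build_snapshot_url is hypothetical, and whether the service is still available is not verified here.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above; its availability is not verified here.
WEBSNAPR_ENDPOINT = "http://images.websnapr.com/"


def build_snapshot_url(page_url, size="s", nocache=82):
    """Return the thumbnail request URL with page_url percent-encoded."""
    query = urlencode({"url": page_url, "size": size, "nocache": nocache})
    return WEBSNAPR_ENDPOINT + "?" + query


if __name__ == "__main__":
    # example.com stands in for whatever address the user submitted.
    print(build_snapshot_url("http://example.com/search?q=a&b=c"))
```

The resulting string can be passed to wget in place of the hand-built URL.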