How can I take a screenshot/image of a website using Python?

What I want to achieve is to get a screenshot of any website from Python.

Env: Linux

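On the Mac there is webkit2png, and on Linux + KDE you can use khtml2png. I have tried the former and it works quite well; I have only heard of the latter being put to use.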

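I recently came across QtWebKit, which claims to be cross-platform (Qt rolled WebKit into their library, I guess). I have never tried it, though, so I can't tell you much more.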

The QtWebKit link shows how to access it from Python. For the other tools, you should at least be able to use subprocess to do the same.
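A rough sketch of the subprocess route, assuming the tool is installed, is on your PATH, and accepts the page URL as its only argument (other tools may need different arguments; check their help output). The function names `build_command` and `capture_with_tool` are made up for this example.

```python
import shutil
import subprocess


def build_command(tool, url):
    """Return the argument list for a command-line screenshot tool.

    Assumption: the tool takes the URL as its single positional argument.
    """
    return [tool, url]


def capture_with_tool(tool, url, timeout=60):
    """Run the tool on the URL and report whether it exited successfully.

    Returns False without running anything if the tool is not on PATH.
    """
    if shutil.which(tool) is None:
        return False
    completed = subprocess.run(build_command(tool, url), timeout=timeout)
    return completed.returncode == 0
```

Passing the arguments as a list (no `shell=True`) means the URL is never interpreted by a shell, so characters such as `&` or `?` in it need no quoting.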

Here is a simple solution using webkit: http://webscraping.com/blog/Webpage-screenshots-with-webkit/

    import sys
    import time
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *


    class Screenshot(QWebView):
        def __init__(self):
            # a QApplication must exist before any widget is created
            self.app = QApplication(sys.argv)
            QWebView.__init__(self)
            self._loaded = False
            self.loadFinished.connect(self._loadFinished)

        def capture(self, url, output_file):
            self.load(QUrl(url))
            self.wait_load()
            # set to webpage size
            frame = self.page().mainFrame()
            self.page().setViewportSize(frame.contentsSize())
            # render image
            image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
            painter = QPainter(image)
            frame.render(painter)
            painter.end()
            print('saving %s' % output_file)
            image.save(output_file)

        def wait_load(self, delay=0):
            # process app events until page loaded
            while not self._loaded:
                self.app.processEvents()
                time.sleep(delay)
            # reset the flag so the next capture waits again
            self._loaded = False

        def _loadFinished(self, result):
            self._loaded = True


    s = Screenshot()
    s.capture('http://webscraping.com', 'website.png')
    s.capture('http://webscraping.com/blog', 'blog.png')
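Note that `wait_load` above spins until the page reports that it has loaded, with no upper bound. One possible variation is to poll with a timeout. This is only a sketch of the polling pattern in plain Python, not part of the original post; `wait_until` and its parameters are names invented here.

```python
import time


def wait_until(is_done, timeout=10.0, poll_interval=0.01, tick=None):
    """Poll is_done() until it returns True or the timeout expires.

    tick, if given, is called on every iteration (for example to let a
    GUI toolkit process its pending events). Returns True if is_done()
    became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        if tick is not None:
            tick()
        time.sleep(poll_interval)
    return True
```

In the class above this would be called roughly as `wait_until(lambda: self._loaded, tick=self.app.processEvents)`, and `capture` could then give up on a page that never finishes loading.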

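Here is my solution, put together with help from various sources. It takes a full web page screen capture, crops it (optional), and also generates a thumbnail from the cropped image. The requirements are as follows: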

Requirements:

  1. Install NodeJS
  2. Using Node's package manager, install phantomjs: npm -g install phantomjs
  3. Install selenium (in your virtualenv, if you are using one)
  4. Install imageMagick
  5. Add phantomjs to the system path (on Windows)
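Before running the script, it can help to check that the requirements above are actually visible to Python. A minimal sketch using only the standard library; the helper names `missing_tools` and `missing_modules` are invented for this example, and the tool names assume the executables are called `phantomjs` and `convert` (the ImageMagick command the script below invokes).

```python
import importlib.util
import shutil


def missing_tools(names):
    """Return the executables from names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level Python modules from names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing executables:", missing_tools(["phantomjs", "convert"]))
    print("missing modules:", missing_modules(["selenium"]))
```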

    import os
    from subprocess import Popen, PIPE
    from selenium import webdriver

    abspath = lambda *p: os.path.abspath(os.path.join(*p))
    ROOT = abspath(os.path.dirname(__file__))


    def execute_command(command):
        # any non-blank output from the command is treated as an error
        result = Popen(command, shell=True, stdout=PIPE).stdout.read()
        if len(result) > 0 and not result.isspace():
            raise Exception(result)


    def do_screen_capturing(url, screen_path, width, height):
        print("Capturing screen..")
        driver = webdriver.PhantomJS()
        # it saves the service log file in the same directory
        # if you want to have the log file stored elsewhere
        # initialize the webdriver.PhantomJS() as
        # driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
        driver.set_script_timeout(30)
        if width and height:
            driver.set_window_size(width, height)
        driver.get(url)
        driver.save_screenshot(screen_path)
        # shut down the PhantomJS process once the screenshot is saved
        driver.quit()


    def do_crop(params):
        print("Cropping captured image..")
        command = [
            'convert',
            params['screen_path'],
            '-crop', '%sx%s+0+0' % (params['width'], params['height']),
            params['crop_path']
        ]
        execute_command(' '.join(command))


    def do_thumbnail(params):
        print("Generating thumbnail from cropped captured image..")
        command = [
            'convert',
            params['crop_path'],
            '-filter', 'Lanczos',
            '-thumbnail', '%sx%s' % (params['width'], params['height']),
            params['thumbnail_path']
        ]
        execute_command(' '.join(command))


    def get_screen_shot(**kwargs):
        url = kwargs['url']
        width = int(kwargs.get('width', 1024))  # screen width to capture
        height = int(kwargs.get('height', 768))  # screen height to capture
        filename = kwargs.get('filename', 'screen.png')  # file name e.g. screen.png
        path = kwargs.get('path', ROOT)  # directory path to store screen

        crop = kwargs.get('crop', False)  # crop the captured screen
        crop_width = int(kwargs.get('crop_width', width))  # the width of crop screen
        crop_height = int(kwargs.get('crop_height', height))  # the height of crop screen
        crop_replace = kwargs.get('crop_replace', False)  # does crop image replace original screen capture?

        thumbnail = kwargs.get('thumbnail', False)  # generate thumbnail from screen, requires crop=True
        thumbnail_width = int(kwargs.get('thumbnail_width', width))  # the width of thumbnail
        thumbnail_height = int(kwargs.get('thumbnail_height', height))  # the height of thumbnail
        thumbnail_replace = kwargs.get('thumbnail_replace', False)  # does thumbnail image replace crop image?

        screen_path = abspath(path, filename)
        crop_path = thumbnail_path = screen_path

        if thumbnail and not crop:
            raise Exception('Thumbnail generation requires crop image, set crop=True')

        do_screen_capturing(url, screen_path, width, height)

        if crop:
            if not crop_replace:
                crop_path = abspath(path, 'crop_' + filename)
            params = {
                'width': crop_width, 'height': crop_height,
                'crop_path': crop_path, 'screen_path': screen_path}
            do_crop(params)

            if thumbnail:
                if not thumbnail_replace:
                    thumbnail_path = abspath(path, 'thumbnail_' + filename)
                params = {
                    'width': thumbnail_width, 'height': thumbnail_height,
                    'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
                do_thumbnail(params)
        return screen_path, crop_path, thumbnail_path


    if __name__ == '__main__':
        # Requirements:
        #   Install NodeJS
        #   Using Node's package manager install phantomjs: npm -g install phantomjs
        #   install selenium (in your virtualenv, if you are using that)
        #   install imageMagick
        #   add phantomjs to system path (on windows)

        url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python'
        screen_path, crop_path, thumbnail_path = get_screen_shot(
            url=url, filename='sof.png',
            crop=True, crop_replace=False,
            thumbnail=True, thumbnail_replace=False,
            thumbnail_width=200, thumbnail_height=150,
        )

These are the generated images:

  • Full web page screen
  • Cropped image from the captured screen
  • Thumbnail of the cropped image
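The crop and thumbnail steps above join the `convert` arguments into one string and run it through a shell, which breaks on file names containing spaces. A possible variation, shown only as a sketch, passes the same arguments as a list. The `convert` options are the ones already used above; the helper names `crop_args`, `thumbnail_args` and `run_convert` are invented here.

```python
import subprocess


def crop_args(src, dst, width, height):
    """Arguments to crop src to width x height from the top-left corner."""
    return ["convert", src, "-crop", "%dx%d+0+0" % (width, height), dst]


def thumbnail_args(src, dst, width, height):
    """Arguments to make a thumbnail of src with the Lanczos filter."""
    return ["convert", src, "-filter", "Lanczos",
            "-thumbnail", "%dx%d" % (width, height), dst]


def run_convert(args):
    """Run ImageMagick without a shell; raises CalledProcessError on failure."""
    subprocess.run(args, check=True)
```

For example, `run_convert(crop_args("my shot.png", "crop_my shot.png", 1024, 768))` hands each path to ImageMagick as a single argument, and errors are reported through the exit status rather than by inspecting the command's output.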

I can't comment on ars's answer, but I actually got Roland Tapken's code running using QtWebkit, and it works quite well.

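I just wanted to confirm that what Roland posted on his blog works well on Ubuntu. Our production version ended up not using anything he wrote, but we are using the PyQt/QtWebKit bindings with much success.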

You don't mention what environment you're running in, which makes a big difference, because there is no pure-Python web browser capable of rendering HTML.

But if you're using a Mac, I've used webkit2png with great success. If not, as others have pointed out, there are plenty of options.

Try this:

    #!/usr/bin/env python
    import gtk.gdk
    import time
    import random

    while 1:
        # generate a random time between 120 and 300 sec
        random_time = random.randrange(120, 300)

        # wait between 120 and 300 seconds (or between 2 and 5 minutes)
        print("Next picture in: %.2f minutes" % (float(random_time) / 60))
        time.sleep(random_time)

        # grab the root window, i.e. the whole screen
        w = gtk.gdk.get_default_root_window()
        sz = w.get_size()
        print("The size of the window is %d x %d" % sz)
        pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, sz[0], sz[1])
        pb = pb.get_from_drawable(w, w.get_colormap(), 0, 0, 0, 0, sz[0], sz[1])

        # name the file after the current timestamp
        ts = time.time()
        filename = "screenshot"
        filename += str(ts)
        filename += ".png"

        if pb is not None:
            pb.save(filename, "png")
            print("Screenshot saved to " + filename)
        else:
            print("Unable to get the screenshot.")