Asynchronous requests with Python requests

I tried the sample provided in the documentation of the requests library for Python:

http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests

With async.map(rs) I get the response codes, but I want to get the content of each page requested. This, for example, just does not work:

    out = async.map(rs)
    print out[0].content

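Note

The answer below does not apply to requests v0.13.0+. After this question was written, the asynchronous functionality was moved to grequests. However, you can replace requests with grequests below and it should work.

I have left this answer as is to reflect the original question, which was about using requests < v0.13.0.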


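To run multiple tasks asynchronously with async.map, you have to:

  1. Define a function for what you want to do with each object (your task)
  2. Add that function as an event hook in your request
  3. Call async.map on a list of all the requests/actions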

Example:

    from requests import async
    # If using requests > v0.13.0, use
    # from grequests import async

    urls = [
        'http://python-requests.org',
        'http://httpbin.org',
        'http://python-guide.org',
        'http://kennethreitz.com'
    ]

    # A simple task to do to each response object
    def do_something(response):
        print response.url

    # A list to hold our things to do via async
    async_list = []

    for u in urls:
        # The "hooks = {..." part is where you define what you want to do
        #
        # Note the lack of parentheses following do_something, this is
        # because the response will be used as the first argument automatically
        action_item = async.get(u, hooks = {'response' : do_something})

        # Add the task to our list of things to do via async
        async_list.append(action_item)

    # Do our list of things to do via async
    async.map(async_list)

async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?

Installation:

 $ pip install grequests 

Usage:

Build a stack:

    import grequests

    urls = [
        'http://www.heroku.com',
        'http://tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com'
    ]

    rs = (grequests.get(u) for u in urls)

Send the stack:

 grequests.map(rs) 

The result looks like:

 [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>] 

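grequests does not seem to set a limit on concurrent requests by default, i.e. when multiple requests are sent to the same server they all go out at once.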
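As far as I know, grequests.map accepts a size argument that caps the underlying gevent pool, e.g. grequests.map(rs, size=5), so that may be worth trying first. The general idea of capping concurrency can also be sketched with the standard library alone. In the sketch below, fetch_all and fake_fetch are hypothetical names, and fake_fetch is only a stand-in for a real HTTP call so the example runs without network access:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_concurrent=2):
    """Call fetch(url) for every URL, at most max_concurrent at a time.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as executor:
        return list(executor.map(fetch, urls))


# Stand-in for a real HTTP call (hypothetical); it records how many
# calls are in flight at the same moment.
_lock = threading.Lock()
_active = 0
peak_active = 0


def fake_fetch(url):
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.05)  # pretend to wait on the network
    with _lock:
        _active -= 1
    return "body of " + url


urls = ["http://example.com/%d" % i for i in range(6)]
bodies = fetch_all(urls, fake_fetch, max_concurrent=2)
```

With a real client you would pass something like lambda u: requests.get(u, timeout=10) as fetch; the pool size is then the upper bound on simultaneous requests.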

Maybe requests-futures is another choice.

    from requests_futures.sessions import FuturesSession

    session = FuturesSession()
    # first request is started in background
    future_one = session.get('http://httpbin.org/get')
    # second request is started immediately
    future_two = session.get('http://httpbin.org/get?foo=bar')
    # wait for the first request to complete, if it hasn't already
    response_one = future_one.result()
    print('response one status: {0}'.format(response_one.status_code))
    print(response_one.content)
    # wait for the second request to complete, if it hasn't already
    response_two = future_two.result()
    print('response two status: {0}'.format(response_two.status_code))
    print(response_two.content)

It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.

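I tested both requests-futures and grequests. grequests is faster, but it brings monkey patching and additional dependency problems. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests in a ThreadPoolExecutor. It is almost as fast as grequests, but without external dependencies.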

    import requests
    import concurrent.futures

    def get_urls():
        return ["url1", "url2"]

    def load_url(url, timeout):
        return requests.get(url, timeout = timeout)

    # Counters for successful and failed requests
    resp_ok = 0
    resp_err = 0

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        # Map each future back to the URL it was submitted for
        future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                resp_err = resp_err + 1
            else:
                resp_ok = resp_ok + 1
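Since the original question was about getting the content of each page, here is a minimal sketch of the same thread pool approach that keeps the content instead of only counting results. The names fetch_contents and stub_fetch are hypothetical, and stub_fetch merely stands in for a real HTTP call so the example runs offline:

```python
import concurrent.futures


def fetch_contents(urls, fetch, max_workers=20):
    """Run fetch(url) in a thread pool.

    Returns two dicts keyed by URL: one with the fetched contents,
    one with the exceptions raised for URLs that failed.
    """
    contents, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                contents[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return contents, errors


def stub_fetch(url):
    # Hypothetical stand-in for: requests.get(url, timeout=10).content
    if "bad" in url:
        raise OSError("cannot reach " + url)
    return b"<html>" + url.encode("ascii") + b"</html>"


contents, errors = fetch_contents(
    ["http://ok.example", "http://bad.example"], stub_fetch
)
```

To use it with requests, pass lambda url: requests.get(url, timeout=10).content as the fetch argument.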

I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

    list_of_requests = ['http://moop.com', 'http://doop.com', ...]

    from simple_requests import Requests
    for response in Requests().swarm(list_of_requests):
        print response.content

The documentation is here: http://pythonhosted.org/simple-requests/

    threads = list()
    for requestURI in requests:
        t = Thread(target=self.openURL, args=(requestURI,))
        t.start()
        threads.append(t)
    for thread in threads:
        thread.join()
    ...

    def openURL(self, requestURI):
        o = urllib2.urlopen(requestURI, timeout = 600)
        o...
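The fragment above starts one thread per URL and joins them all. A self-contained sketch of that pattern, which also collects what each thread fetched, might look like this. fetch_in_threads and stub_open are hypothetical names, and stub_open only stands in for a real call such as urlopen so the example runs offline:

```python
import threading


def fetch_in_threads(urls, open_url):
    """Start one thread per URL, wait for all of them, and return
    the results in the same order as urls."""
    results = [None] * len(urls)

    def worker(index, url):
        # Each thread writes only to its own slot, so no lock is needed.
        results[index] = open_url(url)

    threads = []
    for index, url in enumerate(urls):
        t = threading.Thread(target=worker, args=(index, url))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results


def stub_open(url):
    # Hypothetical stand-in for a real call such as
    # urllib.request.urlopen(url, timeout=600).read()
    return len(url)


lengths = fetch_in_threads(["http://a.example", "http://bb.example"], stub_open)
```

Note that one thread per URL puts no cap on concurrency, so a pool is usually the safer choice for long URL lists.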

I have been using python requests for async calls against GitHub's gist API.

For an example, see the code here:

https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72

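This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if this is confusing to you and I will document it.

I have also tried some things using the asynchronous methods in Python, but I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, done in Twisted.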

http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html