Python Requests and persistent sessions

I am using the Requests module (version 0.10.0 with Python 2.5). I have figured out how to submit data to a login form on a website and retrieve the session key, but I can't see an obvious way to use this session key in subsequent requests. Can someone fill in the ellipsis in the code below, or suggest another approach?

    >>> import requests
    >>> login_data = {'formPosted':'1', 'login_email':'me@example.com', 'password':'pw'}
    >>> r = requests.post('https://localhost/login.py', login_data)
    >>>
    >>> r.text
    u'You are being redirected <a href="profilePage?_ck=1349394964">here</a>'
    >>> r.cookies
    {'session_id_myapp': '127-0-0-1-825ff22a-6ed1-453b-aebc-5d3cf2987065'}
    >>>
    >>> r2 = requests.get('https://localhost/profile_data.json', ...)

You can easily create a persistent session using:

    s = requests.session()

After that, continue with your requests as before:

    s.post('https://localhost/login.py', login_data)
    # logged in! cookies saved for future requests.
    r2 = s.get('https://localhost/profile_data.json', ...)
    # cookies sent automatically!
    # do whatever, s will keep your cookies intact :)

For more about sessions, see: http://docs.python-requests.org/en/latest/user/advanced/#session-objects
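
Putting those pieces together, a minimal end-to-end sketch could look like the following (the URLs and form fields are just the placeholders from the question, not a real endpoint):

    import requests

    login_data = {'formPosted': '1', 'login_email': 'me@example.com', 'password': 'pw'}

    # a Session keeps cookies (and other settings) across requests
    with requests.Session() as s:
        s.post('https://localhost/login.py', data=login_data)  # login; session cookie is stored on s
        r2 = s.get('https://localhost/profile_data.json')      # cookie is sent automatically
        print(r2.status_code, r2.text)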

Check out my answer to this similar question:

python: urllib2 how to send cookie with urlopen request

    import urllib2
    import urllib
    from cookielib import CookieJar

    cj = CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    # input-type values from the html form
    formdata = {"username": username, "password": password, "form-id": "1234"}
    data_encoded = urllib.urlencode(formdata)
    response = opener.open("https://page.com/login.php", data_encoded)
    content = response.read()

EDIT:

I see I've received a downvote with no explanatory comment. I'm guessing that's because I'm referring to the urllib libraries instead of requests. I did that because the OP asked for help with requests, or for someone to suggest another approach.

The documentation says that get takes an optional cookies argument, which lets you specify the cookies to use:

From the docs:

    >>> url = 'http://httpbin.org/cookies'
    >>> cookies = dict(cookies_are='working')
    >>> r = requests.get(url, cookies=cookies)
    >>> r.text
    '{"cookies": {"cookies_are": "working"}}'

http://docs.python-requests.org/en/latest/user/quickstart/#cookies
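
Applied to the code from the question, a sketch of that approach could be to pass the cookies returned by the login response along with the follow-up request (again using the question's placeholder URLs):

    import requests

    login_data = {'formPosted': '1', 'login_email': 'me@example.com', 'password': 'pw'}
    r = requests.post('https://localhost/login.py', login_data)

    # r.cookies holds the cookies set by the login response; hand them to the next request
    r2 = requests.get('https://localhost/profile_data.json', cookies=r.cookies)
    print(r2.text)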

The other answers help with understanding how to maintain such a session. Additionally, I want to provide a class that keeps the session alive across different runs of a script (using a cache file). This means a proper "login" is only performed when required (no session in the cache, or the cached session has expired). It also supports proxy settings for subsequent calls to 'get' or 'post'.

It is tested with Python 3.

Use it as a basis for your own code. The following snippets are released under GPL v3.

    import os
    import pickle
    import datetime
    from urllib.parse import urlparse

    import requests


    class MyLoginSession:
        """
        a class which handles and saves login sessions. It also keeps track of proxy
        settings. It also maintains a cache file for restoring session data from
        earlier script executions.
        """
        def __init__(self,
                     loginUrl,
                     loginData,
                     loginTestUrl,
                     loginTestString,
                     sessionFileAppendix = '_session.dat',
                     maxSessionTimeSeconds = 30 * 60,
                     proxies = None,
                     userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1',
                     debug = True):
            """
            save some information needed to login the session

            you'll have to provide 'loginTestString' which will be looked for in the
            response html to make sure you've properly been logged in

            'proxies' is of format { 'https' : 'https://user:pass@server:port', 'http' : ...
            'loginData' will be sent as post data (dictionary of id : value).
            'maxSessionTimeSeconds' will be used to determine when to re-login.
            """
            urlData = urlparse(loginUrl)

            self.proxies = proxies
            self.loginData = loginData
            self.loginUrl = loginUrl
            self.loginTestUrl = loginTestUrl
            self.maxSessionTime = maxSessionTimeSeconds
            self.sessionFile = urlData.netloc + sessionFileAppendix
            self.userAgent = userAgent
            self.loginTestString = loginTestString
            self.debug = debug

            self.login()

        def modification_date(self, filename):
            """
            return last file modification date as datetime object
            """
            t = os.path.getmtime(filename)
            return datetime.datetime.fromtimestamp(t)

        def login(self, forceLogin = False):
            """
            login to a session. Try to read last saved session from cache file. If this
            fails, do a proper login. If the last cache access was too old, also perform
            a proper login. Always updates the session cache file.
            """
            wasReadFromCache = False
            if self.debug:
                print('loading or generating session...')
            if os.path.exists(self.sessionFile) and not forceLogin:
                time = self.modification_date(self.sessionFile)

                # only load if the cache file is younger than maxSessionTime
                lastModification = (datetime.datetime.now() - time).seconds
                if lastModification < self.maxSessionTime:
                    with open(self.sessionFile, "rb") as f:
                        self.session = pickle.load(f)
                        wasReadFromCache = True
                        if self.debug:
                            print("loaded session from cache (last access %ds ago)"
                                  % lastModification)
            if not wasReadFromCache:
                self.session = requests.Session()
                self.session.headers.update({'user-agent' : self.userAgent})
                res = self.session.post(self.loginUrl, data = self.loginData,
                                        proxies = self.proxies)

                if self.debug:
                    print('created new session with login')
                self.saveSessionToCache()

            # test login
            res = self.session.get(self.loginTestUrl)
            if res.text.lower().find(self.loginTestString.lower()) < 0:
                raise Exception("could not log into provided site '%s'"
                                " (did not find successful login string)"
                                % self.loginUrl)

        def saveSessionToCache(self):
            """
            save session to a cache file
            """
            # always save (to update the timeout)
            with open(self.sessionFile, "wb") as f:
                pickle.dump(self.session, f)
                if self.debug:
                    print('updated session cache-file %s' % self.sessionFile)

        def retrieveContent(self, url, method = "get", postData = None):
            """
            return the content of the url with respect to the session.

            If 'method' is not 'get', the url will be called with 'postData'
            as a post request.
            """
            if method == 'get':
                res = self.session.get(url, proxies = self.proxies)
            else:
                res = self.session.post(url, data = postData, proxies = self.proxies)

            # the session may have been updated on the server, so also update it in the cache
            self.saveSessionToCache()

            return res

A code snippet using the above class might look like this:

    if __name__ == "__main__":
        # proxies = {'https' : 'https://user:pass@server:port',
        #            'http'  : 'http://user:pass@server:port'}

        loginData = {'user' : 'usr', 'password' : 'pwd'}
        loginUrl = 'https://...'
        loginTestUrl = 'https://...'
        successStr = 'Hello Tom'

        s = MyLoginSession(loginUrl, loginData, loginTestUrl, successStr,
                           #proxies = proxies
                           )
        res = s.retrieveContent('https://....')
        print(res.text)

After trying all of the answers above, I found that using a RequestsCookieJar instead of the regular CookieJar to handle the subsequent requests was what worked for me.

    import requests
    import json

    authUrl = 'https://whatever.com/login'

    # The subsequent url
    testUrl = 'https://whatever.com/someEndpoint'

    # Whatever you are posting
    login_data = {'formPosted':'1', 'login_email':'me@example.com', 'password':'pw'}

    # The auth token or any other data that we will receive from the authRequest.
    token = ''

    # Post the loginRequest
    loginRequest = requests.post(authUrl, login_data)
    print(loginRequest.text)

    # Save the request content to your variable. In this case I needed a field called token.
    token = str(json.loads(loginRequest.content)['token'])
    print(token)

    # Verify successful login
    print(loginRequest.status_code)

    # Create your RequestsCookieJar for your subsequent requests and add the cookie
    jar = requests.cookies.RequestsCookieJar()
    jar.set('LWSSO_COOKIE_KEY', token)

    # Execute your next request(s) with the RequestsCookieJar set
    r = requests.get(testUrl, cookies=jar)
    print(r.text)
    print(r.status_code)
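
If you have several follow-up requests to make, one variation (a sketch that reuses the hypothetical testUrl and the jar built above) is to attach the jar to a Session, so the cookie is sent automatically on every call:

    import requests

    session = requests.Session()
    session.cookies.update(jar)  # reuse the RequestsCookieJar built above

    # every request made through this session now carries the cookie
    r = session.get(testUrl)
    print(r.status_code)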

Snippet to retrieve JSON data that is password protected:

    import requests

    username = "my_user_name"
    password = "my_super_secret"
    url = "https://www.my_base_url.com"
    the_page_i_want = "/my_json_data_page"

    session = requests.Session()

    # retrieve cookie value
    resp = session.get(url + '/login')
    csrf_token = resp.cookies['csrftoken']

    # login, add referer
    resp = session.post(url + "/login",
                        data={
                            'username': username,
                            'password': password,
                            'csrfmiddlewaretoken': csrf_token,
                            'next': the_page_i_want,
                        },
                        headers=dict(Referer=url + "/login"))
    print(resp.json())