Python urllib2, basic HTTP authentication, and tr.im

I'm playing around, trying to write some code that uses the tr.im API to shorten a URL.

After reading http://docs.python.org/library/urllib2.html, I tried:

    TRIM_API_URL = 'http://api.tr.im/api'
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(realm='tr.im',
                              uri=TRIM_API_URL,
                              user=USERNAME,
                              passwd=PASSWORD)
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('%s/trim_simple?url=%s'
                               % (TRIM_API_URL, url_to_trim))
    url = response.read().strip()

response.code is 200 (I think it should be 202). url is valid, but the basic HTTP authentication doesn't seem to have worked, because the shortened URL isn't in my list of URLs (at http://tr.im/?page=1).

After reading http://www.voidspace.org.uk/python/articles/authentication.shtml#doing-it-properly I also tried:

    TRIM_API_URL = 'api.tr.im/api'
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, TRIM_API_URL, USERNAME, PASSWORD)
    auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    response = urllib2.urlopen('http://%s/trim_simple?url=%s'
                               % (TRIM_API_URL, url_to_trim))
    url = response.read().strip()

But I get the same results. (response.code is 200 and url is valid, but it isn't recorded in my account at http://tr.im/.)

If I use query string parameters instead of basic HTTP authentication, like this:

    TRIM_API_URL = 'http://api.tr.im/api'
    response = urllib2.urlopen('%s/trim_simple?url=%s&username=%s&password=%s'
                               % (TRIM_API_URL, url_to_trim, USERNAME, PASSWORD))
    url = response.read().strip()

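…not only is url valid, it is also recorded in my tr.im account. (Though response.code is still 200.)

There must be something wrong with my code, though (and not with tr.im's API), because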

 $ curl -u yacitus:xxxx http://api.tr.im/api/trim_url.json?url=http://www.google.co.uk 

…returns:

 {"trimpath":"hfhb","reference":"nH45bftZDWOX0QpVojeDbOvPDnaRaJ","trimmed":"11\/03\/2009","destination":"http:\/\/www.google.co.uk\/","trim_path":"hfhb","domain":"google.co.uk","url":"http:\/\/tr.im\/hfhb","visits":0,"status":{"result":"OK","code":"200","message":"tr.im URL Added."},"date_time":"2009-03-11T10:15:35-04:00"} 

…and the URL does appear in my list of URLs at http://tr.im/?page=1.

And if I run:

 $ curl -u yacitus:xxxx http://api.tr.im/api/trim_url.json?url=http://www.google.co.uk 

…again, I get:

 {"trimpath":"hfhb","reference":"nH45bftZDWOX0QpVojeDbOvPDnaRaJ","trimmed":"11\/03\/2009","destination":"http:\/\/www.google.co.uk\/","trim_path":"hfhb","domain":"google.co.uk","url":"http:\/\/tr.im\/hfhb","visits":0,"status":{"result":"OK","code":"201","message":"tr.im URL Already Created [yacitus]."},"date_time":"2009-03-11T10:15:35-04:00"} 

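Note that code is 201, and message is "tr.im URL Already Created [yacitus]."

I must not be doing the basic HTTP authentication correctly (in either attempt). Can you spot my problem? Perhaps I should look at what is being sent over the wire? I've never done that before. Are there Python APIs I could use (perhaps in pdb)? Or is there another tool (preferably for Mac OS X) I could use?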

This seems to work really well (taken from another thread):

    import urllib2, base64

    request = urllib2.Request("http://api.foursquare.com/v1/user")
    # username and password are your own credentials
    base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)
    result = urllib2.urlopen(request)
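For readers on Python 3, where `urllib2` became `urllib.request` and `base64.encodestring` no longer exists (it was removed in 3.9), the same idea might look like the sketch below. The helper name `make_basic_auth_request` and the example URL and credentials are illustrative assumptions, not part of the answer above; nothing is sent over the network until the request is passed to `urlopen`.

```python
import base64
import urllib.request


def make_basic_auth_request(url, username, password):
    """Build a Request that carries the Basic credentials up front.

    Hypothetical helper: the header is attached unconditionally, so it is
    sent even if the server never answers with a 401 challenge.
    """
    credentials = ('%s:%s' % (username, password)).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    request = urllib.request.Request(url)
    request.add_header('Authorization', 'Basic %s' % token)
    return request


# Placeholder URL and credentials, for illustration only.
request = make_basic_auth_request('http://api.tr.im/api/trim_simple',
                                  'user', 'passwd')
print(request.get_header('Authorization'))
# To actually send it: urllib.request.urlopen(request)
```

Unlike `encodestring`, `b64encode` does not insert newlines, so there is no need for the `.replace('\n', '')` step.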

A really cheap solution:

 urllib.urlopen('http://user:xxxx@api.tr.im/api') 

(You may decide this isn't suitable for a number of reasons, such as the security of the URL.)
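One reason the URL form is risky is that the credentials travel wherever the URL does (logs, tracebacks, shell history). As a rough illustration, a URL can be stripped of its user-info part before it is logged; `redact_credentials` is a hypothetical helper written for this sketch, and it assumes an ordinary host name (IPv6 literals would need their brackets restored).

```python
from urllib.parse import urlsplit, urlunsplit


def redact_credentials(url):
    """Return url without any 'user:password@' part (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ''
    if parts.port is not None:
        host = '%s:%d' % (host, parts.port)
    return urlunsplit((parts.scheme, host, parts.path,
                       parts.query, parts.fragment))


secret_url = 'http://user:xxxx@api.tr.im/api'
parts = urlsplit(secret_url)
print(parts.username, parts.password)   # the credentials are plainly readable
print(redact_credentials(secret_url))   # safe to log
```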

GitHub API example:

    >>> import urllib, json
    >>> result = urllib.urlopen('https://personal-access-token:x-oauth-basic@api.github.com/repos/:owner/:repo')
    >>> r = json.load(result.fp)
    >>> result.close()

Take a look at the answers to this SO post, and also at this basic authentication tutorial from the urllib2 Missing Manual.

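For urllib2 basic authentication to work, the HTTP response must contain the HTTP code 401 Unauthorized and a "WWW-Authenticate" header with the value "Basic"; otherwise Python won't send your login info. In that case you will need to either use Requests, use urllib.urlopen(url) with your login in the URL, or add the header yourself as in @Flowpoke's answer.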
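This behaviour can be reproduced without tr.im. The sketch below uses Python 3's `urllib.request` (the successor to `urllib2`) against a throwaway local server that, by assumption, behaves like the API in the question: it answers 200 and never issues a 401 challenge. The class `NoChallengeHandler` and the function `fetch_with_basic_auth_handler` are names made up for this illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoChallengeHandler(BaseHTTPRequestHandler):
    """Hypothetical server: always 200, never a 401 challenge."""
    seen_auth = []  # Authorization header of each request received

    def do_GET(self):
        NoChallengeHandler.seen_auth.append(self.headers.get('Authorization'))
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_with_basic_auth_handler(url, user, password):
    """Same setup as in the question, ported to urllib.request."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore proxy environment variables
        urllib.request.HTTPBasicAuthHandler(mgr))
    with opener.open(url) as response:
        return response.status


server = HTTPServer(('127.0.0.1', 0), NoChallengeHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port
status = fetch_with_basic_auth_handler(url, 'user', 'passwd')
server.shutdown()
server.server_close()

print(status)                        # the request succeeds...
print(NoChallengeHandler.seen_auth)  # ...but no credentials were ever sent
```

The server records `None` for the Authorization header: the handler holds the credentials but only uses them to retry after a 401, which never arrives here.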

You can see what error you get by putting the urlopen call in a try block:

    try:
        urllib2.urlopen(urllib2.Request(url))
    except urllib2.HTTPError, e:
        print e.headers
        print e.headers.has_key('WWW-Authenticate')
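To look at what actually goes over the wire from within Python, one option is to turn on the HTTP handler's debug output. The sketch below uses Python 3's `urllib.request`; `build_debug_opener` is a name invented for this example, and the request itself is left commented out so that nothing is sent.

```python
import urllib.request


def build_debug_opener():
    """Return an opener whose HTTP(S) connections print their traffic.

    With debuglevel=1 the underlying http.client connection prints the
    request line, the request headers and the response headers to stdout.
    """
    handlers = [urllib.request.HTTPHandler(debuglevel=1)]
    if hasattr(urllib.request, 'HTTPSHandler'):  # absent if ssl is unavailable
        handlers.append(urllib.request.HTTPSHandler(debuglevel=1))
    return urllib.request.build_opener(*handlers)


opener = build_debug_opener()
# opener.open('http://api.tr.im/api/trim_simple?url=...')  # would print the exchange
```

If no `Authorization:` line shows up in the printed request headers, the credentials were not sent.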

The recommended way is to use the requests module:

    #!/usr/bin/env python
    import requests # $ python -m pip install requests
    ####from pip._vendor import requests # bundled with python

    url = 'https://httpbin.org/hidden-basic-auth/user/passwd'
    user, password = 'user', 'passwd'

    r = requests.get(url, auth=(user, password)) # send auth unconditionally
    r.raise_for_status() # raise an exception if the authentication fails

Here's a single-source Python 2/3 compatible urllib2-based variant:

    #!/usr/bin/env python
    import base64
    try:
        from urllib.request import Request, urlopen
    except ImportError: # Python 2
        from urllib2 import Request, urlopen

    # url, user and password are defined as in the requests example
    credentials = '{user}:{password}'.format(**vars()).encode()
    urlopen(Request(url, headers={'Authorization': # send auth unconditionally
        b'Basic ' + base64.b64encode(credentials)})).close()

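Python 3.5+ introduces HTTPPasswordMgrWithPriorAuth(), which allows:

..to eliminate unnecessary 401 response handling, or to unconditionally send credentials on the first request in order to communicate with servers that return a 404 response instead of a 401 if the Authorization header is not sent.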

    #!/usr/bin/env python3
    import urllib.request as urllib2

    # url, user and password are defined as in the requests example
    password_manager = urllib2.HTTPPasswordMgrWithPriorAuth()
    password_manager.add_password(None, url, user, password,
                                  is_authenticated=True) # to handle 404 variant
    auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
    opener = urllib2.build_opener(auth_manager)

    opener.open(url).close()

In this case it is easy to replace HTTPBasicAuthHandler() with ProxyBasicAuthHandler() if necessary.
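The difference between the two password managers can be checked offline (Python 3.5+) by calling the handler's `http_request` hook directly, which is the step the opener runs on a request before it is sent. This is only a sketch: the URL is a placeholder that is never contacted, and `header_before_first_response` is a name made up for the example.

```python
import urllib.request

url = 'http://api.example.invalid/api'  # placeholder, never contacted
user, password = 'user', 'passwd'       # placeholder credentials

prior = urllib.request.HTTPPasswordMgrWithPriorAuth()
prior.add_password(None, url, user, password, is_authenticated=True)

plain = urllib.request.HTTPPasswordMgrWithDefaultRealm()
plain.add_password(None, url, user, password)


def header_before_first_response(manager):
    """Return the Authorization header attached before any 401 is seen."""
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    request = handler.http_request(urllib.request.Request(url))
    return request.get_header('Authorization')


prior_header = header_before_first_response(prior)
plain_header = header_before_first_response(plain)
print(prior_header)  # a 'Basic ...' value: credentials go out on the first request
print(plain_header)  # None: credentials are held back until a 401 arrives
```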

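The same solutions as for the "Python urllib2 basic authentication problem" question apply.

See https://stackoverflow.com/a/24048852/1733117; you can subclass urllib2.HTTPBasicAuthHandler to add the Authorization header to each request that matches a known URL.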

    import base64
    import urllib2

    class PreemptiveBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
        '''Preemptive basic auth.

        Instead of waiting for a 401 to then retry with the credentials,
        send the credentials if the url is handled by the password manager.
        Note: please use realm=None when calling add_password.'''
        def http_request(self, req):
            url = req.get_full_url()
            realm = None
            # this is very similar to the code from retry_http_basic_auth()
            # but returns a request object.
            user, pw = self.passwd.find_user_password(realm, url)
            if pw:
                raw = "%s:%s" % (user, pw)
                auth = 'Basic %s' % base64.b64encode(raw).strip()
                req.add_unredirected_header(self.auth_header, auth)
            return req

        https_request = http_request

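I would suggest that the current solution is to use my package urllib2_prior_auth, which solves this pretty nicely (I am working on getting it included in the standard library).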

Try python-requests or python-grab.