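Python: Get HTTP headers from a urllib2.urlopen call?

Does urllib2 fetch the whole page when a urlopen call is made?

I'd like to read just the HTTP response headers without getting the page. It looks like urllib2 opens the HTTP connection and then fetches the actual HTML page... or does it only start buffering the page with the urlopen call?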

    import urllib2

    myurl = 'http://www.kidsidebyside.org/2009/05/come-and-draw-the-circle-of-unity-with-us/'
    page = urllib2.urlopen(myurl)  # open connection, get headers
    html = page.readlines()        # stream page

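Use the response.info() method to get the headers.

From the urllib2 docs:

urllib2.urlopen(url[, data][, timeout])

…

This function returns a file-like object with two additional methods:

geturl() - return the URL of the resource retrieved, commonly used to determine if a redirect was followed

info() - return the meta-information of the page, such as headers, in the form of an httplib.HTTPMessage instance (see Quick Reference to HTTP Headers)

So, for your example, try stepping through the result of response.info().headers to find what you're looking for.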
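For readers on Python 3, where urllib2 was merged into urllib.request, the same idea can be sketched roughly as follows. The helper name `fetch_headers` is made up for illustration and is not part of any library. Note that urlopen still sends a GET here, so the server still sends the body; the sketch simply never reads it.

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Return the response headers of `url` as a dict with lower-cased names.

    Hypothetical helper for illustration only. urlopen() issues a GET
    request; the body is left unread and the response is closed on exit.
    Headers that occur more than once (e.g. Set-Cookie) collapse to one
    entry in the returned dict.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # info() returns a message object holding the response headers.
        message = response.info()
        return {name.lower(): value for name, value in message.items()}
```

Calling `fetch_headers(myurl)` with the URL from the question would then give a plain dict to look up, for example, `content-type`.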

Note that the major caveat to using httplib.HTTPMessage is documented in Python issue 4773.

What about sending a HEAD request instead of a normal GET request? The following snippet (copied from a similar question) does exactly that.

    >>> import httplib
    >>> conn = httplib.HTTPConnection("www.google.com")
    >>> conn.request("HEAD", "/index.html")
    >>> res = conn.getresponse()
    >>> print res.status, res.reason
    200 OK
    >>> print res.getheaders()
    [('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]
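On Python 3 the httplib module is named http.client. A minimal sketch of the same HEAD request, wrapped in a hypothetical helper called `head` (the name, the default port and the timeout are assumptions of this example), might look like this:

```python
import http.client


def head(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return (status, reason, headers).

    Hypothetical helper for illustration; redirects are not followed.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        # getheaders() returns a list of (name, value) tuples.
        return res.status, res.reason, res.getheaders()
    finally:
        conn.close()
```

For example, `head("www.google.com", "/index.html")` corresponds to the interactive session above.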

Actually, it appears that urllib2 can do an HTTP HEAD request.

The question that @reto linked to, above, shows how to get urllib2 to do a HEAD request.

Here's my take on it:

    import urllib2

    # Derive from Request class and override get_method to allow a HEAD request.
    class HeadRequest(urllib2.Request):
        def get_method(self):
            return "HEAD"

    myurl = 'http://bit.ly/doFeT'
    request = HeadRequest(myurl)

    try:
        response = urllib2.urlopen(request)
        response_headers = response.info()

        # This will just display all the dictionary key-value pairs. Replace this
        # line with something useful.
        print(response_headers.dict)
    except urllib2.HTTPError as e:
        # Prints the HTTP status code of the response, but only if there was a
        # problem.
        print("Error code: %s" % e.code)

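If you check this with something like the Wireshark network protocol analyzer, you can see that it is actually sending out a HEAD request, rather than a GET.

This is the HTTP request and response from the code above, as captured by Wireshark: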

HEAD /doFeT HTTP/1.1
Accept-Encoding: identity
Host: bit.ly
Connection: close
User-Agent: Python-urllib/2.7

HTTP/1.1 301 Moved
Server: nginx
Date: Sun, 19 Feb 2012 13:20:56 GMT
Content-Type: text/html; charset=utf-8
Cache-control: private; max-age=90
Location: http://www.kidsidebyside.org/?p=445
MIME-Version: 1.0
Content-Length: 127
Connection: close
Set-Cookie: _bit=4f40f738-00153-02ed0-421cf10a; domain=.bit.ly; expires=Fri Aug 17 13:20:56 2012; path=/; HttpOnly

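However, as mentioned in one of the comments on the other question, if the URL in question involves a redirect, then urllib2 will do a GET request to the destination, not a HEAD. This could be a major shortcoming if you really want to make only HEAD requests.

The request above involves a redirect. Here is the request to the destination, as captured by Wireshark: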

GET /2009/05/come-and-draw-the-circle-of-unity-with-us/ HTTP/1.1
Accept-Encoding: identity
Host: www.kidsidebyside.org
Connection: close
User-Agent: Python-urllib/2.7
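One possible workaround for the GET-on-redirect behaviour shown in this capture is a redirect handler that copies the HEAD method onto the follow-up request. The sketch below targets Python 3's urllib.request rather than urllib2, and the names `KeepHeadRedirectHandler` and `head_following_redirects` are invented for this example. If the standard handler in your Python version already keeps the method, the override simply has no effect.

```python
import urllib.request


class KeepHeadRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects, reusing HEAD when the original request was HEAD."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the standard handler build the follow-up request first.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            # Request.get_method() honours the `method` attribute.
            new_req.method = "HEAD"
        return new_req


def head_following_redirects(url, timeout=10):
    """Hypothetical helper: return (final_url, headers) using HEAD only."""
    opener = urllib.request.build_opener(KeepHeadRedirectHandler)
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request, timeout=timeout) as response:
        return response.geturl(), dict(response.info().items())
```

Passing the handler class to `build_opener` replaces the default redirect handler, so every hop of the redirect chain goes through `redirect_request` above.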

An alternative to using urllib2 is Joe Gregorio's httplib2 library:

    import httplib2

    url = "http://bit.ly/doFeT"
    http_interface = httplib2.Http()

    try:
        response, content = http_interface.request(url, method="HEAD")
        print("Response status: %d - %s" % (response.status, response.reason))

        # This will just display all the dictionary key-value pairs. Replace this
        # line with something useful.
        print(response.__dict__)
    except httplib2.ServerNotFoundError as e:
        print(e.message)

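This has the advantage of using HEAD requests for both the initial HTTP request and the redirected request to the destination URL.

Here's the first request: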

HEAD /doFeT HTTP/1.1
Host: bit.ly
accept-encoding: gzip, deflate
user-agent: Python-httplib2/0.7.2 (gzip)

Here's the second request, to the destination:

HEAD /2009/05/come-and-draw-the-circle-of-unity-with-us/ HTTP/1.1
Host: www.kidsidebyside.org
accept-encoding: gzip, deflate
user-agent: Python-httplib2/0.7.2 (gzip)

urllib2.urlopen does an HTTP GET (or a POST if you supply a data argument), not an HTTP HEAD. If it did a HEAD, you of course couldn't call readlines or otherwise access the page body.
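The method selection described here can be checked without sending anything over the network. The following is a small sketch using Python 3's urllib.request, the successor of urllib2; the helper `method_for` and the example.com URL are placeholders invented for this example.

```python
import urllib.request

URL = "http://www.example.com/"  # placeholder; no request is sent


def method_for(url, data=None, method=None):
    """Return the HTTP method a Request built from these arguments would use."""
    return urllib.request.Request(url, data=data, method=method).get_method()


print(method_for(URL))                     # GET
print(method_for(URL, data=b"key=value"))  # POST
print(method_for(URL, method="HEAD"))      # HEAD
```

The explicit `method` argument is the Python 3 counterpart of overriding get_method in a Request subclass.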

One-liner:

 $ python -c "import urllib2; print urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1)).open(urllib2.Request('http://google.com'))"

    def _GetHtmlPage(self, addr):
        headers = {'User-Agent': self.userAgent, 'Cookie': self.cookies}
        # Pass the headers to the Request so they are actually sent.
        req = urllib2.Request(addr, headers=headers)
        response = urllib2.urlopen(req)
        print "ResponseInfo="
        print response.info()
        resultsHtml = unicode(response.read(), self.encoding)
        return resultsHtml
