Using MultipartPostHandler to POST form-data with Python

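Problem: When POSTing data with Python's urllib2, all data is URL-encoded and sent as Content-Type: application/x-www-form-urlencoded. When uploading files, the Content-Type should instead be set to multipart/form-data and the contents MIME-encoded. A discussion of this problem is here: http://code.activestate.com/recipes/146306/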
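To make the default behaviour concrete, here is a minimal sketch of what URL encoding does to form fields. It uses Python 3's `urllib.parse` rather than the urllib2 from the question, and the field values are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical form fields, for illustration only.
fields = {"analysisType": "file", "notification": "email"}

# Body format used with application/x-www-form-urlencoded:
# key=value pairs joined by "&".
body = urlencode(fields)
print(body)

# Raw file bytes are percent-escaped byte by byte, which is why this
# format is a poor fit for file uploads.
escaped = urlencode({"executable": b"\x00\xff"})
print(escaped)
```

A multipart/form-data body instead carries each field as its own MIME part, so file bytes can be sent unescaped.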

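To get around this limitation, some sharp coders created a library called MultipartPostHandler. It provides a handler for building an OpenerDirector that works with urllib2 and POSTs with multipart/form-data more or less automatically. A copy of the library is here: http://peerit.blogspot.com/2007/07/multipartposthandler-doesnt-work-for.html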

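I am new to Python and am unable to get this library to work. I wrote essentially the following code. When I capture the request in a local HTTP proxy, I can see that the data is still URL-encoded rather than multipart MIME-encoded. Please help me figure out what I am doing wrong, or suggest a better way to get this done. Thanks :-)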

    FROM_ADDR = 'my@email.com'

    try:
        data = open(file, 'rb').read()
    except:
        print "Error: could not open file %s for reading" % file
        print "Check permissions on the file or folder it resides in"
        sys.exit(1)

    # Build the POST request
    url = "http://somedomain.com/?action=analyze"
    post_data = {}
    post_data['analysisType'] = 'file'
    post_data['executable'] = data
    post_data['notification'] = 'email'
    post_data['email'] = FROM_ADDR

    # MIME encode the POST payload
    opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)
    urllib2.install_opener(opener)
    request = urllib2.Request(url, post_data)
    request.set_proxy('127.0.0.1:8080', 'http')  # For testing with Burp Proxy

    # Make the request and capture the response
    try:
        response = urllib2.urlopen(request)
        print response.geturl()
    except urllib2.URLError, e:
        print "File upload failed..."

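Edit 1: Thanks for your responses. I am aware of the ActiveState httplib solution (I linked to it above). I would rather abstract away the problem and keep using urllib2 with a minimal amount of code. Any idea why the opener isn't being installed and used?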

It seems that the easiest and most compatible way to get around this problem is to use the "poster" module.

    # test_client.py
    from poster.encode import multipart_encode
    from poster.streaminghttp import register_openers
    import urllib2

    # Register the streaming http handlers with urllib2
    register_openers()

    # Start the multipart/form-data encoding of the file "DSC0001.jpg"
    # "image1" is the name of the parameter, which is normally set
    # via the "name" parameter of the HTML <input> tag.
    # The file is opened in binary mode so the image bytes are sent unchanged.

    # headers contains the necessary Content-Type and Content-Length
    # datagen is a generator object that yields the encoded parameters
    datagen, headers = multipart_encode({"image1": open("DSC0001.jpg", "rb")})

    # Create the Request object
    request = urllib2.Request("http://localhost:5000/upload_image", datagen, headers)

    # Actually do the request, and get the response
    print urllib2.urlopen(request).read()

This worked perfectly and I didn't have to muck around with httplib. The module is available here: http://atlee.ca/software/poster/index.html

I found this recipe for posting multipart data using httplib directly (no external libraries involved):

    import httplib
    import mimetypes

    def post_multipart(host, selector, fields, files):
        content_type, body = encode_multipart_formdata(fields, files)
        h = httplib.HTTP(host)
        h.putrequest('POST', selector)
        h.putheader('content-type', content_type)
        h.putheader('content-length', str(len(body)))
        h.endheaders()
        h.send(body)
        errcode, errmsg, headers = h.getreply()
        return h.file.read()

    def encode_multipart_formdata(fields, files):
        LIMIT = '----------lImIt_of_THE_fIle_eW_$'
        CRLF = '\r\n'
        L = []
        # Plain form fields: (name, value) pairs
        for (key, value) in fields:
            L.append('--' + LIMIT)
            L.append('Content-Disposition: form-data; name="%s"' % key)
            L.append('')
            L.append(value)
        # File fields: (name, filename, contents) triples
        for (key, filename, value) in files:
            L.append('--' + LIMIT)
            L.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (key, filename))
            L.append('Content-Type: %s' % get_content_type(filename))
            L.append('')
            L.append(value)
        L.append('--' + LIMIT + '--')
        L.append('')
        body = CRLF.join(L)
        content_type = 'multipart/form-data; boundary=%s' % LIMIT
        return content_type, body

    def get_content_type(filename):
        return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

Just use python-requests; it will set the proper headers and do the upload for you:

    import requests

    files = {"form_input_field_name": open("filename", "rb")}
    requests.post("http://httpbin.org/post", files=files)

I ran into the same problem: I needed to do a multipart form POST without using external libraries. I wrote a whole blog post about the issues I ran into.

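I ended up using a modified version of http://code.activestate.com/recipes/146306/. The code at that URL simply appends the file's contents as a string, which can cause problems with binary files. Here is my working code.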

    import codecs
    import httplib
    import io
    import json
    import mimetools
    import mimetypes
    import os
    import socket
    import urlparse

    # The request code is wrapped in a function here so that the `return`
    # is valid and `key` / `filepath` are defined; the function name is
    # illustrative only.
    def upload_file(key, filepath):
        form = MultiPartForm()
        form.add_field("form_field", "my awesome data")

        # Add a fake file
        form.add_file(key, os.path.basename(filepath),
                      fileHandle=codecs.open(filepath, "rb"))

        # Build the request
        url = "http://www.example.com/endpoint"
        schema, netloc, url, params, query, fragments = urlparse.urlparse(url)

        try:
            form_buffer = form.get_binary().getvalue()
            http = httplib.HTTPConnection(netloc)
            http.connect()
            http.putrequest("POST", url)
            http.putheader('Content-type', form.get_content_type())
            http.putheader('Content-length', str(len(form_buffer)))
            http.endheaders()
            http.send(form_buffer)
        except socket.error, e:
            raise SystemExit(1)

        r = http.getresponse()
        if r.status == 200:
            return json.loads(r.read())
        else:
            print('Upload failed (%s): %s' % (r.status, r.reason))

    class MultiPartForm(object):
        """Accumulate the data to be used when posting a form."""

        def __init__(self):
            self.form_fields = []
            self.files = []
            self.boundary = mimetools.choose_boundary()
            return

        def get_content_type(self):
            return 'multipart/form-data; boundary=%s' % self.boundary

        def add_field(self, name, value):
            """Add a simple field to the form data."""
            self.form_fields.append((name, value))
            return

        def add_file(self, fieldname, filename, fileHandle, mimetype=None):
            """Add a file to be uploaded."""
            body = fileHandle.read()
            if mimetype is None:
                mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
            self.files.append((fieldname, filename, mimetype, body))
            return

        def get_binary(self):
            """Return a binary buffer containing the form data, including attached files."""
            part_boundary = '--' + self.boundary

            binary = io.BytesIO()
            needsCLRF = False

            # Add the form fields
            for name, value in self.form_fields:
                if needsCLRF:
                    binary.write('\r\n')
                needsCLRF = True

                block = [part_boundary,
                         'Content-Disposition: form-data; name="%s"' % name,
                         '',
                         value
                         ]
                binary.write('\r\n'.join(block))

            # Add the files to upload
            for field_name, filename, content_type, body in self.files:
                if needsCLRF:
                    binary.write('\r\n')
                needsCLRF = True

                block = [part_boundary,
                         str('Content-Disposition: file; name="%s"; filename="%s"' %
                             (field_name, filename)),
                         'Content-Type: %s' % content_type,
                         ''
                         ]
                binary.write('\r\n'.join(block))
                binary.write('\r\n')
                binary.write(body)

            # Add the closing boundary marker
            binary.write('\r\n--' + self.boundary + '--\r\n')
            return binary
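The binary-file concern can be illustrated with a short, self-contained sketch. It is written for Python 3 (not the Python 2 used in this answer) and builds the body as bytes from start to finish, so the file contents are never treated as text. The function name `encode_multipart` and the sample field names are hypothetical, not part of any library.

```python
import io
import mimetypes
import uuid


def encode_multipart(fields, files, boundary=None):
    """Return (content_type, body), where body is a bytes object.

    fields: iterable of (name, text_value) pairs
    files:  iterable of (name, filename, raw_bytes) triples
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"
    buf = io.BytesIO()

    # Plain text fields are encoded to UTF-8 before being written.
    for name, value in fields:
        header = 'Content-Disposition: form-data; name="%s"' % name
        buf.write(delimiter + crlf)
        buf.write(header.encode("utf-8") + crlf + crlf)
        buf.write(value.encode("utf-8") + crlf)

    # File payloads are written as raw bytes and never decoded.
    for name, filename, payload in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
            name, filename)
        buf.write(delimiter + crlf)
        buf.write(header.encode("utf-8") + crlf)
        buf.write(("Content-Type: %s" % ctype).encode("ascii") + crlf + crlf)
        buf.write(payload + crlf)

    # Closing boundary marker.
    buf.write(delimiter + b"--" + crlf)
    return "multipart/form-data; boundary=%s" % boundary, buf.getvalue()


content_type, body = encode_multipart(
    [("notification", "email")],
    [("executable", "sample.bin", b"\x00\xff\x10")],
    boundary="XBOUNDARYX",  # fixed only to make the output predictable
)
print(content_type)
```

The resulting `content_type` and `body` would then be sent as the Content-Type header and the request body, with Content-Length set to `len(body)`.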

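I had exactly the same problem, and 2 years and 6 months ago I created this project: https://pypi.python.org/pypi/MultipartPostHandler2, which fixes MultipartPostHandler for UTF-8 systems. I also made some minor improvements; you are welcome to test it :)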

To answer the OP's question of why the original code didn't work: the handler passed in was not an instance of a class. The line

    # MIME encode the POST payload
    opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler)

should read

    opener = urllib2.build_opener(MultipartPostHandler.MultipartPostHandler())
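One way to check whether a handler actually ended up in an opener is to inspect the opener's `handlers` list. The sketch below assumes Python 3's `urllib.request` (the successor to urllib2) and uses a hypothetical `LoggingHandler` as a stand-in for MultipartPostHandler; it makes no network request.

```python
import urllib.request


class LoggingHandler(urllib.request.BaseHandler):
    """Hypothetical stand-in for MultipartPostHandler; it only records URLs."""

    # Sort ahead of the default HTTPHandler, as a pre-processing handler would.
    handler_order = urllib.request.HTTPHandler.handler_order - 10

    def __init__(self):
        self.seen = []

    def http_request(self, request):
        # Invoked by the opener before an http:// request is sent.
        self.seen.append(request.full_url)
        return request


# Pass an instance of the handler when building the opener.
opener = urllib.request.build_opener(LoggingHandler())

# If registration worked, the handler appears in the opener's handler list.
registered = [h for h in opener.handlers if isinstance(h, LoggingHandler)]
print(len(registered))
```

If the handler is listed but the request is still URL-encoded, the next thing to check would be what is passed as the request data.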