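Windows cmd encoding change causes Python crash

First, I change the Windows CMD encoding to UTF-8 and run the Python interpreter:

    chcp 65001
    python

Then I try to print a unicode string inside it, and when I do this Python crashes in a peculiar way (I just get a cmd prompt in the same window).

    >>> import sys
    >>> print u'ëèæîð'.encode(sys.stdin.encoding)

Any ideas why this happens and how to make it work?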

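UPD: sys.stdin.encoding returns 'cp65001'.

UPD2: It seems to me that the issue might be connected with the fact that UTF-8 is a multi-byte character set (kcwu made a good point on that). I tried running the whole example with 'windows-1250' and got 'ëeaî?'. Windows-1250 is a single-byte character set, so it worked for those characters it understands. However, I still have no idea how to make 'utf-8' work here.

UPD3: Oh, I found out that this is a known Python bug. I guess what happens is that Python copies the cmd encoding as 'cp65001' into sys.stdin.encoding and tries to apply it to all input. Since it does not understand 'cp65001', it crashes on any input that contains non-ASCII characters.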
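To make the multi-byte point from UPD2 concrete, here is a minimal sketch. It is written so that it also runs on Python 3, unlike the Python 2 session above, and the variable names are only illustrative. It compares how many bytes the same text needs in a single-byte code page and in UTF-8. The errors='replace' argument is an assumption chosen to mimic a lossy conversion; it is not a claim about what cmd does internally.

```python
# Compare how many bytes the same five characters need in each encoding.
text = u'\xeb\xe8\xe6\xee\xf0'  # the string 'ëèæîð' from the question

# windows-1250 is a single-byte code page: one byte per character.
# Characters it lacks become '?' because of errors='replace'.
single_byte = text.encode('windows-1250', 'replace')

# UTF-8 needs two bytes for each of these characters.
multi_byte = text.encode('utf-8')

print(len(text), len(single_byte), len(multi_byte))
```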

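Here is how to alias cp65001 to UTF-8 without changing encodings\aliases.py:

    import codecs
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

(IMHO, don't pay any attention to the claim at http://bugs.python.org/issue6058#msg97731 that cp65001 is not identical to UTF-8. It is intended to be the same, even if Microsoft's codec has some minor bugs.)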
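The same registration technique can be tried in isolation. The sketch below uses a made-up alias name, cpdemo65001, so that it behaves the same on interpreters that already know cp65001; both the alias and the function name alias_search are assumptions for illustration only.

```python
import codecs

ALIAS = 'cpdemo65001'  # hypothetical alias name, used only for this demo


def alias_search(name):
    # The codec machinery passes a normalized, lower-cased name.
    # Returning None tells it to keep trying other search functions.
    if name == ALIAS:
        return codecs.lookup('utf-8')
    return None


codecs.register(alias_search)

sample = u'\xeb\xe8\xe6\xee\xf0'  # 'ëèæîð'
encoded = sample.encode(ALIAS)
print(encoded == sample.encode('utf-8'))
```

A named function is used instead of a lambda only so the comments have somewhere to live; the behaviour is meant to match the one-liner above.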

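Here is some code (written for Tahoe-LAFS, tahoe-lafs.org) that makes console output work regardless of the chcp code page, and also reads Unicode command-line arguments. Credit to Michael Kaplan for the idea behind this solution. If stdout or stderr is redirected, it will output UTF-8. If you want a byte order mark, you need to write it explicitly.

[Edit: This version uses WriteConsoleW instead of the _O_U8TEXT flag in the MSVC runtime library, which is buggy. WriteConsoleW is also buggy relative to the MS documentation, but less so.]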

    import sys
    if sys.platform == "win32":
        import codecs
        from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
        from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

        original_stderr = sys.stderr

        # If any exception occurs in this code, we'll probably try to print it on stderr,
        # which makes for frustrating debugging if stderr is directed to our wrapper.
        # So be paranoid about catching errors and reporting them to original_stderr,
        # so that we can at least see them.
        def _complain(message):
            print >>original_stderr, message if isinstance(message, str) else repr(message)

        # Work around <http://bugs.python.org/issue6058>.
        codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

        # Make Unicode console output work independently of the current code page.
        # This also fixes <http://bugs.python.org/issue1602>.
        # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
        # and TZOmegaTZIOY
        # <http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
        try:
            # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
            # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
            # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
            #
            # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
            # DWORD WINAPI GetFileType(DWORD hFile);
            #
            # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
            # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);

            GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
            STD_OUTPUT_HANDLE = DWORD(-11)
            STD_ERROR_HANDLE = DWORD(-12)
            GetFileType = WINFUNCTYPE(DWORD, DWORD)(("GetFileType", windll.kernel32))
            FILE_TYPE_CHAR = 0x0002
            FILE_TYPE_REMOTE = 0x8000
            GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
            INVALID_HANDLE_VALUE = DWORD(-1).value

            def not_a_console(handle):
                if handle == INVALID_HANDLE_VALUE or handle is None:
                    return True
                return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                        or GetConsoleMode(handle, byref(DWORD())) == 0)

            old_stdout_fileno = None
            old_stderr_fileno = None
            if hasattr(sys.stdout, 'fileno'):
                old_stdout_fileno = sys.stdout.fileno()
            if hasattr(sys.stderr, 'fileno'):
                old_stderr_fileno = sys.stderr.fileno()

            STDOUT_FILENO = 1
            STDERR_FILENO = 2
            real_stdout = (old_stdout_fileno == STDOUT_FILENO)
            real_stderr = (old_stderr_fileno == STDERR_FILENO)

            if real_stdout:
                hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
                if not_a_console(hStdout):
                    real_stdout = False

            if real_stderr:
                hStderr = GetStdHandle(STD_ERROR_HANDLE)
                if not_a_console(hStderr):
                    real_stderr = False

            if real_stdout or real_stderr:
                # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
                #                           LPDWORD lpCharsWritten, LPVOID lpReserved);

                WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

                class UnicodeOutput:
                    def __init__(self, hConsole, stream, fileno, name):
                        self._hConsole = hConsole
                        self._stream = stream
                        self._fileno = fileno
                        self.closed = False
                        self.softspace = False
                        self.mode = 'w'
                        self.encoding = 'utf-8'
                        self.name = name
                        self.flush()

                    def isatty(self):
                        return False

                    def close(self):
                        # don't really close the handle, that would only cause problems
                        self.closed = True

                    def fileno(self):
                        return self._fileno

                    def flush(self):
                        if self._hConsole is None:
                            try:
                                self._stream.flush()
                            except Exception as e:
                                _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                                raise

                    def write(self, text):
                        try:
                            if self._hConsole is None:
                                if isinstance(text, unicode):
                                    text = text.encode('utf-8')
                                self._stream.write(text)
                            else:
                                if not isinstance(text, unicode):
                                    text = str(text).decode('utf-8')
                                remaining = len(text)
                                while remaining:
                                    n = DWORD(0)
                                    # There is a shorter-than-documented limitation on the
                                    # length of the string passed to WriteConsoleW (see
                                    # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>.
                                    retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                    if retval == 0 or n.value == 0:
                                        raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                    remaining -= n.value
                                    if not remaining:
                                        break
                                    text = text[n.value:]
                        except Exception as e:
                            _complain("%s.write: %r" % (self.name, e))
                            raise

                    def writelines(self, lines):
                        try:
                            for line in lines:
                                self.write(line)
                        except Exception as e:
                            _complain("%s.writelines: %r" % (self.name, e))
                            raise

                if real_stdout:
                    sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
                else:
                    sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

                if real_stderr:
                    sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
                else:
                    sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
        except Exception as e:
            _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))

        # While we're at it, let's unmangle the command-line arguments:

        # This works around <http://bugs.python.org/issue2128>.
        GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
        CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

        argc = c_int(0)
        argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

        argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

        if not hasattr(sys, 'frozen'):
            # If this is an executable produced by py2exe or bbfreeze, then it will
            # have been invoked directly. Otherwise, unicode_argv[0] is the Python
            # interpreter, so skip that.
            argv = argv[1:]

            # Also skip option arguments to the Python interpreter.
            while len(argv) > 0:
                arg = argv[0]
                if not arg.startswith(u"-") or arg == u"-":
                    break
                argv = argv[1:]
                if arg == u'-m':
                    # sys.argv[0] should really be the absolute path of the module source,
                    # but never mind
                    break
                if arg == u'-c':
                    argv[0] = u'-c'
                    break

        # if you like:
        sys.argv = argv
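The write loop in UnicodeOutput.write is the part that is easiest to get wrong, so here is a platform-independent sketch of just that idea: offer a bounded chunk, see how much was accepted, and retry with the remainder. The names write_in_chunks, write_some and fake_writer are hypothetical; write_some stands in for WriteConsoleW and is assumed to return the number of characters it wrote. Unlike the code above, this sketch slices the chunk before each call instead of passing a length.

```python
def write_in_chunks(text, write_some, max_chunk=10000):
    """Write all of `text`, never offering more than max_chunk characters at once.

    `write_some(chunk)` is a hypothetical stand-in for WriteConsoleW:
    it returns how many characters it actually wrote.
    """
    while text:
        written = write_some(text[:max_chunk])
        if written <= 0:
            # Mirror the IOError raised above when nothing was written.
            raise IOError("write_some returned %r" % (written,))
        # Drop only what was really written and retry with the rest.
        text = text[written:]


# Demo with a fake writer that accepts at most 3 characters per call.
received = []


def fake_writer(chunk):
    piece = chunk[:3]
    received.append(piece)
    return len(piece)


write_in_chunks(u'abcdefgh', fake_writer, max_chunk=5)
print(received)
```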

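Finally, it is possible to grant the wish to use DejaVu Sans Mono for the console, which I agree is an excellent font.

You can find information on the font requirements, and on how to add new fonts for the Windows console, in the Microsoft KB article "Necessary criteria for fonts to be available in a command window".

But basically, on Vista (and probably also Win7):

  • under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont, set "0" to "DejaVu Sans Mono";
  • for each of the subkeys under HKEY_CURRENT_USER\Console, set "FaceName" to "DejaVu Sans Mono".

On XP, check the thread "Changing Command Prompt fonts?" in the LockerGnome forums.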
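The two Vista/Win7 registry changes listed above could be captured as a .reg file. The sketch below only builds the text of such a file and does not touch the registry. The function name build_reg_file is hypothetical, and the list of HKEY_CURRENT_USER\Console subkeys has to be supplied by the caller because it differs per machine; 'ExampleSubkey' is a placeholder, not a real key name.

```python
def build_reg_file(font_name, console_subkeys=()):
    """Return .reg file text for the two console font settings described above."""
    lines = ['Windows Registry Editor Version 5.00', '']

    # Register the font as available to the console.
    lines.append(r'[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT'
                 r'\CurrentVersion\Console\TrueTypeFont]')
    lines.append('"0"="%s"' % font_name)
    lines.append('')

    # Select the font in each per-console subkey named by the caller.
    for subkey in console_subkeys:
        lines.append('[HKEY_CURRENT_USER\\Console\\%s]' % subkey)
        lines.append('"FaceName"="%s"' % font_name)
        lines.append('')

    return '\r\n'.join(lines)


reg_text = build_reg_file('DejaVu Sans Mono', ['ExampleSubkey'])
print(reg_text)
```

Review the generated text before importing it, and export the existing keys first so the change can be undone.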

Set the PYTHONIOENCODING environment variable:

    > chcp 65001
    > set PYTHONIOENCODING=utf-8
    > python example.py
    Encoding is utf-8

The source of example.py is very simple:

    import sys
    print "Encoding is", sys.stdin.encoding
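The effect of the variable can also be checked without changing the current shell. This is a minimal sketch, written for Python 3, that starts a child interpreter with PYTHONIOENCODING set only in the child's environment and asks it which encoding it chose for stdout; the variable names are illustrative.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child only.
child_env = dict(os.environ, PYTHONIOENCODING='utf-8')

# Ask a child interpreter which encoding it picked for stdout.
output = subprocess.check_output(
    [sys.executable, '-c',
     'import sys; sys.stdout.write(str(sys.stdout.encoding))'],
    env=child_env,
)
reported = output.decode('ascii').strip()
print("Child stdout encoding is", reported)
```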

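I had this annoying issue too, and I hated not being able to run my unicode-aware scripts in MS Windows the same way as in Linux. So I managed to come up with a workaround.

Take this script (say, uniconsole.py in your site-packages or wherever):

    import sys, os
    if sys.platform == "win32":
        class UniStream(object):
            __slots__= ("fileno", "softspace",)

            def __init__(self, fileobject):
                self.fileno = fileobject.fileno()
                self.softspace = False

            def write(self, text):
                os.write(self.fileno, text.encode("utf_8") if isinstance(text, unicode) else text)

        sys.stdout = UniStream(sys.stdout)
        sys.stderr = UniStream(sys.stderr)

This seems to work around the Python bug (or the win32 unicode console bug, whichever it is). Then I added the following to all related scripts:

    try:
        import uniconsole
    except ImportError:
        sys.exc_clear()  # could be just pass, of course
    else:
        del uniconsole  # reduce pollution, not needed anymore

Finally, I just run my scripts as needed in a console where chcp 65001 has been run and the font is Lucida Console. (How I wish DejaVu Sans Mono could be used instead… but hacking the registry and selecting it as the console font reverts to a bitmap font.)

This is a quick-and-dirty stdout and stderr replacement, and it does not handle any raw_input-related bugs (obviously, since it doesn't touch sys.stdin at all). By the way, I have also added the cp65001 alias for utf_8 in the encodings\aliases.py file of the standard library.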
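The core of the UniStream workaround, encoding to UTF-8 and writing straight to the file descriptor, can be exercised without a Windows console. The sketch below is a variant written for Python 3 and demonstrated on a pipe; the class name Utf8FdWriter and the retry loop around os.write are additions for illustration, not part of the script above.

```python
import os


class Utf8FdWriter(object):
    """Minimal write-only stream that sends UTF-8 bytes to a file descriptor."""

    def __init__(self, fd):
        self.fd = fd

    def write(self, text):
        # Encode text; pass byte strings through unchanged.
        data = text if isinstance(text, bytes) else text.encode('utf-8')
        while data:
            # os.write may accept fewer bytes than offered.
            sent = os.write(self.fd, data)
            data = data[sent:]

    def flush(self):
        pass  # nothing is buffered here


# Demo on a pipe instead of the real console.
read_fd, write_fd = os.pipe()
writer = Utf8FdWriter(write_fd)
writer.write(u'\xeb\xe8')  # 'ëè'
os.close(write_fd)
captured = os.read(read_fd, 100)
os.close(read_fd)
print(captured)
```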

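This is because the "code page" of cmd is different from the "mbcs" of the system. Although you changed the "code page", Python (actually, Windows) still thinks your "mbcs" has not changed.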

A few comments: you probably misspelled encoding as encodig. Here is my run of your example.

    C:\>chcp 65001
    Active code page: 65001

    C:\>\python25\python
    ...
    >>> import sys
    >>> sys.stdin.encoding
    'cp65001'
    >>> s=u'\u0065\u0066'
    >>> s
    u'ef'
    >>> s.encode(sys.stdin.encoding)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    LookupError: unknown encoding: cp65001
    >>>

Conclusion: cp65001 is not a known encoding for Python. Try 'UTF-16' or something similar.
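One way to avoid the LookupError shown in that session is to check the encoding name before using it and fall back to a known one. This is only a sketch: the helper name encode_for_console, the choice of UTF-8 as the fallback, and the made-up codec name in the demo are assumptions, not something the answers above prescribe.

```python
import codecs


def encode_for_console(text, encoding, fallback='utf-8'):
    """Encode text with `encoding`, or with `fallback` if Python cannot find it."""
    try:
        codecs.lookup(encoding)
    except (LookupError, TypeError):
        # LookupError: unknown codec name.
        # TypeError: encoding is None, which can happen for some streams.
        encoding = fallback
    return text.encode(encoding)


# 'no_such_codec_demo' is a deliberately unknown, hypothetical name.
result = encode_for_console(u'\xeb', 'no_such_codec_demo')
print(result)
```

In the session above, the call would look like encode_for_console(s, sys.stdin.encoding).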

Do you want Python to encode to UTF-8?

    >>> print u'ëèæîð'.encode('utf-8')
    ëèæîð

Python does not recognize cp65001 as UTF-8.