Any gotchas using unicode_literals in Python 2.6?

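Our code base already runs under Python 2.6. To prepare for Python 3.0, we have started adding:

    from __future__ import unicode_literals

to our .py files (as we modify them). I'm wondering whether anyone else has been doing this and has run into any non-obvious gotchas (perhaps after spending a lot of time debugging).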

The main source of problems I've had working with unicode strings is mixing utf-8 encoded strings with unicode ones.

For example, consider the following scripts.

two.py

    # encoding: utf-8
    name = 'helló wörld from two'

one.py

    # encoding: utf-8
    from __future__ import unicode_literals
    import two
    name = 'helló wörld from one'
    print name + two.name

The output of running python one.py is:

    Traceback (most recent call last):
      File "one.py", line 5, in <module>
        print name + two.name
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

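In this example, two.name is a utf-8 encoded string (not unicode) because two.py did not import unicode_literals, while one.name is a unicode string. When you mix the two, Python tries to decode the encoded string (assuming it is ASCII) in order to convert it to unicode, and fails. It works if you do print name + two.name.decode('utf-8').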
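The implicit conversion described above can be spelled out by hand. This is only a sketch: the byte string below stands in for two.name, the u'' prefix stands in for what unicode_literals would produce, and the helper name implicit_ascii_decode_fails is made up for illustration.

```python
# UTF-8 bytes for 'hello' with an accented o, standing in for two.name.
encoded = b'hell\xc3\xb3'


def implicit_ascii_decode_fails(data):
    """Return True if decoding `data` as ASCII raises UnicodeDecodeError.

    This mirrors the decode Python 2 attempts on its own when a byte
    string is combined with a unicode string.
    """
    try:
        data.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


# Decoding explicitly with the codec the bytes were written in succeeds.
combined = u'one: ' + encoded.decode('utf-8')
```

Pure ASCII byte strings pass through the implicit decode unnoticed, which is probably why this kind of bug tends to surface only once non-ASCII data shows up.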

The same thing can happen if you encode a string and try to mix it later. For example, this works:

    # encoding: utf-8
    html = '<html><body>helló wörld</body></html>'
    if isinstance(html, unicode):
        html = html.encode('utf-8')
    print 'DEBUG: %s' % html

Output:

    DEBUG: <html><body>helló wörld</body></html>

But after adding the unicode_literals import, it does not:

    # encoding: utf-8
    from __future__ import unicode_literals
    html = '<html><body>helló wörld</body></html>'
    if isinstance(html, unicode):
        html = html.encode('utf-8')
    print 'DEBUG: %s' % html

Output:

    Traceback (most recent call last):
      File "test.py", line 6, in <module>
        print 'DEBUG: %s' % html
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

It fails because 'DEBUG: %s' is now a unicode string, so Python tries to decode html. A couple of ways to fix the print are print str('DEBUG: %s') % html or print 'DEBUG: %s' % html.decode('utf-8').
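One way to avoid sprinkling .decode('utf-8') calls around is to normalise values to unicode in a single place before formatting. The helpers to_text and debug_line below are hypothetical names, and utf-8 is assumed as the encoding of any incoming byte string; treat this as a sketch, not the only reasonable approach.

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as a unicode string, decoding byte strings.

    `bytes` is an alias of `str` on Python 2.6+, so the check covers
    both Python 2 byte strings and Python 3 bytes objects.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def debug_line(html):
    """Format a debug message without triggering an implicit ASCII decode."""
    return u'DEBUG: %s' % to_text(html)
```

With this in place the caller no longer needs to know whether html was already encoded: debug_line(html) gives the same unicode result either way.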

I hope this helps you understand the potential gotchas when using unicode strings.

Also in 2.6 (before Python 2.6.5 RC1+), unicode literals do not play well with keyword arguments (issue 4978):

For example, the following code works without unicode_literals, but fails with TypeError: keywords must be string if unicode_literals is used.

    >>> def foo(a=None): pass
    ...
    >>> foo(**{'a':1})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: foo() keywords must be strings
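On an affected interpreter, one possible workaround is to coerce the keys back to native strings before unpacking them. The helper str_keys is a made-up name, and the sketch assumes every key is plain ASCII (which Python 2 identifiers are anyway); foo here returns its argument only so the result can be checked.

```python
def str_keys(kwargs):
    """Return a copy of `kwargs` whose keys are native str objects."""
    return dict((str(key), value) for key, value in kwargs.items())


def foo(a=None):
    return a


# The u'' key stands in for a literal written under unicode_literals.
result = foo(**str_keys({u'a': 1}))
```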

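I did find that if you add the unicode_literals directive, you should also add something like:

    # -*- coding: utf-8

to the first or second line of your .py file. Otherwise lines such as:

    foo = "barré"

result in an error such as:

    SyntaxError: Non-ASCII character '\xc3' in file mumble.py on line 198,
    but no encoding declared; see http://www.python.org/peps/pep-0263.html
    for details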
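When adding unicode_literals file by file, it may help to check which files already carry a coding declaration. The sketch below uses the regular expression that PEP 263 gives for such declarations, as far as I recall it; declared_encoding is a hypothetical helper and is deliberately simplified (it looks only at the first two lines and does not handle byte order marks).

```python
import re

# Pattern from PEP 263 for recognising a coding declaration in a comment.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(lines):
    """Return the encoding named in the first two lines, or None."""
    for line in lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None
```

For example, declared_encoding(open('mumble.py').readlines()) would return None for the file in the error message above, signalling that a declaration still needs to be added.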

Also take into account that unicode_literals affects eval() but not repr() (an asymmetric behavior, which is a bug), i.e. eval(repr(b'\xa4')) won't be equal to b'\xa4' (as it would be in Python 3).

Ideally, the following code would be an invariant that always holds, for every combination of unicode_literals usage and Python version (2.7, 3.x):

    from __future__ import unicode_literals

    bstr = b'\xa4'
    assert eval(repr(bstr)) == bstr # fails in Python 2.7, holds in 3.1+

    ustr = '\xa4'
    assert eval(repr(ustr)) == ustr # holds in Python 2.7 and 3.1+

The second assertion happens to work, since repr('\xa4') evaluates to u'\xa4' in Python 2.7.

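There are more.

There are libraries and builtins that expect strings and do not tolerate unicode.

Two examples:

Builtin:

    myenum = type('Enum', (), enum)

(slightly esoteric) does not work with unicode_literals: type() expects a string.

Library:

    from wx.lib.pubsub import pub
    pub.sendMessage("LOG MESSAGE", msg="no go for unicode literals")

does not work: the wx pubsub library expects a string message type.

The former is esoteric and easily fixed with

    myenum = type(b'Enum', (), enum)

but the latter is devastating if your code is full of calls to pub.sendMessage() (which mine is).

Dang it, eh?!?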
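A possible way to contain this is to coerce names to the native str type at the few places that insist on it, using the same str(...) trick as in the print fix earlier in this thread. The names native, make_enum and with_native_topic are hypothetical. Wrapping pub.sendMessage this way is an untested assumption on my part, so the wrapper below accepts any callable and does not import wx.

```python
def native(name):
    """Coerce `name` to the interpreter's native str type.

    On Python 2 this turns an ASCII-only unicode string into a byte
    string; on Python 3 it leaves a str unchanged.
    """
    return str(name)


def make_enum(name, members):
    """Build a simple class via type(), which wants a native str name."""
    return type(native(name), (), dict(members))


def with_native_topic(send):
    """Wrap `send` so that its first argument (a topic) is a native str."""
    def wrapper(topic, **kwargs):
        return send(native(topic), **kwargs)
    return wrapper


# The u'' literal stands in for one written under unicode_literals.
Color = make_enum(u'Color', {'RED': 1, 'GREEN': 2})
```

If the assumption holds, a single module-level line such as send_message = with_native_topic(pub.sendMessage) would let the existing call sites stay as they are apart from the renamed function.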
