在Swift中使用Unicode代码点

如果您对蒙古语的细节不感兴趣,但只想快速回答在Swift中使用和转换Unicode值,则跳至接受答案的第一部分。


背景

我想为iOS应用程序使用传统蒙古文的Unicode文本。 更好的和长期的解决scheme是使用AAT智能字体来渲染这个复杂的脚本。 ( 这样的字体确实存在,但是他们的许可证不允许修改和非个人使用)。但是,由于我从来没有制作字体,更不用说所有的AAT字体的渲染逻辑,我只是​​打算自己做渲染斯威夫特现在。 也许在稍后的日子,我可以学会做一个聪明的字体。

外部我将使用Unicode文本,但在内部(用于在UITextView显示)我将Unicode转换为以哑巴字体(使用Unicode PUA值编码)存储的单个字形。 因此,我的渲染引擎需要将蒙古Unicode值(范围:U + 1820到U + 1842)转换为存储在PUA中的字形值(范围:U + E360到U + E5CF)。 无论如何,这是我的计划,因为这是我过去在Java中所做的 ,但也许我需要改变我的整个思维方式。

下面的图片显示su在蒙古语中用两种不同的forms写在字母u (红色)上。 (蒙古语是垂直书写的,英文中的草书字母连在一起。

在这里输入图像描述

在Unicode中,这两个string将被表示为

 var suForm1: String = "\u{1830}\u{1826}" var suForm2: String = "\u{1830}\u{1826}\u{180B}" 

suForm2的自由variablesselect器(U + 180B)被Swift String识别(正确)为与之前的u (U + 1826)相同的单位。 斯威夫特认为它是一个单一的字符,一个扩展的字形组合。 但是,为了自己进行渲染,我需要将u (U + 1826)和FVS1(U + 180B)区分为两个不同的UTF-16编码点。

为了内部显示的目的,我会把上面的Unicodestring转换成下面的渲染字形string:

 suForm1 = "\u{E46F}\u{E3BA}" suForm2 = "\u{E46F}\u{E3BB}" 

我一直在玩Swift StringCharacter 。 关于他们有很多方便的事情,但是因为在我的特殊情况下,我只处理UTF-16代码单元,不知道是否应该使用旧的NSString而不是Swift的String 。 我意识到我可以使用String.utf16获取UTF-16代码点,但是转换回String不是很好 。

坚持使用StringCharacter还是应该使用NSStringunichar

我读过的

  • string和字符文档
  • Swift中的string
  • NSString和Unicode

这个问题的更新已被隐藏,以便清理页面。 查看编辑历史。

更新了Swift 3

string和字符

对于将来访问这个问题的几乎所有人来说, StringCharacter将成为您的答案。

直接在代码中设置Unicode值:

 var str: String = "I want to visit 北京, Москва, मुंबई, القاهرة, and 서울시. 😊" var character: Character = "🌍" 

使用hex设置值

 var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大🍎π var character: Character = "\u{65}\u{301}" // é = "e" + accent mark 

请注意,Swift字符可以由多个Unicode代码点组成,但似乎是单个字符。 这被称为扩展的字形集群。

也看到这个问题 。

转换为Unicode值:

 str.utf8 str.utf16 str.unicodeScalars // UTF-32 String(character).utf8 String(character).utf16 String(character).unicodeScalars 

从Unicodehex值转换:

 let hexValue: UInt32 = 0x1F34E // convert hex value to UnicodeScalar guard let scalarValue = UnicodeScalar(hexValue) else { // early exit if hex does not form a valid unicode value return } // convert UnicodeScalar to String let myString = String(scalarValue) // 🍎 

或者:

 let hexValue: UInt32 = 0x1F34E if let scalarValue = UnicodeScalar(hexValue) { let myString = String(scalarValue) } 

再举几个例子

 let value0: UInt8 = 0x61 let value1: UInt16 = 0x5927 let value2: UInt32 = 0x1F34E let string0 = String(UnicodeScalar(value0)) // a let string1 = String(UnicodeScalar(value1)) // 大let string2 = String(UnicodeScalar(value2)) // 🍎 // convert hex array to String let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array var myString = "" for hexValue in myHexArray { myString.append(UnicodeScalar(hexValue)) } print(myString) // Cat‼🐱 

请注意,对于UTF-8和UTF-16,转换并不总是这么简单。 (请参阅UTF-8 , UTF-16和UTF-32问题。

NSString和unichar

在Swift中也可以使用NSStringunichar ,但是你应该认识到,除非你熟悉Objective C,并且善于将语法转换成Swift,否则很难find好的文档。

此外, unichar是一个UInt16数组,如上所述,从UInt16到Unicode标量值的转换并不总是很容易的(即在上层代码平面中转换诸如emoji和其他字符之类的代理对)。

自定义string结构

由于在问题中提到的原因,我最终没有使用任何上述方法。 相反,我写了自己的string结构,这基本上是一个UInt32的数组来保存Unicode标量值。

再次,这不是大多数人的解决scheme。 首先考虑使用扩展,如果您只需要稍微扩展StringCharacter的function。

但是如果你真的需要使用Unicode标量值,你可以编写一个自定义的结构。

优点是:

  • 在进行string操作时,不需要在Types( StringCharacterUnicodeScalarUInt32等)之间不断切换。
  • 在Unicode操作完成之后,最后转换为String很容易。
  • 在需要的时候容易添加更多的方法
  • 简化从Java或其他语言转换代码

缺点是:

  • 使其他Swift开发人员的代码更便于移植,可读性更低
  • 没有像原生Swifttypes那样进行testing和优化
  • 这是另一个文件,每次需要时都要包含在项目中

你可以自己做,但这里是我的参考。 最难的部分是使它可以哈希 。

 // This struct is an array of UInt32 to hold Unicode scalar values // Version 3.4.0 (Swift 3 update) struct ScalarString: Sequence, Hashable, CustomStringConvertible { fileprivate var scalarArray: [UInt32] = [] init() { // does anything need to go here? } init(_ character: UInt32) { self.scalarArray.append(character) } init(_ charArray: [UInt32]) { for c in charArray { self.scalarArray.append(c) } } init(_ string: String) { for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // Generator in order to conform to SequenceType protocol // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... }) func makeIterator() -> AnyIterator<UInt32> { return AnyIterator(scalarArray.makeIterator()) } // append mutating func append(_ scalar: UInt32) { self.scalarArray.append(scalar) } mutating func append(_ scalarString: ScalarString) { for scalar in scalarString { self.scalarArray.append(scalar) } } mutating func append(_ string: String) { for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // charAt func charAt(_ index: Int) -> UInt32 { return self.scalarArray[index] } // clear mutating func clear() { self.scalarArray.removeAll(keepingCapacity: true) } // contains func contains(_ character: UInt32) -> Bool { for scalar in self.scalarArray { if scalar == character { return true } } return false } // description (to implement Printable protocol) var description: String { return self.toString() } // endsWith func endsWith() -> UInt32? { return self.scalarArray.last } // indexOf // returns first index of scalar string match func indexOf(_ string: ScalarString) -> Int? { if scalarArray.count < string.length { return nil } for i in 0...(scalarArray.count - string.length) { for j in 0..<string.length { if string.charAt(j) != scalarArray[i + j] { break // substring mismatch } if j == string.length - 1 { return i } } } return nil } // insert mutating func insert(_ scalar: UInt32, atIndex index: Int) { self.scalarArray.insert(scalar, at: index) } mutating func insert(_ string: ScalarString, atIndex index: Int) { var newIndex = index for scalar in string { self.scalarArray.insert(scalar, at: newIndex) newIndex += 1 } } mutating func insert(_ string: String, atIndex index: Int) { var newIndex = index for scalar in string.unicodeScalars { self.scalarArray.insert(scalar.value, at: newIndex) newIndex += 1 } } // isEmpty var isEmpty: Bool { return self.scalarArray.count == 0 } // hashValue (to implement Hashable protocol) var hashValue: Int { // DJB Hash Function return self.scalarArray.reduce(5381) { ($0 << 5) &+ $0 &+ Int($1) } } // length var length: Int { return self.scalarArray.count } // remove character mutating func removeCharAt(_ index: Int) { self.scalarArray.remove(at: index) } func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar != character { returnString.append(scalar) } } return returnString } func removeRange(_ range: CountableRange<Int>) -> ScalarString? { if range.lowerBound < 0 || range.upperBound > scalarArray.count { return nil } var returnString = ScalarString() for i in 0..<scalarArray.count { if i < range.lowerBound || i >= range.upperBound { returnString.append(scalarArray[i]) } } return returnString } // replace func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar == character { returnString.append(replacementChar) } else { returnString.append(scalar) } } return returnString } func replace(_ character: UInt32, withString replacementString: String) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar == character { returnString.append(replacementString) } else { returnString.append(scalar) } } return returnString } func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString { var returnString = ScalarString() for i in 0..<scalarArray.count { if i < range.lowerBound || i >= range.upperBound { returnString.append(scalarArray[i]) } else if i == range.lowerBound { returnString.append(replacementString) } } return returnString } // set (an alternative to myScalarString = "some string") mutating func set(_ string: String) { self.scalarArray.removeAll(keepingCapacity: false) for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // split func split(atChar splitChar: UInt32) -> [ScalarString] { var partsArray: [ScalarString] = [] if self.scalarArray.count == 0 { return partsArray } var part: ScalarString = ScalarString() for scalar in self.scalarArray { if scalar == splitChar { partsArray.append(part) part = ScalarString() } else { part.append(scalar) } } partsArray.append(part) return partsArray } // startsWith func startsWith() -> UInt32? { return self.scalarArray.first } // substring func substring(_ startIndex: Int) -> ScalarString { // from startIndex to end of string var subArray: ScalarString = ScalarString() for i in startIndex..<self.length { subArray.append(self.scalarArray[i]) } return subArray } func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString { // (startIndex is inclusive, endIndex is exclusive) var subArray: ScalarString = ScalarString() for i in startIndex..<endIndex { subArray.append(self.scalarArray[i]) } return subArray } // toString func toString() -> String { var string: String = "" for scalar in self.scalarArray { if let validScalor = UnicodeScalar(scalar) { string.append(Character(validScalor)) } } return string } // trim // removes leading and trailing whitespace (space, tab, newline) func trim() -> ScalarString { //var returnString = ScalarString() let space: UInt32 = 0x00000020 let tab: UInt32 = 0x00000009 let newline: UInt32 = 0x0000000A var startIndex = self.scalarArray.count var endIndex = 0 // leading whitespace for i in 0..<self.scalarArray.count { if self.scalarArray[i] != space && self.scalarArray[i] != tab && self.scalarArray[i] != newline { startIndex = i break } } // trailing whitespace for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) { if self.scalarArray[i] != space && self.scalarArray[i] != tab && self.scalarArray[i] != newline { endIndex = i + 1 break } } if endIndex <= startIndex { return ScalarString() } return self.substring(startIndex, endIndex) } // values func values() -> [UInt32] { return self.scalarArray } } func ==(left: ScalarString, right: ScalarString) -> Bool { return left.scalarArray == right.scalarArray } func +(left: ScalarString, right: ScalarString) -> ScalarString { var returnString = ScalarString() for scalar in left.values() { returnString.append(scalar) } for scalar in right.values() { returnString.append(scalar) } return returnString } 
 //Swift 3.0 // This struct is an array of UInt32 to hold Unicode scalar values struct ScalarString: Sequence, Hashable, CustomStringConvertible { private var scalarArray: [UInt32] = [] init() { // does anything need to go here? } init(_ character: UInt32) { self.scalarArray.append(character) } init(_ charArray: [UInt32]) { for c in charArray { self.scalarArray.append(c) } } init(_ string: String) { for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // Generator in order to conform to SequenceType protocol // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... }) //func generate() -> AnyIterator<UInt32> { func makeIterator() -> AnyIterator<UInt32> { let nextIndex = 0 return AnyIterator { if (nextIndex > self.scalarArray.count-1) { return nil } return self.scalarArray[nextIndex + 1] } } // append mutating func append(scalar: UInt32) { self.scalarArray.append(scalar) } mutating func append(scalarString: ScalarString) { for scalar in scalarString { self.scalarArray.append(scalar) } } mutating func append(string: String) { for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // charAt func charAt(index: Int) -> UInt32 { return self.scalarArray[index] } // clear mutating func clear() { self.scalarArray.removeAll(keepingCapacity: true) } // contains func contains(character: UInt32) -> Bool { for scalar in self.scalarArray { if scalar == character { return true } } return false } // description (to implement Printable protocol) var description: String { var string: String = "" for scalar in scalarArray { string.append(String(describing: UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!) } return string } // endsWith func endsWith() -> UInt32? { return self.scalarArray.last } // insert mutating func insert(scalar: UInt32, atIndex index: Int) { self.scalarArray.insert(scalar, at: index) } // isEmpty var isEmpty: Bool { get { return self.scalarArray.count == 0 } } // hashValue (to implement Hashable protocol) var hashValue: Int { get { // DJB Hash Function var hash = 5381 for i in 0 ..< scalarArray.count { hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i]) } /* for i in 0..< self.scalarArray.count { hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i]) } */ return hash } } // length var length: Int { get { return self.scalarArray.count } } // remove character mutating func removeCharAt(index: Int) { self.scalarArray.remove(at: index) } func removingAllInstancesOfChar(character: UInt32) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar != character { returnString.append(scalar: scalar) //.append(scalar) } } return returnString } // replace func replace(character: UInt32, withChar replacementChar: UInt32) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar == character { returnString.append(scalar: replacementChar) //.append(replacementChar) } else { returnString.append(scalar: scalar) //.append(scalar) } } return returnString } // func replace(character: UInt32, withString replacementString: String) -> ScalarString { func replace(character: UInt32, withString replacementString: ScalarString) -> ScalarString { var returnString = ScalarString() for scalar in self.scalarArray { if scalar == character { returnString.append(scalarString: replacementString) //.append(replacementString) } else { returnString.append(scalar: scalar) //.append(scalar) } } return returnString } // set (an alternative to myScalarString = "some string") mutating func set(string: String) { self.scalarArray.removeAll(keepingCapacity: false) for s in string.unicodeScalars { self.scalarArray.append(s.value) } } // split func split(atChar splitChar: UInt32) -> [ScalarString] { var partsArray: [ScalarString] = [] var part: ScalarString = ScalarString() for scalar in self.scalarArray { if scalar == splitChar { partsArray.append(part) part = ScalarString() } else { part.append(scalar: scalar) //.append(scalar) } } partsArray.append(part) return partsArray } // startsWith func startsWith() -> UInt32? { return self.scalarArray.first } // substring func substring(startIndex: Int) -> ScalarString { // from startIndex to end of string var subArray: ScalarString = ScalarString() for i in startIndex ..< self.length { subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i]) } return subArray } func substring(startIndex: Int, _ endIndex: Int) -> ScalarString { // (startIndex is inclusive, endIndex is exclusive) var subArray: ScalarString = ScalarString() for i in startIndex ..< endIndex { subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i]) } return subArray } // toString func toString() -> String { let string: String = "" for scalar in self.scalarArray { string.appending(String(describing:UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!) } return string } // values func values() -> [UInt32] { return self.scalarArray } } func ==(left: ScalarString, right: ScalarString) -> Bool { if left.length != right.length { return false } for i in 0 ..< left.length { if left.charAt(index: i) != right.charAt(index: i) { return false } } return true } func +(left: ScalarString, right: ScalarString) -> ScalarString { var returnString = ScalarString() for scalar in left.values() { returnString.append(scalar: scalar) //.append(scalar) } for scalar in right.values() { returnString.append(scalar: scalar) //.append(scalar) } return returnString }