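Converting UTF-8 to ISO-8859-1 in Java: how to keep it as a single byte

I am trying to convert a string encoded in Java as UTF-8 to ISO-8859-1. Take the string 'âabcd' as an example: in ISO-8859-1, 'â' is represented as E2. In UTF-8 it is represented as two bytes, C3 A2 I believe. When I do a getBytes(encoding) and then create a new string from those bytes in ISO-8859-1 encoding, I get two different characters: Ã¢. Is there any other way to do this that keeps the character the same, i.e. âabcd?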

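If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive; you should only be using byte[] arrays or ByteBuffer objects. Then you can use java.nio.charset.Charset to convert between encodings:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;

    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});

    // decode UTF-8
    CharBuffer data = utf8charset.decode(inputBuffer);

    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data);
    // caution: array() exposes the whole backing array, which can be longer than the encoded data
    byte[] outputData = outputBuffer.array();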

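    byte[] iso88591Data = theString.getBytes("ISO-8859-1");

will do the trick. From your description it seems as if you're trying to "store an ISO-8859-1 String". String objects in Java are always implicitly encoded in UTF-16. There's no way to change that encoding.

What you can do, though, is get the bytes that make up some other encoding of it (using the .getBytes() method as shown above).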

Starting with a set of bytes that encode a string using UTF-8, create a string from that data, then get the bytes that encode the string in a different encoding:

    byte[] utf8bytes = { (byte)0xc3, (byte)0xa2, 0x61, 0x62, 0x63, 0x64 };
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    String string = new String ( utf8bytes, utf8charset );

    System.out.println(string);

    // "When I do a getbytes(encoding) and "
    byte[] iso88591bytes = string.getBytes(iso88591charset);

    for ( byte b : iso88591bytes )
        System.out.printf("%02x ", b);

    System.out.println();

    // "then create a new string with the bytes in ISO-8859-1 encoding"
    String string2 = new String ( iso88591bytes, iso88591charset );

    // "I get a two different chars"
    System.out.println(string2);

This outputs the strings and the iso88591 bytes correctly:

    âabcd
    e2 61 62 63 64
    âabcd

So your byte array wasn't paired with the correct encoding:

    String failString = new String ( utf8bytes, iso88591charset );

    System.out.println(failString);

Outputs

    Ã¢abcd

(either that, or you just wrote the utf8 bytes to a file and read them elsewhere as iso88591)
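If the mismatch has already happened and all you have is the garbled string, the mistake can usually be undone, because ISO-8859-1 maps every byte value to exactly one character. The following is only a minimal sketch of that idea; the class and method names (`MojibakeDemo`, `misread`, `repair`) are made up for illustration and are not from the answers here. Unicode escapes are used so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Reproduces the problem: UTF-8 bytes decoded with the wrong charset.
    static String misread(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes it: re-encode as ISO-8859-1 to get the original bytes back,
    // then decode those bytes as the UTF-8 they really are.
    static String repair(String mojibake) {
        byte[] originalBytes = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e2" is 'â'; its UTF-8 encoding is C3 A2
        byte[] utf8Bytes = "\u00e2abcd".getBytes(StandardCharsets.UTF_8);

        String broken = misread(utf8Bytes);
        System.out.println(broken);          // the two-character garbage followed by abcd
        System.out.println(repair(broken));  // the original string again
    }
}
```

This only works when the wrong charset was a lossless one-byte-per-character mapping such as ISO-8859-1; if the bytes were misread with a charset that has undefined byte values, some data may already be lost.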

This is what I needed:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;

    // converts bytes in the given charset to UTF-8
    public static byte[] encode(byte[] arr, String fromCharsetName) {
        return encode(arr, Charset.forName(fromCharsetName), Charset.forName("UTF-8"));
    }

    public static byte[] encode(byte[] arr, String fromCharsetName, String targetCharsetName) {
        return encode(arr, Charset.forName(fromCharsetName), Charset.forName(targetCharsetName));
    }

    public static byte[] encode(byte[] arr, Charset sourceCharset, Charset targetCharset) {
        ByteBuffer inputBuffer = ByteBuffer.wrap( arr );

        CharBuffer data = sourceCharset.decode(inputBuffer);

        ByteBuffer outputBuffer = targetCharset.encode(data);
        // copy only the encoded bytes; array() may return a longer backing array
        byte[] outputData = new byte[outputBuffer.remaining()];
        outputBuffer.get(outputData);

        return outputData;
    }

If you have the correct encoding in the string, you don't need to do anything more to get the bytes for another encoding.

    import java.nio.charset.StandardCharsets;

    public static void main(String[] args) throws Exception {
        printBytes("â");
        System.out.println(
                new String(new byte[] { (byte) 0xE2 }, "ISO-8859-1"));
        System.out.println(
                new String(new byte[] { (byte) 0xC3, (byte) 0xA2 }, "UTF-8"));
    }

    private static void printBytes(String str) {
        System.out.println("Bytes in " + str + " with ISO-8859-1");
        for (byte b : str.getBytes(StandardCharsets.ISO_8859_1)) {
            System.out.printf("%3X", b);
        }
        System.out.println();
        System.out.println("Bytes in " + str + " with UTF-8");
        for (byte b : str.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%3X", b);
        }
        System.out.println();
    }

Output:

    Bytes in â with ISO-8859-1
     E2
    Bytes in â with UTF-8
     C3 A2
    â
    â

For file encoding...

    import java.io.*;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class FRomUtf8ToIso {

        static File input = new File("C:/Users/admin/Desktop/pippo.txt");
        static File output = new File("C:/Users/admin/Desktop/ciccio.txt");

        public static void main(String[] args) throws IOException {
            Charset targetCharset = Charset.forName("ISO-8859-15");

            // read explicitly as UTF-8 and write explicitly as ISO-8859-15,
            // rather than relying on the platform default charset;
            // characters the target charset cannot represent are written as '?'
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(new FileInputStream(input), StandardCharsets.UTF_8));
                 Writer writer = new OutputStreamWriter(new FileOutputStream(output), targetCharset)) {

                String sCurrentLine;
                int i = 0;
                while ((sCurrentLine = br.readLine()) != null) {
                    writer.write(sCurrentLine);
                    writer.write("\n");
                    System.out.println(i++); // progress: number of lines written so far
                }
            }
        }
    }

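In addition to Adam Rosenfield's answer, I would like to add that ByteBuffer.array() returns the buffer's underlying byte array, which is not necessarily "trimmed" to the last character. Extra manipulation is needed, such as the ones mentioned in this answer; in particular:

    byte[] b = new byte[bb.remaining()];
    bb.get(b);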
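Putting the decode, encode, and trimmed copy together, a conversion helper might look roughly like the sketch below. The names `Transcoder` and `transcode` are hypothetical and chosen just for this example; the calls themselves (`Charset.decode`, `Charset.encode`, `ByteBuffer.remaining`, `ByteBuffer.get`) are the standard java.nio ones used in the answers above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {

    // Decodes input from the source charset and re-encodes it in the target
    // charset, returning exactly the encoded bytes and nothing more.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        CharBuffer chars = source.decode(ByteBuffer.wrap(input));
        ByteBuffer encoded = target.encode(chars);

        // Copy only the bytes between position and limit instead of
        // calling array(), whose backing array may be larger.
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA2, 0x61, 0x62, 0x63, 0x64 };
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        for (byte b : latin1) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Note that `Charset.encode` silently substitutes the charset's replacement byte for characters the target cannot represent, so this helper is lossy for input outside the target's repertoire.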

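To evict non-ISO-8859-1 characters, which will be replaced by '?' (for example, before sending the text to an ISO-8859-1 DB):

    utf8String = new String(utf8String.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);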
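If silently turning characters into '?' is not acceptable, one alternative is to check first, or to encode with a CharsetEncoder that reports unmappable characters instead of replacing them. This is just a sketch under that assumption; `StrictLatin1`, `fitsLatin1`, and `encodeStrict` are illustrative names, not something from the answers above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictLatin1 {

    // True if every character of s can be represented in ISO-8859-1.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Encodes s as ISO-8859-1, throwing instead of substituting '?'.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer encoded = encoder.encode(CharBuffer.wrap(s));
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) {
        String ok = "\u00e2abcd";   // 'â' exists in ISO-8859-1
        String bad = "a\u20acb";    // the euro sign does not

        System.out.println(fitsLatin1(ok));
        System.out.println(fitsLatin1(bad));
        try {
            encodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode: " + e);
        }
    }
}
```

Which behaviour is preferable depends on the destination: replacing with '?' keeps the write from failing, while the strict variant makes the data loss visible before anything is stored.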
