Converting Non-Unicode Text
The java.io package provides classes that allow you to convert between Unicode character streams and byte streams of non-Unicode text. With the InputStreamReader class, you can convert byte streams to character streams. You use the OutputStreamWriter class to translate character streams into byte streams.

(figure here -- PENDING)
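To make the two directions of conversion concrete, here is a minimal in-memory sketch. It is not part of the sample programs for this lesson: the class name Latin1Sketch and the helper methods decode and encode are hypothetical, and the byte array stands in for a file of ISO-8859-1 text. The sketch assumes Java 8 or later, because it uses StringBuilder and UncheckedIOException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper class for illustration only.
class Latin1Sketch {

    // Bytes in the named encoding -> String of Unicode characters.
    static String decode(byte[] bytes, String encoding) {
        try {
            Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), encoding);
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
            in.close();
            return sb.toString();
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown encoding name.
            throw new UncheckedIOException(e);
        }
    }

    // String of Unicode characters -> bytes in the named encoding.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(bytes, encoding);
            out.write(text);
            out.close(); // closing flushes the encoder into the byte stream
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The word "cafe" with an e-acute, stored as four ISO-8859-1 bytes.
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };
        String text = decode(latin1, "ISO-8859-1");
        System.out.println(text.length());                // 4 characters
        System.out.println(encode(text, "UTF-8").length); // 5 bytes: e-acute needs two bytes in UTF-8
    }
}
```

The same String produces a different number of bytes depending on the encoding passed to the OutputStreamWriter, which is why the encoding has to be named on both the reading and the writing side.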
When you create InputStreamReader and OutputStreamWriter objects, you specify the byte encoding that you want to convert. For example, if you want to translate a text file in the UTF8 encoding into Unicode, you would create an InputStreamReader as follows:

    FileInputStream fis = new FileInputStream("output.txt");
    InputStreamReader isr = new InputStreamReader(fis, "UTF8");

If you omit the encoding identifier, InputStreamReader and OutputStreamWriter will rely on the default encoding. Like the list of supported encodings, the default encoding may vary with the Java platform. On version 1.1 of the Java Development Kit, the default encoding is 8859_1 (ISO-Latin-1). This default is set in the file.encoding system property. You can determine which encoding an InputStreamReader or OutputStreamWriter will use by invoking the getEncoding method. In the following example, we invoke this method to determine that the default encoding on our platform is 8859_1:

    InputStreamReader defaultReader = new InputStreamReader(fis);
    System.out.println(defaultReader.getEncoding());

You specify an InputStream
when creating an InputStreamReader, and an OutputStream when constructing an OutputStreamWriter. InputStream and OutputStream are abstract superclasses of all input and output byte streams. This allows you to perform conversions on any of the byte streams that belong to their subclasses. For instance, with an InputStreamReader you can convert non-Unicode text from a FileInputStream or PipedInputStream, because they are both subclasses of InputStream.

In the example that follows, we'll show you how to perform character set conversions with the
InputStreamReader and OutputStreamWriter classes. The full source code for this example is in the file called StreamConverter.java. In this example, we convert a sequence of Unicode characters from a String object into a FileOutputStream of bytes encoded in UTF8. The method that performs the conversion is called writeOutput:

    static void writeOutput(String str) {
        try {
            FileOutputStream fos = new FileOutputStream("output.txt");
            Writer out = new OutputStreamWriter(fos, "UTF8");
            out.write(str);
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

In another method, readInput, we read the bytes encoded in UTF8 from the file created by the writeOutput method. We use an InputStreamReader to convert the bytes from UTF8 into Unicode, and return the result in a String. The readInput method is as follows:

    static String readInput() {
        StringBuffer buffer = new StringBuffer();
        try {
            FileInputStream fis = new FileInputStream("output.txt");
            InputStreamReader isr = new InputStreamReader(fis, "UTF8");
            Reader in = new BufferedReader(isr);
            int ch;
            while ((ch = in.read()) > -1) {
                buffer.append((char)ch);
            }
            in.close();
            return buffer.toString();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }

In the main
method of our example program, we invoke the writeOutput method to create a file of bytes encoded in UTF8. Then we read the same file, converting the bytes back into Unicode. The source code for the main method is:

    public static void main(String[] args) {
        String jaString = new String("\u65e5\u672c\u8a9e\u6587\u5b57\u5217");
        writeOutput(jaString);
        String inputString = readInput();
        String displayString = jaString + " " + inputString;
        new ShowString(displayString, "Conversion Demo");
    }

The original string (jaString) should be identical to the newly created string (inputString). To see if the two strings are the same, we concatenate them and display them with a ShowString object. The ShowString class displays a string with the Graphics.drawString method. The source code for this class is in the file ShowString.java. When we instantiate ShowString in our sample program, the following window appears. The repetition of the characters displayed verifies that the two strings are identical.
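If no display is available, the same round trip can also be checked with String.equals instead of by eye. The sketch below is an assumption-laden variation on the example, not the StreamConverter.java source: the class RoundTripSketch and the method roundTrips are hypothetical, a temporary file replaces output.txt, and the code assumes Java 8 or later for try-with-resources and UncheckedIOException.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper class for illustration only.
class RoundTripSketch {

    // Writes the string to a temporary file in the given encoding, reads it
    // back with the same encoding, and reports whether the text survived.
    static boolean roundTrips(String original, String encoding) {
        try {
            File file = File.createTempFile("roundtrip", ".txt");
            file.deleteOnExit();

            try (Writer out = new OutputStreamWriter(new FileOutputStream(file), encoding)) {
                out.write(original);
            }

            StringBuilder buffer = new StringBuilder();
            try (Reader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), encoding))) {
                int ch;
                while ((ch = in.read()) != -1) {
                    buffer.append((char) ch);
                }
            }
            return original.equals(buffer.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String jaString = "\u65e5\u672c\u8a9e\u6587\u5b57\u5217";
        System.out.println(roundTrips(jaString, "UTF8"));       // true
        System.out.println(roundTrips(jaString, "ISO-8859-1")); // false
    }
}
```

The second call returns false because ISO-8859-1 has no mapping for these characters, so the OutputStreamWriter substitutes a replacement byte for each one and the original text cannot be recovered.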