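Java: Fixing Garbled Text Returned by readLine()
Fixing garbled text returned by readLine()
While developing a program at work, I ran into garbled text when reading a line from a file.
Eclipse was set to UTF-8 by default, and an InputStreamReader created without a charset falls back to that default: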
FileInputStream f4 = new FileInputStream(new File("F:\\bb.txt"));
// No charset is passed, so the default charset (UTF-8 here) is used for decoding
BufferedReader bufferedReader2 = new BufferedReader(new InputStreamReader(f4));
String readLine = bufferedReader2.readLine(); // comes back garbled
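To check which charset the one-argument InputStreamReader constructor will fall back to, one can print the JVM default. The following is only a small sketch; the class name DefaultCharsetCheck and the method jvmDefault are made up for illustration.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    /** Returns the charset that new InputStreamReader(in) falls back to. */
    static Charset jvmDefault() {
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // The value depends on the environment: under Eclipse it usually follows
        // the workspace or project text-file encoding, elsewhere it may differ.
        System.out.println("default charset = " + jvmDefault());
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
    }
}
```

If the printed charset differs from the encoding of the file being read, the one-argument constructor is likely to produce garbled text.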
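There are two test files, aa.txt (UTF-8 encoded) and bb.txt (GB2312 encoded). Both contain the single character 中.
Background: a Chinese character such as 中 takes three bytes in UTF-8 and two bytes in GB2312.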
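The byte counts can be checked directly. This is a minimal sketch; the class name ByteLengthDemo and the helper encode are illustrative, and it assumes the running JDK provides the GB2312 charset (full JDK distributions normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteLengthDemo {
    /** Encodes the string with the given charset. */
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");
        // "\u4e2d" is the character 中, escaped so the source file encoding does not matter.
        String zhong = "\u4e2d";
        System.out.println(Arrays.toString(encode(zhong, StandardCharsets.UTF_8))); // [-28, -72, -83]
        System.out.println(Arrays.toString(encode(zhong, gb2312)));                 // [-42, -48]
    }
}
```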
Test code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.junit.Test;

public class EncodeTest {

    private static final String SEPARATOR =
            "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++";

    @Test
    public void test1() throws Exception {
        // Case 1: aa.txt (UTF-8). Print the raw bytes, then decode them with the default charset.
        FileInputStream f1 = new FileInputStream(new File("F:\\aa.txt"));
        byte[] b1 = new byte[f1.available()];
        f1.read(b1);
        f1.close();
        for (byte b : b1) {
            System.out.println(b);
        }
        System.out.println(new String(b1));
        System.out.println(SEPARATOR);

        // Case 2: bb.txt (GB2312). Decode each byte on its own with the default charset,
        // re-encode the result, then decode both bytes together as GB2312.
        FileInputStream f2 = new FileInputStream(new File("F:\\bb.txt"));
        byte[] b2 = new byte[f2.available()];
        f2.read(b2);
        f2.close();
        for (byte b : b2) {
            System.out.println(b);
            byte[] tb = new byte[]{b};
            String lm = new String(tb);
            System.out.println("Garbled char: " + lm);
            byte[] lm_b = lm.getBytes();
            System.out.println("-----------garbled start--------");
            for (byte bn : lm_b) {
                System.out.println(bn);
            }
            System.out.println("-----------garbled end--------");
        }
        System.out.println(new String(b2, "gb2312"));
        System.out.println(SEPARATOR);

        // Case 3: readLine() on bb.txt without a charset, so the default charset is used.
        FileInputStream f3 = new FileInputStream(new File("F:\\bb.txt"));
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(f3));
        String readLine2 = bufferedReader.readLine();
        bufferedReader.close();
        byte[] b3 = readLine2.getBytes("UTF-8");
        for (byte b : b3) {
            System.out.println(b);
        }
        System.out.println(new String(b3));
        System.out.println(SEPARATOR);

        // Case 4: readLine() on bb.txt with the charset explicitly set to GB2312.
        FileInputStream f4 = new FileInputStream(new File("F:\\bb.txt"));
        BufferedReader bufferedReader2 = new BufferedReader(new InputStreamReader(f4, "GB2312"));
        String readLine = bufferedReader2.readLine();
        bufferedReader2.close();
        byte[] b4 = readLine.getBytes("UTF-8");
        for (byte b : b4) {
            System.out.println(b);
        }
        System.out.println(new String(b4));
        System.out.println(SEPARATOR);
    }
}
Analysis of the printed output:
-28 # byte 1
-72 # byte 2
-83 # byte 3
中 # decoded as UTF-8 the character is 中, with no garbling
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
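-42 # byte 1
Garbled char: � # decoding -42 on its own as UTF-8 gives a garbled character; re-encoding that character as UTF-8 gives the bytes below
-----------garbled start--------
-17
-65
-67
-----------garbled end--------
-48 # byte 2
Garbled char: � # decoding -48 on its own as UTF-8 gives a garbled character; re-encoding that character as UTF-8 gives the bytes below
-----------garbled start--------
-17
-65
-67
-----------garbled end--------
中 # decoding byte 1 (-42) and byte 2 (-48) together as GB2312 gives the character 中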
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-17 # readLine() on a reader with no charset set, so the Eclipse default (UTF-8) is used (reading bb.txt)
-65
-67
-17
-65
-67
�� # the Chinese character comes out garbled
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-28 # readLine() on a reader whose charset is set to GB2312; the line read is 中 (reading bb.txt)
-72
-83
中
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
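Summary
new BufferedReader(new InputStreamReader(f4)) decodes bytes with the default charset, which is UTF-8 here, while bb.txt is encoded in GB2312. On disk the character 中 therefore occupies two bytes (-42 -48, that is 0xD6 0xD0), whereas its UTF-8 encoding takes three. Neither byte is valid UTF-8 in this position: 0xD6 announces a two-byte sequence but is not followed by a continuation byte, and 0xD0 announces another sequence that never arrives. The decoder therefore treats each byte as malformed input on its own and turns each into one char, so readLine() returns two garbled characters.
Compare the code with the printed output: the two garbled characters returned by readLine() each encode to -17 -65 -67, exactly the bytes obtained earlier when a single byte was decoded as UTF-8 and the result was re-encoded as UTF-8. In other words, when bytes cannot be decoded to a valid character in UTF-8, the decoder outputs � (U+FFFD, the replacement character) instead, and � encodes in UTF-8 as -17 -65 -67 (0xEF 0xBF 0xBD). That is why every undecodable byte shows up as those same three bytes.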
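The replacement behaviour can be reproduced without any file. The sketch below decodes the two GB2312 bytes of 中 as UTF-8 in two ways: leniently, which substitutes U+FFFD as described above, and strictly, using a CharsetDecoder configured to report errors. The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    /** Lenient decoding: malformed input is replaced by U+FFFD. */
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Strict decoding: malformed input raises an exception instead. */
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] gb2312Zhong = {(byte) 0xD6, (byte) 0xD0}; // -42, -48

        String lenient = decodeLenient(gb2312Zhong);
        System.out.println(lenient.length()); // 2
        for (byte b : lenient.getBytes(StandardCharsets.UTF_8)) {
            System.out.print(b + " "); // -17 -65 -67 -17 -65 -67
        }
        System.out.println();

        try {
            decodeStrict(gb2312Zhong);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```

Once a byte has been replaced by U+FFFD the original value is gone, so strict decoding can be a useful way to detect a wrong charset early instead of silently carrying garbled text forward.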
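Rule of thumb: always pass the encoding of the source text when creating the reader, that is, new BufferedReader(new InputStreamReader(f4, "encoding of the source text")). As long as it matches the file's actual encoding, the text will not be garbled.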
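As a sketch of that rule, the helper below reads the first line of a file with an explicitly supplied charset and closes the reader with try-with-resources. The names ExplicitCharsetRead and readFirstLine are hypothetical, and the demo writes its own temporary file instead of relying on F:\bb.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetRead {
    /** Reads the first line of the file, decoding it with the given charset. */
    static String readFirstLine(Path file, Charset cs) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), cs))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: GB2312 is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");
        Path tmp = Files.createTempFile("bb", ".txt");
        try {
            // The two GB2312 bytes of the character 中.
            Files.write(tmp, new byte[] {(byte) 0xD6, (byte) 0xD0});
            String line = readFirstLine(tmp, gb2312);
            // Print the code point so the result does not depend on the console encoding.
            System.out.println(Integer.toHexString(line.charAt(0))); // 4e2d
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```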
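Garbled text when calling readLine
readLine is a very convenient method, but because it belongs to the character streams it runs into all kinds of encoding problems. Handling the data with byte streams instead, for example processing a text file line by line, is far less flexible.
The following shows how to specify the encoding for the character streams used with readLine: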
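// Requires the java.io classes used below (File, FileInputStream, InputStreamReader, ...)
// Define a File object
File someFile = new File("somefile.txt");
// Input stream
FileInputStream fis = new FileInputStream(someFile);
InputStreamReader isr = new InputStreamReader(fis, "UTF-8"); // read as UTF-8
BufferedReader br = new BufferedReader(isr);
// Output stream
FileOutputStream fos = new FileOutputStream(someFile + ".generated.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8"); // write as UTF-8
String line;
while ((line = br.readLine()) != null) {
    // osw.write("write something");
    osw.write(line);
    // readLine() strips the line terminator, so write one back
    osw.write(System.lineSeparator());
}
// Close the IO streams
br.close();
osw.close();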
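The same read-then-write pattern can be used to convert a file from one encoding to another. The sketch below is one possible variant, not the snippet above: it uses Files.newBufferedReader and Files.newBufferedWriter with try-with-resources, and the names Transcoder and transcode are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Transcoder {
    /** Copies src to dst line by line, decoding with srcCs and encoding with dstCs. */
    static void transcode(Path src, Charset srcCs, Path dst, Charset dstCs) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(src, srcCs);
             BufferedWriter out = Files.newBufferedWriter(dst, dstCs)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // readLine() drops the terminator, so add one back
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: GB2312 is available in this JDK; the temp files stand in for real ones.
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        try {
            Files.write(src, new byte[] {(byte) 0xD6, (byte) 0xD0, '\n'});
            transcode(src, Charset.forName("GB2312"), dst, StandardCharsets.UTF_8);
            for (byte b : Files.readAllBytes(dst)) {
                System.out.println(b); // starts with -28, -72, -83, then the line separator
            }
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
        }
    }
}
```

Unlike InputStreamReader, the reader returned by Files.newBufferedReader throws an exception on malformed input rather than substituting �, so a wrong source charset tends to fail immediately.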