Converting between emoji and Unicode code points (JS, Java, C#)

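A few days ago I needed to turn an emoji's Unicode code point into the character itself, for example 1f601, which is this grinning face 😁. I could not find a C# method that converts 1f601 into the character, and nothing I tried with Encoding.Unicode gave the right result. In the end I simply pasted the emoji character into the source; Visual Studio displayed it fine, so I used the literal character, skipped the conversion, and left the question unresolved.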

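Today, while working on a Markdown editor, GFM support made me test the encodings again. I did not find much reliable material, but I did find plenty of emoji-to-Unicode tables, such as https://apps.timwhitlock.info/emoji/tables/unicode. Let's take one smiley, https://apps.timwhitlock.info/unicode/inspect/hex/1F601, as the test subject.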

1. Emoji character to code

【C#】

Encoding.UTF32.GetBytes("😁") -> ["1", "f6", "1", "0"]  (bytes shown in hex, little-endian order)

【js】

"😁".codePointAt(0).toString(16) -> 1f601

【java】

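  // getBytes(String) throws UnsupportedEncodingException; declare or catch it
  byte[] bytes = "😁".getBytes("UTF-32");
  System.out.println(getBytesCode(bytes)); // \x0\x1\xf6\x1 (big-endian)

  // Format each byte as \x followed by its hex value
  private static String getBytesCode(byte[] bytes) {
    StringBuilder code = new StringBuilder();
    for (byte b : bytes) {
      // Mask with 0xff so the signed byte is read as an unsigned value
      code.append("\\x").append(Integer.toHexString(b & 0xff));
    }
    return code.toString();
  }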

The UTF-32 results agree: each one yields the code point 1f601.
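The C# array above lists the bytes in little-endian order, while Java's UTF-32 encoder writes them big-endian; both carry the same value 0x1f601. A minimal sketch of that difference in JS, assuming a hypothetical helper named utf32Bytes built on DataView (not part of the original code):

```javascript
// Hypothetical helper: encode one code point as 4 UTF-32 bytes (hex strings).
function utf32Bytes(codePoint, littleEndian) {
  const view = new DataView(new ArrayBuffer(4));
  // The third argument selects the byte order used when writing the 32-bit value.
  view.setUint32(0, codePoint, littleEndian);
  const bytes = new Uint8Array(view.buffer);
  return Array.from(bytes, (b) => b.toString(16));
}

const grin = "😁".codePointAt(0);
console.log(utf32Bytes(grin, true));  // [ '1', 'f6', '1', '0' ]  little-endian, like the C# output
console.log(utf32Bytes(grin, false)); // [ '0', '1', 'f6', '1' ]  big-endian, like the Java output
```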

【C#】

Encoding.UTF8.GetBytes("😁") -> ["f0", "9f", "98", "81"]

【js】

encodeURIComponent("😁") -> %F0%9F%98%81

The UTF-8 results agree: both give the bytes f0 9f 98 81.
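To get the raw UTF-8 bytes in JS rather than a percent-encoded string, TextEncoder can be used. A rough sketch, where utf8Hex and percentEncode are hypothetical helper names introduced only for illustration:

```javascript
// Hypothetical helper: UTF-8 bytes of a string as hex strings.
function utf8Hex(text) {
  // TextEncoder always encodes to UTF-8.
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16));
}

// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncode(text) {
  return utf8Hex(text)
    .map((h) => "%" + h.toUpperCase().padStart(2, "0"))
    .join("");
}

console.log(utf8Hex("😁")); // [ 'f0', '9f', '98', '81' ]
console.log(percentEncode("😁") === encodeURIComponent("😁")); // true
```

Note that percentEncode escapes every byte, so it matches encodeURIComponent only for characters that encodeURIComponent escapes, such as emoji; unreserved ASCII like "a" is left as-is by encodeURIComponent.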

2. Code to emoji character

【js】

String.fromCodePoint(0x1f601) -> 😁  (the argument is the code point, i.e. the UTF-32 value)

【java】 

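  String emojiName = "1f601"; // the code point takes 4 bytes in UTF-32
  int emojiCode = Integer.parseInt(emojiName, 16);
  byte[] emojiBytes = int2bytes(emojiCode);
  // "UTF-32" without a BOM is decoded as big-endian, which matches int2bytes
  String emojiChar = new String(emojiBytes, "UTF-32");
  System.out.println(emojiChar); // 😁
  // Simpler alternative: new String(Character.toChars(emojiCode))

  // Split an int into 4 bytes, most significant byte first (big-endian)
  public static byte[] int2bytes(int num) {
    byte[] result = new byte[4];
    result[0] = (byte) ((num >>> 24) & 0xff); // highest byte
    result[1] = (byte) ((num >>> 16) & 0xff);
    result[2] = (byte) ((num >>> 8) & 0xff);
    result[3] = (byte) (num & 0xff);          // lowest byte
    return result;
  }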
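The same hex-name-to-character step can be sketched in JS. This is only an illustration: emojiFromHex and surrogatePair are hypothetical helpers, and the second one shows the standard UTF-16 surrogate pair arithmetic that String.fromCodePoint performs internally for code points of 0x10000 and above.

```javascript
// Hypothetical helper: turn a hex code point name such as "1f601" into its character.
function emojiFromHex(hexName) {
  const codePoint = Number.parseInt(hexName, 16);
  // Reject input that is not a number or lies outside the Unicode range.
  if (Number.isNaN(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a valid code point: " + hexName);
  }
  return String.fromCodePoint(codePoint);
}

// Hypothetical helper: build the character by hand from its UTF-16 surrogate pair.
// Only meaningful for code points in the range 0x10000..0x10ffff.
function surrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return String.fromCharCode(high, low);
}

console.log(emojiFromHex("1f601"));           // 😁
console.log(surrogatePair(0x1f601) === "😁"); // true
```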

C# example: converting between Chinese characters and Unicode escapes

// Requires: using System.Text; using System.Text.RegularExpressions;

/// <summary>
/// Converts a string to \uXXXX escapes, one per UTF-16 code unit.
/// </summary>
/// <param name="source">Source string</param>
/// <returns>The string encoded as Unicode escapes</returns>
public static string String2Unicode(string source)
{
    // Encoding.Unicode is UTF-16 little-endian: low byte first, then high byte
    byte[] bytes = Encoding.Unicode.GetBytes(source);
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < bytes.Length; i += 2)
    {
        // Emit the high byte before the low byte, each as two hex digits
        stringBuilder.AppendFormat("\\u{0}{1}", bytes[i + 1].ToString("x2"), bytes[i].ToString("x2"));
    }
    return stringBuilder.ToString();
}

/// <summary>
/// Converts \uXXXX escapes back to a string.
/// </summary>
/// <param name="source">String containing Unicode escapes</param>
/// <returns>The decoded string</returns>
public static string Unicode2String(string source)
{
    // Replace each \uXXXX escape with the UTF-16 code unit it names
    return new Regex(@"\\u([0-9A-F]{4})", RegexOptions.IgnoreCase | RegexOptions.Compiled).Replace(
        source, x => Convert.ToChar(Convert.ToUInt16(x.Groups[1].Value, 16)).ToString());
}
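Because these two methods work on UTF-16 code units, an emoji such as 😁 comes out as two escapes (a surrogate pair) rather than one. A minimal JS sketch of the same round trip, with toUnicodeEscapes and fromUnicodeEscapes as hypothetical counterparts of String2Unicode and Unicode2String:

```javascript
// Hypothetical JS counterpart of String2Unicode.
function toUnicodeEscapes(text) {
  let out = "";
  // Index by UTF-16 code unit on purpose, so an emoji becomes two \uXXXX escapes.
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Hypothetical JS counterpart of Unicode2String.
function fromUnicodeEscapes(escaped) {
  // Each escape is turned back into one code unit; adjacent surrogates
  // recombine into a single character in the resulting string.
  return escaped.replace(/\\u([0-9a-f]{4})/gi, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

console.log(toUnicodeEscapes("中😁"));                    // \u4e2d\ud83d\ude01
console.log(fromUnicodeEscapes("\\u4e2d\\ud83d\\ude01")); // 中😁
```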

References:

https://www.jianshu.com/p/8a416537deb3

https://blog.csdn.net/a19881029/article/details/13511729

https://apps.timwhitlock.info/emoji/tables/unicode
