
Repost >> Sizes of Chinese and English characters under ASCII, UTF8, and Unicode encodings

 goodwangLib 2018-01-26



 

Code example:

using System;
using System.Text;

internal static class Program {
    private static void Main() {
        ShowCode();
    }

    // Encodes each sample string as ASCII, UTF8, and Unicode
    // (Encoding.Unicode is little-endian UTF-16 in .NET), then prints
    // the byte count, the raw bytes, and the decoded round-trip string.
    private static void ShowCode() {
        string[] strArray = { "b", "abcd", "乙", "甲乙丙丁" };
        byte[] buffer;
        string mode, back;

        foreach (string str in strArray) {
            for (int i = 0; i <= 2; i++) {
                if (i == 0) {
                    // Non-ASCII characters are replaced with '?' (byte 63).
                    buffer = Encoding.ASCII.GetBytes(str);
                    back = Encoding.ASCII.GetString(buffer, 0, buffer.Length);
                    mode = "ASCII";
                } else if (i == 1) {
                    buffer = Encoding.UTF8.GetBytes(str);
                    back = Encoding.UTF8.GetString(buffer, 0, buffer.Length);
                    mode = "UTF8";
                } else {
                    buffer = Encoding.Unicode.GetBytes(str);
                    back = Encoding.Unicode.GetString(buffer, 0, buffer.Length);
                    mode = "Unicode";
                }

                Console.WriteLine("Mode: {0}, String: {1}, Buffer.Length: {2}",
                    mode, str, buffer.Length);

                // Print the bytes in decimal on the same line as the label.
                Console.Write("Buffer: ");
                for (int j = 0; j < buffer.Length; j++) {
                    Console.Write(buffer[j] + " ");
                }

                Console.WriteLine("\nRetrieved: {0}\n", back);
            }
        }
    }
}

Output:

Mode: ASCII, String: b, Buffer.Length: 1
Buffer: 98
Retrieved: b

Mode: UTF8, String: b, Buffer.Length: 1
Buffer: 98
Retrieved: b

Mode: Unicode, String: b, Buffer.Length: 2
Buffer: 98 0
Retrieved: b

Mode: ASCII, String: abcd, Buffer.Length: 4
Buffer: 97 98 99 100
Retrieved: abcd

Mode: UTF8, String: abcd, Buffer.Length: 4
Buffer: 97 98 99 100
Retrieved: abcd

Mode: Unicode, String: abcd, Buffer.Length: 8
Buffer: 97 0 98 0 99 0 100 0
Retrieved: abcd

Mode: ASCII, String: 乙, Buffer.Length: 1
Buffer: 63
Retrieved: ?

Mode: UTF8, String: 乙, Buffer.Length: 3
Buffer: 228 185 153
Retrieved: 乙

Mode: Unicode, String: 乙, Buffer.Length: 2
Buffer: 89 78
Retrieved: 乙

Mode: ASCII, String: 甲乙丙丁, Buffer.Length: 4
Buffer: 63 63 63 63
Retrieved: ????

Mode: UTF8, String: 甲乙丙丁, Buffer.Length: 12
Buffer: 231 148 178 228 185 153 228 184 153 228 184 129
Retrieved: 甲乙丙丁

Mode: Unicode, String: 甲乙丙丁, Buffer.Length: 8
Buffer: 50 117 89 78 25 78 1 78
Retrieved: 甲乙丙丁
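
In the ASCII runs above, every Chinese character silently becomes byte 63 ('?'), so the loss only shows up when the string is read back. As a minimal sketch, assuming you would rather detect that loss than print question marks, .NET can hand out an ASCII encoding that throws on unrepresentable characters. The class name StrictAscii and the helper TryEncodeAscii are hypothetical names made up for this illustration; they are not part of the original post.

```csharp
using System;
using System.Text;

public static class StrictAscii {
    // ASCII encoding that throws instead of substituting '?' for
    // characters it cannot represent.
    private static readonly Encoding Strict = Encoding.GetEncoding(
        "us-ascii",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    // Hypothetical helper: returns true and the encoded bytes when every
    // character in text is ASCII; returns false when anything would be lost.
    public static bool TryEncodeAscii(string text, out byte[] bytes) {
        try {
            bytes = Strict.GetBytes(text);
            return true;
        } catch (EncoderFallbackException) {
            bytes = Array.Empty<byte>();
            return false;
        }
    }

    public static void Main() {
        foreach (string s in new[] { "abcd", "甲乙丙丁" }) {
            bool ok = TryEncodeAscii(s, out byte[] bytes);
            Console.WriteLine("{0}: {1}", s,
                ok ? bytes.Length + " bytes" : "not representable in ASCII");
        }
    }
}
```

The default Encoding.ASCII uses a replacement fallback, which is why the original program prints '?' rather than failing.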

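Conclusions:

1 ASCII cannot store Chinese characters (as if anyone didn't know =_-`). Each one is replaced with '?' (byte 63).
2 UTF8 is a variable-length encoding. For ASCII characters it is the more compact choice: each takes 1 byte, with the same value and length as in ASCII. Unicode (in .NET, Encoding.Unicode means little-endian UTF-16) takes 2 bytes for each ASCII character, and the second byte is zero.
3 UTF8 needs 3 bytes for each of the Chinese characters tested here, while Unicode (UTF-16) needs only 2. Characters outside the Basic Multilingual Plane, such as rare CJK extension ideographs, take 4 bytes in both encodings.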
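
The size comparisons for UTF8 and Unicode can also be checked without building the byte arrays, using Encoding.GetByteCount. The following is only a sketch: the class name EncodingSizes, the method Measure, and the sample strings are assumptions chosen for illustration. It adds one character outside the Basic Multilingual Plane (U+20000) as an assumed extra test case, to show where the "3 bytes versus 2 bytes" pattern for Chinese text stops holding.

```csharp
using System;
using System.Text;

public static class EncodingSizes {
    // Returns the encoded size of text in bytes for UTF-8 and for
    // UTF-16 LE (which .NET exposes as Encoding.Unicode).
    public static (int Utf8, int Utf16) Measure(string text) {
        return (Encoding.UTF8.GetByteCount(text),
                Encoding.Unicode.GetByteCount(text));
    }

    public static void Main() {
        // "\U00020000" is a CJK Extension B ideograph; a .NET string
        // stores it as a surrogate pair (two char values).
        string[] samples = { "b", "乙", "甲乙丙丁", "\U00020000" };
        foreach (string s in samples) {
            var (utf8, utf16) = Measure(s);
            Console.WriteLine("{0}: UTF8={1}, Unicode={2}", s, utf8, utf16);
        }
    }
}
```

For "乙" this reports 3 and 2 bytes, matching the run above, while the U+20000 sample needs 4 bytes in both encodings.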

 
