Character encoding basics: how Unicode, UTF-8, ASCII, and GB2312 relate and convert
Character encoding is a cornerstone of computing, and to really master computers you need to understand it. People who have never looked into it may not care, but these terms can be genuinely confusing, and understanding them matters if you want to learn about computers. I have gradually picked up some knowledge in this area myself.
1. ASCII code
Inside a computer, all information is ultimately represented as a string of binary digits. Each binary digit (bit) has two states, 0 and 1, so eight bits can be combined into 256 states; such a group of eight bits is called a byte. In other words, one byte can represent 256 different states, each corresponding to one symbol, for 256 symbols in total, from 00000000 to
In the 1960s, the United States developed a character encoding that standardized the relationship between English characters and binary digits. It is called ASCII and is still in use today.
ASCII specifies a total of 128 characters. For example, the space character (SPACE) is 32 (binary 00100000), and the uppercase letter A is 65 (binary 01000001). These 128 symbols, including 32 control symbols that cannot be printed, occupy only the last 7 bits of a byte; the first bit is always set to 0. For the full table, see this webpage: /code/ascii/all/
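The mapping described above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `ascii_bits` is made up for illustration, and it relies on Python's built-in `ord` and `format` to show each code as an 8-bit binary string.

```python
def ascii_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")

for ch in (" ", "A"):
    print(repr(ch), ord(ch), ascii_bits(ch))
# ' ' 32 00100000
# 'A' 65 01000001

# Codes 0 through 31 are the non-printable control symbols.
control_count = sum(1 for code in range(128) if code < 32)

# Every ASCII code fits in 7 bits, so the highest bit of the byte is 0.
high_bit_always_zero = all((code & 0b10000000) == 0 for code in range(128))
```

Both binary strings start with 0, which is the unused highest bit of the byte.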
2. Non-ASCII encoding
128 symbols are enough to encode English, but they are not enough to represent other languages. In French, for example, letters that carry accent marks above them cannot be represented in ASCII. As a result, some European countries decided to use the unused highest bit of the byte to encode new symbols. Fo
