Go has three basic data types: boolean, numeric and string. Each data type has a zero value that is automatically assigned when a variable is declared without an initial value.
len() returns the byte count, not the character count.
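A quick way to see the difference is to compare len with utf8.RuneCountInString from the standard unicode/utf8 package (a small sketch, using "Héllo" as an example string):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "Héllo"
	fmt.Println(len(s))                    // 6 – bytes ("é" takes two bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 5 – characters (runes)
}
```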
In Go, strings are encoded using UTF-8. In other words, a string represents a sequence of characters encoded as UTF-8 bytes.
UTF-8 supports a wide range of characters and symbols, and it uses a variable-length encoding scheme. This means a character can be represented by one or more bytes (up to 4), depending on its Unicode code point.
| First code point | Last code point | Byte 1   | Byte 2   | Byte 3   | Byte 4   |
|------------------|-----------------|----------|----------|----------|----------|
| 0                | 127             | 0yyyzzzz |          |          |          |
| 128              | 2,047           | 110xxxyy | 10yyzzzz |          |          |
| 2,048            | 65,535          | 1110wwww | 10xxxxyy | 10yyzzzz |          |
| 65,536           | 1,114,111       | 11110uvv | 10vvwwww | 10xxxxyy | 10yyzzzz |
Explanation: in the byte patterns above, the fixed leading bits (0, 110, 1110, or 11110 for the first byte, 10 for every continuation byte) tell a decoder how many bytes the character occupies, while the letters u, v, w, x, y, and z are placeholders for the bits of the code point.
For example, the character "é" has a Unicode code point of U+00E9 (233 in decimal). Since 233 is in the range 128 to 2,047, it is represented in UTF-8 using two bytes: 11000011 10101001, i.e. 0xC3 0xA9, or 195 169 in decimal.
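You can check those bytes directly in Go:

```go
package main

import "fmt"

func main() {
	fmt.Println([]byte("é"))        // [195 169] – the two UTF-8 bytes 0xC3 0xA9
	fmt.Printf("%U %d\n", 'é', 'é') // U+00E9 233 – the code point itself
}
```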
When you iterate over a string using range, Go automatically decodes each UTF-8-encoded character and returns its byte index together with its Unicode code point:
```go
str := "Héllo" // [72 195 169 108 108 111]
for i, v := range str {
	fmt.Println(i, v)
}
// 0 72
// 1 233
// 3 108
// 4 108
// 5 111
```

Note that the index jumps from 1 to 3: range reports the byte offset of each character, and "é" occupies bytes 1 and 2.
If you do not care about memory (always uses 4 bytes per character) and want more flexibility (manipulating characters), just convert the string to a slice of runes:
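A minimal sketch of that conversion (the variable names are just illustrative):

```go
package main

import "fmt"

func main() {
	str := "Héllo"
	runes := []rune(str)       // each element is a full code point (4 bytes)
	fmt.Println(len(runes))    // 5 – one element per character
	runes[1] = 'e'             // characters can be replaced by index
	fmt.Println(string(runes)) // "Hello"
}
```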