
Basic Data Types

Go's basic data types fall into three groups: boolean, numeric, and string. Every type has a zero value, which a variable receives automatically when it is declared without an initial value.
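A minimal sketch of zero values in action. The variable names are arbitrary; each variable is declared without an initializer, so it holds its type's zero value:

```go
package main

import "fmt"

func main() {
	// None of these variables is given an initial value,
	// so each one holds the zero value of its type.
	var b bool
	var n int
	var f float64
	var s string

	fmt.Printf("%v %v %v %q\n", b, n, f, s) // false 0 0 ""
}
```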

Boolean

Keyword  Values
bool     true or false

Zero value: false.

Numeric

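Keyword       Size             Values
uint8 / byte  8-bit            0 to 255
uint16        16-bit           0 to 65535
uint32        32-bit           0 to 4294967295
uint64        64-bit           0 to 18446744073709551615
int8          8-bit            -128 to 127
int16         16-bit           -32768 to 32767
int32 / rune  32-bit           -2147483648 to 2147483647
int64         64-bit           -9223372036854775808 to 9223372036854775807
float32       32-bit           -3.4e+38 to 3.4e+38 (approx.)
float64       64-bit           -1.7e+308 to 1.7e+308 (approx.)
uint          32-bit / 64-bit  same as uint32 / uint64
int           32-bit / 64-bit  same as int32 / int64

byte is an alias for uint8, and rune is an alias for int32. The size of uint and int depends on the platform: 32 bits on 32-bit systems, 64 bits on 64-bit systems.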

Zero value: 0.

String

Keyword  Values
string   "anything surrounded by double quotes" or `anything surrounded by backticks`

Zero value: "".

Extra:

    • A string written with double quotes is an interpreted string literal. The compiler processes escape sequences such as \n, \t, \uXXXX, and \xNN: \n and \t become a newline and a tab, \xNN becomes the single byte with hex value NN, and \uXXXX becomes the UTF-8 bytes of code point U+XXXX. An interpreted string literal cannot span multiple lines in source code.
    • A string written with backticks is a raw string literal. Its content is kept exactly as written, with no escape processing: backslashes have no special meaning, so \n stays as the two characters \ and n. A raw string literal may span multiple lines and keeps all whitespace, except that carriage returns (\r) are discarded. It cannot contain a backtick.
    • Technically, a string is a read-only slice of bytes; indexing it returns a single byte (type byte, an alias for uint8):

    str := "abcd" // bytes: [97 98 99 100]
    
    fmt.Printf("str[0]: %v, type: %T\n", str[0], str[0]) // str[0]: 97, type: uint8
    
    // Indexing visits the string one byte at a time.
    for i := 0; i < len(str); i++ {
    	fmt.Println(i, str[i])
    }
    
    // 0 97
    // 1 98
    // 2 99
    // 3 100
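Because strings are read-only, assigning to str[0] does not compile. To change bytes, copy them into a []byte first. replaceFirstByte is a hypothetical helper written for this sketch, and it assumes a non-empty string:

```go
package main

import "fmt"

// replaceFirstByte is a hypothetical helper for this example. It returns a copy
// of s whose first byte is c; s itself cannot change. It assumes s is non-empty.
func replaceFirstByte(s string, c byte) string {
	b := []byte(s) // the conversion copies the bytes into a new, writable slice
	b[0] = c
	return string(b) // converting back copies again into a new string
}

func main() {
	str := "abcd"
	// str[0] = 'A' // does not compile: strings are immutable

	fmt.Println(replaceFirstByte(str, 'A'), str) // Abcd abcd
}
```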

    But many characters cannot be represented by a single byte:

    str := "é"
    fmt.Printf("str: %v, len: %v, bytes: %v\n", str, len(str), []byte(str)) // str: é, len: 2, bytes: [195 169]

    len() returns the byte count, not the character count.

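    Go source code is UTF-8, so string literals are stored as UTF-8 bytes. A string value can technically hold any bytes, but by convention it holds UTF-8 text, and character-aware operations such as range over a string decode it as UTF-8.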

    UTF-8 supports a wide range of characters and symbols, and it uses a variable-length encoding scheme. This means a character can be represented by one or more bytes (up to 4), depending on its Unicode code point.

    First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
    0                 127              0yyyzzzz
    128               2,047            110xxxyy  10yyzzzz
    2,048             65,535           1110wwww  10xxxxyy  10yyzzzz
    65,536            1,114,111        11110uvv  10vvwwww  10xxxxyy  10yyzzzz

    Explanation:

    For example, the character "é" has a Unicode code point of U+00E9 (233 in decimal). Since 233 is in the range 128 to 2047, it is represented in UTF-8 using two bytes:

    str := "é" // [195 169]
    
    fmt.Printf("str[0]: %v\n", str[0]) // str[0]: 195
    fmt.Printf("str[1]: %v\n", str[1]) // str[1]: 169

    Proof (decoding):

    195 is 11000011 in binary, and 169 is 10101001.

    Matching the bytes using the table above:

    • First byte: 11000011 matches 110xxxyy -> drop the 110 prefix, keep 00011.
    • Second byte: 10101001 matches 10yyzzzz -> drop the 10 prefix, keep 101001.

    Concatenating the kept bits: 00011 + 101001 = 00011101001 (binary) = 233 (decimal).
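The same decoding can be done with bit masks: 0x1F (00011111) keeps the 5 payload bits of the first byte, 0x3F (00111111) keeps the 6 payload bits of the second, and shifting the first group left by 6 places the two groups side by side. decodeTwoByte is a hypothetical helper that only handles two-byte sequences and does no validation:

```go
package main

import "fmt"

// decodeTwoByte is a hypothetical helper for illustration. It assumes b0, b1 form
// a valid two-byte UTF-8 sequence (110xxxyy 10yyzzzz) and does not check that.
func decodeTwoByte(b0, b1 byte) rune {
	high := rune(b0 & 0x1F) // 110xxxyy -> 000xxxyy: keep 5 bits
	low := rune(b1 & 0x3F)  // 10yyzzzz -> 00yyzzzz: keep 6 bits
	return high<<6 | low    // append the 6 low bits after the 5 high bits
}

func main() {
	str := "é"
	fmt.Printf("%08b %08b\n", str[0], str[1]) // 11000011 10101001

	r := decodeTwoByte(str[0], str[1])
	fmt.Println(r, string(r)) // 233 é
}
```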

    Extra:

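    When you iterate over a string with range, Go decodes each UTF-8 character and returns its Unicode code point (a rune) along with the byte index where that character starts. This is why the index jumps from 1 to 3 after the two-byte "é":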

    str := "Héllo" // [72 195 169 108 108 111]
    
    for i, v := range str {
    	fmt.Println(i, v)
    }
    
    // 0 72
    // 1 233
    // 3 108
    // 4 108
    // 5 111

    If you need to manipulate individual characters (for example, index or replace them by position), convert the string to a slice of runes. This uses more memory, because every rune takes 4 bytes no matter how many bytes its UTF-8 encoding needs:

    str := []rune("Héllo") // [72 233 108 108 111]
    
    fmt.Printf("str[1]: %v, type: %T\n", str[1], str[1]) // str[1]: 233, type: int32
    fmt.Println(unsafe.Sizeof(str[0])) // 4 (so the 5 runes take 4x5 = 20 bytes)
    
    for i, v := range str {
    	fmt.Println(i, v)
    }
    
    // 0 72
    // 1 233
    // 2 108
    // 3 108
    // 4 111
    
    str[1] = 'e'
    fmt.Println(string(str)) // Hello
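Reversing a string is a typical manipulation that needs runes: reversing the bytes would split the two bytes of "é" and produce invalid UTF-8. reverse is a hypothetical helper for this sketch. It works on code points, so a combining accent written as a separate code point would end up attached to a different letter.

```go
package main

import "fmt"

// reverse is a hypothetical helper for illustration: it reverses s code point
// by code point, so multi-byte characters such as "é" stay intact.
func reverse(s string) string {
	r := []rune(s)
	// Swap from both ends toward the middle.
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println(reverse("Héllo")) // olléH
}
```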
