
Basic Data Types

Go has three categories of basic data types: boolean, numeric, and string. Every type has a zero value, which a variable holds when it is declared without an explicit initial value.

Boolean

Keyword    Values
bool       true or false

Zero value: false.

Numeric

Keyword       Size              Values
uint8/byte    8-bit             0 to 255
uint16        16-bit            0 to 65535
uint32        32-bit            0 to 4294967295
uint64        64-bit            0 to 18446744073709551615
int8          8-bit             -128 to 127
int16         16-bit            -32768 to 32767
int32/rune    32-bit            -2147483648 to 2147483647
int64         64-bit            -9223372036854775808 to 9223372036854775807
float32       32-bit            about -3.4e+38 to 3.4e+38
float64       64-bit            about -1.8e+308 to 1.8e+308
uint          32-bit or 64-bit  same range as uint32 or uint64, depending on the platform
int           32-bit or 64-bit  same range as int32 or int64, depending on the platform

Zero value: 0.

String

Keyword    Values
string     "text in double quotes" or `backticks`

Zero value: "".
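
As a quick check of the zero values listed above, this sketch declares one variable of each kind without initializing it. The function name zeroValues is illustrative only.

```go
package main

import "fmt"

// zeroValues declares one variable of each basic kind without
// initializing it and returns whatever Go assigned.
func zeroValues() (bool, int, float64, string) {
	var b bool
	var i int
	var f float64
	var s string
	return b, i, f, s
}

func main() {
	b, i, f, s := zeroValues()
	// %q quotes the string so the empty value is visible.
	fmt.Printf("%v %v %v %q\n", b, i, f, s) // false 0 0 ""
}
```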

Extra:

    • Double-quoted strings are interpreted string literals. Escape sequences like \n, \t, \uXXXX, and \xNN are processed into their byte values. They cannot span multiple lines in source code.
    • Backtick strings are raw string literals. Content is preserved exactly as written, with no escape processing. They can span multiple lines in source code.
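
The difference between the two literal forms shows up in the byte length. A minimal sketch, with literals as a hypothetical helper name:

```go
package main

import "fmt"

// literals returns the same characters written as an interpreted
// literal and as a raw literal.
func literals() (string, string) {
	interpreted := "a\tb\n" // \t and \n become a tab and a newline
	raw := `a\tb\n`         // backslashes are kept as written
	return interpreted, raw
}

func main() {
	interpreted, raw := literals()
	fmt.Println(len(interpreted), len(raw)) // 4 6
	fmt.Printf("%q\n", interpreted)         // "a\tb\n"
	fmt.Printf("%q\n", raw)                 // "a\\tb\\n"
}
```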
  • Technically, a string is a read-only slice of bytes:

    str := "abcd" // [97 98 99 100]
    
    fmt.Printf("str[0]: %v, type: %T\n", str[0], str[0]) // str[0]: 97, type: uint8
    
    for i, v := range str {
    	fmt.Println(i, v)
    }
    
    // 0 97
    // 1 98
    // 2 99
    // 3 100

    However, not every character is a single byte. Some characters require more than one byte:

    str := "é"
    fmt.Printf("str: %v, len: %v, bytes: %v", str, len(str), []byte(str))
    // str: é, len: 2, bytes: [195 169]

    len() returns the byte count, not the character count.
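
To count characters (code points) rather than bytes, the standard unicode/utf8 package provides RuneCountInString. A short sketch, where counts is an illustrative helper and \u00e9 is the escape for é:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the code point count of s.
func counts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	byteLen, runeLen := counts("H\u00e9llo") // "Héllo"
	fmt.Println(byteLen, runeLen)            // 6 5
}
```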

    This is because Go source code is UTF-8, so string literals hold UTF-8 encoded text, and UTF-8 is a variable-length encoding. A character takes 1 to 4 bytes depending on its Unicode code point:

    First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
    0                 127              0yyyzzzz
    128               2,047            110xxxyy  10yyzzzz
    2,048             65,535           1110wwww  10xxxxyy  10yyzzzz
    65,536            1,114,111        11110uvv  10vvwwww  10xxxxyy  10yyzzzz
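
Each row of the table can be observed with utf8.EncodeRune. The sketch below encodes one sample code point per row; the helper name encode and the sample characters are choices made for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode returns the UTF-8 bytes for a single code point.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest encoding
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	// One sample code point from each row of the table.
	samples := []rune{'A', '\u00e9', '\u20ac', '\U0001F600'}
	for _, r := range samples {
		fmt.Printf("%U %v\n", r, encode(r))
	}
	// U+0041 [65]
	// U+00E9 [195 169]
	// U+20AC [226 130 172]
	// U+1F600 [240 159 152 128]
}
```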

    Explanation:

    The character é has Unicode code point U+00E9 (233). Since 233 falls in the range 128-2047, it uses two bytes in UTF-8:

    str := "é" // [195 169]
    
    fmt.Printf("str[0]: %v\n", str[0]) // str[0]: 195
    fmt.Printf("str[1]: %v\n", str[1]) // str[1]: 169

    Proof (decoding):

    195 is 11000011 and 169 is 10101001.

    Matching against the second row of the table above:

    • First byte: 11000011 (110xxxyy) -> prefix 110, data bits 00011.
    • Second byte: 10101001 (10yyzzzz) -> prefix 10, data bits 101001.

    Combining 00011 and 101001 gives 00011101001 in binary, which equals 233 in decimal.

    A string does not always hold UTF-8 encoded bytes. Strings can contain arbitrary bytes. A string literal holds valid UTF-8 unless it contains byte-level escapes such as \xNN that break the encoding.

    // From raw bytes:
    var1 := string([]byte{0, 1, 2, 65, 195, 255, 195, 169, 0xff, 0xfd})
    fmt.Printf("var1: %v, len: %v, bytes: %v", var1, len(var1), []byte(var1))
    // var1: A��é��, len: 10, bytes: [0 1 2 65 195 255 195 169 255 253]
    // (bytes 0, 1, 2 are invisible control characters; exact rendering depends on the terminal)
    
    // From byte-level escape sequences:
    var2 := "\xbd\x20\xb2\x3d\xbc\x48"
    fmt.Printf("var2: %v, len: %v, bytes: %v", var2, len(var2), []byte(var2))
    // var2: � �=�H, len: 6, bytes: [189 32 178 61 188 72]
    
    // From a regular string literal:
    var3 := "Hello"
    fmt.Printf("var3: %v, len: %v, bytes: %v", var3, len(var3), []byte(var3))
    // var3: Hello, len: 5, bytes: [72 101 108 108 111]

    For example, in var1 ([0 1 2 65 195 255 195 169 255 253]), some bytes form valid UTF-8 sequences and render as readable characters, such as A and é:

    • A has Unicode code point 65, which is in the 0 to 127 range. In UTF-8, this range uses one byte, so 65 maps directly to A.
    • é has Unicode code point 233, which is in the 128 to 2047 range. In UTF-8, this uses two bytes. In var1, these are 195 and 169, which together decode to 233.

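    Bytes 0, 1, and 2 are valid single-byte code points, but they are non-printable control characters. The remaining bytes (the first 195, both 255s, and 253) do not form valid UTF-8 sequences and are shown as replacement characters (�).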

  • Iterating with range:

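    When iterating with range, Go decodes one UTF-8 character per iteration and yields its Unicode code point as a rune (not the raw byte value). The index is the byte offset where the character starts, so it jumps from 1 to 3 in the output below because é occupies bytes 1 and 2: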

    str := "Héllo" // [72 195 169 108 108 111]
    
    for i, v := range str {
    	fmt.Println(i, v)
    }
    
    // 0 72
    // 1 233
    // 3 108
    // 4 108
    // 5 111
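
For contrast, the following sketch puts the three common ways of walking a string side by side: range (code points with byte offsets), an index loop (raw bytes), and a []rune conversion (code points with consecutive indexes). The helper runeOffsets is illustrative only.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each character of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "H\u00e9llo" // "Héllo"

	// range: one iteration per character, index is a byte offset.
	fmt.Println(runeOffsets(s)) // [0 1 3 4 5]

	// Index loop: one iteration per byte.
	for i := 0; i < len(s); i++ {
		fmt.Printf("%d ", s[i])
	}
	fmt.Println() // 72 195 169 108 108 111

	// []rune conversion: code points with indexes 0 through 4.
	fmt.Println([]rune(s)) // [72 233 108 108 111]
}
```

Converting to []rune allocates a new slice, so range is usually the cheaper choice when only a single pass over the characters is needed.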
