Optimize Your Struct with Alignment Trick

Aug 25, 2023

In this talk, I’m going to walk through how struct memory layout works and cover concepts like alignment, offsets, and padding.

“Ugh, why do we need to learn about these boring struct details?”

I get it, at first this seems a little boring. But to show why these details actually matter for performance, let’s look at the example below.

We’ve got two structs here containing the same fields, just ordered differently:

type StructA struct { // 32 bytes
  A byte
  B int32
  C byte
  D int64
  E byte
}

type StructB struct { // 16 bytes
  B int32
  A byte
  C byte
  E byte
  D int64
}

Okay, get this: even though both structs have the exact same fields, StructA takes up 32 bytes, while StructB uses only half that space, 16 bytes.

Let’s say you have 50 StructA instances. That'll eat up 1600 bytes (1.6 kB) of memory. Not too bad, right? But those same 50 instances as StructB would only use 800 bytes (0.8 kB).

I know 50 doesn’t sound massive, but imagine this at a larger scale, with bigger structs and all. That difference can really start to impact your memory usage and performance in a hidden way.

What is alignment?

When experts mention alignment, they really mean the “required alignment” or “alignment guarantee” for a type.

What does that mean? Well, say a type has an alignment value of N bytes. The compiler will enforce a rule that any variables of that type must live at a memory address that’s a multiple of N.

If that sounds abstract, take the int32 type, which has an alignment of 4 bytes. This means int32 variables must be stored at memory addresses that are multiples of 4. So valid addresses would be 0, 4, 8, 12, and so forth.

Let’s see how this plays out in a struct:

type StructA struct {
  A byte
  B int32
  C byte
  D int64
  E int32
}

var s StructA

// 4
fmt.Println("align of int32", unsafe.Alignof(s.B))

// 4
fmt.Println("address offset 1:", unsafe.Offsetof(s.B))

// 24 (on a 64-bit platform)
fmt.Println("address offset 2:", unsafe.Offsetof(s.E))

Check out StructA above (note that E is an int32 in this version). We’ve got two int32 fields in there: B at offset 4 and E at offset 24. See how both those offsets are nice multiples of 4? That’s because int32 needs that 4-byte alignment.

The compiler’s making sure any int32 fields get slotted into address spots that work for them.

“Is that why our struct layouts seem so weird?”

Yep, you got it! Each type has its own alignment rules, so structs have to juggle those to keep their fields happy. That’s why you see padding inserted between fields and sometimes at the end of a struct.

“But why do we even need alignment?”

Great question. Modern CPUs are designed to access memory most efficiently when data is aligned to their word size (either 32 or 64 bits). An aligned value fits neatly inside one of those fixed-size chunks, so the CPU can typically fetch it in a single access instead of stitching it together from two.

You may have heard terms like “machine word,” “native word,” or just “word.” These refer to the amount of data a processor can handle in a single operation, depending on whether it’s a 32-bit or a 64-bit architecture.

To illustrate, say you’re adding two 64-bit integers on a 32-bit processor. That processor would have to break the addition into multiple steps, handling the data in 32-bit chunks. It can’t process the full 64 bits in one shot.

Now that we’ve got the basic idea of word sizes and alignment, let’s dive into how alignment works specifically for structs in memory.

Struct Alignment

Let’s assume 64-bit architecture unless I say otherwise.

As we discussed earlier, each field in a struct has its own alignment requirements. The struct’s job in laying out memory is to satisfy the alignment needs of all its fields.

“But what if a struct field is another struct instead of a primitive type? How does that work?”

A nested struct is treated like any other field: it brings its own alignment with it. And a struct’s alignment is determined by its field with the largest alignment requirement. For instance, if a struct contains an int64 field, it will need 8-byte alignment overall. That means any memory address used to store the struct must be a multiple of 8.

On the other hand, if the struct field with the biggest alignment is 4 bytes, like an int32, then the struct only needs 4-byte alignment.

What does struct alignment look like?

To understand how alignment works, nothing beats looking at real examples. Let’s examine StructA from the earlier example:

// 32 bytes
type StructA struct { 
    A byte   // 1-byte alignment
    B int32  // 4-byte alignment
    C byte   // 1-byte alignment
    D int64  // 8-byte alignment
    E byte   // 1-byte alignment
}

In StructA, the field with the largest alignment is D (an int64, with 8-byte alignment), so the struct itself gets 8-byte alignment.

Figure 1: StructA, Unoptimized Memory Layout

StructA consumes four 8-byte words: 8 * 4 = 32 bytes.

From the initial section of this talk, we know that a byte has an alignment of 1 byte, int32 aligns to 4 bytes, and int64 aligns to 8 bytes. To explain the memory layout of StructA, we can apply those rules field by field:

byte, with 1-byte alignment, can begin at any address: 0, 1, 2, 3,…
int32, with 4-byte alignment, begins at addresses like 0, 4, 8,…
int64, with 8-byte alignment, begins at addresses such as 0, 8, 16,… In our example, D begins at offset 16, because C already occupies offset 8 and the next multiple of 8 is 16.

“Oh I see the problem now, there are a lot of empty holes in the layout of StructA”

You’re right, those gaps created by the alignment rules are called “padding”.

Padding and optimizing padding

Padding’s added to make sure each struct field lines up properly in memory based on its needs, like we saw earlier.

But while it enables efficient access, padding can also waste space if the fields aren’t ordered well.

Now check out this optimized version of StructA:

type OptimizedStructA struct {
  D int64  // 8-byte alignment
  B int32  // 4-byte alignment
  A byte   // 1-byte alignment
  C byte   // 1-byte alignment
  E byte   // 1-byte alignment
}

Figure 2. OptimizedStructA Memory Layout

This guy only takes 16 bytes total, with just 1 wasted byte of padding. Way better.

We’ve looked at simple structs so far, but let’s dig into more complex scenarios — structs containing other structs, pointers, arrays, maps, and the like.

How Types Align

Each type in Go has its own alignment, as we’ve seen before, and this depends on the architecture of the processor:

bool, uint8, int8: align to 1 byte.
int16, uint16: align to 2-byte boundaries.
int32, uint32, float32: align to 4 bytes.
int64, uint64, float64: align to 8 bytes on 64-bit architectures (Go’s 32-bit targets typically align them to 4 bytes).
uint, int: these behave like uint32 and int32 on 32-bit architectures, and like uint64 and int64 on 64-bit architectures.
arrays: determined by the array’s element type.
structs: determined by the field with the largest alignment requirement.
Others (string, channel, pointers, etc.): Align to size of machine word.

What if we have a struct that contains an array with 5 elements of type int32, a byte field, and a pointer to a byte, like this:

type StructA struct {
    A [5]int32
    B byte
    BPtr *byte
}

Let us inspect each field:

A: an int32 takes 4 bytes, so 5 int32s take up 20 bytes (5 * 4). The array takes the alignment of its element type, which is 4 bytes.
B: the byte field takes 1 byte.
BPtr: the pointer takes up either 4 bytes (32-bit arch) or 8 bytes (64-bit arch), depending on the system architecture (we are assuming 64-bit here).

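Given the fields, plus 3 bytes of padding after B so that BPtr starts at offset 24, the total size of StructA will be either 28 bytes on 32-bit (20 + 1 + 3 + 4) or 32 bytes on 64-bit (20 + 1 + 3 + 8).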

And because a struct takes the alignment of its most strictly aligned field, the entire struct aligns to 8 bytes on 64-bit, matching the pointer.

“How about removing the pointer? Wouldn’t the struct align to 4 bytes then?”

You got it, without that pointer, the array’s alignment of 4 bytes for int32 would become the largest alignment.

“Do I really need to change my code just to apply this optimization?”

No, you shouldn’t overhaul working code just for this micro-optimization, unless you know memory issues are being caused by your struct specifically.

This talk is more to give you a sense of things when first creating a struct. Don’t strictly apply every tip if it hurts readability.

It’s a balance between readable code and optimizations.

Devtrovert
