From Novice to Pro with Go Benchmarking

By fine-tuning memory allocation and using parallel benchmarking to monitor CPU usage, developers can glean useful insights into code performance and spot areas that could use some work.

Nov 30, 2023

Remember that last article, ‘Take Golang Testing Beyond the Basics,’ where we looked at Golang’s testing package? Well, now it’s time to focus on something just as critical: benchmarking with the testing package.

Go Testing? All You Need to Know Here

Phuong Le

October 19, 2023


In basic terms, benchmarking lets you measure how well your software performs under different situations and with different types of data.

1. Benchmark Rules

Naturally, one of the first things we might be curious about is how to create a benchmark.

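Similar to writing tests, create a new file whose name ends with _test.go, say, type_test.go.
Define a function whose name starts with “Benchmark”, like BenchmarkReturnPointer, and that takes a single argument of type *testing.B.
Finally, put the code we want to measure inside a loop that runs b.N times. We don't pick b.N ourselves: the testing package keeps increasing it until the benchmark runs long enough (1 second by default, adjustable with -benchtime) to give a reliable per-operation timing.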
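To see that b.N is chosen by the testing package rather than by us, here is a minimal sketch that runs a benchmark outside go test with testing.Benchmark, which returns a testing.BenchmarkResult. The sumTo function, the sink variable, and the benchmarkSumTo name are hypothetical; in a real _test.go file the function would have to be named BenchmarkXxx for go test to discover it.

```go
package main

import (
	"fmt"
	"testing"
)

// sumTo is a hypothetical function to measure: it adds the numbers 1..n.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total += i
	}
	return total
}

// sink stores each result so the compiler cannot treat the work as dead code.
var sink int

// benchmarkSumTo follows the rules above: it takes *testing.B and runs the
// code under test b.N times. The testing package chooses b.N, not us.
func benchmarkSumTo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = sumTo(100)
	}
}

func main() {
	// testing.Benchmark calls benchmarkSumTo repeatedly with a growing b.N
	// until the run lasts long enough to time reliably.
	r := testing.Benchmark(benchmarkSumTo)
	fmt.Println("final b.N:", r.N)
	fmt.Println("ns/op:", r.NsPerOp())
}
```

The exact b.N and ns/op values depend on the machine, which is why the framework, not the author of the benchmark, decides how many iterations to run.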

Before diving in, let’s set the stage.

We aim to benchmark and compare the efficiency of returning a pointer versus returning a struct from a function. To make that happen, we first need to define our struct and related utilities.

// just a struct with 3 string fields
type Hero struct {
  Name        string
  Description string
  Class       string
}

func NewHero() Hero {
  return Hero{
    Name:        "Hero",
    Description: "Hero description",
    Class:       "Hero class",
  }
}

//go:noinline
func ReturnHero() Hero {
  h := NewHero()
  return h
}

//go:noinline
func ReturnHeroPtr() *Hero {
  h := NewHero()
  return &h
}

For this example, the code is pretty straightforward.

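But be cautious: the compiler might optimize things to the point where no memory allocation actually happens. For example, if ReturnHeroPtr is inlined into the benchmark loop and the pointer never leaves it, escape analysis can keep h on the stack.

To avoid that, we use the //go:noinline directive, which stops the compiler from inlining these two functions.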
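//go:noinline is one way to keep the allocation visible. A common complementary trick, shown here as an alternative rather than the approach this article uses, is to store the result in a package-level variable: a pointer assigned to a global always escapes, so the heap allocation still happens even if the call gets inlined. The names newHeroPtr and heroSink are hypothetical, and testing.AllocsPerRun is used only to count the allocations.

```go
package main

import (
	"fmt"
	"testing"
)

type Hero struct {
	Name        string
	Description string
	Class       string
}

// newHeroPtr is a hypothetical variant of ReturnHeroPtr without
// //go:noinline, so the compiler is free to inline it.
func newHeroPtr() *Hero {
	h := Hero{Name: "Hero", Description: "Hero description", Class: "Hero class"}
	return &h
}

// heroSink is package-level: storing a pointer in it forces the Hero onto
// the heap, so the allocation cannot be optimized away.
var heroSink *Hero

func main() {
	allocs := testing.AllocsPerRun(1000, func() {
		heroSink = newHeroPtr()
	})
	fmt.Println("allocs per call with a global sink:", allocs)
}
```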

Here are the benchmarks themselves:

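// These go in type_test.go together with the Hero code above;
// the file must also import "testing" right after its package clause.
func BenchmarkReturnStruct(b *testing.B) {
  for i := 0; i < b.N; i++ {
    _ = ReturnHero()
  }
}

func BenchmarkReturnPointer(b *testing.B) {
  for i := 0; i < b.N; i++ {
    _ = ReturnHeroPtr()
  }
}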

As for running these benchmarks, you’ve got options: you could either click the built-in ‘debug/run benchmark’ button in your IDE, or go old-school and use the command line.

My go-to method is the command line:

go test -bench=. ./medium/type_test.go

Now that you’ve got the hang of setting up benchmarks, let’s dig into what the results actually mean.

2. Benchmark Results

After running our benchmarks and gathering the data, let’s break down the results for a better understanding:

goos: darwin
goarch: arm64
BenchmarkReturnStruct-8         260851584                4.500 ns/op
BenchmarkReturnPointer-8        45171722                26.43 ns/op
PASS
ok      command-line-arguments  3.218s

So here is my simple explanation of what these metrics mean:

goos: darwin and goarch: arm64: These tell us the operating system and the computer architecture.
BenchmarkXxx-8: The -8 suffix is the value of GOMAXPROCS during the run, meaning Go could use up to 8 logical CPUs to execute goroutines simultaneously. It does not mean our sequential loop was split across 8 CPUs.
260851584: The number of iterations, that is, the final value of b.N the testing package settled on.
4.5 ns/op: On average, each iteration of returning a struct took about 4.5 nanoseconds, versus about 26.43 ns for returning a pointer.
ok command-line-arguments 3.218s: The entire run, including tests and benchmarks, took 3.218 seconds. command-line-arguments is the placeholder package name Go uses when we pass individual files instead of a package path.
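The ns/op column is the total measured time divided by the number of iterations. A small sketch using testing.BenchmarkResult, the struct that testing.Benchmark returns, filled in by hand with made-up numbers (not the results above) to show the arithmetic. The hypotheticalResult helper is our own; note that NsPerOp returns an integer, so the fraction is truncated, whereas the go test output prints decimals.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// hypotheticalResult builds a BenchmarkResult with invented numbers:
// 1,000,000 iterations that took 26.43ms in total.
func hypotheticalResult() testing.BenchmarkResult {
	return testing.BenchmarkResult{
		N: 1_000_000,
		T: 26430 * time.Microsecond,
	}
}

func main() {
	r := hypotheticalResult()
	// NsPerOp is T / N in nanoseconds: 26,430,000 / 1,000,000 = 26 (truncated).
	fmt.Println("ns/op:", r.NsPerOp()) // ns/op: 26
}
```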

3. Collect Allocations

To gain more insight into memory allocation during benchmarking, you can either call b.ReportAllocs() within your benchmark code or add the -benchmem flag when running from the command line.

Here's how you'd include b.ReportAllocs() in the benchmarks:

func BenchmarkReturnStruct(b *testing.B) {
  b.ReportAllocs()
  for i := 0; i < b.N; i++ {
    _ = ReturnHero()
  }
}

func BenchmarkReturnPointer(b *testing.B) {
  b.ReportAllocs()
  for i := 0; i < b.N; i++ {
    _ = ReturnHeroPtr()
  }
}

Let’s run this command:

go test -bench=. -benchmem ./medium/type_test.go

“Should I use both b.ReportAllocs and -benchmem?”

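No need for overkill: either one reports memory allocations. The difference is scope. b.ReportAllocs() turns on allocation reporting for that particular benchmark, while -benchmem turns it on for every benchmark in the run.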

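If you opt for b.ReportAllocs() or -benchmem, you'll see two additional columns in the output: B/op, the bytes allocated per operation, and allocs/op, the number of allocations per operation. In the testing.BenchmarkResult API, these values come from the AllocedBytesPerOp and AllocsPerOp methods.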

goos: darwin
goarch: arm64
BenchmarkReturnStruct-8         273821572                4.210 ns/op           0 B/op          0 allocs/op
BenchmarkReturnPointer-8        51452276                23.31 ns/op           48 B/op          1 allocs/op
PASS
ok      command-line-arguments  2.931s

Breaking down the BenchmarkReturnPointer-8 results, here's what you get:

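48 B/op: The number of bytes allocated on the heap per operation (AllocedBytesPerOp). On a 64-bit platform a string header takes 16 bytes, so a Hero with three string fields is exactly 48 bytes.
1 allocs/op: One heap allocation per operation (AllocsPerOp). Because ReturnHeroPtr returns &h, h escapes to the heap, whereas ReturnHero copies the value back to the caller and allocates nothing, hence its 0 B/op and 0 allocs/op.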

4. Table Driven Benchmark, Sub-benchmark, Parallel

Just like tests, benchmarks can have sub-benchmarks and run in parallel.

However, the way you write them is a bit different. For demonstration, let’s use the recursive Fibonacci algorithm to show how to use sub-benchmarks and parallel execution.

import "testing"

func fibRecursive(n int) int {
  if n < 2 {
    return n
  }

  return fibRecursive(n-1) + fibRecursive(n-2)
}

func BenchmarkFibonacciRecursive(b *testing.B) {
  testCases := []struct {
    name string
    n    int
  }{
    {name: "fib(10)", n: 10},
    {name: "fib(20)", n: 20},
    {name: "fib(30)", n: 30},
    {name: "fib(40)", n: 40},
  }

  for _, tc := range testCases {
    // b.Run blocks until the sub-benchmark finishes, so the closure can use tc directly.
    b.Run(tc.name, func(b *testing.B) {
      b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
          _ = fibRecursive(tc.n)
        }
      })
    })
  }
}

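The RunParallel function in benchmarks works quite differently from the Parallel function in tests, which I covered in a past article. t.Parallel only marks a test as safe to run alongside other parallel tests, while b.RunParallel runs the benchmark body in several goroutines at once (by default, as many as GOMAXPROCS) that share the b.N iterations between them.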

If you’re not familiar with the terms “sub-test”, “sub-benchmark” and “parallel”, you can refer back to my previous article.


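Also notice that inside RunParallel, the b.N loop is replaced by for pb.Next(). Each call to pb.Next() reports whether there are iterations left to run. The testing package hands out the b.N iterations across all goroutines, so the loops stop once their combined total reaches b.N.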

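In this example, the sub-benchmarks run one after another (fib(10), then fib(20), and so on), but within each sub-benchmark the iterations are spread across multiple goroutines running in parallel. As with any benchmark, each sub-benchmark is also executed several times while the testing package searches for a suitable b.N.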
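To check that the goroutines share b.N rather than each running b.N iterations, here is a sketch that runs a parallel benchmark through testing.Benchmark and counts every pass of the pb.Next() loop with an atomic counter (sync/atomic's Int64 type, available since Go 1.19). The iterations counter, the benchmarkFibParallel name, and the standalone setup are assumptions for illustration; after the run, the counter should equal the final b.N.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

func fibRecursive(n int) int {
	if n < 2 {
		return n
	}
	return fibRecursive(n-1) + fibRecursive(n-2)
}

// iterations counts loop bodies executed across all goroutines.
var iterations atomic.Int64

func benchmarkFibParallel(b *testing.B) {
	// testing.Benchmark calls this function several times with a growing
	// b.N, so reset the counter at the start of each run.
	iterations.Store(0)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_ = fibRecursive(10)
			iterations.Add(1)
		}
	})
}

func main() {
	r := testing.Benchmark(benchmarkFibParallel)
	// The goroutines split b.N between them, so these two numbers match.
	fmt.Println("final b.N:", r.N)
	fmt.Println("loop bodies executed:", iterations.Load())
}
```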

If you’re interested in learning about how benchmarking works, you can check out this resource: How Go Benchmark Actually Works.

Devtrovert
