From Novice to Pro with Go Benchmarking
By fine-tuning memory allocation and employing parallel benchmarking to monitor CPU usage, developers can glean useful insights about code performance and spot areas that could use some work.
Remember that last article, ‘Take Golang Testing Beyond the Basics,’ where we looked at Golang’s testing package? Well, now it’s time to focus on something just as critical: benchmarking with the testing package.
In basic terms, benchmarking lets you measure how well your software performs under different situations and with different types of data.
1. Benchmark Rules
Naturally, one of the first things we might be curious about is how to create a benchmark.
Similar to writing tests, create a new file whose name ends with _test.go, say, type_test.go.
Define a function whose name starts with “Benchmark”, like BenchmarkReturnPointer, and that takes a single argument of type *testing.B.
Finally, put the code we wish to benchmark inside a for loop that runs b.N times. The testing package chooses b.N for us, increasing it across runs until the measurement lasts long enough (1 second by default) to give reliable results.
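To make the three rules concrete, here is a minimal sketch. BenchmarkJoin and its strings.Join workload are hypothetical, chosen only for illustration, and b.ResetTimer() is included to show how setup work done before the loop can be excluded from the timing. So that the snippet runs on its own with go run, main calls testing.Benchmark, which runs a single benchmark function outside of go test; in a real project the benchmark would live in a _test.go file instead.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// BenchmarkJoin is a hypothetical benchmark that follows the rules:
// its name starts with "Benchmark", it takes a *testing.B, and the
// work being measured runs inside a loop of b.N iterations.
func BenchmarkJoin(b *testing.B) {
	// Setup that we do not want to measure.
	parts := make([]string, 100)
	for i := range parts {
		parts[i] = "x"
	}
	b.ResetTimer() // discard the time spent on setup so far

	for i := 0; i < b.N; i++ {
		_ = strings.Join(parts, ",")
	}
}

func main() {
	// testing.Benchmark runs one benchmark function outside `go test`
	// and returns a BenchmarkResult, which prints like a go test line.
	r := testing.Benchmark(BenchmarkJoin)
	fmt.Println(r)
}
```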
Before diving in, let’s set the stage.
We aim to benchmark and compare the efficiency of returning a pointer versus returning a struct from a function. To make that happen, we first need to define our struct and related utilities.
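// Hero is just a struct with three string fields.
type Hero struct {
	Name        string
	Description string
	Class       string
}

// NewHero builds a Hero value with fixed field values.
func NewHero() Hero {
	return Hero{
		Name:        "Hero",
		Description: "Hero description",
		Class:       "Hero class",
	}
}

// ReturnHero returns the struct by value (a copy).
//
//go:noinline
func ReturnHero() Hero {
	h := NewHero()
	return h
}

// ReturnHeroPtr returns a pointer to a local Hero, so h escapes to the heap.
//
//go:noinline
func ReturnHeroPtr() *Hero {
	h := NewHero()
	return &h
}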
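For this example, the code is pretty straightforward.
But be cautious: the compiler might optimize things to the point where no memory allocation actually happens. If ReturnHeroPtr were inlined into the benchmark loop, escape analysis could see that the pointer never leaves the loop and keep h on the stack.
To avoid that, we use the //go:noinline directive, which stops the compiler from inlining these two functions.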
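One way to check that the directive has the intended effect is testing.AllocsPerRun, which reports the average number of heap allocations per call of a function. The sketch below assumes the Hero helpers are copied unchanged into a standalone program. With //go:noinline in place, we would expect ReturnHero to report 0 allocations per call and ReturnHeroPtr to report 1. Building with go build -gcflags=-m is another option: it prints the compiler's inlining and escape-analysis decisions.

```go
package main

import (
	"fmt"
	"testing"
)

// Hero and its helpers mirror the definitions from the article.
type Hero struct {
	Name        string
	Description string
	Class       string
}

func NewHero() Hero {
	return Hero{
		Name:        "Hero",
		Description: "Hero description",
		Class:       "Hero class",
	}
}

//go:noinline
func ReturnHero() Hero {
	h := NewHero()
	return h
}

//go:noinline
func ReturnHeroPtr() *Hero {
	h := NewHero()
	return &h
}

func main() {
	// AllocsPerRun calls f once as a warm-up, then 100 more times,
	// and returns the average number of heap allocations per call.
	byValue := testing.AllocsPerRun(100, func() { _ = ReturnHero() })
	byPointer := testing.AllocsPerRun(100, func() { _ = ReturnHeroPtr() })
	fmt.Printf("ReturnHero: %.0f allocs/call, ReturnHeroPtr: %.0f allocs/call\n",
		byValue, byPointer)
}
```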
Here’s a straightforward way to write the benchmarks:
func BenchmarkReturnStruct(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = ReturnHero()
	}
}

func BenchmarkReturnPointer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = ReturnHeroPtr()
	}
}
As for running these benchmarks, you’ve got options: you could either click the built-in ‘debug/run benchmark’ button in your IDE, or go old-school and use the command line.
My go-to method is the command line:
go test -bench=. ./medium/type_test.go
Now that you’ve got the hang of setting up benchmarks, let’s dive into what the results actually mean.
2. Benchmark Results
After running our benchmarks and gathering the data, let’s break down the results for a better understanding:
goos: darwin
goarch: arm64
BenchmarkReturnStruct-8 260851584 4.500 ns/op
BenchmarkReturnPointer-8 45171722 26.43 ns/op
PASS
ok command-line-arguments 3.218s
So here is my simple explanation of what these metrics mean:
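goos: darwin and goarch: arm64: These tell us the operating system and the CPU architecture.
BenchmarkXxx-8: The -8 suffix means the benchmark was executed with GOMAXPROCS set to 8, so up to 8 logical CPUs could run Go code at the same time. Since these two benchmarks run their loop in a single goroutine, the value matters most for parallel benchmarks such as those using RunParallel.
260851584: This is the final value of b.N, i.e. how many times the benchmark loop was executed in the measured run.
4.500 ns/op: On average, each operation in the benchmark for returning a struct took about 4.5 nanoseconds.
ok command-line-arguments 3.218s: command-line-arguments is the package name Go uses when we pass a file instead of a package path. The entire run, including the tests and all the calibration runs the benchmarks needed to settle on b.N, took 3.218 seconds.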
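The ns/op column is essentially the total time spent in the measured run divided by the final b.N. The sketch below reproduces that arithmetic with testing.Benchmark, whose BenchmarkResult exposes N (the iteration count) and T (the total duration). The sumTo workload is hypothetical. Note that the NsPerOp() method returns a truncated integer, while the go test output prints fractional values such as 4.500 ns/op.

```go
package main

import (
	"fmt"
	"testing"
)

// sumTo is a hypothetical workload used only for illustration.
//
//go:noinline
func sumTo(n int) int {
	s := 0
	for i := 1; i <= n; i++ {
		s += i
	}
	return s
}

func BenchmarkSumTo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = sumTo(100)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkSumTo)
	// r.N is the final b.N (the iteration-count column) and r.T is
	// the total time measured across those N iterations.
	perOp := float64(r.T.Nanoseconds()) / float64(r.N)
	fmt.Printf("N=%d total=%v ns/op=%.3f NsPerOp()=%d\n",
		r.N, r.T, perOp, r.NsPerOp())
}
```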
3. Collect Allocations
To gain more insight into memory allocation during benchmarking, you can either call b.ReportAllocs() within your benchmark code or tack on the -benchmem flag when running from the command line.
Here's how you'd include b.ReportAllocs() in the benchmarks:
func BenchmarkReturnStruct(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = ReturnHero()
	}
}

func BenchmarkReturnPointer(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = ReturnHeroPtr()
	}
}
Let’s run this command:
go test -bench=. -benchmem ./medium/type_test.go
“Should I use both b.ReportAllocs and -benchmem?”
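No need for overkill; either one does the job of reporting memory allocation. The difference is scope: b.ReportAllocs() turns reporting on for the benchmark that calls it, while -benchmem turns it on for every benchmark in the run.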
Whichever you choose, you'll see two additional columns in the output: B/op and allocs/op (exposed in code as the AllocedBytesPerOp and AllocsPerOp methods of testing.BenchmarkResult).
goos: darwin
goarch: arm64
BenchmarkReturnStruct-8 273821572 4.210 ns/op 0 B/op 0 allocs/op
BenchmarkReturnPointer-8 51452276 23.31 ns/op 48 B/op 1 allocs/op
PASS
ok command-line-arguments 2.931s
Breaking down the BenchmarkReturnPointer-8 results, here's what you get:
48 B/op (AllocedBytesPerOp): the number of bytes allocated per operation.
1 allocs/op (AllocsPerOp): there was one heap allocation per operation during the benchmark.
BenchmarkReturnStruct-8, by contrast, reports 0 B/op and 0 allocs/op.
4. Table Driven Benchmark, Sub-benchmark, Parallel
Just like tests, benchmarks can have sub-benchmarks and run in parallel.
However, the way you write them is a bit different. For demonstration, let’s use the recursive Fibonacci algorithm to show how to employ sub-benchmarks and parallel execution.
func fibRecursive(n int) int {
	if n < 2 {
		return n
	}
	return fibRecursive(n-1) + fibRecursive(n-2)
}

func BenchmarkFibonacciRecursive(b *testing.B) {
	testCases := []struct {
		name string
		n    int
	}{
		{name: "fib(10)", n: 10},
		{name: "fib(20)", n: 20},
		{name: "fib(30)", n: 30},
		{name: "fib(40)", n: 40},
	}
	for _, tc := range testCases {
		// Before Go 1.22, closures that outlive a loop iteration needed a
		// `tc := tc` copy. b.Run blocks until the sub-benchmark finishes,
		// so the copy is not required here.
		b.Run(tc.name, func(b *testing.B) {
			// RunParallel spreads the b.N iterations across goroutines.
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					_ = fibRecursive(tc.n)
				}
			})
		})
	}
}
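The RunParallel method in benchmarks works quite differently from the t.Parallel method in tests, which I covered in a past article: t.Parallel marks a test to run alongside other parallel tests, whereas b.RunParallel runs the body of a single benchmark on several goroutines at once.
If you’re not familiar with the terms “sub-test”, “sub-benchmark” and “parallel”, you can refer back to my previous article.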
Also, notice that inside RunParallel the b.N loop is replaced by a for pb.Next() loop. pb.Next() reports whether there are more iterations to run, and the testing package manages the count for you.
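In the given example, the sub-benchmarks for the test cases run one after the other, but within each one, RunParallel starts (by default) GOMAXPROCS goroutines that share the b.N iterations between them. So the loop body runs on several goroutines at once, while the total number of iterations is still b.N.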
If you’re interested in learning about how benchmarking works, you can check out this resource: How Go Benchmark Actually Works.