How Go Benchmarks Actually Work
Gain a deeper understanding of how Go benchmarking works and optimize your code for maximum performance.
In my last piece, Go Benchmarking Essentials, I covered the core methods for measuring how well your functions perform, and topics like sub-benchmarks, parallel runs, and allocation tracking got their fair share of attention.
However, I left out how Go actually executes these benchmarks.
What’s the deal with b.N? How can you tweak your benchmarks to be more reliable? These are the areas we’ll dive into next.
1. How Go Runs Our Benchmark
If you’re here for a quick rundown, let’s cut to the chase.
“Does the benchmark just run our function a single time with a sky-high b.N, or what?”
Nope, the benchmark runs your function multiple times and carefully scales up b.N to offer more consistent and reliable results.
“So what’s the upper limit on this b.N, huh?”
The benchmark kicks off with b.N set at 1 and keeps escalating it until it either hits the target benchmark time or maxes out at 1e9 iterations.
So this approach gives you a comprehensive look at your function’s performance across different iteration counts. Pretty cool.
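To make that concrete, here’s a rough sketch of the ramp-up loop, loosely modeled on the logic inside the testing package. The exact heuristics (the ~20% padding, the 100x growth cap) are my approximation of current behavior and can differ between Go versions, so treat it as an illustration, not the real implementation:

// benchLoop is an illustrative sketch of how b.N is grown
// (assumes import "time"); it is NOT the real implementation.
func benchLoop(benchTime time.Duration, runN func(n int64) time.Duration) {
	n := int64(1)
	elapsed := runN(n) // first probe run with b.N = 1
	for elapsed < benchTime && n < 1e9 {
		prevNs := elapsed.Nanoseconds()
		if prevNs <= 0 {
			prevNs = 1
		}
		// Predict how many iterations would fill the target time,
		// then run roughly 20% longer than predicted...
		next := benchTime.Nanoseconds() * n / prevNs
		next += next / 5
		// ...growing at most 100x per round,
		if next > 100*n {
			next = 100 * n
		}
		// always running at least one more iteration,
		if next <= n {
			next = n + 1
		}
		// and never going beyond 1e9 iterations.
		if next > 1e9 {
			next = 1e9
		}
		n = next
		elapsed = runN(n)
	}
}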
Enjoy the Example
To better illustrate, let’s reuse a function from the basics article. We’re benchmarking a function that returns a pointer; here’s the context:
// just a struct with 3 string fields
type Hero struct {
	Name        string
	Description string
	Class       string
}

func NewHero() Hero {
	return Hero{
		Name:        "Hero",
		Description: "Hero description",
		Class:       "Hero class",
	}
}

//go:noinline
func ReturnHeroPtr() *Hero {
	h := NewHero()
	return &h
}
I’ll use a handy trick known as ‘time elapsed defer’ to log how long it takes along with the b.N values.
If you’re on Go 1.20 or newer, there’s also a convenient method called b.Elapsed() that you might find useful, but more on that later.
func timeElapsed(name string, loop int, current time.Time) {
	fmt.Printf("%s took %s with b.N = %d\n", name, time.Since(current), loop)
}
func BenchmarkReturnPointer(b *testing.B) {
	defer timeElapsed("BenchmarkReturnPointer", b.N, time.Now())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = ReturnHeroPtr()
	}
}
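For reference, the numbers below came from a plain go test invocation along these lines (the ./medium/type_test.go path is simply where my example file happens to live):

$ go test -bench=BenchmarkReturnPointer ./medium/type_test.go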
And if you’re not in the mood to sift through the details and just want a snapshot of the results, no problem. Here are the outcomes for BenchmarkReturnPointer:
BenchmarkReturnPointer took 833ns with b.N = 1
BenchmarkReturnPointer took 4.209µs with b.N = 100
BenchmarkReturnPointer took 429.417µs with b.N = 10000
BenchmarkReturnPointer took 48.229708ms with b.N = 1000000
BenchmarkReturnPointer took 862.395625ms with b.N = 24860826
BenchmarkReturnPointer took 1.203314417s with b.N = 34592074
So, it’s clear: the benchmark ran a total of 6 times, each time ramping up the value of b.N until the function’s execution time hovered around the 1-second mark, which is the default setting.
2. b.Elapsed() in Go 1.20
Right now, our method of tracking time is somewhat basic. Let’s enhance it with b.Elapsed(), a method introduced in Go 1.20.
This method is especially useful because it eliminates the need to import the time package just to measure the benchmark’s duration. Here’s a slight modification of our benchmark to incorporate it:
func BenchmarkReturnPointer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = ReturnHeroPtr()
	}
	fmt.Printf("%s took %s with b.N = %d\n", b.Name(), b.Elapsed(), b.N)
}
“Why didn’t you stick to using a defer statement with Println, like before?”
defer fmt.Printf("%s took %s with b.N = %d\n", b.Name(), b.Elapsed(), b.N)
Unfortunately, you can’t do that: the arguments of a deferred call are evaluated when the defer statement is executed, not when the deferred function actually runs, so b.Elapsed() would be captured as roughly zero before the loop even starts.
I’ve elaborated on this in my prior article titled ‘What you know about DEFER in Go is not enough’.
However, if you’re keen on using defer, there’s a workaround. Here’s how:
func timeElapsedV2(b *testing.B) {
	fmt.Printf("%s took %s with b.N = %d\n", b.Name(), b.Elapsed(), b.N)
}

// later in the code
defer timeElapsedV2(b)
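Putting the workaround in place, the benchmark itself then looks like this (same loop as before, only the logging moved into the deferred helper):

func BenchmarkReturnPointer(b *testing.B) {
	// b is a pointer, so b.Elapsed() and b.N are read
	// only when the deferred call actually runs.
	defer timeElapsedV2(b)
	for i := 0; i < b.N; i++ {
		_ = ReturnHeroPtr()
	}
}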
3. Customize our Benchmark
Up to this point, we’ve been running benchmarks with the default settings. Sometimes, though, we need a more controlled and complex environment to test our functions accurately.
-benchmem: Reporting memory allocations
If you’ve gone through the basics article, you already know about this flag. It adds extra columns to the benchmark report: B/op (bytes allocated per operation) and allocs/op (allocations per operation).
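For reference, enabling it is just a matter of adding the flag to the usual invocation (the file path is simply where my example lives):

$ go test -bench=. -benchmem ./medium/type_test.go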
-benchtime: adjust the benchmark duration
By default, Go runs the benchmark for around one second, but if you’re after more stable results, you can extend this duration to at least 3 seconds using the -benchtime flag:
go test -bench=. -benchmem -benchtime=3s ./medium/type_test.go
And here are the results:
BenchmarkReturnPointer took 417ns with b.N = 1
BenchmarkReturnPointer took 2.5µs with b.N = 100
BenchmarkReturnPointer took 261.291µs with b.N = 10000
BenchmarkReturnPointer took 32.863208ms with b.N = 1000000
BenchmarkReturnPointer took 2.231092708s with b.N = 100000000
BenchmarkReturnPointer took 3.634805833s with b.N = 161354492
“Why did the benchmark end up taking roughly 3.6 seconds?”
Benchmarks are not guaranteed to take an exact amount of time because they don’t run in a vacuum: CPU speed, background tasks, garbage collection, and so on can all influence the time taken. On top of that, the final b.N is only a prediction based on earlier runs, so the last run routinely overshoots the target.
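A hedged back-of-the-envelope check of the prediction idea from section 1 (the ~20% padding is my assumption about the internal heuristic): the run with b.N = 100000000 took about 2.23s, so to fill a 3s target the tool would predict roughly 100000000 × 3 / 2.23 ≈ 134 million iterations; padded by about 20%, that’s ≈ 161 million, which lines up with the 161354492 we actually got.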
With the -benchtime option, you can even pin b.N to an exact value using the ‘x’ suffix.
For example, to run the function 161354492 times:
$ go test -bench=. -benchmem -benchtime=161354492x ./medium/type_test.go
This gave us the following result:
BenchmarkReturnPointer took 3.587512708s with b.N = 161354492
-cpu: Adjusting CPU Usage
As I mentioned earlier, the -8 suffix in “BenchmarkReturnPointer-8” indicates that the benchmark ran with 8 logical CPUs available (GOMAXPROCS set to 8), which Go can take advantage of for parallel work.
To run the benchmark on different numbers of CPUs, you can use:
$ go test -bench=. -benchmem -benchtime=3s -cpu=1,2,4,8 ./medium/type_test.go
goos: darwin
goarch: arm64
BenchmarkReturnPointer 78372181 48.96 ns/op 48 B/op 1 allocs/op
BenchmarkReturnPointer-2 102945602 35.35 ns/op 48 B/op 1 allocs/op
BenchmarkReturnPointer-4 99063824 35.27 ns/op 48 B/op 1 allocs/op
BenchmarkReturnPointer-8 97751709 35.37 ns/op 48 B/op 1 allocs/op
You’ll see four different sets of results: BenchmarkReturnPointer, BenchmarkReturnPointer-2, BenchmarkReturnPointer-4, and BenchmarkReturnPointer-8.
“Why aren’t there big differences in the benchmark outcomes?”
In this specific case, since our function runs sequentially, the flag doesn’t have much of an impact: having more cores only really matters compared to having just a single core, and it makes little difference whether you’re using 2, 4, or 8 cores.
With only one core, the OS scheduler might not allocate enough processing time to our test, resulting in slower outcomes.
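Where -cpu does pay off is with parallel benchmarks (the pattern covered in the basics article). As a sketch, a b.RunParallel version of the same benchmark (the name BenchmarkReturnPointerParallel is just my illustration) would actually exercise multiple cores, so the per-CPU variants can show real differences:

func BenchmarkReturnPointerParallel(b *testing.B) {
	// RunParallel splits the b.N iterations across GOMAXPROCS goroutines,
	// so the -cpu values directly change how the work is spread out.
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			_ = ReturnHeroPtr()
		}
	})
}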
-count: Run each test/benchmark n times (default 1)
To make sure our benchmark results are dependable, it might be useful to run the tests multiple times.
If the elapsed time changes significantly across runs, it suggests that our function may be influenced by external variables, like random behavior or third-party services such as databases.
In those situations, the benchmarks might not be as reliable or useful as we’d hope.
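For example, to repeat the whole benchmark five times in a row (again, the path is just my example file):

$ go test -bench=. -benchmem -count=5 ./medium/type_test.go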
By fine-tuning your benchmark settings, you’re not just collecting data; you’re gathering insights that can lead to meaningful code improvements.