

Go Performance Numbers You Can Actually Trace Back to Something

Warning

All benchmarks on this site are synthetic. They measure isolated behavior of Go's runtime and standard library under controlled, artificial conditions. Your production workload has different allocation patterns, different concurrency, different GC pressure, and different hardware. A result that shows a 20% improvement in a tight benchmark loop may mean nothing in your service, or it may mean everything -- you won't know without profiling your own code. Use this data as a directional signal, not a deployment decision.

A performance guide without numbers is opinion dressed as engineering. I've been building goperf.dev for a while now as a companion to my Go optimization work, and for most of that time the advice there was exactly that: well-reasoned opinion backed by understanding of the runtime internals, but not by reproducible measurements across Go versions. That changes now.

I've added actual benchmark data covering Go 1.24, 1.25, and 1.26 across three platforms: Linux amd64, Linux arm64, and macOS arm64. Seventy-six benchmarks spanning runtime internals, standard library, and networking. Every number on the site traces back to a specific EC2 instance type, a specific kernel version, a specific commit, and a documented collection process. The data lives on the Go Version Performance Tracking page.
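To make the shape of that data concrete, here is a minimal sketch of the kind of micro-benchmark the tracking page aggregates. This is my illustrative stand-in, not the site's actual harness; `testing.Benchmark` simply lets the benchmark run outside `go test`:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// BenchmarkJoin measures strings.Join on a small fixed input -- an
// isolated, synthetic measurement of exactly the kind the warning
// above applies to.
func BenchmarkJoin(b *testing.B) {
	parts := []string{"a", "b", "c", "d"}
	for i := 0; i < b.N; i++ {
		_ = strings.Join(parts, ",")
	}
}

func main() {
	// testing.Benchmark runs the function standalone and reports
	// iterations and ns/op, like `go test -bench` would.
	r := testing.Benchmark(BenchmarkJoin)
	fmt.Printf("ran %d iterations\n", r.N)
}
```

In the real collection process each such result is tied to a Go version, platform, and commit; the snippet only shows what a single data point measures.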

When C++ Optimization Slows Down Your Go Code

Years of C++ experience instill habits. These habits serve you well in C++, but they can surprise you in Go. In C++, you preallocate everything, avoid unnecessary allocations, cache values aggressively, and think constantly about CPU cache misses. So when I rewrote a simple algorithm in Go (finding the number of days until the next warmer temperature) I reached for the same tricks. But this time, they backfired.

Here’s how applying familiar C++ optimizations ended up making my Go code slower and heavier.
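For context, the problem itself is the classic "daily temperatures" exercise. A plain Go version (my sketch, not the code from the post) uses a monotonic stack of indices and a single up-front allocation for the answer slice:

```go
package main

import "fmt"

// dailyTemperatures returns, for each day, how many days until a warmer
// temperature (0 if none). The stack holds indices whose answer is still
// unknown; every index is pushed and popped at most once, so the whole
// pass is O(n).
func dailyTemperatures(temps []int) []int {
	answer := make([]int, len(temps))
	stack := make([]int, 0, len(temps))
	for i, t := range temps {
		// Pop every cooler day: day i is its "next warmer" day.
		for len(stack) > 0 && temps[stack[len(stack)-1]] < t {
			j := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			answer[j] = i - j
		}
		stack = append(stack, i)
	}
	return answer
}

func main() {
	fmt.Println(dailyTemperatures([]int{73, 74, 75, 71, 69, 72, 76, 73}))
	// → [1 1 4 2 1 1 0 0]
}
```

The interesting part is not this baseline but what happens when you bolt C++-style preallocation and caching on top of it.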

Lazy initialization in Go using atomics

Aside from the main performance guide, I'm considering using the blog to share quick, informal insights and quirks related to Go performance and optimizations. Let's see if this casual experiment survives contact with reality.

Someone recently pointed out that my getResource() function using atomics has a race condition. Guilty as charged—rookie mistake, really. The issue? I naïvely set the initialized flag to true before the actual resource is ready. Brilliant move, right? This means that with concurrent calls, one goroutine might proudly claim victory while handing out a half-baked resource:

var initialized atomic.Bool
var resource *MyResource

func getResource() *MyResource {
    if !initialized.Load() {
        // BUG: the flag flips to true *before* resource is assigned,
        // so a concurrent caller can observe initialized == true and
        // return a nil (or half-built) resource.
        if initialized.CompareAndSwap(false, true) {
            resource = expensiveInit()
        }
    }
    return resource
}
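A minimal corrected sketch (with placeholder definitions for MyResource and expensiveInit, which the snippet above assumes): sync.Once is the idiomatic fix, since it guarantees the initializer runs exactly once and that its write to resource happens-before every caller's read.

```go
package main

import (
	"fmt"
	"sync"
)

// MyResource and expensiveInit stand in for the types the buggy
// snippet assumes; the real ones would come from the caller's code.
type MyResource struct{ value int }

func expensiveInit() *MyResource { return &MyResource{value: 42} }

var (
	once     sync.Once
	resource *MyResource
)

// getResource is safe for concurrent use: no caller can return before
// the initializer has fully finished.
func getResource() *MyResource {
	once.Do(func() { resource = expensiveInit() })
	return resource
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getResource() // never nil, never half-baked
		}()
	}
	wg.Wait()
	fmt.Println(getResource().value)
	// → 42
}
```

An atomic.Pointer with a mutex-guarded slow path would also work, but sync.Once expresses the intent directly and is what most Go code reaches for here.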