When C++ Optimization Slows Down Your Go Code¶
When you have years of C++ experience, you definitely obtain some habits. These habits are good for C++, but could cause you some surprises in Go. In C++, you usually preallocate everything, avoiding unnecessary allocations, caching values aggressively, and always thinking of CPU cache misses. So when I rewrote a simple algorithm in Go—finding the number of days until the next warmer temperature—I reached for the same tricks. But this time, they backfired.
Here’s how applying familiar C++ optimizations ended up making my Go code slower and heavier.
The C++ Context¶
The original problem: for each day, figure out how many days pass until a warmer temperature appears. A classic use case for a monotonic stack. Here's the performance data from a C++ implementation:
dailyTemperatures, basic implementation
std::vector<int> dailyTemperatures( const std::vector<int> &temperatures )
{
std::vector<int> result( temperatures.size(), 0 );
std::stack<int> s;
for( int i = 0; i < temperatures.size(); ++i ) {
while( !s.empty() && temperatures[i] > temperatures[s.top()] ) {
int prev = s.top();
s.pop();
result[prev] = i - prev;
}
s.push( i );
}
return result;
}
dailyTemperatures, optimized implementation
std::vector<int> dailyTemperaturesOpt( const std::vector<int> &temperatures )
{
std::vector<int> res( temperatures.size(), 0 );
std::vector<int> track; // (1)
track.reserve( temperatures.size() ); // (2)
for( int i = 0; i < temperatures.size(); ++i ) {
int currTemp = temperatures[i]; // (3)
while( !track.empty() && currTemp > temperatures[track.back()] ) {
int prev = track.back();
track.pop_back();
res[prev] = i - prev;
}
track.push_back( i );
}
return res;
}
-
std::vector
is used instead ofstd::stack
for better control and performance. Unlikestd::stack
,std::vector
allows preallocation, random access, and avoids the overhead of an adapter layer. It's more cache-friendly and directly exposes the underlying data layout. -
We reserve the full capacity of the
track
stack to avoid dynamic memory reallocations during growth. Since the number of pushed elements can never exceedtemperatures.size()
, this is a safe and efficient optimization. -
currTemp
holds the value oftemperatures[i]
so we don’t hit the same memory location multiple times inside the loop. While the access pattern is sequential and usually cache-friendly, reading the same slice element repeatedly adds work for the compiler and the CPU. By assigning it to a local variable, we give the compiler a clear signal that this value won’t change. That usually means it stays in a register, which can reduce instruction count and make the inner loop tighter—especially when you’re benchmarking or pushing for low-latency behavior.
Benchmark | Time per op (ns) |
---|---|
BM_DailyTemperatures/100000 | 206,340 |
BM_DailyTemperaturesOpt/100000 | 115,490 |
Standard C++ optimization tactics can cut the runtime almost in half. But even if something works in C++ pretty well, it does not mean that the same approach will not make your Go code slower.
Translating to Go¶
Here’s what a clean idiomatic Go version looks like:
func DailyTemperatures(temperatures []int) []int {
result := make([]int, len(temperatures))
var stack []int
for i, temp := range temperatures {
for len(stack) > 0 && temp > temperatures[stack[len(stack)-1]] {
prevIndex := stack[len(stack)-1]
stack = stack[:len(stack)-1]
result[prevIndex] = i - prevIndex
}
stack = append(stack, i)
}
return result
}
Pretty typical: no preallocation, straightforward stack growth via append.
I rewrote this with “optimizations”: preallocate the stack slice, replace variables early, and reduce slice bounds checks. Classic C++-style low-level thinking.
func DailyTemperaturesOpt(temperatures []int) []int {
n := len(temperatures)
result := make([]int, n)
stack := make([]int, 0, n/4) // (1)
for i := 0; i < n; i++ {
curr := temperatures[i] // (2)
for len(stack) > 0 && curr > temperatures[stack[len(stack)-1]] {
prev := stack[len(stack)-1]
stack = stack[:len(stack)-1]
result[prev] = i - prev
}
stack = append(stack, i)
}
return result
}
- The
stack
is preallocated with a capacity ofn/4
, not the fulln
, to strike a balance between reducing allocations and avoiding unnecessary heap growth. In Go, over-allocation can backfire due to GC sensitivity and increased memory footprint. A smaller initial capacity avoids paying for memory that may never be used. NOTE: The result will be even worse if we preallocatelen(temperatures)
here. curr
storestemperatures[i]
so we don’t keep looking it up inside the loop. Go’s compiler might optimize that on its own, but being explicit gives it less to guess about. It also helps the value stay in a register instead of bouncing back to memory, which can matter in tight loops. If you’re running benchmarks or chasing small gains, this kind of local caching can reduce register pressure and cut down on subtle overhead.
And the Result? It got worse.
Benchmark | Time per op (ns) | Bytes per op | Allocs per op |
---|---|---|---|
BenchmarkDailyTemperatures/Baseline-14 | 174,419 | 862,847 | 15 |
BenchmarkDailyTemperatures/Optimized-14 | 175,021 | 1,007,620 | 2 |
Same logic, but now slower and heavier. One fewer allocation, but an extra 150KB of memory usage. Why? Because in Go, allocating a 100,000-capacity slice (even if you barely use it) is expensive. The runtime doesn’t treat that lightly.
Why Go Behaves Differently¶
The runtime is more opinionated. Memory is GC-managed. There’s no real benefit to preallocating more than you need, especially if your code doesn’t end up using it. append()
is cheap, and in many cases more efficient than second-guessing the allocator.
On top of that, Go’s escape analysis doesn’t work the way C++’s stack allocation does. What you think is local might end up on the heap, just because of one indirect reference.
Key Takeaways
- Preallocation helps in C++ because memory layout and growth are under your control. In Go, the runtime handles it differently.
- Trust the idioms of the language. If you’re writing Go, let Go be Go.
- Measure everything. Some optimizations only look good on paper—or in other languages.
I still write C++. I still optimize memory, inlining, and stack frames. But when I’m in Go, I’ve learned to lean into the model that Go is designed for. Sometimes performance comes from understanding how much less you need to do—not how much you can tweak.