Zero-Copy Techniques¶
When writing performance-critical Go code, how memory is managed often has a bigger impact than it first appears. Zero-copy techniques are one of the more effective ways to tighten that control. Instead of moving bytes from buffer to buffer, these techniques work directly on existing memory—avoiding copies altogether. That means less pressure on the CPU, better cache behavior, and fewer GC-triggered pauses. For I/O-heavy systems—whether you’re streaming files, handling network traffic, or parsing large datasets—this can translate into much higher throughput and lower latency.
Understanding Zero-Copy¶
In the usual I/O path, data moves back and forth between user space and kernel space—first copied into a kernel buffer, then into your application’s buffer, or the other way around. It works, but it’s wasteful. Every copy burns CPU cycles and clogs up memory bandwidth. Zero-copy changes that. Instead of bouncing data between buffers, it lets applications work directly with what’s already in place—no detours, no extra copies. The result? Lower CPU load, better use of memory, and faster I/O, especially when throughput or latency actually matter.
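The Go standard library already applies this idea in a few places. One well-known example: when io.Copy streams from an *os.File to a *net.TCPConn, the connection's io.ReaderFrom implementation can, on Linux, hand the transfer to the kernel via sendfile(2), so the bytes move from the page cache to the socket without ever landing in a user-space buffer. The sketch below illustrates that pattern; the function name is invented for this example, and the sendfile fast path is an implementation detail of the runtime, not something the code has to request explicitly.
package zerocopy

import (
	"io"
	"net"
	"os"
)

// sendFileOver streams the file at path to conn. With a *net.TCPConn and an
// *os.File, io.Copy can delegate to the connection's ReadFrom method, which on
// Linux may use sendfile(2) and skip the user-space copy entirely.
func sendFileOver(conn net.Conn, path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(conn, f) // no intermediate buffer needed on the fast path
}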
Common Zero-Copy Techniques in Go¶
Using io.Reader and io.Writer Interfaces¶
Using interfaces like io.Reader and io.Writer gives you fine-grained control over how data flows. Instead of spinning up new buffers every time, you can reuse existing ones and keep memory usage steady. In practice, this avoids unnecessary garbage collection pressure and keeps your I/O paths clean and efficient—especially when you’re dealing with high-throughput or streaming workloads.
func StreamData(src io.Reader, dst io.Writer) error {
buf := make([]byte, 4096) // Reusable buffer
_, err := io.CopyBuffer(dst, src, buf)
return err
}
io.CopyBuffer reuses the provided buffer instead of allocating a fresh one on every call, which keeps allocations and GC pressure down. Note that if src implements io.WriterTo or dst implements io.ReaderFrom, the buffer is bypassed and the copy is delegated to those methods. An in-depth io.CopyBuffer explanation is available on Stack Overflow.
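When many goroutines stream data concurrently, a single shared buffer won't do. A common extension of the same idea is to keep reusable buffers in a sync.Pool. The sketch below is one way to do that; the pool, the names, and the 32KB size are illustrative choices, not part of any API beyond io.CopyBuffer itself.
package stream

import (
	"io"
	"sync"
)

// bufPool hands out reusable 32KB buffers (the same default size io.Copy uses
// internally). Storing *[]byte avoids an extra allocation on every Put.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 32*1024)
		return &b
	},
}

// StreamPooled copies src to dst through a pooled buffer, so hot paths that
// stream many payloads concurrently do not allocate a fresh buffer per call.
func StreamPooled(dst io.Writer, src io.Reader) (int64, error) {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	return io.CopyBuffer(dst, src, *bp)
}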
Slicing for Efficient Data Access¶
Slicing large byte arrays or buffers instead of copying data into new slices is a powerful zero-copy strategy:
func process(buffer []byte) []byte {
return buffer[128:256] // returns a slice reference without copying
}
Re-slicing in Go is inherently zero-copy: the new slice is just a small header (pointer, length, capacity) that references the same underlying array, so no bytes are moved.
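A typical place this shows up is framing and parsing: instead of copying a payload out of a receive buffer, return a view into it. The format below, a 4-byte big-endian length prefix, is made up for illustration; the point is that the payload and the remaining bytes are sub-slices of the input, not copies.
package frame

import (
	"encoding/binary"
	"errors"
)

var ErrShortFrame = errors.New("frame: buffer too short")

// Split interprets buf as a 4-byte big-endian length prefix followed by a
// payload, and returns the payload plus any remaining bytes. Both results
// alias buf's backing array; nothing is copied.
func Split(buf []byte) (payload, rest []byte, err error) {
	if len(buf) < 4 {
		return nil, nil, ErrShortFrame
	}
	n := binary.BigEndian.Uint32(buf[:4])
	if uint64(n) > uint64(len(buf)-4) {
		return nil, nil, ErrShortFrame
	}
	return buf[4 : 4+int(n)], buf[4+int(n):], nil
}
The trade-off is the one the warning at the end of this section spells out: as long as the returned views are in use, the original buffer must not be reused.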
Benchmarking Impact¶
Here's a basic benchmark illustrating performance differences between explicit copying and zero-copy slicing:
func BenchmarkCopy(b *testing.B) {
data := make([]byte, 64*1024)
for b.Loop() {
buf := make([]byte, len(data))
copy(buf, data)
}
}
func BenchmarkSlice(b *testing.B) {
data := make([]byte, 64*1024)
for b.Loop() {
_ = data[:]
}
}
In BenchmarkCopy, each iteration copies a 64KB buffer into a fresh slice—allocating memory and duplicating data every time. That cost adds up fast. BenchmarkSlice, on the other hand, just re-slices the same buffer—no allocation, no copying, just a new view of the same data. The difference is night and day. When performance matters, avoiding copies isn’t just a micro-optimization—it’s fundamental.
Info
These two functions are not equivalent in behavior—BenchmarkCopy makes an actual deep copy of the buffer, while BenchmarkSlice only creates a new slice header pointing to the same underlying data. This benchmark is not comparing functional correctness but is intentionally contrasting performance characteristics to highlight the cost of unnecessary copying.
| Benchmark | Time per op (ns) | Bytes per op | Allocs per op |
|---|---|---|---|
| BenchmarkCopy | 4,246 | 65536 | 1 |
| BenchmarkSlice | 0.592 | 0 | 0 |
Memory Mapping (mmap): Syscall Avoidance vs. Zero-Copy¶
Memory mapping is often described as a zero-copy technique, but that description only holds for a specific access pattern (see Issue#25 for more details). Mapping a file does not, by itself, eliminate copying. Zero-copy only occurs when the application operates directly on the mapped pages. If data is copied out into another buffer, the copy cost remains, regardless of how the file was opened.
This section separates two effects that are often conflated: avoiding per-iteration syscalls and avoiding memory copies.
mmap with Copy: Avoiding Syscalls, Not Copies¶
The first comparison uses (*os.File).ReadAt versus golang.org/x/exp/mmap with ReadAt:
func BenchmarkReadWithCopy(b *testing.B) {
f, err := os.Open("testdata/largefile.bin")
if err != nil {
b.Fatalf("failed to open file: %v", err)
}
defer f.Close()
buf := make([]byte, 4*1024*1024) // 4MB buffer
for b.Loop() {
_, err := f.ReadAt(buf, 0)
if err != nil && err != io.EOF {
b.Fatal(err)
}
}
}
func BenchmarkReadWithMmap(b *testing.B) {
r, err := mmap.Open("testdata/largefile.bin")
if err != nil {
b.Fatalf("failed to mmap file: %v", err)
}
defer r.Close()
buf := make([]byte, r.Len())
for b.Loop() {
_, err := r.ReadAt(buf, 0)
if err != nil && err != io.EOF {
b.Fatal(err)
}
}
}
Both benchmarks copy 4MB of data into a user-space buffer on every iteration. The difference is where the copy is initiated.
- ReadAt performs a system call on each iteration and copies data from the kernel page cache into user memory.
- mmap.ReadAt still copies data into a user buffer, but avoids the per-iteration syscall by reading from already-mapped pages.
Warning
This benchmark does not measure zero-copy behavior. It measures syscall and context-switch overhead.
Benchmarking Impact¶
| Benchmark | Time per op (ns) | Syscalls per iter |
|---|---|---|
| BenchmarkReadWithCopy | 241,354 | 1 (pread) |
| BenchmarkReadWithMmap | 181,191 | 0 |
The performance difference comes from syscall avoidance, not from reduced memory movement. It is also worth noting that golang.org/x/exp/mmap does not expose the mapped memory directly. As long as access goes through ReadAt, a copy is unavoidable.
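If you want to verify the syscall counts reported above yourself, one option on Linux/amd64 is to compile the test binary and run a single benchmark under strace; the binary name and the fixed iteration count below are arbitrary choices for this illustration.
go test -c -o zerocopy.test .
strace -f -c -e trace=pread64 ./zerocopy.test -test.bench=BenchmarkReadWithCopy -test.benchtime=100x
The copy-based benchmark should report one pread64 per iteration, while running the same command with -test.bench=BenchmarkReadWithMmap should show essentially none.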
True Zero-Copy with unix.Mmap¶
Memory mapping becomes zero-copy only when the application operates directly on the mapped pages. To demonstrate this, the next benchmarks use unix.Mmap and consume the mapped memory without copying it into a separate buffer. To ensure the mapped data is actually read and not optimized away, each iteration processes a fixed 4MB window.
Memory-Bound Workload (XXHash)¶
In the copy-based version, each iteration performs:
- a kernel-to-user memory copy
- hashing over the copied buffer
buf := make([]byte, 4*1024*1024)
b.ResetTimer()
for b.Loop() {
_, err := f.ReadAt(buf, 0)
if err != nil && err != io.EOF {
b.Fatal(err)
}
h := xxhash.New()
h.Write(buf)
_ = h.Sum64() // consume to prevent DCE
}
In the mmap version:
- file-backed pages are accessed directly
- no additional user-space copy occurs
data, err := unix.Mmap(int(f.Fd()), 0, size, unix.PROT_READ, unix.MAP_SHARED)
if err != nil {
b.Fatalf("mmap: %v", err)
}
defer func() {
if err := unix.Munmap(data); err != nil {
b.Fatalf("munmap: %v", err)
}
}()
window := data
if len(window) > 4*1024*1024 {
window = window[:4*1024*1024] // match the 4MB workload shape
}
b.ResetTimer()
for b.Loop() {
h := xxhash.New()
h.Write(window) // reads directly from mapped pages, no extra copy
_ = h.Sum64()
}
XXHash is used here because it is lightweight enough that memory movement and cache behavior remain visible in the measurements.
Benchmarking Impact¶
| Benchmark | Time per op (ns) | Copies per iter | Dominant cost |
|---|---|---|---|
| ReadAtCopyXXHash | 539,512 | 1 (4MB) | Copy + hash |
| MmapNoCopyXXHash | 281,249 | 0 | Hash + memory reads |
The roughly 2× difference reflects the removal of a full 4MB memory copy from the critical path. This is an actual zero-copy scenario.
Compute-Dominated Workload (SHA256)¶
These benchmarks mirror the XXHash pair above but swap the hash for SHA-256 to simulate a more compute-intensive workload.
BenchmarkReadAtCopySHA:
buf := make([]byte, 4*1024*1024)
b.ResetTimer()
for b.Loop() {
_, err := f.ReadAt(buf, 0)
if err != nil && err != io.EOF {
b.Fatal(err)
}
_ = sha256.Sum256(buf) // consume so compiler can't DCE everything
}
BenchmarkMmapNoCopySHA:
data, err := unix.Mmap(int(f.Fd()), 0, size, unix.PROT_READ, unix.MAP_SHARED)
if err != nil {
b.Fatalf("mmap: %v", err)
}
defer func() {
if err := unix.Munmap(data); err != nil {
b.Fatalf("munmap: %v", err)
}
}()
window := data
if len(window) > 4*1024*1024 {
window = window[:4*1024*1024] // match the 4MB workload shape
}
b.ResetTimer()
for b.Loop() {
_ = sha256.Sum256(window) // reads directly from mapped pages, no extra copy
}
Benchmarking Impact¶
| Benchmark | Time per op (ns) | Copies per iter | Dominant cost |
|---|---|---|---|
| ReadAtCopySHA | 2,636,956 | 1 (4MB) | SHA256 computation |
| MmapNoCopySHA | 2,287,858 | 0 | SHA256 computation |
The mmap version remains faster, but the difference is smaller. Once the CPU is dominated by cryptographic computation, eliminating a memory copy has a reduced impact on total runtime. The remaining improvement comes from lower memory bandwidth pressure and cache effects.
Summary Comparison¶
| Scenario | Zero-Copy | Primary Effect | Observed Impact |
|---|---|---|---|
| mmap + ReadAt | No | Syscall avoidance | Moderate improvement |
| mmap + direct access | Yes | Copy elimination | Large (memory-bound) |
| mmap + direct access + SHA256 | Yes | Reduced memory traffic | Limited (compute-bound) |
Show the complete benchmark file
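Below is a sketch of what the combined benchmark file for this section could look like, assembled from the snippets above. It assumes Go 1.24+ (for b.Loop), Linux (for unix.Mmap), github.com/cespare/xxhash/v2 as the XXHash implementation, and the 4MB testdata/largefile.bin created in the next step. The benchmarks already listed in full above (BenchmarkCopy, BenchmarkSlice, BenchmarkReadWithCopy, BenchmarkReadWithMmap) are omitted, and the small helpers (openFile, mapFile, readInto) are added here only to keep the file compact; they are not part of the original snippets.
package perf

import (
	"crypto/sha256"
	"io"
	"os"
	"testing"

	"github.com/cespare/xxhash/v2"
	"golang.org/x/sys/unix"
)

const window = 4 * 1024 * 1024 // every benchmark works on a 4MB window

// openFile opens the test file or aborts the benchmark.
func openFile(b *testing.B) *os.File {
	b.Helper()
	f, err := os.Open("testdata/largefile.bin")
	if err != nil {
		b.Fatalf("failed to open file: %v", err)
	}
	return f
}

// mapFile memory-maps the first 4MB of the test file read-only and returns the
// mapping plus a cleanup function. Assumes the file is at least 4MB.
func mapFile(b *testing.B) ([]byte, func()) {
	b.Helper()
	f := openFile(b)
	data, err := unix.Mmap(int(f.Fd()), 0, window, unix.PROT_READ, unix.MAP_SHARED)
	if err != nil {
		b.Fatalf("mmap: %v", err)
	}
	return data, func() {
		unix.Munmap(data)
		f.Close()
	}
}

// readInto fills buf from the start of f; io.EOF is expected for exact-size reads.
func readInto(b *testing.B, f *os.File, buf []byte) {
	if _, err := f.ReadAt(buf, 0); err != nil && err != io.EOF {
		b.Fatal(err)
	}
}

func BenchmarkReadAtCopyXXHash(b *testing.B) {
	f := openFile(b)
	defer f.Close()
	buf := make([]byte, window)
	for b.Loop() {
		readInto(b, f, buf)
		h := xxhash.New()
		h.Write(buf)
		_ = h.Sum64() // consume to prevent DCE
	}
}

func BenchmarkMmapNoCopyXXHash(b *testing.B) {
	data, cleanup := mapFile(b)
	defer cleanup()
	for b.Loop() {
		h := xxhash.New()
		h.Write(data) // reads directly from mapped pages, no extra copy
		_ = h.Sum64()
	}
}

func BenchmarkReadAtCopySHA(b *testing.B) {
	f := openFile(b)
	defer f.Close()
	buf := make([]byte, window)
	for b.Loop() {
		readInto(b, f, buf)
		_ = sha256.Sum256(buf)
	}
}

func BenchmarkMmapNoCopySHA(b *testing.B) {
	data, cleanup := mapFile(b)
	defer cleanup()
	for b.Loop() {
		_ = sha256.Sum256(data)
	}
}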
How to run the benchmark
To run the benchmarks involving mmap, you’ll need to install the required packages and create a test file:
go get golang.org/x/exp/mmap
go get golang.org/x/sys/unix
mkdir -p testdata
dd if=/dev/urandom of=./testdata/largefile.bin bs=1M count=4
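With the file in place, the benchmarks run with the standard tooling; -benchmem adds the bytes/allocs columns shown in the tables above. The hash-based benchmarks also assume an XXHash module is available, for example github.com/cespare/xxhash/v2:
go get github.com/cespare/xxhash/v2
go test -bench=. -benchmem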
When to Use Zero-Copy¶
Zero-copy techniques are highly beneficial for:
- Network servers handling large amounts of concurrent data streams. Avoiding unnecessary memory copies helps reduce CPU usage and latency, especially under high load.
- Applications with heavy I/O operations like file streaming or real-time data processing. Zero-copy allows data to move through the system efficiently without redundant allocations or copies.
Warning
Zero-copy isn’t a free win. Slices share underlying memory, so reusing them means you’re also sharing state. If one part of your code changes the data while another is still reading it, you’re setting yourself up for subtle, hard-to-track bugs. This kind of shared memory requires discipline—clear ownership and tight control. It also adds complexity, which might not be worth it unless the performance gains are real and measurable. Always benchmark before committing to it.
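To make the hazard concrete, here is a small self-contained example (the scenario and names are invented): a zero-copy helper returns a view into a reused buffer, and the "saved" result changes as soon as the buffer is refilled.
package main

import (
	"bytes"
	"fmt"
)

// firstLine returns a zero-copy view of buf up to the first newline; it shares
// memory with buf rather than copying it.
func firstLine(buf []byte) []byte {
	if i := bytes.IndexByte(buf, '\n'); i >= 0 {
		return buf[:i]
	}
	return buf
}

func main() {
	buf := []byte("GET /old HTTP/1.1\nHost: example.com\n")
	line := firstLine(buf) // aliases buf

	// Simulate reusing buf for the next request, as a read loop would.
	copy(buf, []byte("GET /new HTTP/1.1\nHost: example.com\n"))

	fmt.Printf("%s\n", line) // prints "GET /new HTTP/1.1": the saved line changed underneath us
}
The safe options are to copy the bytes you intend to keep, or to define clear ownership rules for when a buffer may be reused.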
Real-World Use Cases and Libraries¶
Zero-copy strategies aren't just theoretical—they're used in production by performance-critical Go systems:
- fasthttp: A high-performance HTTP server designed to avoid allocations. It returns slices directly and avoids string conversions to minimize copying.
- gRPC-Go: Uses internal buffer pools and avoids deep copying of large request/response messages to reduce GC pressure.
- MinIO: An object storage system that streams data directly between disk and network using io.Reader without unnecessary buffer replication.
- Protobuf and MsgPack libraries: Efficient serialization frameworks like google.golang.org/protobuf and vmihailenco/msgpack support decoding directly into user-managed buffers.
- InfluxDB and Badger: These storage engines use mmap extensively for fast, zero-copy access to database files.
These libraries show how zero-copy techniques help reduce allocations, GC overhead, and system call frequency—all while increasing throughput.