Zero-Copy Techniques

Managing memory wisely can make a noticeable difference when writing performance-critical Go code. Zero-copy techniques are particularly effective because they avoid unnecessary memory copying by directly manipulating data buffers. By doing so, these techniques significantly enhance throughput and reduce latency, making them highly beneficial for applications that handle intensive I/O operations.

Understanding Zero-Copy

Traditionally, reading or writing data involves copying between user-space buffers and kernel-space buffers, incurring CPU and memory overhead. Zero-copy techniques bypass these intermediate copying steps, allowing applications to access and process data directly from the underlying buffers. This approach significantly reduces CPU load, memory bandwidth, and latency.
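
Go's standard library already exploits this where it can: io.Copy from an *os.File to a *net.TCPConn is delegated to the connection's io.ReaderFrom implementation, which on Linux can hand the transfer to the sendfile(2) system call so the bytes never enter user space. A minimal sketch (serveFile is a hypothetical helper, not from the original):

import (
    "io"
    "net"
    "os"
)

func serveFile(conn *net.TCPConn, f *os.File) error {
    // io.Copy sees that *net.TCPConn implements io.ReaderFrom; for an
    // *os.File source the runtime can use sendfile(2), moving data
    // kernel-to-kernel without a user-space buffer.
    _, err := io.Copy(conn, f)
    return err
}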

Common Zero-Copy Techniques in Go

Using io.Reader and io.Writer Interfaces

Leveraging interfaces such as io.Reader and io.Writer can facilitate efficient buffer reuse and minimize copying:

func StreamData(src io.Reader, dst io.Writer) error {
    buf := make([]byte, 4096) // Reusable buffer
    _, err := io.CopyBuffer(dst, src, buf)
    return err
}

io.CopyBuffer reuses the provided buffer instead of allocating a temporary one on each call, avoiding repeated allocations. Note that if src implements io.WriterTo or dst implements io.ReaderFrom, the buffer is bypassed entirely and the copy is delegated to those methods. An in-depth explanation of io.CopyBuffer is available on Stack Overflow.
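
If many goroutines stream concurrently, the scratch buffer can be shared through a sync.Pool rather than allocated per call. A minimal sketch (StreamDataPooled and the 32 KB buffer size are illustrative choices, not from the original):

import (
    "io"
    "sync"
)

var bufPool = sync.Pool{
    New: func() any {
        b := make([]byte, 32*1024) // scratch buffer; size is a tuning knob
        return &b
    },
}

func StreamDataPooled(src io.Reader, dst io.Writer) error {
    bp := bufPool.Get().(*[]byte) // pool stores *[]byte to avoid extra allocations
    defer bufPool.Put(bp)
    _, err := io.CopyBuffer(dst, src, *bp)
    return err
}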

Slicing for Efficient Data Access

Slicing large byte arrays or buffers instead of copying data into new slices is a powerful zero-copy strategy:

func process(buffer []byte) []byte {
    return buffer[128:256] // returns a slice reference without copying
}

Slicing is inherently zero-copy in Go: a slice is only a small header (pointer, length, capacity) referencing the underlying array, so re-slicing allocates and copies nothing.
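
To make the sharing explicit, here is a small illustration (the demo function is hypothetical):

import "fmt"

func demo() {
    data := []byte("hello, world")
    view := data[7:12]        // "world": a view into data, no bytes copied
    view[0] = 'W'             // writes through to the shared backing array
    fmt.Println(string(data)) // prints "hello, World"
}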

Memory Mapping (mmap)

Using memory mapping enables direct access to file contents without explicit read operations:

import "golang.org/x/exp/mmap"

func ReadFileZeroCopy(path string) ([]byte, error) {
    r, err := mmap.Open(path)
    if err != nil {
        return nil, err
    }
    defer r.Close()

    data := make([]byte, r.Len())
    _, err = r.ReadAt(data, 0)
    return data, err
}

This approach maps the file into the process's address space, so data is paged in on demand rather than fetched with per-read system calls. Note, however, that the example above is not fully zero-copy: the x/exp/mmap package does not expose the mapped region, so ReadAt still copies from the mapping into data. What it eliminates is the system call and buffered I/O path for each read. For genuinely copy-free access you need the raw mapping, as sketched below.
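
For truly copy-free access you can map the file yourself and use the mapped bytes in place. A POSIX-only sketch using the standard syscall package (MmapFile is a hypothetical helper; the caller must call syscall.Munmap when finished and must not touch the slice afterwards):

import (
    "os"
    "syscall"
)

func MmapFile(path string) ([]byte, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close() // the mapping remains valid after the fd is closed

    fi, err := f.Stat()
    if err != nil {
        return nil, err
    }

    // The returned slice aliases the mapped pages directly: reading it
    // touches the page cache, and no user-space copy is ever made.
    return syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
        syscall.PROT_READ, syscall.MAP_SHARED)
}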

Benchmarking Impact

Here's a basic benchmark illustrating performance differences between explicit copying and zero-copy slicing:

var sink []byte

func BenchmarkCopy(b *testing.B) {
    data := make([]byte, 64*1024)
    for i := 0; i < b.N; i++ {
        buf := make([]byte, len(data))
        copy(buf, data)
        sink = buf
    }
}

func BenchmarkSlice(b *testing.B) {
    data := make([]byte, 64*1024)
    for i := 0; i < b.N; i++ {
        s := data[:]
        sink = s
    }
}

In BenchmarkCopy, a 64KB buffer is copied into a new slice during every iteration, incurring both memory allocation and data copy overhead. In contrast, BenchmarkSlice simply re-slices the same buffer without any allocation or copying. This demonstrates how zero-copy operations like slicing can vastly outperform traditional copying under load.

Info

These two functions are not equivalent in behavior—BenchmarkCopy makes an actual deep copy of the buffer, while BenchmarkSlice only creates a new slice header pointing to the same underlying data. This benchmark is not comparing functional correctness but is intentionally contrasting performance characteristics to highlight the cost of unnecessary copying.

| Benchmark      | Time per op (ns) | Bytes per op | Allocs per op |
| -------------- | ---------------- | ------------ | ------------- |
| BenchmarkCopy  | 4,246            | 65,536       | 1             |
| BenchmarkSlice | 0.592            | 0            | 0             |

File I/O: Memory Mapping vs. Standard Read

We also benchmarked file reading performance using (*os.File).ReadAt versus mmap.Open on a 4MB binary file.

func BenchmarkReadWithCopy(b *testing.B) {
    f, err := os.Open("testdata/largefile.bin")
    if err != nil {
        b.Fatalf("failed to open file: %v", err)
    }
    defer f.Close()

    buf := make([]byte, 4*1024*1024) // 4MB buffer
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := f.ReadAt(buf, 0)
        if err != nil && err != io.EOF {
            b.Fatal(err)
        }
        sink = buf
    }
}

func BenchmarkReadWithMmap(b *testing.B) {
    r, err := mmap.Open("testdata/largefile.bin")
    if err != nil {
        b.Fatalf("failed to mmap file: %v", err)
    }
    defer r.Close()

    buf := make([]byte, r.Len())
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := r.ReadAt(buf, 0)
        if err != nil && err != io.EOF {
            b.Fatal(err)
        }
        sink = buf
    }
}

How to run the benchmark

To run the benchmark involving mmap, you’ll need to install the required package and create a test file:

go get golang.org/x/exp/mmap
mkdir -p testdata
dd if=/dev/urandom of=./testdata/largefile.bin bs=1M count=4
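
With the test file in place, run the benchmarks with allocation statistics reported (standard go test flags):

go test -bench=. -benchmem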

Benchmark Results

| Benchmark    | Time per op (ns) | Bytes per op | Allocs per op |
| ------------ | ---------------- | ------------ | ------------- |
| ReadWithCopy | 94,650           | 0            | 0             |
| ReadWithMmap | 50,082           | 0            | 0             |

The memory-mapped version (mmap) is nearly 2× faster than the standard read call. This illustrates how zero-copy access through memory mapping can substantially reduce read latency and CPU usage for large files.

Show the complete benchmark file

package perf

import (
    "io"
    "os"
    "testing"

    "golang.org/x/exp/mmap"
)

var sink []byte

func BenchmarkCopy(b *testing.B) {
    data := make([]byte, 64*1024)
    for i := 0; i < b.N; i++ {
        buf := make([]byte, len(data))
        copy(buf, data)
        sink = buf
    }
}

func BenchmarkSlice(b *testing.B) {
    data := make([]byte, 64*1024)
    for i := 0; i < b.N; i++ {
        s := data[:]
        sink = s
    }
}

func BenchmarkReadWithCopy(b *testing.B) {
    f, err := os.Open("testdata/largefile.bin")
    if err != nil {
        b.Fatalf("failed to open file: %v", err)
    }
    defer f.Close()

    buf := make([]byte, 4*1024*1024) // 4MB buffer
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := f.ReadAt(buf, 0)
        if err != nil && err != io.EOF {
            b.Fatal(err)
        }
        sink = buf
    }
}

func BenchmarkReadWithMmap(b *testing.B) {
    r, err := mmap.Open("testdata/largefile.bin")
    if err != nil {
        b.Fatalf("failed to mmap file: %v", err)
    }
    defer r.Close()

    buf := make([]byte, r.Len())
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := r.ReadAt(buf, 0)
        if err != nil && err != io.EOF {
            b.Fatal(err)
        }
        sink = buf
    }
}

When to Use Zero-Copy

Zero-copy techniques are highly beneficial for:

  • Network servers handling large amounts of concurrent data streams. Avoiding unnecessary memory copies helps reduce CPU usage and latency, especially under high load.
  • Applications with heavy I/O operations like file streaming or real-time data processing. Zero-copy allows data to move through the system efficiently without redundant allocations or copies.

Warning

Zero-copy should be used judiciously. Since slices share underlying memory, care must be taken to prevent unintended data mutations. Shared memory can lead to subtle bugs if one part of the system modifies data still in use elsewhere. Zero-copy can also introduce additional complexity, so it’s important to measure and confirm that the performance gains are worth the tradeoffs.
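
One concrete trap the warning alludes to: because a sub-slice pins its entire backing array, returning a small zero-copy view of a large buffer can quietly hold megabytes in memory. A sketch of the two options (header, headerCopy, and the 16-byte size are illustrative, not from the original):

func header(packet []byte) []byte {
    // Zero-copy view: cheap, but keeps the whole packet reachable for
    // as long as the returned slice is.
    return packet[:16]
}

func headerCopy(packet []byte) []byte {
    // Defensive copy: costs a 16-byte allocation, but lets the large
    // packet buffer be garbage-collected.
    h := make([]byte, 16)
    copy(h, packet[:16])
    return h
}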

Real-World Use Cases and Libraries

Zero-copy strategies aren't just theoretical—they're used in production by performance-critical Go systems:

  • fasthttp: A high-performance HTTP server designed to avoid allocations. It returns slices directly and avoids string conversions to minimize copying.
  • gRPC-Go: Uses internal buffer pools and avoids deep copying of large request/response messages to reduce GC pressure.
  • MinIO: An object storage system that streams data directly between disk and network using io.Reader without unnecessary buffer replication.
  • Protobuf and MsgPack libraries: Efficient serialization frameworks like google.golang.org/protobuf and vmihailenco/msgpack support decoding directly into user-managed buffers.
  • InfluxDB and Badger: These storage engines use mmap extensively for fast, zero-copy access to database files.

These libraries show how zero-copy techniques help reduce allocations, GC overhead, and system call frequency—all while increasing throughput.