How to benchmark memory usage of a function?

I notice that Rust's test has a benchmark mode that will measure execution time in ns/iter, but I could not find a way to measure memory usage.

How would I implement such a benchmark? Let us assume for the moment that I only care about heap memory (though stack usage would certainly also be interesting).

Edit: I found this issue, which asks for the exact same thing.



Solution 1:[1]

You can use the jemalloc allocator to print the allocation statistics. For example,

Cargo.toml:

[package]
name = "stackoverflow-30869007"
version = "0.1.0"
edition = "2018"

[dependencies]
jemallocator = "0.3"
jemalloc-sys = {version = "0.3", features = ["stats"]}
libc = "0.2"

src/main.rs:

use libc::{c_char, c_void};
use std::ptr::{null, null_mut};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

extern "C" fn write_cb(_: *mut c_void, message: *const c_char) {
    print!("{}", String::from_utf8_lossy(unsafe {
        std::ffi::CStr::from_ptr(message as *const i8).to_bytes()
    }));
}

fn main() {
    // Print jemalloc's full statistics report to stdout via the callback above.
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) };
}

In a single-threaded program, that should allow you to get a good measurement of how much memory a structure takes. Just print the statistics before and after the structure is created and calculate the difference.
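
For example, a minimal sketch of that before/after approach (replacing the main above; the Vec here is just a stand-in for whatever structure you want to measure, and you compare the "allocated" figures of the two reports by hand):

fn main() {
    // Report before the structure exists.
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) };

    // The structure whose heap footprint we want to estimate.
    let v: Vec<u64> = (0..1_000_000).collect();

    // Report after; diff the "allocated" numbers of the two dumps.
    unsafe { jemalloc_sys::malloc_stats_print(Some(write_cb), null_mut(), null()) };

    drop(v);
}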


You can also use Valgrind (Massif) to get the heap profile. It works just as it does with any other C program. Make sure you have debug symbols enabled in the executable (e.g. by using a debug build or a custom Cargo configuration). You can use, say, http://massiftool.sourceforge.net/ to analyse the generated heap profile.
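
If you want to profile an optimised binary, one possible Cargo configuration (an assumption about your setup; adjust as needed) is to keep debug info in the release profile. Cargo.toml:

# Keep debug symbols in release builds so Massif/DHAT can resolve symbols.
[profile.release]
debug = true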

(I verified this works on Debian Jessie; in a different setting your mileage may vary.)

(In order to use Rust with Valgrind you'll probably have to switch back to the system allocator, as shown in the sketch below.)
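
For example, a minimal sketch of opting back into the system allocator (only needed if you had set jemalloc as the global allocator earlier):

use std::alloc::System;

// Route all Rust allocations through the platform allocator so that
// Valgrind can intercept and track them.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v = vec![1u32, 2, 3]; // this allocation is now visible to Massif
    println!("{:?}", v);
}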

P.S. Valgrind now also ships an improved DHAT (dynamic heap analysis tool), which is worth trying as well.


jemalloc can be told to dump a memory profile. You can probably do this with the Rust FFI but I haven't investigated this route.

Solution 2:[2]

As far as measuring data structure sizes is concerned, this can be done fairly easily through the use of traits and a small compiler plugin. Nicholas Nethercote, in his article Measuring data structure sizes: Firefox (C++) vs. Servo (Rust), demonstrates how it works in Servo; it boils down to adding #[derive(HeapSizeOf)] (or occasionally a manual implementation) to each type you care about. This is also a good way of checking precisely where memory is going; it is, however, comparatively intrusive, as it requires changes to be made in the first place, whereas something like jemalloc's print_stats() doesn't. Still, for good and precise measurements, it's a sound approach.
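
To illustrate the idea only (this hand-rolled trait is a sketch, not the exact API of the heapsize crate that Servo uses), each type reports the heap bytes owned by its fields and recurses into its children:

// Illustrative sketch: each type reports the heap bytes it owns.
trait HeapSizeOf {
    fn heap_size_of_children(&self) -> usize;
}

impl HeapSizeOf for String {
    fn heap_size_of_children(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSizeOf> HeapSizeOf for Vec<T> {
    fn heap_size_of_children(&self) -> usize {
        // Backing buffer plus whatever the elements themselves own.
        self.capacity() * std::mem::size_of::<T>()
            + self.iter().map(|e| e.heap_size_of_children()).sum::<usize>()
    }
}

struct Document {
    title: String,
    lines: Vec<String>,
}

impl HeapSizeOf for Document {
    fn heap_size_of_children(&self) -> usize {
        self.title.heap_size_of_children() + self.lines.heap_size_of_children()
    }
}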

Solution 3:[3]

Currently, the only way to get allocation information is the alloc::heap::stats_print() function (behind #![feature(alloc)]), which calls jemalloc's print_stats().

I'll update this answer with further information once I have learned what the output means.

(Note that I'm not going to accept this answer, so if someone comes up with a better solution...)

Solution 4:[4]

There is now the jemalloc-ctl crate, which provides a convenient, safe, typed API. Add it to your Cargo.toml:

[dependencies]
jemalloc-ctl = "0.3"
jemallocator = "0.3"

Then configure jemalloc as the global allocator and use the methods from the jemalloc_ctl::stats module. Here is the official example:

use std::thread;
use std::time::Duration;
use jemalloc_ctl::{stats, epoch};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    loop {
        // many statistics are cached and only updated when the epoch is advanced.
        epoch::advance().unwrap();

        let allocated = stats::allocated::read().unwrap();
        let resident = stats::resident::read().unwrap();
        println!("{} bytes allocated/{} bytes resident", allocated, resident);
        thread::sleep(Duration::from_secs(10));
    }
}
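
Building on that example, here is a sketch (the Vec is again just a placeholder for the structure you care about) that estimates a single structure's heap footprint by diffing stats::allocated around its construction:

use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    // Refresh cached statistics, then take a baseline reading.
    epoch::advance().unwrap();
    let before = stats::allocated::read().unwrap();

    // The structure whose heap usage we want to estimate.
    let v: Vec<u64> = (0..1_000_000).collect();

    // Refresh again and take a second reading.
    epoch::advance().unwrap();
    let after = stats::allocated::read().unwrap();

    println!("approx. heap used by v: {} bytes", after - before);
    drop(v);
}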

Solution 5:[5]

There's a neat little solution someone put together here: https://github.com/discordance/trallocator/blob/master/src/lib.rs

use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

// Wraps any GlobalAlloc and keeps a running count of live heap bytes.
pub struct Trallocator<A: GlobalAlloc>(pub A, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Trallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        self.1.fetch_add(l.size() as u64, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> Trallocator<A> {
    pub const fn new(a: A) -> Self {
        Trallocator(a, AtomicU64::new(0))
    }

    pub fn reset(&self) {
        self.1.store(0, Ordering::SeqCst);
    }
    pub fn get(&self) -> u64 {
        self.1.load(Ordering::SeqCst)
    }
}

Usage (from https://www.reddit.com/r/rust/comments/8z83wc/comment/e2h4dp9):

// Nightly feature gates needed by the Trallocator struct as written; on
// recent stable toolchains both have been stabilized and this line can go.
#![feature(integer_atomics, const_fn_trait_bound)]

use std::alloc::System;

// `Trallocator` is the wrapper type defined above; bring it into scope
// from wherever you placed it (e.g. a local module or crate).
#[global_allocator]
static GLOBAL: Trallocator<System> = Trallocator::new(System);

fn main() {
    GLOBAL.reset();
    println!("memory used: {} bytes", GLOBAL.get());
    {
        let mut vec = vec![1, 2, 3, 4];
        for i in 5..20 {
            vec.push(i);
            println!("memory used: {} bytes", GLOBAL.get());
        }
        for v in vec {
            println!("{}", v);
        }
    }
    // For some reason this does not print zero =/
    // (likely because the standard library keeps some allocations alive,
    // e.g. stdout's internal buffer).
    println!("memory used: {} bytes", GLOBAL.get());
}

I've just started using it, and it seems to work well! Straightforward, real-time, requires no external crates, and doesn't require changing your base memory allocator.

It's also nice that, because it's intercepting the allocate/deallocate calls, you should be able to add custom logic if desired (e.g. if memory usage goes above X, print the stack trace to see what's triggering the allocations), although I haven't tried this yet.
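
As one sketch of such custom logic (my own illustration, not part of the original answer; the PeakTrallocator name is made up): rather than printing from inside alloc, which can itself allocate and re-enter the allocator, you can track a high-water mark and inspect it from normal code.

use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicU64, Ordering};

// Variant of Trallocator that also records the peak number of live bytes.
pub struct PeakTrallocator<A: GlobalAlloc>(pub A, AtomicU64, AtomicU64);

unsafe impl<A: GlobalAlloc> GlobalAlloc for PeakTrallocator<A> {
    unsafe fn alloc(&self, l: Layout) -> *mut u8 {
        let now = self.1.fetch_add(l.size() as u64, Ordering::SeqCst) + l.size() as u64;
        // Record the high-water mark; avoid printing here, since printing
        // may allocate and re-enter this function.
        self.2.fetch_max(now, Ordering::SeqCst);
        self.0.alloc(l)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, l: Layout) {
        self.0.dealloc(ptr, l);
        self.1.fetch_sub(l.size() as u64, Ordering::SeqCst);
    }
}

impl<A: GlobalAlloc> PeakTrallocator<A> {
    pub const fn new(a: A) -> Self {
        PeakTrallocator(a, AtomicU64::new(0), AtomicU64::new(0))
    }
    pub fn current(&self) -> u64 { self.1.load(Ordering::SeqCst) }
    pub fn peak(&self) -> u64 { self.2.load(Ordering::SeqCst) }
}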

I also haven't yet tested to see how much overhead this approach adds. If someone does a test for this, let me know!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Chris Morgan
Solution 3 llogiq
Solution 4 diralik
Solution 5 Venryx