Learning Rust: Week 4

2023/05/14

This will be the last installment in my series on learning Rust, in which I give some notes on Cow. I finished re-implementing Cow for myself and found some performance stuff that surprised me! As my main testbed, I’m using this function:

fn test_speed() {
    use rand::{thread_rng, Rng};
    use std::time::{Duration, Instant};
    use std::vec::*;

    let mut rng = thread_rng();
    let pos_nums: Vec<i32> = (1..1000000).collect();
    let neg_nums: Vec<i32> = (-50..999950).collect();

    let t_start = Instant::now();
    // naive method
    for _ in 1..10 {
        let mut mystery_arr = match rng.gen_bool(0.9) {
            true => pos_nums.clone(),
            false => neg_nums.clone(),
        };
        for i in 0..mystery_arr.len() {
            if mystery_arr[i] < 0 {
                mystery_arr[i] = -mystery_arr[i];
            }
        }
    }
    println!("Time elapsed (naive): {:?}", t_start.elapsed());

    let t_start = Instant::now();
    // cow method
    for _ in 1..10 {
        let mut mystery_arr = match rng.gen_bool(0.9) {
            true => MyCow::Borrowed(&pos_nums),
            false => {
                println!("MyCow must clone");
                MyCow::Borrowed(&neg_nums)
            }
        };
        for i in 0..mystery_arr.len() {
            if mystery_arr[i] < 0 {
                mystery_arr.to_mut()[i] = -mystery_arr[i];
            }
        }
    }
    println!("Time elapsed (my cow): {:?}", t_start.elapsed());

    use std::borrow::Cow;
    let t_start = Instant::now();
    // cow method
    for _ in 1..10 {
        let mut mystery_arr = match rng.gen_bool(0.9) {
            true => Cow::Borrowed(&pos_nums),
            false => {
                println!("Cow must clone");
                Cow::Borrowed(&neg_nums)
            }
        };
        for i in 0..mystery_arr.len() {
            if mystery_arr[i] < 0 {
                mystery_arr.to_mut()[i] = -mystery_arr[i];
            }
        }
    }
    println!("Time elapsed (cow): {:?}", t_start.elapsed());
}

This is taken pretty much exactly from the docs, and it will tell us how long each method (naive, Cow, and MyCow) took and how many borrows had to happen. All results are reported using debug compile targets, though I found that the release targets were consistently around 10x faster. And the result is…

Name Time Clones
Naive 902.07 ms 10
MyCow 1011.62 ms 0
Cow 998.73 ms 0

… huh. The good news is that MyCow performs similarly to Cow. The bad news is that we’re actually doing like 11% worse by using Cow. What’s going on here? My theory was that every time we access data inside a Cow, we incur a match statement that checks the variant. Not having branch prediction really hurts here, because Cow is pretty much the ultimate candidate for that. It looks like one Cow access is around 11% slower than copying 32 bits of memory as part of a large copy (we’d expect small copies to be slower per unit data because the latency will start to dominate). Indeed, if we make the testing arrays much smaller (100 elements):

Name Time Clones
Naive 250.69 $\mu$s 10
MyCow 101.88 $\mu$s 0
Cow 116.80 $\mu$s 1

Ah, this is more expected. Since the testing code is randomized, Cow ended up mutably borrowing once and MyCow not at all. The question then becomes: for small copies, how many accesses does it take to lose the benefits of Cow? The right answer, of course, is that it depends on the context and compiler and hardware, but the irresponsible speculation is that I found that it took around 1000 accumulated accesses on a Cow to equal the overhead of one copy (remembering that this is for small data). I wouldn’t be surprised by 50% deviations in this number, though!

This is the end of my Learning Rust log. I expect to keep using Rust, maybe a lot, in the future, and maybe posting about it here, but the idea of the regular updates was to track progress and investigate different ways to learn a programming language, and I think this final project makes sense as a milestone to indicate that the learning curve is leveling off enough that regular updates no longer make sense.