Learning Rust: Week 3

2023/05/11

This week I’m focusing on implementing my own version of Cow. I’m increasingly convinced that the exercise is helpful, even though I haven’t made much progress on it.

What I’ve Done

Rust’s Smart Pointers and Cow

Rust has raw pointers that you can pass around the way you’d pass around C pointers, but you’re not supposed to use them except in very specific circumstances. Dereferencing a raw pointer is unsafe, which sort of ruins the point of following all of the ownership rules. Instead, you’re generally supposed to use smart pointers, the simplest of which is Box, which allocates a value on the heap to which all the usual Rust ownership rules apply. It’s a smart pointer in that the heap memory is deallocated when the Box goes out of scope.
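
To convince myself of that last point, here’s a minimal sketch that makes the deallocation observable. The Noisy type and the drop_flag_before_and_after function are names I made up for illustration; only Box, Cell, and Drop come from the standard library.

```rust
use std::cell::Cell;

// Hypothetical type that flips a flag when it is dropped, so the
// cleanup performed by Box becomes observable.
struct Noisy<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Returns the flag's value while the Box is alive and after its scope ends.
fn drop_flag_before_and_after() -> (bool, bool) {
    let dropped = Cell::new(false);
    let before;
    {
        // The Noisy value lives on the heap; `boxed` owns it.
        let boxed = Box::new(Noisy { dropped: &dropped });
        before = boxed.dropped.get();
    } // `boxed` goes out of scope here: Noisy::drop runs and the heap memory is freed.
    (before, dropped.get())
}
```

The flag is false while the Box is alive and true once its scope has ended, without any explicit free call.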

Cow is another smart pointer with a special property: it will clone (copy) the data it points to only if you end up needing a mutable reference and the data isn’t already owned. To take an example from the Cow documentation, let’s say you have an array of i32 and ultimately need an immutable reference to the elementwise absolute value of that array. In general, you need a mutable reference to this data in case any of the values are negative, so you may have to clone it. But you know that most of the time all of the values will already be nonnegative, so cloning the whole thing will usually be a waste of time. This is where Cow comes in:

use std::borrow::Cow;

fn abs_all(input: &mut Cow<[i32]>) {
    // Cow implements Deref, so Cows can be treated like the type they contain
    for i in 0..input.len() {
        let v = input[i];
        if v < 0 {
            // Clones into a vector if not already owned.
            input.to_mut()[i] = -v;
        }
    }
}

fn main() {
    // No clone occurs because `input` doesn't need to be mutated.
    let slice = [0, 1, 2];
    let mut input = Cow::from(&slice[..]);
    abs_all(&mut input);

    // Clone occurs because `input` needs to be mutated.
    let slice = [-1, 0, 1];
    let mut input = Cow::from(&slice[..]);
    abs_all(&mut input);

    // No clone occurs because `input` is already owned.
    let mut input = Cow::from(vec![-1, 0, 1]);
    abs_all(&mut input);
}
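
The comments in that example claim when a clone does or doesn’t happen, so here’s a small sketch of how I’d check those claims. The helpers is_borrowed and clone_avoided are hypothetical names of my own; abs_all is repeated only so the block compiles on its own.

```rust
use std::borrow::Cow;

// Same logic as abs_all above, repeated so this block is self-contained.
fn abs_all(input: &mut Cow<[i32]>) {
    for i in 0..input.len() {
        let v = input[i];
        if v < 0 {
            input.to_mut()[i] = -v;
        }
    }
}

// Hypothetical helper: true if the Cow still points at borrowed data.
fn is_borrowed(input: &Cow<[i32]>) -> bool {
    matches!(input, Cow::Borrowed(_))
}

// Hypothetical helper: runs abs_all on a borrowed slice and reports
// whether the Cow got away without cloning it.
fn clone_avoided(values: &[i32]) -> bool {
    let mut input = Cow::from(values);
    abs_all(&mut input);
    is_borrowed(&input)
}
```

Calling clone_avoided(&[0, 1, 2]) should give true, since nothing needed mutating, while clone_avoided(&[-1, 0, 1]) should give false, since the first negative value forces the switch to the Owned variant.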

My Progress

I haven’t made much progress on this yet, but I’ll break down what I have as a learning exercise for myself:

enum Cow<'a, T>
where
    T: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a T),
    Owned(<T as ToOwned>::Owned),
}

Cow<T> is an enum with two variants: Borrowed and Owned. The Borrowed variant wraps a reference to a T, while the Owned variant owns a value of type <T as ToOwned>::Owned (for example, a String when T is str).
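
To make the two variants concrete, here’s a short sketch using the standard library’s Cow with T = str (not my own version). The function names describe and both_variants are made up for illustration.

```rust
use std::borrow::Cow;

// Hypothetical helper: names the variant a Cow<str> currently holds.
fn describe(cow: &Cow<str>) -> &'static str {
    match cow {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

// Builds one Cow of each variant holding the same text.
fn both_variants() -> (Cow<'static, str>, Cow<'static, str>) {
    // Borrowed wraps a &str: no allocation, just a reference.
    let borrowed: Cow<'static, str> = Cow::Borrowed("moo");
    // Owned wraps a String, which is <str as ToOwned>::Owned.
    let owned: Cow<'static, str> = Cow::Owned(String::from("moo"));
    (borrowed, owned)
}
```

The two values compare equal because Cow’s PartialEq looks at the contents rather than at which variant holds them.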

T must implement ToOwned, a trait that generalizes Clone to borrowed data (references) via a method called .to_owned(&self). ToOwned has an associated type, <T as ToOwned>::Owned, which is the type of object you get back after calling .to_owned(). To indicate that Cow can wrap types whose size is not known at compile time (such as str or [i32]), T also carries the ?Sized bound, which relaxes the default requirement that T be Sized.
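
Here’s a small sketch of what those bounds buy you. The generic function own is a hypothetical name of mine; it only compiles for unsized types like str and [i32] because of the ?Sized bound.

```rust
// Hypothetical helper: turns a reference into an owned value, whatever
// "owned" means for that type. ToOwned is in the prelude, so no import is needed.
fn own<T>(value: &T) -> T::Owned
where
    T: ToOwned + ?Sized,
{
    value.to_owned()
}

// Concrete cases, with the associated type spelled out in the annotations.
fn to_owned_examples() -> (String, Vec<i32>, i32) {
    let s: String = own("moo"); // <str as ToOwned>::Owned is String
    let v: Vec<i32> = own(&[1, 2, 3][..]); // <[i32] as ToOwned>::Owned is Vec<i32>
    let n: i32 = own(&5); // for Clone types, Owned is the type itself
    (s, v, n)
}
```

Removing ?Sized from own should make the str and slice calls fail to compile, since T would then default to Sized.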

use std::borrow::Borrow;
use std::ops::Deref;

impl<T> Deref for Cow<'_, T>
where
    T: ToOwned + ?Sized,
    T::Owned: Borrow<T>,
{
    type Target = T;

    fn deref(&self) -> &T {
        match *self {
            Cow::Borrowed(b) => b,
            Cow::Owned(ref owned) => owned.borrow(),
        }
    }
}

This block implements Deref for Cow, which is what allows a Cow<T> to be used syntactically like a T. As before, we specify that T implements ToOwned and may not have a known size, but we also require that the type T::Owned (the associated type that T gets from its ToOwned implementation) implement Borrow<T>, meaning that a T::Owned can be borrowed as a &T. For example, String implements Borrow<str>, meaning that a String instance my_string can be borrowed as a &str by calling my_string.borrow().
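
A quick sketch of the Borrow part on its own, using only standard library types. The function names borrow_as_str and char_count are hypothetical. Note the explicit &str annotations: String also implements Borrow<String> through a blanket impl, so the compiler needs to be told which borrow is wanted.

```rust
use std::borrow::Borrow;

// Borrows a String as a &str through the Borrow<str> impl.
fn borrow_as_str(my_string: &String) -> &str {
    my_string.borrow()
}

// Hypothetical helper: accepts anything that can be borrowed as a str,
// whether owned (String) or already borrowed (&str).
fn char_count<S: Borrow<str>>(text: S) -> usize {
    let borrowed: &str = text.borrow();
    borrowed.chars().count()
}
```

Both char_count(String::from("moo")) and char_count("moo") go through the same code path, which is the kind of flexibility the Deref impl relies on.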

In the match expression, then, our Cow is either a Borrowed(&T), in which case we return the reference directly, or it’s an Owned(T::Owned) (where, confusingly, the outer Owned is the variant of the Cow enum and the inner one is the associated type of the ToOwned trait that T implements). In the Owned case, we borrow the contents of the Cow as a &T with .borrow() and return it. The ref keyword in the pattern binds owned by reference, so the match doesn’t try to move the inner value out of the Cow. Note also that simply calling deref() does not produce an owned copy of the data! It only borrows it.
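
Putting the pieces so far into one compilable sketch makes it easier to check that deref() really doesn’t clone. The enum and the Deref impl are the ones from this post. The to_mut and is_borrowed methods are my own guess at a next step, loosely modeled on the method of the same name used in the abs_all example, and are an assumption rather than something I’ve finished.

```rust
use std::borrow::Borrow;
use std::ops::Deref;

enum Cow<'a, T>
where
    T: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a T),
    Owned(<T as ToOwned>::Owned),
}

impl<T> Deref for Cow<'_, T>
where
    T: ToOwned + ?Sized,
    T::Owned: Borrow<T>,
{
    type Target = T;

    fn deref(&self) -> &T {
        match *self {
            Cow::Borrowed(b) => b,
            Cow::Owned(ref owned) => owned.borrow(),
        }
    }
}

impl<'a, T> Cow<'a, T>
where
    T: 'a + ToOwned + ?Sized,
{
    // Assumed helper (not from the post): reports the current variant.
    fn is_borrowed(&self) -> bool {
        matches!(self, Cow::Borrowed(_))
    }

    // Assumed next step (not from the post): clone on first mutable access.
    fn to_mut(&mut self) -> &mut T::Owned {
        // `b` is a copy of the shared reference, so `*self` can be overwritten here.
        if let Cow::Borrowed(b) = *self {
            *self = Cow::Owned(b.to_owned());
        }
        match *self {
            Cow::Owned(ref mut owned) => owned,
            Cow::Borrowed(_) => unreachable!(),
        }
    }
}
```

With a Cow<[i32]> built from a borrowed slice, calling .len() goes through deref() and leaves the value in the Borrowed variant. Only the first to_mut() call switches it to Owned, and the original slice stays untouched.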

General Thoughts

I think this method for learning Rust is also extremely useful, even more than I’d imagined before I tried it! The huge strengths of open source contributions are that other people have done a lot of work already and will review your code/help you improve for free. But this can also be a weakness, because you don’t get to implement a lot of new data structures or interact with bad code. Implementing data structures is a thing you probably will have to do in making your own stuff, and interacting with bad code (especially your own) certainly isn’t an unadulterated good, but it’s important for developing an appreciation for/understanding of good code. Beyond that, there are a few advantages of re-implementing standard library features: