This week I’m focusing on implementing my own version of Cow, an exercise I’m increasingly convinced is helpful even though I haven’t made much progress on it.
What I’ve Done
Rust’s Smart Pointers and Cow
Rust has raw pointers that you can pass around the way you’d pass around C pointers, but you’re not supposed to use them except in very specific circumstances. Dereferencing a raw pointer requires an unsafe block, which sort of ruins the point of following all of the ownership rules. Instead, you’re generally supposed to use smart pointers, the simplest of which is Box, which just puts a value on the heap and applies all the usual Rust ownership rules to it. It’s a smart pointer in that the heap memory is deallocated when the Box goes out of scope.
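To make the scope-based cleanup concrete, here is a minimal sketch. The Noisy type and the box_scope_demo function are hypothetical names made up for this illustration; Noisy just records the moment it is dropped so the ordering is visible.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records when it is dropped.
struct Noisy {
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dropped");
    }
}

/// Returns the events recorded while a `Box<Noisy>` lives and dies.
fn box_scope_demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        // The value is moved onto the heap; `_boxed` owns it.
        let _boxed = Box::new(Noisy { log: Rc::clone(&log) });
        log.borrow_mut().push("in scope");
    } // `_boxed` goes out of scope here: `drop` runs and the heap memory is freed.
    log.borrow_mut().push("after scope");
    let events = log.borrow().clone();
    events
}
```

Calling box_scope_demo() should yield the events in the order "in scope", "dropped", "after scope", since the Box is cleaned up at the closing brace of the inner block.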
Cow (short for “clone on write”) is another smart pointer with a special property: it clones (copies) the data it points to only if that data is borrowed and you end up needing a mutable reference to it or ownership of it. To take an example from the Cow documentation, let’s say you have a slice of i32 and ultimately need an immutable reference to the elementwise absolute value of that slice. In general, you need a mutable reference to this data in case any of the values are negative, so you may have to clone it. But you know that most of the time all of the values will already be nonnegative, so cloning the whole thing will usually be a waste of time. This is where Cow comes in:
use std::borrow::Cow;

fn abs_all(input: &mut Cow<[i32]>) {
    // Cow implements Deref, so Cows can be treated like the type they contain
    for i in 0..input.len() {
        let v = input[i];
        if v < 0 {
            // Clones into a vector if not already owned.
            input.to_mut()[i] = -v;
        }
    }
}

fn main() {
    // No clone occurs because `input` doesn't need to be mutated.
    let slice = [0, 1, 2];
    let mut input = Cow::from(&slice[..]);
    abs_all(&mut input);

    // Clone occurs because `input` needs to be mutated.
    let slice = [-1, 0, 1];
    let mut input = Cow::from(&slice[..]);
    abs_all(&mut input);

    // No clone occurs because `input` is already owned.
    let mut input = Cow::from(vec![-1, 0, 1]);
    abs_all(&mut input);
}
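One way to check the “clone” and “no clone” claims in those comments is to look at which variant the Cow ends up in. The sketch below repeats abs_all so that it is self-contained; is_borrowed and stays_borrowed are hypothetical helpers written for this post, not part of the standard library’s stable API.

```rust
use std::borrow::Cow;

fn abs_all(input: &mut Cow<[i32]>) {
    for i in 0..input.len() {
        let v = input[i];
        if v < 0 {
            // Clones into a vector if not already owned.
            input.to_mut()[i] = -v;
        }
    }
}

/// Hypothetical helper: true if the Cow still only borrows its data.
fn is_borrowed(c: &Cow<[i32]>) -> bool {
    matches!(c, Cow::Borrowed(_))
}

/// Runs `abs_all` on a borrowed slice and reports whether it stayed borrowed,
/// i.e. whether the clone was avoided.
fn stays_borrowed(data: &[i32]) -> bool {
    let mut input = Cow::from(data);
    abs_all(&mut input);
    is_borrowed(&input)
}
```

With these helpers, stays_borrowed(&[0, 1, 2]) should be true (nothing was negative, so nothing was cloned), while stays_borrowed(&[-1, 0, 1]) should be false, because to_mut() had to switch the Cow to an owned Vec.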
My Progress
I haven’t made much progress on this yet, but I’ll break down what I have as a learning exercise for myself:
enum Cow<'a, T>
where
    T: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a T),
    Owned(<T as ToOwned>::Owned),
}
Cow<'a, T> is an enum with two variants: Borrowed and Owned. The Borrowed variant holds a reference to a T, while the Owned variant owns a value of T’s owned counterpart, <T as ToOwned>::Owned (for example, a String when T is str).
T must implement ToOwned, a trait that generalizes Clone to borrowed data via a method called .to_owned(&self). ToOwned has an associated type, <T as ToOwned>::Owned, which is the type of object you get back after calling .to_owned(). To indicate that Cow can wrap types whose size is not known at compile time (such as str or [i32]), T also carries the ?Sized bound, which relaxes the Sized requirement that Rust otherwise places on every type parameter.
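As a small illustration of the ToOwned and ?Sized bounds described above, here is a sketch of a generic function that accepts unsized types. The names make_owned and owned_examples are hypothetical and exist only for this example.

```rust
/// Generic over any `T: ToOwned`, including unsized types such as `str`
/// and `[i32]`. Without `?Sized`, calling this with `T = str` would not compile.
fn make_owned<T: ToOwned + ?Sized>(borrowed: &T) -> T::Owned {
    borrowed.to_owned()
}

/// Shows what `<T as ToOwned>::Owned` turns out to be for a few types.
fn owned_examples() -> (String, Vec<i32>, i32) {
    // T = str, so T::Owned = String.
    let s: String = make_owned("moo");
    // T = [i32], so T::Owned = Vec<i32>.
    let v: Vec<i32> = make_owned(&[1_i32, 2, 3][..]);
    // For a type that implements Clone, T::Owned is just T itself.
    let n: i32 = make_owned(&7_i32);
    (s, v, n)
}
```

The type annotations are there for the reader; the compiler works out each T::Owned from the argument type on its own.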
use std::borrow::Borrow;
use std::ops::Deref;

impl<T> Deref for Cow<'_, T>
where
    T: ToOwned + ?Sized,
    T::Owned: Borrow<T>,
{
    type Target = T;

    fn deref(&self) -> &T {
        match *self {
            Cow::Borrowed(b) => b,
            Cow::Owned(ref owned) => owned.borrow(),
        }
    }
}
This block implements Deref for Cow, which is the feature that allows a Cow<T> to be syntactically treated like a T. As before, we specify that T implements ToOwned and may not have a known size, but we also require that the type T::Owned (the associated type T gets from ToOwned) implement Borrow<T>, meaning that a T::Owned can be borrowed as a &T. For example, String implements Borrow<str>, meaning that a String instance my_string can be borrowed as a &str by calling my_string.borrow().
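Here is a minimal sketch of Borrow in action, using only standard library types. The function name borrowed_views is hypothetical. The type annotations matter: String also implements Borrow<String>, so the annotation is what selects the Borrow<str> implementation.

```rust
use std::borrow::Borrow;

/// Borrows a String as a `&str` and a Vec<i32> as a `&[i32]`,
/// then returns the string length and the sum of the slice.
fn borrowed_views() -> (usize, i32) {
    let my_string = String::from("moo");
    // The `&str` annotation picks `impl Borrow<str> for String`.
    let as_str: &str = my_string.borrow();

    let my_vec = vec![1, 2, 3];
    // Likewise, Vec<i32> implements Borrow<[i32]>.
    let as_slice: &[i32] = my_vec.borrow();

    let total: i32 = as_slice.iter().sum();
    (as_str.len(), total)
}
```

Neither call copies anything: both results are just references into the original String and Vec.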
In the match expression, then, our Cow is either a Borrowed(&T), in which case we return the reference directly, or an Owned(T::Owned) (where, confusingly, the outer Owned is the variant of the Cow enum and the inner one is the associated type of the ToOwned trait that T implements). In the latter case, we borrow the contents of the Cow as a &T with .borrow() and return it. The ref keyword in the pattern binds owned by reference, so the match doesn’t move the inner value out of the Cow. Note also that simply calling deref() does not get an owned copy of the data! It simply borrows it.
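Putting the enum and the Deref impl together, the sketch below shows the hand-rolled Cow being used like the type it wraps. It repeats the two definitions so that it compiles on its own; deref_demo is a hypothetical function added for this example.

```rust
use std::borrow::Borrow;
use std::ops::Deref;

enum Cow<'a, T>
where
    T: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a T),
    Owned(<T as ToOwned>::Owned),
}

impl<T> Deref for Cow<'_, T>
where
    T: ToOwned + ?Sized,
    T::Owned: Borrow<T>,
{
    type Target = T;

    fn deref(&self) -> &T {
        match *self {
            Cow::Borrowed(b) => b,
            Cow::Owned(ref owned) => owned.borrow(),
        }
    }
}

/// Returns (length of the borrowed Cow, whether it is still borrowed
/// after being dereferenced, length of the owned Cow).
fn deref_demo() -> (usize, bool, usize) {
    let borrowed: Cow<str> = Cow::Borrowed("moo");
    let owned: Cow<str> = Cow::Owned(String::from("moooo"));

    // `len` is a method on `str`; auto-deref finds it through our Deref impl.
    let len_borrowed = borrowed.len();
    let len_owned = owned.len();

    // Dereferencing only borrows, so the variant has not changed.
    let still_borrowed = matches!(borrowed, Cow::Borrowed(_));
    (len_borrowed, still_borrowed, len_owned)
}
```

Both variants answer .len() the same way, and the borrowed one is still Borrowed afterwards, which matches the point that deref() never produces an owned copy.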
General Thoughts
I think this method for learning Rust is also extremely useful, even more than I’d imagined before I tried it! The huge strengths of open source contributions are that other people have done a lot of work already and will review your code and help you improve for free. But this can also be a weakness, because you don’t get to implement a lot of new data structures or interact with bad code. Implementing data structures is something you will probably have to do when making your own stuff, and while interacting with bad code (especially your own) certainly isn’t an unadulterated good, it’s important for developing an appreciation for and understanding of good code. Beyond that, there are a few advantages of re-implementing standard library features:
- There are reference implementations for everything you do, so you don’t have to get stuck.
- The standard library implementations tend to be unusually arcane/weird Rust. Once you can read and write this, you’re in very good shape for normal Rust. Also, sometimes “arcane” stuff is actually a good practice that people tend to be too lazy to do.
- You get to learn the standard library much better, so you know what’s already available to you and maybe end up reinventing the wheel a little less on net!