14th March, 2023 - 1:00 PM
Understanding Moves through Rust String data type Representation.
I was reading The Rust Programming Language book (the Rust Book), specifically Chapter 4, titled Understanding Ownership. The first part of that chapter explains Ownership using the String data type.
This writeup is not about teaching you Ownership and Moves. I hope that it gives you an understanding of how Rust represents complex data structures in memory. This article is supposed to be used as a companion to Rust Book on the chapter that teaches Ownership.
What led me to write this article is that I realized the Rust Book's drawings of how Rust represents Strings in memory weren't clear enough. I took to Excalidraw to redraw the diagrams in a more understandable way, intending to show them on Wednesday during our DevCongress Rust Study Group meeting. When I was done with the drawing, I realized that understanding how Rust stores Strings in memory, and the need to prevent double frees, gives one an intuitive understanding of Rust's move semantics. So I set out to write this using my diagrams instead of the Rust Book's drawings, in the hope that Ownership, Rust Strings and Moves will click for you and you will never have to worry about them again. Just hopes, no promises.
The 3 cardinal Ownership rules
- 1. Each Value has an Owner.
- 2. Each Value must have only one Owner.
- 3. When the owner of a value goes out of Scope, the value is dropped.
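Rule 3 can be watched in action. Below is a minimal sketch, assuming a hypothetical `Tracked` type (my own helper, not something from the Rust Book) that writes its name into a shared log when it is dropped. It only illustrates that a value is dropped when its owner's scope ends:

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which things happened.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
        } // `_inner` goes out of scope here, so its value is dropped
        log.borrow_mut().push("between scopes");
    } // `_outer` goes out of scope here, so its value is dropped
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

Each value has exactly one owner (`_inner` or `_outer`), and the log shows each one being dropped at the closing brace of its owner's scope.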
These are the rules that underpin Rust's ownership model. They are simple and easy to memorize, but understanding them and how they give Rust its memory safety is very important to understanding Rust.
The Borrow Checker
You have surely heard of the Borrow-Checker and of Rust's steep learning curve. The Borrow-Checker will fight you and you will hit a lot of compilation errors, but once you fix them and your code compiles, safe Rust guarantees that the program is free of memory safety bugs such as use-after-free and double frees. So the Borrow-Checker gives you upfront pain in exchange for more peace at runtime.
Stack vs Heap
There are a lot of resources that explain the Stack and the Heap. A good example is the Rust Book, which I advise you to read if you haven't already.
I am only going to state a few properties of Stacks and Heaps that are relevant in explaining how Rust manages memory for us without Garbage Collection while guaranteeing memory safety.
The Stack is a Last-in First-out data structure. This means that when you put data onto the Stack, each block of data goes on top of the previously stacked block. Think of a pile of plates: the ideal way to remove plates is to start from the top. You add data to the Stack through a process called push, and you remove data from the Stack through a process called pop.
In Rust, all data stored on the Stack must have a fixed size that is known at compile time. This is very important. A function's arguments, its local variables and pointers to data allocated on the Heap are all stored on the Stack.
The Heap is less organized and is where growable data lives. When you want to store data on the Heap, you ask the memory allocator for some space; it finds a region large enough to hold your data, marks it as being in use and returns a pointer to it. The Heap is slower than the Stack because the allocator first has to find enough memory, the memory may later have to be grown, and the data has to be reached by following a pointer.
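A small sketch of the difference, assuming `Box` as the heap-allocating type. `Box` is the standard library's simplest way to put a value on the Heap; it is not discussed in this article and is used here only for illustration, and the function name is made up:

```rust
use std::mem::size_of;

// Returns (a value kept on the Stack, a value read back from the Heap).
fn stack_and_heap() -> (i32, i32) {
    // Fixed size, known at compile time: lives in this function's stack frame.
    let on_stack: i32 = 13;
    // The Box itself (a pointer) lives on the Stack; the 13 it points to lives on the Heap.
    let on_heap: Box<i32> = Box::new(13);
    (on_stack, *on_heap)
} // `on_heap` goes out of scope here, so its Heap memory is freed

fn main() {
    println!("{:?}", stack_and_heap());
    println!(
        "i32: {} bytes, Box<i32>: {} bytes",
        size_of::<i32>(),
        size_of::<Box<i32>>()
    );
}
```

The `Box<i32>` on the Stack is just a pointer, so it has the size of a pointer no matter what it points to.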
The String Rust Data Type
String is a growable, mutable, owned, UTF-8 encoded data type, often described as a complex data type in Rust. Does the growable give you a hint on where it will be stored? Heap Heap Hurra.....
Rust has two main string types. The first is str, known as a string slice and usually seen in its borrowed form, &str. A string literal is a &str whose contents are known at compile time and stored in the program's binary. The second is the String data type, implemented in the std lib. It is implemented so that Rust frees its Heap memory automatically when its owner goes out of Scope. This is in contrast to languages like Go, which has a Garbage Collector to free memory that is no longer in use, and C, which has the programmer manually allocate and free memory. Both approaches have downsides.
A Rust String is made up of 3 parts.
- 1. Pointer: the address of the memory allocated for the String to store its data.
- 2. Length: the number of bytes that the String's data currently uses.
- 3. Capacity: the total number of bytes allocated for the String's data.
Before this clicked for me, length and capacity confused me, but the difference is very simple. A String's data doesn't have to use all the bytes that the allocator hands out for it, in which case the capacity is bigger than the length. Or the data can consume all the bytes, in which case the length and the capacity are the same. If the allocated memory weren't dynamically resizable, we could have another case where the String's data is bigger than the capacity. But how? Hear me out. We already said the String data type is growable. But how do we grow a String? The String data type has methods that we can use to add data to a String instance, essentially growing it.
let mut comm = String::from("Dev");
comm.push_str("Congress"); // push_str() appends a string slice to a String
println!("{}", comm); // This will print the string `DevCongress`
When we wrote the first line above, we were allocated some memory in which the String has a length of 3 and, in practice, a capacity of 3. Later we decided to add "Congress". "DevCongress" can no longer be stored in that memory because its capacity was only 3, while comm now requires a capacity of at least 11. So the allocation is resized, which usually means comm gets a new, larger block of memory (often at a new address) with the existing bytes copied over. That is what we call growing a string.
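Length and capacity can be inspected with the `len()` and `capacity()` methods. The sketch below uses two function names of my own; the exact capacities are chosen by the standard library, so the code only relies on the capacity never being smaller than the length:

```rust
// Returns (length, capacity) before and after appending to the String.
fn grow() -> ((usize, usize), (usize, usize)) {
    let mut comm = String::from("Dev");
    let before = (comm.len(), comm.capacity());
    comm.push_str("Congress"); // may trigger a reallocation on the Heap
    let after = (comm.len(), comm.capacity());
    (before, after)
}

// Returns (length, capacity) of a String that was given spare room up front.
fn spare_capacity() -> (usize, usize) {
    // Ask for room for at least 10 bytes, then use only 3 of them.
    let mut comm = String::with_capacity(10);
    comm.push_str("Dev");
    (comm.len(), comm.capacity())
}

fn main() {
    let (before, after) = grow();
    println!("before: len {}, capacity {}", before.0, before.1);
    println!("after:  len {}, capacity {}", after.0, after.1);

    let (len, cap) = spare_capacity();
    println!("spare:  len {}, capacity {}", len, cap);
}
```

In `spare_capacity` the capacity is bigger than the length, which is the first case described above. In `grow` the capacity has to increase to at least 11 before "DevCongress" fits.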
I hope this gives you a good anatomy of the String data type. An astute reader might ask, so where is the string data? In our example above the String's data would be "DevCongress". Here is where the Heap comes in, the string data is stored on the heap. This way it can be dynamically resized without any problems because that is how the Heap works.
Suffice it to say, when it comes to String, Rust divides it in 2: the metadata of the String is stored on the Stack and the value of the String itself on the Heap. This way, the metadata, whose size is fixed at compile time, lives on the Stack, and the value, which can be dynamically resized at runtime, lives on the Heap.
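One way to convince yourself that the Stack part has a fixed size is to measure it. This is a sketch with a made-up function name; on current Rust implementations the Stack part is three machine words (pointer, length, capacity), although the exact layout is an implementation detail rather than a documented promise:

```rust
use std::mem::{size_of, size_of_val};

// Returns the Stack size of a short String and of a longer one.
fn stack_sizes() -> (usize, usize) {
    let short = String::from("Dev");
    let long = String::from("DevCongress Rust Study Group");
    // size_of_val measures only the Stack part, not the text on the Heap.
    (size_of_val(&short), size_of_val(&long))
}

fn main() {
    let (short, long) = stack_sizes();
    println!("short: {} bytes, long: {} bytes", short, long);
    println!("three machine words: {} bytes", 3 * size_of::<usize>());
}
```

Both Strings take the same space on the Stack even though their text differs in length, because the text itself lives on the Heap.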
Rust String Representation
![Rust String Representation](../images/string_representation.png)
The image above shows everything that I have described so far. I will add a few notes below in the hope of clarifying the diagram.
You can see ... written below the Stack and Heap labels on both diagrams, and also at the bottom of the Heap. They show the directions in which more memory can be allocated. In the Stack diagram, data can only be added on top. In the Heap, the allocator is free to place our data wherever it finds room, so we assume that data can be added above or below the space that our variable's value occupies.
Understanding Moves
let a = 13;
let b = a;
println!("a: {}, b: {}", a, b); // prints 'a: 13, b: 13'
If you compiled this, it would run without issues. This works because Rust's scalar data types are lightweight: they have a fixed size and don't consume a lot of memory. They implement the Copy trait, which allows them to be copied from one variable to another without any meaningful performance hit.
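A quick way to see that a Copy type really is copied, and not shared, is to change one variable and check the other. A minimal sketch with a function name of my own:

```rust
// Returns (a, b) after `b` has been changed.
fn copy_scalar() -> (i32, i32) {
    let a = 13;
    let mut b = a; // the value of `a` is copied; `a` stays valid
    b += 1;        // changing `b` does not affect `a`
    (a, b)
}

fn main() {
    let (a, b) = copy_scalar();
    println!("a: {}, b: {}", a, b);
}
```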
let comm = String::from("Rust Study Group");
let group = comm;
println!("community: {}, group: {}", comm, group); // compile error: comm was moved into group
One would expect the above to work the same way, but it doesn't compile. You see, Strings can be huge: they can be as small as the 3-letter value we saw above or as big as a whole dictionary. It would be grossly inefficient to copy such huge data from variable to variable. This is why Rust stores Strings as explained above.
What Rust does above is that, instead of copying the whole String, it copies only the part of the String stored on the Stack (the pointer, the length and the capacity).
So now both comm and group store the same data: they point to the same memory location, which means they would both own the same value stored on the heap. This violates Ownership rule #2 as stated above.
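The fact that only the Stack part is copied can be observed by comparing the Heap address before and after the assignment. A sketch, with a hypothetical function name; `as_ptr()` returns the address of the String's bytes on the Heap:

```rust
// Returns true if the Heap buffer stays at the same address across the assignment.
fn heap_buffer_unchanged() -> bool {
    let comm = String::from("Rust Study Group");
    let heap_address = comm.as_ptr(); // address of the text on the Heap
    let group = comm;                 // copies pointer, length and capacity only
    group.as_ptr() == heap_address
}

fn main() {
    println!("same Heap address: {}", heap_buffer_unchanged());
}
```

The text "Rust Study Group" is never duplicated; only the three Stack values are.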
Enter Move!
We already violated Ownership rule #2 above, which states that each value must have only one owner. Why not violate #3 while we can? It states that when the owner of a value goes out of scope, the value is dropped. What happens when our program leaves the scope of comm and group? Both should be dropped, as per Ownership rule #3. But now we have 2 variables pointing to the same memory. If Rust tried dropping both group and comm, then by the time it tried to drop comm, group would already have been dropped, freeing the same memory twice. That is what we call a double free. A double free is definitely a memory bug and, as Graydon Hoare states, [Memory] Safety is Rust's raison d'etre; we need to find a way to solve this problem.
How does Rust prevent a double free in this situation? Move. When we bound group to comm, Rust invalidated comm, making group the only owner of the data stored on the heap. The compiler rejects any later use of comm, and nothing is dropped on comm's behalf when it goes out of scope. This way, when group is dropped later on, there aren't any issues because group is the only owner of "Rust Study Group".
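To check that the heap data is cleaned up exactly once, here is a sketch using a hypothetical `Community` wrapper (my own type, not from the Rust Book) that counts how many times its drop code runs:

```rust
use std::cell::Cell;

// Hypothetical wrapper: owns a String and counts how often it gets dropped.
struct Community<'a> {
    name: String,
    drops: &'a Cell<u32>,
}

impl Drop for Community<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns how many times the drop code ran for the moved value.
fn drops_after_move() -> u32 {
    let drops = Cell::new(0);
    {
        let comm = Community {
            name: String::from("Rust Study Group"),
            drops: &drops,
        };
        let group = comm; // ownership moves to `group`; `comm` is no longer usable
        // println!("{}", comm.name); // would not compile: `comm` was moved
        println!("group: {}", group.name);
    } // only `group` is dropped here
    drops.get()
}

fn main() {
    println!("drops: {}", drops_after_move());
}
```

Two variable names were bound to the value, but the drop code runs once, for `group`, so there is no double free.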
This is the epiphany that led me to this article. Move doesn't need to be taught as a separate concept. In fact, move becomes a design consequence of having memory safety, not a new feature that we need to learn.
Foot Note: DevCongress
DevCongress is a developers' movement aimed at becoming the most vibrant in the software developer space in Ghana and Africa. The objectives are very simple.
Foot Note: Rust Study Group
The Rust Study Group is a group of Rust enthusiast DevCongress members who study The Rust Programming Language Book and meet weekly on Wednesdays to discuss and teach. I am the Group Lead.