Writing Unsafe Rust

Unsafe Superpowers

Up until now, the code we have written has followed Rust's memory safety guarantees, which are enforced at compile time.

To opt out of these memory safety guarantees, we need to use unsafe Rust.

Unsafe Rust exists for two reasons:

1. Static analysis is conservative by nature. Rust will reject a valid program if it can't guarantee that the program is memory safe, even though you as a developer know it is.
2. The underlying computer hardware is inherently unsafe. If Rust didn't let you perform certain unsafe operations, you couldn't do certain tasks. Because Rust is a systems programming language, it must allow low-level systems programming, which sometimes requires unsafe code.

Unsafe code is written inside an unsafe block, which gives you five abilities you don't have in safe Rust:

Dereference a raw pointer
Call an unsafe function or method
Access or modify a mutable static variable
Implement an unsafe trait
Access fields of unions

However, unsafe doesn't turn off the borrow checker or disable Rust's other safety checks. For example, if you use a reference inside an unsafe block, it will still be checked.

It's up to the developer to make sure that memory is accessed correctly inside an unsafe block. Keep unsafe blocks small and manageable. We can also enclose unsafe code in a safe abstraction and provide a safe API.

Dereferencing a Raw Pointer

In safe Rust, the compiler ensures that references are always valid (see the earlier section on Dangling References).

Unsafe Rust has two new types called raw pointers, which are similar to references:
1. Immutable raw pointer (*const T): the pointer can't be directly assigned to after being dereferenced.
2. Mutable raw pointer (*mut T)

The asterisk here isn't the dereference operator; it's part of the type name.

fn main() {
    let mut num = 5;

    let r1 = &num as *const i32;   // Immutable raw pointer
    let r2 = &mut num as *mut i32; // Mutable raw pointer
}

We don't use the unsafe keyword here. Rust lets us create raw pointers in safe code; it just doesn't let us dereference them outside an unsafe block.

The as keyword casts an immutable reference and a mutable reference to their corresponding raw pointer types.

Raw pointers are different from references and smart pointers:
- Raw pointers are allowed to ignore Rust's borrowing rules by having both immutable and mutable pointers, or multiple mutable pointers, to the same location in memory.
- Raw pointers aren't guaranteed to point to valid memory.
- Raw pointers are allowed to be null.
- Raw pointers don't implement any automatic cleanup.
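Here is a small sketch of two of those properties: aliasing mutable raw pointers and null raw pointers. The helper name bump_twice is hypothetical. Only the dereferences need unsafe.

```rust
use std::ptr;

// Hypothetical helper: increments the target twice through two aliasing
// mutable raw pointers.
fn bump_twice(value: &mut i32) -> i32 {
    // Raw pointers are Copy, so two mutable raw pointers to the same
    // location can coexist; two live `&mut` references could not.
    let p1: *mut i32 = value;
    let p2 = p1;

    // SAFETY: both pointers come from a valid `&mut i32` that nothing else
    // uses while this block runs.
    unsafe {
        *p1 += 1;
        *p2 += 1;
        *p1
    }
}

fn main() {
    // Creating a null raw pointer is safe; only dereferencing is unsafe.
    let null: *const i32 = ptr::null();
    println!("is null: {}", null.is_null()); // is null: true

    let mut value = 10;
    println!("after bump_twice: {}", bump_twice(&mut value)); // after bump_twice: 12
}
```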

We know the raw pointers we created above are valid because they come straight from references. That won't always be true of raw pointers. For example, let's create a raw pointer to an arbitrary memory address:

fn main() {
    let address = 0x012345usize;
    let r3 = address as *const i32;
}

There might or might not be valid memory at that address, so dereferencing r3 would be undefined behavior. The compiler might optimize the code so that no memory access happens at all, or the program might crash with a segmentation fault.

Let's dereference our original r1 and r2 raw pointers inside an unsafe block:

fn main() {
    let mut num = 5;

    let r1 = &num as *const i32;
    let r2 = &mut num as *mut i32;

    unsafe {
        println!("r1 is: {}", *r1);
        println!("r2 is: {}", *r2);
    }
}

Running this works fine:

r1 is: 5
r2 is: 5

If we tried to create an immutable and a mutable reference to the same location in memory, the program wouldn't compile, because that violates the borrowing rules. Raw pointers let us bypass those rules, but doing so can create data races, so use them with care.

Calling an Unsafe Function or Method

Unsafe functions and methods look just like regular functions and methods, except that they have an extra unsafe keyword at the beginning of their definition.

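The unsafe keyword in this context indicates that the function has requirements we must uphold when we call it, because Rust can't guarantee that we've met them. For example, passing invalid arguments could lead to undefined behavior.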

So make sure to read the function's documentation to learn which requirements you must uphold to satisfy its contract.

Unsafe functions must be called inside an unsafe block or from within another unsafe function.

fn main() {
    unsafe fn dangerous() {
        // the body of an unsafe function may perform unsafe operations
    }

    unsafe {
        dangerous(); // OK: called inside an unsafe block
    }

    // Calling it outside an unsafe block does not compile:
    // dangerous();
    // ^^^^^^^^^^^ error: this operation is unsafe and requires an unsafe function or block
}
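When an unsafe function has a contract, the usual Rust convention is to document it in a "# Safety" section of the doc comment. Each call site then gets a "// SAFETY:" comment explaining why the contract is met. Here is a minimal sketch using a hypothetical read_at function:

```rust
/// Hypothetical example: returns the element `index` places after `ptr`.
///
/// # Safety
///
/// `ptr` must point into a live array of `i32` with at least `index + 1`
/// initialized elements starting at `ptr`.
unsafe fn read_at(ptr: *const i32, index: usize) -> i32 {
    unsafe { *ptr.add(index) }
}

fn main() {
    let data = [10, 20, 30];

    // SAFETY: `data` has 3 elements, so index 2 is in bounds.
    let third = unsafe { read_at(data.as_ptr(), 2) };
    println!("{}", third); // 30
}
```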

Creating a Safe Abstraction

Just because a function contains unsafe code doesn't make it an unsafe function.

You can wrap unsafe code inside a safe function.

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];

    let r = &mut v[..]; // create a mutable slice `r` of `v`

    let (a, b) = r.split_at_mut(3); // returns a tuple of two mutable slices

    assert_eq!(a, &mut [1, 2, 3]);
    assert_eq!(b, &mut [4, 5, 6]);
}

split_at_mut() is a safe method defined on mutable slices: it splits one slice into two at the index passed in.

Imagine we wanted to implement this function using only safe Rust. It might look something like this:

// for simplicity, we implement a function rather than a method
fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();

    assert!(mid <= len);

    // return a tuple with two slices: the first holds everything up to mid,
    // the second everything from mid onward
    (&mut slice[..mid], &mut slice[mid..])
    //    ^^^^^              ^^^^^
    // error: cannot borrow `*slice` as mutable more than once at a time
}

When creating the tuple, we're mutably borrowing slice twice in the same scope. The borrow checker doesn't know that we're borrowing different, non-overlapping parts of the slice. It only knows that we're borrowing from the same slice twice.

We know this is fine because the two parts don't overlap, so let's use an unsafe block:

use std::slice;

fn split_at_mut(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();
    let ptr = slice.as_mut_ptr(); // get a raw mutable pointer to the slice's buffer

    assert!(mid <= len);

    unsafe {
        (
            // unsafe fn: creates a new slice from a pointer to the data and a length
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(
                // ptr.add() returns a pointer offset by the given number of elements
                ptr.add(mid),
                len - mid,
            ),
        )
    }
}

Recall that a slice is a pointer to some data plus the length of that data (covered earlier in The Slice Type).

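slice::from_raw_parts_mut() is unsafe because it takes a raw pointer and must trust that this pointer is valid. ptr.add() is unsafe because it must trust that the offset location is also valid. The assertion that mid <= len guarantees that both raw pointers stay within the original slice and that the two halves don't overlap. So split_at_mut() itself is safe and can be called from safe Rust code.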
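Because the unsafe code is hidden behind a safe signature, callers never write unsafe themselves. The sketch below uses the same technique, made generic over the element type T. The generic version is our own variation, not the function above.

```rust
use std::slice;

// Variation on the split_at_mut above, generic over the element type T.
fn split_at_mut<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // This check is what makes the unsafe block sound: both halves
    // stay inside the original slice and never overlap.
    assert!(mid <= len);

    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut words = vec!["a", "b", "c", "d"];

    // Called from safe code: no `unsafe` needed at the call site.
    let (left, right) = split_at_mut(&mut words, 1);
    left[0] = "z";
    right[2] = "y";

    println!("{:?}", words); // ["z", "b", "c", "y"]
}
```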

extern Functions to Call External Code

Sometimes our Rust code may need to interact with code written in another language.

For this purpose, Rust has the extern keyword, which facilitates the creation and use of a Foreign Function Interface (FFI).

A foreign function interface is a way for a programming language to define functions and enable a different (foreign) programming language to call those functions.

extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}

Here we set up an integration with the abs() function from the C standard library. Functions declared within extern blocks are always unsafe to call from Rust, because other languages don't enforce Rust's rules and guarantees, and Rust can't check them.

It's the developer's responsibility to make sure that functions declared in an extern block are safe to call.

Inside extern we specify the name and signature of the foreign function we want to call.
The "C" part defines which Application Binary Interface (ABI) the external function uses. The ABI defines how to call the function at the assembly level.

The "C" ABI is the most common one and follows the C programming language's ABI.
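FFI declarations pair well with the safe-abstraction idea from earlier. This sketch declares the C standard library's strlen the same way abs() is declared above and wraps it in a safe function. It assumes the platform's size_t matches Rust's usize, and the wrapper name c_string_len is hypothetical. Taking a &CStr guarantees the valid, NUL-terminated pointer that strlen requires.

```rust
use std::ffi::{c_char, CStr};

// Written like the abs() example (2021 edition); in the 2024 edition this
// block must be spelled `unsafe extern "C"`.
extern "C" {
    // Assumes size_t == usize on this platform.
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper: a &CStr is always NUL-terminated and valid,
// which is exactly the contract strlen requires.
fn c_string_len(s: &CStr) -> usize {
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let greeting = CStr::from_bytes_with_nul(b"hello\0").unwrap();
    println!("strlen says: {}", c_string_len(greeting)); // strlen says: 5
}
```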

We can also allow other languages to call our Rust functions by using the extern keyword in the function signature:

The #[no_mangle] annotation tells the Rust compiler not to mangle the name of this function. Mangling is when the compiler changes the name of a function to a different name that carries more information for other parts of the compilation process. Without #[no_mangle], C code couldn't find the function by its Rust name.

#[no_mangle]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}
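A function exported this way is still an ordinary Rust function. Rust code can call it without unsafe, and it can be stored as a C-ABI function pointer, which is the type a C callback parameter would expect. Here is a sketch with a hypothetical rust_add function:

```rust
// Hypothetical exported function. #[no_mangle] keeps the symbol name
// `rust_add` so C code can link to it (2024 edition: #[unsafe(no_mangle)]).
#[no_mangle]
pub extern "C" fn rust_add(a: i32, b: i32) -> i32 {
    // wrapping_add avoids an overflow panic in debug builds
    a.wrapping_add(b)
}

fn main() {
    // Defined in Rust, so calling it from Rust needs no unsafe block.
    println!("{}", rust_add(2, 3)); // 5

    // It also coerces to a C-ABI function pointer.
    let f: extern "C" fn(i32, i32) -> i32 = rust_add;
    println!("{}", f(40, 2)); // 42
}
```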

Accessing or Modifying a Mutable Static Variable

Until now, we haven't talked about global variables. Rust does support them, but they can be problematic with Rust's ownership rules.

If two threads are accessing the same mutable global state then it could cause a data race.

In Rust, global variables are called static variables.

// by convention, static names use SCREAMING_SNAKE_CASE and the type annotation
// is required; a reference stored in a static has the 'static lifetime
static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("name is: {}", HELLO_WORLD);
}

Constants and immutable static variables are similar, but a static variable has a fixed address in memory, so using its value always accesses the same data. Constants, on the other hand, are allowed to duplicate their data whenever they are used: the compiler can replace every occurrence of a constant with its concrete value.
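The fixed-address property can be observed directly. In this sketch, every reference to the static compares equal, whereas the language makes no such promise for a constant. The names STATIC_LIMIT, CONST_LIMIT, and static_address are hypothetical.

```rust
static STATIC_LIMIT: u32 = 100;
const CONST_LIMIT: u32 = 100;

// Returns the address of the static; every call sees the same location.
fn static_address() -> *const u32 {
    &STATIC_LIMIT
}

fn main() {
    // A static has a single, fixed address for the whole program.
    println!("static same address: {}", static_address() == static_address()); // true

    // A const is substituted at each use, so these two references may or
    // may not share an address; nothing is guaranteed either way.
    let a: *const u32 = &CONST_LIMIT;
    let b: *const u32 = &CONST_LIMIT;
    println!("const addresses: {:p} {:p}", a, b);

    // The values are equal regardless.
    println!("{}", STATIC_LIMIT == CONST_LIMIT); // true
}
```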

Static variables can be mutable, but accessing and modifying mutable static variables is unsafe.

static mut COUNTER: u32 = 0;

fn add_to_count(inc: u32) {
    // modification is unsafe
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    add_to_count(3);

    // accessing is unsafe
    unsafe {
        println!("COUNTER: {}", COUNTER);
    }
}
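The danger with static mut is unsynchronized access. One common safe alternative is an atomic type from std::sync::atomic; this is a substitution, not the approach shown above. Here is a sketch of the same counter using AtomicU32, updated from several threads without any unsafe:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An atomic static can be updated from any thread without `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // fetch_add is a single atomic read-modify-write, so no update is lost.
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

fn main() {
    // Four threads each add 3.
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(|| add_to_count(3))).collect();
    for handle in handles {
        handle.join().unwrap();
    }

    println!("COUNTER: {}", COUNTER.load(Ordering::SeqCst)); // COUNTER: 12
}
```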

Implementing an Unsafe Trait

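A trait is unsafe when at least one of its methods has some invariant that the compiler can't verify. We declare such a trait with unsafe trait, and each implementation must be marked unsafe impl to promise that we uphold those invariants.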

unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}

fn main() {}
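The most common unsafe traits in practice are the standard library's Send and Sync marker traits. The compiler implements them automatically for types built from Send and Sync parts. A type containing a raw pointer gets neither, so we must write unsafe impl ourselves when we know it's sound. Here is a sketch with a hypothetical SendPtr wrapper; the safety argument in the comment is an assumption about how the pointer is used:

```rust
use std::thread;

// Hypothetical wrapper around a raw pointer. Raw pointers are neither Send
// nor Sync, so by default this struct can't be moved to another thread.
struct SendPtr(*mut i32);

// SAFETY (assumed usage): the pointee is a heap allocation that only one
// thread touches at a time, so moving the pointer across threads is sound.
unsafe impl Send for SendPtr {}

fn bump_on_other_thread(value: i32) -> i32 {
    // Turn a Box into an owned raw pointer to heap memory.
    let ptr = SendPtr(Box::into_raw(Box::new(value)));

    let handle = thread::spawn(move || {
        let p = ptr; // move the whole SendPtr in, not just its field
        unsafe { *p.0 += 1 };
        p
    });

    let ptr = handle.join().unwrap();
    // Reclaim ownership so the allocation is freed.
    let boxed = unsafe { Box::from_raw(ptr.0) };
    *boxed
}

fn main() {
    println!("{}", bump_on_other_thread(41)); // 42
}
```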

Accessing Fields of a Union

A union is similar to a struct, but only one declared field is used in a particular instance at one time. Unions are primarily used to interface with unions in C code. Accessing union fields is unsafe because Rust can't guarantee the type of the data currently stored in a given union instance.
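A small sketch with a hypothetical IntOrFloat union shows the split: creating or writing a union is safe, but reading a field is unsafe. Reading i after writing f reinterprets the float's bits, and #[repr(C)] gives the union the layout a C union would have.

```rust
// Hypothetical union whose two fields share the same 4 bytes of storage.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Writing a field is safe; reading one back is unsafe because the
// compiler doesn't track which field was last written.
fn float_bits(x: f32) -> u32 {
    let u = IntOrFloat { f: x };
    unsafe { u.i }
}

fn main() {
    // 1.0f32 is 0x3F80_0000 in IEEE 754 single precision.
    println!("{:#x}", float_bits(1.0)); // 0x3f800000
}
```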

When to Use Unsafe Code

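Using unsafe to take one of the five actions above isn't wrong or against the spirit of Rust. But it's the developer's responsibility to know what they're doing, because the compiler can't help uphold memory safety inside unsafe code.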