Extending Django with Rust

Introduction

We use Rust extensions to Python in our Django app. This blog post explains some of our motivations and methodology.

Motivation

At Rapid Rēhita, we take in raw patient data that varies hugely depending on several factors: the specific questions each practice chooses to ask, the details of the patient, and extras such as information pre-provided in enrolment links. We then need to convert this information into a wide variety of formats: rendered as a PDF or TIFF, sometimes with a cover sheet; anonymised, to track enrolments; and shaped to suit various external APIs.

Although our main site is built using Django and Python, these transformations are harder to get right without a typesafe language, given the sheer number of possible combinations in a given form. Static analysis helps here, but still has limitations. We therefore find it helpful to convert completed user forms from Python classes into Rust structs, which can then be transformed in a typesafe way into our various output formats using features like enums and match expressions.

As well as this primary use, the greater control Rust gives is useful in other ways. For example, we need to let users search addresses. Our initial implementation stored addresses in the database and used full-text search, but this was just too slow. Using Rust, we have produced a much more elegant solution. A build script ingests the addresses and produces mmap-able data. Searching this data through an inverted index yields sorted streams of integers representing the results. These streams can then be intersected or united efficiently, dramatically reducing our address search time.
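
As a rough illustration of the idea only (written in Python for brevity, not the Rust and mmap-backed implementation described above), the sketch below builds a tiny in-memory inverted index and combines sorted lists of address ids with a two-pointer merge. The function names and sample addresses are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index(addresses: Iterable[str]) -> Dict[str, List[int]]:
    """Map each lowercase token to a sorted list of address ids."""
    index: Dict[str, List[int]] = defaultdict(list)
    for address_id, address in enumerate(addresses):
        # set() avoids listing the same id twice for a repeated token.
        for token in set(address.lower().split()):
            index[token].append(address_id)  # ids arrive in ascending order
    return dict(index)


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Ids present in both sorted lists (AND)."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def unite(a: List[int], b: List[int]) -> List[int]:
    """Ids present in either sorted list (OR), without duplicates."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def search(index: Dict[str, List[int]], query: str) -> List[int]:
    """Return ids of addresses containing every token in the query."""
    postings = [index.get(token, []) for token in query.lower().split()]
    if not postings:
        return []
    result = postings[0]
    for posting in postings[1:]:
        result = intersect(result, posting)
    return result
```

Because every list is already sorted, both merges run in a single pass over their inputs, which is what makes combining several search terms cheap.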

One obvious way to use Rust like this would be to write various microservices and communicate with them from Django. This would certainly work, but it also adds latency, as well as deployment and testing complexity. Using extensions instead gives an easier experience, and is especially useful when we want to escape the GIL but have a function too small to justify the added complexity of a call to a microservice. As a result our code base is about two-thirds Python and one-third Rust extensions. This has been surprisingly easy to achieve, thanks to the quality of tooling available.

Implementation

How do we do it? It's almost embarrassingly easy thanks to PyO3 and Maturin.

These do all the work with a handful of macros.

Briefly, we keep a separate directory of Rust extensions in our main code base, at the same level as the directory holding our Django code. In other words, the file structure at the repository root might look like /django_directory, /rusty_extensions. (We'll use these names in the following example.)

The interface to these extensions is a struct with a single instance of a Tokio runtime, and various associated methods. This struct and its methods are then exposed as a Python class. (We keep a single class rather than exposing each method individually as a function in order to allow reusing the runtime across multiple calls.)

A simplified version might look like this:

use pyo3::exceptions::PyIOError;
use pyo3::prelude::*;
use tokio::runtime::Runtime;
mod expensive_mod;

#[pyclass]
struct RustyExtension {
    runtime: Runtime,
}

#[pymethods]
impl RustyExtension {
    #[new]
    fn new() -> Self {
        // One nice feature is allowing logging in Python from Rust: this provides support for that.
        // For it to work, you need to add `pyo3-log` to your `Cargo.toml`.
        pyo3_log::init();

        // The runtime is stored on the struct so that every method call can reuse it.
        let runtime = Runtime::new().expect("Error creating runtime");
        RustyExtension { runtime }
    }

    /// This is a simple method intended to demonstrate a basic call
    fn uppercase_str(&self, val: &str) -> String {
        val.to_uppercase()
    }

    /// This method has the advantage of releasing the GIL; imagine
    /// `expensive_mod::expensively` is an async function that involves
    /// some form of IO operation.
    fn do_something_slow(&self, py: Python<'_>, val: &str) -> PyResult<String> {
        // Release the GIL while the Tokio runtime drives the future to completion.
        let result = py.allow_threads(|| {
            self.runtime.block_on(expensive_mod::expensively(val))
        });
        // Let's imagine this could fail on an IO error; this can easily be
        // transformed into a Python exception to be handled by the caller.
        let unerrored_result = result.map_err(|e| PyIOError::new_err(format!("{:?}", e)))?;
        Ok(unerrored_result)
    }
}

In our Python code, we have a separate interface file that splits our single object into typed Python functions. This lets them be easily mocked out for testing and so on. It also lets people see which functions are available without having to look at the Rust code, which is important in a polyglot codebase where different programmers have different specialties. For the example above, it might look like:

# this dual import will be explained shortly
try:
    from rusty_extensions import RustyExtension
except ImportError:
    from packages.rusty_extensions import RustyExtension

EXTENSIONS_SINGLETON = RustyExtension()

def uppercase_str(s: str) -> str:
    result = EXTENSIONS_SINGLETON.uppercase_str(s)
    # The check below could be skipped, since
    # we know that this is a str,
    # but a typecheck is very cheap
    # and reassures static analysis.
    if isinstance(result, str):
        return result
    raise TypeError("uppercase_str did not return a str")

def do_something_slow(s: str) -> str:
    try:
        return EXTENSIONS_SINGLETON.do_something_slow(s)
    except IOError:
        # handle this somehow; re-raising here keeps
        # the `-> str` annotation honest
        raise
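
To illustrate why module-level wrapper functions are convenient to mock, here is a minimal, self-contained sketch. FakeExtension is a hypothetical stand-in for the compiled RustyExtension class so that the example runs without building the Rust crate, and greeting is an invented piece of application code. A real test suite would more likely patch the wrapper by its import path.

```python
from unittest import mock


class FakeExtension:
    """Hypothetical stand-in for the compiled RustyExtension class."""

    def uppercase_str(self, val: str) -> str:
        return val.upper()


EXTENSIONS_SINGLETON = FakeExtension()


def uppercase_str(s: str) -> str:
    """Typed wrapper, mirroring the interface file."""
    return EXTENSIONS_SINGLETON.uppercase_str(s)


def greeting(name: str) -> str:
    """Hypothetical application code that depends on the wrapper."""
    return "Hello, " + uppercase_str(name)


def greeting_with_mocked_extension(name: str) -> str:
    """Call `greeting` with the wrapper temporarily swapped for a mock.

    Patching this module's globals keeps the sketch self-contained; in a
    project the usual form is mock.patch("<module path>.uppercase_str").
    """
    fake = mock.Mock(return_value="MOCKED")
    with mock.patch.dict(globals(), {"uppercase_str": fake}):
        result = greeting(name)
    # The mock records how it was called, so the test can assert on it.
    fake.assert_called_once_with(name)
    return result
```

Because callers only ever see plain Python functions, a test can replace them without touching the extension object itself.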

So how do we get from this Rust code to a working Python extension?

In development, it's as simple as running maturin develop in the Rust directory while in our development virtualenv. This allows the first import method above (from rusty_extensions import RustyExtension).

Our release code is slightly more complicated, but not very. We use GitHub Actions to build and deploy. A simplified fragment of the relevant configuration file is:

jobs:
  deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: django_directory
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          override: true
      - uses: messense/maturin-action@v1
        with:
          maturin-version: latest
          command: build
          args: --release --strip -m rusty_extensions/Cargo.toml --out dist
          manylinux: auto
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
          architecture: x64
      - name: Install rust extensions
        run: |
          pip3 install rusty_extensions -t ./packages/ --find-links ../dist
          touch ./packages/__init__.py

This uses Maturin to build a wheel and place it in dist. We then install it with pip into the packages directory inside django_directory, adding an empty __init__.py so that packages can be imported. From there Python can reach it using from packages.rusty_extensions import RustyExtension, which is the second import path in the interface file. This means the site can be deployed in whatever way you like using django_directory as the self-contained source.

And that's all there is to it!

Conclusion

We've been very happy with this as a solution, and would strongly recommend it to others dealing with complex business logic on the backend, especially when that logic involves complex serialization and deserialization. Maturin and PyO3 are very, well, mature for their age and have been extremely easy to use effectively; this has been helped by their excellent and detailed documentation.

Our only note of caution concerns the cost of marshalling between PyObjects and the native types used by Rust. This isn't huge, especially when dealing with something like a str, but it is enough to make it worth doing as much as possible in a single function, rather than regularly converting back and forth.
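
The sketch below shows the shape of that trade-off in Python. CountingExtension is a hypothetical stand-in that simply counts calls across the boundary, and uppercase_many is an invented batched method, not part of the extension shown earlier.

```python
from typing import List


class CountingExtension:
    """Hypothetical stand-in for a compiled extension that counts boundary crossings."""

    def __init__(self) -> None:
        self.calls = 0

    def uppercase_str(self, val: str) -> str:
        self.calls += 1  # one Python -> Rust -> Python round trip per string
        return val.upper()

    def uppercase_many(self, vals: List[str]) -> List[str]:
        self.calls += 1  # the whole list crosses the boundary in one call
        return [v.upper() for v in vals]


def per_item(ext: CountingExtension, names: List[str]) -> List[str]:
    """Convert back and forth once for every element."""
    return [ext.uppercase_str(name) for name in names]


def batched(ext: CountingExtension, names: List[str]) -> List[str]:
    """Hand over all the data at once and do the work in a single call."""
    return ext.uppercase_many(names)
```

Both functions return the same result, but the batched form makes a single call however long the list is, so the fixed per-call overhead is paid once rather than once per element.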

About Rapid Rēhita

Rapid Rēhita provide technological services to New Zealand general practices, especially by helping with online enrolments.

This is a blog about interesting information we've discovered along the way.