Rust in Science and ever-changing requirements

I have heard many times over that for a proof-of-concept with fast-changing requirements, you are better off with a dynamic programming language like Python. Python gives the illusion of faster development because you do not have to think as much about the rigidity of the type system. Hence, the argument goes, dynamic languages are good for prototyping and creating proofs-of-concept.

However, in this essay, I am trying to dump some thoughts about Rust usage in scientific computation, its benefits, and generic chatter in the community. I think using Rust has an advantage in an ever-mutating environment like research and I think even for prototyping, Rust can be much more beneficial than a language like Python.

Introduction

I am very much in a love affair with Rust. This is not going to come to you as a surprise at all if you follow me on Twitter. Part of the reason is that I work as a software writer and general yak shaver for science-(genomics)-adjacent work and I appreciate what Rust has to offer in terms of the ecosystem, safety, and speed. I'd argue that a lot of work that is usually written in C/C++, Python, Perl, etc. could be replaced by Rust.

Perhaps I do not need to spend a lot of effort to convince you that Rust can replace some C/C++ codebases, because Rust is a systems programming language. But for proofs-of-concept, you ask, how can Rust be better than, say, Python? Isn't it painful to change the code, as you constantly do in early prototyping, when you are using a static language[1]?

The case of Python

When I started in the sciences as a programmer, reading Perl was always an issue and the choice that I had to make was clear to me — I chose Python. Perl was slowly being displaced by Python anyway. So, I will give you some anecdotal comparisons from my experiences.

You might think that I am comparing apples to oranges and you are absolutely correct. This is an opinion blog post! I am choosing Python also because it is touted as the language which makes life easier to work with given ever-changing requirements.

Ever-changing requirements

One issue that I always hear about is the idea of “refactoring” and/or “changing course”. Computation in sciences, they say, is a lot of trial and error. I find this to be true! I have worked on novel projects where even the stakeholders (scientists) were learning while we were building the projects. There is a lot of backtracking and honestly a lot of your "software engineering" breaks down.

For this article, I am going to focus on some attributes of languages that instill confidence in making the changes so that you can iterate. I believe the following attributes are necessary to work with constantly changing requirements —

  1. Readability
  2. Testability
  3. Feedback
  4. Toolchain

Readability

Syntax

The clear advantage of Python is its syntactic readability, provided you are considerate of that while writing (it does not always pan out!). This makes it easier to keep a model of the code in your head while you are making changes. Rust can be hard to read because it does not fit the imperative, object-oriented mental model we have with languages like Python. Rust’s ownership model and typing definitely have a learning curve.

I have taught Git and Python to my colleagues and in many ways, they and I have had to build new mental models for these tools as well, especially those already familiar with C++, Perl, SVN, and CVS. Learning new mental models is what we do all the time. Still, I agree that this requires a lot of effort on the programmer’s part. This is especially true for an experienced programmer.

As Esteban says in his RustConf 2020 talk — Bending the Curve: A Personal Tutor at Your Fingertips

Rust has a curse, it has many but this one is critical — inefficient code is generally visible. Experienced developers hate to notice that their code is inefficient.

Jumping around the codebase

Because Rust is statically typed, it empowers your editor or Integrated Development Environment (IDE) to link the symbols in your code for easy access. A method defined somewhere on a struct can be easily found from where it is called. This makes looking up code in dependencies much easier and faster.

The only tool in Python that brings us close to this style of working is JetBrains PyCharm. Even this IDE fails to look up symbols if you mess up your virtualenv or fail to register it with your PyCharm project. You can use typing to annotate your code with types, but there is a warning at the top of the typing documentation -

Note: The Python runtime does not enforce function and variable type annotations. They can be used by third-party tools such as type checkers, IDEs, linters, etc.

With Rust, you can survive with just the compiler because these types are a part of the language. Add Rust Language Server (RLS) to that and you can comfortably navigate your codebase.

Docs!

Since Rust is statically typed, you can see the expected type of each function argument and the type of the value that the function returns. Its type system becomes part of the documentation! Whereas in Python you have to use external tools and depend on function annotations to generate documentation, Rust brings the documentation to you via rustdoc.

For example, please compare

def chillin(name, place):
    """Informs what's the user's chillin' number

    @param name: string, name of the user
    @param place: string, where the user is chillin'
    @return: int, information
    """
    result = 0
    if place == "toronto":
        result = 100
    return result

with

/// Create an alias to integer type
/// Now it is much clearer what the return type 
///  of the function stands for
type ChillinNum = u32;

/// Informs what's the user's chillin' number
/// * `name`: name of the user
/// * `place`: where the user is chillin'
fn chillin(name: String, place: &str) -> ChillinNum {
    match place {
        "toronto" => return 100,
        _ => return 0,
    }
}

It is a tiny example and while Rust's syntax is denser, it also provides clarity on what the elements stand for. Here, ChillinNum is the type of the return value. We could have just used u32 but using ChillinNum is clearer. If you run cargo doc in your codebase, rustdoc will take all these triple-slash /// comments and generate nice documentation for you. You can see the documentation of the crate eyre and the source of the crate eyre that generates it.
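One caveat about the alias: `type ChillinNum = u32;` only gives the integer a nicer name, so the compiler still treats the two as interchangeable. If you want the compiler to enforce the distinction, a newtype is one option. The sketch below is my own variation on the example above, not something the original code does, and the simplified signature (no `name` argument) is an assumption made for brevity.

```rust
/// A newtype: unlike `type ChillinNum = u32;`, this is a distinct type,
/// so a plain `u32` cannot be passed where a `ChillinNum` is expected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChillinNum(u32);

/// Informs what's the user's chillin' number
/// * `place`: where the user is chillin'
fn chillin(place: &str) -> ChillinNum {
    match place {
        "toronto" => ChillinNum(100),
        _ => ChillinNum(0),
    }
}
```

The wrapped integer is still reachable as `.0` when you really need it, so the extra safety costs very little.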

Testability

Python has a ton of testing frameworks and Rust is slowly catching up. A clear advantage in favor of Python. However, the things you are testing also dictate how much confidence you have in your codebase. With the type system in Rust, you do not have to test for certain cases that in Python you’d have to. This means there are certain test cases that you do not need to write if you’re building your project in Rust. To me, this makes the iteration faster with more confidence in the code I am writing.

As Esteban puts it -

The reduced need for tests is because of using patterns that leverage the type system to completely eliminate the representability of an invalid state.

Take Rust's match expression as an example. The following code will fail to compile because we failed to provide a catch-all arm -

/// Create an alias to integer type
/// Now it is much clearer what the return type 
///  of the function stands for
type ChillinNum = u32;

/// Informs what's the user's chillin' number
/// * `name`: name of the user
/// * `place`: where the user is chillin'
fn chillin(name: String, place: &str) -> ChillinNum {
    match place {
        "toronto" => return 100,
    }
}

Previously, we had provided the default/catch-all arm, _ => return 0. Similarly, the Rust compiler will complain if you are not covering all the variants of a Rust enum in your match expression. This helps the programmer consider all the paths and cases. The compiler will show an error like this -

error[E0004]: non-exhaustive patterns: `&_` not covered
  --> src/main.rs:12:11
   |
12 |     match place {
   |           ^^^^^ pattern `&_` not covered
   |
   = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
   = note: the matched value is of type `&str`
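The enum case mentioned above can be sketched like this. The `Place` enum, its variants, and the numbers are hypothetical and exist only for illustration. Because every valid place is a variant, a typo such as "tornto" cannot even be expressed, and no catch-all arm is needed.

```rust
/// Hypothetical set of places, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Place {
    Toronto,
    Montreal,
    Elsewhere,
}

type ChillinNum = u32;

/// Every variant of `Place` must be handled for this to compile.
fn chillin(place: Place) -> ChillinNum {
    match place {
        Place::Toronto => 100,
        Place::Montreal => 90,
        // Deleting this arm produces error[E0004], with the compiler
        // naming `Place::Elsewhere` as the pattern that is not covered.
        Place::Elsewhere => 0,
    }
}
```

If a new variant is added to `Place` later, every `match` without a wildcard stops compiling until it handles the new case, which is exactly the kind of help I want when requirements change.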

I do not enjoy writing tests and eating my veggies but if I could eat fewer veggies to achieve the same level of confidence in my changes, I’d always choose that.

This also brings me to the point of designing your project. Just because your project is in alpha/beta/whateva, does not mean that you are not allowed to think about your data structures, your types before the implementation. A little bit of thought in the structure of your codebase goes a long way, especially when you have to change it. Once again, listen to Esteban -

Thinking about the API surface first and then changing the internal logic and in-memory representation to make it faster/more efficient is much easier in type-safe languages than in dynamic languages, and more so in Rust simply because more things are represented in the type system than is customary in other languages. This makes "iteratively fixing the compiler errors" a valid refactoring strategy.

These points, to me, sound like Rust has more advantages than Python if we count Testing as a property of confidence in the changes we are making.
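To make the "API surface first" idea a little more concrete, here is a minimal sketch. The `Sequence` type and its methods are hypothetical, loosely inspired by the genomics work I mentioned earlier. The point is only that the storage is private, so it can be swapped out later while the compiler checks every caller.

```rust
/// Hypothetical type for this sketch: a DNA sequence with a private representation.
pub struct Sequence {
    // Private field: code outside this module cannot depend on how the bases
    // are stored, so this could later become a packed encoding instead.
    bases: Vec<u8>,
}

impl Sequence {
    /// Builds a sequence from text, normalising bases to uppercase ASCII.
    pub fn new(text: &str) -> Self {
        Sequence {
            bases: text.bytes().map(|b| b.to_ascii_uppercase()).collect(),
        }
    }

    /// Number of bases in the sequence.
    pub fn len(&self) -> usize {
        self.bases.len()
    }

    /// Fraction of bases that are G or C; 0.0 for an empty sequence.
    pub fn gc_content(&self) -> f64 {
        if self.bases.is_empty() {
            return 0.0;
        }
        let gc = self
            .bases
            .iter()
            .filter(|&&b| b == b'G' || b == b'C')
            .count();
        gc as f64 / self.bases.len() as f64
    }
}
```

Changing `bases` to another representation would break only the bodies of these three methods, and the compiler would point at each of them in turn.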

Feedback

Even though Rust has its new-paradigm issues[2] when it comes to learning the language, I still feel that the Rust compiler emits some of the most helpful error and help messages. The safety features are built right into the compiler and the team has done a fantastic job so far in making messages ergonomic. For example, in Rust, the array out of bounds error looks like this -

let x: [u8; 3] = [10; 3];
println!("{:?}", x[4]);

error: this operation will panic at runtime
  --> src/main.rs:23:22
   |
23 |     println!("{:?}", x[4]);
   |                      ^^^^^ index out of bounds: the len is 3 but the index is 4
   |
   = note: `#[deny(unconditional_panic)]` on by default

For Python, this looks like this -

x = [10, 10, 10]
print(x[4])

    File "app.py", line 3, in <module>
        print(x[4])
IndexError: list index out of range
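One thing worth adding on the Rust side: the compile-time error above only appears when the index is a constant the compiler can see. With an index computed at runtime, `x[i]` panics at runtime, much like Python raises. A minimal sketch of the checked alternative, using the standard slice method `get` (the helper names `element_at` and `describe` are made up for this example):

```rust
/// Returns the element at `index`, or `None` when the index is out of bounds.
fn element_at(values: &[u8], index: usize) -> Option<u8> {
    // `get` never panics; it returns `Option<&u8>`, copied here into `Option<u8>`.
    values.get(index).copied()
}

/// Turns the lookup into a message, handling both outcomes explicitly.
fn describe(values: &[u8], index: usize) -> String {
    match element_at(values, index) {
        Some(v) => format!("value at {} is {}", index, v),
        None => format!("index {} is out of bounds (len {})", index, values.len()),
    }
}
```

Here the "what if it is missing?" question is part of the return type, so the caller cannot forget to answer it.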

We've already seen an error message above for missing catch-all but let me show that again -

error[E0004]: non-exhaustive patterns: `&_` not covered
  --> src/main.rs:12:11
   |
12 |     match place {
   |           ^^^^^ pattern `&_` not covered
   |
   = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
   = note: the matched value is of type `&str`

Side note: It is amazing that languages can and do learn from each other and copy parts that seem helpful for the users. One recent exciting case is New safety rules in C++ Core Check. I am very happy to have other languages learn, as Rust has from what came before.

I hope you can appreciate the effort being put into the compiler and its error messages. Once again, Esteban’s talk is a fantastic watch, where he talks about the Rust compiler as a tutor.

When we are emitting diagnostic errors, it is the perfect place and moment to teach people [that] they have made a mistake and we can explain to them why they made it.

I will embed the talk here for you —

Python, on the other hand, had me stuck for a whole day because it complained that somewhere I was subscripting a NoneType. If you care, this is the pet peeve that started this blog post.
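For contrast, here is roughly how the same kind of "maybe nothing is there" lookup reads in Rust. This is a sketch under assumptions: the data layout (sample name mapped to a list of read lengths) and the function name are hypothetical, not the code that bit me. The absence is spelled out as `Option` in the signature, so it cannot sneak through unnoticed.

```rust
use std::collections::HashMap;

/// Looks up the first read length recorded for `sample`.
/// Hypothetical data layout: sample name -> list of read lengths.
fn first_read_length(samples: &HashMap<String, Vec<u32>>, sample: &str) -> Option<u32> {
    // `get` returns `Option<&Vec<u32>>`; `?` returns `None` early if the sample is missing.
    let lengths = samples.get(sample)?;
    // `first` is itself an `Option`, because the list may be empty.
    lengths.first().copied()
}
```

A caller has to `match` on the result (or explicitly `unwrap` it) before using the number, so the equivalent of subscripting `None` is caught when compiling rather than a day into debugging.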

Toolchain

In the Python world, the packaging and environment setup is still something that makes me sad. The closest I have come to creating some sanity in my life is to use direnv and shell.nix in my repositories. This sucks because this is a solution that is at the OS (NixOS) level[3]. Indeed there are tools available for auto-loading environments, and I am thankful for that. However, the Python toolset feels like a moving target. Years ago, it was just setup.py. Today there are setup.cfg and pyproject.toml as well, but the latter two do not support all the features, so you end up having setup.py in your codebase anyway. Here, let Tall, Snarky Canadian explain it to you.

On the Rust side though, the ease of use of cargo and rustup has not been beaten for me, so far. The Rust ecosystem is young and has the advantage of learning from other platforms.

Something else that made me happy about Rust is the promise of backward compatibility. Most of us have been burnt by the Python 2 to 3 move, although I am in awe of the Python community that undertook such a massive effort and made it happen.

The biggest disadvantage in Python that I see is that if I move a module out of a package, it is much harder for my editor to figure out the details of my refactoring. PyCharm has made my life much easier, but there are cases where it fails because it can do only so much with a dynamic language. My good friend Esteban once again nails it -

The kind of dynamism that Python has, where almost everything you write is valid code, can be a behavior you did not want, makes it harder to write a Python interpreter that helps you get back on track. Rust leverages its rigid syntax to simultaneously keep the language simpler than it otherwise would be and to make it easier to figure out what the user was actually intending to do.

While this does allow us to write code without thinking too much about types, it does not give us a lot of confidence when we are changing the code. This means that one can likely write code much faster in Python, but changing that code for new requirements is much harder and more error-prone.

Not all is ready, yet

I do not think that a piece of software should be entirely replaced just because it can be. Some of the tools in use are more mature than Rust is at the moment. However, there are clear benefits to be had. I am happy to report that I am not the only one in (some of) the community who feels this way.

There is some way to go with the data structures, libraries, etc., but I think it is a chicken-and-egg problem and as adoption increases there will be more libraries available. The best part is that you can do piecemeal work by delegating parts of your codebase to Rust using FFI. Let Luiz explain to you how he works in his research with Rust and Python!

Static for dynamicity

This is where I would conclude that if you are starting a journey and are sure that things will change many times over, you may be better off with a language that gives you confidence in making those changes.

I also conclude that you could leverage Rust to replace parts of your project that require more safety and speed.

I enjoy writing Python because of its syntax but I keep missing the Rust compiler's help every day.

Once again, this article would not have been possible without the help and guidance of the following friends -

[1] I am sorry but I am taking a generic leeway to combine Dynamic languages with Dynamically typed languages and Static languages with Statically typed languages

[2] They are rather non-mainstream because Rust borrows almost everything from other languages but you get the idea

[3] If you want to know how to manage your work environment for a project using NixOS and direnv, you can read my small blog post about Environment management with nix-shell.