An attempt to understand HRTB, Variance, Lifetimes, and Generics
This blog post assumes you are familiar with Rust's generic types, ownership/borrowing system, and lifetime annotations. 🙏
A signature you may have seen
If you've worked with reqwest and serde, and had a custom send() function to do your data munging or convert to your own error variants, you may have encountered something called Higher-Ranked Trait Bounds (HRTBs). In the snippet below, that is the <T: for<'a> Deserialize<'a>> part.
pub async fn send<T: for<'a> Deserialize<'a>>(
    req: RequestBuilder, // reqwest::RequestBuilder
) -> Result<Option<T>, MyError> {
    // ...
}
This is a generic function, so various API endpoints with different return types can use it. The intention is to asynchronously send an HTTP request and parse the response body as Some(T), or return None if there's no body, or (sigh) MyError on failure. The interesting part, though, is the trait bound: T: for<'a> Deserialize<'a>. The rest of this post is my rabbit hole into that trait bound.
Background: why Serde Deserialize has a lifetime
We know that Serde's Deserialize has a lifetime.
pub trait Deserialize<'de>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>;
}
Here 'de is the lifetime of the input data being deserialized from. It is there because some types would like to borrow directly from the input instead of creating a local copy. In Rust parlance, we call this "zero-copy deserialization". For example, serde_json::from_str::<&str>(s) yields, on success, a &str that points into s rather than allocating a new String. This is why that lifetime annotation on Deserialize exists.
A few more examples to make it clearer:
- This is Owned, it copies bytes out of the input and works for any 'de.
  impl<'de> Deserialize<'de> for String { ... }
  impl<'de> Deserialize<'de> for u64 { ... }
- This is Borrowed, it holds a &str pointing into the input buffer. Only works when 'de outlives or matches the resulting reference's lifetime.
  impl<'de> Deserialize<'de> for &'de str { ... }
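To make the owned/borrowed split concrete without pulling in Serde, here is a minimal sketch. The `Parse<'de>` trait, its `parse_from` method, and the two demo functions are hypothetical stand-ins I made up for illustration; they are not Serde's API, only the same shape.

```rust
// Hypothetical stand-in for serde's `Deserialize<'de>`; NOT serde's API.
trait Parse<'de>: Sized {
    fn parse_from(input: &'de str) -> Option<Self>;
}

// Owned: copies the data out of the input, so it works for any `'de`.
impl<'de> Parse<'de> for String {
    fn parse_from(input: &'de str) -> Option<Self> {
        Some(input.trim().to_owned())
    }
}

// Borrowed: the result points into the input, so it is tied to `'de`.
impl<'de> Parse<'de> for &'de str {
    fn parse_from(input: &'de str) -> Option<Self> {
        Some(input.trim())
    }
}

// The owned value survives after the input buffer is gone.
fn demo_owned() -> String {
    let input = String::from("  hello  ");
    let owned = String::parse_from(input.as_str()).unwrap();
    drop(input);
    owned
}

// The borrowed value can only live as long as the input it points into.
fn demo_borrowed(input: &str) -> &str {
    <&str>::parse_from(input).unwrap()
}
```

Note that `demo_borrowed` has to hand the input's lifetime back to its caller, while `demo_owned` can drop the input before returning.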
What for<'a> actually says
Many important people have given explanations of this and I have noted some in the References section of this post below.
for<'a> is sort of a universal quantifier which reads as:
For all lifetimes 'a, T: Trait<'a>
Compare the following two ways to define lifetime annotations:
- T impls Deserialize for one specific 'a: fn foo<'a, T: Deserialize<'a>>(...)
- T impls Deserialize for every 'a: fn bar<T: for<'a> Deserialize<'a>>(...)
Another look at send()
Now, say that inside the send() function the response body lives in a temporary buffer owned by the function itself.
// Lives until end of function
let bytes = response.bytes().await?;
// Borrows from `bytes`
let value: T = serde_json::from_slice(&bytes)?;
And say that the lifetime of &bytes is some short, anonymous lifetime (let's call it 'short). The compiler needs T: Deserialize<'short>, but 'short is not nameable in the send() function signature because it is internal to that function. The signature needs to convey to the borrow checker that this should work for whatever lifetime turns up at my doorstep. That message is the universal quantifier.
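Here is how I would sketch that situation with the standard library only. Everything in it is an assumption made for illustration: `Parse<'de>` stands in for `Deserialize<'de>`, `fetch_body` stands in for the HTTP call, and `load` plays the role of send().

```rust
// Hypothetical stand-in for serde's `Deserialize<'de>`; NOT serde's API.
trait Parse<'de>: Sized {
    fn parse_from(input: &'de [u8]) -> Option<Self>;
}

impl<'de> Parse<'de> for u64 {
    fn parse_from(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok()?.trim().parse::<u64>().ok()
    }
}

impl<'de> Parse<'de> for String {
    fn parse_from(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok().map(str::to_owned)
    }
}

// Borrowing impl: `&'de str` implements `Parse<'de>` for that one `'de` only.
impl<'de> Parse<'de> for &'de str {
    fn parse_from(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok()
    }
}

// Pretend network call: hands back a freshly allocated body.
fn fetch_body() -> Vec<u8> {
    b"42".to_vec()
}

// Same shape as `send()`: the buffer is local, so the bound must hold
// for a lifetime that the signature cannot name.
fn load<T: for<'a> Parse<'a>>() -> Option<T> {
    let bytes = fetch_body(); // lives until the end of `load`
    let value = T::parse_from(bytes.as_slice()); // borrows `bytes` for 'short
    value
}

// `load::<&str>()` is rejected by the compiler: `&str` cannot promise
// `Parse<'a>` for every `'a`, and that is what keeps a dangling borrow out.
```

Owned targets such as `load::<u64>()` and `load::<String>()` go through, because their impls hold for every `'de`.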
In fact, Serde has a subtrait of Deserialize, called DeserializeOwned, with a blanket impl that says exactly this.
impl<T> DeserializeOwned for T where T: for<'de> Deserialize<'de> {}
Explanation from the caller vs callee angle
I may ruffle some pedantic feathers in this section but bear with me please. Another way to look at the same problem is looking at caller vs callee. When a function has a generic parameter, someone decides what concrete value it will take. We know that for type parameters it is the caller.
fn id<T>(x: T) -> T { x }
id::<u32>(5); // Caller picks T = u32
id::<String>(s); // Caller picks T = String
But it is also true for the lifetime parameters.
fn first<'a>(s: &'a str) -> &'a str { s }
let s = String::from("hello");
let r = first(&s); // 'a inferred as the lifetime of s
let r2 = first("hello"); // 'a inferred as 'static
However, the callee's body potentially creates values with their own lifetimes that the caller has no knowledge of.
pub async fn send<T: ???>(req: RequestBuilder) // ignore ??? for now
    -> Result<Option<T>, MyError> {
    let response = req.send().await?;
    let bytes = response.bytes().await?; // bytes is created here
    let value: T = serde_json::from_slice(&bytes)?;
    Ok(Some(value))
} // bytes is cleaned up here
In the code above, bytes exists only inside this function. Let's call the lifetime of the &bytes borrow 'body. This lifetime has the following properties:
- It starts after the await returns.
- It ends at the closing brace.
- It has no name the caller could possibly write down.
You can see that naming 'a in the signature doesn't help, because it is then chosen by the caller. Lifetimes in function generics are normally inferred, but say the caller picked 'a = 'static, spelled out explicitly below for illustration.
pub async fn send<'a, T: Deserialize<'a>>(req: RequestBuilder) -> ...
// Somewhere at the call site, with the lifetime spelled out for illustration
let result = send::<'static, MyStruct>(req).await?;
Inside the function the compiler needs T: Deserialize<'body>, but the caller promised only T: Deserialize<'static>. That won't work because 'body is not 'static (although body could be ecstatic!). Also note that the caller cannot name 'body either.
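For contrast, the caller-picks form is exactly right when the caller owns the buffer. That is the shape of serde_json::from_slice itself, which takes `&'a [u8]` and requires `T: Deserialize<'a>`. A minimal sketch under my own assumptions: `Parse<'de>`, `load_from`, and `demo` are hypothetical names, not part of Serde.

```rust
// Hypothetical stand-in for serde's `Deserialize<'de>`; NOT serde's API.
trait Parse<'de>: Sized {
    fn parse_from(input: &'de [u8]) -> Option<Self>;
}

impl<'de> Parse<'de> for &'de str {
    fn parse_from(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok()
    }
}

impl<'de> Parse<'de> for String {
    fn parse_from(input: &'de [u8]) -> Option<Self> {
        std::str::from_utf8(input).ok().map(str::to_owned)
    }
}

// The buffer comes from the caller, so the caller can name (or infer) `'a`.
fn load_from<'a, T: Parse<'a>>(bytes: &'a [u8]) -> Option<T> {
    T::parse_from(bytes)
}

fn demo() -> (usize, String) {
    let buffer = b"hello".to_vec();
    // Zero-copy: `borrowed` points into `buffer` and cannot outlive it.
    let borrowed: &str = load_from(buffer.as_slice()).unwrap();
    let len = borrowed.len();
    // Owned types work through the very same signature.
    let owned: String = load_from(buffer.as_slice()).unwrap();
    (len, owned)
}
```

The trade-off is visible in the types: a named `'a` allows borrowed results but needs a buffer the caller can point at, while `for<'a>` gives that up in exchange for working with a buffer nobody outside can see.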
Given the example above, let’s revisit the question: What HRTBs let the callee do?
pub async fn send<T: for<'a> Deserialize<'a>>(
    req: RequestBuilder, // reqwest::RequestBuilder
) -> Result<Option<T>, MyError> {
    // ...
}
let v: Option<T> = send(req).await?;
In my mind, the conversation while compiling the code above goes like this:
- Compiler/checker: I need T: Deserialize<'body>.
- Bound: My friend, T: Deserialize<'a> for all 'a is totally happening.
- Compiler: Lovely, T: Deserialize<'body> works for me, have a nice day.
The body of the callee instantiates the universal quantifier for<'a> with whatever lifetime it actually has.
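Note: "The callee picks" is a useful shorthand, but it's not what actually happens. When the compiler type-checks the body of send() and reaches the serde_json::from_slice(&bytes) call, it asks "Does T: Deserialize<'body> hold?". The HRTB says T works for all lifetimes, so 'body is covered and the type checker concludes the proof at that specific use site inside the body. At the send(req).await? call site the check runs the other way: the concrete T has to implement Deserialize<'a> for every 'a.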
Think of the bound as a rule the caller hands to the function:
- T: Deserialize<'a> with 'a in signature: T works for this one lifetime, which I'll tell you when I call you.
- T: for<'a> Deserialize<'a>: T works for any lifetime. Use whichever one you need.
Other examples of HRTBs I have seen in the wild
From what I understand, every HRTB use case reduces to "someone needs to instantiate the bound with a lifetime that wasn't fixed at the call site". As far as I have found, that abstract description shows up in several recognizable patterns.
Closures called with borrows of callee-internal locals
fn for_each_word<F: for<'a> Fn(&'a str)>(text: String, f: F) {
    for word in text.split_whitespace() {
        // `word` borrows out of `text`, which is local to this function.
        // Its lifetime is internal, the caller has no way to name it.
        f(word);
    }
}
Without HRTB, the regular <'a, F: Fn(&'a str)> would force the caller to pick a single 'a for the closure's parameter. When text is owned (String) rather than a reference, the function's local borrow of it, i.e. text.split_whitespace(), produces words with a lifetime tied to the local text binding inside the function. That lifetime has no name the caller can provide. If text were a &str from the caller, the caller-picks form would actually work, because every word would share the caller's lifetime. As the cool kids say, HRTB becomes load-bearing once the callee owns the data being borrowed.
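A small usage sketch of the function above. The `word_lengths` helper is a hypothetical caller I added; it uses a RefCell only because the bound is `Fn` rather than `FnMut`.

```rust
use std::cell::RefCell;

// `F: Fn(&str)` would mean the same thing: eliding the lifetime in a
// `Fn` bound is sugar for the `for<'a>` form spelled out here.
fn for_each_word<F: for<'a> Fn(&'a str)>(text: String, f: F) {
    for word in text.split_whitespace() {
        // `word` borrows from the local `text`; the closure must accept it.
        f(word);
    }
}

// Hypothetical caller: records the length of each word.
fn word_lengths(text: &str) -> Vec<usize> {
    let lengths = RefCell::new(Vec::new());
    for_each_word(text.to_owned(), |w| lengths.borrow_mut().push(w.len()));
    lengths.into_inner()
}
```

The closure never learns how long each `w` lives; it only promises to cope with any borrow it is handed, which is all the callee needs.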
Closures whose return lifetime ties to their input
fn apply<F>(f: F) -> String
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let s = String::from("hello world");
    let trimmed = f(&s);
    trimmed.to_owned()
}
The relationship between input and output lifetimes, i.e. the output borrow lives as long as the input, is expressed by Fn(&'a str) -> &'a str, with the same 'a appearing in both positions. That would hold whether 'a is introduced with for<'a> or as a named parameter <'a, F: ...>. Here s is local, so its lifetime is unnameable from outside. Without for<'a>, a caller-named 'a couldn't satisfy it.
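To see which callables satisfy this bound, here is a runnable variant. The padded input string and the `first_word` function are my own additions for illustration.

```rust
fn apply<F>(f: F) -> String
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    // Padded input (my tweak) so that trimming has a visible effect.
    let s = String::from("  hello world  ");
    let out = f(s.as_str()); // `out` borrows from the local `s`
    out.to_owned()
}

// A plain function with elided lifetimes satisfies the bound too:
// `fn(&str) -> &str` already means "for any 'a, &'a str -> &'a str".
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}
```

Both `apply(|s| s.trim())` and `apply(first_word)` type-check, since each returns a slice of whatever it was given.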
Storing closures in structs / trait objects
struct Parser {
    callback: Box<dyn for<'a> Fn(&'a str) -> Token<'a>>,
}
A boxed closure has no caller-visible lifetime parameters; you can't pick 'a "at construction time" because the same stored closure is invoked many times with different inputs. HRTB is the only way to express "this stored callable accepts any borrow lifetime."
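A sketch of that "invoked many times with different inputs" point. The `Token` definition, the `run` method, and `first_word` are assumptions of mine, since the post does not define them.

```rust
// Hypothetical token type: just a borrowed slice of the input.
struct Token<'a> {
    text: &'a str,
}

struct Parser {
    callback: Box<dyn for<'a> Fn(&'a str) -> Token<'a>>,
}

impl Parser {
    // Each call may bring a different borrow lifetime.
    fn run<'a>(&self, input: &'a str) -> Token<'a> {
        (self.callback)(input)
    }
}

// Hypothetical tokenizer: returns the first whitespace-separated word.
fn first_word(input: &str) -> Token<'_> {
    Token { text: input.split_whitespace().next().unwrap_or("") }
}

fn demo() -> (usize, usize) {
    let parser = Parser { callback: Box::new(first_word) };
    let long_lived = String::from("alpha beta");
    let n1 = parser.run(&long_lived).text.len();
    let n2 = {
        let short_lived = String::from("xy z"); // dropped at the end of this block
        let n = parser.run(&short_lived).text.len();
        n
    };
    (n1, n2)
}
```

One stored callable serves two borrows with unrelated lifetimes, which a single 'a fixed when the Parser was built could not do.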
Scoped APIs that create lifetimes the caller cannot pre-name
The signature of std::thread::scope looks like this:
pub fn scope<'env, F, T>(f: F) -> T
where
    F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T,
'scope is invented by the scope function itself; it represents "the duration of this particular scope call," which the caller has no way to write. HRTB lets scope hand the closure a Scope whose lifetime is brand-new, internal, and unnameable from outside. If the caller could name 'scope and pick it, it could choose a lifetime that outlives the actual scope block, which would allow spawned threads to keep running after thread::scope returns. That is a potential use-after-free on the borrowed environment data.
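From the caller's side this looks unremarkable, which is rather the point. A minimal sketch, where `sum_halves` is a hypothetical helper of mine:

```rust
use std::thread;

// Hypothetical helper: sums two halves of a borrowed slice in parallel.
fn sum_halves(data: &[u32]) -> u32 {
    let (left, right) = data.split_at(data.len() / 2);
    // The closure has to accept a `&'scope Scope<'scope, '_>` for whatever
    // `'scope` `thread::scope` comes up with; we never get to name it.
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<u32>());
        let r = s.spawn(|| right.iter().sum::<u32>());
        l.join().unwrap() + r.join().unwrap()
    })
}
```

The spawned threads borrow `left` and `right` from the enclosing function, and that is fine because they cannot outlive the scope call.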
Trait method bounds where the impl, not the trait, is generic
trait Handler {
    fn handle<'a>(&self, req: &'a Request) -> &'a Response;
}
The trait method itself is generic over 'a, so every impl Handler must work for all lifetimes. Turning this into a dyn Handler makes the trait object's signature effectively for<'a> .... HRTBs and "generic methods on traits" are two surfaces of the same mechanism.
Note: for a trait to be “dyn compatible” (dyn compatibility was called object-safety in older versions of Rust), its methods cannot have generic type parameters, but they can have generic lifetime parameters (HRTBs).
Variance / soundness work
Occasionally HRTBs are used not because the lifetime is unnameable but to force a lifetime-polymorphism property, most commonly the "branded lifetime" trick (next section).
The unifying principle
HRTBs exist whenever a bound must hold for a lifetime that isn't a parameter of the surrounding signature. The lifetime is quantified separately from the function's own generic parameter list. Sources of such lifetimes can be:
- Lifetimes internal to the function body (Serde case).
- Lifetimes created freshly per-call (iteration borrows, scope handles).
- Lifetimes invented by the API itself (thread::scope).
- Lifetimes that vary across many uses of a stored value (boxed closures, trait objects).
The branded-lifetime / ghost-cell trick - combination with invariance
So far for<'a> has been useful because it lets code work with lifetimes it can't name. But HRTBs can also be flipped around. Instead of expressing "works for any lifetime that turns up" you can use them to invent a lifetime that the caller can't interfere with. Combine that with invariance, the property that stops the compiler from silently treating two different lifetimes as the same, and you get a compile-time identity tag. Each call to a function gets its own unforgeable brand, distinguishable from every other call's brand, at zero runtime cost.
use std::marker::PhantomData;
// Invariant in `'id`, the `*mut` prevents subtyping from
// shrinking/growing `'id`.
pub struct Brand<'id> {
    _marker: PhantomData<*mut &'id ()>,
}
pub fn with_brand<R>(f: impl for<'id> FnOnce(Brand<'id>) -> R)
    -> R {
    f(Brand { _marker: PhantomData })
}
Each call to with_brand produces a Brand<'id> whose 'id is provably distinct from every other call's 'id at compile time, with zero runtime cost. You can attach this brand to other types to tie them together.
pub struct Ticket<'id> {
    value: u32,
    _brand: PhantomData<Brand<'id>>, // see note below
}
Note on the PhantomData choice: PhantomData<Brand<'id>> and PhantomData<*mut &'id ()> are equivalent for variance; both make Ticket invariant in 'id, because Brand<'id> already wraps *mut &'id () and the invariance is inherited. Using Brand<'id> is a readability choice: it keeps the connection between Ticket and its issuing Brand visible in the type definition.
impl<'id> Brand<'id> {
    pub fn issue(&self, v: u32) -> Ticket<'id> {
        Ticket { value: v, _brand: PhantomData }
    }
    pub fn redeem(&self, t: Ticket<'id>) -> u32 { t.value }
}
Works:
with_brand(|brand| {
    let t = brand.issue(42);
    brand.redeem(t) // same 'id
});
Fails to compile:
with_brand(|outer| {
    let t = outer.issue(42);
    with_brand(|inner| {
        inner.redeem(t) // error: t has outer's 'id, inner expects its own
    })
});
Where the HRTB comes in, again
- for<'id> forces 'id to be instantiated freshly inside with_brand, not chosen by the caller. If the bound were the non-HRTB form, like with_brand<'id, F, R>(f: F) -> R where F: FnOnce(Brand<'id>) -> R, a caller could write:
  // Hypothetical: only compiles against the non-HRTB signature above.
  fn cheat<'id>() -> Ticket<'id> {
      with_brand::<'id, _, _>(|b| b.issue(99))
  }
  and reuse the same 'id across multiple invocations, defeating the brand. The real, HRTB-bound with_brand has no 'id slot in its turbofish, so the caller has nothing to forge. To be precise, you can turbofish the function's own generic parameters (the R in with_brand::<MyR>(...)), but 'id is not one of them. It lives inside the for<'id> on the closure parameter's bound, not in the function's <...> list. Lifetimes inside a for<> are quantified separately and are not reachable from the call site's turbofish. There is no syntax to pin them to a caller-chosen lifetime.
- Invariance (PhantomData<*mut &'id ()>) prevents the compiler from silently equating two different 'ids by lifetime subtyping. *mut T is invariant in T because a raw mutable pointer can be both read through and written through, and for both directions to be sound at once the pointee type has to match exactly. No lifetime substitution is permitted, so invariance is the only sound choice.
But the bottom line is that neither alone is sufficient. HRTB ensures the lifetime is fresh per call and invariance ensures it stays distinguishable from other lifetimes.
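To convince myself that the HRTB alone is not enough, I find it useful to keep the HRTB and weaken only the variance. In this sketch `WeakBrand`, `WeakTicket`, `with_weak_brand`, and `cross_redeem` are hypothetical names for covariant copies of the types above; as far as I can tell, the nested cross-brand redeem is then accepted by the compiler.

```rust
use std::marker::PhantomData;

// Covariant in `'id`: `&'id ()` lets the compiler shrink the lifetime.
// Hypothetical "weak" copies of `Brand` and `Ticket`, for contrast only.
struct WeakBrand<'id> {
    _marker: PhantomData<&'id ()>,
}

struct WeakTicket<'id> {
    value: u32,
    _brand: PhantomData<WeakBrand<'id>>,
}

impl<'id> WeakBrand<'id> {
    fn issue(&self, v: u32) -> WeakTicket<'id> {
        WeakTicket { value: v, _brand: PhantomData }
    }
    fn redeem(&self, t: WeakTicket<'id>) -> u32 {
        t.value
    }
}

// Same HRTB shape as `with_brand`, so each call still gets a fresh `'id`.
fn with_weak_brand<R>(f: impl for<'id> FnOnce(WeakBrand<'id>) -> R) -> R {
    f(WeakBrand { _marker: PhantomData })
}

// This compiles, which is the problem: both brands can be shrunk to a
// common shorter lifetime, so the outer ticket passes as an inner one.
fn cross_redeem() -> u32 {
    with_weak_brand(|outer| {
        let t = outer.issue(42);
        with_weak_brand(|inner| inner.redeem(t))
    })
}
```

Swapping the marker back to `PhantomData<*mut &'id ()>` turns `cross_redeem` into the compile error shown earlier.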
The unifying takeaway
- Deserialize<'de> carries a lifetime because some types want zero-copy borrows from the input.
- for<'a> Deserialize<'a> says "T works for any input lifetime", which is exactly the property a function needs when it deserializes from a buffer it owns internally.
- HRTBs in general exist to bind lifetimes that aren't parameters of the surrounding signature.
- Branded lifetimes combine HRTB freshness with invariance to produce compile-time identity tags. HRTB makes each call's tag fresh; invariance stops subtyping from collapsing different tags into one.
HRTBs aren't a special feature; they're the same generic mechanism applied to a lifetime that the surrounding generics can't bind. Variance is the partner mechanism that decides whether the compiler is allowed to silently shift a bound lifetime once it is fixed.
Please report any errata or mistakes in my understanding at web@amanjeev.com.
References
- The Rust Reference: Higher-Ranked Trait Bounds
- The Rustonomicon: Subtyping and Variance
- The Rustonomicon: PhantomData
- std::marker::PhantomData
- std::thread::scope
- std::cell::Cell
- std::cell::RefCell
- dyn compatibility
- Learning Rust - Higher-ranked types - Quinedot
- Serde: Understanding deserializer lifetimes
- serde::Deserialize
- serde::de::DeserializeOwned
- serde_json::from_slice
- serde_json::from_str
- reqwest::RequestBuilder
- ghost-cell crate: separating permissions from data with branded lifetimes.
- qcell crate: TCell, QCell, LCell variants on the same idea.
- generativity crate: exposes the brand pattern as a primitive (make_guard!).
- indexing crate: bounds-check elision via branded indices.
- Yanovski, J., Dang, H.-H., Jung, R., Dreyer, D. (2021). GhostCell: Separating Permissions from Data in Rust. Proc. ACM Program. Lang. 5(ICFP)