A library that empowers developers to painlessly build high-performance parallel applications.
Introduction
Defining complex computational workflows is a tedious and often error-prone task. These difficulties are only heightened when parallelism — simultaneous execution of different processes within a program — becomes involved. Race conditions, deadlocks, atomicity violations… entire classes of bugs that are difficult to catch, reproduce, and diagnose arise once programs make the leap from sequential to parallel execution.
dagger is built to shift this paradigm entirely, by enabling programmers to write simple, boilerplate-free code that looks sequential, but executes in parallel. Let’s begin with a simplified example:
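// a representative sketch of such a program (illustrative, not the exact example)
let a = { /* ... */ };
let b = { /* code that depends on the value of a */ };
let c = { /* code that depends on the value of a */ };
let d = { /* code that depends on the values of b and c */ };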
Typically, a program written in this format would execute sequentially; that is, the code would execute top to bottom, similar to how one would read a book. As such, the program’s execution could be visualized as a linear chain (a, then b, then c, then d),
since this is the order in which the processes are defined. However, dagger resolves the execution of the program with an entirely novel approach. At compile-time, dagger analyzes how each process depends on other processes, then intelligently re-organizes and parallelizes them accordingly. If we were to take the above code and put it in a dagger! block:
dagger! {
    a :: { /* ... */ };
    b :: { /* code that depends on the value of a */ };
    c :: { /* code that depends on the value of a */ };
    d :: { /* code that depends on the values of b and c */ };
}
the program’s execution would look very different:
Here, dagger has resolved how the program should be executed into a directed acyclic graph (also known as a DAG, hence the name dagger!). Since the processes b and c both depend on a in some manner, they must execute after a has finished; since d depends on b and c, it must execute after both b and c have terminated. The order of definition is now entirely irrelevant; if the above code defined d first, then c, b, and a, dagger would still construct the exact same execution DAG.
This document explores how dagger works internally, why the “execute as a DAG” mentality is so powerful, and the potential of this library in powering the foundational technologies which advance synthetic biology.
Key highlights
dagger is the core library that powers miso, our hardware control framework, and maestro, our workflow executor
miso powered the bioreactors which enabled us to optimize cell culture conditions in Wet Lab
maestro powered the modelling workflows which enabled us to compare and select carbonic anhydrase candidates in silico
dagger enables computational scientists in iGEM and beyond to develop powerful, efficient software tools with greater ease and correctness
An (exceedingly) brief introduction to programming and compilers
Programming, at its core, involves defining a series of instructions that a machine then executes. Specifically, these instructions are interpreted by and executed on a specialized hardware device called the central processing unit (CPU), which creates and modifies data according to a specific accepted set of instructions referred to as machine code. In the mid-20th century, computer programs were typically written directly in machine code. However, this practice presents numerous drawbacks: not only is machine code specific to a CPU’s architecture, making it difficult to port programs to different devices, but the hardware-oriented nature of the language also limits the types of programs that can be built.
In programming, this distance from the hardware is typically referred to as abstraction. Less abstract (“lower-level”) languages map closer to direct hardware instructions, providing the programmer greater control over how their program executes. More abstract (“higher-level”) languages are less bound to their hardware implementation, making it easier to express complex mathematical, logical, or analytical concepts. If we were to imagine writing a program for a human raising their hand, a low-level language might define exactly which neurons need to fire to contract the necessary muscles to raise the hand. We could imagine the pseudocode for this imaginary language as
let target_neuron_1 = motor_cortex[1923812392];
fire_neuron(target_neuron_1);
let target_neuron_2 = motor_cortex[9223492389];
fire_neuron(target_neuron_2);
wait_microseconds(5);
let target_neuron_3 = motor_cortex[138292394];
fire_neuron(target_neuron_3);
// ... and many more instructions
This would provide great control over exactly how the hand is raised, the position of the fingers, etc., since we are directly controlling specific neurons in the motor cortex; however, describing an action as simple as raising a hand is extraordinarily complex. A higher-level language, on the other hand, might allow the programmer to simply state
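raise_hand(Hand::Right); // illustrative pseudocode for such a one-line program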
enabling us to write a program to raise the right hand in a single line of code! The abstractions provided by such a language would be far better suited to describe a complex series of body movements, like a dance routine.
In the 1950s, higher-level languages were developed, lowering the barrier to entry into programming and enabling the development of complex programs for scientific analysis, mathematics, business, and more. FORTRAN, for instance, was a language developed by IBM for scientific and numeric computing; COBOL (Common Business-Oriented Language) was developed for use in business, finance, and administrative systems; and the LISP family of languages was created as a mathematical notation for computer programs, derived from Alonzo Church’s lambda calculus.
These programming languages, and all languages since, are implemented via a program that translates from the source language into machine code (which a computer’s hardware can directly understand and execute). If this translation is designed to occur before the code is executed, this program is referred to as a compiler; if the translation is designed to occur while the code executes, the program is referred to as an interpreter. Although interpreted languages present certain advantages (the exploration of which lies beyond the scope of this document), compiled languages offer the significant advantage of vastly improved performance (here, we shall define performance as the time required for a computer to execute a specific action).
This performance improvement is in part due to the fact that compilers run before program execution and, as such, users can directly run the machine code output rather than have an interpreter translate on-the-fly. When you, dear reader, opened your browser of choice to wade through these ramblings, what you really did was execute some machine code that was previously compiled from a different language. However, compilers are also performant because they are designed to analyze the meaning of the input code (typically referred to as semantic analysis) and perform optimizations to improve the performance of the output machine code.
For instance, if the input code contains the expression 5 + (3 - 2), the compiler would likely resolve this into simply 6 rather than have the program compute the value when it runs. The previous example is a very simple optimization, and modern compilers contain a slew of far more complex optimizations; in fact, it is common practice for compilers to not translate directly from the input language into machine code, but rather transform the input through multiple “intermediate” languages, where new optimizations are run on each intermediate representation. For instance, rustc — the compiler for the Rust programming language — executes the following translations:
INPUT: Rust source
Token parsing
Abstract Syntax Tree construction ← dagger executes here!
High-Level Intermediate Representation
Typed High-Level Intermediate Representation
Mid-level Intermediate Representation
Codegen backend: LLVM IR/Cranelift IR/…
OUTPUT: Machine code
Each layer of a compiler runs checks and optimizations, then “lowers” into the next intermediate representation. dagger is no different; it verifies the structure of its input, builds a DAG representation of program execution, then generates lower-level Rust code. It runs at the AST construction level, by taking in a stream of tokens, analyzing it, then emitting a modified stream containing the rearranged and parallelized code.
Here, a token is defined as a discrete unit of code. For instance, the following code:
let name = || {
    return "Bob";
};
is represented by the following token stream:
TokenStream [
    Ident { ident: "let" },
    Ident { ident: "name" },
    Punct { ch: '=' },
    Punct { ch: '|' },
    Punct { ch: '|' },
    Group {
        delimiter: Brace,
        stream: TokenStream [
            Ident { ident: "return" },
            Literal { kind: Str, symbol: "Bob" },
            Punct { ch: ';' },
        ],
    },
    Punct { ch: ';' },
]
Note: some information has been stripped from the above representation for concision, notably Span/Spacing fields
A brief note on Rust
dagger is provided as a library for the Rust programming language. Although Rust is a relatively modern language, it has seen widespread adoption in the fields of operating system development, embedded systems, web backend and frontend, and more. It was selected as the implementation language for dagger due to its memory safety guarantees, language ergonomics, large community, and unique procedural macro feature that enables dagger to execute its token transformations.
dagger!
The primary API provided by the dagger library is its namesake macro. All code written within the bounds of a dagger! invocation will be processed into DAG-based execution:
dagger! {
    // code here will be processed by dagger
}
Formal Specification
Augmented Backus-Naur Form is a metalanguage — a language for formally specifying the syntax of other languages — outlined in IETF Request for Comment (RFC) 5234. dagger’s syntax is defined in ABNF as follows (note: %s syntax from RFC 7405 is used):
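; an illustrative reconstruction from the informal specification below;
; rule names are assumed
dagger-input    = *node-assignment [return-expr]
node-assignment = IDENT "::" EXPR ";"
return-expr     = [%s"return"] EXPR [";"]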
where the special rule EXPR represents a Rust expression, and IDENT represents a Rust identifier.
Informal Specification and Usage
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
In more common parlance, dagger obeys the following syntactic rules:
// everything inside the dagger! block is visible to dagger
dagger! {
    // MAY contain one or more "node assignments", followed by
    // an OPTIONAL return expression
}
Here, a node assignment MUST follow the structure
ident :: expr;
where ident is an identifier (name) for the node, and expr is a Rust expression (section of code), whose value is assigned to ident once evaluated. For instance, in
dagger! {
    a :: Ok(3);
    b :: Ok(a + 5);
}
node a is assigned to the result of Ok(3), and node b is assigned to the result of Ok(a + 5), which in this case would evaluate to Ok(8) (the meaning behind Ok will be covered shortly). Since arbitrary Rust expressions are permitted, the expression “body” of each node can be as simple or complex as the user wishes. For instance, here is an example of a more complex node definition, taken from our team’s firmware platform miso:
write_history :: {
    for (sensor_read, history_item) in sensor_data
        .iter()
        .zip(HISTORY.iter())
        .filter_map(|(sensor_read, history_item)| {
            sensor_read.as_ref().map(|value| (value.1, history_item))
        })
    {
        let mut lock = history_item.lock();
        lock.push((*timestamp, sensor_read));
    }
    Ok(())
};
After all node definitions, the dagger! invocation MAY terminate with a return statement, which specifies which node(s) should have their value(s) returned out of the block. This return statement MUST take one of the following forms:
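// (illustrative forms, per the examples in this section)
return node_name;
// or, equivalently, as a bare trailing expression:
node_name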
and the returned value MUST be either a single value (e.g., the above) or a tuple of values. For example:
dagger! {
    a :: {
        // ...some code
    };
    b :: {
        // ...some code
    };
    return (a, b)
}
This enables output value(s) to be bound to one or more variables for later use, outside the dagger! block:
let dagger_output = dagger! {
    a :: Ok(5);
    b :: Ok(a + 3);
    (a, b)
}.exe();
println!("dagger_output = {dagger_output:#?}");
which prints:
dagger_output = (
    Ok(
        5,
    ),
    Ok(
        8,
    ),
)
Error Propagation and Lazy Evaluation
Keen readers likely noticed that in all examples provided so far, node expressions have been wrapped in Ok(...). This is because node expressions MUST return a value of type NodeResult<T>, where T is any success type.
Unlike many languages, Rust represents errors as values that can be passed around rather than exceptions that traverse up the call stack. As such, fallible operations typically return a Result enumeration, which is a value that can either be in a success (Ok) state, or a failure (Err) state. dagger’s NodeResult<T> type is essentially
enum NodeResult<T> {
    Ok(T),
    Err(NodeError),
}
where NodeError is a specialized error reporting type. This means that each node expression MUST return either a success value (of any type T), or a NodeError in case of failure.
For ergonomic reasons, automatic conversion of almost all error types into a NodeError is supported. As a practical example:
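// A sketch of the example described in the points below; the function body
// shown here is an assumption reconstructed from that description.
fn my_fallible_function() -> NodeResult<std::process::Output> {
    // try to spawn Firefox; `.output()` yields a std::io::Error on failure
    let output = std::process::Command::new("firefox").output()?;
    Ok(output)
}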
The std::process::Command::new("firefox").output() expression attempts to run Firefox, and returns a std::io::Error if Firefox cannot be spawned (for instance, if it is not installed on the system)
The ? operator tells Rust to early-return an error if the preceding expression fails
Although my_fallible_function expects a NodeError, the std::io::Error is accepted, since a NodeError can automatically be constructed from most other error types.
This allows dagger to execute processes in a manner where errors are properly handled and propagated, rather than having them crash the program altogether. If we use my_fallible_function in a dagger declaration like so:
let graph = dagger! {
    a :: Ok(5);
    b :: Ok(a + 3);
    c :: my_fallible_function();
    d :: { c; Ok(*b) };
    return (b, d);
};
which yields the following execution graph (d depends on c and b, and nodes b and d are output)
The b output will never fail to be computed, since both a and b always return Ok(...). However, c may be in a failed (error) state, depending on whether my_fallible_function succeeds or fails. In the failure case, the following execution visualization is produced:
graph.exe_visualize("fallible.svg");
Since c failed, all nodes downstream of c will propagate its error (in this case d and part of output). However, all paths that did not encounter errors while executing (like a > b > output) will continue to execute and succeed. The benefits of such an approach are immediately obvious: when designing complex inter-related processes with many branches, dagger allows paths which do not encounter errors to succeed, while “poisoned” execution branches propagate their errors to the user.
Node Types
In the following example:
dagger! {
    name :: get_a_name();
    b :: Ok(name.clone() + "'s birthday party");
    c :: Ok(name.clone() + "'s anniversary");
}
name is assigned the result of get_a_name(). Let us assume this function returns a NodeResult<String>, and the function occasionally fails to get a valid name. As mentioned in the previous section, if a node fails, all downstream nodes will propagate the failure. As such, downstream nodes only execute when their parents succeed, and only need to consider the “success” variant of their parents. If the NodeResult<String> type returned by get_a_name() is defined as:
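// the same NodeResult enum shown earlier, with T = String
enum NodeResult<T> {
    Ok(T),          // success: the fetched name (String)
    Err(NodeError), // failure: no valid name could be fetched
}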
downstream nodes will only execute if the parent returns a NodeResult::Ok(String). As such, the value that dagger passes to child nodes is NOT a NodeResult<String>, but rather a read-only reference to the success case (denoted &String). The fact that the reference is immutable means that child nodes cannot modify the value returned by parents. This avoids cases where multiple children attempt to modify the same parent value, leading to problematic behavior, and it is why name is cloned in b and c before concatenation (.clone() in Rust makes a copy of the value). If the above code had attempted to mutate the value of name or reassign it:
b :: Ok(name.push_str("!")); // `push_str` mutates the String in place
OR
b :: Ok(*name = "Bob".to_string()); // *name = ... reassigns name
it would fail to compile with an error:
error[E0596]: cannot borrow `*name` as mutable, as it is behind a `&` reference
 --> lib/examples/playground.rs:9:17
  |
9 |     b :: Ok(name.push_str("!"));
  |             ^^^^ `name` is a `&` reference, so the data it refers to cannot be borrowed as mutable
Execution APIs
A dagger! definition does not immediately execute. Instead, it returns a Graph object:
let output: Graph<_, _> = dagger! {
    // some code
};
The Graph can then be executed by calling one of the following methods on it:
exe
Purpose: Directly execute the Graph and return its result
Signature:
pub fn exe(&self) -> T
Example:
let output = dagger! {
    a :: Ok(3);
    b :: Ok(a + 2);
    return (a, b);
}.exe();
println!("{output:?}")
outputs (Ok(3), Ok(5))
exe_visualize
Purpose: Execute the graph, return its result, and save a visualization of the process
Signature:
pub fn exe_visualize<P: AsRef<Path>>(&self, path: P) -> T
Example:
dagger! {
    a :: Ok(3);
    b :: { a; Err(NodeError::msg("Oh no! I failed!")) };
    c :: Ok(b + 2);
    x :: Ok(10);
    y :: Ok(x + 2);
    z :: Ok(y + 2);
    return (z, c);
}.exe_visualize("visualization.svg");
saves the following SVG file at visualization.svg:
Additional APIs
Apart from the dagger! macro, the dagger library also provides two additional parallelization APIs.
parallelize
This function allows users to execute a function on every element of a collection in parallel. This is useful when multiple processes need to be executed in parallel, and the processes are determined at runtime (e.g., they are based on user input).
This function takes an iterator of items, and a function which defines the transformation to execute on each item.
This function returns a slice equal in length to the input iterator. Each slice element is a Result<O, Box<dyn Any>>, where O is the output type of the transformation function. The output elements will always be the Ok(O) variant, unless the transformation causes a panic (stack unwind), in which case the Err variant will be returned.
Example
let mut input_list = String::new();
print!("Enter a comma-separated list: ");
stdout().flush().unwrap();
stdin().read_line(&mut input_list).unwrap();
parallelize(input_list.split(',').enumerate(), |(i, item)| {
    let delay = rand::random_range(100..1000);
    thread::sleep(Duration::from_millis(delay));
    println!("Item {i} has value {}!", item.trim());
});
This program queries the user for a comma-separated list of items. Then, for each list item, the program waits a random duration (100ms-1s) and prints the item’s index and value. Since this is done in parallel, the order in which items are printed is stochastic:
Enter a comma-separated list: 1,2,3,4,5
Item 1 has value 2!
Item 4 has value 5!
Item 2 has value 3!
Item 0 has value 1!
Item 3 has value 4!
parallelize_with_time_limit
This function is identical to parallelize, except it takes one additional parameter to define a time limit for the transformation.
This function returns a slice equal in length to the input iterator. Each item is an Option<O>, where O is the output type of the transformation function. These items will be the Some(O) variant unless the transformation times out or causes a stack unwind, in which case the item will be None.
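For instance, a minimal sketch of a call (the parameter order, the per-item limit parameter, and the helper names here are assumptions, not the documented API):

use std::time::Duration;

// run a hypothetical `check_candidate` on each item, giving each at most 2 seconds
let results = parallelize_with_time_limit(
    candidates.iter(),
    |candidate| check_candidate(candidate), // hypothetical transformation
    Duration::from_secs(2),                 // assumed per-item time limit
);
// items that timed out or panicked come back as None
for (i, result) in results.iter().enumerate() {
    match result {
        Some(output) => println!("candidate {i} => {output:?}"),
        None => println!("candidate {i} timed out or panicked"),
    }
}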
Installation
dagger is distributed as a standard Rust library (rlib). To add dagger to your project, simply use Cargo to add dagger as a dependency:
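# assuming the package name matches the prelude import shown below
cargo add dagger_lib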
The dagger library contains a single feature flag (visualize), which is enabled by default. This enables graph rendering capabilities (.exe_visualize), but implicitly imports the layout-rs crate as a rendering backend for graph visualizations. If you wish to disable this feature, simply append --no-default-features to the above installation command. When visualize is disabled, dagger has zero dependencies (apart from the Rust standard library, which all Rust projects depend on by default).
To import dagger’s APIs and types, import the prelude module at the top of your Rust file:
use dagger_lib::prelude::*;
DBTLs and Development
The current rendition of dagger is the product of numerous iteration cycles across various aspects of the library.
Macro syntax
V1
The syntax will enable linking tasks to identifiers and defining task inputs. It is only permissive toward an identifier followed by a function call.
Implementation is done via a procedural macro that builds nodes out of (Identifier, Identifier, Punctuated<Identifier, Comma>) where the first Identifier is the node name, second Identifier is the called function, and Punctuated<Identifier, Comma> is the called function’s arguments (where each Identifier in this list is the name of another node).
We are able to parse node definitions and build a DAG out of dependency relationships, serving as an initial proof of concept of the dagger methodology.
In future, node definitions should be more permissive toward complex Rust expressions rather than simply a function call and arguments.
V2
The syntax will recursively parse Rust expressions; for each node, all identifiers which correspond to other nodes are collected. As such, an identifier followed by an arbitrary Rust expression is permitted.
Implementation is done via recursive parsing of the expression, and DAG construction from the resultant relationships.
We are able to parse complex Rust expressions and identify nodal relationships out of the noise.
The primary advantage of the V2 syntax is that existing Rust code can be easily converted into dagger-compatible syntax. For instance, we had existing bioreactor code which was running sequentially; conversion to dagger syntax took less than 5 minutes, and made our code run in parallel.
You get a thread! You get a thread! Everyone gets a thread!
This initial implementation will run every node in a different thread. One thread will be spawned per node, and threads will wait until all “parent” nodes complete before executing the node’s function.
Initial implementation is in this commit. Threads are synchronized by waiting on atomics through platform-specific APIs (e.g., futex on Linux).
Initially, various bugs from the platform-specific implementations and use of unsafe code arose (for instance, an ownership bug which broke Rust’s invariants caused double-free and use-after-free bugs). These have been resolved, but the implementation still feels very fragile.
In future, greater reliance on std APIs over hand-rolled, platform-specific APIs would be desirable.
The algorithmic basis for this DBTL cycle is as follows (all thread scheduling is still resolved at compile-time, so the amount of time needed for each process to run is unknown):
If a process is guaranteed to run AFTER another process, it can be written on the previous process’s thread.
Closed paths “free” threads for downstream use. For instance, given the following execution graph:
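(figure reconstructed from the description below)

        ┌──> b ──┐
   a ───┤        ├──> d ───┬──> e
        └──> c ──┘         └──> f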
joining b and c frees the thread that executed c for downstream reuse. As such, the thread that executed process b can be reused for process e, and the thread for process c can be used for f. This means only two threads are needed in total to run this process; one thread could run processes a, b, d, and e, while the other thread runs processes c and f.
If a process does not directly inherit from a previous process, there is no guarantee that it will run after this process. For instance, if the above graph was modified as follows:
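(figure reconstructed from the description below)

        ┌──> b ──> d ───┬──> e
   a ───┤               └──> f
        └──> c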
Now, d and its children no longer inherit from c. As such, there is no guarantee that c will be finished by the time f runs, so f cannot be scheduled on c’s thread.
Formally, this is somewhat akin to finding the minimum path cover of the execution graph.
The above algorithm will be implemented via greedy local search. Broadly, each “orphan” node (in the above example, this is 'a') becomes a starting point for search, and the graph is traversed top-to-bottom, with “free threads” (due to multi-parent nodes) tracked for later distribution.
The algorithm is implemented in a testing workspace, enabling algorithmic validation on manually constructed DAGs.
The algorithm successfully resolves complex DAG structures in an efficient manner. However, it is still suboptimal since it cannot leverage runtime information (e.g., in the above figure, if 'c' terminates before 'f', it can be re-used, enabling the process to only run with 2 threads).
DAG execution can only be fully optimized by leveraging both compile-time and runtime information. At compile-time, relationships between processes must be determined from the source code tokens; then, at runtime, thread scheduling should be determined.
Tony was extremely interested in dagger’s implications on building more robustly parallelized software. He reviewed this algorithm, and suggested leveraging thread pooling mechanisms rather than pure compile-time information. This would enable additional optimizations based on process execution duration, and likely result in a simpler solution.
Tony Liu
Doctoral student, bioinformatics (microbiology and computer science)
I’m watching you
The final dagger implementation leverages both compile-time and runtime information. At compile-time, a DAG is constructed, and all processes (along with information about their parents/children) are re-emitted into the output token stream. Specifically, the information is passed to a Scheduler::execute(...) call, which constructs a dagger scheduler that oversees thread scheduling at runtime.
The runtime scheduler will run on a supervisory thread, remaining parked for most of the DAG’s execution, waking sporadically when processes terminate to check if their children can be run. If so, it will schedule child tasks on a previously-spawned thread that is now free; if no such thread exists, a new thread is spawned.
This was built by designing a novel Scheduler structure, along with data structures to hold task/thread information. Implementation details are available in the library internals section.
This implementation has been extensively tested and shown to be robust, including by leveraging dynamic analysis tools.
Leveraging runtime introspection is critical when designing parallelization systems, as it improves thread reuse efficiency beyond what can be reasoned at compile-time.
Library Internals
dagger is technically composed of two internal crates, dagger_lib and dagger_proc. dagger_proc defines the dagger! procedural macro, while dagger_lib defines all other parallelization primitives and the dagger task scheduler.
Token parsing
First, the dagger! macro analyzes its input tokens (represented as a TokenStream), and parses it into a series of Node definitions and a list of nodes to be output:
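// A plausible shape for a parsed definition, based on the fields
// described below (exact types are assumptions):
struct Node {
    name: syn::Ident,         // the node's name identifier
    process: syn::Expr,       // the node's process expression
    parents: Vec<syn::Ident>, // nodes referenced within `process`
}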
The name and process fields of each Node are initialized based on the node’s name identifier and process expression, and parents is initialized as an empty list. Then, a recursive algorithm walks each node’s process, and identifies any other node names in the expression; these other nodes are declared “parents” of the current node.
Additional computations are then executed on the GraphStructure object:
The children of each node are also resolved, with lookup done through a HashMap using the rustc_hash algorithm
This algorithm is used by the Rust compiler, and is ~10% faster than the default hash algorithm used in std according to our rough benchmarks
Graph visualizations are computed if the visualize feature is enabled
Memory buffers to hold node outputs are set up
Task objects (which hold parent/child information) are computed for each node
Error handling is set up, ensuring that downstream nodes will propagate errors or receive the suitable Ok type
All of this information is then emitted into the output token stream.
Macro expansion
All compile-time information must be embedded in the output token stream. For instance, the following (extremely) simple dagger! invocation
let graph = dagger! {
    a :: Ok(5);
    b :: Ok(*a);
    return b;
};
is expanded into this (expansion determined via cargo-expand):
let graph = {
    use dagger_lib::__private::*;
    #[allow(path_statements, clippy::unused_unit)]
    let execution_function = |write_path: Option<&::std::path::Path>, dot: &'static str| {
        let __private_a = ProcessData::default();
        let __private_b = ProcessData::default();
        let __private_a_fn = || {
            if true {
                let () = ();
                let process_value: NodeResult<_> = { Ok(5) };
                let process_result = process_value.into_graph_result("a");
                __private_a.set(process_result);
            } else {
                let mut joined_err = GraphError::default();
                __private_a.set(Err(joined_err));
            }
        };
        let __private_a_task = Task::new(0u32, &[1usize], &__private_a_fn);
        let __private_b_fn = || {
            let a = unsafe { __private_a.get() };
            if a.is_ok() {
                let (a) = (a.unwrap());
                let process_value: NodeResult<_> = { Ok(*a) };
                let process_result = process_value.into_graph_result("b");
                __private_b.set(process_result);
            } else {
                let mut joined_err = GraphError::default();
                if let Err(e) = a {
                    e.push_error(&mut joined_err);
                }
                __private_b.set(Err(joined_err));
            }
        };
        let __private_b_task = Task::new(1u32, &[], &__private_b_fn);
        Scheduler::execute([__private_a_task, __private_b_task]);
        let (b) = (unsafe { __private_b.get_owned() });
        if let Some(path) = write_path {
            visualize_errors(path, &[&b.as_ref().map(|_| ())], dot);
        }
        (b)
    };
    Graph::new(
        execution_function,
        "digraph {\ngraph [rankdir=\"TB\"]\na [shape=box label=\"a\"]\nb [shape=box label=\"b\"]\na -> b\n\"___process_output\" [label=Output]\nb -> \"___process_output\"\n}\n",
    )
};
To the uninitiated, this looks like a mess; however, it contains all the information determined by dagger at compile time, and sets up DAG-based execution of the process. The following sections will walk through the core data structures that enable this execution.
ProcessData
The output value of each node’s execution must:
Be initialized on the stack as uninitialized memory
Be written to (mutated) when the node finishes executing
Be read (immutable borrow) by child nodes
Have its drop implementation called at the end of execution
This immediately violates two of Rust’s core safety invariants, namely:
Memory cannot be uninitialized or null
Reading/writing a value across threads should be either done atomically (e.g., AtomicUsize/AtomicBool) or synchronized (e.g., Mutex/RwLock)
As such, dagger dips into the dark arts of unsafe Rust. In Rust, the unsafe keyword allows developers to write code which the compiler cannot guarantee upholds Rust’s invariants. As such, it is up to the developer to construct the program in a manner such that these invariants are guaranteed by the program or library. In dagger, each node has an associated ProcessData instance:
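// A sketch inferred from the expansion above (ProcessData::default(),
// .set(), unsafe .get()) and the Drop implementation below; the real
// type would also need an `unsafe impl Sync` to be shared across threads.
struct ProcessData<T>(std::cell::UnsafeCell<std::mem::MaybeUninit<T>>);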
Finally, at the end of scope, all memory allocated by nodes is cleaned up by calling the inner type’s drop implementation:
impl<T> Drop for ProcessData<T> {
    fn drop(&mut self) {
        unsafe { self.0.get_mut().assume_init_drop() }
    }
}
Scheduler
dagger’s thread runtime is managed by a Scheduler. This data structure manages a list of nodes (each node is referred to here as a task), and schedules them accordingly onto threads (using a thread Scope):
Initially, each orphan (parentless) task is spawned onto a separate thread. These tasks contain information that helps them track when they should execute, and what children inherit from them:
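// A plausible shape, reconstructed from the Task::new(id, children, fn)
// calls in the expansion above and the fields referenced below; exact
// types are assumptions.
use std::sync::atomic::AtomicU32;

struct Task<'a> {
    id: u32,                        // e.g., Task::new(0u32, ...)
    children: &'a [usize],          // indices into the scheduler's task list
    num_parents: u32,               // parents this task must wait for
    completed_parents: AtomicU32,   // parents finished so far
    process: &'a (dyn Fn() + Sync), // the node's execution function
}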
You may find it interesting that children is a list of numerical indices; these correspond to elements in the tasks field in Scheduler. Rather than looking up children in a hash table, all relationships are pre-computed at compile-time; this enables fast constant-time lookups. As such, when a task completes, the scheduler analyzes each child process, and checks if all its parent processes have terminated (by comparing num_parents with completed_parents). If so, the task is scheduled on the first available free thread, and if no free threads are available, a new thread is constructed.
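In pseudocode, the completion check might look like the following (a sketch that assumes the Task shape above; not dagger's actual implementation):

use std::sync::atomic::Ordering;

// invoked by the supervisory thread when a task reports completion
fn on_task_complete(scheduler: &Scheduler, finished: &Task) {
    for &child_idx in finished.children {
        let child = &scheduler.tasks[child_idx]; // constant-time slice lookup
        // record one more completed parent; if this was the last one,
        // the child is ready to run
        if child.completed_parents.fetch_add(1, Ordering::AcqRel) + 1 == child.num_parents {
            scheduler.schedule(child_idx); // reuse a free thread, else spawn one
        }
    }
}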
Stacking to the moon
By relying on slicing rather than hash tables, dagger does not need to allocate a HashMap on the heap. In fact, dagger does not allocate any memory on the heap as part of its scheduling system. Even the threads field of Scheduler is stack-allocated:
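// A sketch of the described layout (struct and field names assumed):
use std::mem::MaybeUninit;

struct ThreadList<Handle, const N: usize> {
    slots: [MaybeUninit<Handle>; N], // N = number of tasks, the theoretical upper bound
    cursor: usize,                   // number of threads spawned so far
}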
It is implemented as an array of uninitialized memory, with its size set to the number of tasks in the scheduler (the theoretical limit on the number of threads needed is one per node). Additionally, a cursor keeps track of the current number of elements in the array (the number of threads spawned), ensuring uninitialized memory is never read.
Copy that 🗣️ 🔥
When a worker thread is spawned, a channel is opened so that the supervisory thread can communicate with it. Specifically, the scheduler holds a Sender object
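// (sketch; the field name and message type are assumptions)
task_sender: std::sync::mpsc::Sender<usize>, // carries the index of the next task to run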
which allows it to send new tasks to the thread. The newly spawned thread is also given a Sender (with the main thread holding a Receiver), allowing it to notify the main thread when it has finished a task.
I know what you did, and I know where you live.
dagger internally holds two error types, NodeError and GraphError. NodeError is a public API; every dagger node must return a Result<T, NodeError>. This error type is defined as follows:
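// A sketch of the shape described below; the exact field types are assumptions
pub struct NodeError {
    error: Box<dyn std::error::Error + Send + Sync>, // the captured outer error
    caller: &'static std::panic::Location<'static>,  // where the error arose
}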
where error holds the captured outer error, and caller holds a Location struct. Location is a Rust API designed to inform a user where a crash (panic) occurs; however, NodeError uses it to store the location where an error occurred so that the user can better introspect process failures. GraphError holds a list of NodeErrors (if both parents of a node fail, for instance, they are propagated through the same GraphError object) along with information about which nodes failed.
As an example, the following graph will always fail on nodes b and e:
fn failing_function() -> NodeResult<String> {
    Err(NodeError::msg("Function failed!"))
}
let result = dagger! {
    a :: Ok(5);
    b :: failing_function();
    c :: Ok(a + 5);
    d :: Ok((b.clone(), *c));
    e :: failing_function();
    f :: Ok((d.clone(), e.clone()));
    return f;
}.exe();
println!("{result:#?}");
When it is executed, the user is presented with clear information on the source of the error, both in terms of which nodes failed, and where the error originated in the source code (filename/line/column):
Libraries must be held to the highest standards of correctness, and concurrent processes are notoriously difficult to test. Parallelism is difficult to reason about, and is stochastic in nature, making reproducibility enormously difficult. Furthermore, dagger’s use of unsafe means that adherence to Rust invariants must be checked manually rather than by the compiler.
Enter: miri. An acronym for mid-level intermediate representation interpreter, miri compiles Rust code to rustc’s mid-level intermediate representation (one of the “compiler layers” we discussed earlier), and runs this intermediate representation in an interpreted rather than compiled context. As such, the miri runtime is able to dynamically introspect the program’s execution to:
Identify violations of Rust’s core memory invariants (e.g., data races)
Catch undefined behavior (e.g., accesses to uninitialized memory)
Identify memory leaks
Simulate concurrent execution in a deterministic (reproducible) manner
As such, dagger’s integration suite was extensively tested under miri. This actually enabled us to catch two bugs:
An infrequent atomicity violation causing uninitialized memory access
A memory leak caused by a drop implementation not being called
Now, all dagger integration tests run consistently under miri without any errors being reported.
The all-seeing eye
miri provides one additional benefit: it can profile the interpreted program. This is particularly useful as a means to visualize how dagger schedules processes on threads, achieved by exporting the measureme data to a Chromium performance profile. Let us consider the following example:
dagger! {
    a :: expensive(5);
    b :: expensive(a + 3);
    c :: expensive(a + 5);
    d :: expensive(b + c);
    e :: expensive(d + 3);
    f :: expensive(d + 5);
};
Here, expensive just runs an expensive computation and returns its input back (used to make tasks require non-trivial amounts of time to execute). The above example results in the execution graph
and, when profiled using miri, generates the following visualization in Chromium DevTools (coloured portions correspond to node executions):
We can see that scheduling occurs exactly as expected:
One thread runs a
b and c are run in parallel
One thread runs d
e and f are run in parallel
Leveraging miri has become so intrinsic to dagger’s validation that two scripts are provided in dagger’s GitLab repository for testing. miri.sh runs the specified process under miri with strict checks enabled, and miri_crox.sh generates a chrome_profiler.json file.
Implications and Usage
dagger lays the groundwork for both miso, our firmware UI framework that drives hardware, and maestro, our workflow execution framework.
miso
Every second, a polling loop executes which:
reads every sensor
writes this data to a CSV file
writes this data to a memory buffer
sends this data to each web client
Originally, this was implemented sequentially: the sensors would be read, then each post-processing step would run in series. In this implementation, 0.9s was allocated for sensor reads to complete, and 0.1s for the post-processing steps. However, these budgets were occasionally overrun, leading to inconsistent polling cycles.
This polling logic was moved into a dagger! block, requiring only minor adjustments to match dagger’s syntax:
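// A sketch of the converted polling cycle; the function names are
// illustrative, not the actual miso code:
dagger! {
    sensor_data :: read_all_sensors();
    csv :: write_csv(sensor_data);
    memory :: write_history(sensor_data);
    web :: send_to_clients(sensor_data);
};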
which immediately rearranged all post-processing steps to run in parallel, greatly improving the polling subsystem’s efficiency (and ensuring cycles complete within their allocated time). The execution DAG is as follows:
This demonstrates how dagger is able to easily integrate into existing projects, and provide an easy, ergonomic API to parallelize processes based on data flow.
maestro
maestro has been purpose-built to leverage dagger for workflow parallelization. For more information, read the docs here.
Upon learning about dagger’s “execute as a DAG” mentality, Ryan immediately noted that leveraging this library for defining bioinformatics workflows would be a logical next step. He indicated that expressing parallelism via the flow of data appears to be a highly intuitive paradigm.
Ryan McLaughlin
PhD Candidate, Bioinformatics
Future Directions
Error reporting
The current stable version of dagger reports invalid syntax by forwarding the default error messages from the syn parsing library. At times, these error messages are descriptive:
but other times, they are less helpful:
As such, an active area of development in dagger involves overhauling the error reporting system to return custom errors which better explain what dagger expects.
A synless implementation
As mentioned, dagger relies on the syn library for parsing macro input into a data structure. This is extremely common amongst Rust macro libraries; syn is reportedly the most downloaded library on crates.io. However, syn is a large library, and it takes a correspondingly long time to compile, which in turn slows down compilation of any project that depends on dagger.
As such, dagger’s parser is being actively rewritten to remove the dependency on syn and directly leverage proc_macro APIs instead. Though this implementation will be more verbose, the compile-time improvement is expected to be non-negligible.