# Case study: LA compiler
In this section, we will learn about functionalities egglog provides as a Rust library. Besides providing many basic operations over the E-graphs in Rust like its predecessor egg, egglog allows extensions in many aspects: base values, primitive functions, containers, schedulers, commands, etc.
For the first half of the case study, we will use the E-graph operations egglog provides to build a compiler for linear algebra. For the second half, we will showcase how we can extend egglog with application-specific scheduler and cost models.
The code is accessible at this GitHub directory.
To see the answer to each question, please take a look at the solutions
branch.
Language for Linear Algebra
The input language for our compiler is simple straghtline programs consisting of a list of declarations, followed by a list of bindings. Each declaration declares an input from the user and has the form
V: Type;
where V
is a variable name and Type
is either R
(reals) or [R; NxM]
(matrix of dimension N x M
, where N
and M
are natural numbers).
Each binding has the form
V = E;
where V
is a variable name and E
is a variable, a number, or binary expressions with
operators *
, /
, +
, -
.
For example, below is a General Matrix-Multiply (GEMM) operation expressed in our DSL.
a: R;
b: R;
A: [R; 32x32];
B: [R; 32x32];
C: [R; 32x32];
R = a*(A*B)+b*C;
Note that the semantics of binary operators are overloaded: *
can mean any of matrix
multiplication, scalar-matrix multiplication, or scalar multiplication, depending on the types
of its operands.
The parser for this language has been implemented and you shouldn't need to worry about it
for the purpose of this section. The result of parsing is stored in a src/ast.rs:CoreBindings
struct.
Part 1: Building the initial e-graph
In this part, we will process the parsed expressions and turn them into egglog ASTs.
To start, take a look at the schema definitions in src/defn.egg
.
There are several ways of running egglog in Rust: the user can call
EGraph::parse_and_run_program
with the program string, call EGraph::run_program
with a
command AST, or define rules using the convenience method provided in egglog::prelude::*
. In
this part, we will use EGraph::parse_and_run_program
for Problems 1 and 2, and
EGraph::run_program
for Problem 3.
Part 2: Running optimizations
We then run the rules we defined in src/defn.egg
to grow the e-graph (Problem 4).
After the e-graph is grown, we will run e-graph extraction to obtain the optimized program
(Problem 5). To extract a program from the e-graph, you can use
EGraph::extract_value_with_cost_model
, which requires a cost model as the input.
Since we used dynamic cost (via set-cost
) in our egglog program, we need to use
DynamicCostModel
from egglog-experimental
to ensure the cost incorporates those dynamically set.
Part 3: Custom scheduling and cost models
In this part, we will further customize egglog's capability.
In Problem 7, we will implement FirstNScheduler
, which applies at most N
matches in each iteration.
Every scheduler in egglog
needs to implement a filter_matches
method, which takes rule metainfo and a Matches
struct.
The Matches
struct encapsulates a set of matches and allows users to choose the
subset of matches to be applied.
Matches that are not chosen for application will be delayed for scheduling to the next iteration.
Additionally, filter_matches
returns a boolean flag, indicating whether the next run of the
rule needs to search the database again to fetch more matches.
For instance, if the scheduler decides to not mark any match for application in the current
iteration, it may also not want to request more matches until the current batch of matches is
applied.
In problem 8, we will customize the cost model. We will define a cost model that prefers terms
with smaller depth. Such a cost model is useful when e.g., we want to minimize the depth of
computation for maximal parallelism. The cost model can be defined in egglog by implementing
the CostModel
trait, which requires the user to define the cost of an e-node, a container
node, a base value, and additionally, the cost of an AST given the cost of the node itself
and its children.