# E-class analysis

In this lesson, we learn how to combine the power of equality saturation and Datalog. We will show how we can define program analyses using Datalog-style deductive reasoning, how EqSat-style rewrite rules can make the program analyses more accurate, and how accurate program analyses can enable more powerful rewrites.

Our first example will continue with the path example in lesson 2. In this case, there is a path from e1 to e2 if e1 is less than or equal to e2.

(datatype Expr
    ; in this example we use big 🐀 to represent numbers
    ; you can find a list of primitive types in the standard library in `src/sort`
    (Num BigRat)
    (Var String)
    (Add Expr Expr)
    (Mul Expr Expr)
    (Div Expr Expr))

Let's define some BigRat constants that will be useful later.

(let zero (bigrat (bigint 0) (bigint 1)))
(let one (bigrat (bigint 1) (bigint 1)))
(let two (bigrat (bigint 2) (bigint 1)))

We define a less-than-or-equal-to relation between two expressions. (leq a b) means that a <= b for all possible values of variables.

(relation leq (Expr Expr))

We define rules to deduce the leq relation. We start with transitivity of leq:

(rule (
    (leq e1 e2)
    (leq e2 e3)
) (
    (leq e1 e3)
))

We can define a few axioms for deciding when one expression is less than or equal to another.

Base case for leq for Num:

(rule (
    (= e1 (Num n1))
    (= e2 (Num n2))
    (<= n1 n2)
) (
    (leq e1 e2)
))

Base case for leq for Var:

(rule (
    (= v (Var x))
) (
    (leq v v)
))

Recursive case for leq for Add:

(rule (
    (= e1 (Add e1a e1b))
    (= e2 (Add e2a e2b))
    (leq e1a e2a)
    (leq e1b e2b)
) (
    (leq e1 e2)
))

Note that we have not defined any rules for multiplication. This would require a more complex analysis on the positivity of the expressions.

On the other hand, these rules by themselves are pretty weak. For example, they cannot deduce x + 1 <= 2 + x. But EqSat-style axiomatic rules make these rules more powerful:

(birewrite (Add x (Add y z)) (Add (Add x y) z))
(birewrite (Mul x (Mul y z)) (Mul (Mul x y) z))
(rewrite (Add x y) (Add y x))
(rewrite (Mul x y) (Mul y x))
(rewrite (Mul x (Add y z)) (Add (Mul x y) (Mul x z)))
(rewrite (Add x (Num zero)) x)
(rewrite (Mul x (Num one)) x)
(rewrite (Add (Num a) (Num b)) (Num (+ a b)))
(rewrite (Mul (Num a) (Num b)) (Num (* a b)))

To check our rules

(let expr1 (Add (Var "y") (Add (Num two) (Var "x"))))
(let expr2 (Add (Add (Add (Var "x") (Var "y")) (Num one)) (Num two)))

(fail (check (leq e1 e2)))
(run-schedule (saturate (run)))
(check (leq e1 e2)) ; should pass

A useful special case of the leq analysis is if an expression is upper bounded or lower bounded by certain numbers, i.e., interval analysis:

(function upper-bound (Expr) BigRat :merge (min old new))
(function lower-bound (Expr) BigRat :merge (max old new))

In the above functions, unlike leq, we define upper bound and lower bound as functions from expressions to a unique number. This is because we are always interested in the tightest upper bound and lower bounds, so

(rule (
    (leq e (Num n))
) (
    (set (upper-bound e) n)
))

(rule (
    (leq (Num n) e)
) (
    (set (lower-bound e) n)
))

We can define more specific rules for obtaining the upper and lower bounds of an expression based on the upper and lower bounds of its children.

(rule (
    (= e (Add e1 e2))
    (= (upper-bound e1) u1)
    (= (upper-bound e2) u2)
) (
    (set (upper-bound e) (+ u1 u2))
))

(rule (
    (= e (Add e1 e2))
    (= (lower-bound e1) l1)
    (= (lower-bound e2) l2)
) (
    (set (lower-bound e) (+ l1 l2))
))

... and the giant rule for multiplication:

(rule (
    (= e (Mul e1 e2))
    (= le1 (lower-bound e1))
    (= le2 (lower-bound e2))
    (= ue1 (upper-bound e1))
    (= ue2 (upper-bound e2))
) (
    (set (lower-bound e)
         (min (* le1 le2)
              (min (* le1 ue2)
              (min (* ue1 le2)
                   (* ue1 ue2)))))
    (set (upper-bound e)
         (max (* le1 le2)
              (max (* le1 ue2)
              (max (* ue1 le2)
                   (* ue1 ue2)))))))

Similarly,

(rule (
    (= e (Mul x x))
) (
    (set (lower-bound e) zero)
))

The interval analysis is not only useful for numerical tools like Herbie, but it can also guard certain optimization rules, making EqSat-based rewriting more powerful!

For example, we are interested in non-zero expressions

(relation non-zero (Expr))
(rule ((< (upper-bound e) zero)) ((non-zero e)))
(rule ((> (lower-bound e) zero)) ((non-zero e)))
(rewrite (Div x x)         (Num one) :when ((non-zero x)))
(rewrite (Mul x (Div y x)) y         :when ((non-zero x)))

This non-zero analysis lets us optimize expressions that contain division safely.

; 2 * (x / (1 + 2 / 2)) is equivalent to x
(let expr3 (Mul (Num two) (Div (Var "x") (Add (Num one) (Div (Num two) (Num two))))))
(let expr4 (Var "x"))
(fail (check (= expr3 expr4)))
(run-schedule (saturate (run)))
(check (= expr3 expr4))

; (x + 1)^2 + 2
(let expr5 (Add (Mul (Add (Var "x") (Num one)) (Add (Var "x") (Num one))) (Num two)))
(let expr6 (Div expr5 expr5))
(run-schedule (saturate (run)))
(check (= expr6 (Num one)))

Debugging tips!

print-size is used to print the size of a table. If the table name is omitted, it prints the size of every table. This is useful for debugging performance, by seeing how the table sizes evolve as the iteration count increases.

(print-size leq)
(print-size)

print-function extracts every instance of a constructor, function, or relation in the e-graph. It takes the maximum number of instances to extract as a second argument, so as not to spend time printing millions of rows. print-function is particularly useful when debugging small e-graphs.

(print-function leq 15)

extract can also take a second argument, which causes it to extract that many different "variants" of the first argument. This is useful when trying to figure out why one e-class is failing to be unioned with another.

(extract expr3 3)

Exercises:

(Free variable analysis) One analysis that is frequently useful is free variable sets. While it is possible to simulate "sets" using only functions, egglog provides containers to make this less tedious and more efficient. A container is essentially a value that can store other values; some examples are Set, Map, and Vec. However, before we can construct a container, we have to tell egglog what sort to use inside the container. This is done with an overload of the sort command.

(sort FreeVarSet (Set String))

Now, we can construct sets with set-of.

(extract (set-of "1" "1"))

We will need a function to store the results of our free variable analysis. We have to use set intersection for the merge function, because of rewrites like x / x => 1.

(function FreeVars (Expr) FreeVarSet :merge (set-intersect old new))

Finally, we will need you to write the rules for the free variable analysis. You should have one rule for every variant of Expr. Here's an example rule for Add:

  (rule (
      (= e (Add a b))
      (= f (FreeVars a))
      (= g (FreeVars b))
  ) (
      (set (FreeVars e) (set-union f g))
  ))

If everything worked, expr5 should only have "x" as a free variable.

 (run-schedule (saturate (run)))
 (extract (FreeVars expr3))