Declarative Program Analysis and Optimization

CS294-260, Spring 2024

Efficient E-matching for SMT Solvers

Wednesday, February 28, 2024
Discussion lead: Federico, Jacob, Jiwon

Reading guide due Monday, February 26 at 2:30pm

Paper link

Summary

SMT solvers are good at “ground reasoning” (solving boolean combinations of formulas with no free or bound variables). To support quantifiers, these solvers need to somehow connect quantified formulas (which are not ground since they contain bound variables) to ground formulas. One (now) standard way to do this is to treat universal (existential) quantifiers as infinite conjunctions (disjunctions) of ground formulas (each called an instance of the quantifier) and then heuristically selecting a subset of these instances to include in the ground reasoning. This paper goes deep into the nitty gritty details of the heuristic selection process used by Z3 and how it is efficiently implemented.

At a high-level, the heuristic selection process works as follows for a single quantified formula. First, take subterms from the body of the quantified formula to be patterns (the bound variables become the pattern variables). Second, search the e-graph built from the currently asserted set of ground formulas for terms that match the previously collected patterns. Each match will be demonstrated by a substitution. Finally, select the ground formulas obtained by applying these substitutions to the body of the quantifier. This approach is based on [1] (and there is a healthy amount of copy-pasting to show it). The main novel contributions seem to be (1) improving the empirical performance of matching using code trees and (2) improving the empirical performance of incremental matching using inverted path indices.

E-matching code-trees

The main idea is to compile sets of patterns into instructions by partially evaluating the matching algorithm [2]. Code trees are used to remove redundant work when dealing with multiple patterns, offering performance improvements over a naive approach. When the solver encounters a quantified formula, a pattern is added to a “code tree” with branching instructions. Instructions that match with multiple patterns are encoded in the same node, allowing for compact representation of a set of patterns. The paper formulates an algorithm for inserting patterns into this tree. Instead of matching a term individually against each pattern, the term is used to “run” the code tree, which matches with a set of multiple patterns at once. “Running” a code tree on a term (from your ground terms) will generate (repeatedly yield) substitutions that demonstrate how a pattern (from the set of patterns represented by the code tree) matches that input term.

Incremental matching

Even with code-trees, a set of instructions introduces a lot of redundant matching processes. To decrease the number of matching to perform, the authors integrated optimization techniques from the Simplify prover [1]. We are interested in finding relevant terms and patterns producing new matches for union operation. When union(t1,t2) is executed, ancestors of t1 and t2’s set of congruent terms might change. The mod-time optimization finds relevant terms by marking the already checked terms to prevent redundant matching and pattern-element optimization finds relevant patterns by looking at the relation of terms t1 and t2, such as parent-child or parent-parent. However, mod-time and pattern optimization still introduce redundant matches. The authors introduce inverted path string for further optimization. The inverted path string can be thought of as a path to the term from the subterm of interest. By collecting all the terms that can be reached from the subterm we can identify all relevant terms for the new matches and using the prefix of the inverted path string we can identify which pattern matching can be skipped.

Reading Questions

What’s the deal with free variables? What happens to them?
What instances would be selected for the quantified formula $\forall x P(x + 1)$ using the pattern $x + 1$ if the currently asserted set of ground formulas is ${P(1 + a)}$? Would we be able to prove $(\forall x P(x+1)) \Rightarrow P(1+a)$ with that set of instances? Why or why not?
If we trigger the matching process every time that a new ground term is created, then we may enter an infinite loop. Can you give an example formula and pattern that demonstrates this?

Discussion Questions

Which optimizations were impactful empirically? Why do you think this is? What could they have evaluated that they didn’t? What numbers would you have liked to see?
How impactful is the pattern, what is their strategy, and what else could you do to get these patterns? What makes a “good” pattern?
Is this the only way to do quantifier instantiation? What other ways can you think of that might be fun to try? What would be the tradeoffs? In other words, if e-matching is NP-Hard, do we really need to do it?

References

[1] https://www.hpl.hp.com/techreports/2003/HPL-2003-148.pdf

[2] Leo’s slide

[3] paper

[4] Andrew’s slide

[5] https://youtu.be/GOE9CpqXOHw?feature=shared (start at 14:55 and 25:43)

[6] https://leodemoura.github.io/files/qsmt.pdf (slide 32)

[7] https://leodemoura.github.io/files/SAT-SMT-2012.pdf (slides 39 to 57)