Declarative Program Analysis and Optimization

CS294-260, Spring 2024

From Datalog to Flix: A Declarative Language for Fixed Points on Lattices

Summary

This paper presents Flix, an extension of Datalog to support lattices and monotone functions. The authors show how to naturally extend the model theoretic semantics of Datalog to Flix, although some care is taken to ensure that the naive and semi-naive evaluation strategies still work. Finally, Flix is used to implement several program analyses from the literature that are practically infeasible with Datalog. The results show that an unoptimized Flix runtime is able to scale with optimized baselines, although between 2.5-3.1x slower.

Background Notes

Lattices and program analysis. The paper gives the definition of a (complete) lattice, as well as some example lattices used in program analyses. To summarize at a higher level, lattice values correspond to abstractions of concrete program values. Since the set of possible concrete values is typically (practically) infinite, tractable analysis can only be performed over a smaller abstracted set of values. In general, we are concerned with the “small” lattice elements (i.e. those closer to ⊥), since those elements are more “specific” in a sense. This is why it’s natural to extend the minimal model semantics of Datalog to Flix by requiring minimality to respect the lattice orders.

Minimal model semantics of Datalog. Here is another way to read some of the definitions in this section:

The minimal model satisfies all “input facts” (the extensional database) as well as all facts derived from the input (the intensional database), with no “extra” facts.

Extending Datalog to Flix

The authors extend Datalog to Flix by making every relation effectively a map from non-lattice values to lattice values. Thus, a relation $A(x_1,…,x_n)$ from Datalog becomes $A_L(x_1,…,x_n,v)$ in Flix, where L is a lattice and v is from L. Each relation $A_L$ and “key” $(x_1,…,x_n)$ defines a “cell” of the newly extended Herbrand base. Since the key is drawn from the non-lattice universe, it follows that there are finitely many cells, although the cells may be infinite in size (if the underlying lattice is infinite).

This motivates the notion of “compactness”, which states that every cell has a single value. Compactness is what lets us actually run the naive and semi-naive evaluation strategies on Flix.

Finally, a fact is satisfied in Flix as long as the interpretation has a larger fact (of the same cell) with respect to the lattice order.

Lattice Operation in Flix

In this subsection, we correspond operations in lattices to those in the language flix.

Suppose we have fact A(x) and B(x)

Join in a lattice

Join operation over a set of lattice S computes the least upper bound of the lattice in S.

In flix, joining of lattice could be computed as

R(x) :- A(x)
R(x) :- B(x)

Meet in lattice domain

Meet operation over a set of lattice S computers the greatest lower bound of the lattice in S.

In flix, meet of a set of lattice could be expressed as

R(x) :- A(x), B(x)

Executing Flix

A few key points:

Semi-Naive Evaluation in flix

In flix, one key difference is made for semi-naive evaluation from Datalog. In datalog, the increments of a new set is the tuples added to the old set. However, in flix, the increments is computed as the set of ground atom in the new set which is not LEQ than the ground atom of the same cell as the original set. This new requirement is because the compactness of flix programs.

Evaluation

Most of this section can be skimmed, since it needs a good bit of background (see the Strong Update, IFDS, and IDE papers).

The takeaway is that the declarative similarities between the analyses becomes much more apparent when the operational details (such as complex mutual recursion) are elided.

Finally, the results show that Flix can:

  1. scale beyond naive Datalog encodings of lattice-based analyses, although lagging much behind a native c++ implementation
  2. match the scaling of an imperative implementation of IFDS, although with a consistent ~3x slowdown.

Reading Questions

Discussion Questions

See also: