Declarative Program Analysis and Optimization

CS294-260, Spring 2024

From Datalog to Flix: A Declarative Language for Fixed Points on Lattices

Monday, February 12, 2024
Discussion lead: Tianrui, Altan

Reading guide due Saturday, February 10 at 2:30pm

Paper link

Summary

This paper presents Flix, an extension of Datalog to support lattices and monotone functions. The authors show how to naturally extend the model theoretic semantics of Datalog to Flix, although some care is taken to ensure that the naive and semi-naive evaluation strategies still work. Finally, Flix is used to implement several program analyses from the literature that are practically infeasible with Datalog. The results show that an unoptimized Flix runtime is able to scale with optimized baselines, although between 2.5-3.1x slower.

Background Notes

Lattices and program analysis. The paper gives the definition of a (complete) lattice, as well as some example lattices used in program analyses. To summarize at a higher level, lattice values correspond to abstractions of concrete program values. Since the set of possible concrete values is typically (practically) infinite, tractable analysis can only be performed over a smaller abstracted set of values. In general, we are concerned with the “small” lattice elements (i.e. those closer to ⊥), since those elements are more “specific” in a sense. This is why it’s natural to extend the minimal model semantics of Datalog to Flix by requiring minimality to respect the lattice orders.

Minimal model semantics of Datalog. Here is another way to read some of the definitions in this section:

the Herbrand universe $U$ is the set of all constant values in your Datalog program,
if $F$ is the set of all relation symbols your program, then the Herbrand base $B$ is the set of all $A(v_1,…,v_n)$, where $A$ is from $F$ (of arity $n$) and $v$ is from $U$.

The minimal model satisfies all “input facts” (the extensional database) as well as all facts derived from the input (the intensional database), with no “extra” facts.

Extending Datalog to Flix

The authors extend Datalog to Flix by making every relation effectively a map from non-lattice values to lattice values. Thus, a relation $A(x_1,…,x_n)$ from Datalog becomes $A_L(x_1,…,x_n,v)$ in Flix, where L is a lattice and v is from L. Each relation $A_L$ and “key” $(x_1,…,x_n)$ defines a “cell” of the newly extended Herbrand base. Since the key is drawn from the non-lattice universe, it follows that there are finitely many cells, although the cells may be infinite in size (if the underlying lattice is infinite).

This motivates the notion of “compactness”, which states that every cell has a single value. Compactness is what lets us actually run the naive and semi-naive evaluation strategies on Flix.

Finally, a fact is satisfied in Flix as long as the interpretation has a larger fact (of the same cell) with respect to the lattice order.

Lattice Operation in Flix

In this subsection, we correspond operations in lattices to those in the language flix.

Suppose we have fact A(x) and B(x)

Join in a lattice

Join operation over a set of lattice S computes the least upper bound of the lattice in S.

In flix, joining of lattice could be computed as

R(x) :- A(x)
R(x) :- B(x)

Meet in lattice domain

Meet operation over a set of lattice S computers the greatest lower bound of the lattice in S.

In flix, meet of a set of lattice could be expressed as

R(x) :- A(x), B(x)

Executing Flix

A few key points:

To ensure compactness, collapse each cell to the least upper bound of its contents before applying the immediate consequence operator.
Filter functions (which are just monotone functions to the boolean lattice) connect lattice values to the “relational” side.
General monotone functions must be executed after the body; once you’ve determined a fact on the RHS, the monotone function tells you where to put it (e.g. sum(x, y) rather than x or y).
Direct products of flix are available, however logical and reduced products (which are more precise than direct products) are not available (needs manual implementation). This means that results in flix are over-approximation of the original result.

Semi-Naive Evaluation in flix

In flix, one key difference is made for semi-naive evaluation from Datalog. In datalog, the increments of a new set is the tuples added to the old set. However, in flix, the increments is computed as the set of ground atom in the new set which is not LEQ than the ground atom of the same cell as the original set. This new requirement is because the compactness of flix programs.

Evaluation

Most of this section can be skimmed, since it needs a good bit of background (see the Strong Update, IFDS, and IDE papers).

The takeaway is that the declarative similarities between the analyses becomes much more apparent when the operational details (such as complex mutual recursion) are elided.

Finally, the results show that Flix can:

scale beyond naive Datalog encodings of lattice-based analyses, although lagging much behind a native c++ implementation
match the scaling of an imperative implementation of IFDS, although with a consistent ~3x slowdown.

Reading Questions

Flix does not support negation (in the paper), but not necessarily due to the difficulty of extending stratified negation from Datalog. What would it mean to negate a lattice element? Is it even well defined, generally?
What do we lose by requiring compactness, if anything?
This version of Flix does not have any query optimization. Is it as “simple” as doing what Doop did, or do lattices and transfer functions make it harder?
Flix chokes on Strong Update compared to the C++ baseline. The authors claim this is due to some kind of implicit representation in the C++ implementation. Is there an opportunity for compression or sharing in Flix, or even Datalog? Is this related to BDDs?

Discussion Questions

How to implement logical and reduced product in flix manually
Can you think of how to improve flix to avoid excessively representing lattice values of $\top$ and $\bot$?
Safety guarantee provided by Datalog is still not implemented by flix. Which part in its generalization do you think makes it difficult?