Declarative Program Analysis and Optimization

CS294-260, Spring 2024

Slog: Higher-Order, Data-Parallel Structured Deduction

Wednesday, February 14, 2024
Discussion lead: Jiwon, Altan

Reading guide due Monday, February 12 at 2:30pm

Paper link

Slog Reading Guide

Summary

This paper presents Slog, an extension of Datalog to higher-order relations, as well as a runtime system for massively parallel execution of Slog programs. The key difference between Slog and regular Datalog is native support for recursive/nested facts, which is especially useful for implementing various reduction systems. This is achieved by aggressively interning terms (c.f. hash consing), together with special handling of the equality relation. Slog’s runtime system is essentially a parallel relational algebra engine, utilizing distributed hash joins with additional optimizations (including a new improvement in all-to-all communication by many of the same authors). Finally, Slog is evaluated on a variety of program analysis tasks, and shows good scalability under a massively parallel context.

Extending Datalog to Higher-Order Relations

The main motivation of Slog is to enable reasoning over (tree-structured) terms. Per the paper, to encode the term (lam "x" (ref "x")), one needs to “flatten” the term into something of the form

(= lam-id (lam "x" ref-id))
(= ref-id (ref "x"))

It’s worth noting here that this is essentially the A-normal form for functional languages, which is also related to continuation-passing style (CPS).

Formally, Slog (DL_S) extends Datalog by letting the Herbrand base be an inductively defined set, rather than a finite set of values. The paper gives the example of natural numbers, which can be encoded by

V = {0} U {(succ v) | v in V}.

Then, when determining the truth value of a Slog rule under a given interpretation, the variables in the rule may be substituted with any (compound) terms from V, rather than just literals. This adds an additional layer of recursion to Datalog.

Slog’s Syntactic Sugar (and Further Extensions)

Slog provides syntactic sugar for interacting with the interning indices. For example, to simultaneously bind a term (with free variables) and refer to it in another rule, one can use the ? syntax:

(free ?(ref x) x) ~> [(= e-id (ref x)) --> (free e-id x)].

Slog also allows for disjunctions in the body, unlike Datalog which defines unions of conjunctive queries separately. This lets free variables from the branches to appear jointly in the head, as shown in the example

(or (free Ef x) (free Ea x)) --> (free ?(app Ef Ea) x).

Finally, Slog enables on-demand fact generation through the ! syntax, which produces facts when the surrounding rule is sufficiently satisfied. Slog can also support functions through the {} syntax.

Evaluation and Implementation

This is a very dense section of the paper, so we won’t dive too much into the details as it requires a good chunk of PL, HPC, and databases background. Instead, we provide references for the interested reader.

The conclusion of this section is that Slog is able to implement a good variety of program analyses previously impractical or impossible to implement in pure Datalog. Additionally, the implementation is able to scale to thousands of threads, due to highly optimized parallel relational algebra and communication primitives.

Evaluation

Implementation

Reading/Discussion Questions

Slog seems to allow nonterminating programs.
- Is this ever useful for Slog programs? (c.f. egraph might never saturate…)
- If not, is there a reasonable way to prevent it from happening? (halting problem…)
How does Slog relate to egglog? Both seem to treat equality specially, and do hash consing.
How does the ! operator actually work?