CS 294-163: Decentralized Security: Theory and Systems
Fall 2019

Algorand
Guest lecture by Derek Leung

target length: 80 minutes (short breaks after {30, 50} min)


Notes (1 min)
=============

plan: i'll cover three different things in each third of the lecture:
 1. background: BFT       (covered earlier in class)
 2. internet-scale BFT    (the paper)
 3. practical lessons     (learned while implementing)
lecture outline will be online for reference


Cryptocurrency review (10 min)
==============================

cryptocurrency goals:
 - allow users to exchange money
 - in a decentralized way

requirements:
 - security
    - stealing money
    - "double spending problem"
 - decentralized and scalable
    - distributed system, internet scale

stealing money: solved problem; use digital signatures
model: state machine has giant dictionary mapping pk -> money
transitions are txs

double spending: need to agree on _ordering_ of transactions!
state machine-operation log duality

another way to think of cryptocurrency:
as a _decentralized_, _highly-consistent_ database


Agreeing on ordering (20 min total)
===================================

Byzantine Generals Problem (Lamport, Shostak, Pease) (5 min)
------------------------------------------------------------

assume:
 - authenticated channels
 - f bad actors (goal: either break consistency or availability)
 - message delays are unbounded
fundamental result:
requires >=2f+1 good actors (>=3f+1 total) to ensure consistency

def: _quorum_ is any 2f+1 actors
why not f+1 (majority)? because _slow_ actors indistinguishable from _bad_ actors

actor id:               1  2  3     (let f=1)
suppose the actors say: 1  1  0
actor 1's view:         1  1  X     (drop)

can we commit to "1"? no: actor 2 may be malicious and _equivocate_:

actor 3's view:         X  0  0

exercise: prove that 2f+1 necessary and sufficient

(intuition)
why 2f+1 sufficient? because even if f bad actors,
at least f+1 (i.e., majority of honest actors) see correct state.


PBFT review (5 min)
--------------------

general solution in byzantine fault tolerance algorithms
 - solves exactly this problem! agree on operation log
 - big result: PBFT paper - it is practical (Castro, Liskov)

overview of PBFT:
1. designated leader listens for request from client.
   leader broadcasts request; i.e., proposes
2. everyone listens for signed proposal
   everyone on network broadcasts signed ack
3. everyone listens for quorum on (3), i.e. the ack
   everyone on network broadcasts ack of ack
4. leader listens for quorum on (4), i.e., the ack of the ack
   leader replies to client with quorum, which _proves_ correct commitment

if this fails, attempt to recover
leadership rotates if request from client is dropped after timeout
recovery protocol is quite complicated!

(see Mickens for entertaining commentary)

exercise: prove that we cannot get rid of either step 2. or 3.
(solution to this in literature, or contact me)


Decentralized BFT (10 min)
--------------------------

PBFT benefits
 - proof of commitment
 - fairly fast
 - high tolerance for bad actors
can we use PBFT to solve transaction ordering on cryptocurrency? not by itself.

problems:
1. tx proposer can equivocate: quadratic communication complexity
   possibly 1M users on internet: can't scale!
   solution: perform sample of users (the _committee_)
2. how to sample?
   need sybil resistence
   solution: proof of stake, recursively
             publish public keys + stake and use to establish weights for next tx
3. DoS'ing the committee
   want non-interactive sampling mechanism                (_sortition_ in paper)
   solution: modify protocol: each user sends one message (see paper; slightly outdated)
    - leader election tricky--must elect many leaders
   solution: ephemeral signature for votes                (see paper)
    - idea similar to TLS certificates: hierarchical keys
   solution: verifiable random function


> Break (2 min) <


Scaling agreement to the Internet (20 min total)
================================================

Non-interactive sampling and Verifiable Random Functions (10 min)
-----------------------------------------------------------------

how to sample committees?
naive approach: distributed random generation
1. everyone generates random bitstring Si
2. everyone commits H(Si)
3. after all H(Si) published, reveal Si
"unbiased" bitstring is then S = S1 + S2 + ... + SN ("+" is "xor")
you are selected if H(S || i) < difficulty

problem: DoS the system until you get a favorable S by refusing to reveal

def: _verifiable random function_ (Micali, Rabin, Vadhan)
3 algorithms:
1. keygen -> (pk, sk)
2. hash(sk, data) -> output, proof_output
3. verify(pk, data, output, proof_output) -> ok

properties:
 - output is uniformly distributed
 - output is unpredictable without knowledge of sk
 - given (pk, data, proof_output), verify is efficient
 - (practical) prove, keygen are also efficient

implemented via Schnorr proof over curve25519, standardizing @ IETF (Goldberg et al.)

now S_(block k) = VRF(leader secret, S_(block k-1))
pick leaders and committee members for block k by checking if
VRF(i, S_(block k)) < difficulty


VRF bias and reasoning about committee size (10 min)
---------------------------------------------------

what is the power of the attacker?
target cryptographic security (2^128 tries)

P(attacker wins leadership enough to control S) - need to reason about this
(not covering here; see paper for details. involves markov chain-like analysis)

BFT is safe (but not live) against attacker winning leadership:
committees check result
(recovery protocol also handles case where no leaders are selected)
essential for committees to be honest (in aggregate)!

P(attacker wins committee)?
suppose someone owns N coins
coin gives you a vote if VRF(...) < difficulty
where difficulty scales with committee size

what is max power of attacker?
binomial distribution over coins N coins, with p = 1/total_money_supply
need committees to be honest (recall the quorum)
analyze binomial distribution over two variables: honest and dishonest money
pick committee size such that chance of compromise negligible

safety of committees important; liveness less so
can use this to tune committee size

computing the binomial right in practice is messy


> Break (2 min) <


Practical lessons we learned (20 min)
=====================================

everything must be committed to forever
triaging severity of problems:
0. secret key compromise
    - unfixable
    - party's over, pack up and go home
1. corruption of state
    - e.g., money-printing bug
2. forks
3. denial-of-service: permanent
4. denial-of-service: temporary
(0-2) are really bad (permanent damage)
build system in such a way that we failover into (3)

reducing risk of secret key compromise
 - multisignature
 - support physical isolation
 - use delegated key for BFT voting
(c.f. physical attacks on Bitcoin keys)

be careful with security guarantees!
attacker gets to generate vrf secret for voting
attacker can manipulate vrf as follows:
 - split money across many coins
 - register *same* pk for each coin
   - either no coins are selected, or all are
   - attacker wins the same in expectation
   - but distribution is terrible: attacker occasionally wins whole committee!
 - solution: hash pk into vrf output

how to ensure security of code?
BFT algorithm is fairly complex (ref. Mickens joke)
how to debug failures in field? some failures take ~10 steps
trick: use single thread (goroutine actually); record and replay
 - caveat: cryptography is expensive
 - solution: global thread pools; prioritization of voting over tx verification

handling untrusted data
every tx is potentially attacker-controlled
 - use Go type system strategically (verified vs. unverified types; type for money)
 - e.g., integer overflow in txs (c.f. Bitcoin overflow bugs)
    - avoid raw math; wrap ops in checked functions

version upgrades
 - bad encodings are dangerous

performance
 - critical factor: in-node, end-to-end tx verification latency
    - batch txs in blocks
    - block size trades off betwen lowest latency and throughput
      as BFT has constant overhead
 - expected bottlenecks: network, crypto
 - surprise: disk reads on random accounts take up a lot of time
    - use SSDs
 - surprise: encoding takes up a lot of time (serializing, deserializing)
 - batching decisions
    - should we cut up blocks? maybe
 - difficulty of making applicable benchmarks
 - better perf => complexity => harder to reason about security

fun bugs we ran into
 - (GitHub bug #29000) go runtime heap corruption??
 - sqlite serializability doesn't work??


Conclusion (5 min)
==================

lot of unusual problems in this space
pro: unusual solutions
con: hard to tell whether something is important

this is a cool class

about me


> Questions (rest of class) <