CS 294-163: Decentralized Security: Theory and Systems Fall 2019 Algorand Guest lecture by Derek Leung target length: 80 minutes (short breaks after {30, 50} min) Notes (1 min) ============= plan: i'll cover three different things in each third of the lecture: 1. background: BFT (covered earlier in class) 2. internet-scale BFT (the paper) 3. practical lessons (learned while implementing) lecture outline will be online for reference Cryptocurrency review (10 min) ============================== cryptocurrency goals: - allow users to exchange money - in a decentralized way requirements: - security - stealing money - "double spending problem" - decentralized and scalable - distributed system, internet scale stealing money: solved problem; use digital signatures model: state machine has giant dictionary mapping pk -> money transitions are txs double spending: need to agree on _ordering_ of transactions! state machine-operation log duality another way to think of cryptocurrency: as a _decentralized_, _highly-consistent_ database Agreeing on ordering (20 min total) =================================== Byzantine Generals Problem (Lamport, Shostak, Pease) (5 min) ------------------------------------------------------------ assume: - authenticated channels - f bad actors (goal: either break consistency or availability) - message delays are unbounded fundamental result: requires >=2f+1 good actors (>=3f+1 total) to ensure consistency def: _quorum_ is any 2f+1 actors why not f+1 (majority)? because _slow_ actors indistinguishable from _bad_ actors actor id: 1 2 3 (let f=1) suppose the actors say: 1 1 0 actor 1's view: 1 1 X (drop) can we commit to "1"? no: actor 2 may be malicious and _equivocate_: actor 3's view: X 0 0 exercise: prove that 2f+1 necessary and sufficient (intuition) why 2f+1 sufficient? because even if f bad actors, at least f+1 (i.e., majority of honest actors) see correct state. PBFT review (5 min) -------------------- general solution in byzantine fault tolerance algorithms - solves exactly this problem! agree on operation log - big result: PBFT paper - it is practical (Castro, Liskov) overview of PBFT: 1. designated leader listens for request from client. leader broadcasts request; i.e., proposes 2. everyone listens for signed proposal everyone on network broadcasts signed ack 3. everyone listens for quorum on (3), i.e. the ack everyone on network broadcasts ack of ack 4. leader listens for quorum on (4), i.e., the ack of the ack leader replies to client with quorum, which _proves_ correct commitment if this fails, attempt to recover leadership rotates if request from client is dropped after timeout recovery protocol is quite complicated! (see Mickens for entertaining commentary) exercise: prove that we cannot get rid of either step 2. or 3. (solution to this in literature, or contact me) Decentralized BFT (10 min) -------------------------- PBFT benefits - proof of commitment - fairly fast - high tolerance for bad actors can we use PBFT to solve transaction ordering on cryptocurrency? not by itself. problems: 1. tx proposer can equivocate: quadratic communication complexity possibly 1M users on internet: can't scale! solution: perform sample of users (the _committee_) 2. how to sample? need sybil resistence solution: proof of stake, recursively publish public keys + stake and use to establish weights for next tx 3. DoS'ing the committee want non-interactive sampling mechanism (_sortition_ in paper) solution: modify protocol: each user sends one message (see paper; slightly outdated) - leader election tricky--must elect many leaders solution: ephemeral signature for votes (see paper) - idea similar to TLS certificates: hierarchical keys solution: verifiable random function > Break (2 min) < Scaling agreement to the Internet (20 min total) ================================================ Non-interactive sampling and Verifiable Random Functions (10 min) ----------------------------------------------------------------- how to sample committees? naive approach: distributed random generation 1. everyone generates random bitstring Si 2. everyone commits H(Si) 3. after all H(Si) published, reveal Si "unbiased" bitstring is then S = S1 + S2 + ... + SN ("+" is "xor") you are selected if H(S || i) < difficulty problem: DoS the system until you get a favorable S by refusing to reveal def: _verifiable random function_ (Micali, Rabin, Vadhan) 3 algorithms: 1. keygen -> (pk, sk) 2. hash(sk, data) -> output, proof_output 3. verify(pk, data, output, proof_output) -> ok properties: - output is uniformly distributed - output is unpredictable without knowledge of sk - given (pk, data, proof_output), verify is efficient - (practical) prove, keygen are also efficient implemented via Schnorr proof over curve25519, standardizing @ IETF (Goldberg et al.) now S_(block k) = VRF(leader secret, S_(block k-1)) pick leaders and committee members for block k by checking if VRF(i, S_(block k)) < difficulty VRF bias and reasoning about committee size (10 min) --------------------------------------------------- what is the power of the attacker? target cryptographic security (2^128 tries) P(attacker wins leadership enough to control S) - need to reason about this (not covering here; see paper for details. involves markov chain-like analysis) BFT is safe (but not live) against attacker winning leadership: committees check result (recovery protocol also handles case where no leaders are selected) essential for committees to be honest (in aggregate)! P(attacker wins committee)? suppose someone owns N coins coin gives you a vote if VRF(...) < difficulty where difficulty scales with committee size what is max power of attacker? binomial distribution over coins N coins, with p = 1/total_money_supply need committees to be honest (recall the quorum) analyze binomial distribution over two variables: honest and dishonest money pick committee size such that chance of compromise negligible safety of committees important; liveness less so can use this to tune committee size computing the binomial right in practice is messy > Break (2 min) < Practical lessons we learned (20 min) ===================================== everything must be committed to forever triaging severity of problems: 0. secret key compromise - unfixable - party's over, pack up and go home 1. corruption of state - e.g., money-printing bug 2. forks 3. denial-of-service: permanent 4. denial-of-service: temporary (0-2) are really bad (permanent damage) build system in such a way that we failover into (3) reducing risk of secret key compromise - multisignature - support physical isolation - use delegated key for BFT voting (c.f. physical attacks on Bitcoin keys) be careful with security guarantees! attacker gets to generate vrf secret for voting attacker can manipulate vrf as follows: - split money across many coins - register *same* pk for each coin - either no coins are selected, or all are - attacker wins the same in expectation - but distribution is terrible: attacker occasionally wins whole committee! - solution: hash pk into vrf output how to ensure security of code? BFT algorithm is fairly complex (ref. Mickens joke) how to debug failures in field? some failures take ~10 steps trick: use single thread (goroutine actually); record and replay - caveat: cryptography is expensive - solution: global thread pools; prioritization of voting over tx verification handling untrusted data every tx is potentially attacker-controlled - use Go type system strategically (verified vs. unverified types; type for money) - e.g., integer overflow in txs (c.f. Bitcoin overflow bugs) - avoid raw math; wrap ops in checked functions version upgrades - bad encodings are dangerous performance - critical factor: in-node, end-to-end tx verification latency - batch txs in blocks - block size trades off betwen lowest latency and throughput as BFT has constant overhead - expected bottlenecks: network, crypto - surprise: disk reads on random accounts take up a lot of time - use SSDs - surprise: encoding takes up a lot of time (serializing, deserializing) - batching decisions - should we cut up blocks? maybe - difficulty of making applicable benchmarks - better perf => complexity => harder to reason about security fun bugs we ran into - (GitHub bug #29000) go runtime heap corruption?? - sqlite serializability doesn't work?? Conclusion (5 min) ================== lot of unusual problems in this space pro: unusual solutions con: hard to tell whether something is important this is a cool class about me > Questions (rest of class) <