[0. Summary]

1. Why does Helen use the ADMM formulation of the training algorithm instead of SGD? 

2. If the MPC (Sec 6.4) run among the parties took every record/sample of each participant's dataset as input, it would be very slow. How does Helen keep the cost of the MPC coordination from scaling with the number of records $n$?