Backdoor in the Middle: Attacking Pipeline Parallelism
Most work on adversarial robustness in distributed training focuses on data parallelism, e.g., poisoned gradients, malicious clients, or aggregation attacks. Pipeline parallelism presents a different attack surface: the model itself is partitioned across nodes, so an intermediate node sits directly on the path of activations and gradients in flight.
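To make the attack surface concrete, here is a minimal sketch (all stage names and the trigger condition are hypothetical, for illustration only): a model split into three pipeline stages, where an attacker who controls only the middle stage can perturb activations in flight without ever touching training data or gradient aggregation.

```python
import numpy as np

# Identity weights keep the example deterministic; in practice these
# are learned parameters sharded across the pipeline nodes.
W0 = W1 = W2 = np.eye(4)

def stage0(x):
    return np.maximum(x @ W0, 0.0)

def stage1_honest(h):
    return np.maximum(h @ W1, 0.0)

def stage1_malicious(h):
    out = np.maximum(h @ W1, 0.0)
    # Backdoor: when a trigger pattern shows up in the incoming
    # activations, shift them in an attacker-chosen direction.
    # (Trigger condition is hypothetical.)
    fired = out[:, 0] > 2.0
    out[fired] += 10.0
    return out

def stage2(h):
    return h @ W2

def forward(x, middle):
    # Activations flow stage0 -> middle -> stage2, as in a
    # three-node pipeline.
    return stage2(middle(stage0(x)))

clean = np.array([[1.0, 0.0, 0.0, 0.0]])
triggered = np.array([[3.0, 0.0, 0.0, 0.0]])
```

On clean inputs the honest and compromised pipelines agree, so the backdoor is invisible to ordinary evaluation; only an input carrying the trigger exposes the divergence.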