Subgradient Methods for the Lagrangian Dual

Maximize the Lagrangian dual function $L(\lambda)=\min_{x\in X} c^T x + \lambda^T(b-Ax)$ using subgradient methods. Covers computing subgradients of $L,$ the projected subgradient update, and step size rules including the Polyak rule.

Subgradients of the Lagrangian Dual

For a primal problem

\min_{x\in X}\; c^T x\quad\text{s.t.}\quad Ax \geq b,

dualizing the complicating constraints with multipliers $\lambda \geq 0$ produces the Lagrangian dual function

L(\lambda) = \min_{x \in X}\Big\{ c^T x + \lambda^T(b - Ax) \Big\}.
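Here is a minimal sketch that evaluates $L(\lambda)$ by brute force. It assumes $X$ is a small finite set that can be enumerated. The instance is a hypothetical choice: $X = \{0,1\}^3,$ cost vector $c = (1, 2, 3),$ and a single dualized constraint $x_1 + x_2 + x_3 \geq 2.$ The function name `lagrangian_dual` is illustrative. A realistic instance would call a problem-specific solver for the inner minimization instead of enumerating.

```python
import itertools
import numpy as np


def lagrangian_dual(c, A, b, X, lam):
    """Evaluate L(lam) = min_{x in X} c^T x + lam^T (b - A x) by enumeration.

    Assumes X is a finite collection of candidate points (illustrative setup).
    Returns the dual value together with one minimizer x(lam).
    """
    best_val, best_x = np.inf, None
    for x in X:
        x = np.asarray(x, dtype=float)
        val = c @ x + lam @ (b - A @ x)  # Lagrangian value at this candidate
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x


# Hypothetical instance: X = {0,1}^3, dualized constraint x1 + x2 + x3 >= 2.
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
X = list(itertools.product([0, 1], repeat=3))

val, x_lam = lagrangian_dual(c, A, b, X, np.array([1.5]))
```

At $\lambda = 1.5$ this returns $L(\lambda) = 2.5$ with minimizer $x(\lambda) = (1, 0, 0).$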

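The function $L$ is concave: it is the pointwise minimum, over $x \in X,$ of the affine functions $\lambda \mapsto c^T x + \lambda^T(b - Ax).$ It is typically not differentiable, however. When $X$ is discrete (for example, a finite set of integer points), the inner minimizer $x(\lambda)$ jumps from one point to another as $\lambda$ varies. As a result, $L$ is piecewise linear, with kinks wherever the minimizer switches.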
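The following illustration makes the concavity and the kinks visible. It evaluates $L$ on a grid of $\lambda$ values and computes finite-difference slopes. The instance is hypothetical: $X = \{0,1\}^3,$ $c = (1, 2, 3),$ and dualized constraint $x_1 + x_2 + x_3 \geq 2.$ This is a picture under those assumptions, not a general test of concavity.

```python
import itertools
import numpy as np

# Hypothetical toy instance: X = {0,1}^3, dualized constraint x1 + x2 + x3 >= 2.
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
X = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=3)]


def dual_value(lam):
    # Each x in X contributes the affine function c^T x + lam^T (b - A x);
    # L is their pointwise minimum, hence concave and piecewise linear.
    return min(c @ x + lam @ (b - A @ x) for x in X)


grid = np.linspace(0.0, 4.0, 9)                    # 0, 0.5, ..., 4
values = [dual_value(np.array([t])) for t in grid]
slopes = np.diff(values) / np.diff(grid)           # slope on each grid interval
```

The slopes come out as 2, 2, 1, 1, 0, 0, -1, -1. They are nonincreasing, as concavity requires, and they change only at $\lambda = 1, 2, 3,$ the values at which $x(\lambda)$ switches.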

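At a kink the gradient does not exist, so we use a subgradient instead. Strictly speaking this is a supergradient, since $L$ is concave, but "subgradient" is the customary term. It is a vector $g$ such that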

L(\mu) \leq L(\lambda) + g^T(\mu - \lambda)\quad\text{for all } \mu.

Key fact. If $x(\lambda)$ is any optimal solution of the inner Lagrangian subproblem at $\lambda,$ then

g(\lambda) \;=\; b - A\,x(\lambda)

is a subgradient of $L$ at $\lambda.$
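The key fact follows directly from the definition of $L$ as a minimum over $X.$ Two observations do the work. First, $x(\lambda)$ is feasible for the inner problem at every $\mu.$ Second, $L(\lambda) = c^T x(\lambda) + \lambda^T(b - A\,x(\lambda))$ because $x(\lambda)$ is optimal at $\lambda.$ Hence, for all $\mu,$

\begin{aligned} L(\mu) &= \min_{x \in X}\Big\{ c^T x + \mu^T(b - Ax) \Big\} \;\leq\; c^T x(\lambda) + \mu^T\big(b - A\,x(\lambda)\big) \\ &= c^T x(\lambda) + \lambda^T\big(b - A\,x(\lambda)\big) + (\mu - \lambda)^T\big(b - A\,x(\lambda)\big) \\ &= L(\lambda) + g(\lambda)^T(\mu - \lambda). \end{aligned}

This is exactly the subgradient inequality with $g(\lambda) = b - A\,x(\lambda).$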

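In words, the subgradient is the residual of the dualized constraints evaluated at $x(\lambda).$ A positive component $g_i(\lambda) > 0$ means constraint $i$ is violated by $x(\lambda).$ The subgradient then points toward increasing $\lambda_i,$ which penalizes that violation more heavily. A negative component means the constraint is slack, and the subgradient points toward decreasing $\lambda_i$ (subject to $\lambda_i \geq 0$).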
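The sketch below is a numerical sanity check, not a proof. It computes $g(\lambda) = b - A\,x(\lambda)$ at $\lambda = 1.5$ and confirms the subgradient inequality on a grid of $\mu$ values. It assumes a hypothetical toy instance: $X = \{0,1\}^3,$ $c = (1, 2, 3),$ and dualized constraint $x_1 + x_2 + x_3 \geq 2.$ The helper names `solve_inner` and `subgradient` are illustrative.

```python
import itertools
import numpy as np

# Hypothetical toy instance: X = {0,1}^3, dualized constraint x1 + x2 + x3 >= 2.
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
X = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=3)]


def solve_inner(lam):
    """Return (L(lam), x(lam)) by enumerating the finite set X."""
    x_best = min(X, key=lambda x: c @ x + lam @ (b - A @ x))
    return c @ x_best + lam @ (b - A @ x_best), x_best


def subgradient(lam):
    """g(lam) = b - A x(lam): residual of the dualized constraints at x(lam)."""
    _, x_lam = solve_inner(lam)
    return b - A @ x_lam


lam = np.array([1.5])
L_lam = solve_inner(lam)[0]
g = subgradient(lam)

# Sanity check only: L(mu) <= L(lam) + g^T (mu - lam) at sampled mu in [0, 4].
ok = all(
    solve_inner(np.array([t]))[0] <= L_lam + g @ (np.array([t]) - lam) + 1e-12
    for t in np.linspace(0.0, 4.0, 41)
)
```

For this instance $x(1.5) = (1, 0, 0),$ so $g(1.5) = 1 > 0.$ The constraint is violated, and the subgradient points toward a larger multiplier. The inequality holds at every sampled $\mu,$ so `ok` is `True`.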

Illustration. If the dualized constraint is $x_1 + x_2 + x_3 \geq 2$ and the inner minimizer is $x(\lambda) = (1, 0, 0),$ then

g(\lambda) = 2 - (1 + 0 + 0) = 1.
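The arithmetic can be reproduced in a few lines of NumPy. The constraint is written as $Ax \geq b$ with $A = (1\;1\;1)$ and $b = 2.$ The variable names are illustrative.

```python
import numpy as np

# Dualized constraint x1 + x2 + x3 >= 2, written as A x >= b.
A = np.array([[1, 1, 1]])
b = np.array([2])

# Inner minimizer x(lambda) taken from the illustration.
x_lam = np.array([1, 0, 0])

g = b - A @ x_lam    # residual of the dualized constraint
violated = g > 0     # positive residual: constraint violated by x(lambda)
print(g)  # [1]
```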