Often I end up minimizing a quadratic objective subject to linear equality constraints:
min ½x'Qx subject to Ax=b
This can be achieved with a single linear system solve via the Lagrange multiplier method. You add an auxiliary variable for each constraint, collected in a vector λ, and find the solution to:
saddle_x,λ ½x'Qx + λ'(Ax - b)
Setting the gradients with respect to x and λ to zero leads to the linear system of Karush–Kuhn–Tucker conditions:
/ Q  A' \ / x \ = / 0 \
\ A  0  / \ λ /   \ b /
The bottom equation ensures that the constraint Ax = b is satisfied, and the top equation uses A'λ to pull the solution away from Qx = 0 (the solution of the unconstrained problem).
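As a concrete sketch of this solve (my own small made-up example, not from the post), here's how one might assemble and factor the KKT system with NumPy; the sizes, Q, A, and b are all arbitrary choices:

```python
import numpy as np

# A small made-up example: Q is symmetric positive definite here,
# A has one row per linear equality constraint.
rng = np.random.default_rng(0)
n, m = 5, 2                       # |x| and number of constraints |b|
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)       # symmetric positive definite
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Assemble the (n+m) x (n+m) KKT matrix  [ Q  A' ; A  0 ]
K = np.block([[Q, A.T],
              [A, np.zeros((m, m))]])
rhs = np.concatenate([np.zeros(n), b])

# Solve once for the stacked vector [x; λ]
xlam = np.linalg.solve(K, rhs)
x, lam = xlam[:n], xlam[n:]

print(np.allclose(A @ x, b))      # the constraint Ax = b is satisfied
```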
What happens if we change that 0 in the bottom right corner of the system matrix to an identity matrix? Or rather, a scaled identity matrix? I'll consider the special choice of an identity matrix scaled by -1/ω:
/ Q  A'     \ / x \ = / 0 \    (1)
\ A  -1/ω I / \ λ /   \ b /
Solving the bottom equation for λ, we have:
λ = ω(Ax - b)
Plugging this into the top equation, Qx + A'λ = 0, we have
(Q + ωA'A) x = ω A' b
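Here's a quick numerical check of that elimination (continuing the NumPy sketch above, with the same Q, A, b, n, m; the value of ω is arbitrary):

```python
omega = 1e3                        # a made-up penalty weight

# Block system (1): the zero block replaced by -1/ω times the identity
K1 = np.block([[Q, A.T],
               [A, -np.eye(m) / omega]])
sol = np.linalg.solve(K1, np.concatenate([np.zeros(n), b]))
x1, lam1 = sol[:n], sol[n:]

# Eliminating λ = ω(Ax - b) leaves the smaller n-by-n system
x2 = np.linalg.solve(Q + omega * A.T @ A, omega * A.T @ b)

print(np.allclose(x1, x2))                       # same x, up to round-off
print(np.allclose(lam1, omega * (A @ x1 - b)))   # λ matches its closed form
```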
This looks familiar! Returning to our original minimization problem, we could have chosen to enforce the constraint Ax = b in a least squares sense, say, with a weight of ω:
min ½x'Qx + ω/2 ‖Ax - b‖²
Finding the minimizer amounts to setting the gradient of this energy with respect to x, namely Qx + ωA'(Ax - b), to zero and solving the system:
(Q + ωA'A) x = ω A' b
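A short sketch of the penalty route itself (again reusing the variables above): unlike the exact Lagrange multiplier solve, the constraint only holds approximately, with the violation shrinking as ω grows.

```python
# Constraint residual of the pure penalty solve shrinks as ω grows,
# but Ax = b is never satisfied exactly for finite ω.
for omega in [1e0, 1e3, 1e6]:
    x_pen = np.linalg.solve(Q + omega * A.T @ A, omega * A.T @ b)
    print(omega, np.linalg.norm(A @ x_pen - b))
```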
So, algebraically these methods are the same. Solving the direct penalty method involves an |x| by |x| system, whereas the block system in (1) is a larger (|x|+|b|) by (|x|+|b|) system.
However, if ω is very large or the conditioning of A'A is poor, then adding ωA'A might blow the lid off your numerical solver (supposedly; I haven't actually confirmed this on my own examples, but I've read as much). Solving the larger system involving 1/ω I and A instead should have better conditioning and more stable numerics.
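If you want to poke at this claim yourself, here's a sketch (assuming the same NumPy setup as above) that compares condition numbers of the small penalty matrix and the larger block matrix as ω grows:

```python
# Compare condition numbers of the small penalty matrix and the larger
# block matrix; the penalty matrix's conditioning typically degrades as
# ω grows, while the block matrix's tends to stay tame.
for omega in [1e2, 1e6, 1e10]:
    penalty_matrix = Q + omega * A.T @ A
    block_matrix = np.block([[Q, A.T],
                             [A, -np.eye(m) / omega]])
    print(omega,
          np.linalg.cond(penalty_matrix),
          np.linalg.cond(block_matrix))
```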