Partition of Unity Constraints for Sparse Quadratic Matrix Programs

Alec Jacobson

February 22, 2024


This note demonstrates how to efficiently enforce partition of unity constraints for a specific class of convex quadratic matrix optimizations. Compared to vectorization with the Lagrange multiplier method, this approach requires solving against a smaller (positive definite) system. Compared to vectorization with the nullspace method, this approach maintains sparsity.

Problem Statement

We consider equality constrained optimizations where we seek a matrix \(\mathbf{X}∈\mathbb{R}^{n×k}\) that minimizes a sum of quadratic cost functions of its \(k\) columns while ensuring partition of unity across its \(n\) rows: \begin{equation} \label{equ:sum} \min_\mathbf{X}\, ∑_{j=1}^k (\frac{1}{2}\mathbf{x}_j^\top \mathbf{Q}\mathbf{x}_j - \mathbf{x}_j^\top \mathbf{f}_j) \ \ \text{ subject to } ∑_{j=1}^k x_{ij} = 1 \ ∀ i∈[1,n], \tag{1} \end{equation} where \(\mathbf{x}_j\) is the \(j\)th column of \(\mathbf{X}\). The linear coefficients \(\mathbf{f}_j ∈ \mathbb{R}^n\) may vary for each column, but the quadratic coefficients must be the same symmetric positive definite matrix \(\mathbf{Q}∈ \mathbb{R}^{n×n}\).

Without the constraints, each column of \(\mathbf{X}\) could be solved for independently, reusing a single factorization of the (possibly sparse) \(n×n\) matrix \(\mathbf{Q}\) (i.e., \(\mathbf{x}_j = \mathbf{Q}^{-1} \mathbf{f}_j\)). However, the constraints in Equation (\ref{equ:sum}) couple the columns together.
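For concreteness, here is a minimal sketch of that unconstrained, column-by-column solve in Python with NumPy/SciPy. The toy matrix \(\mathbf{Q}\) (a shifted finite-difference Laplacian) and the random \(\mathbf{F}\) are assumptions for illustration only, and SciPy's sparse LU stands in for a sparse Cholesky factorization:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, k = 100, 4
# Toy sparse symmetric positive definite Q: a 1D finite-difference
# Laplacian shifted by the identity (any sparse SPD matrix would do).
Q = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) + sp.identity(n)
F = np.random.rand(n, k)  # one linear coefficient vector f_j per column

# Factorize Q once (SciPy ships a sparse LU; a true sparse Cholesky,
# e.g. CHOLMOD via scikit-sparse, would further exploit symmetry).
lu = spla.splu(sp.csc_matrix(Q))

# Reuse the factorization across all k right-hand sides: x_j = Q^{-1} f_j.
X_unconstrained = lu.solve(F)
```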

Let us write the problem in Equation (\ref{equ:sum}) in matrix form: \[\begin{aligned} \min_\mathbf{X} \, \text{tr}\left(\frac{1}{2}\mathbf{X}^\top\mathbf{Q}\mathbf{X}- \mathbf{X}^\top \mathbf{F}\right) \quad \text{subject to } \mathbf{X}\unicode{x1D7D9}_k = \unicode{x1D7D9}_n, \end{aligned}\] where \(\mathbf{F}= \begin{bmatrix}\mathbf{f}_1 & … & \mathbf{f}_k\end{bmatrix} ∈ \mathbb{R}^{n×k}\) collects the linear coefficients as columns and \(\unicode{x1D7D9}_k ∈ \mathbb{R}^k\) is a vector of \(k\) ones.

If we write the Lagrangian \[\begin{aligned} \mathcal{L}(\mathbf{X},λ) = \text{tr}\left(\frac{1}{2}\mathbf{X}^\top\mathbf{Q}\mathbf{X}- \mathbf{X}^\top \mathbf{F}+ λ\left(\mathbf{X}\unicode{x1D7D9}_k - \unicode{x1D7D9}_n\right)^\top\right), \end{aligned}\] where \(λ∈\mathbb{R}^n\) is a vector of Lagrange multiplier values, then we can identify the optimal values for \(\mathbf{X}\) and \(λ\) as the solution to the system of equations formed by setting all partial derivatives to zero: \begin{align} \label{equ:dLdX} \frac{∂\mathcal{L}}{∂\mathbf{X}} & = \mathbf{Q}\mathbf{X}- \mathbf{F}+ λ \unicode{x1D7D9}_k^\top = 0, \tag{2}\\ \label{equ:dLdl} \frac{∂\mathcal{L}}{∂λ} & = \mathbf{X}\unicode{x1D7D9}_k - \unicode{x1D7D9}_n = 0. \end{align} \tag{3} We can solve Equation (\ref{equ:dLdX}) to express \(\mathbf{X}\) in terms of \(λ\): \begin{align} \label{equ:X} \mathbf{X}= \mathbf{Q}^{-1}\left( \mathbf{F}- λ \unicode{x1D7D9}_k^\top\right), \tag{4} \end{align} and then substitute Equation (\ref{equ:X}) into (\ref{equ:dLdl}) to solve for \(λ\), using \(\unicode{x1D7D9}_k^\top \unicode{x1D7D9}_k = k\): \begin{align} \mathbf{Q}^{-1}\left( \mathbf{F}- λ \unicode{x1D7D9}_k^\top\right)\unicode{x1D7D9}_k - \unicode{x1D7D9}_n &= 0, \\ \label{equ:l} λ & = \frac{1}{k}\mathbf{F}\unicode{x1D7D9}_k - \frac{1}{k}\mathbf{Q}\unicode{x1D7D9}_n. \tag{5} \end{align} Substituting Equation (\ref{equ:l}) back into (\ref{equ:X}), we identify the optimal \(\mathbf{X}\): \[\begin{aligned} \mathbf{X}&= \mathbf{Q}^{-1}\left( \mathbf{F}- \left( \frac{1}{k}\mathbf{F}\unicode{x1D7D9}_k - \frac{1}{k}\mathbf{Q}\unicode{x1D7D9}_n \right) \unicode{x1D7D9}_k^\top\right), \\ \text{or more simply} \qquad \mathbf{X}&= \mathbf{Q}^{-1}\left(\mathbf{F}- \frac{1}{k} \mathbf{F}\unicode{x1D7D9}_{kk}\right) + \frac{1}{k} \unicode{x1D7D9}_{nk}, \end{aligned}\] where \(\unicode{x1D7D9}_{kk} = \unicode{x1D7D9}_k \unicode{x1D7D9}_k^\top ∈ \mathbb{R}^{k×k}\) and \(\unicode{x1D7D9}_{nk} = \unicode{x1D7D9}_n \unicode{x1D7D9}_k^\top ∈ \mathbb{R}^{n×k}\) are matrices of ones. The term \(\mathbf{F}- \frac{1}{k} \mathbf{F}\unicode{x1D7D9}_{kk}\) subtracts the average over the columns of \(\mathbf{F}\) from each of its columns and \(+\frac{1}{k} \unicode{x1D7D9}_{nk}\) adds the scalar \(\frac{1}{k}\) to each entry (so that, e.g., when \(\mathbf{F}= 0\), every entry of \(\mathbf{X}\) is \(\frac{1}{k}\) and each row sums to one). The sparse (Cholesky) factorization of the \(n×n\) positive definite matrix \(\mathbf{Q}\) can be reused to solve against the \(k\) columns on its right-hand side.
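Continuing the SciPy sketch from above (with the same caveat that sparse LU stands in for Cholesky), this closed-form solution is only a few lines:

```python
def partition_of_unity_solve(Q, F):
    """Minimize tr(1/2 X'QX - X'F) subject to each row of X summing to 1."""
    n, k = F.shape
    # F - (1/k) F 1_kk: subtract the average over the columns of F
    # from each of its columns.
    B = F - F.mean(axis=1, keepdims=True)
    lu = spla.splu(sp.csc_matrix(Q))   # factorize the n×n matrix Q once
    # X = Q^{-1} (F - (1/k) F 1_kk) + (1/k) 1_nk
    return lu.solve(B) + 1.0 / k

X = partition_of_unity_solve(Q, F)
assert np.allclose(X.sum(axis=1), 1.0)  # rows form a partition of unity
```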

Pitfalls of Vectorization

It is tempting to vectorize Equation (\ref{equ:sum}) by stacking columns of \(\mathbf{X}\): \begin{align} \label{equ:vec} \min_{\mathbf{x}_1, …, \mathbf{x}_k} & \frac{1}{2} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix}^\top \begin{bmatrix} \mathbf{Q}& & \\ & \ddots & \\ & & \mathbf{Q} \end{bmatrix} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix} - \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix}^\top \begin{bmatrix}\mathbf{f}_1 \\ \vdots \\ \mathbf{f}_k \end{bmatrix} \tag{6} \\ \text{subject to } & \begin{bmatrix} \mathbf{I}_n & \dots & \mathbf{I}_n \end{bmatrix} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix} = \unicode{x1D7D9}_n \end{align} where \(\mathbf{I}_n ∈ \mathbb{R}^{n×n}\) is the identity matrix.

Applying the Lagrange multiplier method, we can identify the minimizer as a solution to the KKT system: \begin{equation} \begin{bmatrix} \mathbf{Q}& & & \mathbf{I}_n \\ & \ddots & & \vdots \\ & & \mathbf{Q}& \mathbf{I}_n \\ \mathbf{I}_n & … & \mathbf{I}_n & \\ \end{bmatrix} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \\ λ \end{bmatrix} = \begin{bmatrix}\mathbf{f}_1 \\ \vdots \\ \mathbf{f}_k \\ \unicode{x1D7D9}_n \end{bmatrix}, \end{equation} where \(λ∈\mathbb{R}^n\) is the vector of auxiliary Lagrange multipliers. However, (as written) this requires solving against an \(n(k+1) × n(k+1)\) matrix. Moreover, even if \(\mathbf{Q}\) is positive definite, this KKT system will be indefinite, requiring less efficient factorization methods (e.g., \(LDL^\top\) instead of Cholesky).
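Continuing the same hypothetical sketch, assembling this KKT system with scipy.sparse.bmat makes its size concrete; because the matrix is indefinite, it is handed to a general sparse LU rather than a Cholesky factorization. Its solution should match the closed-form \(\mathbf{X}\) computed above:

```python
# Assemble the n(k+1) × n(k+1) KKT matrix block by block.
I_n = sp.identity(n)
blocks = [[Q if i == j else None for j in range(k)] + [I_n] for i in range(k)]
blocks.append([I_n] * k + [None])  # bottom row: [I_n … I_n 0]
KKT = sp.bmat(blocks, format='csc')

rhs = np.concatenate([F.T.ravel(), np.ones(n)])  # [f_1; …; f_k; 1_n]
z = spla.splu(KKT).solve(rhs)                    # indefinite system: LU, not Cholesky
X_kkt = z[:n * k].reshape(k, n).T                # un-stack the k columns

assert np.allclose(X_kkt, X, atol=1e-6)          # agrees with the direct formula
```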

Another possibility would be to build a matrix \(\mathbf{N}∈ \mathbb{R}^{nk × n(k-1)}\) whose columns span the nullspace of the vectorized constraints, so that \begin{align} \label{equ:y} \begin{bmatrix} \mathbf{I}_n & \dots & \mathbf{I}_n \end{bmatrix} \left(\mathbf{N}\mathbf{y}+ \tilde{\mathbf{x}}\right) = \unicode{x1D7D9}_n \ ∀ \mathbf{y}∈ \mathbb{R}^{n(k-1)}, \tag{7} \end{align} where \(\tilde{\mathbf{x}} ∈ \mathbb{R}^{nk}\) is some feasible solution, for example \(\tilde{\mathbf{x}} = \text{vec}(\frac{1}{k}\unicode{x1D7D9}_{nk})\).

Substituting Equation (\ref{equ:y}) into (\ref{equ:vec}) and solving for \(\mathbf{y}\) reveals: \[\mathbf{y}= \left( \underbrace{ \mathbf{N}^\top \begin{bmatrix} \mathbf{Q}& & \\ & \ddots & \\ & & \mathbf{Q} \end{bmatrix}\mathbf{N}}_{\tilde{\mathbf{Q}}}\right)^{-1} \mathbf{N}^\top \left( \begin{bmatrix}\mathbf{f}_1 \\ \vdots \\ \mathbf{f}_k \end{bmatrix} - \begin{bmatrix} \mathbf{Q}& & \\ & \ddots & \\ & & \mathbf{Q} \end{bmatrix}\tilde{\mathbf{x}} \right),\] after which the solution is recovered as \(\text{vec}(\mathbf{X}) = \mathbf{N}\mathbf{y}+ \tilde{\mathbf{x}}\). Although the matrix \(\tilde{\mathbf{Q}} ∈ \mathbb{R}^{n(k-1)×n(k-1)}\) is smaller, it will be dense even if \(\mathbf{Q}\) and \(\mathbf{N}\) are sparse, ruling out efficient sparse direct solvers.
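The fill-in is easy to observe empirically. In the following sketch (continuing the toy example above), scipy.linalg.null_space builds an orthonormal, and hence dense, nullspace basis \(\mathbf{N}\), and the reduced matrix \(\tilde{\mathbf{Q}}\) comes out essentially fully dense:

```python
import scipy.linalg as sla

# Constraint matrix [I_n … I_n] and an orthonormal nullspace basis.
A = sp.hstack([sp.identity(n)] * k).toarray()
N = sla.null_space(A)              # nk × n(k-1), dense

Q_hat = sp.block_diag([Q] * k)     # blkdiag(Q, …, Q), still sparse
Q_tilde = N.T @ (Q_hat @ N)        # n(k-1) × n(k-1), dense

# Fraction of (numerically) nonzero entries is close to 1.
print(np.count_nonzero(np.abs(Q_tilde) > 1e-12) / Q_tilde.size)
```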