MATH 3850: What makes some optimization problems easy and others nightmarishly hard
PART A
Analogy: Finding the lowest point in the dark.
Imagine you're dropped into a landscape at night with a flashlight that only shows the ground right under your feet.
A function $f$ is convex if for any two points $x$ and $y$, and any $\alpha \in [0, 1]$:
$$f(\alpha x + (1-\alpha)y) \leq \alpha f(x) + (1-\alpha)f(y)$$
This looks scary but it's just saying: "the curve lies on or below the straight line (the chord) drawn between any two of its points."
Let's check it with numbers:
Take $f(x) = x^2$. We verify the convexity inequality by picking two concrete points and checking that the function's value at the midpoint sits below the chord connecting the function values at those points.
Test with $x = -2$, $y = 3$, $\alpha = 0.5$ (midpoint):
Left side: $f(\alpha x + (1-\alpha)y)$; "value of f at the midpoint"
Midpoint = $0.5(-2) + 0.5(3) = -1 + 1.5 = 0.5$
$f(0.5) = (0.5)^2 = 0.25$
Right side: $\alpha f(x) + (1-\alpha)f(y)$; "midpoint of the LINE between f values"
$f(-2) = 4, \quad f(3) = 9$
$0.5(4) + 0.5(9) = 2 + 4.5 = 6.5$
Check: Is left $\leq$ right?
$0.25 \leq 6.5$ ✓ YES
The curve at the midpoint (0.25) is way below the line at the midpoint (6.5). Convex!
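The midpoint check above generalizes to any $\alpha \in [0, 1]$. Here's a quick numerical sanity test (a sketch, not course code) that samples random point pairs and blend weights and confirms the inequality holds for $f(x) = x^2$:

```python
import random

def f(x):
    return x ** 2

def convexity_holds(f, trials=10_000, seed=0):
    """Spot-check the convexity inequality on random (x, y, alpha) triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        a = rng.uniform(0, 1)
        left = f(a * x + (1 - a) * y)       # the curve at the blend point
        right = a * f(x) + (1 - a) * f(y)   # the chord at the blend point
        if left > right + 1e-9:             # small tolerance for float error
            return False
    return True

print(convexity_holds(f))  # True: x^2 passes every sampled check
```

This is only a spot check, not a proof, but failing it would immediately disqualify a function from being convex.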
In your course, $f(x) = \frac{1}{2}x^TQx$ with $Q$ positive definite is always convex. Every assignment you've done (A1, A2, Quiz 4, Quiz 6) used convex quadratics; that's why gradient descent always found the minimum.
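A minimal sketch of why this works (assuming plain gradient descent with a fixed step size): for $f(x) = \frac{1}{2}x^TQx$ the gradient is $Qx$, and with $Q$ positive definite the iterates slide straight down the bowl to the unique minimum at the origin. The matrix here is the A2 matrix from these notes.

```python
import numpy as np

Q = np.array([[2.0, 0.0],
              [0.0, 1.0]])   # A2 matrix; eigenvalues 2 and 1, so positive definite

x = np.array([3.0, -4.0])    # arbitrary starting point
lr = 0.1                     # step size; must be below 2 / (largest eigenvalue)

for _ in range(500):
    grad = Q @ x             # gradient of 0.5 * x^T Q x
    x = x - lr * grad

print(x)  # essentially [0, 0], the global minimum
```

Because the function is convex, there are no other valleys for the iterates to get stuck in; any step size below the stability threshold eventually lands at the minimum.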
Analogy: Holding a bowl vs. a hat.
Cup your hands upward (like holding water); that's a convex function (curves up). Turn your hands downward (like wearing a hat); that's concave (curves down). The second derivative tells you which one you have.
For one variable:
$f''(x) > 0$ everywhere ⇒ convex (curves up, like a bowl)
$f''(x) < 0$ everywhere ⇒ concave (curves down, like a hat)
(Strictly speaking, $f''(x) \geq 0$ everywhere is exactly convexity; $f'' > 0$ gives strict convexity.)
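You can eyeball the one-variable test numerically with a central-difference approximation of $f''$ (a sketch, not course code):

```python
def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

bowl = lambda x: x ** 2      # convex: f''(x) = 2 > 0 everywhere
hat = lambda x: -x ** 2      # concave: f''(x) = -2 < 0 everywhere

for x in (-3.0, 0.0, 2.5):
    print(second_derivative(bowl, x), second_derivative(hat, x))  # about 2 and -2
```

The bowl reports a positive second derivative at every sampled point, the hat a negative one, matching the test above.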
For multiple variables:
Hessian $\nabla^2 f(x)$ is positive definite everywhere ⇒ convex
(Positive definite means all eigenvalues are positive; the surface curves "up" in every direction. Positive semidefinite, meaning eigenvalues $\geq 0$, is enough for plain convexity; positive definite gives strict convexity.)
Examples from your assignments:
A2: $Q = \begin{bmatrix}2&0\\0&1\end{bmatrix}$; eigenvalues are 2 and 1, both positive ⇒ convex
Quiz 6: $Q = \begin{bmatrix}2&0\\0&3\end{bmatrix}$; eigenvalues are 2 and 3, both positive ⇒ convex
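The eigenvalue checks above can be automated. A sketch using NumPy's `eigvalsh` (its eigenvalue routine for symmetric matrices; the saddle matrix below is an extra example, not from the course):

```python
import numpy as np

def is_positive_definite(Q):
    """True if all eigenvalues of the symmetric matrix Q are positive."""
    return bool(np.all(np.linalg.eigvalsh(Q) > 0))

Q_a2 = np.array([[2.0, 0.0], [0.0, 1.0]])       # A2: eigenvalues 2, 1
Q_q6 = np.array([[2.0, 0.0], [0.0, 3.0]])       # Quiz 6: eigenvalues 2, 3
Q_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])  # eigenvalue -1: not PD

print(is_positive_definite(Q_a2))      # True  -> convex quadratic
print(is_positive_definite(Q_q6))      # True  -> convex quadratic
print(is_positive_definite(Q_saddle))  # False -> saddle surface, not convex
```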
Your assignments (A1, A2, Quiz 4, Quiz 6) all used convex quadratics; nice bowl-shaped functions. Adam's convergence proof (Theorem 4.1) applies to these.
The paper's experiments (Section 6) test Adam on non-convex problems like neural networks. Adam still works well; the theory just can't prove WHY yet. For your demo, you can say: "The theory proves Adam works on convex problems. Empirically, it works on non-convex too."
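For the demo, a minimal Adam sketch can make the convex case concrete. This uses the standard Adam update with bias correction and an $\alpha/\sqrt{t}$ step-size decay (the decay schedule used in the convergence analysis); the hyperparameter values here are illustrative defaults, not values from the course.

```python
import numpy as np

def adam_minimize(grad, x0, alpha=0.5, b1=0.9, b2=0.999, eps=1e-8, steps=5000):
    """Adam with bias correction and alpha/sqrt(t) step-size decay."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (mean of gradients) estimate
    v = np.zeros_like(x)  # second-moment (mean of squared gradients) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)   # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        x = x - (alpha / np.sqrt(t)) * m_hat / (np.sqrt(v_hat) + eps)
    return x

Q = np.array([[2.0, 0.0], [0.0, 3.0]])  # the Quiz 6 matrix: convex quadratic
x_min = adam_minimize(lambda x: Q @ x, [5.0, -5.0])
print(x_min)  # approaches [0, 0], the global minimum of 0.5 x^T Q x
```

On this convex quadratic the iterates settle near the minimum, exactly the regime the convergence theorem covers; on a non-convex loss the same code runs, but no such guarantee applies.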