Page 02: Convexity

MATH 3850: What makes some optimization problems easy and others nightmarishly hard

PART A

The Big Idea: Bowls vs. Bumpy Landscapes

Analogy: Finding the lowest point in the dark.

Imagine you're dropped into a landscape at night with a flashlight that only shows the ground right under your feet.

Convex vs non-convex functions
Left (convex): $f(x) = x^2$ is a bowl. ONE minimum at the bottom. The red dashed line connects two points on the curve; notice the curve is always BELOW the line. That's the visual test for convexity.

Right (non-convex): A bumpy function with multiple valleys. Gradient descent might find the purple dot (local minimum) instead of the yellow star (global minimum). It depends on where you start. This is why non-convex problems are hard.
PART B

The Formal Definition (and What It Actually Means)

A function $f$ is convex if for any two points $x$ and $y$, and any $\alpha \in [0, 1]$:

$$f(\alpha x + (1-\alpha)y) \leq \alpha f(x) + (1-\alpha)f(y)$$

This looks scary, but it's just saying: "the curve lies below any straight line drawn between two of its points."

$f$
The function we are testing for convexity (e.g., $f(x) = x^2$)
$x, y$
Any two points in the domain; we pick them freely
$\alpha$
A blending weight between 0 and 1; controls where we sit on the line segment from $x$ to $y$
$\alpha x + (1-\alpha)y$
A point partway between $x$ and $y$ (the "midpoint" when $\alpha = 0.5$)
Left side
$f$ evaluated at that in-between point: the actual height of the curve
Right side
The corresponding point on the straight line connecting $f(x)$ and $f(y)$
$\leq$
Convexity means the curve never rises above the line

In plain English: pick any two points on the curve, draw the straight line (the chord) between them, and check that the curve never rises above that line.

Hand Computation: Is $f(x) = x^2$ convex?

We verify the convexity inequality by picking two concrete points and checking that the function value at the midpoint is below the line connecting the function values at those points.

Test with $x = -2$, $y = 3$, $\alpha = 0.5$ (midpoint):

Left side: $f(\alpha x + (1-\alpha)y)$; "value of f at the midpoint"

Midpoint = $0.5(-2) + 0.5(3) = -1 + 1.5 = 0.5$

$f(0.5) = (0.5)^2 = 0.25$

Right side: $\alpha f(x) + (1-\alpha)f(y)$; "midpoint of the LINE between f values"

$f(-2) = 4, \quad f(3) = 9$

$0.5(4) + 0.5(9) = 2 + 4.5 = 6.5$

Check: Is left $\leq$ right?

$0.25 \leq 6.5$   ✓ YES

The curve at the midpoint (0.25) is way below the line at the midpoint (6.5). Convex!
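The hand computation above can be replicated in code. A minimal sketch in plain Python (the function names are my own): it re-runs the $x = -2$, $y = 3$, $\alpha = 0.5$ check, then samples many random $\alpha$ values, since the definition quantifies over every $\alpha$ in $[0, 1]$, not just the midpoint.

```python
import random

def f(x):
    return x ** 2

def chord_test(f, x, y, alpha):
    """Return (curve height, line height) for the convexity inequality."""
    left = f(alpha * x + (1 - alpha) * y)       # f at the blended point
    right = alpha * f(x) + (1 - alpha) * f(y)   # same blend of the f-values
    return left, right

left, right = chord_test(f, -2, 3, 0.5)
print(left, right)  # 0.25 6.5 -- matches the hand computation

# The inequality must hold for EVERY alpha in [0, 1], not just 0.5:
random.seed(0)
for _ in range(1000):
    a = random.random()
    left, right = chord_test(f, -2, 3, a)
    assert left <= right
```

Of course, passing random checks at one pair of points doesn't prove convexity; a proof needs the inequality for all $x$, $y$, and $\alpha$. But a single failing check does prove non-convexity.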

In your course, $f(x) = \frac{1}{2}x^TQx$ with $Q$ positive definite is always convex. Every assignment you've done (A1, A2, Quiz 4, Quiz 6) used convex quadratics; that's why gradient descent always found the minimum.
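The same spot-check works for the course's quadratics. A sketch (assuming NumPy; the random trials are my own addition) using the A2 matrix:

```python
import numpy as np

Q = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # the A2 matrix; positive definite

def f(x):
    return 0.5 * x @ Q @ x

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    a = rng.uniform()
    # curve at the blended point never exceeds the blended f-values
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-9
print("convexity inequality held in 1000 random trials")
```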

PART C

The Quick Test: Second Derivative

Analogy: Holding a bowl vs. a hat.

Cup your hands upward (like holding water); that's a convex function (curves up). Turn your hands downward (like wearing a hat); that's concave (curves down). The second derivative tells you which one you have.

For one variable:

$f''(x) > 0$ everywhere ⇒ convex (curves up, like a bowl)

$f''(x) < 0$ everywhere ⇒ concave (curves down, like a hat)

(A weaker condition also works: $f''(x) \geq 0$ everywhere is exactly equivalent to convexity; strict positivity just rules out flat stretches.)

$f''(x)$
The second derivative of $f$; measures how fast the slope is changing (curvature)
$\gt 0$
Curvature is positive everywhere, so the function curves upward like a bowl
$\lt 0$
Curvature is negative everywhere, so the function curves downward like a hat
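You can apply the second-derivative test without doing calculus by hand, using a central finite difference. A sketch (the step size $h$ is an arbitrary choice of mine):

```python
def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

bowl = lambda x: x ** 2    # f'' = 2 everywhere -> convex
hat = lambda x: -x ** 2    # f'' = -2 everywhere -> concave

print(second_derivative(bowl, 1.3))   # ~ 2.0: positive, curves up
print(second_derivative(hat, -0.7))   # ~ -2.0: negative, curves down
```

Note that the test requires the sign to hold at every point, so in practice you would evaluate this at many points, or better, compute $f''$ symbolically.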

For multiple variables:

Hessian $\nabla^2 f(x)$ is positive definite everywhere ⇒ convex

(Positive definite means all eigenvalues are positive; the surface curves "up" in every direction)

Examples from your assignments:

A2: $Q = \begin{bmatrix}2&0\\0&1\end{bmatrix}$; eigenvalues are 2 and 1, both positive ⇒ convex

Quiz 6: $Q = \begin{bmatrix}2&0\\0&3\end{bmatrix}$; eigenvalues are 2 and 3, both positive ⇒ convex
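The eigenvalue check is easy to automate. A sketch (assuming NumPy; the indefinite third matrix is my own example, not from an assignment):

```python
import numpy as np

def is_positive_definite(Q):
    """True if all eigenvalues of the symmetric matrix Q are positive."""
    return bool(np.all(np.linalg.eigvalsh(Q) > 0))

Q_a2 = np.array([[2.0, 0.0], [0.0, 1.0]])       # eigenvalues 2 and 1
Q_quiz6 = np.array([[2.0, 0.0], [0.0, 3.0]])    # eigenvalues 2 and 3
Q_saddle = np.array([[2.0, 0.0], [0.0, -1.0]])  # indefinite (illustrative)

print(is_positive_definite(Q_a2))      # True  -> convex
print(is_positive_definite(Q_quiz6))   # True  -> convex
print(is_positive_definite(Q_saddle))  # False -> saddle point, not convex
```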

PART D

See It in 3D

3D convex vs non-convex surfaces
Left: A convex function in 3D; a smooth bowl with one valley. Gradient descent from ANY starting point rolls to the yellow dot at the bottom. Guaranteed.

Right: A non-convex function in 3D; like an egg carton. Many little valleys everywhere. Gradient descent rolls into whichever valley is nearest, which might not be the deepest one. This is what neural network loss functions look like.
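The "which valley you land in depends on where you start" behavior is easy to demonstrate in one dimension. A sketch, assuming a hand-picked bumpy function and step size of my own (not from the course): $f(x) = (x^2 - 1)^2 + 0.3x$ has a deep valley near $x = -1$ and a shallower one near $x = +1$.

```python
def gradient_descent(grad, x0, lr=0.05, steps=2000):
    """Plain gradient descent with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Convex bowl f(x) = x^2 (gradient 2x): ANY start rolls to the one minimum.
for start in (-3.0, 0.5, 4.0):
    print(round(gradient_descent(lambda x: 2 * x, start), 4))  # all ~ 0.0

# Bumpy f(x) = (x^2 - 1)^2 + 0.3x, gradient 4x(x^2 - 1) + 0.3:
bumpy_grad = lambda x: 4 * x * (x ** 2 - 1) + 0.3
print(round(gradient_descent(bumpy_grad, -0.5), 2))  # near -1: the global min
print(round(gradient_descent(bumpy_grad, +0.5), 2))  # near +1: a local min
```

Same algorithm, same function; only the starting point differs, and the second run gets stuck in the shallower valley.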
PART E

Why This Matters for the Adam Paper

Adam's THEORY (Section 4)

  • Assumes the problem is convex
  • Proves a regret bound: the average regret shrinks like $O(1/\sqrt{T})$ (equivalently, total regret grows like $O(\sqrt{T})$)
  • This means: for bowl-shaped problems, Adam is provably good

Adam's PRACTICE (Section 6)

  • Used on non-convex problems (neural networks)
  • Theory doesn't guarantee it works here
  • But empirically, Adam works great anyway
  • This gap between theory and practice is normal in ML

The practical takeaway

Your assignments (A1, A2, Quiz 4, Quiz 6) all used convex quadratics; nice bowl-shaped functions. Adam's convergence proof (Theorem 4.1) applies to these.

The paper's experiments (Section 6) test Adam on non-convex problems like neural networks. Adam still works well; the theory just can't prove WHY yet. For your demo, you can say: "The theory proves Adam works on convex problems. Empirically, it works on non-convex too."

Why were all your assignments convex? Because convex problems are "solved"; we have strong guarantees. The course builds your understanding on solid ground before tackling the messy real world. Adam bridges that gap: proven on convex, practical on everything.

Glossary: Convexity

Convex function
Bowl-shaped. Any local minimum is THE global minimum. The curve always lies below any line drawn between two of its points. Safe for optimization.
Non-convex function
Bumpy landscape. Multiple local minima. Gradient descent might get stuck in a valley that isn't the deepest. Neural networks are non-convex.
Local minimum
A valley floor; lower than everything nearby. But there might be a deeper valley elsewhere.
Global minimum
THE lowest point of the entire function. What we actually want to find.
Positive definite
A matrix property meaning "the surface curves upward in every direction." Guarantees convexity for quadratic functions. Check: all eigenvalues positive.
Hessian $\nabla^2 f$
The matrix of second partial derivatives. For your 2D assignments: a 2×2 matrix. If positive definite → convex. If indefinite → saddle point.