The Art of Lagrange Multipliers: Navigating Optimization
Chapter 1: Understanding Optimization
Optimization is a fundamental problem in mathematical physics, economics, engineering, and many areas of applied mathematics. The essence of optimization is identifying the maximum or minimum values of an objective function and determining the input values that yield these extremes. In single-variable calculus, this task is straightforward, typically requiring only the identification of the x-values at which the derivative vanishes (df/dx = 0).
However, multivariable optimization introduces complexity due to the presence of constraints, which can often be framed as g(x,y,z) = c, where g is a function and c is a constant. These constraints are referred to as equality constraints, and the points that satisfy these constraints form what is known as the feasible set, denoted as S. For instance, to maximize f(x,y) on the unit circle, we can use g(x,y) = x² + y², since the circle is defined by x² + y² = 1.
The conventional method employed to tackle such problems is the method of Lagrange multipliers. This article will explore the derivation of this method and illustrate its application through several examples.
Section 1.1: Motivation and Derivation
A function f reaches a local extremum (either a maximum or minimum) at a point P when the partial derivatives of f with respect to each independent variable vanish there, provided the point is not a saddle point (a situation we will not focus on here). We can succinctly express this condition as ∇f = 0, where ∇ denotes the gradient operator.
In constrained optimization, however, the fact that f achieves a local maximum or minimum at a point in the feasible set does not imply that f attains its optimal value over the feasible set at that point.
For instance, consider the optimization scenario where we find that f has a local minimum at (0,1) with f(0,1) = 0, but f(0,-1) = -4. This demonstrates that while (0,1) is a local minimum of f(x,y), it does not represent the minimum value on the unit circle.
As another illustration, let's maximize f(x,y) = y on the unit circle. Here, the gradient of f(x,y) = y is a constant vector, indicating that f(x,y) has no local maxima or minima. Yet, on the unit circle, f achieves a maximum value of 1 at the point (0,1).
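This example can be checked numerically. The following is a minimal sketch of my own (not from the original text): parameterize the unit circle as (cos t, sin t) and scan t to locate the maximizer of f(x, y) = y.

```python
import math

# Maximize f(x, y) = y on the unit circle by parameterizing the
# feasible set as (cos t, sin t) and scanning t over [0, 2*pi).
def f(x, y):
    return y

best_t = max((2 * math.pi * k / 100000 for k in range(100000)),
             key=lambda t: f(math.cos(t), math.sin(t)))
x, y = math.cos(best_t), math.sin(best_t)
print(round(x, 4), round(y, 4))  # close to (0, 1), where f attains 1
```

The scan lands on (0, 1) with f = 1, even though ∇f = (0, 1) never vanishes anywhere in the plane.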
The reason for this discrepancy is that the gradient measures the rate of change of f(x,y,z) across all of three-dimensional space, while our aim is to measure the rate of change of f(x,y,z) as its argument moves within the feasible set. In three dimensions, S may be a curve or a surface, and in general, for n-variable problems, S is a hypersurface of dimension less than n.
The most intuitive approach is to analyze curves within S. Let P be a point in S where f(P) achieves a locally optimal value, even if ∇f(P) is not necessarily zero. Let r(t) represent the parameterization of any curve in S passing through P.
If S is a curve, then r(t) serves as its parameterization. Moreover, if we let r(T) = P with r′(T) ≠ 0, we can define h(t) = f(r(t)); since f(P) is optimal among points of S, h has a local extremum at t = T, so h′(T) = 0. Using the chain rule, we can expand h′(t) as h′(t) = ∇f(r(t)) ∙ r′(t).
Since r(t) remains in S, the velocity vector r′(t) must be tangent to the curve or surface S. Setting t = T gives ∇f(P) ∙ r′(T) = h′(T) = 0, and because this holds for every curve in S passing through P, ∇f(P) is perpendicular to every tangent vector at P. Thus, if P is a point where f reaches a maximum or minimum within S, then ∇f(P) is perpendicular to S.
Conversely, since r′(T) is non-zero and always parallel to S, if ∇f(P) is perpendicular to S, then h′(T) = ∇f(P) ∙ r′(T) = 0.
But how do we locate points where ∇f is perpendicular to S? The answer lies in recognizing that S comprises all points where g(x,y,z) = c; that is, S is a level set of g. This implies that ∇g is perpendicular to S at every point of S, so ∇f and ∇g are parallel at the points of S where f achieves optimal values. Therefore, f can attain an optimal value on S only at points where ∇f = λ∇g for some constant λ, known as the Lagrange undetermined multiplier, which gives the method its name.
The vector equation ∇f = λ∇g supplies three scalar equations, and the constraint g(x,y,z) = c supplies a fourth. To solve the optimization problem for f, we must therefore find the four unknowns x, y, z, and λ that satisfy these four equations.
In other words, our objective shifts to optimizing the function f - λg regarding x, y, and z without constraints, by locating where ∇(f - λg) = 0.
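The parallel-gradient condition can be verified numerically with finite differences. Here is a small sketch of my own (not from the original text) using f(x, y) = y on the unit circle, where the constrained maximum sits at P = (0, 1):

```python
# Check numerically, at the constrained optimum P = (0, 1) of f(x, y) = y
# on the unit circle g(x, y) = x**2 + y**2 = 1, that grad f = lambda * grad g.
def grad(fn, x, y, h=1e-6):
    """Central-difference approximation of the gradient of fn at (x, y)."""
    return ((fn(x + h, y) - fn(x - h, y)) / (2 * h),
            (fn(x, y + h) - fn(x, y - h)) / (2 * h))

f = lambda x, y: y
g = lambda x, y: x**2 + y**2

gf = grad(f, 0.0, 1.0)   # approximately (0, 1)
gg = grad(g, 0.0, 1.0)   # approximately (0, 2)
lam = gf[1] / gg[1]      # the multiplier, here 0.5
print(round(lam, 4))
```

The two gradients are parallel with λ = 1/2, exactly as ∇(f − λg) = 0 requires at P.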
What occurs when multiple constraints are present? Suppose we need to optimize f(x,y,z) subject to g(x,y,z) = c and h(x,y,z) = d. The approach remains similar. To optimize f given g(x,y,z) = c, we substitute f with f - λg, addressing the first constraint, and then we optimize f - λg subject to the constraint h(x,y,z) = d. This results in a single constraint problem, so we introduce an additional Lagrange multiplier μ, transforming our problem into optimizing f - λg - μh without constraints.
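As a concrete two-constraint illustration (my own example, not from the original text), take f(x, y, z) = x + y + z subject to g: x² + y² + z² = 1 and h: z = 0. Solving ∇(f − λg − μh) = 0 gives 1 = 2λx, 1 = 2λy, and 1 = 2λz + μ, so x = y = 1/√2 and z = 0 at the maximum, with value √2. A brute-force scan over the feasible circle agrees:

```python
import math

# Two constraints: maximize f = x + y + z on the unit sphere
# g = x^2 + y^2 + z^2 = 1 intersected with the plane h: z = 0.
# The feasible set is the unit circle in the z = 0 plane, so we
# parameterize it as (cos t, sin t, 0) and scan t.
f = lambda x, y, z: x + y + z

best_t = max((2 * math.pi * k / 100000 for k in range(100000)),
             key=lambda t: f(math.cos(t), math.sin(t), 0.0))
max_val = f(math.cos(best_t), math.sin(best_t), 0.0)
print(round(max_val, 6))  # → 1.414214, i.e. sqrt(2)
```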
Let's proceed with some examples.
Section 1.2: Maximizing Area of a Rectangle
Consider the following question: what is the maximum area of a rectangle with a fixed perimeter? This can be formulated as a Lagrange multiplier problem. Let x be the width and y the height; our goal is to maximize f(x,y) = xy under the constraint g(x,y) = 2x + 2y = c. The condition ∇f = λ∇g gives y = 2λ and x = 2λ, which together with the constraint 2x + 2y = c form the system of equations that guides us to the solution.
The first two equations reveal that x = y, indicating that the maximum area occurs when the rectangle is a square. Substituting this into the third equation yields x = y = c/4, so the maximum area of a rectangle with perimeter c is c²/16.
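A quick numerical check of this result (my own sketch, not from the original text): fixing the perimeter c determines the height as y = c/2 − x, so we can scan widths and confirm the area peaks at x = c/4 with value c²/16.

```python
# With perimeter c fixed, width x determines height y = c/2 - x,
# and the area x*y should peak at x = c/4 with value c**2/16.
c = 10.0
xs = [c / 2 * k / 100000 for k in range(100001)]
best_x = max(xs, key=lambda x: x * (c / 2 - x))
best_area = best_x * (c / 2 - best_x)
print(best_x, best_area)  # → 2.5 6.25
```

For c = 10 the scan finds x = 2.5 = c/4 and area 6.25 = c²/16, matching the Lagrange solution.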
Now, let's examine a modified Putnam problem that presents its own challenges.
Chapter 2: Exploring the Modified Putnam Problem
The Putnam mathematical competition is a notoriously difficult exam for undergraduate students held each December. Participants have six hours to solve twelve challenging problems, and the median score is often near zero. In our discussion, we'll adapt problem A3 from the 2018 exam for simplicity.
This problem involves two significant hurdles. First, the objective function comprises a sum of cosines, which can complicate matters. Second, the constraint function does not integrate neatly into the objective function, which can be misleading for those familiar with high school math competitions that emphasize trigonometric identities.
On encountering a problem demanding "maximize f subject to g = 0," we should immediately consider the method of Lagrange multipliers. Setting up our system of equations allows us to apply trigonometric identities effectively, using the double-angle formula.
Applying the double-angle formula leaves a common sine factor on both sides of each equation. Cancelling it and then summing the resulting equations reveals that λ = 0. As a consequence, w, x, y, and z must be integer multiples of π/2, leading to three distinct scenarios for satisfying the constraint equation.
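As a sanity check on the case analysis, suppose the modified problem is to maximize cos 2w + cos 2x + cos 2y + cos 2z subject to cos w + cos x + cos y + cos z = 0; this form is an assumption on my part, inferred from the double-angle step, not stated verbatim above. A brute-force scan over multiples of π/2 then confirms the optimum:

```python
import math
from itertools import product

# ASSUMPTION: the modified problem maximizes cos(2w)+cos(2x)+cos(2y)+cos(2z)
# subject to cos(w)+cos(x)+cos(y)+cos(z) = 0. This is inferred from the
# double-angle step in the text, not quoted from the source problem.
candidates = [k * math.pi / 2 for k in range(4)]  # multiples of pi/2
best = max(
    (sum(math.cos(2 * v) for v in vals)
     for vals in product(candidates, repeat=4)
     if abs(sum(math.cos(v) for v in vals)) < 1e-9),  # enforce the constraint
    default=None,
)
print(round(best, 6))
```

Under that assumed formulation, the best feasible choice is two variables at 0 and two at π, giving a maximum of 4.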
The application of Lagrange multipliers extends beyond ordinary functions; as we will see next, the method can also be used to optimize functionals.
Section 2.1: Thomson's Theorem
Rather than merely optimizing a function, we can employ the Lagrange multipliers method to optimize a functional, such as definite integrals. This demonstration will illustrate how this method can determine the function that minimizes a definite integral.
We will explore Thomson's theorem, which states that a conductor in electrostatic equilibrium has no electric field within its body. The proof rests on the principle that a system in stable mechanical equilibrium minimizes its energy, which leads us to consider the charge distribution ρ(r) and the potential φₑₓₜ due to external sources.
The total electrostatic energy can be expressed as a functional U of ρ(r):
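Following the standard treatment in Zangwill, U takes the form below (a sketch in SI units; the exact notation may differ), where the first term is the self-energy of ρ and the second is its interaction with the external potential:

```latex
U[\rho] = \frac{1}{2}\int d^3r \int d^3r'\,
          \frac{\rho(\mathbf{r})\,\rho(\mathbf{r}')}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|}
        + \int d^3r\, \rho(\mathbf{r})\,\varphi_{\mathrm{ext}}(\mathbf{r})
```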
These integrals represent the interaction energy of the charge distribution with itself and with the external sources. Our goal is to minimize U subject to the constraint that the total charge Q = ∫ρ(r) d³r is held fixed.
Fortunately, Lagrange multipliers can also be utilized for functionals with constraints. By introducing a Lagrange undetermined multiplier, we can transition from constrained optimization on U[ρ] to unconstrained optimization on the following functional:
By analogy with ordinary derivatives, we seek the charge distribution ρ such that when we vary ρ by a small amount δρ, the first-order change in U, denoted δU, vanishes.
To carry this out, we expand U[ρ + δρ] and simplify. Since δρ is small, we may discard terms of order (δρ)².
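A sketch of this expansion, assuming the standard Coulomb-energy form of U in SI units (an assumption about notation on my part): with the fixed total charge ∫ρ d³r = Q enforced by a multiplier λ, the first-order variation of U[ρ] − λ(∫ρ d³r − Q) is

```latex
\delta U - \lambda\!\int d^3r\,\delta\rho
  = \int d^3r\,\delta\rho(\mathbf{r})
    \left[\,\varphi(\mathbf{r}) + \varphi_{\mathrm{ext}}(\mathbf{r}) - \lambda\,\right] = 0,
\qquad
\varphi(\mathbf{r}) = \int d^3r'\,
    \frac{\rho(\mathbf{r}')}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|}
```

Since δρ is arbitrary inside the conductor, the bracket must vanish there, i.e. φ + φₑₓₜ = λ.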
This leads to the condition that the total potential is the same at every point inside the conductor, demonstrating that the minimizing ρ(r) is precisely the distribution that maintains a constant potential throughout the conductor.
This approach holds valid even if the electrostatic energy of the conductor is maximized, which indicates unstable equilibrium. However, in practice, conductors rarely exist in such states, as thermal motion quickly drives the system back into stable equilibrium.
Conclusion
This article has delved into the intricacies of Lagrange multipliers within mathematical physics. If you found this discussion engaging, consider exploring my other articles on Laplace's equation, the heat equation, and Feynman integration.
All images not cited are my original works. The second example is derived from problem A3 of the 2018 Putnam Exam, and the proof in the third example is based on the discussion in Zangwill's "Modern Electrodynamics" (2013). These examples adhere to fair use guidelines.
EDIT: I revised the second example after posting due to significant issues I identified.