Optimization, Step 3: Extreme Value Theorem

We can finally get to the point. The extreme value theorem is a fundamental theorem about continuous functions that most engineers don’t learn. Most engineering math courses use what’s valuable and hence skip out on a lot of details covered by other courses. That’s okay, because the situation is being rectified here. The extreme value theorem is rather simple. Bernard Bolzano and Karl Weierstrass made a much more complete version that we will be using in higher dimensions. We’re restricting ourselves to the one dimensional case still, and so we’ll just call it the extreme value theorem for now.

The extreme value theorem says that if a function f is continuous on a closed and bounded interval in \mathbb R, then f attains both its minimum and maximum, at least once for each. We know what functions are; we also know what it means to be closed in \mathbb R (in a very non-rigorous sense) and what intervals are. What’s missing is a better definition for continuous. Two other important things are going to be introduced here in order to motivate you to read past Step 3 and move on to Step 4 (which will include the idea of convexity).

Thing 1: The extreme value theorem is actually a subset of two sub-theorems, which nobody really talks about: lower semi-continuous functions on closed bounded subsets of \mathbb R reach their minimums (at least once), and upper semi-continuous functions on closed bounded subsets of \mathbb R reach their maximums (at least once). Optimization is normally formulated to only concern itself with minimizing things because maximization problems can be turned into minimization problems by flipping them. (Maximize -x^2 is in a certain sense the same problem as minimize x^2.) A function is continuous if and only if it is both lower-semi-continuous and upper semi-continuous.

Thing 2: Compactness. Two famous mathematicians named Heine and Borel are famous for a theorem called the Heine-Borel theorem. Sets in \mathbb R^n are called compact if and only if they are closed and bounded. This is a post I decided to turn into Optimization, Step 4.

Ok, I’ve wet your whistle. Let’s get to the math.


Recall that a function being continuous is a property of topological spaces. Since metric spaces are a subset of topological spaces, it is sometimes easier to define continuity using metrics. We normally do the same for the definition of openness, neighbourhoods, etc. It’s okay to do this because I don’t optimize things in strict topological spaces that don’t have metrics. I want at least the structure that a metric gives us before I even think about optimizing, so it’s okay to formula neighbourhoods, openness, and continuity in the context of a metric space and it still be ok. metric-hilbert-spaces

There are many definitions of continuity, and they are all the same. They may not look the same, but they fundamentally are. First we need the definition of a ball as well as a neighbourhood (in the context of a metric space).

3.1 Definition: a ball
Let (A, d) be a metric space (recall A is the set, d is the metric, jointly referred to as a metric space). The open ball of radius r>0 centered at a point p \in A is defined by:
B_r(p):=\{ x \in A \text{ }|\text{ } d(x,p)<r \}.
Closed balls (with radius r centered at p) are:
B_r(p):=\{ x \in A \text{ }|\text{ } d(x,p) \leq r \}.

Balls in one dimension are just intervals on the line, centered at p with radius r out to each side. An open ball at =3 with radius 4 is: x \in (p-r,p+r) = (3-4, 3+4) = (-1, 7). (Why? d(x,p) := |x-p| < r, so (x-p) < r and -(x-p)<r leads to x < r+p and -x < r - p \Rightarrow x > p-r, the lower bound.) Closed balls include the endpoints, of course. So then what’s a neighbourhood? A neighbourhood is an open set that (we’ll call it V such that there is an open ball centered at p with radius r such that B_r(p) is contained in V. Here’s the formal definition.

3.2 Definition: a neighbourhood
Given a metric space (A,d), a set V is called a neighbourhood of a point p if there exists an open ball with centre p and radius r>0 such that:
B_r(p) \subsetneq V.

That is, it has to be a proper subset, i.e. fully contained within V. If a ball of any positive radius fits inside of V then $latex $V$ is a neighbourhood. Now, I repeat, these definitions are equivalent.

3.3a Definition: continuity at a point
A function f:A \rightarrow B is continuous at a point x_0 \in A if
for all \epsilon > 0 there is a $\delta > 0$ such that d(f(x_0),f(x))<\epsilon if d(x_0,x)<\delta.


3.3b Definition: continuity at a point
A function f:A \rightarrow B is continuous at a point x_0 \in A if the limit of f(x) as x approaches $x_0$ through the domain of f exists and is equal to f(x_0), i.e. \begin{aligned}\lim_{x\to x_0} = f(x_0)\end{aligned}.

3.4 Definition: continuity
A function is continuous if it is continuous at every x \in A.

The Wikipedia page does a great job with continuity. Definition 3.3a has some metrics in it, and it can be reformulated into a nice sentence about neighbourhoods: “More intuitively, we can say that if we want to get all the f(x) values to stay in some small neighbourhood around f(x_0), we simply need to choose a small enough neighbourhood for the x values around x_0, and we can do that no matter how small the f(x) neighbourhood is; f is then continuous at f(x_0).

3.5 Theorem
If f:A \rightarrow B is a function between metric spaces which is continuous and (a_n) is a sequence which converges to a point x_0 where each a_n \in A, then (f(a_n)) is a sequence which converges to f(x_0 and where each f(a_n)\in B.

There’s a good proof of this here. In English, this means that continuous functions map convergent sequences to convergent sequences. This is a very important property of continuous functions and actually characterizes them (i.e. it can be used to define continuity, like a definition 3.3c, equal and equivalent to the others).

Bounded Functions

We already know what a bounded set is: bounded sets have both an upper and lower bound.

3.6 Definition: boundedness of a function
A function f(x) is called bounded from above if there is a real number M such that M \geq f(x) for all x \in A. The number M is called an upper bound of f(x).
A function f(x) is called bounded from below if there is a real number m such that m \leq x for all x \in A. The number m is called a lower bound of f(x).
A set is bounded if it has both lower and upper bounds.

Sometimes we don’t care if the lower bound and upper bound differ, just that the function is indeed bounded.

1. -3 \leq f(x) \leq 3 \Rightarrow |f(x)| \leq 3
2. -4 \leq f(x) \leq 5 \Rightarrow |f(x)| \leq 5$
Why? If -4 \leq f(x), surely -5 \leq f(x) and thus -5 \leq f(x) \leq 5 \Rightarrow |f(x)| \leq 5.

Note one final subtlety  with functions. Sets don’t necessarily have maximums or minimums. Open sets like (0,1) don’t have a maximal or minimal element, but they do have supremums and infimums. If A = [a,b]\text{, } a,b \in \mathbb R and f:A \rightarrow B is a continuous function, it stands to reason from the definition of continuity that B=[c,d]\text{, } c,d \in \mathbb R for some c and d. But is it true that if m\leq f(x) \leq M (i.e. f is bounded) then $latex m = f(c)$? Yes. It is true. The first part of this idea called the boundedness theorem. The boundedness theorem is used to prove the extreme value theorem, which is what we set out to do in this post.

The Extreme Value Theorem

The boundedness theorem says:

3.7 Theorem: the boundedness theorem
If f is continuous function  in the closed and bounded interval [a,b], then q \leq f(x) \leq Q for some q,Q \in \mathbb R.

The interval [a,b] is bounded because a,b are real numbers, and \infty is not a real number. What the boundedness theorem does not say, is that the number q is the infimum of f(x), or that Q is the supremum of f(x). That is what the extreme value theorem is for. The proof of the boundedness theorem is so important, it gets its own post here (once I make it, I will link it).

3.8 Theorem: the extreme value theorem
If f is a continuous function on the closed and bounded interval [a,b], then f must attain its maximum and minimum, each at least once.

Recall that if a maximum exists, it is the supremum. A supremum can exist even if a maximum doesn’t. Recall that for a given subset S \subset A, in order for a maximum to exist it has to be an element of S itself, whereas the supremum can lie in A. Let A = \mathbb R and S = (0,1). Then \max A does not exist, but \sup A = 1. If A = \mathbb R and S = (0,1] then \max A = 1 and \sup A = 1.  This logic about supremums/maximums is true for sets and for functions since functions map to sets.

From Wikipedia: “The extreme value theorem enriches the boundedness theorem by saying that not only is the function bounded, but it also attains its least upper bound as its maximum and its greatest lower bound as its minimum.” So the boundedness theorem means \sup f exists but we don’t know if the maximum exists, and the extreme value theorem says \max f = \sup f. It says the same for the min/infimum.


Functions map inputs (elements) from a set A called the domain, to a set B called the codomain. We ask that A and B be subsets of a complete ordered field, i.e. A \subset \mathbb R and B \subset \mathbb R. We write a function f as f:A \rightarrow B. If A is a closed (meaning it contains all its boundary points) and bounded (meaning it’s not infinity in either direction, required since \mathbb R is closed) interval (sets of the form A = [a,b]) and if f is a continuous function (meaning it can’t change too quickly), then f reaches its maximum and minimum (on B, where since B is also a subset of a complete ordered field, it can have supremums and infimums).


Why did I introduce metrics? To deal with the idea of completeness. The function f(x)= \frac{1}{(x-\sqrt(2))^2} is continuous in \mathbb Q but it’s not continuous in \mathbb R. This is where the idea of \mathbb R being the nice set we can do calculus on comes from. You can define similar functions that break rules if the underlying fields are not complete.

It turns out that simple topological spaces have something called pseudometrics that you can always define on them, and they differ from metrics in that the distance between two points can be zero even if it is not the same point.
E.g. For a metric d(x,x)=0, x \in \mathbb A. That’s the only time the metric can be zero, is if it’s the same element you’re asking for the distance between. With a pseudometric, it can be true that d_p(x,y)=0 for a given x and y such that $x,y\in \mathbb A, x \neq y$.

Thing 1: semi-continuity

3.9 Definition: lower semi-continuity
A function f is lower semi-continuous if it looks ‘sharp’ when looked at from the bottom. The modification to the extreme value theorem is that lower semi-continuous functions on closed bounded subsets of \mathbb R attain their minimum.

3.10 Definition: upper semi-continuity
A function f is upper semi-continuous if it looks ‘sharp’ when looked at from the top. The modification to the extreme value theorem is that upper semi-continuous functions on closed bounded subsets of \mathbb R attain their maximum.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s