Optimization, Step 2

We’ll get really close to finding out why functions have minimums or maximums in this section. In order to do that, we’re going to need a few more tools. We’ll start with functions from \mathbb R \rightarrow \mathbb R, which are one dimensional in nature. Why are they one dimensional? The superscript on the first \mathbb R is implicitly a 1, i.e. \mathbb R^1 \rightarrow \mathbb R^1  in that we’re looking at functions of one independent variable. Soon we’ll generalize the results to n dimensions, and eventually to infinitely many dimensions, but we have to start with one dimension.

For now, we’ll introduce the ideas of completeness as well as some terminology surrounding functions and how confusing that terminology is. To do that, we’ll look at expanding the field we introduced in Step 1.

Ordered Fields

This part is easy. In the fields we use, we want there to be an ordering: a way of comparing whether two elements are greater than, less than, or equal to each other (or any combination of those). Check the Wikipedia article for ordered fields, for more information, but the most fundamental points for a given field F are:
1. If a \leq b then a + c \leq b + c for all a,b,c \in F.
2. If 0 \leq a and 0 \leq b then 0 \leq a \cdot b for all a,b \in F.
There are more points to ordered fields than are listed here but I just want to emphasize, it’s exactly what you think it is: we let inequalities into the mix. This branch of mathematics is called order theory

Recall from last time that of all the number sets, only the rationals and reals were fields (because they’re the only ones that have both additive and multiplicative inverses). So in our case, both \mathbb Q and \mathbb R are ordered fields, but \mathbb C has no ordering. (Is 3+4i > 3+2i? Such questions don’t make sense in \mathbb C.) We’re eventually going to show that we need to get rid of rationals as an option because we want our field to have a property called completeness (which we get into later).

Functions

Next, we need to learn what a function is and the terrible terminology surrounding it.

2.1 Definition: a function
A function is a map that takes elements from a set of inputs and assigns them to elements in a set of outputs where each input is related to exactly one output.

For us, those inputs and outputs will be number sets, obviously. We now need to get into the confusing topic of the domain and range of a function. How are the domain and range confusing? First we will clarify some notation.

We give the name f to a mapping or a rule that associates inputs to unique outputs. So f is just a name, and if the set of inputs is A and the set of outputs is B we tend to use the notation f:A \rightarrow B. The element x is sometimes referred to as the argument of the function. If we let x be an arbitrary element from the input set A, then is it traditional to write f(x) and interpret it as an element of B.

Let me repeat that one more time for clarity (and I’m stealing this formulation from Sir Tim Gowers’ blog):
If f is a function from A to B and x is an element of A, then f(x) is an element of B.

We are going to take a step into abstraction. What if I had an element x from the input set A, and an element y from the output set B, i.e. x\in A, y \in B. A function can be any way of associating with each  x\in A a unique y \in B. I might not be able to write down this function as something that I can do with symbols involving ‘x’. Both functions and sets can become extremely weird without trying very hard. (Can you picture what the set of all continuous functions from A to B looks like? What about the set of all sets?)

If I have a function f and I have some rule for taking elements x from a set A, it could very well be that my rule doesn’t work properly in mapping everything in A to elements in B. Why won’t the rule work? The limitation could be due to, for example, the rules of math, limits in the English language, or rules in place by requiring that A be a field (a set with some rules for addition, subtraction, multiplication, and division). It may not be any of those, but the point is that the rule you’re using might not fully map to all of the output set, B, and that’s okay. If x \in A , then f(x) is called the image of x (or the image of x under f. The set of all images is called the image of f.

Let’s give names to A and B. We’ll call A the domain and B the codomain. I’d like to start phasing out the use of the term ‘range’ because it is extremely confusing. See the Wikipedia article on the range of a function and notably, the graphic on the right. “Sometimes ‘range’ refers to the image and sometimes to the codomain.” Some examples:

  • f: \mathbb R \rightarrow \mathbb R \text{ and } f(x)=x^2 \text{ for every } x \in \mathbb R
  • g: \mathbb R_+ \rightarrow \mathbb R_+ \text{ and } f(x)=x^2 \text{ for every } x \in \mathbb R_+

If you’ve never seen the symbol \mathbb R_+ before, it stands for the set of non-negative real numbers: \mathbb R_+ = \{x \in \mathbb R \text{ | } x \geq 0 \} .
So what is the domain of f? Surely, it’s just \mathbb R. What’s the codomain of f? It’s \mathbb R as well, but the image of f here is only the non-negative real numbers, \mathbb R_+. For g, the domain is \mathbb R_+, the codomain is \mathbb R_+, and the image is \mathbb R_+.

“Two functions are considered equal if they have the same domain and the same codomain and they do the exact same thing to everything in the domain. In other words, there is more to a function than what it does.” This is from Sir Tim Gowers’ blog again, but the examples are simple enough and appear on the Wikipedia article for codomain. So I’m going to try not to be tricky and say something like “if h(x) = x^2 and j(x)=x^2, are h and j the same function? Ha! Tricked you! They’re not, I didn’t tell you the domain and codomain are different for both you get a D minus!”

I’ll leave you with one question that doesn’t have an objective answer. Is it wrong to think of functions like f(x)=x^2 as a function without thinking about its domain, codomain, and image, or should we always think of functions as having a rule (even if that rule isn’t something simple like f(x)=\sqrt{x}), a domain, and a codomain? Some people say yes, functions should be treated as rules, domains and codomains while others say no, it’s fine to think about the rule itself and ignore the domain and codomain in most cases. I don’t know the answer, but I will try not to trick you as we continue.

Metric Spaces

Ok, so I’ve given you what a field is, and now we know what a function is. I’m going to define a function called a metric, but we’re not going to analyze this metric for minimums, we’re going to use it to add some structure to the underlying sets A and B (the domain and codomain, which in reality are going to be subsets of \mathbb R, i.e. A \subseteq \mathbb R, B \subseteq \mathbb R). In one dimension we will use the terms ‘set’ and ‘space’ to be synonymous. This isn’t proper, but we’ll correct the situation when we get to n dimensions. If I haven’t said it explicitly yet, we want A and B to be fields or subsets of fields (i.e. A \subset F, B \subset F), and recall so far only \mathbb Q and \mathbb R fit the bill for this field F.

2.2 Definition: a metric
A metric on a set A is a function (called the distance function or simply the distance)
d:A \times A \rightarrow [0,\infty)
where for all x,y,z \in A the following conditions are satisfied:
1. d(x,y) \geq 0
2. d(x,y) = 0 \Leftrightarrow x=y
3. d(x,y) = d(y,x)
4. d(x,z)\leq d(x,y) + d(y,z)

A few things to explain here. First, the symbols A \times A mean that we need to take an element from A and then choose another element (possibly the same one, but a second time) from A. So the terminology is that what we label d is going to be a function of two arguments, both from the same set A. You will sometimes see this written d: A^2 \rightarrow [0,\infty) . So, metrics always need to be mapped to the non-negative real numbers; I never said [0,\infty) was explicitly over the reals, but it is. I’m not trying to trick you.
The \Leftrightarrow symbol, if you’ve never seen it before, means that whatever is on the left implies what is on the right, and vice versa. Finally, note that we don’t need an ordered field for a metric space, only a set. (We can define metrics on the complex numbers \mathbb C for example, but the field of complex numbers does not have a total ordering that makes any sense.)

The most common metric you’re used to is the Euclidian metric, which is just the Euclidian distance you’re probably familiar with: d(x,y) := \sqrt{(x-y)^2} = |x-y|. Note that I used the colon version to define d here, and I did that to outline the fact that the square root of the square of something is the absolute value (in one dimension). Either one works, and they’re the same thing (in one dimension).

I’m going to go back to the image from before to see what we’ve done.

f-A-to-B-metric

We’ve added on a function that applies to the set A to give A some structure. Don’t think of d as functions we’re interested in finding the minimums of. We may want to use the absolute value as a function f and find the minimum of it in the future, but for now we are just adding structure to A. Imbuing a set with structure in this way is called giving Atoplogy.

In general, we have the following set of inclusions.

\text{pre-Hilbert Space} \subset \text{normed space} \subset \text{metric space} \subset \text{toplogical space} .

We’ll get to what pre-Hilbert and normed spaces are, but it’s interesting that we jumped right to metric space. Given what we know about Venn diagrams, metric spaces are also toplogical spaces, so why didn’t I start there? All we need to know is that the properties of importance we get from our set A being a topological space are:
1. What it means for a set to be open
2. What it means for a set to be closed
3. What continuous functions are
4. What neighbourhoods are
Topological spaces have something called a pseudometric, which lets definitions of open, closed, etc. to be written easily. Having a metric instead of a pseudometric just makes things easier. This subject is called topology and there are many great books to introduce you to the subject. So, just to reiterate – while we’re playing in the metric space topology, we’re going to be talking about some concepts that relate to all metric spaces being topological spaces, and some properties (such as completeness) which only apply to the substructure afforded by metrics.

metric-hilbert-spaces

We’re playing in the metric space land. I didn’t include continuity in the topological space structure but that’s where it lives. If a metric space satisfies the property of completeness we get what we call a complete metric space. Note that the words written above the spaces are where we can define them. Just because I’m playing with topological spaces, doesn’t mean all the spaces I will play with will be open or closed. Similarly, I can see if a property satisfies being ‘complete’ once I’m in a metric space. If it does satisfy this, then I’m playing in a sub-category of metric spaces called complete metric spaces. Normed spaces that are complete (whatever that means) are called Banach spaces, and pre-Hilbert spaces that are complete (again, whatever that means) are called Hilbert spaces. Banach spaces have metrics (they have to: they’re a subset of normed spaces, which are a subset of metric spaces). In normed and Banach spaces we’ve just added more structure on top of there already being a metric. See here for more.

Open, Closed

You probably already know these, and this will be far from formal. I plan on writing a formal post on these concepts once we start looking at functions in n dimensions, and you can find the link for that here, (which I’ll add once I make it).

Open sets in \mathbb R (one dimension) have round brackets, like the set (3,4). Closed sets have square brackets, like the set [0,1]. The open bracket means it doesn’t include its endpoint, a closed bracket means it includes the endpoint. Sets like the above are called intervals if they don’t have holes in them (this property of not having holes in them is called connected). A set such as [0,1)\cup(1,2] is not connected, hence not an interval.

Sets such as (2,3) \cup (4,5) are also considered open. Sets such as [0,1] \cup [2,3]  are closed as well. Intervals such as (2,3] are sometimes called left-open or half-open intervals. Similarly, [10,13) are called right-open, for obvious reasons.

Bounded Sets

2.3 Definition: boundedness of a set
A set A of real numbers is called bounded from above if there is a real number k such that M \geq x for all x \in A. The number M is called an upper bound of A.
A set A of real numbers is called bounded from below if there is a real number m such that m \leq x for all x \in A. The number m is called a lower bound of A.
A set is bounded if it has both lower and upper bounds.

This is a healthy reminder that \infty is not a real number. If there is a subset, say S of A, i.e. S \subset A, a subtle point here is that the upper bound, if it exists, doesn’t have to belong to the subset. The open set A:=(0,1)\subset \mathbb R has any number greater than or equal to 1 as an upper bound. Indeed, every element of A is less than 2, i.e. for all x \in A, x \leq 2. The number 2 is thus an upper bound on the set. We want the smallest of these so-called upper bounds, and the smallest upper bound for this set would be 1. In a similar way, there are an infinite number of lower bounds of A, but the largest such lower-bound is 0. The reason we don’t look for the maximum element of the set is that for A= (0,1) there is no maximal element. There is also no minimal element. The least upper bound is called the supremum of A and is written \sup A, where here \sup A = 1. The infimum is the greatest lower bound: \inf A = 0. Obviously, we don’t run into this problem for closed sets: their supremums and infimums lie in the sets themselves. When this happens we can think of the supremum as the max of a set and the infimum as the minimum, but now you can see the subtle difference. Let B = [0,1]. Then \sup B = \max B = 1 and \inf B = \min B = 0. One other subtlety is that if sets have no upper bound there can be no supremum. Let B = [0,\infty. Then \sup B does not exist. The same is true for the infimum.

Sequences, Completeness

This is the crux of this post. We’ll finally get to what completeness means and why it’s important. Sequences are just lists of numbers. A few examples:
1. 2, 4, 6, 8, 10, … A sequence of positive, even numbers.
2. 1, 1, 2, 3, 5, 8, … The Fibonacci Sequence
3. 1, 4, 9, 16, 25, … A sequence of squares of the natural numbers
4. \frac{1}{1}, \frac{1}{2}, \frac{1}{3}, ... The harmonic sequence

Sequences can be thought of as functions from the set of natural numbers to whatever number set the sequence is in. You can think of examples 1-3 as having elements in \mathbb N or in \mathbb R, for example. Example 4 could have either \mathbb Q or \mathbb R as its underlying set.

If we let all of the terms in the Fibonacci Sequence be matched to terms a with a subscript denoting their position in the sequence \begin{array}{cccccc}1 & 1 & 2 & 3 & 5 & ...\\ a_1 & a_2 & a_3 & a_4 & a_5 & ... \end{array}
we can treat a_n as the generic term. It’s normal to treat functions from the set of natural numbers as a_n := a(n) whereby the function’s argument appears as a subscript instead of an argument in brackets. The function for the Fibonacci Sequence can be thought of as a: \mathbb N \rightarrow \mathbb N or a: \mathbb N \rightarrow \mathbb R or anything in between.

2.4 Definition: a convergent sequence
A sequence x_1, x_2, x_3, ... is said to converge to a limit L if \begin{aligned} \lim_{x\to\infty} x_n = L \end{aligned}, if L exists.

One of the problems with this definition, is that if the terms are say x_n \subset \mathbb Q, the number L that they converge to might be a real number, i.e. L \in \mathbb R. That’s not a very desirable property of convergent sequences. There’s something called a Cauchy sequence that we need to investigate, and we’ll do it in the context of a metric space.

2.5 Definition: a Cauchy sequence
A sequence x_1, x_2, x_3, ... in a metric space (A,d) (where A is the underlying field and d is the metric) is called a Cauchy sequence if for every positive real number \epsilon there is a positive integer N such that for all natural numbers m,n > N , it holds that d(x_m,x_n) < \epsilon .
In one dimension with the Euclidian metric, this is the same as |x_m - x_n| < \epsilon.

Cauchy sequences have elements that get arbitrarily close together. Check out the Wikipedia article for more. Cauchy sequences are essentially convergent sequences but you don’t have to define what the limit L is. This may seem like a weird definition: why can’t we just let two consecutive terms get closer to each other, something like x_n - x_{n+1}) < \epsilon? Terms getting ‘close’ to one another in this sense isn’t a good enough criteria for convergence.

2.6 Theorem: convergent sequences are Cauchy sequences
Every convergent sequence is a Cauchy sequence. Not all Cauchy sequences are convergent.

The long and short of this is that Cauchy sequences converge. The problem is that the limit of a Cauchy sequence can sometimes leave the set of numbers that the elements in the sequence are in. What?

The sequence a_1 = 1, a_n = \frac{a_n + \frac{2}{a_n} }{2} consists of rational numbers, i.e. a_n \in \mathbb Q for all n \in \mathbb N. The sequence looks something like (1,\frac{3}{2}, \frac{17}{12}, ...). The sequence, however, converges to a number that is not a rational number: \sqrt{2}. We can write a sequence in \mathbb Q but its limit is outside of  \mathbb Q. We say that the rationals are incomplete because of this property.

2.7 Definition: completeness
A metric space A in which every Cauchy sequence converges to an element in A is called complete.

Notice that the definition of completeness relies on the underlying space to be a metric space. If you go back to the large Venn diagram above, you can see I put completeness as being a property of metric spaces. It turns out, there’s an idea of completion of a set, where you can take an incomplete set like \mathbb Q and complete it. If you do such a thing, you get the real numbers \mathbb R. No matter what sequence you create, if its terms are in \mathbb R and it is a Cauchy sequence, its limit will be in \mathbb R.

Summary

We had a lot going on here. Two separate thoughts happened: we added a lot of structure to our set A. We quickly defined what a function is so that we could define a function called a metric. A metric can be defined on any arbitrary set: the underlying set does not need to be an ordered field, however the codomain of a metric is the set of non-negative reals. We then called the set together with a metric a metric space, and using that metric space we defined a property called being complete. It’s worth re-iterating that completeness is a property of some metric spaces, and metric spaces that are complete are called complete metric spaces.

The real numbers are an ordered field. We didn’t ask that our set A be a field in the definition of the metric d: A \times A \rightarrow \mathbb R_+. We want A to be a subset of a complete ordered fieldA\subset \mathbb R. We like complete ordered fields because we can do calculus on them.

f-A-to-B-metric-2

A subset of a field is not necessarily a field on its own. A subset like [0,1]\subset \mathbb R loses the fact that while \frac{1}{2}\in [0,1], its multiplicative inverse 2 isn’t. Finally, I haven’t said it yet we will also want B to be a subset of a complete ordered field, and also for there to be a metric on B.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s