Optimization, Step 1

Some preliminaries.

Engineers often take for granted that functions have minimums and maximums. In general, the underlying goals of machine learning and optimization are to find the minima and maxima of functions. As things get more complicated and more complex ideas as thrown into the mix of real-world problems, it’s not always clear that a goal of finding these maxima or minima is even attainable. Luckily there is a solid framework from which mathematicians stand in order to know that algorithms are going to achieve their goals, i.e. solutions will be found.

In this series of posts I’m going to try to outline some of the functional analytic tools and ideas and present them in a much more easy-going way than you would normally find them in white papers or textbooks. They focus on outlines of the structures we need to build the framework of optimization. In some cases I will get into the necessary conditions for something to be true.

What type of background do you need to be able to understand this content in this series? Some undergraduate math courses. Maybe none; maybe you’re a keener. I’ll do my best to explain as many things as are necessary, but that will entail skipping out on some ideas. No homework, just examples and solved problems to motivate what’s needed where it’s needed. There won’t be many proofs, but I’ll link to as many as I can if you want/need the rigour.


This is not the precise definition of a function, but it will do for this section. Functions are mappings from (elements of) one set to another. In math we want the sets to be number sets, and additionally we want these number sets to be a field. Step 2 will have a proper definition of a function and more information about it, but for now we are going to look at what sets are.


You’re familiar with these, but we’re going to narrow the view to an optimization standpoint. We need a couple of definitions:

1.1 Definition: a set 
A collection of distinct objects is called a set. These objects are referred to as members or elements of the set.

If A is a set, and x is an element of A then we write x\in A and we say many different things to denote this: x is an element of A or x belongs to A or x is in A. Ex: If we let A = \{1,2,3,4\}, then surely 2 \in A. We also have the notation \notin which means ‘not in’, and for the set A = \{1,2,3,4\}, then 5\notin A. Writing something like \{2,3\} \in \{1,2,3,4\} is wrong. However, we can and do write  2,3 \in \{1,2,3,4\} but there’s a big difference between the two. In the first case we’re misusing the symbol \in and in the second case, we mean “both 2 and 3 are in the set \{1,2,3,4\}“. Using the symbol \in means on the left of it, there needs to be an element and on the right side a setSometimes things get tricky when the symbol is flipped, whereby we write A \ni x; now the element needs to be on the right and the set needs to be on the left.

There are some subtle rules about sets that are worth mentioning here:
1. The order of elements in a set is unimportant, hence the sets \{1,2,3\}, \{2,1,3\}, and \{3,2,1\} are the same.
2. There is a set that contains no elements called the empty set and it is denoted by the symbols \varnothing, \emptyset or \{\}.
3. Sets are equal if and only if they have the same elements.
4. Sets can have sets as elements. The empty set \emptyset has no elements. The set \{\emptyset\} has one element, the empty set. The set \{\emptyset,\{\emptyset\}\} has two elements, etc.
5. All members of a set must be distinct. Well, sort of. The set \{2,2,7\} is actually the set \{2,7\}, but I don’t think anyone would have trouble with you calling \{2,2,7\} a set, it’s just not the ‘proper’ set we want to use here.
6. Sets need to be well defined. You need to be able to tell whether a given object is a member of a given set or not, there is no maybe! Ex: The collection of all “nice” people in Calgary is not a set, since there is no good criteria to define what a nice person is.

Ex: The set M = \{x \text{ }|\text{ } x^2 + 1 = 0, \text{ } x \in \mathbb R \} has no elements. This is because the equation x^2 + 1 = 0 has no solution in the reals, and it follows that M is the empty set, M=\emptyset.

Now, above I said you can’t write something like \{2,3\} \in \{1,2,3,4\} because that’s not how we use \in, but have no fear, there is another symbol we use to denote such a thing. The symbol is \subset. So we can write \{2,3\} \subset \{1,2,3,4\} since the set of \{2,3\} forms a sort of sub-set of \{1,2,3,4\}. We really should formalize this into a proper definition though:

1.2 Definition: a subset 
Let A and B be sets. The set A is a subset of B if every element in A is also an element of B; we write A \subseteq B and say A is contained in B or A is a subset of B.

Ex: Let A = \{2,-1\}, B = \{ t \text{ }|\text{ } t^2-t-2=0\} . Obviously A=B (see point 3 above). You might say “Aha! But what if t\in\mathbb N, then they aren’t the same!” We try not to be that tricky. Sometimes we leave the underlying set that we choose the numbers to be in.

There are some tricky things we can do with this notation. Every set B has two obvious subsets, \emptyset \subseteq A and A \subseteq A. Any other subset A of B is called a proper subset, and we write A\subset B; we say A is strictly contained in B or A is a proper subset of B. The empty set is a proper subset of every nonempty set.

NOTE: This idea of ‘strictness’ exists a lot in mathematics. If I’m looking for a number x that is less than, say, 5, I can write x < 5. But for that same sentence, to add emphasis I can also say that ‘x is strictly less than 5’ and I really mean the same thing. I’m just adding emphasis to outline that x \lneq 5 . You may have heard someone say “x is less than five” when x \leq 5 is written, and it’s because saying “x is less than or equal to five” is much more cumbersome to say. I’m not saying it’s right, I’m just saying it happens.

CAUTION: Not every mathematician uses the symbol \subseteq . They treat \subset and \subseteq as one and the same. I don’t know how I feel about this notation style, so in this blog if you see \subset it should always be treated as a proper subset. Above, I said that the empty set is a proper subset of every nonempty set. By that, we have that \emptyset \subseteq \emptyset \emptyset \subseteq A, and \emptyset \subset A if and only if A \neq \emptyset. Think of \subseteq as being English for “could be a proper subset or could be the same set”.

NOTE: If both A \subseteq B and B \subseteq A then A = B . We can also write this: If both A \subseteq B and A \supseteq B then A = B . The opposite symbol means exactly what you’d think it means, and is called a superset.

Some examples would now be useful.

  • Ex: The set of all vowels \{a,e,i,o,u\} is a subset of \{a,b,c,...,x,y,z\}.
  • Ex: A=\{1,3\} \subset \{x \text{ }|\text{ } 0 < x < 5, x \in \mathbb R\} = B
  • Ex: The set of natural numbers \{1,2,3,...\} is a superset of the even positive integers \{2,4,6,8\} in that \{1,2,3,...\}\supset\{2,4,6,8\}.

CAUTION: We really abused some notation here. If you weren’t familiar with the symbols \in and \subset before, we outlined here that the first one has elements on the left, sets on the right and the second has sets on both sides. You may also see in some literature the symbol := used. It basically says, I have this placeholder which is the letter “A”, and from here on I want to use it interchangeably with the set \{1,3\}. The second example above could be re-written:

\begin{aligned}A:=\{1,3\} \subset \{x \text{ }|\text{ }0 < x < 5, x \in \mathbb R\} =: B \end{aligned}

It saves us having to write:

\begin{aligned} \text{Let} A:=\{1,3\} \text{ and } B := \{x \text{ }|\text{ }0 < x < 5, x \in \mathbb R\}.\end{aligned}

\begin{aligned}\text{It follows that:} A \subset B. \end{aligned}

Number Sets

When doing math, we choose our sets have elements that are numbers, not abstract sets like ‘the set of all dog breeds’. We call these sets number sets and there are many different types of number sets to choose from. You may recall that the set of numbers \mathbb N = \{1,2,3,...\} is called the set of natural numbers. There is also a set called the integers \mathbb Z = \{...-3,-2,-1,0,1,2,3,...\} , the set of rationals \mathbb Q = \{ \frac{a}{b}\text{ }|\text{ }a,b \in \mathbb Z , b \neq 0 \} , and the set of real numbers or reals \mathbb R . Indeed, you may also recall that \mathbb N \subset \mathbb Z \subset \mathbb Q \subset \mathbb R.

Venn Diagram of the usual number sets

Since we’re dealing with numbers, it’s natural to want to have a few operations defined on numbers. The natural ones to want are addition, subtraction, multiplication, and division. We also want some properties to hold for these operations. These extra properties add structure to the elements in the sets we’re interested in: number sets.


Wikipedia does a great job on fields, so I’ll outline the basic things we need. It turns out that all we need are addition and multiplication. Subtraction and division are just inverse operations to addition and multiplication respectively. Here’s what we ask of these so-called fields:

Let F be a set. Let addition and subtraction be as we expect them to be. If all of the axioms below hold for a given set F, then we call F a field.
Closure of addition and multiplication
1. For all a,b \in F, both a+b and a \cdot b are in F.
Ex: 1/2 \in \mathbb Q, 2/3 \in \mathbb Z then 1/2+2/3= 7/6  \in \mathbb Z
2. For all a,b,c \in F, the following equalities hold: a+(b+c) = (a+b)+c and (a\cdot b) \cdot c = a\cdot ( b \cdot c )  .
3. For all a,b \in F, the following equalities hold: a+b = b+a and a\cdot b = b \cdot a .
Identity Elements Exist
4. There exists an element of F called the additive identity element denoted by 0, so that for all a\in F, a+0=a. Similarly, there exists an element of F called the multiplicative identitity and denoted by 1, so that for all a \in F, a \cdot 1 = a. The additive and multiplicative identities must be separate elements.
Inverse Elements Exist (Definition of Subtraction and Division)
5. For every a\in F there exists an element -a \in F such that a + (-a) = 0. Similarly, for every a\in F there exists an element a^{-1} \in F such that a \cdot a^{-1} = 1. In these ways we define subtraction to be the additive inverse and division to be the multiplicative inverse.
6. For all a, b, \text{ and }c \in F the following equality holds: a \cdot (b+c) = (a \cdot b) + (a \cdot c) .

Of all of the number sets listed above, only the rationals and reals fulfill these criteria. The reals are the only number set we really want to use. The rationals are missing the concept of completeness, which is a term that will be defined in Step 2 and emphasized later on.


Not so hard so far. We gave a name to the number sets that fulfill enough properties to be called a field. We started to think of functions as things that take elements from these number sets and map them to other sets. In \mathbb R this is all very straight forward: a function takes a number as its input and churns out another number. Things do however get harder, so I really just wanted to make sure we have a base set of terminology to move onward and upward from.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s