Elementary discrete mathematics

Table of contents

0. What is discrete mathematics?

0.1 It’s math you do quietly, right?

The phrase “discrete mathematics” is often used nebulously to refer to a certain branch of mathematics that, from the college point of view, is “not calculus” or perhaps is applicable to computer science. In this section, we will try to understand what it means for a topic to fall under the umbrella of discrete mathematics.

0.2 But really, what is it?

Here’s a question: What is the next number after $1$? Your answer depends on “where you live,” so to speak. If you live in the rational numbers (numbers that can be expressed as fractions, like $3/7$) or the real numbers (including numbers that can’t be written as fractions, like $\pi$), you can’t answer the question. If you give me a real or a rational number that’s a little more than $1$, say $1.1$, I can find something even smaller; say, $1.01$.

If you live in the set of natural numbers, which is the set of numbers that are useful for counting things in the world, the question does have an answer: $2$. The natural numbers do not suffer from the crowding issue that the rationals and the reals do; all of their elements are sufficiently “far apart” in this way. There is $1$, and then $2$, and nothing in between. In this sense they are discrete. In this book, we will (usually) live in the natural numbers.

0.3 Some examples

Here are some questions we might try to answer using discrete mathematics. In fact, you will be able to answer all these questions by the end of the book.

Suppose we have a program that will activate condition 4 if conditions 1 and 2 are met but not condition 3. How do we know when condition 4 is active?

We can use logic to represent this program with the statement $(p \wedge q \wedge \neg r) \to s$, and then use truth tables to determine exactly when condition 4 will trigger.

Which of these two programs is more efficient?

A sequence is a function of natural numbers. An example is the number $(a_n)$ of steps it takes to run a computer program with $n$ inputs. If $(a_n)$ grows much faster than $(b_n)$ does, that can tell us that one program scales worse for large quantities of data.

Some members of a community are sick. How do we know who is at risk of being infected?

Relations encode relationships between sets of objects. We may compose relations to iterate those relationships over and over again. If person $x$ is sick and has come into contact with person $y$, and $y$ has come into contact with person $z$, and $z$ has since seen $w$, then $w$ is at risk.

0.4 How to use this book

Chapters 1-8 must be done in order. Each chapter has a section of exercises located on the problems page. These exercises should be attempted after each chapter. If you are in MATH 174 at Coastal Carolina University, these exercises are your homework.

Then, chapters 9-13 (on combinatorics) and chapters 14-18 (on relations) may be attempted in that order, or in the order 14-18; 9-13. Be aware that some of the exercises in 14-18 assume that the reader has learned the combinatorial material. So, if you follow the alternate order, leave these exercises for later.

This book is paired with four playlists on my YouTube channel. Playlist 1 (sets and logic) maps to chapters 1-3, playlist 2 (proof and recursion) maps to chapters 4-8, playlist 3 (combinatorics) maps to chapters 9-13, and playlist 4 (relations) maps to chapters 14-18.

These videos are very short demonstrations and examples of the material pursued in depth by the book. I am not sure which ordering is better: book then videos, or videos then book. Choose your own adventure.

I try to use the invitational “we” throughout the book and maintain some level of formality, but sometimes my annoying authorial voice pokes through. I apologize if that bothers you.

Each chapter ends with several “takeaways” that summarize the chapter. Starting with the fourth chapter, if necessary, each chapter begins with a hint to go back and look at some things you may not have thought about in a while.

0.5 The point of this book

– is to teach you discrete mathematics, of course.

But much more broadly, this book is intended to teach you to think mathematically. You may be a student who has never seen any math beyond calculus or algebra, or perhaps you dread doing math. The point of this book is to get you to learn to think deeply, communicate carefully, and not shy away from difficult material.

As you proceed, have something to write with on hand. Work each example along with the book. Then, at the end of each chapter, work the exercises. Some are more difficult than others, but all are well worth doing to improve your understanding. Work along with someone at roughly your skill level; share solutions and ask each other questions.

Everyone can be a “math person!” So, let’s get started.

1. Sets

Videos:

1.1 Membership

For nearly all interested parties, sets form the foundation of all mathematics. Every mathematical object you have ever encountered can be described as a set, from numbers themselves to functions and matrices.

We will say that a set is an unordered collection of objects, called its elements. When $x$ is an element of the set $X$, we write $x \in X$.

Examples. Our starting point will be the set of all natural numbers,

$$\mathbf{N} = \{0, 1, 2, 3, \ldots\}.$$

Certain sets are special enough to get their own names, with bold or “blackboard bold” face. Sometimes (especially in ink) you will see $\mathbb{N}$ instead. Note that the elements of the set are enclosed in curly braces. Proper notation is not optional; using the wrong notation is like saying “milk went store” to mean “I went to the store to get milk.” Finally, not everyone agrees that $0$ is a natural number, but computer scientists do.

If we take $\mathbf{N}$ and include its negatives, we get the integers

$$\mathbf{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}.$$

Notice that in the case of an infinite set, or even a large finite set, we can use ellipses when a pattern is established. One infinite set that defies a pattern is the set of prime numbers. You may recall that a prime number is a positive integer $p$ with exactly two factors: itself and $1$. We may list a few: $2, 3, 5, 7, 11, 13, 17, 19, 23$, but there is no pattern that allows us to employ ellipses.

Fortunately there is another way to write down a set. All previous examples have used roster notation, where the elements are simply listed. It may be more convenient to use set-builder notation, where the set is instead described with a rule that exactly characterizes its elements. In that case, the set of primes can be written as

$$\{\, p \mid p \text{ is a prime number} \,\}.$$

To provide one last example, consider the set of all U.S. states. In roster notation this set is a nightmare: {AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME, MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY}. But in set-builder notation we simply write

$$\{\, S \mid S \text{ is a U.S. state} \,\}.$$
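If you happen to program, the two notations have direct analogues in Python: a set literal is roster notation, and a set comprehension plays the role of set-builder notation. Here is a minimal sketch (purely illustrative; none of the mathematics depends on it):

```python
# Roster notation: list the elements explicitly.
evens_roster = {0, 2, 4, 6, 8}

# Set-builder notation: describe the elements with a rule.
evens_builder = {n for n in range(10) if n % 2 == 0}

print(evens_roster == evens_builder)  # True: they contain the same elements
```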

Definition 1.1.1. Two sets $X$ and $Y$ are equal if they contain exactly the same elements, in which case we write $X = Y$.

Remember that sets are unordered, so even $\{a, b\} = \{b, a\}$.

Definition 1.1.2. Let $X$ be a finite set. Its cardinality, denoted $|X|$, is the number of elements in $X$.

Example. If $X = \{a, b, c\}$, then $|X| = 3$.

We can talk about the cardinality of infinite sets, but that is far beyond the scope of this course.

1.2 Containment

One of the most important ideas when dealing with sets is that of a subset.

Definition 1.2.1. Let $X$ and $Y$ be sets. We say $X$ is a subset of $Y$ if every element of $X$ is also an element of $Y$, in which case we write $X \subseteq Y$.

Examples. Put $X = \{a, b, c\}$, $Y = \{a, c\}$, and $Z = \{b\}$. Then $Y \subseteq X$ and $Z \subseteq X$, but $Y \not\subseteq Z$ and $Z \not\subseteq Y$.
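Python can test Definition 1.2.1 directly with the issubset method or the <= operator; a minimal sketch using the sets from the example above:

```python
X = {"a", "b", "c"}
Y = {"a", "c"}
Z = {"b"}

print(Y.issubset(X))   # True: every element of Y is also in X
print(Z <= X)          # True: <= is Python's subset operator
print(Y <= Z, Z <= Y)  # False False: neither contains the other
```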

In mathematics, true statements are either definitions (those taken for granted) or theorems (those proven from the definitions). We will learn to prove more difficult theorems later, but for now we can prove a simple one.

Theorem 1.2.2. Every set is a subset of itself.

A proof is simply an argument that convinces its reader of the truth of a mathematical fact. Valid proofs must only use definitions and the hypotheses of the theorem; it is not correct to assume more than you are given. So the ingredients in the proof of this theorem are simply going to be a set and the definition of the word “subset.”

Proof. Let $X$ be a set. Suppose $x$ is an arbitrary element of $X$. Then, $x$ is also an element of $X$. Because an arbitrary element of $X$ was shown to be an element of $X$, we have $X \subseteq X$. $\square$

Notice that the actual make-up of the set $X$ is irrelevant. If $X$ has an element, then $X$ also has that element, and that is all it takes to conclude $X \subseteq X$. It would be incorrect to try to prove $X \subseteq X$ from an example: just because $\{0\}$ is a subset of itself doesn’t necessarily mean that $\{0, 1\}$ is!

We will discuss logic and proof-writing at length as we continue. For now, let’s continue learning about sets and proving the “simple” theorems as we go.

Definition 1.2.3. The set with no elements is called the empty set, denoted $\{\}$ or $\varnothing$.

With a little logical sleight-of-hand, we can prove another (much more surprising) theorem.

Theorem 1.2.4. The empty set is a subset of every set.

Proof. Let $X$ be a set. We may conclude $\varnothing \subseteq X$ if every element of $\varnothing$ is an element of $X$. Observe that this statement is equivalent to saying that no elements of $\varnothing$ are not elements of $X$. Because there are no elements of $\varnothing$ at all, this statement is true; and we are done. $\square$

Perhaps you already agree with the central claim of the proof, that there is no difference between the statements “everything in $\varnothing$ is in $X$” and “nothing in $\varnothing$ is not in $X$.” If you do not, take it on faith for now. When we study predicate logic later you will have the tools to convince yourself of the claim.

Such a statement, one that relies on the emptiness of $\varnothing$, is said to be vacuously true.

When $X \subseteq Y$, we say $Y$ contains $X$ or that $Y$ is a superset of $X$. With the idea of containment in hand we can construct a new type of set.

Definition 1.2.5. Let $X$ be a set. Its power set is the set $\mathcal{P}(X)$ whose elements are the subsets of $X$. That is,
$$\mathcal{P}(X) = \{\, E \mid E \subseteq X \,\}.$$

Wait - our elements are sets? Yes, and this is perfectly fine! Elements can be all sorts of things, other sets included. When we have a “set of sets” we typically refer to it as a family or a collection to avoid using the same word over and over.

Example. Let $X = \{a, b, c\}$. Then

$$\mathcal{P}(X) = \{\varnothing, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{b,c\}, X\}.$$

Notice that the number of sets in $\mathcal{P}(X)$ can be derived from the number of elements in $X$.

Theorem 1.2.6. If $X$ is finite, then $|\mathcal{P}(X)| = 2^{|X|}$.

We will come back to prove this theorem much later, once we have learned how to count!
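Though the proof must wait, nothing stops us from spot-checking the theorem. Here is a minimal Python sketch (the helper power_set is ours, not a library function) that builds a power set and compares its cardinality with $2^{|X|}$:

```python
from itertools import combinations

def power_set(X):
    """Return a list of all subsets of X, as frozensets."""
    elems = list(X)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

X = {"a", "b", "c"}
print(len(power_set(X)) == 2 ** len(X))  # True, as Theorem 1.2.6 predicts
```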

1.3 The algebra of sets

You are used to the word “algebra” as it refers to a class where you must solve some equations for $x$. If you think about those classes, they are really about understanding operations and functions of real numbers. Think about it: a successful algebra student will know how to manipulate sums, products, polynomials, roots, and exponential functions – ways of combining and mapping real numbers.

So, “algebra” in general refers to the ways objects may be combined and mapped. In this section, we will see many ways that sets can be combined to produce new sets.

Definition 1.3.1. Let $X$ and $Y$ be sets. Their union is the set $X \cup Y$ of all elements that are in at least one of $X$ or $Y$; in set-builder notation this is
$$X \cup Y = \{\, x \mid x \in X \text{ or } x \in Y \,\}.$$

Definition 1.3.2. Let $X$ and $Y$ be sets. Their intersection is the set $X \cap Y$ of all elements that are in both $X$ and $Y$; in set-builder notation this is
$$X \cap Y = \{\, x \mid x \in X \text{ and } x \in Y \,\}.$$

A useful way to visualize combinations of sets is with a Venn diagram. The Venn diagrams for union and intersection are shown below.

Figure. Two sets $X$ and $Y$ pictured as overlapping circles: on the left, their union is shaded in blue; on the right, their intersection is shaded in pink.

Example. Let $X = \{1, 2, 3, 4\}$ and $Y = \{2, 4, 7, 9\}$. Their union is
$$X \cup Y = \{1, 2, 3, 4, 7, 9\}$$
and their intersection is $X \cap Y = \{2, 4\}$. Notice that we do not repeat elements in a set. An element is either in the set or it isn’t. (A generalization, the multiset, will be developed in Chapter 11.) Therefore, $|X \cup Y|$ (recall, the number of elements in the union) won’t just be $|X| + |Y|$.

Theorem 1.3.3. If $X$ and $Y$ are finite sets, then $|X \cup Y| = |X| + |Y| - |X \cap Y|$.

Proof. If an element is only in $X$, it will be counted by $|X|$. If an element is only in $Y$, it will be counted by $|Y|$. Therefore any elements in $X \cap Y$ will be counted twice. There are exactly $|X \cap Y|$ of these elements, so subtracting that amount gives $|X \cup Y|$. $\square$
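A quick sketch in Python, with the sets from the example above, confirms the count; | and & are Python's union and intersection operators:

```python
X = {1, 2, 3, 4}
Y = {2, 4, 7, 9}

lhs = len(X | Y)                    # |X ∪ Y|
rhs = len(X) + len(Y) - len(X & Y)  # |X| + |Y| - |X ∩ Y|
print(lhs, rhs)  # 6 6
```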

Definition 1.3.4. Let $X$ and $Y$ be sets. The difference $X - Y$ is the set of all elements in $X$ but not $Y$; i.e. $X - Y = \{\, x \mid x \in X \text{ and } x \not\in Y \,\}$.

Example. Like before, let $X = \{1, 2, 3, 4\}$ and $Y = \{2, 4, 7, 9\}$. Then $X - Y = \{1, 3\}$. Notice that $Y - X = \{7, 9\}$; just like with numbers, taking a difference is not commutative. (That is, the order matters.)

Often when working with sets there is a universal or ambient set, sometimes denoted $\Omega$, that all of our sets are assumed to be subsets of. This set is typically inferred from context.

Definition 1.3.5. Let $X$ be a set contained in some universal set $\Omega$. The complement of $X$ (relative to $\Omega$), denoted $\overline{X}$, is the set of all elements not in $X$. In other words, $\overline{X} = \Omega - X$.

Figure. On the left, the difference $X - Y$ is shaded in pink: the region within $X$, excluding the portion also in $Y$. On the right, the complement of $X$ is shaded in blue: the region between $X$ and the rectangle representing its universal set.

Example. Suppose $X = \{a, b, c\}$ is contained in the universal set $\Omega = \{a, b, c, d, e\}$. Then $\overline{X} = \{d, e\}$.
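Python's - operator on sets is exactly Definition 1.3.4, and per Definition 1.3.5 a complement is just a difference taken from the universal set; a minimal sketch:

```python
X = {1, 2, 3, 4}
Y = {2, 4, 7, 9}
print(X - Y)      # {1, 3}
print(Y - X)      # the set of 7 and 9; differences are not commutative

omega = {"a", "b", "c", "d", "e"}
A = {"a", "b", "c"}
print(omega - A)  # the complement of A relative to omega: 'd' and 'e'
```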

Definition 1.3.6. Let $X$ and $Y$ be sets. Their Cartesian product (in this book, just “product”) is the set $X \times Y = \{\, (x, y) \mid x \in X \text{ and } y \in Y \,\}$. The elements of the product are called ordered pairs.

As the name implies, the ordering of the entries in an ordered pair matters. Therefore, we will adopt the convention that unordered structures are enclosed in curly braces $\{\,\}$ and ordered structures are enclosed in parentheses $(\,)$. The immediate consequence of this observation is that $X \times Y$ is different from $Y \times X$.

Figure. A visualization of the product of two sets: one element of the first set and one element of the second combine to make an ordered pair in the product.

Examples. You are likely familiar with $\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$, the Cartesian plane from algebra and calculus. The elements of this (infinite) set are ordered pairs $(x, y)$ where both $x$ and $y$ are real numbers.

To make a discrete example, put $X = \{a, b, c\}$ and $Y = \{1, 2\}$. Then
$$X \times Y = \{(a, 1), (a, 2), (b, 1), (b, 2), (c, 1), (c, 2)\}.$$
As an exercise, calculate $Y \times X$ yourself. The pair $(1, a)$ is an element.

Theorem 1.3.7. If $X$ and $Y$ are finite sets, then $|X \times Y| = |X|\,|Y|$.

Proof. The elements of $X \times Y$ may be tabulated into a rectangle, where each row corresponds to an element of $X$ and each column corresponds to an element of $Y$. There are $|X|$ rows and $|Y|$ columns and therefore $|X|\,|Y|$ elements in the table. $\square$
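The standard library's itertools.product computes exactly this set of ordered pairs, which gives a quick way to check Theorem 1.3.7 on small examples:

```python
from itertools import product

X = {"a", "b", "c"}
Y = {1, 2}

pairs = list(product(X, Y))           # all ordered pairs (x, y)
print(len(pairs) == len(X) * len(Y))  # True, as Theorem 1.3.7 predicts
```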

1.4 Operations involving more than two sets

Note: While this section is cool, we will not make use of its content very often in this course. You may choose to skip this section for now and return to it after Chapter 8, when you are more comfortable with the material.

There is one last thing to say about sets (for now, anyway). There is no reason to limit ourselves to two sets in union, intersection, and product. If we have a family of $n$ sets $X_1, X_2, \ldots, X_n$, we may take $n$-ary unions, intersections, and products.

The $n$-ary union $\bigcup_{i=1}^n X_i = X_1 \cup X_2 \cup \cdots \cup X_n$ is the set of elements that are in at least one of the sets $X_i$.

The $n$-ary intersection $\bigcap_{i=1}^n X_i = X_1 \cap X_2 \cap \cdots \cap X_n$ is the set of elements that are in all of the sets $X_i$.

The $n$-ary product $\prod_{i=1}^n X_i = X_1 \times X_2 \times \cdots \times X_n$ is the set of ordered $n$-tuples $(x_1, x_2, \ldots, x_n)$, where the element $x_i$ is a member of the set $X_i$.

Suppose instead the family $X_1, X_2, \ldots$ is infinite. We will, exactly once in this book, need to take an infinite union. The infinite union $\bigcup_{i=1}^\infty X_i = X_1 \cup X_2 \cup \cdots$ is the set of all elements that are in at least one of the $X_i$. (Notice that the definition hasn’t changed – just the number of sets!) Likewise the infinite intersection $\bigcap_{i=1}^\infty X_i = X_1 \cap X_2 \cap \cdots$ is the set of elements in all the $X_i$.
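For a finite family, the $n$-ary union and intersection amount to folding the binary operations across the family; a minimal Python sketch:

```python
from functools import reduce

family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

big_union = set().union(*family)                     # {1, 2, 3, 4, 5}
big_intersection = reduce(set.intersection, family)  # {3}
print(big_union, big_intersection)
```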

Takeaways

Chapter 1 problems

2. Propositional logic

Videos:

In mathematics, we spend a lot of time worried about whether a sentence is true or false. This is fine in the case of a simple sentence such as “The number $2$ is prime.” (That sentence is true.) But there are much more complicated sentences lurking around, and whole constellations of sentences whose truth depends on one another. (An unsolved problem in mathematics is deciding whether the Riemann hypothesis, a statement about the prime numbers, is true. Most people believe it is, and many results in the literature are true given the hypothesis. The Riemann hypothesis is going to make or break a lot of research.)

So, we are interested in tools that will allow us to judge whether complicated statements are true. Such models of truth are called logic.

2.1 Statements and connectives

Definition 2.1.1. A statement is a sentence that is either true or false.

Examples. “The number $2$ is prime” and “It is perfectly legal to jaywalk” are statements. Non-statements include questions (“Is it raining?”), commands (“Do your homework”), and opinions (“Mint chocolate chip ice cream is the superior flavor”). It is important to note, not necessarily for mathematical purposes, that opinions are expressions of pure taste. Claims whose truth values exist but are unknown are beliefs.

Atomic propositional statements of the type discussed in this chapter are flatly always true or false. You may think of them in the same way you think of constant functions in algebra. They are denoted with lowercase letters $p$, $q$, $r$, etc. If the only statements were atomic, logic would be boring. Fortunately, we can combine statements with connectives. In the next chapter, we will look at predicate statements whose truth values depend on an input. Likewise, these may be thought of as the non-constant functions.

Compound statements arise from combining atomic statements with connectives. They are denoted using Greek letters like $\varphi$ (“phi”), $\psi$ (“psi”), and $\gamma$ (“gamma”). The truth of a compound statement depends on the connectives involved and the truth values of the constituent statements. A tool for determining the truth of a compound statement is called a truth table.

In a truth table for the compound statement $\varphi$, the left side of the table is the atomic statements involved in $\varphi$ and the right side of the table is $\varphi$ itself. Each row gives a possible combination of truth values for the atomic statements, and the corresponding truth value for $\varphi$. At this point, it is easier to start introducing connectives and seeing some truth tables.

Definition 2.1.2. Let $p$ be a statement. Its negation is the statement $\neg p$, the statement that is true exactly when $p$ is false and vice-versa.

The statement $\neg p$ is typically read “not $p$” or “it is not the case that $p$ is true.” If we have an English rendering of $p$, we try to write $\neg p$ in a natural-sounding way that also conveys its logical structure.

Example. Suppose $p$ represents the statement “It is raining.” Then, $\neg p$ stands for “It is not raining.”

Below is the truth table for $\neg p$.

| $p$ | $\neg p$ |
| --- | --- |
| T | F |
| F | T |

As you can see, on the left are columns corresponding to the atomic statements involved in $\neg p$ (just $p$), and each row is a possible combination of truth values for these statements. Here is a quick, useful theorem without proof:

Theorem 2.1.3. If the compound statement $\varphi$ involves $n$ different atomic statements, then its truth table has $2^n$ rows.

Look familiar? We will prove this theorem when we prove the related theorem about power sets later on.

Negation is called a unary connective because it only involves one statement. Our remaining connectives are all binary connectives, as they combine two statements.

Definition 2.1.4. Let $p$ and $q$ be statements. Their conjunction is the statement $p \wedge q$, which is true only when $p$ and $q$ are both true.

The symbol in the conjunction is called a “wedge,” and $p \wedge q$ is read “$p$ and $q$.” This definition is meant to be analogous to our English understanding of the word “and.” The statements $p$ and $q$ are called conjuncts.

Example. Letting $p$ be “It is raining” and $q$ be “I brought an umbrella,” the statement $p \wedge q$ is “It is raining and I brought an umbrella.” The statement is true only if it is currently raining and the speaker brought an umbrella. If it is not raining, or if the speaker did not bring their umbrella, the conjunction is false.

Here is the truth table for $p \wedge q$:

| $p$ | $q$ | $p \wedge q$ |
| --- | --- | --- |
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | F |

Notice that this table is organized intentionally, not haphazardly. The rows are divided into halves, where $p$ is true in one half and then false in the other half. The rows where $p$ is true are further subdivided into half where $q$ is true and half where $q$ is false. If there were a third letter, we would subdivide the rows again, and so on. This method of organizing the truth table allows us to ensure we did not forget a row. Our leftmost atomic letter column will always be half true followed by half false, and the rightmost atomic letter column will always alternate between true and false.
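This canonical ordering is what you get by listing all combinations with the leftmost letter varying slowest. In Python, itertools.product generates rows in exactly this order, as a minimal sketch for the conjunction table shows:

```python
from itertools import product

# Rows appear in the standard order: TT, TF, FT, FF.
for p, q in product([True, False], repeat=2):
    print(p, q, p and q)
```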

Definition 2.1.5. Let $p$ and $q$ be statements. Their disjunction is the statement $p \vee q$, which is true if at least one of $p$ or $q$ is true.

The symbol in the disjunction is called a “vee” and $p \vee q$ is read “$p$ or $q$.” This definition is meant to be analogous to our English understanding of the word “or.” The statements $p$ and $q$ are called disjuncts. (Did you notice that I was able to copy and paste from an above paragraph? Conjunction and disjunction share a very similar structure.)

Example. Letting $p$ be “It is raining” and $q$ be “I brought an umbrella,” the statement $p \vee q$ is “It is raining or I brought an umbrella.” This time, the speaker only needs to meet one of their conditions. They are telling the truth as long as it is raining or they brought an umbrella, or even if both statements are true.

The truth table for $p \vee q$ follows.

| $p$ | $q$ | $p \vee q$ |
| --- | --- | --- |
| T | T | T |
| T | F | T |
| F | T | T |
| F | F | F |

Of great importance to computer science and electrical engineering, but not so much in mathematics (so it will not appear again in this book), is the exclusive disjunction (or “exclusive-or”). It is a modified disjunction where exactly one of the disjuncts must be true.

Definition 2.1.6. If $p$ and $q$ are statements, their exclusive disjunction is the statement $p \oplus q$, which is true if and only if exactly one of $p$ and $q$ is true.

The statement $p \oplus q$ is read “$p$ exclusive-or $q$.”

Example. The statement “You can have the soup or the salad” is likely meant as an exclusive disjunction.

In mathematics, disjunctions are generally considered inclusive unless stated otherwise. (An example for us all to follow!)

Our next connective is probably the most important, as it conveys the idea of one statement implying another. As we discussed earlier in the chapter, mathematics is nothing but such statements. It is also the only connective that is not relatively easy to understand given its English interpretation. So we will begin with the truth table, and think carefully about each row.

Definition 2.1.7. If $p$ and $q$ are statements, the conditional statement $p \to q$ is true if $p$ can never be true while $q$ is false.

There are many ways to read $p \to q$, including “if $p$, then $q$,” “$p$ implies $q$,” and “$p$ is sufficient for $q$.” The statement before the arrow is the antecedent and the following statement is the consequent. Here is the conditional’s truth table.

| $p$ | $q$ | $p \to q$ |
| --- | --- | --- |
| T | T | T |
| T | F | F |
| F | T | T |
| F | F | T |

That $p \to q$ is true in the bottom two rows may surprise you. So, let’s consider each row separately.

Example. Let $p$ be “You clean your room” and $q$ be “I will pay you \$10,” so that $p \to q$ is “If you clean your room, then I will pay you \$10.”

In the case that both $p$ and $q$ are true, the speaker has told the truth. You cleaned, and you were paid. All is well.

In the case that $p$ is true but $q$ is false, the speaker has lied. You cleaned the room, but you did not earn your money.

What if $p$ is false? Well, then the speaker did not lie. The speaker said that if you clean your room you will be paid. If the room isn’t cleaned, then $p \to q$ cannot be falsified, no matter whether $q$ is true or false.

If that explanation is not sufficient for you, reread the definition of the conditional: $p \to q$ being true means that $p$ cannot be true without $q$. If you are unsatisfied by the English connection, then consider $p \to q$ to be a purely formal object whose definition is the above truth table.

Definition 2.1.8. If $p$ and $q$ are statements, the biconditional statement $p \leftrightarrow q$ is true if $p$ and $q$ are both true, or both false.

The statement $p \leftrightarrow q$ may be read as “$p$ if and only if $q$,” “$p$ is necessary and sufficient for $q$,” or “$p$ is equivalent to $q$.” Here is the truth table.

| $p$ | $q$ | $p \leftrightarrow q$ |
| --- | --- | --- |
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | T |

Example. Let $p$ be “You clean your room” and $q$ be “I will pay you \$10,” so that $p \leftrightarrow q$ is “I will give you \$10 if and only if you clean your room.” (The statements were reordered for readability. We will see later in the chapter that this is the same statement, but this wouldn’t work for $p \to q$.) This time, the clean room and the \$10 paycheck are in total correspondence; one doesn’t happen without the other.

2.2 Connecting connectives

We are not required to use one connective at a time, of course. Most statements involve two or more connectives being combined at once. If $\varphi$ and $\psi$ are statements, then so are: $\neg \varphi$, $\varphi \wedge \psi$, $\varphi \vee \psi$, $\varphi \to \psi$, and $\varphi \leftrightarrow \psi$. This fact allows us to combine as many connectives and statements as we like.

How should we read the statement $p \wedge q \to r$? Should it be read “$p$; also, if $q$ then $r$”? Or, “If $p$ and $q$, then $r$”? Just like with arithmetic, there are conventions defining which connective to apply first: negation, then conjunction/disjunction, then implication, then bi-implication. So, “If $p$ and $q$, then $r$” is correct. However, to eliminate confusion in this book we will always use parentheses to make our statements clear. Therefore, we would write this statement as $(p \wedge q) \to r$.

Remember that a statement involving $n$ letters will have $2^n$ rows in its truth table. The column for the left-most statement letter should be true for half the rows then false for half the rows, and you should continue dividing the rows in half so that the right-most statement letter’s column is alternating between true and false.

Finally, it is a good idea to break a statement into small “chunks” and make a column for each “chunk” separately. Observe the following example.

Example. Write down the truth table for the statement $(p \wedge q) \to (r \to \neg s)$. When is this statement false?

Since the statement has $4$ letters, there will be $2^4 = 16$ rows in the truth table. Let’s go through the first row very slowly, where all four atomic statements are true. Then $p \wedge q$ is true and $\neg s$ is false. Since $r$ is true and $\neg s$ is false, that means $r \to \neg s$ is false. Finally, because its antecedent is true and its consequent is false, the compound statement $(p \wedge q) \to (r \to \neg s)$ would be false.

Here is the full truth table. Be sure you understand how each row is calculated.

| $p$ | $q$ | $r$ | $s$ | $p \wedge q$ | $\neg s$ | $r \to \neg s$ | $(p \wedge q) \to (r \to \neg s)$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| T | T | T | T | T | F | F | F |
| T | T | T | F | T | T | T | T |
| T | T | F | T | T | F | T | T |
| T | T | F | F | T | T | T | T |
| T | F | T | T | F | F | F | T |
| T | F | T | F | F | T | T | T |
| T | F | F | T | F | F | T | T |
| T | F | F | F | F | T | T | T |
| F | T | T | T | F | F | F | T |
| F | T | T | F | F | T | T | T |
| F | T | F | T | F | F | T | T |
| F | T | F | F | F | T | T | T |
| F | F | T | T | F | F | F | T |
| F | F | T | F | F | T | T | T |
| F | F | F | T | F | F | T | T |
| F | F | F | F | F | T | T | T |

There are tricks to calculate truth tables more efficiently. However, it wouldn’t do you any good to read them; they must come with practice.

2.3 Inference and equivalence

Mathematics is all about relationships. If the statement $\varphi \to \psi$ is always true, then it tells us that whenever $\varphi$ is true then $\psi$ must be as well. If the statement $\varphi \leftrightarrow \psi$ is always true, then it tells us that from the point of view of logic $\varphi$ and $\psi$ are actually the same statement. That is, it is never possible for them to have different truth values.

Definition 2.3.1. A tautology is a statement that is always true, denoted $T$.

Definition 2.3.2. A contradiction is a statement that is always false, denoted $F$.

Definition 2.3.3. A rule of inference is a tautological conditional statement. If $\varphi \to \psi$ is a rule of inference, then we say $\varphi$ implies $\psi$ and write $\varphi \Rightarrow \psi$.

Remember that a conditional statement does not say that its antecedent and consequent are both true. It says that if its antecedent is true, its consequent must also be true.

There are many rules of inference, but we will explore two.

Definition 2.3.4. Modus ponens is the statement that says we may infer $q$ from $p \to q$ and $p$. Symbolically: $[(p \to q) \wedge p] \to q$.
Example. Consider the conditional statement “If you clean your room, then I will give you $10.” If the speaker is trustworthy, and we clean our room, then we may conclude we will be receiving $10.

Another example: “If a function is differentiable, then it is continuous.” (You may not know the meanings of these words, but that is not important.) If this statement is true, and the function $f$ is differentiable, then without any further work we know that $f$ is continuous.

Theorem 2.3.5. Modus ponens is a rule of inference.

Notice that in the definition no claim is made that modus ponens is always true. That is up to us.

Since we know how to use truth tables, we may construct one to show that $[(p \to q) \wedge p] \to q$ is always true. Before reading the table below, try to make your own!

| $p$ | $q$ | $p \to q$ | $(p \to q) \wedge p$ | $[(p \to q) \wedge p] \to q$ |
| --- | --- | --- | --- | --- |
| T | T | T | T | T |
| T | F | F | F | T |
| F | T | T | F | T |
| F | F | T | F | T |

Proof. The above table shows that $[(p \to q) \wedge p] \to q$ is true no matter the truth of $p$ and $q$. $\square$
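Since a tautology is a statement whose final truth-table column is all true, a computer can check one by brute force over all $2^n$ rows. Here is a minimal Python sketch (the helper is_tautology is ours), encoding each arrow $x \to y$ as $\neg x \vee y$, an equivalence we will justify shortly:

```python
from itertools import product

def is_tautology(statement, n_letters):
    """Check a statement by brute force over all 2**n rows."""
    return all(statement(*row)
               for row in product([True, False], repeat=n_letters))

# Modus ponens: [(p -> q) and p] -> q
modus_ponens = lambda p, q: not ((not p or q) and p) or q
print(is_tautology(modus_ponens, 2))  # True
```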

Definition 2.3.6. In the context of conditional statements, transitivity is the rule that if $p$ implies $q$ and $q$ implies $r$, then $p$ implies $r$. Or, $[(p \to q) \wedge (q \to r)] \to (p \to r)$.

The wording of the definition above should make you suspicious that this is not the only time we will see transitivity. In fact, you may have seen it before…

Example. If you give a mouse a cookie, then he will ask for a glass of milk. If he asks for a glass of milk, then he will also ask for a straw. Therefore, if you give a mouse a cookie then he will ask for a straw.

Theorem 2.3.7. Transitivity is a rule of inference.

We could simply write down another truth table and call it a day. In fact, go ahead and make sure you can prove transitivity with a truth table. Once you’re done, we’ll try something slicker to prove transitivity.

Proof. Suppose that $p \to q$ and $q \to r$ are true statements, and that moreover $p$ is true. By modus ponens, $q$ must also be true. Then, $r$ is true, by a second application of modus ponens. Therefore, if $p$ is true, $r$ must be also; this is the definition of $p \to r$. $\square$

Pretty cool, right? We can cite proven theorems to prove a new one.

Does it work the other way? In other words: if I know that $[(p \to q) \wedge (q \to r)] \Rightarrow (p \to r)$, does that mean $(p \to r) \Rightarrow [(p \to q) \wedge (q \to r)]$? Regrettably, no: consider the case where $p$ and $r$ are true but $q$ is false.

Some rules of inference do work both ways. These are called equivalences.

Definition 2.3.8. If $\varphi \leftrightarrow \psi$ is a tautology, then it is an equivalence and $\varphi$ and $\psi$ are equivalent statements. We write $\varphi \equiv \psi$.

Equivalent statements are not the same statements, but they are close enough from the point of view of logic. Equivalent statements are interchangeable, and may be freely substituted with one another.

We will quickly run through many equivalences before slowing down to prove a few.

Algebraic properties of conjunction, disjunction, and negation

Our first batch of statements should look familiar to you; these are the ways that statements “act like” numbers. As you read each one, try to justify to yourself why it might be true. If you aren’t sure of one, make a truth table.

| Equivalence | Name |
| --- | --- |
| $p \equiv \neg(\neg p)$ | Double negation |
| $p \wedge q \equiv q \wedge p$ | Commutativity of conjunction |
| $p \vee q \equiv q \vee p$ | Commutativity of disjunction |
| $(p \wedge q) \wedge r \equiv p \wedge (q \wedge r) \equiv p \wedge q \wedge r$ | Associativity of conjunction |
| $(p \vee q) \vee r \equiv p \vee (q \vee r) \equiv p \vee q \vee r$ | Associativity of disjunction |
| $p \wedge T \equiv p$ | Conjunctive identity |
| $p \vee F \equiv p$ | Disjunctive identity |
| $p \wedge F \equiv F$ | Conjunctive absorption |
| $p \vee T \equiv T$ | Disjunctive absorption |
| $p \wedge \neg p \equiv F$ | Conjunctive cancellation |
| $p \vee \neg p \equiv T$ | Disjunctive cancellation |

Distributive properties

For numbers, multiplication distributes over addition. This is one fact that is likely implemented twice in your head, once for a number: $a(b+c) = ab + ac$, and once for “the negative” (truly the number $-1$): $-(b+c) = -b - c$. However, these rules are one and the same: the distributive law. Likewise, we will have distributive laws for negation, conjunction, and disjunction.

| Equivalence | Name |
| --- | --- |
| $\neg(p \wedge q) \equiv \neg p \vee \neg q$ | DeMorgan’s law |
| $\neg(p \vee q) \equiv \neg p \wedge \neg q$ | DeMorgan’s law |
| $p \wedge (q \vee r) \equiv (p \wedge q) \vee (p \wedge r)$ | Conjunction over disjunction |
| $p \vee (q \wedge r) \equiv (p \vee q) \wedge (p \vee r)$ | Disjunction over conjunction |

DeMorgan’s law is one of the most important facts in basic logic. In fact, we will see it three times before we are through. In English, DeMorgan’s law says that the negation of a conjunction is a disjunction and vice-versa.

To show that $\neg(p \wedge q)$ and $\neg p \vee \neg q$ are equivalent, we can show that the biconditional statement $\neg(p \wedge q) \leftrightarrow (\neg p \vee \neg q)$ is a tautology. Thinking about the definition of the biconditional gives us another way: we can show that the truth tables of $\neg(p \wedge q)$ and $\neg p \vee \neg q$ are the same; in other words, that both statements are always true and false for the same values of $p$ and $q$.

Proof (of DeMorgan’s law for conjunction). Let $p$ and $q$ be statements. The statements $\neg(p \wedge q)$ and $\neg p \vee \neg q$ are equivalent, as shown by the below truth table.

| $p$ | $q$ | $\neg(p \wedge q)$ | $\neg p \vee \neg q$ |
| --- | --- | --- | --- |
| T | T | F | F |
| T | F | T | T |
| F | T | T | T |
| F | F | T | T |

Therefore, $\neg(p \wedge q) \equiv \neg p \vee \neg q$. $\square$

You’ll notice a few columns are skipped in the above truth table. After a few tables, it is more convenient to skip “easy” columns. For example, we know that $p \wedge q$ is true only when $p$ and $q$ are both true, and false everywhere else. Therefore, we can “flip” that column to quickly get the column for $\neg(p \wedge q)$.

As an exercise, prove DeMorgan’s other law.
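(If you like, a computer can compare the two columns row by row. This minimal sketch checks the law just proven; adapting it to the other law is a one-line change.)

```python
from itertools import product

for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))  # columns agree in every row
print("equivalent")
```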

Properties of the conditional

Finally, we come to the properties of conditional (and biconditional) statements. These do not have “obvious” numerical counterparts, but are useful all the same.

| Equivalence | Name |
| --- | --- |
| $p \to q \equiv \neg p \vee q$ | Material implication |
| $\neg(p \to q) \equiv p \wedge \neg q$ | False conditional |
| $p \to q \equiv \neg q \to \neg p$ | Contraposition |
| $p \leftrightarrow q \equiv (p \to q) \wedge (q \to p)$ | Mutual implication |

Material implication is the surprising result that the conditional is just a jumped-up disjunction! In fact, DeMorgan’s law, material implication, and mutual implication together prove that all of the connectives we’ve studied can be implemented with just two: negation and one other. However, this would make things difficult to read, so we allow the redundancy of five connectives.

Let’s prove material implication with a truth table, and then we will prove contraposition with another method.

Proof (of material implication). Let $p$ and $q$ be statements and observe that the truth tables for $p \to q$ and $\neg p \vee q$ are the same:

| $p$ | $q$ | $p \to q$ | $\neg p \vee q$ |
| --- | --- | --- | --- |
| T | T | T | T |
| T | F | F | F |
| F | T | T | T |
| F | F | T | T |

Therefore, $p \to q \equiv \neg p \vee q$. $\square$

There is another way to think about material implication. The statement $p \to q$ means that $p$ never happens without $q$. Another way to say this is that $q$ happens, or else $p$ did not; so, $\neg p \vee q$.

Before we prove contraposition by the other method, we must justify said method with a theorem.

Theorem 2.3.9. Statements $\varphi$ and $\psi$ are equivalent if and only if $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ are both rules of inference.

The phrase “if and only if” in a mathematical statement means that the statement “works both ways.” In this case, $\varphi \equiv \psi$ means $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ are both rules of inference; and conversely, $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ both being rules of inference means $\varphi \equiv \psi$. It means that “$\varphi \equiv \psi$” and “$\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ are both rules of inference” are equivalent statements. Fortunately, we already have a rule that justifies this equivalence.

Proof (of Theorem 2.3.9). Mutual implication says that $p \leftrightarrow q \equiv (p \to q) \wedge (q \to p)$. $\square$

It’s that easy. Whenever we have a conditional statement and its converse (the order reversed), we have the biconditional; likewise, we may trade in a biconditional for two converse conditional statements.

Now we will prove contraposition by showing that $p \to q$ implies $\neg q \to \neg p$, and vice-versa. Before you read the below proof, write down these rules: material implication, double negation, and commutativity of disjunction.

Proof (of contraposition). Suppose $p$ and $q$ are statements and that $p \to q$ is true. By material implication, $\neg p \vee q$ is true. Commutativity says this is the same as $q \vee \neg p$. So, we may apply material implication backwards, negating $q$ this time, to get $\neg q \to \neg p$; therefore, $p \to q$ implies $\neg q \to \neg p$.

Conversely, suppose $\neg q \to \neg p$ is true. Then material implication (and double negation) gives $q \vee \neg p$. We may commute the disjunction and apply material implication again, negating $\neg p$, to get $p \to q$.

By mutual implication, $p \to q \equiv \neg q \to \neg p$. $\square$

Takeaways

Chapter 2 problems

3. Predicate logic

Videos:

3.1 Predicates

Basic propositional logic as you learned it in the preceding chapter is powerful enough to give us the tools we will need to prove many theorems. However, it is deficient in one key area. Consider the following example.

Example. If Hypatia is a woman and all women are mortal, then Hypatia is mortal.

We can all agree that this statement is a tautology; that is, if it is true that Hypatia is a woman (she was; she was a Greek mathematician contemporary with Diophantus) and if it is true that all women are mortal (as far as we know this is the case) then it must follow that Hypatia is mortal (she was murdered in 415).

So the statement is a tautology. If so, the truth table would confirm it. Let $(p \wedge q) \to r$ symbolize the statement, where $p$ stands for “Hypatia is a woman,” $q$ for “all women are mortal,” and $r$ for “Hypatia is mortal.”

| $p$ | $q$ | $r$ | $p \wedge q$ | $(p \wedge q) \to r$ |
| --- | --- | --- | --- | --- |
| T | T | T | T | T |
| T | T | F | T | F |
| T | F | T | F | T |
| T | F | F | F | T |
| F | T | T | F | T |
| F | T | F | F | T |
| F | F | T | F | T |
| F | F | F | F | T |

Oh no! Our second row shows that the statement is false when $r$ is false but $p$ and $q$ are true. What gives?

Well, consider the statement “If pigs fly and the Pope turns into a bear, then Bob will earn an A on the exam.” Now it is not so clear that this is a tautology, because pigs, the Pope, and Bob’s grade presumably have nothing to do with one another.

But Hypatia and the mortality of women do. What we will need is the ability to signify that two statements are related, whether in subject (e.g. Hypatia) or predicate (e.g. mortality). This is where predicate logic comes in to help.

Definition 3.1.1. A predicate is a function $P : \Omega \to \{T, F\}$ that takes one or more subjects (members of some domain $\Omega$) and assigns to them all either true $T$ or false $F$.

Predicates are typically denoted with capital letters like $P$, $Q$, and $R$. Subjects that are known are given lowercase letters like $a$, $b$, and $c$, unless there is something more reasonable to pick, like $h$ for Hypatia. Subjects whose values are unknown or arbitrary are denoted with lowercase $x$, $y$, $z$, etc.

The definition of predicate above is much more complicated than it needs to be; this is so that we can take the central idea (a function) and continue using it throughout the book. You know a function: an assignment $f(x)$ that gives every input $x$ exactly one output $f(x)$. We write $f : X \to Y$ to signify that $f$ is a function whose inputs are elements of the set $X$ and whose outputs are elements of the set $Y$.

So, in the above definition, the predicate $P$ is a function that takes a subject, a member of the set $\Omega$, and assigns it exactly one truth value.

(Take note that propositions, which you studied in the last chapter, may be regarded as constant predicates – functions whose value is the same regardless of the input, like $f(x) = 3$.)

Examples. Letting $W(x)$ be the predicate “$x$ is a woman” on the domain of humans and $h$ stand for Hypatia, $W(h)$ is the statement “Hypatia is a woman.” Since this statement is true, the function $W$ assigns $T$ to $h$.

Let $P(x)$ be the predicate “$x + 3 = 5$” on the domain $\mathbf{Z}$ and let $Q(x, y)$ be the predicate “$x$ orbits $y$” on the domain of celestial objects in the sky. The statement $P(2)$ is true, but $P(3)$ is false. If $e$ stands for the earth and $s$ for the sun, then $Q(s, e)$ is false but $Q(e, s)$ is true.

A predicate, once evaluated at a subject, is a statement. Therefore, these statements may be combined into compound statements using the logical connectives we have just studied.

Examples. As before, let $P(x)$ be the predicate “$x + 3 = 5$” on the domain $\mathbf{Z}$ and let $Q(x, y)$ be the predicate “$x$ orbits $y$” on the domain of celestial objects in the sky, where $e$ stands for the earth and $s$ for the sun.

The statements $P(2) \vee Q(s, e)$, $\neg P(3)$, and $Q(s, e) \to P(2)$ are true.

The statements $P(2) \wedge Q(s, e)$ and $P(2) \to Q(s, e)$ are false.

3.2 Quantifiers

This is all well and good, you think, but what about the statement “All women are mortal”? How can we make “all women” the subject of a statement?

Quantifiers allow us to say that a predicate is true for all, or for at least one undetermined, object in its domain.

Definition 3.2.1 Let $P(x)$ be a predicate. The universal statement $\forall x\, P(x)$ says that $P(x)$ is true for all $x$ in its domain. The symbol $\forall$ (“for all”) is called the universal quantifier.

Example. The domain is very important. If $P(x)$ is the predicate “$x$ takes discrete math” then $\forall x\, P(x)$ – “Everyone takes discrete math” – is true on the domain of computer science students at Coastal Carolina, but not, say, on the domain of all people in South Carolina.

Definition 3.2.2 Let $P(x)$ be a predicate. The existential statement $\exists x\, P(x)$ says that $P(x)$ is true for at least one $x$ in its domain. The symbol $\exists$ (“there exists”) is called the existential quantifier.

Example. Again let $P(x)$ be the predicate “$x$ takes discrete math.” The existential statement $\exists x\, P(x)$ says “Someone takes discrete math.” This is now true for people in South Carolina – pick a student at Coastal, or Clemson, or USC – but not true on the domain of dogs. Wouldn’t that be cute, though?

The statement operated on by a quantifier is called its scope. Without parentheses, a quantifier is considered to apply only to the statement it is next to and no others.

Examples. Let $P(x)$ be the predicate “$x$ takes discrete math” and $Q(x)$ be the predicate “$x$ is a math major.”

There are two important quantified statements: “All $P$’s are $Q$’s” and “Some $P$ is a $Q$.” These statements are all over the place in mathematics. We like to characterize mathematical objects by saying all objects of one type also belong to another; or, by saying we can find an object of one type that is another type.

In “All $P$’s are $Q$’s,” the word “all” suggests that the universal quantifier is appropriate. What connective should be used to combine the predicates? Observe that the statement $\forall x\, (P(x) \wedge Q(x))$ says that every $x$ is both a $P$ and a $Q$. This isn’t what we mean to say; we mean to say if something is a $P$ then it will also be a $Q$. So, the correct rendering of “All $P$’s are $Q$’s” is $\forall x\, (P(x) \to Q(x))$.

For the statement “Some $P$’s are $Q$’s,” we mean to say that there is an object that is at once a $P$ and a $Q$. Therefore, this time the conjunction is appropriate: we write $\exists x\, [P(x) \wedge Q(x)]$.

| Statement | Implementation |
| --- | --- |
| “All $P$’s are $Q$’s” | $\forall x\, (P(x) \to Q(x))$ |
| “Some $P$’s are $Q$’s” | $\exists x\, (P(x) \wedge Q(x))$ |
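On a finite domain these renderings can be evaluated directly: Python's all and any play the roles of $\forall$ and $\exists$. In this minimal sketch the domain and predicates are made-up stand-ins, with the conditional encoded as $\neg P(x) \vee Q(x)$ via material implication from the previous chapter:

```python
students = ["alice", "bob", "carol"]
P = lambda x: True             # "x takes discrete math" (a stand-in)
Q = lambda x: x != "bob"       # "x is a math major" (a stand-in)

all_Ps_are_Qs = all((not P(x)) or Q(x) for x in students)  # ∀x (P(x) → Q(x))
some_P_is_a_Q = any(P(x) and Q(x) for x in students)       # ∃x (P(x) ∧ Q(x))
print(all_Ps_are_Qs, some_P_is_a_Q)  # False True
```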

Finally, what is to be done with statements with two or more subjects? The ordering of the quantifiers matters. Here are the four possible combinations of universal and existential quantifiers for two subjects, as well as a diagram corresponding to each.

Examples. Let $R(x, y)$ be the predicate “$x$ and $y$ are classmates” over the domain of students at some school. Note that, in this case, the order of the subjects does not affect the truth of the statement.

The statement $\exists xy\, R(x, y)$ means “There are two students who are classmates.” In other words, there is at least one student who is paired with at least one other student. Some writers say $\exists x \exists y\, R(x, y)$ instead, but when the quantifiers are the same we will “collapse” the notation.

Figure. On the left, one object is paired with one object (existential-existential). On the right, one object is paired with many objects (existential-universal).

The statement $\exists x \forall y\, R(x, y)$ means “There is a student who is classmates with every student.” We have one student who (somehow!) is classmates with the entire student body.

The statement $\forall x \exists y\, R(x, y)$ means “Every student has a classmate.” Notice that their classmate need not be the same person. If we quantify universally first, then each existential object may be different. In the preceding example, we quantified existentially first, so that one person was classmates with everybody.

Figure. On the left, many objects are each paired with an object (universal-existential). On the right, many objects are paired with many objects (universal-universal).

The statement $\forall xy\, R(x, y)$ means “Every student is classmates with every other student.” Every object in our domain is paired up with every other object in the domain, including themselves! (Don’t think too hard about that for this example.)

3.3 Inferences and equivalences for quantifiers

Quantifiers evaluated on a predicate are statements, since they are functions to $\{T, F\}$. Therefore, some quantified statements imply others, and some pairs imply each other. In this section we will learn a few important rules for quantifiers, before finally proving that our opening statement about Hypatia is indeed a tautology.

Theorem 3.3.1 (DeMorgan’s laws) Let $P(x)$ be a predicate. Then $\neg \forall x\, P(x) \equiv \exists x\, \neg P(x)$ and $\neg \exists x\, P(x) \equiv \forall x\, \neg P(x)$.

Does that name look familiar? Here is our second application of DeMorgan’s laws. (The third will appear in the exercises.) You will see how they are used in the proof.

Proof. Consider the universal statement $\forall x\, P(x)$ on the domain $\Omega = \{x_1, x_2, \ldots, x_n\}$. We may rewrite
$$\forall x\, P(x) = P(x_1) \wedge P(x_2) \wedge \cdots \wedge P(x_n)$$
and
$$\exists x\, P(x) = P(x_1) \vee P(x_2) \vee \cdots \vee P(x_n).$$
Therefore,
$$\neg \forall x\, P(x) = \neg (P(x_1) \wedge P(x_2) \wedge \cdots \wedge P(x_n)).$$
Using DeMorgan’s law,
$$\neg \forall x\, P(x) = \neg P(x_1) \vee \neg P(x_2) \vee \cdots \vee \neg P(x_n),$$
which is the same as the statement $\exists x\, \neg P(x)$.
That $\neg \exists x\, P(x) \equiv \forall x\, \neg P(x)$ may be proven in the same way. $\square$

The idea is that a universal statement is a “big conjunction” and an existential statement is a “big disjunction.” The proof follows from the DeMorgan’s laws you learned in the preceding chapter.
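On a finite domain, this “big conjunction”/“big disjunction” reading makes the laws easy to test mechanically; a minimal sketch with a made-up predicate:

```python
domain = range(10)
P = lambda x: x % 2 == 0  # "x is even," a stand-in predicate

lhs = not all(P(x) for x in domain)  # ¬∀x P(x)
rhs = any(not P(x) for x in domain)  # ∃x ¬P(x)
print(lhs == rhs)  # True; the same check works for the other law
```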

The next two theorems deal with how universal and existential statements may imply one another.

Theorem 3.3.2 (Universal instantiation) If $a$ is a member of the domain of the predicate $P$, then $\forall x\, P(x) \Rightarrow P(a)$.

Proof. If $P$ is true for every member of its domain and $a$ is a member of said domain, $P$ must be true for $a$. $\square$

Theorem 3.3.3 (Existential generalization) If $a$ is a member of the domain of the predicate $P$, then $P(a) \Rightarrow \exists x\, P(x)$.

Proof. Let $a$ be in the domain of $P$. If $P(a)$ is true, then $P$ is true for some member of its domain. $\square$

There are two comments to make about these two theorems. First, universal generalization and existential instantiation are not valid:

Examples. “I was able to easily pay off my college loans, therefore everyone can” is an example of invalid universal generalization.

“Someone in this room killed the wealthy magnate, so it must have been you!” is an example of invalid existential instantiation.

Finally, transitivity gives us that a universal statement implies an existential one: $\forall x\, P(x) \Rightarrow P(a) \Rightarrow \exists x\, P(x)$.

We are ready to prove our tautology is, in fact, a tautology. Remember that the statement was “If Hypatia is a woman and all women are mortal, then Hypatia is mortal.”

We have seen that “Hypatia is a woman” may be rendered as $W(h)$ for the predicate $W(x)$ being “$x$ is a woman” and $h$ being Hypatia. Likewise, “Hypatia is mortal” could be rendered $M(h)$ where $M(x)$ is “$x$ is mortal.” Finally, the statement “All women are mortal” would be $\forall x\, (W(x) \to M(x))$.

Theorem 3.3.4 The statement $[P(a) \wedge \forall x\, (P(x) \to Q(x))] \to Q(a)$ is a tautology.

We have changed our labeling to the more standard $P$, $Q$, and $a$ to emphasize that this statement is always true no matter what we are talking about.

Proof. Suppose that for some predicate $P$ and some $a$ in the domain of $P$ that $P(a)$ is true, and that for some predicate $Q$ the statement $\forall x\, (P(x) \to Q(x))$ is true.

By universal instantiation, $\forall x\, (P(x) \to Q(x))$ means that $P(a) \to Q(a)$ is true.

By modus ponens on $P(a)$ and $P(a) \to Q(a)$, we know $Q(a)$ must be true.

Therefore, $[P(a) \wedge \forall x\, (P(x) \to Q(x))] \to Q(a)$ is always true. $\square$

Takeaways

Chapter 3 problems

4. Proofs

Review:

Videos:

Now we have things to talk about – sets and their elements – and ways of talking about those things – propositional and predicate logic. This opens the door to the true game of mathematics: proof.

Mathematics stands apart from the natural sciences because its facts must be proven. Whereas theories in the natural sciences can only be supported by evidence, mathematical facts can be shown to be indisputably true based on the agreed definitions and axioms.

For example, once you have agreed upon the definitions of “integer,” “even,” and the rules of arithmetic, the statement “An even integer plus an even integer is an even integer” is without doubt. In fact, you will prove this statement yourself in the exercises to this chapter.

Now, this introduction is not meant to imply that mathematical epistemology (the study of knowledge) is without its issues. For example: What definitions and axioms are good ones? (“Is there such a thing as a real number?” can entertain you for a while.) What sorts of people decide which mathematical research is worth doing, and who decides which proofs are convincing? Are all published mathematical proofs correct? (They aren’t.) What is the role of computers in mathematical proof?

Fortunately for us, we don’t have to worry about these questions. (Unless we want to. They’re quite interesting.) In this chapter we will simply learn what a mathematical proof is, learn some basic facts about the integers, and practice writing direct and indirect proofs about mathematical facts that are not particularly in dispute.

4.1 Direct proofs

What is a proof? In this book we will adopt Cathy O’Neil’s position that a proof is a social action between speaker/writer and listener/reader that convinces the second party that a statement is true.

Several proofs have been written in this book so far. Many begin with an italicized Proof. and end in a little $\square$, as a way to set them off from definitions, theorems, comments, and examples. Hopefully they have all been convincing to you. (DW: If not, my e-mail address is somewhere on this webpage.)

As you practice writing your own proofs, take for an audience someone else in your position: a new student of discrete mathematics. Imagine the reader knows slightly less than you do. A good proof gently guides the reader along, making them feel smart when they anticipate a move.

It is also this book’s position that talking about proofs is nearly useless. They must be read and practiced. With that in mind, let’s get to writing some proofs about integers.

We will take for granted that sums, differences, products, and powers of integers are still integers. (Depending on how the set of integers is defined, these qualities may be “baked in” to the definition.) Every other fact will either be stated as a definition or a theorem, and proven in the latter case.

Definition 4.1.1. An integer nn is even if there exists an integer kk such that n=2kn=2k.

Definition 4.1.2. An integer nn is odd if there exists an integer kk such that n=2k+1n=2k+1.

For example, 28=2(14)28=2(14), and 39=2(19)+139=2(19)+1.

Theorem 4.1.3. If an integer nn is odd, so is n2n^2.

This statement may be obvious to you. If so, great; but we are trying to learn about the writing, so it is best to keep the mathematics simple. (It will only get a little more challenging by the end of the chapter.) To prove this statement, we must start with an arbitrary odd integer and prove that its square is odd. The integer in question must be arbitrary: if we only prove the statement for 3, we must still separately justify 5, 7, 9, and so on.

Proof. Let nn be an odd integer.

The first step in the proof is to assume any of the theorem’s hypotheses. The theorem says that if we have an odd integer, its square will be odd; so, we are entitled to assume an odd integer. However, there is nothing more we may assume. Fortunately, we know what it means for an integer to be odd.

That means there exists an integer kk such that n=2k+1n=2k+1.

A great second step is to “unpack” any definitions involved in the assumptions made. We assumed nn was odd. Well, what does that mean? However, now we are “out of road.” Let’s turn our attention to the object we mean to say something about: n2n^2.

Therefore, n2=(2k+1)2=4k2+4k+1=2(2k2+2k)+1, n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2+2k) + 1, which is an odd number by definition.

Remember: sums, powers and products of integers are integers – this is the only thing we take for granted, other than, y’know, arithmetic – and so 2k2+2k2k^2 + 2k is an integer, call it mm. Then 2m+12m+1 is odd. The final step – obviously rendering the result in the explicit form of an odd number – is a good one to take because we assume our reader to be as new at this as we are. They likely recognize 4k2+4k+14k^2+4k+1 is odd, but we take the additional step to ensure maximum clarity.

Thus, if nn is an odd integer so is its square. \square

Definition 4.1.4. Let aa be a nonzero integer and let nn be an integer. We say aa divides nn or that nn is divisible by aa if there exists an integer kk such that n=akn=ak. We write ana|n when this is the case.

Theorem 4.1.5. Divisibility has the following three properties:

1. Every nonzero integer nn divides itself: nnn|n.
2. If mm and nn are positive integers where mnm|n and nmn|m, then m=nm=n.
3. If aa, bb, and cc are integers, with aa and bb nonzero, where aba|b and bcb|c, then aca|c.

(These three properties together define something called a partial order, which is studied in the last chapter. Foreshadowing!)

You will prove two of these properties for yourself in the exercises. Here is a demonstration of the proof of the second property.

Proof. Suppose that mm and nn are positive integers where mnm|n and nmn|m.

As before, we are assuming only what is assumed in the statement of the theorem: that mm and nn are positive integers that divide one another. By the way: why did we have to say they are positive?

That means that there exist integers k1k_1 and k2k_2 such that n=k1mn=k_1m and m=k2nm=k_2n.

The second step is usually to apply relevant definitions.

Combining these two statements, we have m=(k1k2)mm=(k_1k_2)m. Since mm is not zero, we see 1=k1k21=k_1k_2. Because k1k_1 and k2k_2 must be integers by definition (and positive, since mm and nn are), the only way that k1k2=1k_1k_2=1 is for k1=k2=1k_1=k_2=1.

If we didn’t have the restriction that k1k_1 and k2k_2 were integers, then any pair with k1=1/k2k_1=1/k_2 would work. For integers, k1k2=1k_1k_2=1 forces k1=k2=1k_1=k_2=1 or k1=k2=1k_1=k_2=-1, and positivity rules out the second option. This is why the theorem requires mm and nn to be positive.

Therefore, if mm and nn are positive integers that divide one another, then m=nm=n. \square

In the next section we will explore indirect proofs and, from there, bidirectional proofs. We highly recommend trying some exercises before moving on.

4.2 Indirect and bidirectional proofs

Consider the following theorem.

Theorem 4.2.1 If nn is an integer such that n2n^2 is odd, then nn is odd.

This theorem may seem the same as Theorem 4.1.3, but note carefully that it is not: it is the converse Q(n)P(n)Q(n) \to P(n) to the previous theorem’s P(n)Q(n)P(n) \to Q(n).

Let’s try to imagine a direct proof of this theorem. Since n2n^2 is odd, there exists an integer kk such that n2=2k+1n^2 = 2k+1. And then, seeking to prove a fact about nn, we take the positive square root of each side: n=2k+1n = \sqrt{2k+1}. And…ew. Have we even proven anything about square roots? (We haven’t, and we won’t.)

We shall, instead, employ some sleight of hand. Recall from Chapter 2 that a conditional statement pqp \to q is equivalent to its contrapositive ¬q¬p\neg q \to \neg p.

Proof (of Theorem 4.2.1). Suppose that nn is even. Then there exists an integer kk such that n=2kn=2k. In that case, n2=(2k)2=4k2=2(2k2),n^2 = (2k)^2 = 4k^2 = 2(2k^2), which is even. Therefore for n2n^2 to be odd, nn must also be odd. \square

Another equivalence from Chapter 2 was that pqp \to q and qpq \to p combine to create pqp \leftrightarrow q. That is, we can prove an “if and only if” statement by proving both the “forward” and “backward” directions (which is which is a matter of perception). Therefore, in this chapter we have proven the following statement:

Theorem 4.2.2 An integer nn is odd if and only if n2n^2 is.

Remember that an “if and only if” statement is true if the component statements are both true or both false. So, this theorem also tells us that an integer is even if and only if its square is.

Theorem 4.2.3 Let nn be an integer. The integer nn is odd if and only if n2+4n+1n^2+4n+1 is even.

Proof. Let nn be an integer. First, suppose nn is odd. That means that there exists an integer kk where n=2k+1n=2k+1. So, n2+4n+1=(2k+1)2+4(2k+1)+1=4k2+4k+1+8k+4+1, n^2 + 4n +1 = (2k+1)^2 + 4(2k+1) + 1 = 4k^2 + 4k + 1 + 8k + 4 + 1, which may be written as 2(2k2+6k+3),2(2k^2 + 6k + 3), which is even. Therefore if nn is odd then n2+4n+1n^2+4n+1 is even.

Conversely, suppose nn is even. That means that n=2kn=2k for some integer kk. In that case, n2+4n+1=(2k)2+4(2k)+1,n^2 + 4n + 1 = (2k)^2 + 4(2k) + 1, which is 2(2k2+4k)+1,2(2k^2+4k)+1, which is odd. Therefore for n2+4n+1n^2+4n+1 to be even, nn must be odd.

So, nn is odd if and only if n2+4n+1n^2+4n+1 is even. \square

4.3 Counterexamples

When proving a universal statement on a domain, it never suffices to just do examples (unless you do every example, which is usually not feasible or possible). However, remember from Chapter 3 that the negation of a universal statement is an existential one. You can prove that "every PP is a QQ" is false by finding just one PP that isn’t a QQ.

Examples. The statement “all prime numbers are odd” is false because 22 is prime and even.

The statement “all odd numbers are prime” is false because 99 is odd and not prime.

Takeaways

Chapter 4 problems

5. Recursion and integer representation

Videos:

One of the most important ideas in discrete mathematics is that of recursion, which we will explore in one way or another for many of the remaining chapters in the course. In this chapter, we will explore it through the idea of integer representation; representing integers in bases other than the common base 10. Bases you are likely to encounter in computer science contexts are base 2 (binary), base 8 (octal), and base 16 (hexadecimal). We will learn how to represent an integer in binary using a recursive algorithm, and then we will cheat our way into octal and hexadecimal.

5.1 Recursion

Recursion is when an algorithm, object, or calculation is constructed recursively.

Just kidding. Reread this joke after the chapter and it’ll make more sense.

Recursion means that an algorithm, object, or calculation is written, defined, or performed by making reference to itself. It is something more easily understood by example, and there will be plenty as we proceed. For now, consider the following example.

Definition 5.1.1 Let nn be a positive integer. Then n!n! ("nn factorial") is the product of all positive integers at most nn: n!=1×2××n.n! = 1 \times 2 \times \cdots \times n. Furthermore, we define 0!=10!=1.

Anyone questioning the usefulness of n!n! will have their questions answered in excruciating detail over the course of Chapters 9-13. If the reader has had calculus, then they may recognize n!n! as the denominator of the terms in the Taylor expansion of the function exe^x; it is good for “absorbing” constants created by taking derivatives. However, we are not interested in any of its uses right now.

Example. Here are the values of a few factorials: 0!=10!=1, 1!=11!=1, 2!=2×1=22!=2\times 1 = 2, 3!=3×2×1=63!=3\times 2 \times 1 = 6, 4!=4×3×2×1=244!=4 \times 3 \times 2 \times 1 = 24, and 5!=5×4×3×2×1=1205!=5\times 4\times 3 \times 2 \times 1=120.

You may have already noticed the recursion at work. Take for instance 5!=5×4×3×2×15!=5 \times 4 \times 3 \times 2 \times 1. We see the value of 4!4! lurking here: 5!=5×4!5!=5 \times 4!.

In fact, for all positive nn, we can make the alternative (usually more useful) definition n!=n(n1)!n!=n(n-1)!. This is the recursive definition of the factorial, because we are defining the factorial in terms of the factorial.

Why is this okay? Well, we’re really collapsing a lot of work into two lines: it may make more sense to break it down one step at a time. First, define 0!=10!=1. This definition will make sense in a later chapter, but remember that mathematical definitions don’t necessarily need to “make sense.”

Next, define 1!=1×0!1!=1 \times 0!. This definition is valid, because the number 11, the operation ×\times, and the number 0!0! have all been defined. And when we go to define 2!=2×1!2!=2 \times 1!, we have already defined all of these objects as well. And so on. Therefore, the following definition is a valid one.

Definition 5.1.2 Let nn be a natural number. Then n!n! is equal to 1 when n=0n=0, and n(n1)!n(n-1)! otherwise.
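If you enjoy seeing definitions run, here is a short Python sketch of Definition 5.1.2. The function name factorial is our own illustrative choice; Python's built-in math.factorial computes the same values.

```python
def factorial(n):
    """Definition 5.1.2 translated directly: n! is 1 when n is 0,
    and n * (n - 1)! otherwise."""
    if n == 0:
        return 1
    return n * factorial(n - 1)

print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```

Notice how the code calls itself on a smaller input, exactly as the definition refers to a smaller factorial.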

5.2 Binary numbers

Now that we know how recursion works (or at least have more of an inkling than we did upon starting the chapter), we will see another example of it in the calculation of the binary representation of a number.

When we speak of the number one thousand five hundred eight or 1,5081,508, we are implicitly talking about the composition of the number in terms of powers of 10: 1,508=1×103+5×102+0×101+8×100,1,508 = 1\times 10^3 + 5 \times 10^2 + 0 \times 10^1 + 8 \times 10^0, remembering that b0=1b^0=1 for all nonzero bb.

African and Asian mathematicians were writing in base-10, or decimal notation, as early as 3,000 BCE. Presumably the prevalence of decimal notation throughout human history is due to the fact that most of us have ten fingers and ten toes, making the base convenient. However, computers only have two fingers and two toes.

That last line is a joke, but here’s the idea: Computers read information by way of electrical signals, which are either on or off. So the simplest way for a computer to process information is in terms of two states. Here is a higher-level example: suppose you are writing some code. Is it easier to have a single if-then statement (two possible states), or nine (ten possible states)? This is why base-2, or binary numbers, are so prevalent in computer science.

The first step to becoming adept with binary numbers is to know many powers of 22. Fortunately, powers of 22 are easy to calculate: simply multiply the last value by 22 to get the next one. (This is a recursive definition for 2n2^n!)

nn 00 11 22 33 44 55 66 77 88 99
2n2^n 11 22 44 88 1616 3232 6464 128128 256256 512512

Algorithm 5.2.1 Let nn be a natural number. To write the dd-digit binary expansion (ad1ad2a1a0)2(a_{d-1}a_{d-2}\ldots a_1a_0)_2 of nn:

1. If nn is 00 or 11, its expansion is the single digit nn.
2. Otherwise, find the highest power of 22 that is no more than nn; this power is 2d12^{d-1}, and ad1=1a_{d-1}=1.
3. Find the binary expansion of the remainder n2d1n-2^{d-1}. Its digits fill the matching positions of nn's expansion, and any position left unfilled holds a 00.

This is perhaps a little intimidating at first glance, so let’s press on with an example.

Example. Let’s calculate the binary expansion of n=1,508n=1,508. Our target nn is neither 00 nor 11, so we must find the highest power of 22 that is no more than nn. This number is 210=1,0242^{10}=1,024. Therefore, our binary number will have eleven digits, and right now it looks like (1a9a8a1a0)2.(1a_9a_8\ldots a_1a_0)_2.

To find the values of the remaining digits a9a_9, a8a_8, etc., let’s find the binary expansion of the remainder 1,5081,024=4841,508 - 1,024 = 484.

Here’s the recursion, by the way: to find the binary expansion of nn, we must find the binary expansion of a smaller number.

We repeat the process. The highest power of 22 that is no more than 484484 is 28=2562^8=256. So, the binary expansion of 484484 is the nine-digit number (1a7a6a1a0)2.(1a_7a_6\ldots a_1a_0)_2. Be aware that it is okay to reuse the names aia_i for the unknown digits, because the algorithm says they are the same for both numbers.

Next, we find the second remainder 484256=228.484-256=228. The highest power of 22 that is no more than 228228 is 27=128.2^7=128. Continue the example and see for yourself that the binary expansion of 228228 is (11100100)2(11100100)_2. (Your remainders after 228228 will be 100100, 3636, 44, and 00.)

Now, the binary expansion of 484484 is (111100100)2(111100100)_2 and the binary expansion of 1,5081,508 is (1a9111100100)2.(1a_9111100100)_2.

The algorithm tells us what to do with our missing a9a_9: since it does not appear in the expansions of any of our remainders, it is 0. Therefore, 1,508=(10111100100)2.1,508 = (10111100100)_2.

This example was made much more tedious than it needed to be so that the recursive steps were laid bare. In practice, you will simply continue subtracting off powers of 22 until none of the target number remains, writing a 11 if you subtracted a particular power of 22 and a 00 if you didn’t. Since we had to subtract 1,0241,024, 256256, 128128, 6464, 3232, and 44, we write a 11 for the eleventh, ninth, eighth, seventh, sixth, and third digits, and a 00 everywhere else.
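As a sketch of how the recursion might be coded, here is one hypothetical Python version of the procedure; the helper name binary_digits is ours, and Python's built-in bin serves as a check.

```python
def binary_digits(n):
    """Peel off the highest power of 2 that is no more than n, then
    recursively expand the remainder (the idea of Algorithm 5.2.1)."""
    if n in (0, 1):
        return [n]
    p = 1
    while 2 * p <= n:       # highest power of 2 that is no more than n
        p *= 2
    rest = binary_digits(n - p)
    width = p.bit_length() - 1          # position of the leading 1
    # digits that appear in no remainder's expansion are 0
    return [1] + [0] * (width - len(rest)) + rest

print(binary_digits(1508))  # [1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0]
print(bin(1508))            # 0b10111100100
```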

You have no doubt noticed that our binary numbers are all enclosed in the notation ()2()_2. This is to signify that (101)2(101)_2, for example, is referring to the number five and not the number one hundred one.

5.3 Octal and hexadecimal numbers

The word octal refers to the base-8 system and hexadecimal refers to the base-16 system. These two bases are useful abbreviations of binary numbers: octal numbers are used to modify permissions on Unix systems, and hexadecimal numbers are used to encode colors in graphic design.

We say “abbreviation” because of the following algorithm.

Algorithm 5.3.1 Suppose nn is an integer with a base bb representation and we desire to know its base bkb^k representation, for some integer k>1k > 1. Then we may group the digits of nn in base bb into groups of kk and rewrite each group as a digit in base bkb^k. The resulting string is the expansion of nn in base bkb^k.

The idea is that it is easy to go from some base to a base that is a power of the original base. Consider the following example.

Example. Write the number 236236 in octal and in hexadecimal.

The starting point will be to write 236236 in binary. Since 236=128+64+32+8+4,236=128+64+32+8+4, its binary expansion is (11101100)2(11101100)_2.

To write 236236 in octal, we will regroup the binary digits of 236236 by threes because 8=238=2^3. However, there are eight digits. No worries; we will simply add a meaningless 00 as the leftmost digit of the binary expansion. Those groups are: 011011, 101101, and 100100 from left to right. In order, those numbers are 33, 55, and 44; so, 236236 is (354)8(354)_8.

Observe that in a base bb number system, the digits are 0,1,2,,b10, 1, 2, \ldots, b-1. Therefore the highest digit we will see in octal is 77.

However – what about hexadecimal? Its set of digits is larger than our usual base-10 set. It is standard usage to use AA through FF to represent 1010 through 1515.

Decimal digit 1010 1111 1212 1313 1414 1515
Hexadecimal digit AA BB CC DD EE FF

Since 16=2416=2^4, to write 236236 in hexadecimal we will group the digits in its binary expansion by fours this time. Those groups are 11101110 and 11001100, which represent fourteen and twelve respectively. So, 236=(EC)16236=(EC)_{16}.
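A possible Python rendering of Algorithm 5.3.1 for the case b = 2: pad the binary digits on the left, group them by k, and read each group as one digit. The name regroup is our own choice; oct and hex are Python's built-in checks.

```python
def regroup(bits, k):
    """Group binary digits into blocks of k (padding on the left with
    meaningless zeros) and read each block as a single base-2**k digit."""
    bits = [0] * (-len(bits) % k) + bits
    digit = "0123456789ABCDEF"
    return "".join(digit[int("".join(map(str, bits[i:i+k])), 2)]
                   for i in range(0, len(bits), k))

bits = [1, 1, 1, 0, 1, 1, 0, 0]   # 236 in binary
print(regroup(bits, 3))           # 354 (octal)
print(regroup(bits, 4))           # EC  (hexadecimal)
print(oct(236), hex(236))         # 0o354 0xec
```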

To summarize our example, here are four different representations of the number 236236.

Decimal Binary Octal Hexadecimal
236236 (11101100)2(11101100)_2 (354)8(354)_8 (EC)16(EC)_{16}

Takeaways

Chapter 5 problems

6. Sequences and sums

Videos:

6.1 Sequences

Remember that a set has no order. For example, {1,2}\{1,2\} and {2,1}\{2,1\} are the same set. Often, we care about the order objects are in. Enter the sequence.

Definition 6.1.1 Let INI \subseteq \mathbf{N} and XX be some set. A sequence is a function a:IXa:I \to X. We write ana_n instead of a(n)a(n). Here, nn is the index and ana_n is the term. We denote the entire sequence as (an)(a_n) (when the start and endpoints are stated elsewhere or are unimportant), (an)n=k(a_n)_{n=k}^\infty (for an infinite sequence starting with aka_k), or (an)n=km(a_n)_{n=k}^m (for a finite sequence starting at aka_k and ending at ama_m). A finite sequence is called a word from the alphabet XX.

Once again we have chosen a more technical, functional definition where we instead could have said “a sequence is a totally ordered list of objects.” It is worth considering the way in which a function can “port structure” from one set to another. If we wanted a totally ordered set it is hard to imagine a better candidate than N\mathbf{N}. The sequence transfers the order structure from N\mathbf{N} into whatever set XX we like.

Example. Consider the sequence (an)n=1=(1,4,9,16,25,).(a_n)_{n=1}^\infty = (1, 4, 9, 16, 25, \ldots). Observing that each term is the square of its index, this sequence may be represented in closed form by writing an=n2a_n=n^2.

Not every sequence may be so easily identified in closed form.

Example. Consider the sequence (Fn)n=0=(0,1,1,2,3,5,8,13,21,34,).(F_n)_{n=0}^\infty = (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \ldots). You may recognize this sequence as the famous Fibonacci sequence. There is a closed-form expression for FnF_n, but you are unlikely to see it by visual inspection. However, notice that each term in the sequence is equal to the sum of the preceding two terms: for instance, 13=8+513=8+5.

Therefore instead we can represent the Fibonacci sequence recursively with a recurrence relation: Fn={Fn1+Fn2,n>11,n=10,n=0 F_n = \begin{cases} F_{n-1} + F_{n-2}, & n > 1 \\ 1, & n = 1\\ 0, & n = 0 \end{cases}

Once a recurrence relation is established, the terms of a sequence may be calculated iteratively or recursively. Each method has its strengths and weaknesses depending on the context; typically, recursive calculations are more difficult but require storing less information.

Example. Consider the sequence (xn)(x_n) where xn={xn12xn2,n>21n=2,1n=1 x_n = \begin{cases} x_{n-1} - 2x_{n-2}, & n > 2 \\ 1 & n = 2, \\ -1 & n = 1 \end{cases}

To calculate x4x_4 iteratively, we simply calculate each term of (xn)(x_n) until we arrive at x4x_4. So: x1=1x_1=-1, and x2=1x_2 = 1. Then x3=x22x1=12(1)=3,x_3 = x_2 - 2x_1 = 1 - 2(-1) = 3, and x4=x32x2=32(1)=1.x_4 = x_3 - 2x_2 = 3 - 2(1) = 1. Thus, x4=1x_4 = 1.

In the recursive calculation, we will begin with x4x_4 and repeatedly apply the recurrence relation until we have only known values of the sequence: so, x2x_2 and x1x_1. First, x4=x32x2.x_4 = x_3 - 2x_2. However, we “don’t know” what x3x_3 is, so we repeat the recursion: x4=(x22x1)2x2.x_4 = (x_2 - 2x_1) - 2x_2. Now we substitute in the given values: x4=(12(1))2(1)=1.x_4 = (1 - 2(-1)) - 2(1) = 1. It is important that we arrive at the same value both ways!
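To see the two styles of calculation side by side, here is a small Python sketch; the function names are our own.

```python
def x_iterative(n):
    """Work forward from the known values x_1 = -1 and x_2 = 1."""
    a, b = -1, 1
    for _ in range(n - 2):
        a, b = b, b - 2 * a     # (x_{k-1}, x_k) -> (x_k, x_{k+1})
    return b if n >= 2 else a

def x_recursive(n):
    """Work backward, applying the recurrence until only the known
    values x_1 and x_2 remain."""
    if n == 1:
        return -1
    if n == 2:
        return 1
    return x_recursive(n - 1) - 2 * x_recursive(n - 2)

print(x_iterative(4), x_recursive(4))  # 1 1, as in the example
```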

6.2 Sums

Given a sequence, a natural course of action is to add its terms together. To avoid writing a bunch of plus signs, we write n=kman=ak+ak+1++am1+am \sum_{n=k}^m a_n = a_k + a_{k+1} + \cdots + a_{m-1} + a_m for the sum of the members of (an)(a_n), starting with the kk-th member and ending with the mm-th member.

Example. Consider the sum i=14i2. \sum_{i=1}^4 i^2. Notice that ii is being used for the index rather than nn. That’s okay! The sum’s value is i=14i2=12+22+32+42=30. \sum_{i=1}^4 i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 30.

We can have sums of sums, and sums of sums of sums, etc. Double sums are common in computer science when evaluating the runtimes of doubly-nested for loops.

Example. Consider the sum m=13n=24(m+n). \sum_{m=1}^3 \sum_{n=2}^4 (m+n). This sum is rectangular in the sense that the bounds of one sum are not determined by the other, and so we could do this sum in either order: m=13[(m+2)+(m+3)+(m+4)] \sum_{m=1}^3 [(m+2) + (m+3) + (m+4)] or n=24[(1+n)+(2+n)+(3+n)].\sum_{n=2}^4 [(1+n) + (2+n) + (3+n)]. Either way, the sum’s value is 4545.

For another example, take i=14j=1i2i+j. \sum_{i=1}^4 \sum_{j=1}^i 2^{i+j}. Because the bounds of the jj-sum are dependent on the value of ii, we should iterate over ii first: j=112j+1+j=122j+2+j=132j+3+j=142j+4. \sum_{j=1}^1 2^{j+1} + \sum_{j=1}^2 2^{j+2} + \sum_{j=1}^3 2^{j+3} + \sum_{j=1}^4 2^{j+4}.

Each of these sums must be done separately. The first sum, of just one term, is equal to 44. The second sum is 2424, the third is 112112, and the fourth is 480480; so altogether the sum is 620620.
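Since double sums correspond to doubly-nested loops, Python can check both examples directly; this is just a sanity check, not a substitute for the algebra.

```python
# The "rectangular" sum: the bounds of the inner sum do not depend on m.
rectangular = sum(m + n for m in range(1, 4) for n in range(2, 5))
print(rectangular)  # 45

# The "triangular" sum: the inner bound depends on i, so iterate i first.
triangular = sum(2 ** (i + j) for i in range(1, 5) for j in range(1, i + 1))
print(triangular)   # 620
```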

We will end on two theorems that are important for manipulating sums, the second of which will be very useful later.

Theorem 6.2.1 [Linear property of summation] Let (an)(a_n) and (bn)(b_n) be sequences and cc be a constant value. Then, n=km(an+bn)=n=kman+n=kmbn \sum_{n=k}^m (a_n + b_n) = \sum_{n=k}^m a_n + \sum_{n=k}^m b_n and n=kmcan=cn=kman.\sum_{n=k}^m ca_n = c\sum_{n=k}^m a_n.

Proof. The associative property of addition – (x+y)+z=x+(y+z)(x+y)+z = x+(y+z) – and the distributive property – c(x+y)=cx+cyc(x+y) = cx+cy – guarantee this theorem. \square

Example. Suppose we know that n=238an=14andn=238bn=19. \sum_{n=2}^{38} a_n = 14 \qquad \text{and} \qquad \sum_{n=2}^{38} b_n = 19. Then the value of the sum n=238(an+2bn) \sum_{n=2}^{38} (a_n+2b_n) is n=238an+2n=238bn=14+2(19)=52. \sum_{n=2}^{38} a_n + 2 \sum_{n=2}^{38} b_n = 14 + 2(19) = 52. Notice that both sums must start and end on the same index.

Theorem 6.2.2 [Recursive property of summation] Let (an)(a_n) be a sequence. Then n=kman=(n=km1an)+am=ak+(n=k+1man). \sum_{n=k}^m a_n = \left( \sum_{n=k}^{m-1} a_n \right) + a_m = a_k + \left( \sum_{n=k+1}^{m} a_n \right).

Proof. By definition, n=kman=ak+ak+1++am1+am. \sum_{n=k}^m a_n = a_k + a_{k+1} + \cdots + a_{m-1} + a_m. By the associative property of addition we may rewrite this as either (ak+ak+1++am1)+am=(n=km1an)+am(a_k + a_{k+1} + \cdots + a_{m-1}) + a_m = \left( \sum_{n=k}^{m-1} a_n \right) + a_m or ak+(ak+1++am1+am)=ak+(n=k+1man),a_k + (a_{k+1} + \cdots + a_{m-1} + a_m) = a_k + \left( \sum_{n=k+1}^{m} a_n \right), proving the theorem. \square

Example. Suppose it is known that b107=72b_{107} = 72 and that k=1106bk=291. \sum_{k=1}^{106} b_k = 291. Then, k=1107bk=363. \sum_{k=1}^{107} b_k = 363.

Takeaways

Chapter 6 problems

7. Asymptotics

Videos:

Consider the following example of two programs, AA and BB, that process some data. The following table provides the number of computations taken by each program to process nn data points.

nn Program AA Program BB
1 1 58
10 100 508
20 400 1008
50 2500 2508
60 3600 3008
100 10,000 5008

At first, Program AA looks much better than Program BB. However, as nn increases, it becomes clear that AA is worse at handling large quantities of data than BB. If (An)(A_n) and (Bn)(B_n) represent sequences of the numbers of steps taken by each program, then An=n2A_n=n^2 and Bn=50n+8B_n = 50n+8. The way we quantify the relationship between the two sequences is by saying BnB_n is “big-O” of AnA_n: eventually, BnB_n grows no faster than AnA_n does.

7.1 Growth bounds on sequences

Definition 7.1.1 Let ana_n and bnb_n be terms of sequences. We say ana_n is big-O of bnb_n if there exist a real number CC and an integer kk such that for all n>kn > k, anCbn. |a_n| \le Cb_n. We write an=O(bn)a_n = O(b_n), even if the equals sign is technically incorrect.

This is another technical definition that can be hard to parse, but the idea is that eventually (after the kk-th term of the sequence) ana_n does not grow faster than bnb_n. In this way, bnb_n serves as an upper bound on the growth of ana_n.

As we will see again in Chapter 18, there may be many upper bounds. For example, in a classroom of twenty-nine students I may say “There are fewer than thirty students here” or “There are fewer than a million students here.” The first statement is obviously more informative. So you will often be asked to find the least upper bound on the growth of ana_n; this is usually the simplest bnb_n that is itself big-O of all the other options.

Theorem 7.1.2 Any non-negative sequence is big-O of itself.

Proof. Let (an)(a_n) be a non-negative sequence; that is, an0a_n \ge 0 for all ana_n. Then the statement anan |a_n| \le a_n is satisfied by equality since an=an|a_n| = a_n. If we put C=k=1C=k=1, then this statement implies that an=O(an)a_n = O(a_n). \square

We can form a basic hierarchy of sequences and their big-O relationships. In the below table, a sequence ana_n appearing to the left of a sequence bnb_n means that an=O(bn)a_n=O(b_n).

Slowest Fastest
11 logn\log n nn nlognn \log n n2n^2 n3n^3 2n2^n n!n!

At the left-most end of the chart, the slowest-growing sequence is (an)=(1,1,1,)(a_n) = (1, 1, 1, \ldots). It is slowest-growing because it does not grow at all! In fact, any constant sequence is big-O of every other sequence in the chart.

Next is the sequence (logn)(\log n). In this book, logn\log n refers to the base-2 logarithm that you might have elsewhere seen written as log2n\log_2 n. Here are a couple of ways to conceptualize this sequence and its relevance to computer science: logn\log n is roughly the number of times nn can be halved before reaching 11, which is the work done by a binary search on nn items; it is also about the number of digits in the binary expansion of nn from Chapter 5.

The rest of the functions in the list should be familiar. Notice that all polynomial functions of higher degrees appear between n3n^3 and 2n2^n, and higher-order exponential functions like ene^n occur between 2n2^n and n!n!.

Figure. The graphs of n2n^2 and 2n2^n are compared: after n=4n=4, the graph of 2n2^n is always above that of n2n^2. Since after a point one graph is always above the other, n2n^2 is big-O of 2n2^n.

Now we know where basic functions fit into the big-O hierarchy, but what about more complicated expressions like 50n+850n+8? The following theorem will help.

Theorem 7.1.3 Let (an)(a_n) and (bn)(b_n) be sequences.

1. If cc is a nonzero constant, then can=O(an)ca_n = O(a_n).
2. If an=O(cn)a_n = O(c_n) and bn=O(cn)b_n = O(c_n), then an+bn=O(cn)a_n + b_n = O(c_n).
3. If an=O(An)a_n = O(A_n) and bn=O(Bn)b_n = O(B_n), then anbn=O(AnBn)a_nb_n = O(A_nB_n).

In plain English, this theorem says: constant multiples do not change a sequence’s growth rate; the growth of a sum is governed by its fastest-growing term; and a product grows no faster than the product of bounds on its factors.

It is worthwhile to read the below proof of the theorem, but do not worry if you don’t understand it. Read it once, then skip it until after Chapter 8. You can work the problems at the end of the chapter even if you don’t totally understand the following proof.

Proof. To prove the addition rule (second bullet point), suppose an=O(cn)a_n = O(c_n) and bn=O(cn)b_n = O(c_n). That means there exist witnesses k1,k2,C1,k_1, k_2, C_1, and C2C_2 where anC1cn|a_n| \le C_1c_n for all n>k1n > k_1 and bnC2cn|b_n| \le C_2c_n for all n>k2n > k_2. Then, by the triangle inequality, an+bnan+bnC1cn+C2cn=(C1+C2)cn|a_n + b_n| \le |a_n| + |b_n| \le C_1c_n + C_2c_n = (C_1+C_2)c_n for all n>max{k1,k2}n > \text{max}\{k_1,k_2\}. In other words, this inequality holds for all nn far enough along that both the big-O relationships hold. By definition, this means an+bna_n + b_n is big-O of cnc_n.

To prove the multiplication rule (third bullet point), suppose there exist sequences AnA_n and BnB_n such that an=O(An)a_n=O(A_n) and bn=O(Bn)b_n=O(B_n). Therefore there exist witnesses k1,k2,C1,k_1, k_2, C_1, and C2C_2 where anC1An|a_n| \le C_1A_n for all n>k1n > k_1 and bnC2Bn|b_n| \le C_2B_n for all n>k2n > k_2, so that for all n>max{k1,k2}n > \text{max}\{k_1, k_2\}, anbn(C1An)(C2Bn)=(C1C2)AnBn. |a_nb_n| \le (C_1A_n)(C_2B_n) = (C_1C_2)A_nB_n.

To prove the theorem about constant multiples (first bullet point), apply the multiplication rule where (bn)(b_n) is the constant sequence (c,c,c,)(c, c, c, \ldots). Observe that this sequence is big-O of the sequence (1)(1) by taking k=0k=0 and C=cC=|c|. So, since an=O(an)a_n = O(a_n) and c=O(1)c = O(1), can=O(1×an)=O(an)ca_n = O(1\times a_n) = O(a_n) as claimed. \square

Finally, we have enough tools to do some examples!

Examples. Consider the sequence whose terms are given by the formula an=n2+5n+7a_n = n^2 + 5n + 7. We would like to find the simplest function bnb_n of lowest order such that an=O(bn)a_n=O(b_n).

Notice that ana_n is a sum of three terms: n2n^2, 5n5n which is big-O of nn, and 77 which is big-O of 11. Only the highest-order of these terms matters; both the constant term and the linear term nn are big-O of n2n^2, but not the other way around. Therefore an=O(n2)a_n = O(n^2).

What is the simplest and lowest-order function bnb_n such that (1+5n+2n2)(logn+3nlogn)(1 + 5n + 2n^2)(\log n + 3n \log n) is big-O of bnb_n?

Theorem 7.1.3 says we only need the highest-order term of each factor, and then we must multiply those terms together. The highest-order term of the first factor is big-O of n2n^2 and the second factor is big-O of nlognn \log n; so, the whole sequence is big-O of n3lognn^3 \log n.
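Big-O claims are proven with algebra, but they can be spot-checked numerically. In the sketch below the witnesses C = 10 and k = 7 are our own choices for the last example; the ratio tends toward 6, the product of the leading coefficients 2 and 3.

```python
import math

def a(n):   # (1 + 5n + 2n^2)(log n + 3n log n)
    return (1 + 5*n + 2*n**2) * (math.log2(n) + 3*n*math.log2(n))

def b(n):   # the claimed bound n^3 log n
    return n**3 * math.log2(n)

C, k = 10, 7
print(all(a(n) <= C * b(n) for n in range(k + 1, 10000)))  # True
print(a(9999) / b(9999))  # about 6.002, safely below C
```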

7.2 Other growth relationships

We will briefly mention two other growth relationships here. Recall that saying an=O(bn)a_n = O(b_n) is to say that bnb_n is an upper bound on the growth rate of ana_n. Then saying an=Ω(bn)a_n = \Omega(b_n) (big-Omega) is to say that bnb_n is a lower bound on the growth rate of ana_n, and an=Θ(bn)a_n = \Theta(b_n) (big-Theta) is to say that ana_n and bnb_n grow in the same way.

Definition 7.2.1 Let (an)(a_n) and (bn)(b_n) be sequences. Then an=Ω(bn)a_n = \Omega(b_n) if there exist a positive real number CC and an integer kk such that anCbn|a_n| \ge Cb_n for all n>kn > k; and an=Θ(bn)a_n = \Theta(b_n) if both an=O(bn)a_n = O(b_n) and an=Ω(bn)a_n = \Omega(b_n).

Takeaways

Chapter 7 problems

8. Proof by induction

Review.

It is time to come full circle on our discussion of proofs and recursion to establish the recursive proof technique, proof by induction.

What are the two ingredients to knocking over a line of domino tiles? Each domino must be sufficiently close to the next, and you must tip over the first one.

Proof by induction proceeds in much the same way. We want to prove some statement P(n)P(n) holds for every natural number nn past a base value n0n_0. We prove the following two claims: first, that P(n0)P(n_0) is true (the base case); and second, that whenever P(k)P(k) is true for some kn0k \ge n_0, then P(k+1)P(k+1) is true as well (the inductive step).

Once both of these statements are proven we will have P(n0)P(n_0), P(n0+1)P(n_0+1), P(n0+2)P(n_0+2), and so on; we will know P(n)P(n) to be true for all nn0n \ge n_0.

8.1 Inductive proofs involving sums

We will discuss three types of inductive proofs in this chapter. The first type involves proving facts about sums. Recall the recursive property of summation from Chapter 6, which will mostly look like this now: i=1k+1ai=(i=1kai)+ak+1. \sum_{i=1}^{k+1} a_i = \left( \sum_{i=1}^{k} a_i \right) + a_{k+1}.

Theorem 8.1.1 For all n1n \ge 1, i=1ni=n(n+1)2.\sum_{i=1}^n i = \frac{n(n+1)}{2}.

Proof. First, check the base case. Since i=11i=1=1(1+1)2, \sum_{i=1}^1 i = 1 = \frac{1(1+1)}{2}, the statement is true in the base case.

Next, suppose that for some integer k1k \ge 1 that i=1ki=k(k+1)2\sum_{i=1}^k i = \frac{k(k+1)}{2} (this is the inductive hypothesis). We claim that i=1k+1i=(k+1)(k+2)2.\sum_{i=1}^{k+1} i = \frac{(k+1)(k+2)}{2}. This statement is P(k+1)P(k+1)–notice that k+1+1k+1+1 is k+2k+2. We say up front that we “claim” it so that our reader knows where we are headed.

The preceding part of the proof is basically the same for all proofs by induction you’ll do. The next part is where the real work comes in, and it depends on what sort of thing you’re doing induction on. Since this is a sum, we will use the recursive property.

We may rewrite i=1k+1i \sum_{i=1}^{k+1} i as (i=1ki)+(k+1).\left( \sum_{i=1}^k i \right) + (k+1). Notice that the sum is known by the inductive hypothesis; so, this is k(k+1)2+(k+1). \frac{k(k+1)}{2} + (k+1). We put the fractions over common denominators: i=1k+1i=k(k+1)2+2(k+1)2. \sum_{i=1}^{k+1} i = \frac{k(k+1)}{2} + \frac{2(k+1)}{2}. Then, we regroup the fractions: i=1k+1i=(k+2)(k+1)2, \sum_{i=1}^{k+1} i = \frac{(k+2)(k+1)}{2}, as claimed.

Therefore by induction, i=1ni=n(n+1)2\sum_{i=1}^n i = \frac{n(n+1)}{2} for all n1n \ge 1. \square

If desired, you could expand out k(k+1)k(k+1) and 2(k+1)2(k+1) and add them together and check the result against the expansion of (k+1)(k+2)(k+1)(k+2). Instead, we noted that (k+2)(k+1)=k(k+1)+2(k+1)(k+2)(k+1)=k(k+1)+2(k+1). Factoring!
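A computer cannot check infinitely many cases, so it cannot replace the induction; still, a quick Python loop is a nice confidence check on Theorem 8.1.1.

```python
# Check the closed form against a brute-force sum for n = 1, ..., 999.
assert all(sum(range(1, n + 1)) == n * (n + 1) // 2 for n in range(1, 1000))
print("The formula holds for every n checked.")
```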

8.2 Inductive proofs involving divisibility

Remember that aa divides nn (or ana|n) means that there exists an integer kk such that n=akn=ak. Induction may be used to prove that every element of a sequence is divisible by some number.

Theorem 8.2.1 If n1n \ge 1, then 3(4n+5)3|(4^n+5).

Proof. Observe that when n=1n=1, 4n+5=94^n+5=9, which is divisible by 3. So, the statement is true in the base case.

Assume that for some integer k1k \ge 1 that 3(4k+5)3|(4^k+5). We mean to show that 3(4k+1+5)3|(4^{k+1}+5).

Observe that 4k+1+54^{k+1}+5 may be rewritten as 44k+5.4\cdot 4^k + 5. This is the same as 34k+4k+5.3\cdot 4^k + 4^k + 5. The assumption that 3(4k+5)3|(4^k+5) means there exists an integer mm such that 4k+5=3m4^k+5 = 3m. So, 34k+4k+5=34k+3m=3(4k+m),3\cdot 4^k + 4^k + 5 = 3 \cdot 4^k + 3m = 3(4^k+m), implying that 3(4k+1+5)3|(4^{k+1}+5) as desired.

Therefore, by induction, 3(4n+5)3|(4^n+5) for all n1n \ge 1. \square

The key calculation, that 4k+1=34k+4k4^{k+1} = 3\cdot 4^k + 4^k, may merit some discussion. Recall that 4k+1=44k4^{k+1} = 4 \cdot 4^k; this is the “addition rule” for exponents. Then we observe that 4=3+14=3+1 and apply the distributive property. In general, we may say that ak+1=(a1)ak+aka^{k+1} = (a-1)a^k + a^k for all real numbers aa; this fact will be helpful when you do these types of proofs on your own.

8.3 Inductive proofs involving inequality

Remember the definition of big-O involves the key line anCbn|a_n| \le Cb_n for sequences (an)(a_n) and (bn)(b_n) and a real number CC. So, computer scientists are often interested in justifying that an inequality is true past some given threshold.

Before we begin, we should discuss some tips for manipulating inequalities. They are much more flexible than equalities; this is both a positive and a negative in the sense that it is sometimes difficult to know the next step to take! So, outside of the proof we will “work backwards” to see if that illuminates the process; more on this in a moment.

Theorem 8.3.1 Suppose that aa, bb, cc, and dd are positive real numbers satisfying aba \le b and cdc \le d. Then a+cb+da + c \le b + d and acbd. ac \le bd.

To get an idea of what sort of “scratch work” is helpful or appropriate, take the following theorem.

Theorem 8.3.2 If n5n \ge 5, then 2n2+4n+13n22n^2 + 4n + 1 \le 3n^2.

Notice that this statement is a proof that 2n2+4n+12n^2 + 4n + 1 is big-O of n2n^2. (Do you see why?)

In a proof, it is not valid to imagine that the desired statement is true and work backwards. However, working these “backwards” steps can give us a crucial idea as to what the valid, forward-thinking proof should look like. Knowing how proof by induction works, we will assume that 2k2+4k+13k22k^2 + 4k + 1 \le 3k^2 and hope to claim that 2(k+1)2+4(k+1)+13(k+1)2. 2(k+1)^2 + 4(k+1) + 1 \le 3(k+1)^2. How to prove such a claim? Well, if we expand the desired statement, we see 2k2+4k+2+4k+4+13k2+6k+3. 2k^2 + 4k + 2 + 4k + 4 + 1 \le 3k^2 + 6k + 3. Using Theorem 8.3.1, we can cancel the terms of the inductive hypothesis to arrive at 4k+66k+3. 4k + 6 \le 6k + 3. This is nearly an obvious statement – because k5k \ge 5 we know 4k+64k+6 is at least 26 and 6k+36k+3 is at least 33, and the latter grows faster – but we can rewrite it as 32k, 3 \le 2k, which is clear for k5k \ge 5.

Having done this work now, we know exactly what to do in our proof. Crucially, all of the above steps are reversible!

Proof. The statement is true in the base case because 2(5)2+4(5)+1=7175=3(5)2. 2(5)^2 + 4(5) + 1 = 71 \le 75 = 3(5)^2.

Assume that for some integer k5k \ge 5 that 2k2+4k+13k22k^2 + 4k + 1 \le 3k^2. We aim to show that 2(k+1)2+4(k+1)+13(k+1)2. 2(k+1)^2 + 4(k+1) + 1 \le 3(k+1)^2.

First, since k5k \ge 5, we know that 32k,3 \le 2k, and that this implies (by adding 4k+34k+3 to both sides) that 4k+66k+34k+6 \le 6k+3.

Having assumed 2k2+4k+13k22k^2 + 4k + 1 \le 3k^2, we may combine both inequalities: 2k2+4k+1+4k+63k2+6k+3,2k^2 + 4k + 1 + 4k + 6 \le 3k^2 + 6k + 3, which may be regrouped and rearranged into 2(k+1)2+4(k+1)+13(k+1)2,2(k+1)^2 + 4(k+1) + 1 \le 3(k+1)^2, as claimed.

Therefore, by induction 2n2+4n+13n22n^2 + 4n + 1 \le 3n^2 for all n5n \ge 5. \square

Proofs involving sums and divisors are relatively straightforward to do without scratch work, because there is only one way to manipulate an equality. There are many ways to manipulate inequalities, and so the scratch work can point you in the right direction.

Takeaways

Chapter 8 problems

Interlude

At this point in the book you have been exposed to the basics of formal mathematics: sets, logic, and proof. The motivated independent student could quit here and go off and read something else; you now know enough to read introductory books in various mathematical topics such as algebra, topology, and analysis (though calculus will help with the latter two).

The second half of this book will deal with two topics in discrete math: combinatorics, or methods to solve counting problems; and relations, a subset of graph theory that touches on the areas of functions, orders, and equivalences. Familiarity with all of the previous material, especially sets, logic, proofs, sums, and recursion will be necessary.

9. Introduction to combinatorics

Review.

The time has come to learn how to count. Counting problems that require more sophisticated methods than simple listing include knowing how a program’s computation time scales with more data or counting the number of objects or structures of a particular type. In this chapter we will familiarize ourselves with the basics of counting: enumeration, addition, and multiplication.

9.1 Enumeration

In blatant disregard for the previous paragraph, a powerful tool in combinatorics is enumeration: systematically listing the objects to be counted. If the target set is small and finite, enumeration is perfectly fine. When the target set is cumbersomely large, enumeration can still help give an idea of how the elements are generated so that formal rules can give their number without the need to fully list them.

Example. How many ways are there to shuffle a deck of 52 cards?

It will be impossible to fully enumerate the space of objects to be counted (we will discuss just how impossible once we have an answer). However, let’s pretend the deck has just four cards: AA, BB, CC, and DD.

Now it is possible to systematically list out all possible shuffles of the deck. Let’s arrange them into a table where each column corresponds to a different card on top of the deck. For example, we will list the two decks where ABAB is on top; then, we will list the two decks where ACAC is on top; then the two decks with ADAD on top. Here is the full list of decks:

AA on top BB on top CC on top DD on top
ABCDABCD BACDBACD CABDCABD DABCDABC
ABDCABDC BADCBADC CADBCADB DACBDACB
ACBDACBD BCADBCAD CBADCBAD DBACDBAC
ACDBACDB BCDABCDA CBDACBDA DBCADBCA
ADBCADBC BDACBDAC CDABCDAB DCABDCAB
ADCBADCB BDCABDCA CDBACDBA DCBADCBA

There are 2424 such shuffles.

Another enumeration technique is the “tree diagram.” The tree diagram for all possible shuffles of four cards is pictured below.


Figure. The 24 arrangements of abcd are visualized via a tree diagram.

We have enough information here to figure out how many shuffles there are of a deck of 5252 cards. However, we need a theorem first.

9.2 Addition and multiplication principles

In combinatorics we count the number of outcomes that lead to an event coming to pass. In the previous example, “a deck of four cards is shuffled” is the event and BCDABCDA is one of its outcomes. We may phrase this set-theoretically by saying an event is a set and that the outcomes are the elements of that set. So, BCDAA Deck Of Four Cards Is Shuffled. BCDA \in \text{A Deck Of Four Cards Is Shuffled}.

With this in mind we have already seen the addition and multiplication rules for counting!

Theorem 9.2.1 Let EE and FF be events. Then the event "EE or FF" has EF=E+FEF|E \cup F|=|E| + |F| - |E \cap F| outcomes, and the event "EE and FF" has E×F=EF|E \times F|=|E|\cdot |F| outcomes.

Example. A particular restaurant sells many varieties of burritos. Of the selections, seven burritos are spicy and six are vegetarian. Two vegetarian burritos are spicy. How many burritos are spicy or vegetarian?

Letting SS be the set of spicy burritos and VV be the set of vegetarian burritos, we are looking for SV=S+VSV, |S \cup V| = |S| + |V| - |S \cap V|, which is 7+627+6-2 or 1111.

Example. How many ways are there to shuffle a deck of 52 cards?

Again we will consider the example of the four card deck AA, BB, CC, and DD. Some card must go first; there are four such choices. Then a card must go second, for which there are three choices. Then two, then one.

To give an example of how set theory is relevant, we may decompose the set Deck Is Shuffled\text{Deck Is Shuffled} into a product of four sets: Deck Shuffled=First Card×Second Card×Third Card×Fourth Card, \text{Deck Shuffled} = \text{First Card} \times \text{Second Card} \times \text{Third Card} \times \text{Fourth Card}, where the content of (for instance) Third Card\text{Third Card} depends on the choices of the first and second cards.

By the multiplication rule, there are 43214 \cdot 3 \cdot 2 \cdot 1 ways to shuffle this deck. Fortunately for us this is 2424, which was the answer we got last time!

You may have noticed we have done this calculation before. The product 43214 \cdot 3 \cdot 2 \cdot 1 is exactly 4!4!. By similar reasoning, there are 52!52! ways to shuffle a deck of standard size: there are 5252 choices for the top card, then 5151 choices for the next card, then (etc.)

In fact the following theorem may be proven by induction.

Theorem 9.2.2 There are n!n! ways to order a set of nn elements. Such an ordering is called a permutation of the set.

The first line of the following proof finally gives some intuition as to why 0!=10!=1 is a good definition.

Proof. Observe that there is one ordering of the elements of \varnothing: the empty ordering {}\{ \}. Since 0!=10!=1, the statement is true in the base case.

Suppose there are k!k! permutations of a kk-element set. Consider then a set with k+1k+1 elements. We must choose an element to be first in the permutation; there are (k+1)(k+1) choices. After doing so we are left to permute the remaining kk elements; by the inductive hypothesis there are k!k! ways to do so. Using the multiplicative principle of counting there must be (k+1)k!=(k+1)! (k+1) k! = (k+1)! permutations of the (k+1)(k+1)-element set, so the theorem is true by induction. \square

To get an idea of how mind-blowingly large factorials can become, check out this article about how long it would take for 52!52! seconds to go by.
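Python's exact integers make it easy to look at 52!52! yourself; the digit count below is one way to appreciate its size.

```python
import math

shuffles = math.factorial(52)
print(shuffles)            # every digit of 52!
print(len(str(shuffles)))  # 68 -- roughly 8 * 10**67 shuffles
```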

Takeaways

Chapter 9 problems

10. Ordered samples

Review.

Many of the things we are trying to count can be conceptualized as samples being taken from a sample space (think universal set). Whether that sample is ordered or unordered, and whether the elements may be repeated or replaced, affects the way the samples should be counted.

In this chapter we will discuss samples where the order of the elements is relevant. Examples include “words” (strings of characters that may have meaning attached, whether those characters be the usual letters or something else) and groups of people where each person has a different role (so switching the people in their roles would create a different configuration of the team).

Ordered samples with replacement are finite sequences, and ordered samples without replacement are permutations.

10.1 Finite sequences

Let nn and rr be natural numbers and Ω\Omega be a set of nn elements. Interchangeably we will call Ω\Omega the universe or sample space (in the context of this particular section Ω\Omega may also be called the alphabet).

A finite sequence in Ω\Omega is exactly what it sounds like: an ordered sample (ak)k=1r=(a1,a2,,ar) (a_k)_{k=1}^r = (a_1,a_2,\ldots, a_r) where values may repeat among the aka_k. We say that the elements may be “repeated” or, if they are concrete objects being drawn or selected, “replaced.” (Imagine a Scrabble tile being put back in the bag, or a card back in the deck.)

Given Ω\Omega, how many sequences of length rr are in Ω\Omega? Let’s consider the following example.

Example. Suppose a password of length 1010 is to be made of only lowercase letters. The alphabet, or universe, is the set {a,b,c,,z}\{a, b, c, \ldots, z\}. We must choose a letter for the first character of the password: there are 2626 total choices. Then we must choose the second character. Since usually passwords allow for character repetition, there are still 2626 choices; and likewise for the third, the fourth, and so on. Hence there are 26×26××26=2610 26 \times 26 \times \cdots \times 26 = 26^{10} total passwords. (See the problems at the end of the chapter for why you should vary the types of characters in your passwords.)

This example generalizes to the following theorem.

Theorem 10.1.1 Let n,rNn, r \in \mathbf{N}. There are nrn^r finite sequences of length rr in a set Ω\Omega with Ω=n|\Omega|=n.

The following construction will be useful in other counting contexts.

Definition 10.1.2 A bit string of length nn is a sequence of length nn in the set {0,1}\{0,1\}.

Be sure you understand why there are 2n2^n bit strings of length nn.
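One way to convince yourself is to enumerate them; here is a sketch using Python's itertools, with n = 4 as an arbitrary sample length.

```python
from itertools import product

n = 4
strings = list(product([0, 1], repeat=n))  # all length-n sequences over {0, 1}
print(len(strings), 2 ** n)                # 16 16
print(strings[:3])   # (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), ...
```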

10.2 Permutations

Example. Suppose a team of three people, each with a different role, is to be made from seven volunteers. Are there 737^3 possible teams? No, because this formula assumes that one person could take all three roles. That would not be a “team of three people.” Usually when we are counting groups of people, we are not allowing for repetition.

In the last chapter, we defined a permutation of Ω\Omega to be an ordering of its elements, and discovered that there are n!n! such orderings if Ω\Omega has nn elements. In the above example, there are 7!7! ways to arrange the entire set of volunteers. That’s closer to what we want, but we still only need three of the volunteers.

Consider that there are 77 choices for the first role. Now that person may not be chosen again, and there are only 66 choices for the second role. Accordingly, there are only 55 choices for the third role. Altogether according to the multiplication principle, we have 7×6×5=210 7 \times 6 \times 5 = 210 options.

There is another way to come up with the number 7×6×57 \times 6 \times 5. Remember that there are 7!7! ways to arrange the whole set of people, but that we only care about the arrangement of three of them. In other words, there are four people whose arrangement we don’t care about at all. What if we divided 7!7! by the 4!4! ways to arrange those people? 7!4!=7×6×5=210 \frac{7!}{4!} = 7 \times 6 \times 5 = 210 That’s the same answer! How convenient…

Theorem 10.2.1 The number of rr-permutations, or ordered samples without replacement, of a set of nn elements is P(n,r)=n!(nr)!,P(n,r) = \frac{n!}{(n-r)!}, where “P(n,r)P(n,r)” is read "nn permute rr".

One hopes that if r=nr=n, we recover the number of ways to permute the entire set. Good news:

Theorem 10.2.2 P(n,n)=n!P(n,n)=n!.

Proof. P(n,n)=n!/(nn)!=n!/0!=n!P(n,n) = n!/(n-n)! = n!/0! = n!, because 0!=10!=1. \square
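Python's math module knows these counts already; the sketch below redoes the team-of-three example (math.perm requires Python 3.8 or newer).

```python
import math

print(math.perm(7, 3))                         # 210 teams of three
print(math.factorial(7) // math.factorial(4))  # 210, the same count as 7!/4!
print(math.perm(7, 7), math.factorial(7))      # 5040 5040, since P(n, n) = n!
```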

Takeaways

Chapter 10 problems

11. Unordered samples

Review.

In the last chapter, we counted the number of ordered samples with rr elements we could make from a universe Ω\Omega of nn elements. If the sample allows repetition or replacement of the elements, it is a finite sequence and there are nrn^r of them. If not, they are called rr-permutations and there are P(n,r)=n!/(nr)!P(n,r) = n!/(n-r)! of them.

This chapter will explore the very rich combinatorics that arise from failing to distinguish between orders.

11.1 Counting subsets

This section will answer two questions: how many unordered samples without repetition are there, and what are they? The title of this section answers the latter question, and this viewpoint is going to give us a lot of extra power when counting these structures.

Example. How many ways can three elements be chosen, in any order and with no replacement, from the universe {a,b,c,d,e}\{a,b,c,d,e\}?

We’ve already done a little of the work. We can start by taking the ordered samples, the 33-permutations of the set, of which we already know there are P(5,3)=5!/2!=60P(5,3)=5!/2!=60. We won’t list all 6060 33-permutations, but we will make note of a few: abcacbbacbcacabcab abc \qquad acb \qquad bac \qquad bca \qquad cab \qquad cba All six of these ordered samples correspond to the single unordered sample abcabc. In fact, for any 33-element unordered sample there are 3!=63!=6 ways to order that sample, so in counting the ordered samples we have overcounted the unordered samples by a factor of 66. Thus, there are 5!3!2!=10 \frac{5!}{3!2!} = 10 unordered samples without repetition, and they are tabulated below. abcabdabeacdace abc \qquad abd \qquad abe \qquad acd \qquad ace adebcdbcebdecdeade \qquad bcd \qquad bce \qquad bde \qquad cde

So the program to count unordered samples without replacement is to count the ordered ones, and then divide by the number of possible orders.

What are we counting? Well, we are counting objects with no order and no repetition: that’s a set! The unordered samples without replacement from a universe are exactly the subsets of that universe.

Theorem 11.1.1 Let nn and rr be natural numbers where rnr \le n. A set with nn elements has (nr)=n!r!(nr)!{{n}\choose{r}} = \frac{n!}{r!(n-r)!} subsets of cardinality rr, where the symbol above is a binomial coefficient read "nn choose rr." When typeset inline, we may use C(n,r)C(n,r) instead for the same number.

Example. Enumerate the subsets of {a,b,c,d}\{a,b,c,d\}, separated by cardinality.

rr subsets C(n,r)C(n,r)
00 \varnothing 4!0!4!=1\frac{4!}{0!4!}=1
11 {a}\{a\}, {b}\{b\}, {c}\{c\}, {d}\{d\} 4!1!3!=4\frac{4!}{1!3!}=4
22 {a,b}\{a,b\}, {a,c}\{a,c\}, {a,d}\{a,d\}, {b,c}\{b,c\}, {b,d}\{b,d\}, {c,d}\{c,d\} 4!2!2!=6\frac{4!}{2!2!}=6
33 {a,b,c}\{a,b,c\}, {a,b,d}\{a,b,d\}, {a,c,d}\{a,c,d\}, {b,c,d}\{b,c,d\} 4!3!1!=4\frac{4!}{3!1!}=4
44 {a,b,c,d}\{a,b,c,d\} 4!4!0!=1\frac{4!}{4!0!}=1

There are a couple of worthwhile patterns to point out. First, notice the symmetry. We can see why C(4,3)C(4,3) and C(4,1)C(4,1) would be the same–the denominators are the same because multiplication is commutative. But combinatorially, in terms of what they are counting, why would they be the same? Observe that when we create a three-element subset of {a,b,c,d}\{a,b,c,d\}, we are implicitly choosing a one-element subset to be “left behind.” For instance, when we take the subset {a,c,d}\{a,c,d\} we are not taking {b}\{b\}. So, these subsets are in correspondence with each other.

Second, notice that 1+4+6+4+1=16=241+4+6+4+1=16=2^4, which is the total number of subsets of {a,b,c,d}\{a,b,c,d\}.

11.2 Binomial coefficients

The binomial coefficient (nr)=n!r!(nr)!{{n}\choose{r}} = \frac{n!}{r!(n-r)!} counts the number of subsets with rr elements from a universe Ω\Omega with nn elements. It will be useful to note that C(n,r)C(n,r) also counts the number of bit strings of length nn that have rr 1’s.

To see why, consider a bit string of length nn. If n=7n=7, the string can be written a1a2a3a4a5a6a7a_1a_2a_3a_4a_5a_6a_7. Consider also a set with 77 elements. We may number the elements of that set so that the set is labeled {x1,x2,x3,x4,x5,x6,x7}\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\}. Then a bit string SS with rr 1’s corresponds to a subset EE of rr elements by saying ai=1a_i=1 if and only if xiEx_i \in E. For example, the string 01110100111010 corresponds to the subset {x2,x3,x4,x6}\{x_2,x_3,x_4,x_6\}. So, there are exactly as many bit strings with a certain number of 1’s as there are subsets of a fixed size, and vice-versa.
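The correspondence is easy to watch in Python; here each 4-element subset of a 7-element universe is converted to its bit string, with math.comb confirming the count.

```python
from itertools import combinations
from math import comb

universe = range(1, 8)  # stands in for {x_1, ..., x_7}
subsets = list(combinations(universe, 4))
strings = ["".join("1" if i in s else "0" for i in universe) for s in subsets]
print(strings[:3])               # ['1111000', '1110100', '1110010']
print(len(subsets), comb(7, 4))  # 35 35
```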

Next, we will explore a recurrence relation on the binomial coefficients that will help us compute many of them quickly. This theorem (and the associated visual) are typically named after 17th century French mathematician Blaise Pascal. However, it was known much earlier to Islamic, Chinese, and Indian mathematicians, including Pingala, the Indian mathematician who used binomial coefficients to study poetic meter. Therefore we will not contribute to the western colonization of mathematics by naming these theorems after a man who did not invent them.

Theorem 11.2.1 Let nn and rr be natural numbers so that 1rn11 \le r \le n-1. Then (nr)=(n1r1)+(n1r). {{n}\choose{r}} = {{n-1}\choose{r-1}} + {{n-1}\choose{r}}.

We will prove this theorem with a combinatorial proof, where we prove that two numbers are equal by proving they each count the same set of objects.

Proof. As previously shown, there are C(n,r)C(n,r) bit strings of length nn with exactly rr 1’s. We claim that there are also C(n1,r1)+C(n1,r)C(n-1,r-1)+C(n-1,r) such strings.

Suppose we would like to make a bit string of length nn with exactly rr 1’s. We must make a choice for the first digit of the string.

If we choose a 1 to be the first digit of the string, then the remaining n1n-1 characters must have r1r-1 1’s; there are C(n1,r1)C(n-1,r-1) ways to make this choice.

Likewise, if we choose a 0 to be the first digit of the string, then the remaining n1n-1 characters must have rr 1’s. There are C(n1,r)C(n-1,r) ways to make this choice.

Our first digit must be 0 or 1 and cannot be both, so there are C(n1,r1)+C(n1,r)C(n-1,r-1)+C(n-1,r) total choices by the addition principle of counting. Therefore, C(n,r)=C(n1,r1)+C(n1,r)C(n,r)=C(n-1,r-1)+C(n-1,r). \square

The recurrence relation gives rise to a visual representation of the binomial coefficients that we will simply call the binomial triangle. We will arrange the binomial coefficients in a triangle so that each row corresponds to a value of nn starting at 00, and that within each row rr increases from 00 to nn as we read from left to right. Because C(n,0)=n!/(0!n!)=1C(n,0)=n!/(0!n!)=1 for all nn, the tip and sides of the triangle are all 11's. The recurrence relation says that any other entry in the triangle is the sum of the two that come above it. So the first seven rows of the binomial triangle are displayed below. Check carefully that you understand how to generate the triangle yourself.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
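The recurrence makes the triangle easy to generate row by row; here is a short Python sketch.

```python
def next_row(row):
    """Ends are 1; every interior entry is the sum of the two entries
    above it (Theorem 11.2.1)."""
    return [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]

row = [1]
for _ in range(7):
    print(row)           # prints the first seven rows shown above
    row = next_row(row)
```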

Our final theorem in this section will explain why binomial coefficients are named as they are. First, consider this example.

Example. Calculate the squared binomial (x+y)2(x+y)^2: (x+y)2=x2+2xy+y2. (x+y)^2 = x^2 + 2xy + y^2. “Why?” You exclaim. “We already needed to know how to do this back in Chapter 4.” Okay, then, what about the cubed binomial (x+y)3(x+y)^3? Likely, you will need to use the distributive property and do this one longhand. (x+y)3=(x+y)(x+y)(x+y)(x+y)^3 = (x+y)(x+y)(x+y) Each term will be a collection of three variables, some of which are xx and some of which are yy. (x+y)3=xxx+xxy+xyx+yxx+xyy+yxy+yyx+yyy(x+y)^3 = xxx + xxy + xyx + yxx + xyy + yxy + yyx + yyy Combine like terms: (x+y)3=x3+3x2y+3xy2+y3(x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3

You may have noticed the punchline already. (x+y)1=1x+1y (x+y)^1 = \mathbf{1}x + \mathbf{1}y (x+y)2=1x2+2xy+1y2 (x+y)^2 = \mathbf{1}x^2 + \mathbf{2}xy + \mathbf{1}y^2 (x+y)3=1x3+3x2y+3xy2+1y3 (x+y)^3 = \mathbf{1}x^3 + \mathbf{3}x^2y + \mathbf{3}xy^2 + \mathbf{1}y^3

The coefficients of the powers of (x+y)n(x+y)^n are exactly the binomial coefficients C(n,r)C(n,r). How could this possibly be true? Well, consider what happened when we expanded (x+y)3(x+y)^3. For each factor, we chose either an xx or a yy. We can liken that to having to choose a 1 or a 0 for each digit in a bit string, and so the correspondence follows.

Theorem 11.2.2: The binomial theorem Let nn be a natural number. Then (x+y)n=r=0n(nr)xnryr. (x+y)^n = \sum_{r=0}^n {{n}\choose{r}} x^{n-r} y^r.

Proof. The function (x+y)n(x+y)^n may be expanded: (x+y)n=r=0narxnryr, (x+y)^n = \sum_{r=0}^n a_r x^{n-r} y^r, for some coefficients ara_r. We claim that ar=C(n,r)a_r=C(n,r).

A term of (x+y)n(x+y)^n is the product of nn variables, rr of which are yy's and nrn-r of which are xx's. Before the xx's and yy's are combined with exponential notation we can associate a bit string of length nn with exactly rr 1’s to each term. For instance, if n=8n=8 and r=3r=3, the term xxxyyxyxxxxyyxyx can be identified with the string 0001101000011010. Then when we rewrite the term as x5y3x^5y^3, the coefficient of the term will be the number of such strings, which is exactly C(n,r)C(n,r). \square

Example. Let’s expand and simplify (a3)4(a-3)^4. Using the binomial theorem, we have (a3)4=a4+4a3(3)+6a2(3)2+4a(3)3+(3)4. (a-3)^4 = a^4 + 4a^3(-3) + 6a^2(-3)^2 + 4a(-3)^3 + (-3)^4. Then, (a3)4=a412a3+54a2108a+81.(a-3)^4 = a^4 - 12a^3 + 54a^2 - 108a + 81. Look how much easier that is than multiplying (a3)(a3)(a3)(a3)(a-3)(a-3)(a-3)(a-3) would have been!
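If you want to check an expansion like this mechanically, the binomial theorem is a one-line loop. A minimal Python sketch (the variable names are my own), substituting y = -3 to reproduce the coefficients above:

    from math import comb

    # The r-th term of (x + y)^n is C(n,r) * x^(n-r) * y^r.
    # With y = -3, the coefficients of the powers of a in (a - 3)^4:
    n, y = 4, -3
    print([comb(n, r) * y**r for r in range(n + 1)])
    # [1, -12, 54, -108, 81]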

Example. We hinted back in Section 11.1 that the sum of all the binomial coefficients for fixed nn is 2n2^n. You will have the opportunity to give a combinatorial proof of this fact in the problems. However, we can also use the binomial theorem to prove this and many other interesting facts about binomial coefficients via “proof by plugging in the right numbers”. We claim r=0n(nr)=2n. \sum_{r=0}^n {{n}\choose{r}} = 2^n. Observe that the left-hand side of the equation would be r=0n(nr)xnryr\sum_{r=0}^n {{n}\choose{r}} x^{n-r} y^r where x=y=1x=y=1. So, using the binomial theorem, we have r=0n(nr)1nr1r=(1+1)n, \sum_{r=0}^n {{n}\choose{r}} 1^{n-r} 1^r = (1+1)^n, which implies r=0n(nr)=2n. \sum_{r=0}^n {{n}\choose{r}} = 2^n.

11.3 Multisets

So far we have solved the following counting problems.

sample type of object number of objects
ordered with replacement finite sequences nrn^r
ordered without replacement rr-permutations n!(nr)!\dfrac{n!}{(n-r)!}
unordered without replacement subsets n!r!(nr)!\dfrac{n!}{r!(n-r)!}

We will close the chapter by answering the natural fourth question: how many unordered samples with replacement of size rr may be taken from a universe of nn elements?

Example. You intend to spend three days visiting your friend. They have five things they want to show you. It’s not really important which order you do the activities in or what you do each day, just how many things are done each day. In other words we want to know the distribution of the activities. How many such distributions are there, or how many ways are there to choose how many activities are done each day?

Let’s consider one such allocation of activities. Maybe you’ll do one activity on the first day, three on the second, and one on the third. The visit with your friend can be diagrammed like so: * | *** | * Notice that there are two dividers between the activities. You may have heard the expression “three days two nights,” or something similar, to describe a trip away.

This diagram may be directly ported to a bit string: 0100010, where the 11's represent the dividers and the 00's represent the activities. There must be 55 0’s and 22 1’s; altogether there are 2+5=72+5=7 characters in the bit string. We already know there are C(7,5)=21C(7,5)=21 such bit strings. So there are 2121 distributions of activities over the trip.

More generally, we want to sample from three days five times. In this example, n=3n=3 and r=5r=5. (It is okay if r>nr > n when we allow repetition!) If {1,2,3}\{1,2,3\} is the universe of days on the trip, our sample is the multiset {1,2,2,2,3}\{1,2,2,2,3\}.

Definition 11.3.1 Let Ω\Omega be a set with nn elements. A multiset α\alpha (“alpha”) of cardinality rr is a function α:ΩN\alpha:\Omega \to \mathbf{N} that assigns to each element xΩx \in \Omega a natural number, α(x)\alpha(x), called the multiplicity of xx, subject to the restriction that α(x)=r\sum \alpha(x) = r.

Example. Consider the universe {1,2,3}\{1,2,3\} and the multiset {1,2,2,2,3}\{1,2,2,2,3\}. Then the multiplicity of 11 is α(1)=1\alpha(1)=1, because 11 appears once in the multiset. Likewise α(2)=3\alpha(2)=3 and α(3)=1\alpha(3)=1. Observe that α(1)+α(2)+α(3)=5.\alpha(1) + \alpha(2) + \alpha(3) = 5. Finally, we may instead choose to write the multiset as {11,23,31}\{1^1, 2^3, 3^1\}.

As another example, take the multiset {1,1,1,3,3}\{1,1,1,3,3\}. For this multiset, α(1)=3\alpha(1)=3, α(2)=0\alpha(2)=0, and α(3)=2\alpha(3)=2.

Theorem 11.3.2 Let nn and rr be natural numbers. There are (n1+rr)=(n1+r)!r!(n1)! {{n-1+r}\choose{r}} = \frac{(n-1+r)!}{r!(n-1)!} multisets of cardinality rr from a universe of nn elements.
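For small cases you can check Theorem 11.3.2 by brute force. A short Python sketch using the standard library's itertools, with n=3n=3 and r=5r=5 as in the trip example:

    from itertools import combinations_with_replacement
    from math import comb

    # Enumerate the unordered samples with replacement directly,
    # then compare the count against the formula C(n-1+r, r).
    n, r = 3, 5
    multisets = list(combinations_with_replacement(range(1, n + 1), r))
    print(len(multisets))      # 21
    print(comb(n - 1 + r, r))  # 21
    # The example's sample appears in the list: (1, 2, 2, 2, 3)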

sample type of object number of objects
ordered with replacement finite sequences nrn^r
ordered without replacement rr-permutations n!(nr)!\dfrac{n!}{(n-r)!}
unordered without replacement subsets n!r!(nr)!\dfrac{n!}{r!(n-r)!}
unordered with replacement multisets (n1+r)!r!(n1)!\dfrac{(n-1+r)!}{r!(n-1)!}

Takeaways

Chapter 11 problems

12. Multinomial coefficients

In the preceding chapters we discussed binary samples in the sense that an element of our universe Ω\Omega is either in the sample or isn’t. In this chapter, we are going to count the ways to arrange the elements of Ω\Omega into one of kk states, where kk is a positive integer.

12.1 Multinomials

We will answer the following question. Let [k]={1,2,,k}[k]=\{1, 2, \ldots, k\} for a positive integer kk. Given a multiset α\alpha of cardinality nn from [k][k], how many ways are there to sort the elements of an nn-element set Ω\Omega according to that multiset? Recall that a multiset α\alpha associates to each i[k]i \in [k] a number α(i)\alpha(i), called its multiplicity. We will write α(i)=ri\alpha(i)=r_i to be consistent with the notation from the preceding chapter. We would like to know how many ways there are to choose r1r_1 elements of Ω\Omega to be of the first type, r2r_2 elements to be of the second type, etc., and rkr_k elements of the kk-th type, so that ri=n\sum r_i = n. Furthermore, we do not care about the ordering within each type.

We will relate this idea to the example from the last section of the previous chapter.

Example. You are visiting your friend for three days, and they have five things they want to do while you’re in town. Recall you counted there were 2121 ways (multisets) to distribute the activities across the days. You and your friend decided to do one activity the day you get into town, then two the next day, and two the last day. In other words your chosen multiset is {11,22,32}\{1^1, 2^2, 3^2\}.

Now that you have chosen that distribution, how many ways can you choose which activity is done on each day? There are two equivalent ways to answer this question; one will lead us to the other. Let’s first pick the first day’s activity. There are five activities and we’d like to choose one; so there are C(5,1)C(5,1) options. That leaves four activities from which to choose two; so, C(4,2)C(4,2) ways to pick the second day’s activities. Finally there are C(2,2)C(2,2) ways to pick the last day’s activities. Applying the multiplication principle there are (51)(42)(22) {{5}\choose{1}}{{4}\choose{2}}{{2}\choose{2}} ways to choose which activity is done on each day. Applying the formulas for the binomial coefficients, that’s 5!1!4!4!2!2!2!2!0! \frac{5!}{1!4!}\frac{4!}{2!2!}\frac{2!}{2!0!} ways. Observe that we can cancel two of the numerators, and that 0!=10!=1: 5!1!2!2!=30. \frac{5!}{1!2!2!} = 30.

This is the second way to solve the problem. We may imagine the five activities are a small deck of cards, and shuffle the deck. The top card gives the activity to do on the first day. The next two cards give the second day’s activities. However, we have double-counted by 2!2! since there are 2!2! ways to arrange those two cards, and we don’t care which is first. Likewise with the last two cards, which give the third day’s activities.

Theorem 12.1.1 Take a multiset from [k][k] where rir_i is the multiplicity of i[k]i \in [k], and the sum of the rir_i is nn. Consider a function f:Ω[k]f:\Omega \to [k] such that the number of xΩx \in \Omega such that f(x)=if(x) = i is rir_i. The number of such functions is given by the multinomial coefficient (nr1,r2,,rk)=n!r1!r2!rk! {{n}\choose{r_1, r_2, \ldots, r_k}} = \frac{n!}{r_1!r_2!\cdots r_k!}

Example. How many distinguishable arrangements are there of the word ARRANGEMENT?

“Distinguishable” means that we can tell the difference in the arrangements. For example, there are two R’s in the word. If we call them R1 and R2, there is no difference between A(R1)(R2)ANGEMENT and A(R2)(R1)ANGEMENT.

There are 77 different letters in the word ARRANGEMENT. If we number the 1111 characters in the word, then we are trying to count the number of functions f:{1,2,3,,11}{A,R,N,G,E,M,T} f: \{1, 2, 3, \ldots, 11\} \to \{A, R, N, G, E, M, T\} such that rxr_x characters are the letter xx, rA=rR=rE=rN=2, r_A = r_R = r_E = r_N = 2, and rG=rM=rT=1. r_G = r_M = r_T = 1. By the preceding theorem there are (112,2,2,1,2,1,1)=11!2!2!2!1!2!1!1!=2,494,800 {{11}\choose{2,2,2,1,2,1,1}} = \frac{11!}{2!2!2!1!2!1!1!} = 2,494,800 total distinguishable arrangements of the word.

That explanation was perhaps a bit painstaking to make clear the viewpoint of counting functions based on a multiset, but there is a cleaner path to the answer. There are 11!11! ways to arrange all the letters; however, we have double-counted by the 2!2! arrangements of the A’s, R’s, N’s, and E’s, and so we divide by 2!2! four times.
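That cleaner path is also the easy one to program. A minimal Python sketch (the function name is my own) that tallies letter multiplicities and divides out the repeated arrangements:

    from collections import Counter
    from math import factorial

    # n! divided by the factorial of each letter's multiplicity.
    def arrangements(word):
        n = factorial(len(word))
        for count in Counter(word).values():
            n //= factorial(count)
        return n

    print(arrangements("ARRANGEMENT"))  # 2494800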

12.2 Multinomial theorems

Observe that when k=2k=2, a multinomial coefficient is exactly a binomial coefficient with r1=rr_1=r and r2=nrr_2=n-r. When k=3k=3 we call the numbers trinomial coefficients and thereafter typically say “multinomial coefficients”.

Accordingly, theorems about binomial coefficients generalize to multinomial coefficients. For example, the trinomial coefficients exhibit a similar recurrence relation.

Theorem 12.2.1 Let nn be a positive integer and r1,r2,r3r_1, r_2, r_3 be positive integers summing to nn. Then, (nr1,r2,r3)=(n1r11,r2,r3)+(n1r1,r21,r3)+(n1r1,r2,r31).{{n}\choose{r_1, r_2, r_3}} = {{n-1}\choose{r_1-1,r_2,r_3}} + {{n-1}\choose{r_1,r_2-1,r_3}} + {{n-1}\choose{r_1,r_2,r_3-1}}.

There is, accordingly, a multinomial theorem:

Theorem 12.2.2 Let nn and kk be positive integers. Then (x1+x2++xk)n=(nr1,r2,,rk)x1r1x2r2xkrk, (x_1 + x_2 + \cdots + x_k)^n = \sum {{n}\choose{r_1, r_2, \ldots, r_k}} x_1^{r_1} x_2^{r_2} \cdots x_k^{r_k}, where the sum runs over all tuples (r_1, r_2, \ldots, r_k) of natural numbers with r_1 + r_2 + \cdots + r_k = n.

This theorem tells us that when we expand the polynomial (x1+x2++xk)n(x_1 + x_2 + \cdots + x_k)^n, we get one term for each multiset of cardinality nn from [k][k], and the coefficient of each term is the corresponding multinomial coefficient.

Example. There are C(15,10)=3,003C(15,10) = 3,003 terms of the polynomial (a+b+c+d+e+f)10(a+b+c+d+e+f)^{10}. Obviously we will not list them all. The a5bc2efa^5bc^2ef term has the coefficient (105,1,2,0,1,1)=10!5!1!2!0!1!1!=15,120. {{10}\choose{5,1,2,0,1,1}} = \frac{10!}{5!1!2!0!1!1!} = 15,120.

Takeaways

Chapter 12 problems

13. Probability

Review.

The time has come to end our short tour through combinatorics with a conversation about probability, the study of random phenomena. There is a lot more to say about both topics, but that’s for other books to do.

13.1 Introduction to probability

Speaking very loosely, an event’s probability is the “chance” it will occur. There are two kinds of probabilities: empirical probability, the proportion of times an event occurred over a number of trials; and theoretical probability, the proportion of times an event should occur over the long run. We assign theoretical probabilities according to mathematical principles so that the long-run empirical probability tends towards the theoretical probability. It will not surprise you to learn, then, that probability is a function of sets.

Definition 13.1.1 Let EE be an event contained in the universe Ω\Omega (in probability-theoretic contexts the universe is called the sample space). If every outcome of EE is equally likely, then the probability of EE is P(E)=EΩ.P(E) = \frac{|E|}{|\Omega|}.

To brush some interesting technical details away, we are assuming that EE and Ω\Omega are finite sets and doing discrete probability with something called the “counting measure.” There are other ways to measure sets, and those measures have associated probabilities as well; but this is beyond our scope.

Example. Two dice are rolled. They are distinguishable; you may suppose one is red and the other is blue, or that the person rolling kept track of which was rolled first. The sample space consists of 3636 possible rolls; the sum of the two dice for each roll is tabulated below.

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

Let EE be the event that the sum is prime and let FF be the event that the first roll was even. (You may choose the row or column to be the first roll; it won’t matter in this case.) Then, P(E)=1536P(E) = \frac{15}{36} and P(F)=1836.P(F) = \frac{18}{36}. Check these counts yourself.
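If you would rather let a machine do the counting, the sample space is small enough to enumerate outright. A short Python sketch (the event encodings are my own):

    from fractions import Fraction

    # All 36 ordered rolls; E = "sum is prime", F = "first roll is even".
    primes = {2, 3, 5, 7, 11}
    omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
    E = [roll for roll in omega if sum(roll) in primes]
    F = [roll for roll in omega if roll[0] % 2 == 0]
    print(Fraction(len(E), len(omega)))  # 5/12, i.e. 15/36
    print(Fraction(len(F), len(omega)))  # 1/2,  i.e. 18/36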

Sometimes it is desirable to know how one event affects another. For example: does the first roll being even make it more or less likely that the sum will be prime? Or is there no effect? Conditional probability answers this question for us.

Definition 13.1.2 Let EE and FF be events. The conditional probability of EE given that FF is known to occur is P(EF)=EFF. P(E|F) = \frac{|E \cap F|}{|F|}.

A very good interpretation of the above definition is that the original sample space is replaced with, or “localized” to, FF.

Example. Suppose the rectangle below is a dartboard. We have two players: a novice who throws the dart perfectly randomly, and someone with a little more experience who will always land the dart within quadrilateral FF. Who is more likely to land in the target EE? The more experienced player will land in EE more often on average. In probabilistic notation, P(EF)>P(E)P(E|F) > P(E).

The set E is visualized as a small circle in the middle of a large rectangle, enclosed in a smaller quadrilateral F.


Example. Revisiting our dice question, does the first roll being even make it more or less likely the sum will be prime?

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

There are 1818 outcomes where the first roll is even. Counting, we see 77 outcomes where the sum is prime and the first roll is even. (Check this.) By definition, P(EF)=718=1436, P(E|F) = \frac{7}{18} = \frac{14}{36}, which is less than P(E)=15/36.P(E) = 15/36. So, prime sums are less likely when the first roll is even (so, more likely when the first roll is odd).

13.2 Algebraic rules

The probability that at least one of EE or FF occurs is denoted P(EF)P(E \cup F), and the probability that both events occur is denoted P(EF)P(E \cap F).

Theorem 13.2.1 If EE and FF are events, then P(EF)=P(E)P(FE)=P(F)P(EF)P(E \cap F) = P(E)P(F|E) = P(F)P(E|F) and P(EF)=P(E)+P(F)P(EF).P(E \cup F) = P(E) + P(F) - P(E \cap F).

Proof. Fundamental counting principles, together with the definition of conditional probability: counting EFE \cap F amounts to first counting FF and then counting the part of EE inside FF. \square

Definition 13.2.2 If P(EF)=0P(E \cap F) = 0, the events EE and FF are disjoint.

Example. The events “the first roll is odd” and “the first roll is even” are disjoint since they cannot both happen at the same time.

Theorem 13.2.3 The following are equivalent: P(EF)=P(E)P(F)P(E \cap F) = P(E)P(F); P(EF)=P(E)P(E|F) = P(E); and P(FE)=P(F)P(F|E) = P(F). When any (and hence all) of these hold, the events EE and FF are said to be independent.

Example. Any event involving just the first die and any event involving just the second die will be independent.

Example. Again with the dice:

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

The probability that a sum is prime and a first roll is even is, by counting, P(EF)=736.P(E \cap F) = \frac{7}{36}. We may also use the multiplication rule to calculate this probability: P(EF)=P(F)P(EF)=1836718=736. P(E \cap F) = P(F)P(E|F) = \frac{18}{36}\frac{7}{18} = \frac{7}{36}.

The probability that a sum is prime or a first roll is even is, again by counting first, P(EF)=2636.P(E \cup F) = \frac{26}{36}. We may also use the addition rule: P(EF)=P(E)+P(F)P(EF)=1536+1836736=2636. P(E \cup F) = P(E) + P(F) - P(E \cap F) = \frac{15}{36} + \frac{18}{36} - \frac{7}{36} = \frac{26}{36}.
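Both rules can be confirmed with the same enumeration used in Section 13.1. A brief sketch, reusing the omega, E, and F lists from that earlier snippet:

    from fractions import Fraction

    both = [roll for roll in E if roll in F]                      # E ∩ F
    either = [roll for roll in omega if roll in E or roll in F]   # E ∪ F
    n = len(omega)
    # Multiplication rule: P(E ∩ F) = P(F) P(E|F)
    print(Fraction(len(both), n) ==
          Fraction(len(F), n) * Fraction(len(both), len(F)))      # True
    # Addition rule: P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
    print(Fraction(len(either), n) ==
          Fraction(len(E) + len(F) - len(both), n))               # True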

Takeaways

Chapter 13 problems

14. Boolean matrices

Review.

Relations allow us to mathematically encode various relationships between sets. There are many such relationships we may be interested in: graphs, functions, equivalences, and orderings. But before we tell that story, we have to tell this one: the story of Boolean matrices.

14.1 Matrices and Boolean algebras

Did you notice at the beginning of the book that logic and set theory were incredibly similar to one another? They are indeed different implementations of the same idea, Boolean algebra. As you read the following technical details, just keep in mind that a Boolean algebra is formalizing the relationships between logical connectives and set operations.

Definition 14.1.1 A bit is an element of the set {0,1}\{0,1\} together with commutative binary operations join \vee and meet \wedge and unary operation complement ¬\neg such that ab={0a=b=01otherwise, a \vee b = \begin{cases} 0 & a=b=0 \\ 1 & \text{otherwise} \end{cases}, ab={1a=b=10otherwise, a \wedge b = \begin{cases} 1 & a=b=1 \\ 0 & \text{otherwise} \end{cases}, and ¬a={1a=00a=1. \neg a = \begin{cases} 1 & a=0 \\ 0 & a=1 \end{cases}.

Definition 14.1.2 A Boolean algebra is a set A\mathcal{A} together with commutative binary operations \vee and \wedge such that a0=a,a1=a, a \vee 0 = a, \qquad a \wedge 1 = a, a(bc)=(ab)(ac),a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c), and a(bc)=(ab)(ac), a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c), and a unary operation ¬\neg such that a¬a=1,a¬a=0. a \vee \neg a = 1, \qquad a \wedge \neg a = 0.

Examples. The set {0,1}\{0,1\} is the smallest non-degenerate Boolean algebra. Here are two other Boolean algebras: the collection of statements, where \vee and \wedge are disjunction and conjunction, ¬\neg is negation, 11 is any tautology, and 00 is any contradiction; and the power set of a universe Ω\Omega, where \vee and \wedge are union and intersection, ¬\neg is set complement, 11 is Ω\Omega itself, and 00 is \varnothing. You will be asked to verify that these are Boolean algebras in the exercises.

Loosely speaking, a matrix is a rectangular array of data. This data can be anything with an algebraic structure, including bits and numbers. While matrices serve as a convenient way to store information, they also have applications in solving systems of linear equations and in encoding relations and functions. It is this last usage in which we are most interested. Before you read the definition below, remember that [m]={1,2,,m}[m]=\{1, 2, \ldots, m\} when mm is a positive integer.

Definition 14.1.3 An m×nm \times n matrix is a function A:[m]×[n]XA:[m]\times[n] \to X where XX is a set with certain algebraic structure (for us, bits or numbers) such that the entry A(i,j)A(i,j) in the ii-th row and jj-th column is written Ai,jA_{i,j}. To specify the elements that belong to the matrix we sometimes write A=(Ai,j)A=(A_{i,j}). We say AA has mm rows and nn columns and can represent the matrix visually as A=(A1,1A1,2A1,nA2,1A2,2A2,nAm,1Am,2Am,n). A = \left( \begin{array}{cccc} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ & & \ddots & \\ A_{m,1} & A_{m,2} & \cdots & A_{m,n} \end{array} \right).

More than you wanted to know: The desired properties of the set XX are the presence of an operation playing the role of addition and an operation playing the role of multiplication, such that subtraction is possible and otherwise these operations satisfy all the usual properties that addition and multiplication do. Such a structure is called a ring. The integers Z\mathbf{Z} are the classic example as the smallest ring containing all the elements of N\mathbf{N}, and the Boolean algebra {0,1}\{0,1\} is another. If all the non-zero elements of the ring can be inverted – i.e., you can divide as well – the ring is called a field and then matrix division may be possible as well.

Definition 14.1.4 A Boolean matrix is a matrix whose entries are bits.

Examples. Consider the Boolean matrix A=(101011). A = \left( \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right). This is a matrix with 22 rows and 33 columns, so we would say it is a 2×32 \times 3 matrix. The entry A1,3=1A_{1,3}=1, the entry A2,1=0A_{2,1}=0, and the entry A3,1A_{3,1} does not exist for this matrix.

What follows are some worthwhile definitions for studying matrices and relations.

Definition 14.1.5 An m×nm \times n matrix is square if m=nm=n.

Definition 14.1.6 Let AA be a square n×nn \times n matrix. The entries Ai,jA_{i,j} where i=ji=j are called diagonal entries and entries that are not diagonal are off-diagonal entries.

Definition 14.1.7 A square matrix whose only nonzero entries are diagonal is a diagonal matrix.

Definition 14.1.8 The identity matrix InI_n of dimension nn is the n×nn\times n diagonal matrix whose diagonal entries are all 11.

Examples. The matrix A=(101011100) A = \left( \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0\end{array} \right) is square but not diagonal, the matrix B=(100010000) B =\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0\end{array} \right) is diagonal but not I3I_3, and I3=(100010001).I_3=\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right).

Definition 14.1.9 Let AA and BB both be m×nm \times n Boolean matrices. We say ABA \le B if Ai,jBi,jA_{i,j} \le B_{i,j} (as numbers) for all (i,j)[m]×[n](i,j) \in [m]\times[n]. Equivalently, we can say that ABA \le B if BB never has a 00 in a position where AA has a 11.

Example. For instance, (100110)(101111) \left( \begin{array}{cc} 1 & 0\\ 0 & 1 \\ 1 & 0\end{array} \right) \le \left( \begin{array}{cc} 1 & 0\\ 1 & 1 \\ 1 & 1\end{array} \right) but (100110)≰(010111).\left( \begin{array}{cc} 1 & 0\\ 0 & 1 \\ 1 & 0\end{array} \right) \not\le \left( \begin{array}{cc} 0 & 1\\ 0 & 1 \\ 1 & 1\end{array} \right).

14.2 Basic matrix operations

In this section we will explore basic operations involving matrices: meet, join, complement, and transposition. Multiplication requires enough work to necessitate its own section.

Definition 14.2.1 Let AA and BB be m×nm\times n matrices. Their meet (or join) is the m×nm\times n matrix AB=(Ai,jBi,j)A \wedge B = (A_{i,j} \wedge B_{i,j}) (or AB=(Ai,jBi,j)A \vee B=(A_{i,j} \vee B_{i,j})) whose i,ji,j-th entry is the meet (join) of the i,ji,j-th entries of AA and BB.

Examples. (010101)(001011)=(000001) \left (\begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right) \wedge \left (\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right) = \left (\begin{array}{ccc} 0 & 0 & 0 \\ 0& 0 & 1 \end{array} \right)

(010101)(001011)=(011111) \left (\begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right) \vee \left (\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right) = \left (\begin{array}{ccc} 0 & 1 & 1 \\ 1 & 1 & 1 \end{array} \right)

Definition 14.2.2 Let A=(Ai,j)A=(A_{i,j}) be an m×nm\times n Boolean matrix. Its complement is the m×nm\times n Boolean matrix ¬A=(¬Ai,j)\neg A = (\neg A_{i,j}).

Example.

¬(001011)=(110100) \neg \left (\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right) = \left (\begin{array}{ccc} 1 & 1 & 0 \\ 1 & 0 & 0 \end{array} \right)

The last operation we need to look at in this section is transposition, where a matrix’s rows and columns are interchanged.

Definition 14.2.3 Let A=(Ai,j)A=(A_{i,j}) be an m×nm\times n Boolean matrix. Its transpose is the n×mn \times m Boolean matrix AT=(Aj,i)A^T = (A_{j,i}).

Example.
(111001)T=(110101) \left (\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \right)^T = \left (\begin{array}{ccc} 1&1&0 \\ 1&0&1 \end{array} \right)
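All four operations in this section are elementwise (or, for transposition, index-swapping) and fit in a few lines of code. A minimal Python sketch using lists of rows (the representation and names are my own):

    # Boolean matrices as lists of rows of 0/1 entries.
    def meet(A, B):
        return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def join(A, B):
        return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def complement(A):
        return [[1 - a for a in row] for row in A]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    A = [[0, 1, 0], [1, 0, 1]]
    B = [[0, 0, 1], [0, 1, 1]]
    print(meet(A, B))     # [[0, 0, 0], [0, 0, 1]]
    print(join(A, B))     # [[0, 1, 1], [1, 1, 1]]
    print(complement(B))  # [[1, 1, 0], [1, 0, 0]]
    print(transpose(A))   # [[0, 1], [1, 0], [0, 1]]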

14.3 Multiplication of Boolean matrices

Multiplication of matrices in general, and Boolean matrices in particular, is not as straightforward as you would hope at first. As we will see in the next chapter, each Boolean matrix represents a relation, and multiplication of matrices should correspond to a combination of relations.

Real matrices, i.e. matrices with values in R\mathbf{R}, correspond to linear transformations, functions that rotate, stretch, or shear space. Multiplication of real matrices corresponds to composition of these functions.

Definition 14.3.1 Let A=(A1A2An)A = (\begin{array}{cccc}A_1 & A_2 & \ldots & A_n\end{array}) be a 1×n1\times n Boolean row matrix and B=(B1B2Bn)TB = (\begin{array}{cccc}B_1 & B_2 & \ldots & B_n\end{array})^T be an n×1n\times 1 Boolean column matrix. Their Boolean dot product is the bit AB=(A1B1)(A2B2)(AnBn). A \odot B = (A_1 \wedge B_1) \vee (A_2 \wedge B_2) \vee \cdots \vee (A_n \wedge B_n). We may also write this in “big vee” notation: AB=i=1n(AiBi). A \odot B = \bigvee_{i=1}^n (A_i \wedge B_i).

Example. Let A=(0101) A=\left(\begin{array}{cccc} 0 & 1 & 0 & 1\end{array}\right) and B=(1100).B=\left( \begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array} \right). Then,
AB=(01)(11)(00)(10).A \odot B = (0 \wedge 1) \vee (1 \wedge 1) \vee (0 \wedge 0) \vee (1 \wedge 0). Evaluating each of the conjunctions, we have AB=0100.A \odot B = 0 \vee 1 \vee 0 \vee 0. Since ABA \odot B is a join with at least one 11, AB=1.A \odot B = 1. In fact, ABA\odot B is only 11 because there is an entry (the second one) where both AiA_i and BiB_i are 1. If there are no such entries, the Boolean dot product will always be 0. The following theorem records this observation and is highly useful in quickly calculating Boolean products.

Theorem 14.3.2 Let A=(Ai)A=(A_i) be a 1×n1\times n Boolean matrix and B=(Bi)B=(B_i) be an n×1n\times 1 Boolean matrix. Their Boolean dot product ABA \odot B is 11 if and only if there exists an integer kk such that AkA_k and BkB_k are both 11.

Reflect for a moment that this definition does not drop out of thin air, as most don’t. Suppose we have predicates A(i)A(i) and B(i)B(i) whose domains are both [n][n]. The Boolean dot product ABA\odot B is 1 if and only if there exists an object in the domain for which A(i)B(i)A(i) \wedge B(i) is true. As an example, let A(i)A(i) be the statement "ii is even" and B(i)B(i) be the statement “i2i \le 2”, both over the domain [4][4]. Then the above example answers the question, “Is there an element of [4][4] that is both even and less than or equal to 22?”

Definition 14.3.3 Let A=(Ai,j)A=(A_{i,j}) be an m×km\times k Boolean matrix and B=(Bi,j)B=(B_{i,j}) be a k×nk \times n Boolean matrix. Their Boolean product is the m×nm\times n matrix AB=(r=1k(Ai,rBr,j));A\odot B = \left( \bigvee_{r=1}^k (A_{i,r} \wedge B_{r,j})\right); that is, the i,ji,j-th entry of ABA \odot B is the Boolean dot product of the ii-th row of AA with the jj-th column of BB.

Note that AA must have exactly as many columns as BB has rows, else the Boolean dot products will be undefined.

Example. Let A=(010110) A = \left(\begin{array}{cc} 0 & 1 \\ 0 & 1\\ 1 & 0 \end{array}\right) and B=(110010). B = \left(\begin{array}{ccc} 1 & 1 & 0 \\ 0 & 1 & 0 \end{array}\right). Their Boolean product ABA \odot B is the matrix whose entries are the Boolean dot products of the appropriate rows of AA with the corresponding columns of BB.

For example, the entry in the first row and first column is (01)(10)=00=0, (0 \wedge 1) \vee (1 \wedge 0) = 0 \vee 0 = 0, as evidenced by the fact that the first row (0,1)(0,1) of AA and the first column (1,0)T(1,0)^T of BB do not have any 11's in the same spot. Meanwhile, the 1,21,2-th entry of the product is 11, because the first row and the second column both have a 11 in the second position. Here is the full product: AB=(010010110) A \odot B = \left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{array} \right)

As an exercise, compute BAB \odot A and verify that it is not equal to ABA \odot B.
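The Boolean product is also short to implement, and having it in code makes the exercise above painless to check. A minimal Python sketch on the same list-of-rows representation (the function name is my own):

    # The (i,j) entry of the Boolean product is the Boolean dot
    # product of row i of A with column j of B.
    def bool_product(A, B):
        k = len(B)  # A must have k columns and B must have k rows
        return [[int(any(A[i][r] and B[r][j] for r in range(k)))
                 for j in range(len(B[0]))]
                for i in range(len(A))]

    A = [[0, 1], [0, 1], [1, 0]]
    B = [[1, 1, 0], [0, 1, 0]]
    print(bool_product(A, B))  # [[0, 1, 0], [0, 1, 0], [1, 1, 0]]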

From the logical perspective, a Boolean matrix is a family of statements. The Boolean product then gives a matrix whose entries answer the question, “Is there a subject for which the statement represented by this row and the statement represented by that column are both true simultaneously?”

Takeaways

Chapter 14 problems

15. Relations

Review.

As previously stated, a relation is a way to mathematically encode the idea of a relationship between objects. What does that mean? When we “encode” an idea mathematically, we are defining a mathematical object that captures the important data of that idea. Because the thing is now a mathematical object, we can do mathematics to it. If the thing has become an expression involving real numbers, it is now subject to algebra. If the thing has become a set, it is now subject to set operations, logic, proof, combinatorics, and so on. One may imagine importing a package into a programming language so that one has new objects to “play with” in that language.

In this chapter we will define relations, discuss various ways to represent them, and discuss some properties of interesting relations.

15.1 Basic notions and representation

Remember that the product X×YX \times Y of two sets is the collection of all pairs (x,y)(x,y) whose first component is from XX and whose second component is from YY. Therefore, this is a good place to start thinking about connecting two sets.

Definition 15.1.1 Let XX and YY be sets. A relation RR from XX to YY is any subset RX×YR \subseteq X \times Y. We say that XX is the domain and YY is the codomain of RR. When (x,y)R(x,y) \in R, we say "xx is related to yy" and may write xRyxRy.

Example. Let X={x,y,z}X = \{x,y,z\} and Y={1,2}Y=\{1,2\}. One of the many relations from XX to YY is the set {(x,1),(x,2),(y,2)}. \{ (x,1), (x,2), (y,2) \}. We would say (for instance) that xx is related to 11, or that zz is related to nothing.

Some relations have their own symbols. We will be a little sloppy and sometimes identify the statement xRyxRy with the relation itself.

Example. The relation Δ={(x,x)    xX}\Delta=\{(x,x) \;|\; x \in X\} is the relation of all pairs (x,y)X×X(x,y) \in X \times X where x=yx=y. (When the domain and codomain of a relation are the same we say it is a relation on XX.) This relation is called the diagonal relation on the set XX. (Exercise: Figure out why.) It may be denoted with the symbol Δ\Delta (capital delta), or it may be denoted with the statement characterizing its elements. In other words, we might say “Consider the relation x=yx=y on a set XX.”

You have seen functions in algebra, or perhaps calculus, and we have used them in various definitions. Recall that a function f:XYf:X \to Y is an assignment that pairs every element xXx \in X with a unique element f(x)Yf(x) \in Y. A function is a relation, subject to the extra criterion that every element of XX is paired to exactly one element of YY, whereas a relation can have elements of XX related to many elements of YY, or to no elements of YY.

If XX and YY are finite sets there are two convenient ways to represent a relation between XX and YY. Without loss of generality, label the elements of XX as x1,x2,,xmx_1, x_2, \ldots, x_m and those of YY as y1,y2,,yny_1, y_2, \ldots, y_n. Consider that each pair (xi,yj)(x_i, y_j) induces the statement "xix_i is related to yjy_j". Remembering that Boolean matrices encode families of statements, we may define the m×nm\times n matrix MR=(mi,j) M_R = (m_{i,j}) where mi,j=1m_{i,j} = 1 if and only if xix_i is related to yjy_j.

Example. Earlier we had the relation {(x,1),(x,2),(y,2)} \{ (x,1), (x,2), (y,2) \} from X={x,y,z}X = \{x,y,z\} to Y={1,2}Y=\{1,2\}. There are many different matrices representing this relation depending on how the sets XX and YY are ordered. When there is an “obvious” ordering, say numerical or alphabetical, we will use that one. Under these orderings the matrix for the relation is (110100) \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{array} \right) .
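Producing MRM_R from a relation is mechanical once the orderings are fixed. A short Python sketch using the alphabetical and numerical orderings above:

    # Entry (i, j) of the matrix is 1 exactly when (X[i], Y[j]) is in R.
    X = ['x', 'y', 'z']
    Y = [1, 2]
    R = {('x', 1), ('x', 2), ('y', 2)}
    M = [[1 if (x, y) in R else 0 for y in Y] for x in X]
    print(M)  # [[1, 1], [0, 1], [0, 0]]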

Consider a relation RR on a set XX (meaning domain and codomain are the same set). We can also represent RR visually with something called a directed graph or digraph for short. From this perspective, the elements of XX are called the vertices of the graph and the pairs (x,y)(x,y) in the relation are called edges or arrows. An arrow is drawn from one vertex to another if those vertices are related.

Example. Let X={a,b,c,d}X = \{a,b,c,d\}, and consider the relation R={(a,b),(a,d),(b,b),(b,c),(d,a)}. R = \{ (a,b), (a,d), (b,b), (b,c), (d,a) \}. The directed graph for this relation is shown below.

The directed graph for the relation is shown.

Figure. The directed graph for the relation is shown.

It is possible to draw many graphs for one relation, so why can we say “the directed graph”? Observe that any two graphs drawn for the relation will have the same arrows between the same vertices. Two graphs for which there is a 1-1 correspondence that preserves the edges are isomorphic, and isomorphic graphs are functionally the same. Therefore, we say “the” directed graph for a relation.

15.2 Properties of relations

As we study relations in the remaining chapters of this book there are a few properties that will make for valuable terminology. For the remainder of the chapter, every relation has the same domain and codomain.

Definition 15.2.1 A relation RR on a set XX is reflexive if every element is related to itself, in other words (x,x)R(x,x) \in R for all xXx \in X. The pair (x,x)(x,x) is a diagonal pair.

Definition 15.2.2 A relation RR on a set XX is antireflexive if no element is related to itself, in other words (x,x)∉R(x,x) \not\in R for all xXx \in X.

So a relation is reflexive if every element is related to itself, and antireflexive if none are. A relation can be neither, if some elements are related to themselves and some aren’t! (One relation is both, but that’s an exercise.)

In all of this section’s examples you should draw a digraph for each relation yourself and make sure you agree with which properties each relation has.

Examples. Let X={a,b,c,d}X = \{a,b,c,d\}. Consider the relations R1={(a,b),(a,d),(b,c),(b,d),(d,b),(d,c)}, R_1 = \{ (a,b), (a,d), (b,c), (b,d), (d,b), (d,c) \}, R2={(a,a),(a,d),(b,b),(b,d),(c,c),(c,d),(d,d)}, R_2 = \{ (a,a), (a,d), (b,b), (b,d), (c,c), (c,d), (d,d) \}, R3={(a,b),(b,b),(b,c),(c,d),(d,a),(d,d)},R_3 = \{(a,b), (b,b), (b,c),(c,d),(d,a), (d,d)\}, as well as the full relation X×XX \times X and the empty relation \varnothing.

The relations R2R_2 and X×XX\times X are reflexive as they contain all pairs of the form (x,x)(x,x). The relations R1R_1 and \varnothing are likewise antireflexive. The relation R3R_3 is neither reflexive nor antireflexive, as it contains some of these “diagonal pairs” but not all.

Definition 15.2.3 A relation RR on a set XX is symmetric if for all x,yXx, y \in X where (x,y)R(x,y) \in R, then (y,x)R(y,x) \in R. Such a pair (x,y)(x,y) is said to be invertible.

Definition 15.2.4 A relation RR on a set XX is antisymmetric if for all x,yXx, y \in X where (x,y)R(x,y) \in R, then (y,x)∉R(y,x) \not\in R unless x=yx=y. Equivalently the only invertible pairs in an antisymmetric relation are the diagonal pairs.

It is crucial to understand that symmetry and antisymmetry are two ends of a spectrum, measuring how many of the pairs in the relation are invertible. If they all are, the relation is symmetric. We would like to begin the next sentence with “if none are” but this is actually impossible, because if (x,x)R(x,x) \in R then it is automatically invertible. In other words, the diagonal pairs are the only pairs that are unavoidably invertible. So, an antisymmetric relation is one where the only invertible pairs are on the diagonal. Finally, a relation may be neither or both.

Examples. As before let X={a,b,c,d}X = \{a,b,c,d\} and consider the relations R1={(a,b),(a,d),(b,c),(b,d),(d,b),(d,c)}, R_1 = \{ (a,b), (a,d), (b,c), (b,d), (d,b), (d,c) \}, R2={(a,a),(a,d),(b,b),(b,d),(c,c),(c,d),(d,d)}, R_2 = \{ (a,a), (a,d), (b,b), (b,d), (c,c), (c,d), (d,d) \}, R3={(a,b),(b,b),(b,c),(c,d),(d,a),(d,d)},R_3 = \{(a,b), (b,b), (b,c),(c,d),(d,a), (d,d)\}, as well as the full relation X×XX \times X and the empty relation \varnothing.

The relation X×XX \times X is symmetric, because it contains all the possible pairs; so every (x,y)(x,y) will find its (y,x)(y,x). It is less obvious but equally true that \varnothing is symmetric. The definition is that every pair must be invertible; if there are no pairs then the definition is satisfied automatically.

The relations R1R_1, R2R_2, and R3R_3 are each not symmetric because they contain a pair that is not invertible. The relation R1R_1 has (a,d)(a,d) but no (d,a)(d,a); R2R_2 has (b,d)(b,d) but no (d,b)(d,b); and R3R_3 has (a,b)(a,b) but no (b,a)(b,a), for example.

Of these, R1R_1 is not antisymmetric, because it has “some” symmetry (the pairs (b,d)(b,d) and (d,b)(d,b)). The relations R2R_2 and R3R_3 are antisymmetric. The empty set \varnothing manages to be both symmetric and antisymmetric because just as there are no pairs to invert, there are also no pairs to not invert. Vacuousness!

The final property to discuss is transitivity. You have heard this word before, both without this book (transitivity of equality is used to solve equations in algebra) and within (inclusion, implication, and divisibility are all transitive).

Definition 15.2.5 A relation is transitive if for all x,y,zXx, y, z \in X, whenever (x,y)R(x,y) \in R and (y,z)R(y,z) \in R we also have (x,z)R(x,z)\in R.

There are two implications of this definition. The first is that any local symmetry within a transitive relation – that is, the presence of an invertible pair – must involve a loop. To see this, suppose we have (x,y)R(x,y) \in R and (y,x)R(y,x) \in R in our relation. Replacing zz with xx in the above definition forces us to include the loop (x,x)(x,x) in our relation.

The next implication involves a new definition.

Definition 15.2.6 Consider a relation RR on a set XX. A walk of length nn is a sequence (x1,x2,,xn,xn+1)(x_1, x_2, \ldots, x_n, x_{n+1}) of elements of XX, not necessarily distinct, where (xi,xi+1)R(x_i, x_{i+1}) \in R for all 1in1 \le i \le n.

(The length of the walk is a measure of the number of arrows involved, not the number of elements in the sequence.)

The second implication is that in a transitive relation, if (xi)(x_i) is a walk beginning at x1x_1 and ending at xn+1x_{n+1}, then x1x_1 must be related to xn+1x_{n+1}.

Examples. One more time: Let X={a,b,c,d}X = \{a,b,c,d\} and consider the relations R1={(a,b),(a,d),(b,c),(b,d),(d,b),(d,c)}, R_1 = \{ (a,b), (a,d), (b,c), (b,d), (d,b), (d,c) \}, R2={(a,a),(a,d),(b,b),(b,d),(c,c),(c,d),(d,d)}, R_2 = \{ (a,a), (a,d), (b,b), (b,d), (c,c), (c,d), (d,d) \}, R3={(a,b),(b,b),(b,c),(c,d),(d,a),(d,d)},R_3 = \{(a,b), (b,b), (b,c),(c,d),(d,a), (d,d)\}, as well as the full relation X×XX \times X and the empty relation \varnothing.

The relation R1R_1 is not transitive: it is missing, for example, the edge (a,c)(a,c) (forced by (a,b)(a,b) and (b,c)(b,c)) and the loop (b,b)(b,b) (forced by (b,d)(b,d) and (d,b)(d,b)).

However, R2R_2 is transitive. There are no pairs (x,y)(x,y) and (y,z)(y,z) missing their (x,z)(x,z). A common mistake is to ask, “Shouldn’t we require (a,b)(a,b) for transitivity?” but neither aa nor bb is actually related to the other. Both are related to dd, but the direction of the arrow matters. (If the (b,d)(b,d) arrow were reversed, then an (a,b)(a,b) arrow would be necessary.)

The relation R3R_3 is not transitive. It is missing many edges required for transitivity; (b,d)(b,d) is one example. As an exercise, try to find all of the missing edges that keep R1R_1 and R3R_3 from being transitive.

The relation X×XX\times X is transitive because it contains all possible pairs. No pairs can possibly be missing, so it is transitive “by default.” Likewise \varnothing is transitive because there are no pairs (x,y)(x,y) and (y,z)(y,z) that can fail to have their (x,z)(x,z).
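Each of these properties is a finite check over pairs, so a machine can verify them for you. A minimal Python sketch (the predicate names are my own), run against R2R_2:

    # A relation is a set of pairs on the set X.
    def reflexive(R, X):
        return all((x, x) in R for x in X)

    def symmetric(R):
        return all((y, x) in R for (x, y) in R)

    def antisymmetric(R):
        return all(x == y for (x, y) in R if (y, x) in R)

    def transitive(R):
        return all((x, z) in R
                   for (x, y) in R for (w, z) in R if y == w)

    X = {'a', 'b', 'c', 'd'}
    R2 = {('a', 'a'), ('a', 'd'), ('b', 'b'), ('b', 'd'),
          ('c', 'c'), ('c', 'd'), ('d', 'd')}
    print(reflexive(R2, X), symmetric(R2),
          antisymmetric(R2), transitive(R2))  # True False True True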

Takeaways

Chapter 15 problems

16. Closure and composition

Review.

Many times we will have a mathematical object in hand but desire it to have a property it does not have. For example, suppose I have the natural numbers N\mathbf{N}, but would like to be able to subtract these numbers. Sometimes I will be successful. For instance, 53=2N5-3=2 \in \mathbf{N}. But other times I will not; say, in the case of 35=23-5=-2, which is not a natural number.

Then we construct a new object, the closure of the original, that contains the same data and the minimum amount of extra data required to allow the desired property. In the case of closing N\mathbf{N} under subtraction, the result is Z\mathbf{Z}, the smallest set containing both the natural numbers and the results of subtracting any two natural numbers.

A closure YY of the set XX under the property PP must satisfy the following three conditions:

  1. YY must have property PP.
  2. XYX \subseteq Y. In other words, YY must be a superset of XX. (“Subset” and “superset” are converses.)
  3. For any ZZ such that ZZ has property PP and XZX \subseteq Z, it must also be the case that YZY \subseteq Z. In this sense YY is minimal among the supersets of XX that have PP. (Chapter 18 will make more sense of this idea, but not specifically as it pertains to closure.)

In this chapter we will close a relation under reflexiveness, symmetry, and transitivity. The third of these will require a detour into the operation of composing relations, and finally provide the response to Chapter 14’s call.

16.1 Reflexive and symmetric closures

In this section, RR is a relation on the set XX. The set XX need not be finite unless we want to talk about a matrix or a directed graph.

Example. Suppose we have the relation R={(a,b),(a,c),(b,b),(c,b)}R = \{(a,b), (a,c), (b,b), (c,b)\} on {a,b,c}\{a,b,c\}. To construct the reflexive closure of RR, we must add the pairs (a,a)(a,a) and (c,c)(c,c). (Notice (b,b)R(b,b)\in R already.) So the reflexive closure of RR is the new relation {(a,a),(a,b),(a,c),(b,b),(c,b),(c,c)}. \{(a,a),(a,b),(a,c),(b,b),(c,b),(c,c)\}.

Definition 16.1.1 Given a set XX, the diagonal relation Δ\Delta on XX is the relation Δ={(x,x)    xX}. \Delta = \{(x,x) \;|\; x \in X \}.

The diagonal relation is the smallest reflexive relation on a set.

Definition 16.1.2 Given a relation RR on a set XX, the reflexive closure of RR is the relation RΔR \cup \Delta.

In other words, to close a relation under reflexiveness it is only necessary to ensure it contains all the pairs in the diagonal relation.

Are these really definitions? Well, no. In the Form of the discrete math textbook in the library of Heaven, the definition of closure is the one given in the preamble to the section – in this case, the smallest reflexive superset of RR – and then it is a theorem that RΔR \cup \Delta is the reflexive closure. (It is also a theorem that a closure is necessarily unique, but this is easier to prove and only needs proving once for all closures.) Fortunately for us, this is not the Form of the discrete math textbook in the library of Heaven; so we will take these as definitions. Prove that they are truly what we say they are, as an exercise, if you like.

Next we will construct the symmetric closure of any relation.

Example. Again take R={(a,b),(a,c),(b,b),(c,b)}R = \{(a,b), (a,c), (b,b), (c,b)\} on {a,b,c}\{a,b,c\}. To construct the symmetric closure of RR, we must add the pairs (b,a)(b,a), (c,a)(c,a), and (b,c)(b,c). (As previously discussed, (b,b)(b,b) is invertible automatically.) Therefore the symmetric closure of RR is {(a,b),(a,c),(b,a),(b,b),(b,c),(c,a),(c,b)}. \{(a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b)\}.

Definition 16.1.3 Given a relation RR on a set XX, its inverse relation is the relation R1={(y,x)    (x,y)R}. R^{-1} = \{ (y,x) \; | \; (x,y) \in R \}.

Example. You have seen this before. If ff is a function that has an inverse and (x,y)f(x,y) \in f (meaning y=f(x)y=f(x)), then (y,x)f1(y,x) \in f^{-1} (meaning x=f(y)x=f(y)).

Definition 16.1.4 Given a relation RR on a set XX, the symmetric closure of RR is the relation RR1R \cup R^{-1}.
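In code, both closures are one-line unions, matching Definitions 16.1.2 and 16.1.4. A brief Python sketch on the set-of-pairs representation:

    # Reflexive closure: R ∪ Δ.  Symmetric closure: R ∪ R^(-1).
    def reflexive_closure(R, X):
        return R | {(x, x) for x in X}

    def symmetric_closure(R):
        return R | {(y, x) for (x, y) in R}

    R = {('a', 'b'), ('a', 'c'), ('b', 'b'), ('c', 'b')}
    print(sorted(reflexive_closure(R, {'a', 'b', 'c'})))
    print(sorted(symmetric_closure(R)))
    # Both match the closures computed by hand above.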

16.2 Towards a transitive closure

We can’t define a transitive closure yet, but we can probably build one. Let’s do an example that is particularly relevant at the time this book was written.

Example. In the past two weeks, Alicia has hung out with Bob. Bob and Carlos are roommates, and Davina was Carlos’s cashier at the grocery store the other day. Davina lives with her boyfriend, Erik. If Alicia gets sick, who is at risk?

The underlying relation is
R={(a,b),(b,c),(c,d),(d,e)} R = \{ (a,b), (b,c), (c,d), (d,e) \}
on the set {a,b,c,d,e}\{a,b,c,d,e\} (where aa refers to Alicia, etc.), the set of all pairs (x,y)(x,y) where xx interacted with yy over the past two weeks. However, this is not the relation that totally represents the spread of the illness, because if yy was exposed to the illness by xx, and zz was exposed by yy, then zz could have caught something that originated with xx. So if (x,y)(x,y) and (y,z)(y,z) are in RR, we would like (x,z)(x,z) to be in our relation. Therefore we are looking for the transitive closure of RR.

Transitivity is best understood visually. Draw a directed graph with vertices representing aa, bb, cc, dd, and ee, and draw arrows between aa and bb, bb and cc, cc and dd, and dd and ee. These are the arrows in RR. If our relation is to be transitive, these arrows will induce arrows between aa and cc, between bb and dd, and between cc and ee, i.e. the arrows corresponding to the paths of length 2 in the original relation. Then we will get arrows between aa and dd and between bb and ee (paths of length 3). Finally, we will draw one last arrow between aa and ee (paths of length 4). You can verify for yourself that the relation R={(a,b),(a,c),(a,d),(a,e),(b,c),(b,d),(b,e),(c,d),(c,e),(d,e)} R^* = \{(a,b), (a,c), (a,d), (a,e), (b,c), (b,d), (b,e), (c,d), (c,e), (d,e)\} is transitive.

If you understand that example, then you understand the idea of a transitive closure. It remains to put some formal machinery behind the idea.

16.3 Composition

Composition may be a familiar idea from algebra. Suppose you have functions ff and gg such that f(x)=yf(x)=y and g(y)=zg(y)=z. Then you have a third function fgfg such that fg(x)=zfg(x) = z, called the composition of ff and gg.

You are extremely likely to have seen fgfg written as gfg\circ f before, and probably again outside of this book. We will have a material reason to prefer the fgfg notation very shortly. (And I prefer it for aesthetic reasons. - DW)

A composition of relations is the identical idea, but without the extra restrictions that make a relation into a function.

Definition 16.3.1 Let RR be a relation from XX to YY, and SS be a relation from YY to ZZ. Their composition is the relation RS={(x,z)    y s.t. (x,y)R and (y,z)S}.RS = \{(x,z) \; | \; \exists y \text{ s.t. } (x,y) \in R \text { and } (y,z) \in S \}. The composition of RR with itself nn times is denoted RnR^n.

Again, elsewhere you may see RSRS written as SRS \circ R.

Example. Let R={(x1,y2),(x2,y3),(x3,y1)}R = \{ (x_1,y_2), (x_2,y_3), (x_3,y_1) \} be a relation from X={x1,x2,x3}X=\{x_1,x_2,x_3\} to Y={y1,y2,y3}Y=\{y_1,y_2,y_3\}, and S={(y1,z1),(y1,z2),(y2,z2)}S = \{(y_1,z_1), (y_1,z_2), (y_2,z_2)\} be a relation from YY to Z={z1,z2}Z=\{z_1,z_2\}. What is RSRS?

The relation RSRS is the set of all pairs (x,z)(x,z) for which there is a yy “connecting” the xx and the zz. Looking at the given relations we may identify the following pairs: (x1,z2)(x_1,z_2) via y2y_2, (x3,z1)(x_3,z_1) via y1y_1, and (x3,z2)(x_3,z_2) also via y1y_1. Therefore RS={(x1,z2),(x3,z1),(x3,z2)}.RS = \{(x_1, z_2), (x_3,z_1), (x_3,z_2)\}.

The following diagram illustrates this composition.

A picture of the composition of two relations is shown. Two elements are related in the composition if a third in the middle set 'bridges' them.

Figure. A picture of the composition of two relations is shown. Two elements are related in the composition if a third in the middle set "bridges" them.

Observe that SRSR does not exist because the codomain of SS is not the domain of RR.
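Before turning to matrices, note that the bridging condition in Definition 16.3.1 is already a direct filter in code. A minimal Python sketch (the function name is my own):

    # (x, z) is in RS when some middle element bridges x to z.
    def compose(R, S):
        return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

    R = {('x1', 'y2'), ('x2', 'y3'), ('x3', 'y1')}
    S = {('y1', 'z1'), ('y1', 'z2'), ('y2', 'z2')}
    print(sorted(compose(R, S)))
    # [('x1', 'z2'), ('x3', 'z1'), ('x3', 'z2')]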

You may tell yourself, “I don’t want to draw this diagram every time I compute a composition.” And you may tell yourself, “If the relations involved were large, I would like to be able to make a computer do this job for me.” Fortunately this apparatus is already in place! Recall the criterion for a pair (xi,zj)(x_i,z_j) appearing in the composition for sets whose elements are labeled numerically:

"xix_i is related to zjz_j if there exists a kk such that xkx_k is related to yky_k and yky_k is related to zkz_k."

You may recall this exact language has appeared before:

“The Boolean dot product of a row matrix (Ai)(A_i) and a column matrix (Cj)(C_j) is 11 if there exists a kk such that AkA_k and CkC_k are both 11.”

It is precisely the multiplication of Boolean matrices that will find the necessary yky_k's for us. Indeed,

Theorem 16.3.2 Let RR and SS be relations such that the composition RSRS exists. Then MRS=MRMS, M_{RS} = M_R \odot M_S, i.e. the matrix for RSRS is the product of the matrix for RR with the matrix for SS.

Example. In the last example we let R={(x1,y2),(x2,y3),(x3,y1)}R = \{ (x_1,y_2), (x_2,y_3), (x_3,y_1) \} be a relation from X={x1,x2,x3}X=\{x_1,x_2,x_3\} to Y={y1,y2,y3}Y=\{y_1,y_2,y_3\}, and S={(y1,z1),(y1,z2),(y2,z2)}S = \{(y_1,z_1), (y_1,z_2), (y_2,z_2)\} be a relation from YY to Z={z1,z2}Z=\{z_1,z_2\}.

The matrix for RR is MR=(010001100) M_R = \left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right) and the matrix for SS is MS=(110100). M_S = \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{array} \right). Their product is (010001100)(110100)=(010011), \left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right) \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{array} \right) = \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \\ 1 & 1 \end{array} \right), which is also the matrix for RSRS.

16.4 Transitive closure

Composition takes the pairs (x,y)(x,y) and (y,z)(y,z) and returns (x,z)(x,z), so you can already imagine how this would relate to transitivity. A transitive relation is one that contains all of its self-compositions.

Theorem 16.4.1 A relation RR on a set XX is transitive if and only if RnRR^n \subseteq R for all positive integers nn.

Therefore, the smallest transitive relation containing RR is the union of all of the RnR^n. You will recall a very, very long time ago that we defined the infinite union n=1Xn=X1X2 \bigcup_{n=1}^\infty X_n = X_1 \cup X_2 \cup \cdots to be the set of all elements that are in at least one of the XiX_i, and said that this definition would be needed exactly once in the book. Well, here we are!

Definition 16.4.2 The transitive closure of a relation RR on a set XX is the relation R=n=1Rn. R^* = \bigcup_{n=1}^\infty R^n.

Of course, in practice, we will not need to calculate infinitely many composites. You will be able to tell that there are no new pairs that would arise from a new composition.

The following theorem makes it easy to conceptualize RnR^n. Recall a walk of length nn is a sequence (x1,x2,,xn+1)(x_1, x_2, \ldots, x_{n+1}) where (xi,xi+1)R(x_i,x_{i+1}) \in R. We can call these nn-walks for short.

Theorem 16.4.3 Given a relation RR on a set XX, the relation RnR^n is precisely the set of all pairs (x,y)(x,y) such that there is an nn-walk in RR beginning at xx and ending at yy.

Example. Consider the set X={w,x,y,z}X=\{w,x,y,z\} and the relation R={(w,x),(x,y),(y,z),(z,y)}R = \{(w,x), (x,y), (y,z), (z,y) \}. Construct RR^*.

We will find R2R^2 and R3R^3, and the union of these with RR will be the transitive closure. (We will verify that R4R^4 will yield nothing new.)

To find R2R^2, find all of the pairs that are “two steps away” in the original relation. You may immediately name (w,y)(w,y) and (x,z)(x,z) as two such pairs, but there are two less obvious pairs: (y,y)(y,y) and (z,z)(z,z). Recall from the last chapter that any time you have invertible pairs in a transitive relation, that symmetry must produce loops; because yy is related to zz and zz is related to yy, for the relation to be transitive yy must be related to yy. (Same for zz.) Therefore R2={(w,y),(x,z),(y,y),(z,z)}.R^2 = \{(w,y), (x,z), (y,y), (z,z)\}.

The triple composite R3R^3 consists of all the pairs that are “three steps away,” which at first glance is just (w,z)(w,z). In fact, if we stopped here we would be correct about RR^*, but we should be careful: because of the loops in R2R^2 there are more pairs in R3R^3. For example, one iteration of RR may take xx to yy; another takes yy to zz; and a third takes zz back to yy. Therefore the pair (x,y)(x,y) is in R3R^3, even though it is also in RR. Check for yourself that R3={(w,z),(x,y),(y,z),(z,y)} R^3 = \{(w,z), (x,y), (y,z), (z,y) \} is the set of all pairs that start and end 33-walks in the original relation.

Finally, we take the union of these three sets to make R={(w,x),(w,y),(w,z),(x,y),(x,z),(y,y),(y,z),(z,y),(z,z)}.R^*= \{(w,x), (w,y), (w,z), (x,y), (x,z), (y,y), (y,z), (z,y), (z,z)\}.
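In code, Definition 16.4.2 becomes a loop: union in successive composites until a pass contributes nothing new, which must happen on a finite set. A minimal Python sketch, reusing the compose function from the earlier snippet:

    # Transitive closure: accumulate R, R^2, R^3, ... until the next
    # composite is already contained in what we have.
    def transitive_closure(R):
        closure, power = set(R), set(R)
        while True:
            power = compose(power, R)  # the next composite R^n
            if power <= closure:       # no new pairs: done
                return closure
            closure |= power

    R = {('w', 'x'), ('x', 'y'), ('y', 'z'), ('z', 'y')}
    print(sorted(transitive_closure(R)))
    # Matches the nine pairs of R* computed above.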

Takeaways

Chapter 16 problems

17. Equivalence relations

Review:

17.1 Introduction

Consider the digraph generated by the relation RR on X={a,b,c,d,e,f,g}X=\{a,b,c,d,e,f,g\}, where R = \{ (a,a), (a,d), (a,g), (b,b), (b,c), (b,e), (b,f), (c,b), (c,c), (c,e), (c,f), (d,a), (d,d), (d,g), (e,b), (e,c), (e,e), (e,f), (f,b), (f,c), (f,e), (f,f), (g,a), (g,d), (g,g) \}. The digraph is drawn below:

The directed graph of the relation. Three elements occupy their own triangle, while the other four have their own square.

Figure. The directed graph of the relation. Three elements occupy their own triangle, while the other four have their own square.

The most visually striking feature of this directed graph is that the set XX is partitioned into separated pieces: the piece with aa, dd, and gg; and the piece with bb, cc, ee, and ff.

To understand this phenomenon, what properties does RR have? It is reflexive, it is symmetric, and it is transitive. (Verify these for yourself.) Taken together they imply a “fullness” of the relation; the arrow (a,d)(a,d) brings with it the arrows (a,a)(a,a), (d,a)(d,a), and (d,d)(d,d); if we add (d,g)(d,g), we get (a,g)(a,g), (g,d)(g,d), (g,a)(g,a), and (g,g)(g,g).

Such a relation gets its own title.

Definition 17.1.1 An equivalence relation is any relation that is reflexive, symmetric, and transitive.

Examples. The relation given by the statement x=yx=y on any set XX is an equivalence relation. In fact it is the prototypical equivalence relation, in the sense that the point of an equivalence relation is to capture the idea of equality in a less strict way.

For example, a typical way to define equivalence of sets is to say that two sets XX and YY are isomorphic if X=Y|X|=|Y|. This relation is an equivalence relation, but XX and YY do not need to actually be equal to be equivalent. This is the reason we freely rename the elements of a set whenever we like.

Another example is the equivalence of statements from Chapter 2, where ϕψ\phi \equiv \psi if ϕψ\phi \leftrightarrow \psi is a tautology.

Definition 17.1.2 Let RR be an equivalence relation on a set XX. The equivalence class of xXx \in X is the set of all elements to which it is equivalent: [x]={yX    (x,y)R}. [x] = \{ y \in X \; | \; (x,y) \in R \}.

Example. Let again R = \{ (a,a), (a,d), (a,g), (b,b), (b,c), (b,e), (b,f), (c,b), (c,c), (c,e), (c,f), (d,a), (d,d), (d,g), (e,b), (e,c), (e,e), (e,f), (f,b), (f,c), (f,e), (f,f), (g,a), (g,d), (g,g) \} be the relation on X={a,b,c,d,e,f,g}X = \{a,b,c,d,e,f,g\} whose directed graph is drawn above. This relation divides its underlying set into the equivalence classes [a]={a,d,g} [a] = \{a,d,g\} and [b]={b,c,e,f},[b]=\{b,c,e,f\}, precisely the discrete “sections” of the digraph. Note that the name of the equivalence class is an arbitrarily chosen representative; [b][b], [c][c], [e][e], and [f][f] all refer to the same set.
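Equivalence classes are likewise a one-line filter on the set-of-pairs representation. A short Python sketch, restricted to the {a,d,g}\{a,d,g\} block of the relation above for brevity:

    # The equivalence class [x] is the set of everything related to x.
    def equivalence_class(R, x):
        return {b for (a, b) in R if a == x}

    R = {('a', 'a'), ('a', 'd'), ('a', 'g'),
         ('d', 'a'), ('d', 'd'), ('d', 'g'),
         ('g', 'a'), ('g', 'd'), ('g', 'g')}
    print(sorted(equivalence_class(R, 'a')))  # ['a', 'd', 'g']
    print(sorted(equivalence_class(R, 'd')))  # ['a', 'd', 'g'], the same set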

17.2 Modular arithmetic

Now we consider a special equivalence relation on the set of integers.

Definition 17.2.1 Fix a nonzero integer aa. We declare mm and nn to be congruent modulo a\boldsymbol{a}, and write mn(mod a)m \equiv n \,(\text{mod }a), if a(mn)a|(m-n).

We are functionally declaring mm and nn “the same” if they differ by a multiple of aa. In a way, we are “destroying” the information of aa. Throughout even very different areas of mathematics, the word “modulo” or “mod” refers to collapsing some information in a structure, to be very vague.

Theorem 17.2.2 For a fixed nonzero integer aa, congruence mod aa defines an equivalence relation on Z\mathbf{Z}.

Proof. Up to you, in the exercises.

Examples. In the right context this relation is a familiar one. Suppose it is 10:00 in the morning. What time will it be 37 hours from now? Of course we subtract a full day, or 24 hours, from 37; so what time is it 13 hours from now? It will be 23:00 or 11:00 in the evening, depending on how you tell time. This addition is done modulo 24 (or 12).

A lot of trigonometry is done modulo 2π2\pi, but since 2π2\pi is not an integer, we don’t care about it in this book.

1733(mod 8)17 \equiv 33 \,(\text{mod }8), because 3317=1633-17=16 and 8168|16. You may also think of this as being true because the remainder of both 1717 and 3333 is 11 upon being divided by 88.

However, 23 \not\equiv 49 \,(\text{mod }5), because 49-23=26, which is not divisible by 5.
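
Definition 17.2.1 translates directly into a one-line test; a minimal sketch (the function name congruent is mine):

    def congruent(m, n, a):
        # m ≡ n (mod a) exactly when a divides m - n
        return (m - n) % a == 0

    print(congruent(17, 33, 8))   # True: 33 - 17 = 16 is divisible by 8
    print(congruent(23, 49, 5))   # False: 49 - 23 = 26 is not
    print((10 + 37) % 24)         # 23, the clock arithmetic from above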

The equivalence class of a given element mod a is not difficult to generate, as the following example shows. Suppose we want to know the equivalence class of 11 modulo 6. We are looking for integers m satisfying 6|(11-m); equivalently, integers m for which there exists a k such that 11-m=6k, or m=11-6k. Letting k run through all of its options, both positive and negative, the integers in the equivalence class of 11 mod 6 are just the integers that differ from 11 by a multiple of 6: [11] = \{\ldots, -7, -1, 5, 11, 17, 23, \ldots \}. Another equivalence class mod 6 is [3] = \{\ldots, -9, -3, 3, 9, 15, \ldots \}. Observe that there is one equivalence class per possible remainder when dividing by 6. The smallest non-negative member of each class is the remainder, upon division by 6, shared by all the members of that class.
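
To watch the class of 11 mod 6 materialize, let k run over a small window of integers; a quick sketch:

    # members of [11] mod 6 are 11 - 6k as k ranges over the integers;
    # here k runs over a finite window for display purposes
    print(sorted(11 - 6 * k for k in range(-2, 4)))
    # [-7, -1, 5, 11, 17, 23]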

Finally, we will note some useful algebraic properties of this relation.

Theorem 17.2.3 Suppose u \equiv v and x \equiv y modulo a. Then (u+x) \equiv (v+y) \,(\text{mod }a) and ux \equiv vy \,(\text{mod }a).

Example. In certain cryptography contexts it is helpful to know the remainders of large integers mod something, but it may be expensive to calculate those large integers. Instead we exploit the above theorem. Suppose we want to know the value of (19^{12} - 27 \times 48) \,(\text{mod }9).

First, observe that since 27 is divisible by 9, 27 \equiv 0. (It is okay to drop the "\text{mod }9" here because it is clear from context after the above paragraph.) Since 0 times anything is 0, (19^{12} - 27 \times 48) \equiv 19^{12}.

Next, observe 19 \equiv 1. Then 19^{12} \equiv 1^{12}, which is just 1.

So, (19^{12} - 27 \times 48) \equiv 1 \,(\text{mod }9). In fact, since 1 is the smallest nonnegative member of its equivalence class, it is the remainder we would get if we divided whatever 19^{12} - 27 \times 48 is by 9.
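
You can confirm this on a computer. Python's built-in pow accepts a third argument that performs the exponentiation mod 9, which amounts to applying Theorem 17.2.3 at every step:

    print((19**12 - 27 * 48) % 9)   # 1, by brute force
    print(pow(19, 12, 9))           # 1, without ever forming the huge 19**12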

Takeaways

Chapter 17 problems

18. Partial orders

Review.

In the last chapter we looked at equivalence relations, which generalize the idea of equality. Partial orders will do the same for inequalities. How can we say one object is “less than” another when the two objects are not numbers? You may be able to think of a way to order sets, or strings, or matrices. That way is probably a partial order.

18.1 Introduction

The secret ingredient to order is that an order must only go one way. If x is less than y, the only way that y can also be less than x is if the two are equal (here, by "less than" we really mean "less than or equal to"). Fortunately, we have a word for this: antisymmetry!

Definition 18.1.1 A relation is a partial order if it is reflexive, antisymmetric, and transitive. If R is a partial order on a set X and (x,y) \in R, we write x \le_R y, x \preceq y, or simply x \le y. The third is only used when the partial order is clear from context.

Examples. The prototypical partial order is the relation x \le y on the integers. This relation is reflexive (any integer is less than or equal to itself), antisymmetric (x \le y and y \le x means x = y), and transitive (as we saw in Chapter 8).

Given a family of sets, there are two "natural" ways to order them: by inclusion, or by size. The first is the relation where sets E and F are related if and only if E \subseteq F. The other is given by |E| \le |F|. In Chapter 15 you verified that the former was a partial order; you will investigate the latter in this chapter.

Given a set of positive integers we may say that m is related to n if and only if m|n. In Chapter 4 you proved that this is a partial order (even if you didn't know it at the time).
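
Like equivalence relations, partial orders can be machine-checked on finite sets. A sketch using the divisibility order on \{1, \ldots, 8\} (the helper names are mine):

    def is_antisymmetric(R):
        # x ≼ y and y ≼ x together force x == y
        return all(x == y for (x, y) in R if (y, x) in R)

    def is_partial_order(X, R):
        reflexive = all((x, x) in R for x in X)
        transitive = all((x, w) in R
                         for (x, y) in R
                         for (z, w) in R if y == z)
        return reflexive and is_antisymmetric(R) and transitive

    X = set(range(1, 9))
    R = {(m, n) for m in X for n in X if n % m == 0}   # m related to n when m | n
    print(is_partial_order(X, R))   # True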

Why the word "partial"? Take the last example, the order defined by divisibility, and suppose our underlying set is all of the positive integers. Neither 3|5 nor 5|3. Since we have two elements that cannot be ordered, the order is only partial. The following definitions formalize the notion.

Definition 18.1.2 Suppose that R is a partial order on a set X. If x \preceq y or y \preceq x in the partial order, we say x and y are comparable; if not, they are incomparable.

Definition 18.1.3 An order on a set X is total or linear if x and y are comparable for every x, y \in X.

Example. The usual order on the integers is a total order. The divisibility order is total if restricted to the right set; for example, the set of powers of 2.
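
Totality is likewise a finite check; a sketch restricting divisibility to the powers of 2:

    def comparable(x, y, R):
        return (x, y) in R or (y, x) in R

    powers = {2**k for k in range(6)}   # {1, 2, 4, 8, 16, 32}
    R = {(m, n) for m in powers for n in powers if n % m == 0}
    print(all(comparable(x, y, R) for x in powers for y in powers))   # True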

Definition 18.1.4 A set X is well-ordered if it has a total order \preceq under which every nonempty subset A \subseteq X contains an element x_0 such that x_0 \preceq x for all x \in A.

In language we will learn later, a well-ordered set is a set with a total order in which every nonempty subset has a "minimum" element.

Example. The poster child for well-ordered sets is \mathbf{N}, with the usual total order and the least element 0. Observe that one of the most important features of \mathbf{N}, the validity of mathematical induction, is not special to \mathbf{N} at all: all we need is somewhere to start and a way to move "forward" through the set. So, well-ordered sets are precisely those for which mathematical induction is possible.

18.2 Hasse diagrams

As relations, partial orders can be represented with directed graphs. Let's give it a try with a simple partial order: the inclusion order on the power set of \{a,b,c\}.


Figure. The directed graph for the power set of a three-element set, ordered under inclusion. The number of arrows makes it incomprehensible.

This visual is, in a word, awful. There are too many arrows! However, the fact that we are dealing with a partial order implies that certain arrows are redundant. Since every partial order is reflexive, we can lose the loops. Transitivity says that the edges from \varnothing to \{a\} and from \{a\} to \{a,b\} are sufficient; we do not need the edge from \varnothing to \{a,b\} (and so on). Finally, if we impose the restriction that all edges flow upward, we can draw the following, much more pleasant, picture.


Figure. The Hasse diagram of the previous digraph, where the redundant arrows (those implied by reflexivity and transitivity) are removed.

Such a diagram is called a Hasse diagram.

Definition 18.2.1 Suppose X is partially ordered and x, y \in X are distinct. If x \le y and there is no intermediate element z, distinct from x and y, such that x \le z and z \le y, we say y covers x.

Definition 18.2.2 The Hasse diagram of a partial order on a set X is a graph whose vertices are the elements of X, with an edge from a vertex x up to a vertex y if and only if y covers x in the order.

Example. Let X = \{a,b,c,d,e\} and R = \{(a,a), (a,b), (a,e), (b,b), (b,e), (c,b), (c,c), (c,e), (d,b), (d,c), (d,d), (d,e), (e,e)\}. Draw the Hasse diagram for this partial order.

To draw a Hasse diagram out of a set of pairs, observe that the elements that are less than the most other elements should be on the bottom of the diagram, while the elements that are less than the fewest should be near the top. You can typically then "fit in" the others. Here is the diagram for the above relation (a short code sketch after the figure computes the edges mechanically):


Figure. The Hasse diagram for the example order.
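
If you would rather compute the edges than eyeball them, the covering relation can be extracted mechanically from the pairs; a sketch (the function name covers is mine):

    def covers(X, R):
        # y covers x: x ≼ y, x != y, and no z strictly between them
        return {(x, y) for (x, y) in R if x != y
                and not any(z not in (x, y) and (x, z) in R and (z, y) in R
                            for z in X)}

    X = {"a", "b", "c", "d", "e"}
    R = {("a","a"), ("a","b"), ("a","e"), ("b","b"), ("b","e"), ("c","b"),
         ("c","c"), ("c","e"), ("d","b"), ("d","c"), ("d","d"), ("d","e"),
         ("e","e")}
    print(sorted(covers(X, R)))
    # [('a', 'b'), ('b', 'e'), ('c', 'b'), ('d', 'c')] -- the Hasse edges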

18.3 Extreme features of a partial order

We will close by discussing various “extreme” features a partial order can have.

Definition 18.3.1 Let X be a partially ordered set. We say that x is less than y (in the partial order) if x \preceq y, and then y is greater than x.

This definition might seem weird, but remember that we are not necessarily talking about numbers. Once everyone knows which partial order is being discussed, we use “less than” and “greater than” to describe relations within the order.

Definition 18.3.2 An element x in a partially ordered set X is minimal/maximal if it is not greater/less than any other element. If the minimal/maximal element is unique, it is called the minimum/maximum.

Example. It is important to note that minimal does not mean less than everything; it means not greater than anything, and these are not the same! Consider the order \{(w,w), (w,y), (x,x), (x,y), (x,z), (y,y), (z,z)\}. Its Hasse diagram is shown below:


Figure. The Hasse diagram for the example order, with two minimal elements and two maximal elements.

Above, w is minimal, but it is not less than z. (Likewise z is maximal even though it is not greater than w.)
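
In code, "not greater/less than any other element" becomes a direct quantifier; a sketch over the order above:

    X = {"w", "x", "y", "z"}
    R = {("w","w"), ("w","y"), ("x","x"), ("x","y"), ("x","z"),
         ("y","y"), ("z","z")}

    # minimal: no *other* element lies below it; maximal: none lies above it
    minimal = {x for x in X if not any((y, x) in R and y != x for y in X)}
    maximal = {x for x in X if not any((x, y) in R and y != x for y in X)}
    print(sorted(minimal))   # ['w', 'x']
    print(sorted(maximal))   # ['y', 'z']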

Definition 18.3.3 Fix a subset A of the partially ordered set X. A lower bound/upper bound of A is any element of X that is less/greater than every element of A.

Example. Consider the order \{(a,a), (a,b), (a,c), (a,d), (a,e), (a,f), (b,b), (b,c), (b,d), (b,e), (b,f), (c,c), (c,e), (c,f), (d,d), (d,e), (d,f), (e,e), (f,f), (g,d), (g,e), (g,f), (g,g)\} and the subset A = \{b,c,d\} of the ordered set.


Figure. A Hasse diagram with a three-element subset highlighted.

The upper bounds of A are the elements e and f, because both are greater than every element in A.

The lower bounds of A are a and b. Since partial orders are reflexive, b is "less than" (remember, really "less than or equal to") itself. Because g is less than only d among the elements of A, it is not a lower bound of A.
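
Bounds are quantifiers over the relation as well; a sketch using this example's pairs:

    X = {"a", "b", "c", "d", "e", "f", "g"}
    R = {("a","a"), ("a","b"), ("a","c"), ("a","d"), ("a","e"), ("a","f"),
         ("b","b"), ("b","c"), ("b","d"), ("b","e"), ("b","f"),
         ("c","c"), ("c","e"), ("c","f"),
         ("d","d"), ("d","e"), ("d","f"),
         ("e","e"), ("f","f"),
         ("g","d"), ("g","e"), ("g","f"), ("g","g")}
    A = {"b", "c", "d"}

    lower = {x for x in X if all((x, a) in R for a in A)}
    upper = {x for x in X if all((a, x) in R for a in A)}
    print(sorted(lower))   # ['a', 'b']
    print(sorted(upper))   # ['e', 'f']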

Definition 18.3.4 Let A be a subset of a partially ordered set. The greatest lower bound of A, if it exists, is the maximum element of the set of lower bounds of A. The greatest lower bound is denoted \text{glb} and sometimes called the infimum, or \text{inf}, of the set. (The least upper bound, \text{lub}, supremum, or \text{sup}, is the minimum upper bound.)

Sound familiar? Indeed, we have talked about finding “tightest bounds” when we discussed growth rates of sequences all the way back in Chapter 7.

Example. As before, take the order \{(a,a), (a,b), (a,c), (a,d), (a,e), (a,f), (b,b), (b,c), (b,d), (b,e), (b,f), (c,c), (c,e), (c,f), (d,d), (d,e), (d,f), (e,e), (f,f), (g,d), (g,e), (g,f), (g,g)\} whose Hasse diagram is shown above, and again fix the set A = \{b,c,d\}. The greatest lower bound is the element b; so, \text{glb}(A) = b.

The least upper bound of A does not exist: the set of upper bounds of A is \{e,f\}, neither of which is less than the other.
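
Greatest lower bounds and least upper bounds can be computed by searching the bounds for a maximum or minimum, returning None when none exists; a sketch (the names glb and lub follow the definition, but the code is my own):

    def glb(A, X, R):
        lbs = {x for x in X if all((x, a) in R for a in A)}
        # the maximum of the lower bounds, if one exists
        tops = [x for x in lbs if all((y, x) in R for y in lbs)]
        return tops[0] if tops else None

    def lub(A, X, R):
        ubs = {x for x in X if all((a, x) in R for a in A)}
        bottoms = [x for x in ubs if all((x, y) in R for y in ubs)]
        return bottoms[0] if bottoms else None

    X = {"a", "b", "c", "d", "e", "f", "g"}
    R = {("a","a"), ("a","b"), ("a","c"), ("a","d"), ("a","e"), ("a","f"),
         ("b","b"), ("b","c"), ("b","d"), ("b","e"), ("b","f"),
         ("c","c"), ("c","e"), ("c","f"),
         ("d","d"), ("d","e"), ("d","f"),
         ("e","e"), ("f","f"),
         ("g","d"), ("g","e"), ("g","f"), ("g","g")}
    print(glb({"b", "c", "d"}, X, R))   # b
    print(lub({"b", "c", "d"}, X, R))   # None: e and f are incomparable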

You may have noticed that some Hasse diagrams tend to “sprawl,” while others have pleasing geometric shapes. Our final definition puts some structure on that idea.

Definition 18.3.5 A partially ordered set is a lattice if any pair of elements has both a least upper bound and a greatest lower bound.

Examples. The set \{a,b,c,d,e,f,g\} with the order \{(a,a), (a,b), (a,c), (a,d), (a,e), (a,f), (b,b), (b,c), (b,d), (b,e), (b,f), (c,c), (c,e), (c,f), (d,d), (d,e), (d,f), (e,e), (f,f), (g,d), (g,e), (g,f), (g,g)\} from the last two examples is not a lattice, because (for instance) the pair \{e,f\} fails to have a least upper bound.

An example of a lattice is the inclusion order on any power set; see Section 18.2 for an example. Surprisingly (or maybe not), any linear order is a lattice, even though it doesn't visually "look like" one. Take a pair \{a,b\} in a linear order where (say) a \preceq b (it's either that or the other way around). Then the greatest lower bound of the pair is a and the least upper bound of the pair is b.
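
The lattice property itself is a finite check. The sketch below builds the inclusion order on the power set of \{a,b,c\} and verifies that every pair of subsets has both a glb and a lub; in fact they are intersection and union (the helpers are repeated from the previous sketch):

    from itertools import combinations

    S = {"a", "b", "c"}
    P = [frozenset(c) for r in range(len(S) + 1)
         for c in combinations(sorted(S), r)]
    R = {(E, F) for E in P for F in P if E <= F}   # E <= F tests inclusion

    def glb(A, X, R):
        lbs = {x for x in X if all((x, a) in R for a in A)}
        tops = [x for x in lbs if all((y, x) in R for y in lbs)]
        return tops[0] if tops else None

    def lub(A, X, R):
        ubs = {x for x in X if all((a, x) in R for a in A)}
        bottoms = [x for x in ubs if all((x, y) in R for y in ubs)]
        return bottoms[0] if bottoms else None

    print(all(glb({E, F}, P, R) == E & F and lub({E, F}, P, R) == E | F
              for E in P for F in P))   # True: the power set is a lattice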

Any Boolean algebra yields a lattice; this is an exercise. (The converse is more delicate: only especially nice lattices, the complemented distributive ones, yield Boolean algebras.)

Takeaways

Chapter 18 problems

ONE MORE FINAL: Oh, the places you’ll go

Congratulations! You have either read all of, or scrolled to the end of, Elementary discrete mathematics. What next?

If you are a computer science student in your first year, i.e. a member of the target audience of this book, you either will take calculus or have taken it recently. Asymptotics can be phrased in terms of limits. The binomial theorem will give you a powerful tool in computing difficult integrals. You will look at infinite sequences and sums. Most importantly, you have hopefully gained some confidence in mathematics, and you will go on with a greater ability to read and do math.

With perseverance and a good group of people to work with, there’s not a lot you can’t do after reading this book. I don’t claim this book is magic, or has taught you everything you need to know. What I do claim is that once you know about sets and how to read and write a proof, and maybe have some familiarity with recursion, the tapestry of mathematics is open to you, especially after learning the language of calculus and the examples therein. I am particularly fond of combinatorics and category theory.

Don’t want to do any more math? That’s fine too, I guess. Maybe this book helped you organize your mental machinery for other areas of your life. Maybe you have found within you a confidence to solve problems you did not think you could.

Whatever powers, if any, this book gave you, use them for good, and live well.