Elementary discrete mathematics

Table of contents

0. What is discrete mathematics?

0.1 It’s math you do quietly, right?

The phrase “discrete mathematics” is often used nebulously to refer to a certain branch of mathematics that, from the college point of view, is “not calculus” or perhaps is applicable to computer science. In this section, we will try to understand what it means for a topic to fall under the umbrella of discrete mathematics.

0.2 But really, what is it?

Here’s a question: What is the next number after $1$? Your answer depends on “where you live,” so to speak. If you live in the rational numbers (numbers that can be expressed as fractions, like $3/7$) or the real numbers (including numbers that can’t be written as fractions, like $\pi$), you can’t answer the question. If you give me a real or a rational number that’s a little more than $1$, say $1.1$, I can find something even smaller; say, $1.01$.

If you live in the set of natural numbers, which is the set of numbers that are useful for counting things in the world, the question does have an answer: $2$. The natural numbers do not suffer from the crowding issue that the rationals and the reals do; all of their elements are sufficiently “far apart” in this way. There is $1$, and then $2$, and nothing in between. In this sense they are discrete. In this book, we will (usually) live in the natural numbers.

0.3 Some examples

Here are some questions we might try to answer using discrete mathematics. In fact, you will be able to answer all these questions by the end of the book.

Suppose we have a program that will activate condition 4 if conditions 1 and 2 are met but not condition 3. How do we know when condition 4 is active?

We can use logic to represent this program with the statement $(p \wedge q \wedge \neg r) \to s$, and then use truth tables to determine exactly when condition 4 will trigger.

Which of these two programs is more efficient?

A sequence is a function of natural numbers. An example is the number $(a_n)$ of steps it takes to run a computer program with $n$ inputs. If $(a_n)$ grows much faster than $(b_n)$ does, that can tell us that one program scales worse for large quantities of data.

Some members of a community are sick. How do we know who is at risk of being infected?

Relations encode relationships between sets of objects. We may compose relations to iterate those relationships over and over again. If person $x$ is sick and has come into contact with person $y$, and $y$ has come into contact with person $z$, and $z$ has since seen $w$, then $w$ is at risk.

0.4 How to use this book

Chapters 1-8 must be done in order. Each chapter has a section of exercises located on the problems page. These exercises should be attempted after each chapter. If you are in MATH 174 at Coastal Carolina University, these exercises are your homework.

Then, chapters 9-13 (on combinatorics) and chapters 14-18 (on relations) may be attempted in that order, or in the order 14-18; 9-13. Be aware that some of the exercises in 14-18 assume that the reader has learned the combinatorial material. So, if you follow the alternate order, leave these exercises for later.

This book is paired with four playlists on my YouTube channel. Playlist 1 (sets and logic) maps to chapters 1-3, playlist 2 (proof and recursion) maps to chapters 4-8, playlist 3 (combinatorics) maps to chapters 9-13, and playlist 4 (relations) maps to chapters 14-18.

These videos are very short demonstrations and examples of the material pursued in depth by the book. I am not sure which ordering is better: book then videos, or videos then book. Choose your own adventure.

I try to use the invitational “we” throughout the book and maintain some level of formality, but sometimes my annoying authorial voice pokes through. I apologize if that bothers you.

Each chapter ends with several “takeaways” that summarize the chapter. Starting with the fourth chapter, if necessary, each chapter begins with a hint to go back and look at some things you may not have thought about in a while.

0.5 The point of this book

– is to teach you discrete mathematics, of course.

But much more broadly, this book is intended to teach you to think mathematically. You may be a student who has never seen any math beyond calculus or algebra, or perhaps you dread doing math. The point of this book is to get you to learn to think deeply, communicate carefully, and not shy away from difficult material.

As you proceed, have something to write with on hand. Work each example along with the book. Then, at the end of each chapter, work the exercises. Some are more difficult than others, but all are well worth doing to improve your understanding. Work along with someone at roughly your skill level; share solutions and ask each other questions.

Everyone can be a “math person!” So, let’s get started.

1. Sets

Videos:

1.1 Membership

For nearly all interested parties, sets form the foundation of all mathematics. Every mathematical object you have ever encountered can be described as a set, from numbers themselves to functions and matrices.

We will say that a set is an unordered collection of objects, called its elements. When $x$ is an element of the set $X$, we write $x \in X$.

Examples. Our starting point will be the set of all natural numbers,

$$\mathbf{N} = \{0, 1, 2, 3, \ldots\}.$$

Certain sets are special enough to get their own names, with bold or “blackboard bold” face. Sometimes (especially in ink) you will see $\mathbb{N}$ instead. Note that the elements of the set are enclosed in curly braces. Proper notation is not optional; using the wrong notation is like saying “milk went store” to mean “I went to the store to get milk.” Finally, not everyone agrees that $0$ is a natural number, but computer scientists do.

If we take $\mathbf{N}$ and include its negatives, we get the integers

$$\mathbf{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}.$$

Notice that in the case of an infinite set, or even a large finite set, we can use ellipses when a pattern is established. One infinite set that defies a pattern is the set of prime numbers. You may recall that a prime number is a positive integer $p$ with exactly two factors: itself and $1$. We may list a few: $2, 3, 5, 7, 11, 13, 17, 19, 23$, but there is no pattern that allows us to employ ellipses.

Fortunately there is another way to write down a set. All previous examples have used roster notation, where the elements are simply listed. It may be more convenient to use set-builder notation, where the set is instead described with a rule that exactly characterizes its elements. In that case, the set of primes can be written as

$$\{\, p \mid p \text{ is a prime number} \,\}.$$

To provide one last example, consider the set of all U.S. states. In roster notation this set is a nightmare: {AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME, MD, MA, MI, MN, MS, MO, MT, NE, NV, NH, NJ, NM, NY, NC, ND, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VT, VA, WA, WV, WI, WY}. But in set-builder notation we simply write

$$\{\, S \mid S \text{ is a U.S. state} \,\}.$$
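If you happen to program, the two notations have direct analogues in Python: a set literal is roster notation, and a set comprehension plays the role of set-builder notation. Here is a minimal sketch (purely illustrative; none of the mathematics depends on it):

```python
# Roster notation: list the elements explicitly.
evens_roster = {0, 2, 4, 6, 8}

# Set-builder notation: describe the elements with a rule.
evens_builder = {n for n in range(10) if n % 2 == 0}

print(evens_roster == evens_builder)  # True: they contain the same elements
```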

Definition 1.1.1. Two sets $X$ and $Y$ are equal if they contain exactly the same elements, in which case we write $X = Y$.

Remember that sets are unordered, so even $\{a, b\} = \{b, a\}$.

Definition 1.1.2. Let $X$ be a finite set. Its cardinality, denoted $|X|$, is the number of elements in $X$.

Example. If $X = \{a, b, c\}$, then $|X| = 3$.

We can talk about the cardinality of infinite sets, but that is far beyond the scope of this course.

1.2 Containment

One of the most important ideas when dealing with sets is that of a subset.

Definition 1.2.1. Let $X$ and $Y$ be sets. We say $X$ is a subset of $Y$ if every element of $X$ is also an element of $Y$, in which case we write $X \subseteq Y$.

Examples. Put $X = \{a, b, c\}$, $Y = \{a, c\}$, and $Z = \{b\}$. Then $Y \subseteq X$ and $Z \subseteq X$, but $Y \not\subseteq Z$ and $Z \not\subseteq Y$.
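Python can test Definition 1.2.1 directly with the issubset method or the <= operator; a minimal sketch using the sets from the example above:

```python
X = {"a", "b", "c"}
Y = {"a", "c"}
Z = {"b"}

print(Y.issubset(X))   # True: every element of Y is also in X
print(Z <= X)          # True: <= is Python's subset operator
print(Y <= Z, Z <= Y)  # False False: neither contains the other
```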

In mathematics, true statements are either definitions (those taken for granted) or theorems (those proven from the definitions). We will learn to prove more difficult theorems later, but for now we can prove a simple one.

Theorem 1.2.2. Every set is a subset of itself.

A proof is simply an argument that convinces its reader of the truth of a mathematical fact. Valid proofs must only use definitions and the hypotheses of the theorem; it is not correct to assume more than you are given. So the ingredients in the proof of this theorem are simply going to be a set and the definition of the word “subset.”

Proof. Let $X$ be a set. Suppose $x$ is an arbitrary element of $X$. Then, $x$ is also an element of $X$. Because an arbitrary element of $X$ was shown to be an element of $X$, we have $X \subseteq X$. $\square$

Notice that the actual make-up of the set $X$ is irrelevant. If $X$ has an element, then $X$ also has that element, and that is all it takes to conclude $X \subseteq X$. It would be incorrect to try to prove $X \subseteq X$ from an example: just because $\{0\}$ is a subset of itself doesn’t necessarily mean that $\{0, 1\}$ is!

We will discuss logic and proof-writing at length as we continue. For now, let’s continue learning about sets and proving the “simple” theorems as we go.

Definition 1.2.3. The set with no elements is called the empty set, denoted $\{\}$ or $\varnothing$.

With a little logical sleight-of-hand, we can prove another (much more surprising) theorem.

Theorem 1.2.4. The empty set is a subset of every set.

Proof. Let $X$ be a set. We may conclude $\varnothing \subseteq X$ if every element of $\varnothing$ is an element of $X$. Observe that this statement is equivalent to saying that no elements of $\varnothing$ are not elements of $X$. Because there are no elements of $\varnothing$ at all, this statement is true; and we are done. $\square$

Perhaps you already agree with the central claim of the proof, that there is no difference between the statements “everything in $\varnothing$ is in $X$” and “nothing in $\varnothing$ is not in $X$.” If you do not, take it on faith for now. When we study predicate logic later you will have the tools to convince yourself of the claim.

Such a statement, one that relies on the emptiness of $\varnothing$, is said to be vacuously true.

When $X \subseteq Y$, we say $Y$ contains $X$ or that $Y$ is a superset of $X$. With the idea of containment in hand we can construct a new type of set.

Definition 1.2.5. Let $X$ be a set. Its power set is the set $\mathcal{P}(X)$ whose elements are the subsets of $X$. That is,
$$\mathcal{P}(X) = \{\, E \mid E \subseteq X \,\}.$$

Wait - our elements are sets? Yes, and this is perfectly fine! Elements can be all sorts of things, other sets included. When we have a “set of sets” we typically refer to it as a family or a collection to avoid using the same word over and over.

Example. Let $X = \{a, b, c\}$. Then

$$\mathcal{P}(X) = \{\varnothing, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{b,c\}, X\}.$$

Notice that the number of sets in $\mathcal{P}(X)$ can be derived from the number of elements in $X$.

Theorem 1.2.6. If $X$ is finite, then $|\mathcal{P}(X)| = 2^{|X|}$.

We will come back to prove this theorem much later, once we have learned how to count!
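Though the proof must wait, nothing stops us from spot-checking the theorem. Here is a minimal Python sketch (the helper power_set is ours, not a library function) that builds a power set and compares its cardinality with $2^{|X|}$:

```python
from itertools import combinations

def power_set(X):
    """Return a list of all subsets of X, as frozensets."""
    elems = list(X)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

X = {"a", "b", "c"}
print(len(power_set(X)) == 2 ** len(X))  # True, as Theorem 1.2.6 predicts
```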

1.3 The algebra of sets

You are used to the word “algebra” as it refers to a class where you must solve some equations for $x$. If you think about those classes, they are really about understanding operations and functions of real numbers. Think about it: a successful algebra student will know how to manipulate sums, products, polynomials, roots, and exponential functions – ways of combining and mapping real numbers.

So, “algebra” in general refers to the ways objects may be combined and mapped. In this section, we will see many ways that sets can be combined to produce new sets.

Definition 1.3.1. Let $X$ and $Y$ be sets. Their union is the set $X \cup Y$ of all elements that are in at least one of $X$ or $Y$; in set-builder notation this is
$$X \cup Y = \{\, x \mid x \in X \text{ or } x \in Y \,\}.$$

Definition 1.3.2. Let $X$ and $Y$ be sets. Their intersection is the set $X \cap Y$ of all elements that are in both $X$ and $Y$; in set-builder notation this is
$$X \cap Y = \{\, x \mid x \in X \text{ and } x \in Y \,\}.$$

A useful way to visualize combinations of sets is with a Venn diagram. The Venn diagrams for union and intersection are shown below.

Figure. Two sets $X$ and $Y$ pictured as overlapping circles: on the left, their union is shaded in blue; on the right, their intersection is shaded in pink.

Example. Let $X = \{1, 2, 3, 4\}$ and $Y = \{2, 4, 7, 9\}$. Their union is
$$X \cup Y = \{1, 2, 3, 4, 7, 9\}$$
and their intersection is $X \cap Y = \{2, 4\}$. Notice that we do not repeat elements in a set. An element is either in the set or it isn’t. (A generalization, the multiset, will be developed in Chapter 11.) Therefore, $|X \cup Y|$ (recall, the number of elements in the union) won’t just be $|X| + |Y|$.

Theorem 1.3.3. If $X$ and $Y$ are finite sets, then $|X \cup Y| = |X| + |Y| - |X \cap Y|$.

Proof. If an element is only in $X$, it will be counted by $|X|$. If an element is only in $Y$, it will be counted by $|Y|$. Therefore any elements in $X \cap Y$ will be counted twice. There are exactly $|X \cap Y|$ of these elements, so subtracting that amount gives $|X \cup Y|$. $\square$
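A quick sketch in Python, with the sets from the example above, confirms the count; | and & are Python's union and intersection operators:

```python
X = {1, 2, 3, 4}
Y = {2, 4, 7, 9}

lhs = len(X | Y)                    # |X ∪ Y|
rhs = len(X) + len(Y) - len(X & Y)  # |X| + |Y| - |X ∩ Y|
print(lhs, rhs)  # 6 6
```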

Definition 1.3.4. Let $X$ and $Y$ be sets. The difference $X - Y$ is the set of all elements in $X$ but not $Y$; i.e. $X - Y = \{\, x \mid x \in X \text{ and } x \not\in Y \,\}$.

Example. Like before, let $X = \{1, 2, 3, 4\}$ and $Y = \{2, 4, 7, 9\}$. Then $X - Y = \{1, 3\}$. Notice that $Y - X = \{7, 9\}$; just like with numbers, taking a difference is not commutative. (That is, the order matters.)

Often when working with sets there is a universal or ambient set, sometimes denoted $\Omega$, that all of our sets are assumed to be subsets of. This set is typically inferred from context.

Definition 1.3.5. Let $X$ be a set contained in some universal set $\Omega$. The complement of $X$ (relative to $\Omega$), denoted $\overline{X}$, is the set of all elements not in $X$. In other words, $\overline{X} = \Omega - X$.

Figure. On the left, the difference $X - Y$ is shaded in pink: the region within $X$, excluding the portion also in $Y$. On the right, the complement of $X$ is shaded in blue: the region between $X$ and the rectangle representing its universal set.

Example. Suppose $X = \{a, b, c\}$ is contained in the universal set $\Omega = \{a, b, c, d, e\}$. Then $\overline{X} = \{d, e\}$.
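Python's - operator on sets is exactly Definition 1.3.4, and per Definition 1.3.5 a complement is just a difference taken from the universal set; a minimal sketch:

```python
X = {1, 2, 3, 4}
Y = {2, 4, 7, 9}
print(X - Y)      # {1, 3}
print(Y - X)      # the set of 7 and 9; differences are not commutative

omega = {"a", "b", "c", "d", "e"}
A = {"a", "b", "c"}
print(omega - A)  # the complement of A relative to omega: 'd' and 'e'
```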

Definition 1.3.6. Let $X$ and $Y$ be sets. Their Cartesian product (in this book, just “product”) is the set $X \times Y = \{\, (x, y) \mid x \in X \text{ and } y \in Y \,\}$. The elements of the product are called ordered pairs.

As the name implies, the ordering of the entries in an ordered pair matters. Therefore, we will adopt the convention that unordered structures are enclosed in curly braces $\{\,\}$ and ordered structures are enclosed in parentheses $(\,)$. The immediate consequence of this observation is that $X \times Y$ is different from $Y \times X$.

Figure. A visualization of the product of two sets: one element of the first set and one element of the second combine to make an ordered pair in the product.

Examples. You are likely familiar with $\mathbf{R}^2 = \mathbf{R} \times \mathbf{R}$, the Cartesian plane from algebra and calculus. The elements of this (infinite) set are ordered pairs $(x, y)$ where both $x$ and $y$ are real numbers.

To make a discrete example, put $X = \{a, b, c\}$ and $Y = \{1, 2\}$. Then
$$X \times Y = \{(a, 1), (a, 2), (b, 1), (b, 2), (c, 1), (c, 2)\}.$$
As an exercise, calculate $Y \times X$ yourself. The pair $(1, a)$ is an element.

Theorem 1.3.7. If $X$ and $Y$ are finite sets, then $|X \times Y| = |X|\,|Y|$.

Proof. The elements of $X \times Y$ may be tabulated into a rectangle, where each row corresponds to an element of $X$ and each column corresponds to an element of $Y$. There are $|X|$ rows and $|Y|$ columns and therefore $|X|\,|Y|$ elements in the table. $\square$
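The standard library's itertools.product computes exactly this set of ordered pairs, which gives a quick way to check Theorem 1.3.7 on small examples:

```python
from itertools import product

X = {"a", "b", "c"}
Y = {1, 2}

pairs = list(product(X, Y))           # all ordered pairs (x, y)
print(len(pairs) == len(X) * len(Y))  # True, as Theorem 1.3.7 predicts
```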

1.4 Operations involving more than two sets

Note: While this section is cool, we will not make use of its content very often in this course. You may choose to skip this section for now and return to it after Chapter 8, when you are more comfortable with the material.

There is one last thing to say about sets (for now, anyway). There is no reason to limit ourselves to two sets in union, intersection, and product. If we have a family of $n$ sets $X_1, X_2, \ldots, X_n$, we may take $n$-ary unions, intersections, and products.

The $n$-ary union $\bigcup_{i=1}^n X_i = X_1 \cup X_2 \cup \cdots \cup X_n$ is the set of elements that are in at least one of the sets $X_i$.

The $n$-ary intersection $\bigcap_{i=1}^n X_i = X_1 \cap X_2 \cap \cdots \cap X_n$ is the set of elements that are in all of the sets $X_i$.

The $n$-ary product $\prod_{i=1}^n X_i = X_1 \times X_2 \times \cdots \times X_n$ is the set of ordered $n$-tuples $(x_1, x_2, \ldots, x_n)$, where the element $x_i$ is a member of the set $X_i$.

Suppose instead the family $X_1, X_2, \ldots$ is infinite. We will, exactly once in this book, need to take an infinite union. The infinite union $\bigcup_{i=1}^\infty X_i = X_1 \cup X_2 \cup \cdots$ is the set of all elements that are in at least one of the $X_i$. (Notice that the definition hasn’t changed – just the number of sets!) Likewise the infinite intersection $\bigcap_{i=1}^\infty X_i = X_1 \cap X_2 \cap \cdots$ is the set of elements in all the $X_i$.
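For a finite family, the $n$-ary union and intersection amount to folding the binary operations across the family; a minimal Python sketch:

```python
from functools import reduce

family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

big_union = set().union(*family)                     # {1, 2, 3, 4, 5}
big_intersection = reduce(set.intersection, family)  # {3}
print(big_union, big_intersection)
```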

Takeaways

Chapter 1 problems

2. Propositional logic

Videos:

In mathematics, we spend a lot of time worried about whether a sentence is true or false. This is fine in the case of a simple sentence such as “The number $2$ is prime.” (That sentence is true.) But there are much more complicated sentences lurking around, and whole constellations of sentences whose truth depends on one another. (An unsolved problem in mathematics is deciding whether the Riemann hypothesis, a statement about the prime numbers, is true. Most people believe it is, and many results in the literature are true given the hypothesis. The Riemann hypothesis is going to make or break a lot of research.)

So, we are interested in tools that will allow us to judge whether complicated statements are true. Such models of truth are called logic.

2.1 Statements and connectives

Definition 2.1.1. A statement is a sentence that is either true or false.

Examples. “The number $2$ is prime” and “It is perfectly legal to jaywalk” are statements. Non-statements include questions (“Is it raining?”), commands (“Do your homework”), and opinions (“Mint chocolate chip ice cream is the superior flavor”). It is important to note, not necessarily for mathematical purposes, that opinions are expressions of pure taste. Claims whose truth values exist but are unknown are beliefs.

Atomic propositional statements of the type discussed in this chapter are flatly always true or false. You may think of them in the same way you think of constant functions in algebra. They are denoted with lowercase letters $p$, $q$, $r$, etc. If the only statements were atomic, logic would be boring. Fortunately, we can combine statements with connectives. In the next chapter, we will look at predicate statements whose truth values depend on an input. Likewise, these may be thought of as the non-constant functions.

Compound statements arise from combining atomic statements with connectives. They are denoted using Greek letters like $\varphi$ (“phi”), $\psi$ (“psi”), and $\gamma$ (“gamma”). The truth of a compound statement depends on the connectives involved and the truth values of the constituent statements. A tool for determining the truth of a compound statement is called a truth table.

In a truth table for the compound statement $\varphi$, the left side of the table is the atomic statements involved in $\varphi$ and the right side of the table is $\varphi$ itself. Each row gives a possible combination of truth values for the atomic statements, and the corresponding truth value for $\varphi$. At this point, it is easier to start introducing connectives and seeing some truth tables.

Definition 2.1.2. Let $p$ be a statement. Its negation is the statement $\neg p$, the statement that is true exactly when $p$ is false and vice-versa.

The statement $\neg p$ is typically read “not $p$” or “it is not the case that $p$ is true.” If we have an English rendering of $p$, we try to write $\neg p$ in a natural-sounding way that also conveys its logical structure.

Example. Suppose $p$ represents the statement “It is raining.” Then, $\neg p$ stands for “It is not raining.”

Below is the truth table for $\neg p$.

| $p$ | $\neg p$ |
| --- | --- |
| T | F |
| F | T |

As you can see, on the left are columns corresponding to the atomic statements involved in $\neg p$ (just $p$), and each row is a possible combination of truth values for these statements. Here is a quick, useful theorem without proof:

Theorem 2.1.3. If the compound statement $\varphi$ involves $n$ different atomic statements, then its truth table has $2^n$ rows.

Look familiar? We will prove this theorem when we prove the related theorem about power sets later on.

Negation is called a unary connective because it only involves one statement. Our remaining connectives are all binary connectives, as they combine two statements.

Definition 2.1.4. Let $p$ and $q$ be statements. Their conjunction is the statement $p \wedge q$, which is true only when $p$ and $q$ are both true.

The symbol in the conjunction is called a “wedge,” and $p \wedge q$ is read “$p$ and $q$.” This definition is meant to be analogous to our English understanding of the word “and.” The statements $p$ and $q$ are called conjuncts.

Example. Letting $p$ be “It is raining” and $q$ be “I brought an umbrella,” the statement $p \wedge q$ is “It is raining and I brought an umbrella.” The statement is true only if it is currently raining and the speaker brought an umbrella. If it is not raining, or if the speaker did not bring their umbrella, the conjunction is false.

Here is the truth table for $p \wedge q$:

| $p$ | $q$ | $p \wedge q$ |
| --- | --- | --- |
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | F |

Notice that this table is organized intentionally, not haphazardly. The rows are divided into halves, where $p$ is true in one half and then false in the other half. The rows where $p$ is true are further subdivided into half where $q$ is true and half where $q$ is false. If there were a third letter, we would subdivide the rows again, and so on. This method of organizing the truth table allows us to ensure we did not forget a row. Our leftmost atomic letter column will always be half true followed by half false, and the rightmost atomic letter column will always alternate between true and false.
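This canonical ordering is what you get by listing all combinations with the leftmost letter varying slowest. In Python, itertools.product generates rows in exactly this order, as a minimal sketch for the conjunction table shows:

```python
from itertools import product

# Rows appear in the standard order: TT, TF, FT, FF.
for p, q in product([True, False], repeat=2):
    print(p, q, p and q)
```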

Definition 2.1.5. Let $p$ and $q$ be statements. Their disjunction is the statement $p \vee q$, which is true if at least one of $p$ or $q$ is true.

The symbol in the disjunction is called a “vee” and $p \vee q$ is read “$p$ or $q$.” This definition is meant to be analogous to our English understanding of the word “or.” The statements $p$ and $q$ are called disjuncts. (Did you notice that I was able to copy and paste from an above paragraph? Conjunction and disjunction share a very similar structure.)

Example. Letting $p$ be “It is raining” and $q$ be “I brought an umbrella,” the statement $p \vee q$ is “It is raining or I brought an umbrella.” This time, the speaker only needs to meet one of their conditions. They are telling the truth as long as it is raining or they brought an umbrella, or even if both statements are true.

The truth table for $p \vee q$ follows.

| $p$ | $q$ | $p \vee q$ |
| --- | --- | --- |
| T | T | T |
| T | F | T |
| F | T | T |
| F | F | F |

Of great importance to computer science and electrical engineering, but not so much in mathematics (so it will not appear again in this book), is the exclusive disjunction (or “exclusive-or”). It is a modified disjunction where exactly one of the disjuncts must be true.

Definition 2.1.6. If $p$ and $q$ are statements, their exclusive disjunction is the statement $p \oplus q$, which is true if and only if exactly one of $p$ and $q$ is true.

The statement $p \oplus q$ is read “$p$ exclusive-or $q$.”

Example. The statement “You can have the soup or the salad” is likely meant as an exclusive disjunction.

In mathematics, disjunctions are generally considered inclusive unless stated otherwise. (An example for us all to follow!)

Our next connective is probably the most important, as it conveys the idea of one statement implying another. As we discussed earlier in the chapter, mathematics is nothing but such statements. It is also the only connective that is not relatively easy to understand given its English interpretation. So we will begin with the truth table, and think carefully about each row.

Definition 2.1.7. If $p$ and $q$ are statements, the conditional statement $p \to q$ is true if $p$ can never be true while $q$ is false.

There are many ways to read $p \to q$, including “if $p$, then $q$,” “$p$ implies $q$,” and “$p$ is sufficient for $q$.” The statement before the arrow is the antecedent and the following statement is the consequent. Here is the conditional’s truth table.

| $p$ | $q$ | $p \to q$ |
| --- | --- | --- |
| T | T | T |
| T | F | F |
| F | T | T |
| F | F | T |

That $p \to q$ is true in the bottom two rows may surprise you. So, let’s consider each row separately.

Example. Let $p$ be “You clean your room” and $q$ be “I will pay you \$10,” so that $p \to q$ is “If you clean your room, then I will pay you \$10.”

In the case that both $p$ and $q$ are true, the speaker has told the truth. You cleaned, and you were paid. All is well.

In the case that $p$ is true but $q$ is false, the speaker has lied. You cleaned the room, but you did not earn your money.

What if $p$ is false? Well, then the speaker did not lie. The speaker said that if you clean your room you will be paid. If the room isn’t cleaned, then $p \to q$ cannot be falsified, no matter whether $q$ is true or false.

If that explanation is not sufficient for you, reread the definition of the conditional: $p \to q$ being true means that $p$ cannot be true without $q$. If you are unsatisfied by the English connection, then consider $p \to q$ to be a purely formal object whose definition is the above truth table.

Definition 2.1.8. If $p$ and $q$ are statements, the biconditional statement $p \leftrightarrow q$ is true if $p$ and $q$ are both true, or both false.

The statement $p \leftrightarrow q$ may be read as “$p$ if and only if $q$,” “$p$ is necessary and sufficient for $q$,” or “$p$ is equivalent to $q$.” Here is the truth table.

| $p$ | $q$ | $p \leftrightarrow q$ |
| --- | --- | --- |
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | T |

Example. Let $p$ be “You clean your room” and $q$ be “I will pay you \$10,” so that $p \leftrightarrow q$ is “I will give you \$10 if and only if you clean your room.” (The statements were reordered for readability. We will see later in the chapter that this is the same statement, but this wouldn’t work for $p \to q$.) This time, the clean room and the \$10 paycheck are in total correspondence; one doesn’t happen without the other.

2.2 Connecting connectives

We are not required to use one connective at a time, of course. Most statements involve two or more connectives being combined at once. If $\varphi$ and $\psi$ are statements, then so are: $\neg \varphi$, $\varphi \wedge \psi$, $\varphi \vee \psi$, $\varphi \to \psi$, and $\varphi \leftrightarrow \psi$. This fact allows us to combine as many connectives and statements as we like.

How should we read the statement $p \wedge q \to r$? Should it be read “$p$; also, if $q$ then $r$”? Or, “If $p$ and $q$, then $r$”? Just like with arithmetic, there are conventions defining which connective to apply first: negation, then conjunction/disjunction, then implication, then bi-implication. So, “If $p$ and $q$, then $r$” is correct. However, to eliminate confusion in this book we will always use parentheses to make our statements clear. Therefore, we would write this statement as $(p \wedge q) \to r$.

Remember that a statement involving $n$ letters will have $2^n$ rows in its truth table. The column for the left-most statement letter should be true for half the rows then false for half the rows, and you should continue dividing the rows in half so that the right-most statement letter’s column is alternating between true and false.

Finally, it is a good idea to break a statement into small “chunks” and make a column for each “chunk” separately. Observe the following example.

Example. Write down the truth table for the statement $(p \wedge q) \to (r \to \neg s)$. When is this statement false?

Since the statement has $4$ letters, there will be $2^4 = 16$ rows in the truth table. Let’s go through the first row very slowly, where all four atomic statements are true. Then $p \wedge q$ is true and $\neg s$ is false. Since $r$ is true and $\neg s$ is false, that means $r \to \neg s$ is false. Finally, because its antecedent is true and its consequent is false, the compound statement $(p \wedge q) \to (r \to \neg s)$ would be false.

Here is the full truth table. Be sure you understand how each row is calculated.

| $p$ | $q$ | $r$ | $s$ | $p \wedge q$ | $\neg s$ | $r \to \neg s$ | $(p \wedge q) \to (r \to \neg s)$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| T | T | T | T | T | F | F | F |
| T | T | T | F | T | T | T | T |
| T | T | F | T | T | F | T | T |
| T | T | F | F | T | T | T | T |
| T | F | T | T | F | F | F | T |
| T | F | T | F | F | T | T | T |
| T | F | F | T | F | F | T | T |
| T | F | F | F | F | T | T | T |
| F | T | T | T | F | F | F | T |
| F | T | T | F | F | T | T | T |
| F | T | F | T | F | F | T | T |
| F | T | F | F | F | T | T | T |
| F | F | T | T | F | F | F | T |
| F | F | T | F | F | T | T | T |
| F | F | F | T | F | F | T | T |
| F | F | F | F | F | T | T | T |

There are tricks to calculate truth tables more efficiently. However, it wouldn’t do you any good to read them; they must come with practice.

2.3 Inference and equivalence

Mathematics is all about relationships. If the statement $\varphi \to \psi$ is always true, then it tells us that whenever $\varphi$ is true then $\psi$ must be as well. If the statement $\varphi \leftrightarrow \psi$ is always true, then it tells us that from the point of view of logic $\varphi$ and $\psi$ are actually the same statement. That is, it is never possible for them to have different truth values.

Definition 2.3.1. A tautology is a statement that is always true, denoted $T$.

Definition 2.3.2. A contradiction is a statement that is always false, denoted $F$.

Definition 2.3.3. A rule of inference is a tautological conditional statement. If $\varphi \to \psi$ is a rule of inference, then we say $\varphi$ implies $\psi$ and write $\varphi \Rightarrow \psi$.

Remember that a conditional statement does not say that its antecedent and consequent are both true. It says that if its antecedent is true, its consequent must also be true.

There are many rules of inference, but we will explore two.

Definition 2.3.4. Modus ponens is the statement that says we may infer $q$ from $p \to q$ and $p$. Symbolically: $[(p \to q) \wedge p] \to q$.
Example. Consider the conditional statement “If you clean your room, then I will give you $10.” If the speaker is trustworthy, and we clean our room, then we may conclude we will be receiving $10.

Another example: “If a function is differentiable, then it is continuous.” (You may not know the meanings of these words, but that is not important.) If this statement is true, and the function $f$ is differentiable, then without any further work we know that $f$ is continuous.

Theorem 2.3.5. Modus ponens is a rule of inference.

Notice that in the definition no claim is made that modus ponens is always true. That is up to us.

Since we know how to use truth tables, we may construct one to show that $[(p \to q) \wedge p] \to q$ is always true. Before reading the table below, try to make your own!

| $p$ | $q$ | $p \to q$ | $(p \to q) \wedge p$ | $[(p \to q) \wedge p] \to q$ |
| --- | --- | --- | --- | --- |
| T | T | T | T | T |
| T | F | F | F | T |
| F | T | T | F | T |
| F | F | T | F | T |

Proof. The above table shows that $[(p \to q) \wedge p] \to q$ is true no matter the truth of $p$ and $q$. $\square$
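Since a tautology is a statement whose final truth-table column is all true, a computer can check one by brute force over all $2^n$ rows. Here is a minimal Python sketch (the helper is_tautology is ours), encoding each arrow $x \to y$ as $\neg x \vee y$, an equivalence we will justify shortly:

```python
from itertools import product

def is_tautology(statement, n_letters):
    """Check a statement by brute force over all 2**n rows."""
    return all(statement(*row)
               for row in product([True, False], repeat=n_letters))

# Modus ponens: [(p -> q) and p] -> q
modus_ponens = lambda p, q: not ((not p or q) and p) or q
print(is_tautology(modus_ponens, 2))  # True
```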

Definition 2.3.6. In the context of conditional statements, transitivity is the rule that if $p$ implies $q$ and $q$ implies $r$, then $p$ implies $r$. Or, $[(p \to q) \wedge (q \to r)] \to (p \to r)$.

The wording of the definition above should make you suspicious that this is not the only time we will see transitivity. In fact, you may have seen it before…

Example. If you give a mouse a cookie, then he will ask for a glass of milk. If he asks for a glass of milk, then he will also ask for a straw. Therefore, if you give a mouse a cookie then he will ask for a straw.

Theorem 2.3.7. Transitivity is a rule of inference.

We could simply write down another truth table and call it a day. In fact, go ahead and make sure you can prove transitivity with a truth table. Once you’re done, we’ll try something slicker to prove transitivity.

Proof. Suppose that $p \to q$ and $q \to r$ are true statements, and that moreover $p$ is true. By modus ponens, $q$ must also be true. Then, $r$ is true, by a second application of modus ponens. Therefore, if $p$ is true, $r$ must be also; this is the definition of $p \to r$. $\square$

Pretty cool, right? We can cite proven theorems to prove a new one.

Does it work the other way? In other words: if I know that $[(p \to q) \wedge (q \to r)] \Rightarrow (p \to r)$, does that mean $(p \to r) \Rightarrow [(p \to q) \wedge (q \to r)]$? Regrettably, no: consider the case where $p$ and $r$ are true but $q$ is false.

Some rules of inference do work both ways. These are called equivalences.

Definition 2.3.8. If $\varphi \leftrightarrow \psi$ is a tautology, then it is an equivalence and $\varphi$ and $\psi$ are equivalent statements. We write $\varphi \equiv \psi$.

Equivalent statements are not the same statements, but they are close enough from the point of view of logic. Equivalent statements are interchangeable, and may be freely substituted with one another.

We will quickly run through many equivalences before slowing down to prove a few.

Algebraic properties of conjunction, disjunction, and negation

Our first batch of statements should look familiar to you; these are the ways that statements “act like” numbers. As you read each one, try to justify to yourself why it might be true. If you aren’t sure of one, make a truth table.

| Equivalence | Name |
| --- | --- |
| $p \equiv \neg(\neg p)$ | Double negation |
| $p \wedge q \equiv q \wedge p$ | Commutativity of conjunction |
| $p \vee q \equiv q \vee p$ | Commutativity of disjunction |
| $(p \wedge q) \wedge r \equiv p \wedge (q \wedge r) \equiv p \wedge q \wedge r$ | Associativity of conjunction |
| $(p \vee q) \vee r \equiv p \vee (q \vee r) \equiv p \vee q \vee r$ | Associativity of disjunction |
| $p \wedge T \equiv p$ | Conjunctive identity |
| $p \vee F \equiv p$ | Disjunctive identity |
| $p \wedge F \equiv F$ | Conjunctive absorption |
| $p \vee T \equiv T$ | Disjunctive absorption |
| $p \wedge \neg p \equiv F$ | Conjunctive cancellation |
| $p \vee \neg p \equiv T$ | Disjunctive cancellation |

Distributive properties

For numbers, multiplication distributes over addition. This is one fact that is likely implemented twice in your head, once for a number: $a(b+c) = ab + ac$, and once for “the negative” (truly the number $-1$): $-(b+c) = -b - c$. However, these rules are one and the same: the distributive law. Likewise, we will have distributive laws for negation, conjunction, and disjunction.

| Equivalence | Name |
| --- | --- |
| $\neg(p \wedge q) \equiv \neg p \vee \neg q$ | DeMorgan’s law |
| $\neg(p \vee q) \equiv \neg p \wedge \neg q$ | DeMorgan’s law |
| $p \wedge (q \vee r) \equiv (p \wedge q) \vee (p \wedge r)$ | Conjunction over disjunction |
| $p \vee (q \wedge r) \equiv (p \vee q) \wedge (p \vee r)$ | Disjunction over conjunction |

DeMorgan’s law is one of the most important facts in basic logic. In fact, we will see it three times before we are through. In English, DeMorgan’s law says that the negation of a conjunction is a disjunction and vice-versa.

To show that $\neg(p \wedge q)$ and $\neg p \vee \neg q$ are equivalent, we can show that the biconditional statement $\neg(p \wedge q) \leftrightarrow (\neg p \vee \neg q)$ is a tautology. Thinking about the definition of the biconditional gives us another way: we can show that the truth tables of $\neg(p \wedge q)$ and $\neg p \vee \neg q$ are the same; in other words, that both statements are always true and false for the same values of $p$ and $q$.

Proof (of DeMorgan’s law for conjunction). Let $p$ and $q$ be statements. The statements $\neg(p \wedge q)$ and $\neg p \vee \neg q$ are equivalent, as shown by the below truth table.

| $p$ | $q$ | $\neg(p \wedge q)$ | $\neg p \vee \neg q$ |
| --- | --- | --- | --- |
| T | T | F | F |
| T | F | T | T |
| F | T | T | T |
| F | F | T | T |

Therefore, $\neg(p \wedge q) \equiv \neg p \vee \neg q$. $\square$

You’ll notice a few columns are skipped in the above truth table. After a few tables, it is more convenient to skip “easy” columns. For example, we know that $p \wedge q$ is true only when $p$ and $q$ are both true, and false everywhere else. Therefore, we can “flip” that column to quickly get the column for $\neg(p \wedge q)$.

As an exercise, prove DeMorgan’s other law.
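(If you like, a computer can compare the two columns row by row. This minimal sketch checks the law just proven; adapting it to the other law is a one-line change.)

```python
from itertools import product

for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))  # columns agree in every row
print("equivalent")
```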

Properties of the conditional

Finally, we come to the properties of conditional (and biconditional) statements. These do not have “obvious” numerical counterparts, but are useful all the same.

| Equivalence | Name |
| --- | --- |
| $p \to q \equiv \neg p \vee q$ | Material implication |
| $\neg(p \to q) \equiv p \wedge \neg q$ | False conditional |
| $p \to q \equiv \neg q \to \neg p$ | Contraposition |
| $p \leftrightarrow q \equiv (p \to q) \wedge (q \to p)$ | Mutual implication |

Material implication is the surprising result that the conditional is just a jumped-up disjunction! In fact, DeMorgan’s law, material implication, and mutual implication together prove that all of the connectives we’ve studied can be implemented with just two: negation and one other. However, this would make things difficult to read, so we allow the redundancy of five connectives.

Let’s prove material implication with a truth table, and then we will prove contraposition with another method.

Proof (of material implication). Let $p$ and $q$ be statements and observe that the truth tables for $p \to q$ and $\neg p \vee q$ are the same:

| $p$ | $q$ | $p \to q$ | $\neg p \vee q$ |
| --- | --- | --- | --- |
| T | T | T | T |
| T | F | F | F |
| F | T | T | T |
| F | F | T | T |

Therefore, $p \to q \equiv \neg p \vee q$. $\square$

There is another way to think about material implication. The statement $p \to q$ means that $p$ never happens without $q$. Another way to say this is that $q$ happens, or else $p$ did not; so, $\neg p \vee q$.

Before we prove contraposition by the other method, we must justify said method with a theorem.

Theorem 2.3.9. Statements $\varphi$ and $\psi$ are equivalent if and only if $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ are both rules of inference.

The phrase “if and only if” in a mathematical statement means that the statement “works both ways.” In this case, $\varphi \equiv \psi$ means $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ are both rules of inference; and conversely, $\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ both being rules of inference means $\varphi \equiv \psi$. It means that “$\varphi \equiv \psi$” and “$\varphi \Rightarrow \psi$ and $\psi \Rightarrow \varphi$ are both rules of inference” are equivalent statements. Fortunately, we already have a rule that justifies this equivalence.

Proof (of Theorem 2.3.9). Mutual implication says that $p \leftrightarrow q \equiv (p \to q) \wedge (q \to p)$. $\square$

It’s that easy. Whenever we have a conditional statement and its converse (the order reversed), we have the biconditional; likewise, we may trade in a biconditional for two converse conditional statements.

Now we will prove contraposition by showing that $p \to q$ implies $\neg q \to \neg p$, and vice-versa. Before you read the below proof, write down these rules: material implication, double negation, and commutativity of disjunction.

Proof (of contraposition). Suppose $p$ and $q$ are statements and that $p \to q$ is true. By material implication, $\neg p \vee q$ is true. Commutativity says this is the same as $q \vee \neg p$. So, we may apply material implication backwards, negating $q$ this time, to get $\neg q \to \neg p$; therefore, $p \to q$ implies $\neg q \to \neg p$.

Conversely, suppose $\neg q \to \neg p$ is true. Then material implication (and double negation) gives $q \vee \neg p$. We may commute the disjunction and apply material implication again, negating $\neg p$, to get $p \to q$.

By mutual implication, $p \to q \equiv \neg q \to \neg p$. $\square$

Takeaways

Chapter 2 problems

3. Predicate logic

Videos:

3.1 Predicates

Basic propositional logic as you learned it in the preceding chapter is powerful enough to give us the tools we will need to prove many theorems. However, it is deficient in one key area. Consider the following example.

Example. If Hypatia is a woman and all women are mortal, then Hypatia is mortal.

We can all agree that this statement is a tautology; that is, if it is true that Hypatia is a woman (she was; she was a Greek mathematician contemporary with Diophantus) and if it is true that all women are mortal (as far as we know this is the case) then it must follow that Hypatia is mortal (she was murdered in 415).

So the statement is a tautology. If so, the truth table would confirm it. Let $(p \wedge q) \to r$ symbolize the statement, where $p$ stands for “Hypatia is a woman,” $q$ for “all women are mortal,” and $r$ for “Hypatia is mortal.”

| $p$ | $q$ | $r$ | $p \wedge q$ | $(p \wedge q) \to r$ |
| --- | --- | --- | --- | --- |
| T | T | T | T | T |
| T | T | F | T | F |
| T | F | T | F | T |
| T | F | F | F | T |
| F | T | T | F | T |
| F | T | F | F | T |
| F | F | T | F | T |
| F | F | F | F | T |

Oh no! Our second row shows that the statement is false when $r$ is false but $p$ and $q$ are true. What gives?

Well, consider the statement “If pigs fly and the Pope turns into a bear, then Bob will earn an A on the exam.” Now it is not so clear that this is a tautology, because pigs, the Pope, and Bob’s grade presumably have nothing to do with one another.

But Hypatia and the mortality of women do. What we will need is the ability to signify that two statements are related, whether in subject (e.g. Hypatia) or predicate (e.g. mortality). This is where predicate logic comes in to help.

Definition 3.1.1. A predicate is a function $P : \Omega \to \{T, F\}$ that takes one or more subjects (members of some domain $\Omega$) and assigns to them all either true $T$ or false $F$.

Predicates are typically denoted with capital letters like $P$, $Q$, and $R$. Subjects that are known are given lowercase letters like $a$, $b$, and $c$, unless there is something more reasonable to pick, like $h$ for Hypatia. Subjects whose values are unknown or arbitrary are denoted with lowercase $x$, $y$, $z$, etc.

The definition of predicate above is much more complicated than it needs to be; this is so that we can take the central idea (a function) and continue using it throughout the book. You know a function: an assignment $f(x)$ that gives every input $x$ exactly one output $f(x)$. We write $f : X \to Y$ to signify that $f$ is a function whose inputs are elements of the set $X$ and whose outputs are elements of the set $Y$.

So, in the above definition, the predicate $P$ is a function that takes a subject, a member of the set $\Omega$, and assigns it exactly one truth value.

(Take note that propositions, which you studied in the last chapter, may be regarded as constant predicates – functions whose value is the same regardless of the input, like $f(x) = 3$.)

Examples. Letting $W(x)$ be the predicate “$x$ is a woman” on the domain of humans and $h$ stand for Hypatia, $W(h)$ is the statement “Hypatia is a woman.” Since this statement is true, the function $W$ assigns $T$ to $h$.

Let $P(x)$ be the predicate “$x + 3 = 5$” on the domain $\mathbf{Z}$ and let $Q(x, y)$ be the predicate “$x$ orbits $y$” on the domain of celestial objects in the sky. The statement $P(2)$ is true, but $P(3)$ is false. If $e$ stands for the earth and $s$ for the sun, then $Q(s, e)$ is false but $Q(e, s)$ is true.

A predicate, once evaluated at a subject, is a statement. Therefore, these statements may be combined into compound statements using the logical connectives we have just studied.

Examples. As before, let $P(x)$ be the predicate “$x + 3 = 5$” on the domain $\mathbf{Z}$ and let $Q(x, y)$ be the predicate “$x$ orbits $y$” on the domain of celestial objects in the sky, where $e$ stands for the earth and $s$ for the sun.

The statements $P(2) \vee Q(s, e)$, $\neg P(3)$, and $Q(s, e) \to P(2)$ are true.

The statements $P(2) \wedge Q(s, e)$ and $P(2) \to Q(s, e)$ are false.

3.2 Quantifiers

This is all well and good, you think, but what about the statement “All women are mortal”? How can we make “all women” the subject of a statement?

Quantifiers allow us to say that a predicate is true for all, or for at least one undetermined, object in its domain.

Definition 3.2.1 Let $P(x)$ be a predicate. The universal statement $\forall x\, P(x)$ says that $P(x)$ is true for all $x$ in its domain. The symbol $\forall$ (“for all”) is called the universal quantifier.

Example. The domain is very important. If $P(x)$ is the predicate “$x$ takes discrete math” then $\forall x\, P(x)$ – “Everyone takes discrete math” – is true on the domain of computer science students at Coastal Carolina, but not, say, on the domain of all people in South Carolina.

Definition 3.2.2 Let $P(x)$ be a predicate. The existential statement $\exists x\, P(x)$ says that $P(x)$ is true for at least one $x$ in its domain. The symbol $\exists$ (“there exists”) is called the existential quantifier.

Example. Again let $P(x)$ be the predicate “$x$ takes discrete math.” The existential statement $\exists x\, P(x)$ says “Someone takes discrete math.” This is now true for people in South Carolina – pick a student at Coastal, or Clemson, or USC – but not true on the domain of dogs. Wouldn’t that be cute, though?

The statement operated on by a quantifier is called its scope. Without parentheses, a quantifier is considered to apply only to the statement it is next to and no others.

Examples. Let $P(x)$ be the predicate “$x$ takes discrete math” and $Q(x)$ be the predicate “$x$ is a math major.”

There are two important quantified statements: “All $P$’s are $Q$’s” and “Some $P$ is a $Q$.” These statements are all over the place in mathematics. We like to characterize mathematical objects by saying all objects of one type also belong to another; or, by saying we can find an object of one type that is another type.

In “All $P$’s are $Q$’s,” the word “all” suggests that the universal quantifier is appropriate. What connective should be used to combine the predicates? Observe that the statement $\forall x\, (P(x) \wedge Q(x))$ says that every $x$ is both a $P$ and a $Q$. This isn’t what we mean to say; we mean to say if something is a $P$ then it will also be a $Q$. So, the correct rendering of “All $P$’s are $Q$’s” is $\forall x\, (P(x) \to Q(x))$.

For the statement “Some $P$’s are $Q$’s,” we mean to say that there is an object that is at once a $P$ and a $Q$. Therefore, this time the conjunction is appropriate: we write $\exists x\, [P(x) \wedge Q(x)]$.

| Statement | Implementation |
| --- | --- |
| “All $P$’s are $Q$’s” | $\forall x\, (P(x) \to Q(x))$ |
| “Some $P$’s are $Q$’s” | $\exists x\, (P(x) \wedge Q(x))$ |
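On a finite domain these renderings can be evaluated directly: Python's all and any play the roles of $\forall$ and $\exists$. In this minimal sketch the domain and predicates are made-up stand-ins, with the conditional encoded as $\neg P(x) \vee Q(x)$ via material implication from the previous chapter:

```python
students = ["alice", "bob", "carol"]
P = lambda x: True             # "x takes discrete math" (a stand-in)
Q = lambda x: x != "bob"       # "x is a math major" (a stand-in)

all_Ps_are_Qs = all((not P(x)) or Q(x) for x in students)  # ∀x (P(x) → Q(x))
some_P_is_a_Q = any(P(x) and Q(x) for x in students)       # ∃x (P(x) ∧ Q(x))
print(all_Ps_are_Qs, some_P_is_a_Q)  # False True
```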

Finally, what is to be done with statements with two or more subjects? The ordering of the quantifiers matters. Here are the four possible combinations of universal and existential quantifiers for two subjects, as well as a diagram corresponding to each.

Examples. Let $R(x, y)$ be the predicate “$x$ and $y$ are classmates” over the domain of students at some school. Note that, in this case, the order of the subjects does not affect the truth of the statement.

The statement $\exists xy\, R(x, y)$ means “There are two students who are classmates.” In other words, there is at least one student who is paired with at least one other student. Some writers say $\exists x \exists y\, R(x, y)$ instead, but when the quantifiers are the same we will “collapse” the notation.

Figure. On the left, one object is paired with one object (existential-existential). On the right, one object is paired with many objects (existential-universal).

The statement $\exists x \forall y\, R(x, y)$ means “There is a student who is classmates with every student.” We have one student who (somehow!) is classmates with the entire student body.

The statement $\forall x \exists y\, R(x, y)$ means “Every student has a classmate.” Notice that their classmate need not be the same person. If we quantify universally first, then each existential object may be different. In the preceding example, we quantified existentially first, so that one person was classmates with everybody.

Figure. On the left, many objects are each paired with an object (universal-existential). On the right, many objects are paired with many objects (universal-universal).

The statement $\forall xy\, R(x, y)$ means “Every student is classmates with every other student.” Every object in our domain is paired up with every other object in the domain, including themselves! (Don’t think too hard about that for this example.)

3.3 Inferences and equivalences for quantifiers

Quantifiers evaluated on a predicate are statements, since they are functions to $\{T, F\}$. Therefore, some quantified statements imply others, and some pairs imply each other. In this section we will learn a few important rules for quantifiers, before finally proving that our opening statement about Hypatia is indeed a tautology.

Theorem 3.3.1 (DeMorgan’s laws) Let $P(x)$ be a predicate. Then $\neg \forall x\, P(x) \equiv \exists x\, \neg P(x)$ and $\neg \exists x\, P(x) \equiv \forall x\, \neg P(x)$.

Does that name look familiar? Here is our second application of DeMorgan’s laws. (The third will appear in the exercises.) You will see how they are used in the proof.

Proof. Consider the universal statement $\forall x\, P(x)$ on the domain $\Omega = \{x_1, x_2, \ldots, x_n\}$. We may rewrite
$$\forall x\, P(x) = P(x_1) \wedge P(x_2) \wedge \cdots \wedge P(x_n)$$
and
$$\exists x\, P(x) = P(x_1) \vee P(x_2) \vee \cdots \vee P(x_n).$$
Therefore,
$$\neg \forall x\, P(x) = \neg (P(x_1) \wedge P(x_2) \wedge \cdots \wedge P(x_n)).$$
Using DeMorgan’s law,
$$\neg \forall x\, P(x) = \neg P(x_1) \vee \neg P(x_2) \vee \cdots \vee \neg P(x_n),$$
which is the same as the statement $\exists x\, \neg P(x)$.
That $\neg \exists x\, P(x) \equiv \forall x\, \neg P(x)$ may be proven in the same way. $\square$

The idea is that a universal statement is a “big conjunction” and an existential statement is a “big disjunction.” The proof follows from the DeMorgan’s laws you learned in the preceding chapter.
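On a finite domain, this “big conjunction”/“big disjunction” reading makes the laws easy to test mechanically; a minimal sketch with a made-up predicate:

```python
domain = range(10)
P = lambda x: x % 2 == 0  # "x is even," a stand-in predicate

lhs = not all(P(x) for x in domain)  # ¬∀x P(x)
rhs = any(not P(x) for x in domain)  # ∃x ¬P(x)
print(lhs == rhs)  # True; the same check works for the other law
```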

The next two theorems deal with how universal and existential statements may imply one another.

Theorem 3.3.2 (Universal instantiation) If $a$ is a member of the domain of the predicate $P$, then $\forall x\, P(x) \Rightarrow P(a)$.

Proof. If $P$ is true for every member of its domain and $a$ is a member of said domain, $P$ must be true for $a$. $\square$

Theorem 3.3.3 (Existential generalization) If $a$ is a member of the domain of the predicate $P$, then $P(a) \Rightarrow \exists x\, P(x)$.

Proof. Let $a$ be in the domain of $P$. If $P(a)$ is true, then $P$ is true for some member of its domain. $\square$

There are two comments to make about these two theorems. First, universal generalization and existential instantiation are not valid:

Examples. “I was able to easily pay off my college loans, therefore everyone can” is an example of invalid universal generalization.

“Someone in this room killed the wealthy magnate, so it must have been you!” is an example of invalid existential instantiation.

Finally, transitivity gives us that a universal statement implies an existential one: $\forall x\, P(x) \Rightarrow P(a) \Rightarrow \exists x\, P(x)$.

We are ready to prove our tautology is, in fact, a tautology. Remember that the statement was “If Hypatia is a woman and all women are mortal, then Hypatia is mortal.”

We have seen that “Hypatia is a woman” may be rendered as $W(h)$ for the predicate $W(x)$ being “$x$ is a woman” and $h$ being Hypatia. Likewise, “Hypatia is mortal” could be rendered $M(h)$ where $M(x)$ is “$x$ is mortal.” Finally, the statement “All women are mortal” would be $\forall x\, (W(x) \to M(x))$.

Theorem 3.3.4 The statement $[P(a) \wedge \forall x\, (P(x) \to Q(x))] \to Q(a)$ is a tautology.

We have changed our labeling to the more standard $P$, $Q$, and $a$ to emphasize that this statement is always true no matter what we are talking about.

Proof. Suppose that for some predicate $P$ and some $a$ in the domain of $P$ that $P(a)$ is true, and that for some predicate $Q$ the statement $\forall x\, (P(x) \to Q(x))$ is true.

By universal instantiation, $\forall x\, (P(x) \to Q(x))$ means that $P(a) \to Q(a)$ is true.

By modus ponens on $P(a)$ and $P(a) \to Q(a)$, we know $Q(a)$ must be true.

Therefore, $[P(a) \wedge \forall x\, (P(x) \to Q(x))] \to Q(a)$ is always true. $\square$

Takeaways

Chapter 3 problems

4. Proofs

Review:

Videos:

Now we have things to talk about – sets and their elements – and ways of talking about those things – propositional and predicate logic. This opens the door to the true game of mathematics: proof.

Mathematics stands apart from the natural sciences because its facts must be proven. Whereas theories in the natural sciences can only be supported by evidence, mathematical facts can be shown to be indisputably true based on the agreed definitions and axioms.

For example, once you have agreed upon the definitions of “integer,” “even,” and the rules of arithmetic, the statement “An even integer plus an even integer is an even integer” is without doubt. In fact, you will prove this statement yourself in the exercises to this chapter.

Now, this introduction is not meant to imply that mathematical epistemology (the study of knowledge) is without its issues. For example: What definitions and axioms are good ones? (“Is there such a thing as a real number?” can entertain you for a while.) What sorts of people decide which mathematical research is worth doing, and who decides which proofs are convincing? Are all published mathematical proofs correct? (They aren’t.) What is the role of computers in mathematical proof?

Fortunately for us, we don’t have to worry about these questions. (Unless we want to. They’re quite interesting.) In this chapter we will simply learn what a mathematical proof is, learn some basic facts about the integers, and practice writing direct and indirect proofs about mathematical facts that are not particularly in dispute.

4.1 Direct proofs

What is a proof? In this book we will adopt Cathy O’Neil’s position that a proof is a social action between speaker/writer and listener/reader that convinces the second party that a statement is true.

Several proofs have been written in this book so far. Many begin with an italicized Proof. and end in a little $\square$, as a way to set them off from definitions, theorems, comments, and examples. Hopefully they have all been convincing to you. (DW: If not, my e-mail address is somewhere on this webpage.)

As you practice writing your own proofs, take for an audience someone else in your position: a new student of discrete mathematics. Imagine the reader knows slightly less than you do. A good proof gently guides the reader along, making them feel smart when they anticipate a move.

It is also this book’s position that talking about proofs is nearly useless. They must be read and practiced. With that in mind, let’s get to writing some proofs about integers.

We will take for granted that sums, differences, products, and powers of integers are still integers. (Depending on how the set of integers is defined, these qualities may be “baked in” to the definition.) Every other fact will either be stated as a definition or a theorem, and proven in the latter case.

Definition 4.1.1. An integer nn is even if there exists an integer kk such that n=2kn=2k.

Definition 4.1.2. An integer nn is odd if there exists an integer kk such that n=2k+1n=2k+1.

For example, 28=2(14)28=2(14), and 39=2(19)+139=2(19)+1.

Theorem 4.1.3. If an integer nn is odd, so is n2n^2.

This statement may be obvious to you. If so, great; but we are trying to learn about the writing, so it is best to keep the mathematics simple. (It will only get a little more challenging by the end of the chapter.) To prove this statement, we must start with an arbitrary odd integer and prove that its square is odd. The integer in question must be arbitrary: if we only prove the statement for 3, we must still separately justify 5, 7, 9, and so on.

Proof. Let nn be an odd integer.

The first step in the proof is to assume any of the theorem’s hypotheses. The theorem says that if we have an odd integer, its square will be odd; so, we are entitled to assume an odd integer. However, there is nothing more we may assume. Fortunately, we know what it means for an integer to be odd.

That means there exists an integer kk such that n=2k+1n=2k+1.

A great second step is to “unpack” any definitions involved in the assumptions made. We assumed nn was odd. Well, what does that mean? However, now we are “out of road.” Let’s turn our attention to the object we mean to say something about: n2n^2.

Therefore, n2=(2k+1)2=4k2+4k+1=2(2k2+2k)+1, n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2+2k) + 1, which is an odd number by definition.

Remember: sums, powers and products of integers are integers – this is the only thing we take for granted, other than, y’know, arithmetic – and so 2k2+2k2k^2 + 2k is an integer, call it mm. Then 2m+12m+1 is odd. The final step – obviously rendering the result in the explicit form of an odd number – is a good one to take because we assume our reader to be as new at this as we are. They likely recognize 4k2+4k+14k^2+4k+1 is odd, but we take the additional step to ensure maximum clarity.

Thus, if nn is an odd integer so is its square. \square

Definition 4.1.4. Let aa be a nonzero integer and let nn be an integer. We say aa divides nn or that nn is divisible by aa if there exists an integer kk such that n=akn=ak. We write ana|n when this is the case.

Theorem 4.1.5. Divisibility has the following three properties:

1. Every nonzero integer nn divides itself: nnn|n.
2. If mm and nn are positive integers where mnm|n and nmn|m, then m=nm=n.
3. If aa, bb, and cc are integers, with aa and bb nonzero, where aba|b and bcb|c, then aca|c.

(These three properties together define something called a partial order, which is studied in the last chapter. Foreshadowing!)

You will prove two of these properties for yourself in the exercises. Here is a demonstration of the proof of the second property.

Proof. Suppose that mm and nn are positive integers where mnm|n and nmn|m.

As before, we are assuming only what is assumed in the statement of the theorem: that mm and nn are positive integers that divide one another. By the way: why did we have to say they are positive?

That means that there exist integers k1k_1 and k2k_2 such that n=k1mn=k_1m and m=k2nm=k_2n.

The second step is usually to apply relevant definitions.

Combining these two statements, we have m=(k1k2)mm=(k_1k_2)m. Since mm is not zero, we see 1=k1k21=k_1k_2. Because k1k_1 and k2k_2 must be integers by definition (and positive, since mm and nn are), the only way that k1k2=1k_1k_2=1 is for k1=k2=1k_1=k_2=1.

If we didn’t have the restriction that k1k_1 and k2k_2 were integers, then any pair with k1=1/k2k_1=1/k_2 would work. For integers, k1k2=1k_1k_2=1 forces k1=k2=1k_1=k_2=1 or k1=k2=1k_1=k_2=-1, and positivity rules out the second option. This is why the theorem requires mm and nn to be positive.

Therefore, if mm and nn are positive integers that divide one another, then m=nm=n. \square

In the next section we will explore indirect proofs and, from there, bidirectional proofs. We highly recommend trying some exercises before moving on.

4.2 Indirect and bidirectional proofs

Consider the following theorem.

Theorem 4.2.1 If nn is an integer such that n2n^2 is odd, then nn is odd.

This theorem may seem the same as Theorem 4.1.3, but note carefully that it is not: it is the converse Q(n)P(n)Q(n) \to P(n) to the previous theorem’s P(n)Q(n)P(n) \to Q(n).

Let’s try to imagine a direct proof of this theorem. Since n2n^2 is odd, there exists an integer kk such that n2=2k+1n^2 = 2k+1. And then, seeking to prove a fact about nn, we take the positive square root of each side: n=2k+1n = \sqrt{2k+1}. And…ew. Have we even proven anything about square roots? (We haven’t, and we won’t.)

We shall, instead, employ some sleight of hand. Recall from Chapter 2 that a conditional statement pqp \to q is equivalent to its contrapositive ¬q¬p\neg q \to \neg p.

Proof (of Theorem 4.2.1). Suppose that nn is even. Then there exists an integer kk such that n=2kn=2k. In that case, n2=(2k)2=4k2=2(2k2),n^2 = (2k)^2 = 4k^2 = 2(2k^2), which is even. Therefore for n2n^2 to be odd, nn must also be odd. \square

Another equivalence from Chapter 2 was that pqp \to q and qpq \to p combine to create pqp \leftrightarrow q. That is, we can prove an “if and only if” statement by proving both the “forward” and “backward” directions (which is which is a matter of perception). Therefore, in this chapter we have proven the following statement:

Theorem 4.2.2 An integer nn is odd if and only if n2n^2 is.

Remember that an “if and only if” statement is true if the component statements are both true or both false. So, this theorem also tells us that an integer is even if and only if its square is.

Theorem 4.2.3 Let nn be an integer. The integer nn is odd if and only if n2+4n+1n^2+4n+1 is even.

Proof. Let nn be an integer. First, suppose nn is odd. That means that there exists an integer kk where n=2k+1n=2k+1. So, n2+4n+1=(2k+1)2+4(2k+1)+1=4k2+4k+1+8k+4+1, n^2 + 4n +1 = (2k+1)^2 + 4(2k+1) + 1 = 4k^2 + 4k + 1 + 8k + 4 + 1, which may be written as 2(2k2+6k+3),2(2k^2 + 6k + 3), which is even. Therefore if nn is odd then n2+4n+1n^2+4n+1 is even.

Conversely, suppose nn is even. That means that n=2kn=2k for some integer kk. In that case, n2+4n+1=(2k)2+4(2k)+1,n^2 + 4n + 1 = (2k)^2 + 4(2k) + 1, which is 2(2k2+4k)+1,2(2k^2+4k)+1, which is odd. Therefore for n2+4n+1n^2+4n+1 to be even, nn must be odd.

So, nn is odd if and only if n2+4n+1n^2+4n+1 is even. \square

4.3 Counterexamples

When proving a universal statement on a domain, it never suffices to just do examples (unless you do every example, which is usually not feasible or possible). However, remember from Chapter 3 that the negation of a universal statement is an existential one. You can prove that "every PP is a QQ" is false by finding just one PP that isn’t a QQ.

Examples. The statement “all prime numbers are odd” is false because 22 is prime and even.

The statement “all odd numbers are prime” is false because 99 is odd and not prime.

Takeaways

Chapter 4 problems

5. Recursion and integer representation

Videos:

One of the most important ideas in discrete mathematics is that of recursion, which we will explore in one way or another for many of the remaining chapters in the course. In this chapter, we will explore it through the idea of integer representation; representing integers in bases other than the common base 10. Bases you are likely to encounter in computer science contexts are base 2 (binary), base 8 (octal), and base 16 (hexadecimal). We will learn how to represent an integer in binary using a recursive algorithm, and then we will cheat our way into octal and hexadecimal.

5.1 Recursion

Recursion is when an algorithm, object, or calculation is constructed recursively.

Just kidding. Reread this joke after the chapter and it’ll make more sense.

Recursion means that an algorithm, object, or calculation is written, defined, or performed by making reference to itself. It is something more easily understood by example, and there will be plenty as we proceed. For now, consider the following example.

Definition 5.1.1 Let nn be a positive integer. Then n!n! ("nn factorial") is the product of all positive integers at most nn: n!=1×2××n.n! = 1 \times 2 \times \cdots \times n. Furthermore, we define 0!=10!=1.

Anyone questioning the usefulness of n!n! will have their questions answered in excruciating detail over the course of Chapters 9-13. If the reader has had calculus, then they may recognize n!n! as the denominator of the terms in the Taylor expansion of the function exe^x; it is good for “absorbing” constants created by taking derivatives. However, we are not interested in any of its uses right now.

Example. Here are the values of a few factorials: 0!=10!=1, 1!=11!=1, 2!=2×1=22!=2\times 1 = 2, 3!=3×2×1=63!=3\times 2 \times 1 = 6, 4!=4×3×2×1=244!=4 \times 3 \times 2 \times 1 = 24, and 5!=5×4×3×2×1=1205!=5\times 4\times 3 \times 2 \times 1=120.

You may have already noticed the recursion at work. Take for instance 5!=5×4×3×2×15!=5 \times 4 \times 3 \times 2 \times 1. We see the value of 4!4! lurking here: 5!=5×4!5!=5 \times 4!.

In fact, for all positive nn, we can make the alternative (usually more useful) definition n!=n(n1)!n!=n(n-1)!. This is the recursive definition of the factorial, because we are defining the factorial in terms of the factorial.

Why is this okay? Well, we’re really collapsing a lot of work into two lines: it may make more sense to break it down one step at a time. First, define 0!=10!=1. This definition will make sense in a later chapter, but remember that mathematical definitions don’t necessarily need to “make sense.”

Next, define 1!=1×0!1!=1 \times 0!. This definition is valid, because the number 11, the operation ×\times, and the number 0!0! have all been defined. And when we go to define 2!=2×1!2!=2 \times 1!, we have already defined all of these objects as well. And so on. Therefore, the following definition is a valid one.

Definition 5.1.2 Let nn be a natural number. Then n!n! is equal to 1 when n=0n=0, and n(n1)!n(n-1)! otherwise.
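If you enjoy seeing definitions run, here is a short Python sketch of Definition 5.1.2. The function name factorial is our own illustrative choice; Python's built-in math.factorial computes the same values.

```python
def factorial(n):
    """Definition 5.1.2 translated directly: n! is 1 when n is 0,
    and n * (n - 1)! otherwise."""
    if n == 0:
        return 1
    return n * factorial(n - 1)

print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```

Notice how the code calls itself on a smaller input, exactly as the definition refers to a smaller factorial.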

5.2 Binary numbers

Now that we know how recursion works (or at least have more of an inkling than we did upon starting the chapter), we will see another example of it in the calculation of the binary representation of a number.

When we speak of the number one thousand five hundred eight or 1,5081,508, we are implicitly talking about the composition of the number in terms of powers of 10: 1,508=1×103+5×102+0×101+8×100,1,508 = 1\times 10^3 + 5 \times 10^2 + 0 \times 10^1 + 8 \times 10^0, remembering that b0=1b^0=1 for all nonzero bb.

African and Asian mathematicians were writing in base-10, or decimal notation, as early as 3,000 BCE. Presumably the prevalence of decimal notation throughout human history is due to the fact that most of us have ten fingers and ten toes, making the base convenient. However, computers only have two fingers and two toes.

That last line is a joke, but here’s the idea: Computers read information by way of electrical signals, which are either on or off. So the simplest way for a computer to process information is in terms of two states. Here is a higher-level example: suppose you are writing some code. Is it easier to have a single if-then statement (two possible states), or nine (ten possible states)? This is why base-2, or binary numbers, are so prevalent in computer science.

The first step to becoming adept with binary numbers is to know many powers of 22. Fortunately, powers of 22 are easy to calculate: simply multiply the last value by 22 to get the next one. (This is a recursive definition for 2n2^n!)

nn 00 11 22 33 44 55 66 77 88 99
2n2^n 11 22 44 88 1616 3232 6464 128128 256256 512512

Algorithm 5.2.1 Let nn be a natural number. To write the dd-digit binary expansion (ad1ad2a1a0)2(a_{d-1}a_{d-2}\ldots a_1a_0)_2 of nn:

1. If nn is 00 or 11, its expansion is the single digit nn.
2. Otherwise, find the highest power of 22 that is no more than nn; this power is 2d12^{d-1}, and ad1=1a_{d-1}=1.
3. Find the binary expansion of the remainder n2d1n-2^{d-1}. Its digits fill the matching positions of nn's expansion, and any position left unfilled holds a 00.

This is perhaps a little intimidating at first glance, so let’s press on with an example.

Example. Let’s calculate the binary expansion of n=1,508n=1,508. Our target nn is neither 00 nor 11, so we must find the highest power of 22 that is no more than nn. This number is 210=1,0242^{10}=1,024. Therefore, our binary number will have eleven digits, and right now it looks like (1a9a8a1a0)2.(1a_9a_8\ldots a_1a_0)_2.

To find the values of the remaining digits a9a_9, a8a_8, etc., let’s find the binary expansion of the remainder 1,5081,024=4841,508 - 1,024 = 484.

Here’s the recursion, by the way: to find the binary expansion of nn, we must find the binary expansion of a smaller number.

We repeat the process. The highest power of 22 that is no more than 484484 is 28=2562^8=256. So, the binary expansion of 484484 is the nine-digit number (1a7a6a1a0)2.(1a_7a_6\ldots a_1a_0)_2. Be aware that it is okay to reuse the names aia_i for the unknown digits, because the algorithm says they are the same for both numbers.

Next, we find the second remainder 484256=228.484-256=228. The highest power of 22 that is no more than 228228 is 27=128.2^7=128. Continue the example and see for yourself that the binary expansion of 228228 is (11100100)2(11100100)_2. (Your remainders after 228228 will be 100100, 3636, 44, and 00.)

Now, the binary expansion of 484484 is (111100100)2(111100100)_2 and the binary expansion of 1,5081,508 is (1a9111100100)2.(1a_9111100100)_2.

The algorithm tells us what to do with our missing a9a_9: since it does not appear in the expansions of any of our remainders, it is 0. Therefore, 1,508=(10111100100)2.1,508 = (10111100100)_2.

This example was made much more tedious than it needed to be so that the recursive steps were laid bare. In practice, you will simply continue subtracting off powers of 22 until none of the target number remains, writing a 11 if you subtracted a particular power of 22 and a 00 if you didn’t. Since we had to subtract 1,0241,024, 256256, 128128, 6464, 3232, and 44, we write a 11 for the eleventh, ninth, eighth, seventh, sixth, and third digits, and a 00 everywhere else.
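As a sketch of how the recursion might be coded, here is one hypothetical Python version of the procedure; the helper name binary_digits is ours, and Python's built-in bin serves as a check.

```python
def binary_digits(n):
    """Peel off the highest power of 2 that is no more than n, then
    recursively expand the remainder (the idea of Algorithm 5.2.1)."""
    if n in (0, 1):
        return [n]
    p = 1
    while 2 * p <= n:       # highest power of 2 that is no more than n
        p *= 2
    rest = binary_digits(n - p)
    width = p.bit_length() - 1          # position of the leading 1
    # digits that appear in no remainder's expansion are 0
    return [1] + [0] * (width - len(rest)) + rest

print(binary_digits(1508))  # [1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0]
print(bin(1508))            # 0b10111100100
```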

You have no doubt noticed that our binary numbers are all enclosed in the notation ()2()_2. This is to signify that (101)2(101)_2, for example, is referring to the number five and not the number one hundred one.

5.3 Octal and hexadecimal numbers

The word octal refers to the base-8 system and hexadecimal refers to the base-16 system. These two bases are useful abbreviations of binary numbers: octal numbers are used to modify permissions on Unix systems, and hexadecimal numbers are used to encode colors in graphic design.

We say “abbreviation” because of the following algorithm.

Algorithm 5.3.1 Suppose nn is an integer with a base bb representation and we desire to know its base bkb^k representation, for some integer k>1k > 1. Then we may group the digits of nn in base bb into groups of kk and rewrite each group as a digit in base bkb^k. The resulting string is the expansion of nn in base bkb^k.

The idea is that it is easy to go from some base to a base that is a power of the original base. Consider the following example.

Example. Write the number 236236 in octal and in hexadecimal.

The starting point will be to write 236236 in binary. Since 236=128+64+32+8+4,236=128+64+32+8+4, its binary expansion is (11101100)2(11101100)_2.

To write 236236 in octal, we will regroup the binary digits of 236236 by threes because 8=238=2^3. However, there are eight digits. No worries; we will simply add a meaningless 00 as the leftmost digit of the binary expansion. Those groups are: 011011, 101101, and 100100 from left to right. In order, those numbers are 33, 55, and 44; so, 236236 is (354)8(354)_8.

Observe that in a base bb number system, the digits are 0,1,2,,b10, 1, 2, \ldots, b-1. Therefore the highest digit we will see in octal is 77.

However – what about hexadecimal? Its set of digits is larger than our usual base-10 set. It is standard usage to use AA through FF to represent 1010 through 1515.

Decimal digit 1010 1111 1212 1313 1414 1515
Hexadecimal digit AA BB CC DD EE FF

Since 16=2416=2^4, to write 236236 in hexadecimal we will group the digits in its binary expansion by fours this time. Those groups are 11101110 and 11001100, which represent fourteen and twelve respectively. So, 236=(EC)16236=(EC)_{16}.
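A possible Python rendering of Algorithm 5.3.1 for the case b = 2: pad the binary digits on the left, group them by k, and read each group as one digit. The name regroup is our own choice; oct and hex are Python's built-in checks.

```python
def regroup(bits, k):
    """Group binary digits into blocks of k (padding on the left with
    meaningless zeros) and read each block as a single base-2**k digit."""
    bits = [0] * (-len(bits) % k) + bits
    digit = "0123456789ABCDEF"
    return "".join(digit[int("".join(map(str, bits[i:i+k])), 2)]
                   for i in range(0, len(bits), k))

bits = [1, 1, 1, 0, 1, 1, 0, 0]   # 236 in binary
print(regroup(bits, 3))           # 354 (octal)
print(regroup(bits, 4))           # EC  (hexadecimal)
print(oct(236), hex(236))         # 0o354 0xec
```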

To summarize our example, here are four different representations of the number 236236.

Decimal Binary Octal Hexadecimal
236236 (11101100)2(11101100)_2 (354)8(354)_8 (EC)16(EC)_{16}

Takeaways

Chapter 5 problems

6. Sequences and sums

Videos:

6.1 Sequences

Remember that a set has no order. For example, {1,2}\{1,2\} and {2,1}\{2,1\} are the same set. Often, we care about the order objects are in. Enter the sequence.

Definition 6.1.1 Let INI \subseteq \mathbf{N} and XX be some set. A sequence is a function a:IXa:I \to X. We write ana_n instead of a(n)a(n). Here, nn is the index and ana_n is the term. We denote the entire sequence as (an)(a_n) (when the start and endpoints are stated elsewhere or are unimportant), (an)n=k(a_n)_{n=k}^\infty (for an infinite sequence starting with aka_k), or (an)n=km(a_n)_{n=k}^m (for a finite sequence starting at aka_k and ending at ama_m). A finite sequence is called a word from the alphabet XX.

Once again we have chosen a more technical, functional definition where we instead could have said “a sequence is a totally ordered list of objects.” It is worth considering the way in which a function can “port structure” from one set to another. If we wanted a totally ordered set it is hard to imagine a better candidate than N\mathbf{N}. The sequence transfers the order structure from N\mathbf{N} into whatever set XX we like.

Example. Consider the sequence (an)n=1=(1,4,9,16,25,).(a_n)_{n=1}^\infty = (1, 4, 9, 16, 25, \ldots). Observing that each term is the square of its index, this sequence may be represented in closed form by writing an=n2a_n=n^2.

Not every sequence may be so easily identified in closed form.

Example. Consider the sequence (Fn)n=0=(0,1,1,2,3,5,8,13,21,34,).(F_n)_{n=0}^\infty = (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, \ldots). You may recognize this sequence as the famous Fibonacci sequence. There is a closed-form expression for FnF_n, but you are unlikely to see it by visual inspection. However, notice that each term in the sequence is equal to the sum of the preceding two terms: for instance, 13=8+513=8+5.

Therefore instead we can represent the Fibonacci sequence recursively with a recurrence relation: Fn={Fn1+Fn2,n>11,n=10,n=0 F_n = \begin{cases} F_{n-1} + F_{n-2}, & n > 1 \\ 1, & n = 1\\ 0, & n = 0 \end{cases}

Once a recurrence relation is established, the terms of a sequence may be calculated iteratively or recursively. Each method has its strengths and weaknesses depending on the context; typically, recursive calculations are more difficult but require storing less information.

Example. Consider the sequence (xn)(x_n) where xn={xn12xn2,n>21n=2,1n=1 x_n = \begin{cases} x_{n-1} - 2x_{n-2}, & n > 2 \\ 1 & n = 2, \\ -1 & n = 1 \end{cases}

To calculate x4x_4 iteratively, we simply calculate each term of (xn)(x_n) until we arrive at x4x_4. So: x1=1x_1=-1, and x2=1x_2 = 1. Then x3=x22x1=12(1)=3,x_3 = x_2 - 2x_1 = 1 - 2(-1) = 3, and x4=x32x2=32(1)=1.x_4 = x_3 - 2x_2 = 3 - 2(1) = 1. Thus, x4=1x_4 = 1.

In the recursive calculation, we will begin with x4x_4 and repeatedly apply the recurrence relation until we have only known values of the sequence: so, x2x_2 and x1x_1. First, x4=x32x2.x_4 = x_3 - 2x_2. However, we “don’t know” what x3x_3 is, so we repeat the recursion: x4=(x22x1)2x2.x_4 = (x_2 - 2x_1) - 2x_2. Now we substitute in the given values: x4=(12(1))2(1)=1.x_4 = (1 - 2(-1)) - 2(1) = 1. It is important that we arrive at the same value both ways!
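To see the two styles of calculation side by side, here is a small Python sketch; the function names are our own.

```python
def x_iterative(n):
    """Work forward from the known values x_1 = -1 and x_2 = 1."""
    a, b = -1, 1
    for _ in range(n - 2):
        a, b = b, b - 2 * a     # (x_{k-1}, x_k) -> (x_k, x_{k+1})
    return b if n >= 2 else a

def x_recursive(n):
    """Work backward, applying the recurrence until only the known
    values x_1 and x_2 remain."""
    if n == 1:
        return -1
    if n == 2:
        return 1
    return x_recursive(n - 1) - 2 * x_recursive(n - 2)

print(x_iterative(4), x_recursive(4))  # 1 1, as in the example
```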

6.2 Sums

Given a sequence, a natural course of action is to add its terms together. To avoid writing a bunch of plus signs, we write n=kman=ak+ak+1++am1+am \sum_{n=k}^m a_n = a_k + a_{k+1} + \cdots + a_{m-1} + a_m for the sum of the members of (an)(a_n), starting with the kk-th member and ending with the mm-th member.

Example. Consider the sum i=14i2. \sum_{i=1}^4 i^2. Notice that ii is being used for the index rather than nn. That’s okay! The sum’s value is i=14i2=12+22+32+42=30. \sum_{i=1}^4 i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 30.

We can have sums of sums, and sums of sums of sums, etc. Double sums are common in computer science when evaluating the runtimes of doubly-nested for loops.

Example. Consider the sum m=13n=24(m+n). \sum_{m=1}^3 \sum_{n=2}^4 (m+n). This sum is rectangular in the sense that the bounds of one sum are not determined by the other, and so we could do this sum in either order: m=13[(m+2)+(m+3)+(m+4)] \sum_{m=1}^3 [(m+2) + (m+3) + (m+4)] or n=24[(1+n)+(2+n)+(3+n)].\sum_{n=2}^4 [(1+n) + (2+n) + (3+n)]. Either way, the sum’s value is 4545.

For another example, take i=14j=1i2i+j. \sum_{i=1}^4 \sum_{j=1}^i 2^{i+j}. Because the bounds of the jj-sum are dependent on the value of ii, we should iterate over ii first: j=112j+1+j=122j+2+j=132j+3+j=142j+4. \sum_{j=1}^1 2^{j+1} + \sum_{j=1}^2 2^{j+2} + \sum_{j=1}^3 2^{j+3} + \sum_{j=1}^4 2^{j+4}.

Each of these sums must be done separately. The first sum, of just one term, is equal to 44. The second sum is 2424, the third is 112112, and the fourth is 480480; so altogether the sum is 620620.
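Since double sums correspond to doubly-nested loops, Python can check both examples directly; this is just a sanity check, not a substitute for the algebra.

```python
# The "rectangular" sum: the bounds of the inner sum do not depend on m.
rectangular = sum(m + n for m in range(1, 4) for n in range(2, 5))
print(rectangular)  # 45

# The "triangular" sum: the inner bound depends on i, so iterate i first.
triangular = sum(2 ** (i + j) for i in range(1, 5) for j in range(1, i + 1))
print(triangular)   # 620
```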

We will end on two theorems that are important for manipulating sums, the second of which will be very useful later.

Theorem 6.2.1 [Linear property of summation] Let (an)(a_n) and (bn)(b_n) be sequences and cc be a constant value. Then, n=km(an+bn)=n=kman+n=kmbn \sum_{n=k}^m (a_n + b_n) = \sum_{n=k}^m a_n + \sum_{n=k}^m b_n and n=kmcan=cn=kman.\sum_{n=k}^m ca_n = c\sum_{n=k}^m a_n.

Proof. The associative property of addition – (x+y)+z=x+(y+z)(x+y)+z = x+(y+z) – and the distributive property – c(x+y)=cx+cyc(x+y) = cx+cy – guarantee this theorem. \square

Example. Suppose we know that n=238an=14andn=238bn=19. \sum_{n=2}^{38} a_n = 14 \qquad \text{and} \qquad \sum_{n=2}^{38} b_n = 19. Then the value of the sum n=238(an+2bn) \sum_{n=2}^{38} (a_n+2b_n) is n=238an+2n=238bn=14+2(19)=52. \sum_{n=2}^{38} a_n + 2 \sum_{n=2}^{38} b_n = 14 + 2(19) = 52. Notice that both sums must start and end on the same index.

Theorem 6.2.2 [Recursive property of summation] Let (an)(a_n) be a sequence. Then n=kman=(n=km1an)+am=ak+(n=k+1man). \sum_{n=k}^m a_n = \left( \sum_{n=k}^{m-1} a_n \right) + a_m = a_k + \left( \sum_{n=k+1}^{m} a_n \right).

Proof. By definition, n=kman=ak+ak+1++am1+am. \sum_{n=k}^m a_n = a_k + a_{k+1} + \cdots + a_{m-1} + a_m. By the associative property of addition we may rewrite this as either (ak+ak+1++am1)+am=(n=km1an)+am(a_k + a_{k+1} + \cdots + a_{m-1}) + a_m = \left( \sum_{n=k}^{m-1} a_n \right) + a_m or ak+(ak+1++am1+am)=ak+(n=k+1man),a_k + (a_{k+1} + \cdots + a_{m-1} + a_m) = a_k + \left( \sum_{n=k+1}^{m} a_n \right), proving the theorem. \square

Example. Suppose it is known that b107=72b_{107} = 72 and that k=1106bk=291. \sum_{k=1}^{106} b_k = 291. Then, k=1107bk=363. \sum_{k=1}^{107} b_k = 363.

Takeaways

Chapter 6 problems

7. Asymptotics

Videos:

Consider the following example of two programs, AA and BB, that process some data. The following table provides the number of computations taken by each program to process nn data points.

nn Program AA Program BB
1 1 58
10 100 508
20 400 1008
50 2500 2508
60 3600 3008
100 10,000 5008

At first, Program AA looks much better than Program BB. However, as nn increases, it becomes clear that AA is worse at handling large quantities of data than BB. If (An)(A_n) and (Bn)(B_n) represent sequences of the numbers of steps taken by each program, then An=n2A_n=n^2 and Bn=50n+8B_n = 50n+8. The way we quantify the relationship between the two sequences is by saying BnB_n is “big-O” of AnA_n: eventually, BnB_n grows no faster than AnA_n does.

7.1 Growth bounds on sequences

Definition 7.1.1 Let ana_n and bnb_n be terms of sequences. We say ana_n is big-O of bnb_n if there exist a real number CC and an integer kk such that for all n>kn > k, anCbn. |a_n| \le Cb_n. We write an=O(bn)a_n = O(b_n), even if the equals sign is technically incorrect.

This is another technical definition that can be hard to parse, but the idea is that eventually (after the kk-th term of the sequence) ana_n does not grow faster than bnb_n. In this way, bnb_n serves as an upper bound on the growth of ana_n.

As we will see again in Chapter 18, there may be many upper bounds. For example, in a classroom of twenty-nine students I may say “There are fewer than thirty students here” or “There are fewer than a million students here.” The first statement is obviously more informative. So you will often be asked to find the least upper bound on the growth of ana_n; this is usually the simplest bnb_n that is itself big-O of all the other options.

Theorem 7.1.2 Any non-negative sequence is big-O of itself.

Proof. Let (an)(a_n) be a non-negative sequence; that is, an0a_n \ge 0 for all ana_n. Then the statement anan |a_n| \le a_n is satisfied by equality since an=an|a_n| = a_n. If we put C=k=1C=k=1, then this statement implies that an=O(an)a_n = O(a_n). \square

We can form a basic hierarchy of sequences and their big-O relationships. In the below table, a sequence ana_n appearing to the left of a sequence bnb_n means that an=O(bn)a_n=O(b_n).

Slowest Fastest
11 logn\log n nn nlognn \log n n2n^2 n3n^3 2n2^n n!n!

At the left-most end of the chart, the slowest-growing sequence is (an)=(1,1,1,)(a_n) = (1, 1, 1, \ldots). It is slowest-growing because it does not grow at all! In fact, any constant sequence is big-O of every other sequence in the chart.

Next is the sequence (logn)(\log n). In this book, logn\log n refers to the base-2 logarithm that you might have elsewhere seen written as log2n\log_2 n. Here are a couple of ways to conceptualize this sequence and its relevance to computer science: logn\log n is roughly the number of times nn can be halved before reaching 11, which is the work done by a binary search on nn items; it is also about the number of digits in the binary expansion of nn from Chapter 5.

The rest of the functions in the list should be familiar. Notice that all polynomial functions of higher degrees appear between n3n^3 and 2n2^n, and higher-order exponential functions like ene^n occur between 2n2^n and n!n!.

Figure. The graphs of n2n^2 and 2n2^n are compared: after n=4n=4, the graph of 2n2^n is always above that of n2n^2. Since after a point one graph is always above the other, n2n^2 is big-O of 2n2^n.

Now we know where basic functions fit into the big-O hierarchy, but what about more complicated expressions like 50n+850n+8? The following theorem will help.

Theorem 7.1.3 Let (an)(a_n) and (bn)(b_n) be sequences.

1. If cc is a nonzero constant, then can=O(an)ca_n = O(a_n).
2. If an=O(cn)a_n = O(c_n) and bn=O(cn)b_n = O(c_n), then an+bn=O(cn)a_n + b_n = O(c_n).
3. If an=O(An)a_n = O(A_n) and bn=O(Bn)b_n = O(B_n), then anbn=O(AnBn)a_nb_n = O(A_nB_n).

In plain English, this theorem says: constant multiples do not change a sequence’s growth rate; the growth of a sum is governed by its fastest-growing term; and a product grows no faster than the product of bounds on its factors.

It is worthwhile to read the below proof of the theorem, but do not worry if you don’t understand it. Read it once, then skip it until after Chapter 8. You can work the problems at the end of the chapter even if you don’t totally understand the following proof.

Proof. To prove the addition rule (second bullet point), suppose an=O(cn)a_n = O(c_n) and bn=O(cn)b_n = O(c_n). That means there exist witnesses k1,k2,C1,k_1, k_2, C_1, and C2C_2 where anC1cn|a_n| \le C_1c_n for all n>k1n > k_1 and bnC2cn|b_n| \le C_2c_n for all n>k2n > k_2. Then, by the triangle inequality, an+bnan+bnC1cn+C2cn=(C1+C2)cn|a_n + b_n| \le |a_n| + |b_n| \le C_1c_n + C_2c_n = (C_1+C_2)c_n for all n>max{k1,k2}n > \text{max}\{k_1,k_2\}. In other words, this inequality holds for all nn far enough along that both the big-O relationships hold. By definition, this means an+bna_n + b_n is big-O of cnc_n.

To prove the multiplication rule (third bullet point), suppose there exist sequences AnA_n and BnB_n such that an=O(An)a_n=O(A_n) and bn=O(Bn)b_n=O(B_n). Therefore there exist witnesses k1,k2,C1,k_1, k_2, C_1, and C2C_2 where anC1An|a_n| \le C_1A_n for all n>k1n > k_1 and bnC2Bn|b_n| \le C_2B_n for all n>k2n > k_2, so that for all n>max{k1,k2}n > \text{max}\{k_1, k_2\}, anbn(C1An)(C2Bn)=(C1C2)AnBn. |a_nb_n| \le (C_1A_n)(C_2B_n) = (C_1C_2)A_nB_n.

To prove the theorem about constant multiples (first bullet point), apply the multiplication rule where (bn)(b_n) is the constant sequence (c,c,c,)(c, c, c, \ldots). Observe that this sequence is big-O of the sequence (1)(1) by taking k=0k=0 and C=cC=|c|. So, since an=O(an)a_n = O(a_n) and c=O(1)c = O(1), can=O(1×an)=O(an)ca_n = O(1\times a_n) = O(a_n) as claimed. \square

Finally, we have enough tools to do some examples!

Examples. Consider the sequence whose terms are given by the formula an=n2+5n+7a_n = n^2 + 5n + 7. We would like to find the simplest function bnb_n of lowest order such that an=O(bn)a_n=O(b_n).

Notice that ana_n is a sum of three terms: n2n^2, 5n5n which is big-O of nn, and 77 which is big-O of 11. Only the highest-order of these terms matters; both the constant term and the linear term nn are big-O of n2n^2, but not the other way around. Therefore an=O(n2)a_n = O(n^2).

What is the simplest and lowest-order function bnb_n such that (1+5n+2n2)(logn+3nlogn)(1 + 5n + 2n^2)(\log n + 3n \log n) is big-O of bnb_n?

Theorem 7.1.3 says we only need the highest-order term of each factor, and then we must multiply those terms together. The highest-order term of the first factor is big-O of n2n^2 and the second factor is big-O of nlognn \log n; so, the whole sequence is big-O of n3lognn^3 \log n.
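Big-O claims are proven with algebra, but they can be spot-checked numerically. In the sketch below the witnesses C = 10 and k = 7 are our own choices for the last example; the ratio tends toward 6, the product of the leading coefficients 2 and 3.

```python
import math

def a(n):   # (1 + 5n + 2n^2)(log n + 3n log n)
    return (1 + 5*n + 2*n**2) * (math.log2(n) + 3*n*math.log2(n))

def b(n):   # the claimed bound n^3 log n
    return n**3 * math.log2(n)

C, k = 10, 7
print(all(a(n) <= C * b(n) for n in range(k + 1, 10000)))  # True
print(a(9999) / b(9999))  # about 6.002, safely below C
```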

7.2 Other growth relationships

We will briefly mention two other growth relationships here. Recall that saying an=O(bn)a_n = O(b_n) is to say that bnb_n is an upper bound on the growth rate of ana_n. Then saying an=Ω(bn)a_n = \Omega(b_n) (big-Omega) is to say that bnb_n is a lower bound on the growth rate of ana_n, and an=Θ(bn)a_n = \Theta(b_n) (big-Theta) is to say that ana_n and bnb_n grow in the same way.

Definition 7.2.1 Let (an)(a_n) and (bn)(b_n) be sequences. Then an=Ω(bn)a_n = \Omega(b_n) if there exist a positive real number CC and an integer kk such that anCbn|a_n| \ge Cb_n for all n>kn > k; and an=Θ(bn)a_n = \Theta(b_n) if both an=O(bn)a_n = O(b_n) and an=Ω(bn)a_n = \Omega(b_n).

Takeaways

Chapter 7 problems

8. Proof by induction

Review.

It is time to come full circle on our discussion of proofs and recursion to establish the recursive proof technique, proof by induction.

What are the two ingredients to knocking over a line of domino tiles? Each domino must be sufficiently close to the next, and you must tip over the first one.

Proof by induction proceeds in much the same way. We want to prove some statement P(n)P(n) holds for every natural number nn past a base value n0n_0. We prove the following two claims: first, that P(n0)P(n_0) is true (the base case); and second, that whenever P(k)P(k) is true for some kn0k \ge n_0, then P(k+1)P(k+1) is true as well (the inductive step).

Once both of these statements are proven we will have P(n0)P(n_0), P(n0+1)P(n_0+1), P(n0+2)P(n_0+2), and so on; we will know P(n)P(n) to be true for all nn0n \ge n_0.

8.1 Inductive proofs involving sums

We will discuss three types of inductive proofs in this chapter. The first type involves proving facts about sums. Recall the recursive property of summation from Chapter 6, which will mostly look like this now: i=1k+1ai=(i=1kai)+ak+1. \sum_{i=1}^{k+1} a_i = \left( \sum_{i=1}^{k} a_i \right) + a_{k+1}.

Theorem 8.1.1 For all n1n \ge 1, i=1ni=n(n+1)2.\sum_{i=1}^n i = \frac{n(n+1)}{2}.

Proof. First, check the base case. Since i=11i=1=1(1+1)2, \sum_{i=1}^1 i = 1 = \frac{1(1+1)}{2}, the statement is true in the base case.

Next, suppose that for some integer k1k \ge 1 that i=1ki=k(k+1)2\sum_{i=1}^k i = \frac{k(k+1)}{2} (this is the inductive hypothesis). We claim that i=1k+1i=(k+1)(k+2)2.\sum_{i=1}^{k+1} i = \frac{(k+1)(k+2)}{2}. This statement is P(k+1)P(k+1)–notice that k+1+1k+1+1 is k+2k+2. We say up front that we “claim” it so that our reader knows where we are headed.

The preceding part of the proof is basically the same for all proofs by induction you’ll do. The next part is where the real work comes in, and it depends on what sort of thing you’re doing induction on. Since this is a sum, we will use the recursive property.

We may rewrite i=1k+1i \sum_{i=1}^{k+1} i as (i=1ki)+(k+1).\left( \sum_{i=1}^k i \right) + (k+1). Notice that the sum is known by the inductive hypothesis; so, this is k(k+1)2+(k+1). \frac{k(k+1)}{2} + (k+1). We put the fractions over common denominators: i=1k+1i=k(k+1)2+2(k+1)2. \sum_{i=1}^{k+1} i = \frac{k(k+1)}{2} + \frac{2(k+1)}{2}. Then, we regroup the fractions: i=1k+1i=(k+2)(k+1)2, \sum_{i=1}^{k+1} i = \frac{(k+2)(k+1)}{2}, as claimed.

Therefore by induction, i=1ni=n(n+1)2\sum_{i=1}^n i = \frac{n(n+1)}{2} for all n1n \ge 1. \square

If desired, you could expand out k(k+1)k(k+1) and 2(k+1)2(k+1) and add them together and check the result against the expansion of (k+1)(k+2)(k+1)(k+2). Instead, we noted that (k+2)(k+1)=k(k+1)+2(k+1)(k+2)(k+1)=k(k+1)+2(k+1). Factoring!
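A computer cannot check infinitely many cases, so it cannot replace the induction; still, a quick Python loop is a nice confidence check on Theorem 8.1.1.

```python
# Check the closed form against a brute-force sum for n = 1, ..., 999.
assert all(sum(range(1, n + 1)) == n * (n + 1) // 2 for n in range(1, 1000))
print("The formula holds for every n checked.")
```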

8.2 Inductive proofs involving divisibility

Remember that aa divides nn (or ana|n) means that there exists an integer kk such that n=akn=ak. Induction may be used to prove that every element of a sequence is divisible by some number.

Theorem 8.2.1 If n1n \ge 1, then 3(4n+5)3|(4^n+5).

Proof. Observe that when n=1n=1, 4n+5=94^n+5=9, which is divisible by 3. So, the statement is true in the base case.

Assume that for some integer k1k \ge 1 that 3(4k+5)3|(4^k+5). We mean to show that 3(4k+1+5)3|(4^{k+1}+5).

Observe that 4k+1+54^{k+1}+5 may be rewritten as 44k+5.4\cdot 4^k + 5. This is the same as 34k+4k+5.3\cdot 4^k + 4^k + 5. The assumption that 3(4k+5)3|(4^k+5) means there exists an integer mm such that 4k+5=3m4^k+5 = 3m. So, 34k+4k+5=34k+3m=3(4k+m),3\cdot 4^k + 4^k + 5 = 3 \cdot 4^k + 3m = 3(4^k+m), implying that 3(4k+1+5)3|(4^{k+1}+5) as desired.

Therefore, by induction, 3(4n+5)3|(4^n+5) for all n1n \ge 1. \square

The key calculation, that 4k+1=34k+4k4^{k+1} = 3\cdot 4^k + 4^k, may merit some discussion. Recall that 4k+1=44k4^{k+1} = 4 \cdot 4^k; this is the “addition rule” for exponents. Then we observe that 4=3+14=3+1 and apply the distributive property. In general, we may say that ak+1=(a1)ak+aka^{k+1} = (a-1)a^k + a^k for all real numbers aa; this fact will be helpful when you do these types of proofs on your own.

8.3 Inductive proofs involving inequality

Remember the definition of big-O involves the key line anCbn|a_n| \le Cb_n for sequences (an)(a_n) and (bn)(b_n) and a real number CC. So, computer scientists are often interested in justifying that an inequality is true past some given threshold.

Before we begin, we should discuss some tips for manipulating inequalities. They are much more flexible than equalities; this is both a positive and a negative in the sense that it is sometimes difficult to know the next step to take! So, outside of the proof we will “work backwards” to see if that illuminates the process; more on this in a moment.

Theorem 8.3.1 Suppose that aa, bb, cc, and dd are positive real numbers satisfying aba \le b and cdc \le d. Then a+cb+da + c \le b + d and acbd. ac \le bd.

To get an idea of what sort of “scratch work” is helpful or appropriate, take the following theorem.

Theorem 8.3.2 If n5n \ge 5, then 2n2+4n+13n22n^2 + 4n + 1 \le 3n^2.

Notice that this statement is a proof that 2n2+4n+12n^2 + 4n + 1 is big-O of n2n^2. (Do you see why?)

In a proof, it is not valid to imagine that the desired statement is true and work backwards. However, working these “backwards” steps can give us a crucial idea as to what the valid, forward-thinking proof should look like. Knowing how proof by induction works, we will assume that 2k2+4k+13k22k^2 + 4k + 1 \le 3k^2 and hope to claim that 2(k+1)2+4(k+1)+13(k+1)2. 2(k+1)^2 + 4(k+1) + 1 \le 3(k+1)^2. How to prove such a claim? Well, if we expand the desired statement, we see 2k2+4k+2+4k+4+13k2+6k+3. 2k^2 + 4k + 2 + 4k + 4 + 1 \le 3k^2 + 6k + 3. Using Theorem 8.3.1, we can cancel the terms of the inductive hypothesis to arrive at 4k+66k+3. 4k + 6 \le 6k + 3. This is nearly an obvious statement – because k5k \ge 5 we know 4k+64k+6 is at least 26 and 6k+36k+3 is at least 33, and the latter grows faster – but we can rewrite it as 32k, 3 \le 2k, which is clear for k5k \ge 5.

Having done this work now, we know exactly what to do in our proof. Crucially, all of the above steps are reversible!

Proof. The statement is true in the base case because 2(5)2+4(5)+1=7175=3(5)2. 2(5)^2 + 4(5) + 1 = 71 \le 75 = 3(5)^2.

Assume that for some integer k5k \ge 5 that 2k2+4k+13k22k^2 + 4k + 1 \le 3k^2. We aim to show that 2(k+1)2+4(k+1)+13(k+1)2. 2(k+1)^2 + 4(k+1) + 1 \le 3(k+1)^2.

First, since k5k \ge 5, we know that 32k,3 \le 2k, and that this implies (by adding 4k+34k+3 to both sides) that 4k+66k+34k+6 \le 6k+3.

Having assumed 2k2+4k+13k22k^2 + 4k + 1 \le 3k^2, we may combine both inequalities: 2k2+4k+1+4k+63k2+6k+3,2k^2 + 4k + 1 + 4k + 6 \le 3k^2 + 6k + 3, which may be regrouped and rearranged into 2(k+1)2+4(k+1)+13(k+1)2,2(k+1)^2 + 4(k+1) + 1 \le 3(k+1)^2, as claimed.

Therefore, by induction 2n2+4n+13n22n^2 + 4n + 1 \le 3n^2 for all n5n \ge 5. \square

Proofs involving sums and divisors are relatively straightforward to do without scratch work, because there is only one way to manipulate an equality. There are many ways to manipulate inequalities, and so the scratch work can point you in the right direction.

Takeaways

Chapter 8 problems

Interlude

At this point in the book you have been exposed to the basics of formal mathematics: sets, logic, and proof. The motivated independent student could quit here and go off and read something else; you now know enough to read introductory books in various mathematical topics such as algebra, topology, and analysis (though calculus will help with the latter two).

The second half of this book will deal with two topics in discrete math: combinatorics, or methods to solve counting problems; and relations, a subset of graph theory that touches on the areas of functions, orders, and equivalences. Familiarity with all of the previous material, especially sets, logic, proofs, sums, and recursion will be necessary.

9. Introduction to combinatorics

Review.

The time has come to learn how to count. Counting problems that require more sophisticated methods than simple listing include knowing how a program’s computation time scales with more data or counting the number of objects or structures of a particular type. In this chapter we will familiarize ourselves with the basics of counting: enumeration, addition, and multiplication.

9.1 Enumeration

In blatant disregard for the previous paragraph, a powerful tool in combinatorics is enumeration: systematically listing the objects to be counted. If the target set is small and finite, enumeration is perfectly fine. When the target set is cumbersomely large, enumeration can still help give an idea of how the elements are generated so that formal rules can give their number without the need to fully list them.

Example. How many ways are there to shuffle a deck of 52 cards?

It will be impossible to fully enumerate the space of objects to be counted (we will discuss just how impossible once we have an answer). However, let’s pretend the deck has just four cards: AA, BB, CC, and DD.

Now it is possible to systematically list out all possible shuffles of the deck. Let’s arrange them into a table where each column corresponds to a different card on top of the deck. For example, we will list the two decks where ABAB is on top; then, we will list the two decks where ACAC is on top; then the two decks with ADAD on top. Here is the full list of decks:

AA on top BB on top CC on top DD on top
ABCDABCD BACDBACD CABDCABD DABCDABC
ABDCABDC BADCBADC CADBCADB DACBDACB
ACBDACBD BCADBCAD CBADCBAD DBACDBAC
ACDBACDB BCDABCDA CBDACBDA DBCADBCA
ADBCADBC BDACBDAC CDABCDAB DCABDCAB
ADCBADCB BDCABDCA CDBACDBA DCBADCBA

There are 2424 such shuffles.

Another enumeration technique is the “tree diagram.” The tree diagram for all possible shuffles of four cards is pictured below.


Figure. The 24 arrangements of abcd are visualized via a tree diagram.

We have enough information here to figure out how many shuffles there are of a deck of 5252 cards. However, we need a theorem first.

9.2 Addition and multiplication principles

In combinatorics we count the number of outcomes that lead to an event coming to pass. In the previous example, “a deck of four cards is shuffled” is the event and BCDABCDA is one of its outcomes. We may phrase this set-theoretically by saying an event is a set and that the outcomes are the elements of that set. So, BCDAA Deck Of Four Cards Is Shuffled. BCDA \in \text{A Deck Of Four Cards Is Shuffled}.

With this in mind we have already seen the addition and multiplication rules for counting!

Theorem 9.2.1 Let EE and FF be events. Then the event "EE or FF" has EF=E+FEF|E \cup F|=|E| + |F| - |E \cap F| outcomes, and the event "EE and FF" has E×F=EF|E \times F|=|E|\cdot |F| outcomes.

Example. A particular restaurant sells many varieties of burritos. Of the selections, seven burritos are spicy and six are vegetarian. Two vegetarian burritos are spicy. How many burritos are spicy or vegetarian?

Letting SS be the set of spicy burritos and VV be the set of vegetarian burritos, we are looking for SV=S+VSV, |S \cup V| = |S| + |V| - |S \cap V|, which is 7+627+6-2 or 1111.

Example. How many ways are there to shuffle a deck of 52 cards?

Again we will consider the example of the four card deck AA, BB, CC, and DD. Some card must go first; there are four such choices. Then a card must go second, for which there are three choices. Then two, then one.

To give an example of how set theory is relevant, we may decompose the set Deck Is Shuffled\text{Deck Is Shuffled} into a product of four sets: Deck Shuffled=First Card×Second Card×Third Card×Fourth Card, \text{Deck Shuffled} = \text{First Card} \times \text{Second Card} \times \text{Third Card} \times \text{Fourth Card}, where the content of (for instance) Third Card\text{Third Card} depends on the choices of the first and second cards.

By the multiplication rule, there are 43214 \cdot 3 \cdot 2 \cdot 1 ways to shuffle this deck. Fortunately for us this is 2424, which was the answer we got last time!

You may have noticed we have done this calculation before. The product 43214 \cdot 3 \cdot 2 \cdot 1 is exactly 4!4!. By similar reasoning, there are 52!52! ways to shuffle a deck of standard size: there are 5252 choices for the top card, then 5151 choices for the next card, then (etc.)

In fact the following theorem may be proven by induction.

Theorem 9.2.2 There are n!n! ways to order a set of nn elements. Such an ordering is called a permutation of the set.

The first line of the following proof finally gives some intuition as to why 0!=10!=1 is a good definition.

Proof. Observe that there is one ordering of the elements of \varnothing: the empty ordering {}\{ \}. Since 0!=10!=1, the statement is true in the base case.

Suppose there are k!k! permutations of a kk-element set. Consider then a set with k+1k+1 elements. We must choose an element to be first in the permutation; there are (k+1)(k+1) choices. After doing so we are left to permute the remaining kk elements; by the inductive hypothesis there are k!k! ways to do so. Using the multiplicative principle of counting there must be (k+1)k!=(k+1)! (k+1) k! = (k+1)! permutations of the (k+1)(k+1)-element set, so the theorem is true by induction. \square

To get an idea of how mind-blowingly large factorials can become, check out this article about how long it would take for 52!52! seconds to go by.
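Python's exact integers make it easy to look at 52!52! yourself; the digit count below is one way to appreciate its size.

```python
import math

shuffles = math.factorial(52)
print(shuffles)            # every digit of 52!
print(len(str(shuffles)))  # 68 -- roughly 8 * 10**67 shuffles
```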

Takeaways

Chapter 9 problems

10. Ordered samples

Review.

Many of the things we are trying to count can be conceptualized as samples being taken from a sample space (think universal set). Whether that sample is ordered or unordered, and whether the elements may be repeated or replaced, affects the way the samples should be counted.

In this chapter we will discuss samples where the order of the elements is relevant. Examples include “words” (strings of characters that may have meaning attached, whether those characters be the usual letters or something else) and groups of people where each person has a different role (so switching the people in their roles would create a different configuration of the team).

Ordered samples with replacement are finite sequences, and ordered samples without replacement are permutations.

10.1 Finite sequences

Let nn and rr be natural numbers and Ω\Omega be a set of nn elements. Interchangeably we will call Ω\Omega the universe or sample space (in the context of this particular section Ω\Omega may also be called the alphabet).

A finite sequence in Ω\Omega is exactly what it sounds like: an ordered sample (ak)k=1r=(a1,a2,,ar) (a_k)_{k=1}^r = (a_1,a_2,\ldots, a_r) where values may repeat among the aka_k. We say that the elements may be “repeated” or, if they are concrete objects being drawn or selected, “replaced.” (Imagine a Scrabble tile being put back in the bag, or a card back in the deck.)

Given Ω\Omega, how many sequences of length rr are in Ω\Omega? Let’s consider the following example.

Example. Suppose a password of length 1010 is to be made of only lowercase letters. The alphabet, or universe, is the set {a,b,c,,z}\{a, b, c, \ldots, z\}. We must choose a letter for the first character of the password: there are 2626 total choices. Then we must choose the second character. Since usually passwords allow for character repetition, there are still 2626 choices; and likewise for the third, the fourth, and so on. Hence there are 26×26××26=2610 26 \times 26 \times \cdots \times 26 = 26^{10} total passwords. (See the problems at the end of the chapter for why you should vary the types of characters in your passwords.)

This example generalizes to the following theorem.

Theorem 10.1.1 Let n,rNn, r \in \mathbf{N}. There are nrn^r finite sequences of length rr in a set Ω\Omega with Ω=n|\Omega|=n.

The following construction will be useful in other counting contexts.

Definition 10.1.2 A bit string of length nn is a sequence of length nn in the set {0,1}\{0,1\}.

Be sure you understand why there are 2n2^n bit strings of length nn.
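One way to convince yourself is to enumerate them; here is a sketch using Python's itertools, with n = 4 as an arbitrary sample length.

```python
from itertools import product

n = 4
strings = list(product([0, 1], repeat=n))  # all length-n sequences over {0, 1}
print(len(strings), 2 ** n)                # 16 16
print(strings[:3])   # (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), ...
```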

10.2 Permutations

Example. Suppose a team of three people, each with a different role, is to be made from seven volunteers. Are there 737^3 possible teams? No, because this formula assumes that one person could take all three roles. That would not be a “team of three people.” Usually when we are counting groups of people, we are not allowing for repetition.

In the last chapter, we defined a permutation of Ω\Omega to be an ordering of its elements, and discovered that there are n!n! such orderings if Ω\Omega has nn elements. In the above example, there are 7!7! ways to arrange the entire set of volunteers. That’s closer to what we want, but we still only need three of the volunteers.

Consider that there are 77 choices for the first role. Now that person may not be chosen again, and there are only 66 choices for the second role. Accordingly, there are only 55 choices for the third role. Altogether according to the multiplication principle, we have 7×6×5=210 7 \times 6 \times 5 = 210 options.

There is another way to come up with the number 7×6×57 \times 6 \times 5. Remember that there are 7!7! ways to arrange the whole set of people, but that we only care about the arrangement of three of them. In other words, there are four people whose arrangement we don’t care about at all. What if we divided 7!7! by the 4!4! ways to arrange those people? 7!4!=7×6×5=210 \frac{7!}{4!} = 7 \times 6 \times 5 = 210 That’s the same answer! How convenient…

Theorem 10.2.1 The number of rr-permutations, or ordered samples without replacement, of a set of nn elements is P(n,r)=n!(nr)!,P(n,r) = \frac{n!}{(n-r)!}, where “P(n,r)P(n,r)” is read "nn permute rr".

One hopes that if r=nr=n, we recover the number of ways to permute the entire set. Good news:

Theorem 10.2.2 P(n,n)=n!P(n,n)=n!.

Proof. P(n,n)=n!/(nn)!=n!/0!=n!P(n,n) = n!/(n-n)! = n!/0! = n!, because 0!=10!=1. \square
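Python's math module knows these counts already; the sketch below redoes the team-of-three example (math.perm requires Python 3.8 or newer).

```python
import math

print(math.perm(7, 3))                         # 210 teams of three
print(math.factorial(7) // math.factorial(4))  # 210, the same count as 7!/4!
print(math.perm(7, 7), math.factorial(7))      # 5040 5040, since P(n, n) = n!
```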

Takeaways

Chapter 10 problems

11. Unordered samples

Review.

In the last chapter, we counted the number of ordered samples with rr elements we could make from a universe Ω\Omega of nn elements. If the sample allows repetition or replacement of the elements, it is a finite sequence and there are nrn^r of them. If not, they are called rr-permutations and there are P(n,r)=n!/(nr)!P(n,r) = n!/(n-r)! of them.

This chapter will explore the very rich combinatorics that arise from failing to distinguish between orders.

11.1 Counting subsets

This section will answer two questions: how many unordered samples without repetition are there, and what are they? The title of this section answers the latter question, and this viewpoint is going to give us a lot of extra power when counting these structures.

Example. How many ways can three elements be chosen, in any order and with no replacement, from the universe {a,b,c,d,e}\{a,b,c,d,e\}?

We’ve already done a little of the work. We can start by taking the ordered samples, the 33-permutations of the set, of which we already know there are P(5,3)=5!/2!=60P(5,3)=5!/2!=60. We won’t list all 6060 33-permutations, but we will make note of a few: abcacbbacbcacabcab abc \qquad acb \qquad bac \qquad bca \qquad cab \qquad cba All six of these ordered samples correspond to the single unordered sample abcabc. In fact, for any 33-element unordered sample there are 3!=63!=6 ways to order that sample, so in counting the ordered samples we have overcounted the unordered samples by a factor of 66. Thus, there are 5!3!2!=10 \frac{5!}{3!2!} = 10 unordered samples without repetition, and they are tabulated below. abcabdabeacdace abc \qquad abd \qquad abe \qquad acd \qquad ace adebcdbcebdecdeade \qquad bcd \qquad bce \qquad bde \qquad cde

So the program to count unordered samples without replacement is to count the ordered ones, and then divide by the number of possible orders.

What are we counting? Well, we are counting objects with no order and no repetition: that’s a set! The unordered samples without replacement from a universe are exactly the subsets of that universe.

Theorem 11.1.1 Let nn and rr be natural numbers where rnr \le n. A set with nn elements has (nr)=n!r!(nr)!{{n}\choose{r}} = \frac{n!}{r!(n-r)!} subsets of cardinality rr, where the symbol above is a binomial coefficient read "nn choose rr." When typeset inline, we may use C(n,r)C(n,r) instead for the same number.

Example. Enumerate the subsets of {a,b,c,d}\{a,b,c,d\}, separated by cardinality.

rr subsets C(n,r)C(n,r)
00 \varnothing 4!0!4!=1\frac{4!}{0!4!}=1
11 {a}\{a\}, {b}\{b\}, {c}\{c\}, {d}\{d\} 4!1!3!=4\frac{4!}{1!3!}=4
22 {a,b}\{a,b\}, {a,c}\{a,c\}, {a,d}\{a,d\}, {b,c}\{b,c\}, {b,d}\{b,d\}, {c,d}\{c,d\} 4!2!2!=6\frac{4!}{2!2!}=6
33 {a,b,c}\{a,b,c\}, {a,b,d}\{a,b,d\}, {a,c,d}\{a,c,d\}, {b,c,d}\{b,c,d\} 4!3!1!=4\frac{4!}{3!1!}=4
44 {a,b,c,d}\{a,b,c,d\} 4!4!0!=1\frac{4!}{4!0!}=1

There are a couple of worthwhile patterns to point out. First, notice the symmetry. We can see why C(4,3)C(4,3) and C(4,1)C(4,1) would be the same–the denominators are the same because multiplication is commutative. But combinatorially, in terms of what they are counting, why would they be the same? Observe that when we create a three-element subset of {a,b,c,d}\{a,b,c,d\}, we are implicitly choosing a one-element subset to be “left behind.” For instance, when we take the subset {a,c,d}\{a,c,d\} we are not taking {b}\{b\}. So, these subsets are in correspondence with each other.

Second, notice that 1+4+6+4+1=16=241+4+6+4+1=16=2^4, which is the total number of subsets of {a,b,c,d}\{a,b,c,d\}.

11.2 Binomial coefficients

The binomial coefficient (nr)=n!r!(nr)!{{n}\choose{r}} = \frac{n!}{r!(n-r)!} counts the number of subsets with rr elements from a universe Ω\Omega with nn elements. It will be useful to note that C(n,r)C(n,r) also counts the number of bit strings of length nn that have rr 1’s.

To see why, consider a bit string of length nn. If n=7n=7, the string can be written a1a2a3a4a5a6a7a_1a_2a_3a_4a_5a_6a_7. Consider also a set with 77 elements. We may number the elements of that set so that the set is labeled {x1,x2,x3,x4,x5,x6,x7}\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\}. Then a bit string SS with rr 1’s corresponds to a subset EE of rr elements by saying ai=1a_i=1 if and only if xiEx_i \in E. For example, the string 01110100111010 corresponds to the subset {x2,x3,x4,x6}\{x_2,x_3,x_4,x_6\}. So, there are exactly as many bit strings with a certain number of 1’s as there are subsets of a fixed size, and vice-versa.
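The correspondence is easy to watch in Python; here each 4-element subset of a 7-element universe is converted to its bit string, with math.comb confirming the count.

```python
from itertools import combinations
from math import comb

universe = range(1, 8)  # stands in for {x_1, ..., x_7}
subsets = list(combinations(universe, 4))
strings = ["".join("1" if i in s else "0" for i in universe) for s in subsets]
print(strings[:3])               # ['1111000', '1110100', '1110010']
print(len(subsets), comb(7, 4))  # 35 35
```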

Next, we will explore a recurrence relation on the binomial coefficients that will help us compute many of them quickly. This theorem (and the associated visual) are typically named after 17th century French mathematician Blaise Pascal. However, it was known much earlier to Islamic, Chinese, and Indian mathematicians, including Pingala, the Indian mathematician who used binomial coefficients to study poetic meter. Therefore we will not contribute to the western colonization of mathematics by naming these theorems after a man who did not invent them.

Theorem 11.2.1 Let nn and rr be natural numbers so that 1rn11 \le r \le n-1. Then (nr)=(n1r1)+(n1r). {{n}\choose{r}} = {{n-1}\choose{r-1}} + {{n-1}\choose{r}}.

We will prove this theorem with a combinatorial proof, where we prove that two numbers are equal by proving they each count the same set of objects.

Proof. As previously shown, there are C(n,r)C(n,r) bit strings of length nn with exactly rr 1’s. We claim that there are also C(n1,r1)+C(n1,r)C(n-1,r-1)+C(n-1,r) such strings.

Suppose we would like to make a bit string of length nn with exactly rr 1’s. We must make a choice for the first digit of the string.

If we choose a 1 to be the first digit of the string, then the remaining n1n-1 characters must have r1r-1 1’s; there are C(n1,r1)C(n-1,r-1) ways to make this choice.

Likewise, if we choose a 0 to be the first digit of the string, then the remaining n1n-1 characters must have rr 1’s. There are C(n1,r)C(n-1,r) ways to make this choice.

Our first digit must be 0 or 1 and cannot be both, so there are C(n1,r1)+C(n1,r)C(n-1,r-1)+C(n-1,r) total choices by the addition principle of counting. Therefore, C(n,r)=C(n1,r1)+C(n1,r)C(n,r)=C(n-1,r-1)+C(n-1,r). \square

The recurrence relation gives rise to a visual representation of the binomial coefficients that we will simply call the binomial triangle. We will arrange the binomial coefficients in a triangle so that each row corresponds to a value of nn starting at 00, and that within each row rr increases from 00 to nn as we read from left to right. Because C(n,0)=n!/(0!n!)=1C(n,0)=n!/(0!n!)=1 for all nn, the tip and sides of the triangle are all 11's. The recurrence relation says that any other entry in the triangle is the sum of the two that come above it. So the first seven rows of the binomial triangle are displayed below. Check carefully that you understand how to generate the triangle yourself.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
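The recurrence makes the triangle easy to generate row by row; here is a short Python sketch.

```python
def next_row(row):
    """Ends are 1; every interior entry is the sum of the two entries
    above it (Theorem 11.2.1)."""
    return [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]

row = [1]
for _ in range(7):
    print(row)           # prints the first seven rows shown above
    row = next_row(row)
```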

Our final theorem in this section will explain why binomial coefficients are named as they are. First, consider this example.

Example. Calculate the squared binomial (x+y)2(x+y)^2: (x+y)2=x2+2xy+y2. (x+y)^2 = x^2 + 2xy + y^2. “Why?” You exclaim. “We already needed to know how to do this back in Chapter 4.” Okay, then, what about the cubed binomial (x+y)3(x+y)^3? Likely, you will need to use the distributive property and do this one longhand. (x+y)3=(x+y)(x+y)(x+y)(x+y)^3 = (x+y)(x+y)(x+y) Each term will be a collection of three variables, some of which are xx and some of which are yy. (x+y)3=xxx+xxy+xyx+yxx+xyy+yxy+yyx+yyy(x+y)^3 = xxx + xxy + xyx + yxx + xyy + yxy + yyx + yyy Combine like terms: (x+y)3=x3+3x2y+3xy2+y3(x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3

You may have noticed the punchline already. (x+y)1=1x+1y (x+y)^1 = \mathbf{1}x + \mathbf{1}y (x+y)2=1x2+2xy+1y2 (x+y)^2 = \mathbf{1}x^2 + \mathbf{2}xy + \mathbf{1}y^2 (x+y)3=1x3+3x2y+3xy2+1y3 (x+y)^3 = \mathbf{1}x^3 + \mathbf{3}x^2y + \mathbf{3}xy^2 + \mathbf{1}y^3

The coefficients of the powers of (x+y)n(x+y)^n are exactly the binomial coefficients C(n,r)C(n,r). How could this possibly be true? Well, consider what happened when we expanded (x+y)3(x+y)^3. For each factor, we chose either an xx or a yy. We can liken that to having to choose a 1 or a 0 for each digit in a bit string, and so the correspondence follows.

Theorem 11.2.2: The binomial theorem Let nn be a natural number. Then (x+y)n=r=0n(nr)xnryr. (x+y)^n = \sum_{r=0}^n {{n}\choose{r}} x^{n-r} y^r.

Proof. The function (x+y)n(x+y)^n may be expanded: (x+y)n=r=0narxnryr, (x+y)^n = \sum_{r=0}^n a_r x^{n-r} y^r, for some coefficients ara_r. We claim that ar=C(n,r)a_r=C(n,r).

A term of (x+y)n(x+y)^n is the product of nn variables, rr of which are yy's and nrn-r of which are xx's. Before the xx's and yy's are combined with exponential notation we can associate a bit string of length nn with exactly rr 1’s to each term. For instance, if n=8n=8 and r=3r=3, the term xxxyyxyxxxxyyxyx can be identified with the string 0001101000011010. Then when we rewrite the term as x5y3x^5y^3, the coefficient of the term will be the number of such strings, which is exactly C(n,r)C(n,r). \square

Example. Let’s expand and simplify (a3)4(a-3)^4. Using the binomial theorem, we have (a3)4=a4+4a3(3)+6a2(3)2+4a(3)3+(3)4. (a-3)^4 = a^4 + 4a^3(-3) + 6a^2(-3)^2 + 4a(-3)^3 + (-3)^4. Then, (a3)4=a412a3+54a2108a+81.(a-3)^4 = a^4 - 12a^3 + 54a^2 - 108a + 81. Look how much easier that is than multiplying (a3)(a3)(a3)(a3)(a-3)(a-3)(a-3)(a-3) would have been!
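If you want to check an expansion like this mechanically, the binomial theorem is a one-line loop. A minimal Python sketch (the variable names are my own), substituting y = -3 to reproduce the coefficients above:

    from math import comb

    # The r-th term of (x + y)^n is C(n,r) * x^(n-r) * y^r.
    # With y = -3, the coefficients of the powers of a in (a - 3)^4:
    n, y = 4, -3
    print([comb(n, r) * y**r for r in range(n + 1)])
    # [1, -12, 54, -108, 81]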

Example. We hinted back in Section 11.1 that the sum of all the binomial coefficients for fixed nn is 2n2^n. You will have the opportunity to give a combinatorial proof of this fact in the problems. However, we can also use the binomial theorem to prove this and many other interesting facts about binomial coefficients via “proof by plugging in the right numbers”. We claim r=0n(nr)=2n. \sum_{r=0}^n {{n}\choose{r}} = 2^n. Observe that the left-hand side of the equation would be r=0n(nr)xnryr\sum_{r=0}^n {{n}\choose{r}} x^{n-r} y^r where x=y=1x=y=1. So, using the binomial theorem, we have r=0n(nr)1nr1r=(1+1)n, \sum_{r=0}^n {{n}\choose{r}} 1^{n-r} 1^r = (1+1)^n, which implies r=0n(nr)=2n. \sum_{r=0}^n {{n}\choose{r}} = 2^n.

11.3 Multisets

So far we have solved the following counting problems.

sample type of object number of objects
ordered with replacement finite sequences nrn^r
ordered without replacement rr-permutations n!(nr)!\dfrac{n!}{(n-r)!}
unordered without replacement subsets n!r!(nr)!\dfrac{n!}{r!(n-r)!}

We will close the chapter by answering the natural fourth question: how many unordered samples with replacement of size rr may be taken from a universe of nn elements?

Example. You intend to spend three days visiting your friend. They have five things they want to show you. It’s not really important which order you do the activities in or what you do each day, just how many things are done each day. In other words we want to know the distribution of the activities. How many such distributions are there, or how many ways are there to choose how many activities are done each day?

Let’s consider one such allocation of activities. Maybe you’ll do one activity on the first day, three on the second, and one on the third. The visit with your friend can be diagrammed like so: * | *** | * Notice that there are two dividers between the activities. You may have heard the expression “three days two nights,” or something similar, to describe a trip away.

This diagram may be directly ported to a bit string: 0100010, where the 11's represent the dividers and the 00's represent the activities. There must be 55 0’s and 22 1’s; altogether there are 2+5=72+5=7 characters in the bit string. We already know there are C(7,5)=21C(7,5)=21 such bit strings. So there are 2121 distributions of activities over the trip.

More generally, we want to sample from three days five times. In this example, n=3n=3 and r=5r=5. (It is okay if r>nr > n when we allow repetition!) If {1,2,3}\{1,2,3\} is the universe of days on the trip, our sample is the multiset {1,2,2,2,3}\{1,2,2,2,3\}.

Definition 11.3.1 Let Ω\Omega be a set with nn elements. A multiset α\alpha (“alpha”) of cardinality rr is a function α:ΩN\alpha:\Omega \to \mathbf{N} that assigns to each element xΩx \in \Omega a natural number, α(x)\alpha(x), called the multiplicity of xx, subject to the restriction that α(x)=r\sum \alpha(x) = r.

Example. Consider the universe {1,2,3}\{1,2,3\} and the multiset {1,2,2,2,3}\{1,2,2,2,3\}. Then the multiplicity of 11 is α(1)=1\alpha(1)=1, because 11 appears once in the multiset. Likewise α(2)=3\alpha(2)=3 and α(3)=1\alpha(3)=1. Observe that α(1)+α(2)+α(3)=5.\alpha(1) + \alpha(2) + \alpha(3) = 5. Finally, we may instead choose to write the multiset as {11,23,31}\{1^1, 2^3, 3^1\}.

As another example, take the multiset {1,1,1,3,3}\{1,1,1,3,3\}. For this multiset, α(1)=3\alpha(1)=3, α(2)=0\alpha(2)=0, and α(3)=2\alpha(3)=2.

Theorem 11.3.2 Let nn and rr be natural numbers. There are (n1+rr)=(n1+r)!r!(n1)! {{n-1+r}\choose{r}} = \frac{(n-1+r)!}{r!(n-1)!} multisets of cardinality rr from a universe of nn elements.
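For small cases you can check Theorem 11.3.2 by brute force. A short Python sketch using the standard library's itertools, with n=3n=3 and r=5r=5 as in the trip example:

    from itertools import combinations_with_replacement
    from math import comb

    # Enumerate the unordered samples with replacement directly,
    # then compare the count against the formula C(n-1+r, r).
    n, r = 3, 5
    multisets = list(combinations_with_replacement(range(1, n + 1), r))
    print(len(multisets))      # 21
    print(comb(n - 1 + r, r))  # 21
    # The example's sample appears in the list: (1, 2, 2, 2, 3)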

sample type of object number of objects
ordered with replacement finite sequences nrn^r
ordered without replacement rr-permutations n!(nr)!\dfrac{n!}{(n-r)!}
unordered without replacement subsets n!r!(nr)!\dfrac{n!}{r!(n-r)!}
unordered with replacement multisets (n1+r)!r!(n1)!\dfrac{(n-1+r)!}{r!(n-1)!}

Takeaways

Chapter 11 problems

12. Multinomial coefficients

In the preceding chapters we discussed binary samples in the sense that an element of our universe Ω\Omega is either in the sample or isn’t. In this chapter, we are going to count the ways to arrange the elements of Ω\Omega into one of kk states, where kk is a positive integer.

12.1 Multinomials

We will answer the following question. Let [k]={1,2,,k}[k]=\{1, 2, \ldots, k\} for a positive integer kk. Given a multiset α\alpha of cardinality nn from [k][k], how many ways are there to sort the elements of an nn-element set Ω\Omega according to that multiset? Recall that a multiset α\alpha associates to each i[k]i \in [k] a number α(i)\alpha(i), called its multiplicity. We will write α(i)=ri\alpha(i)=r_i to be consistent with the notation from the preceding chapter. We would like to know how many ways there are to choose r1r_1 elements of Ω\Omega to be of the first type, r2r_2 elements to be of the second type, etc., and rkr_k elements of the kk-th type, so that ri=n\sum r_i = n. Furthermore, we do not care about the ordering within each type.

We will relate this idea to the example from the last section of the previous chapter.

Example. You are visiting your friend for three days, and they have five things they want to do while you’re in town. Recall you counted there were 2121 ways (multisets) to distribute the activities across the days. You and your friend decided to do one activity the day you get into town, then two the next day, and two the last day. In other words your chosen multiset is {11,22,32}\{1^1, 2^2, 3^2\}.

Now that you have chosen that distribution, how many ways can you choose which activity is done on each day? There are two equivalent ways to answer this question; one will lead us to the other. Let’s first pick the first day’s activity. There are five activities and we’d like to choose one; so there are C(5,1)C(5,1) options. That leaves four activities from which to choose two; so, C(4,2)C(4,2) ways to pick the second day’s activities. Finally there are C(2,2)C(2,2) ways to pick the last day’s activities. Applying the multiplication principle there are (51)(42)(22) {{5}\choose{1}}{{4}\choose{2}}{{2}\choose{2}} ways to choose which activity is done on each day. Applying the formulas for the binomial coefficients, that’s 5!1!4!4!2!2!2!2!0! \frac{5!}{1!4!}\frac{4!}{2!2!}\frac{2!}{2!0!} ways. Observe that we can cancel two of the numerators, and that 0!=10!=1: 5!1!2!2!=30. \frac{5!}{1!2!2!} = 30.

This is the second way to solve the problem. We may imagine the five activities are a small deck of cards, and shuffle the deck. The top card gives the activity to do on the first day. The next two cards give the second day’s activities. However, we have double-counted by 2!2! since there are 2!2! ways to arrange those two cards, and we don’t care which is first. Likewise with the last two cards, which give the third day’s activities.

Theorem 12.1.1 Take a multiset from [k][k] where rir_i is the multiplicity of i[k]i \in [k], and the sum of the rir_i is nn. Consider a function f:Ω[k]f:\Omega \to [k] such that the number of xΩx \in \Omega such that f(x)=if(x) = i is rir_i. The number of such functions is given by the multinomial coefficient (nr1,r2,,rk)=n!r1!r2!rk! {{n}\choose{r_1, r_2, \ldots, r_k}} = \frac{n!}{r_1!r_2!\cdots r_k!}

Example. How many distinguishable arrangements are there of the word ARRANGEMENT?

“Distinguishable” means that we can tell the difference in the arrangements. For example, there are two R’s in the word. If we call them R1 and R2, there is no difference between A(R1)(R2)ANGEMENT and A(R2)(R1)ANGEMENT.

There are 77 different letters in the word ARRANGEMENT. If we number the 1111 characters in the word, then we are trying to count the number of functions f:{1,2,3,,11}{A,R,N,G,E,M,T} f: \{1, 2, 3, \ldots, 11\} \to \{A, R, N, G, E, M, T\} such that rxr_x characters are the letter xx, rA=rR=rE=rN=2, r_A = r_R = r_E = r_N = 2, and rG=rM=rT=1. r_G = r_M = r_T = 1. By the preceding theorem there are (112,2,2,1,2,1,1)=11!2!2!2!1!2!1!1!=2,494,800 {{11}\choose{2,2,2,1,2,1,1}} = \frac{11!}{2!2!2!1!2!1!1!} = 2,494,800 total distinguishable arrangements of the word.

That explanation was perhaps a bit painstaking to make clear the viewpoint of counting functions based on a multiset, but there is a cleaner path to the answer. There are 11!11! ways to arrange all the letters; however, we have double-counted by the 2!2! arrangements of the A’s, R’s, N’s, and E’s, and so we divide by 2!2! four times.
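That cleaner path is also the easy one to program. A minimal Python sketch (the function name is my own) that tallies letter multiplicities and divides out the repeated arrangements:

    from collections import Counter
    from math import factorial

    # n! divided by the factorial of each letter's multiplicity.
    def arrangements(word):
        n = factorial(len(word))
        for count in Counter(word).values():
            n //= factorial(count)
        return n

    print(arrangements("ARRANGEMENT"))  # 2494800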

12.2 Multinomial theorems

Observe that when k=2k=2, a multinomial coefficient is exactly a binomial coefficient with r1=rr_1=r and r2=nrr_2=n-r. When k=3k=3 we call the numbers trinomial coefficients and thereafter typically say “multinomial coefficients”.

Accordingly, theorems about binomial coefficients generalize to multinomial coefficients. For example, the trinomial coefficients exhibit a similar recurrence relation.

Theorem 12.2.1 Let nn be a positive integer and r1,r2,r3r_1, r_2, r_3 be positive integers summing to nn. Then, (nr1,r2,r3)=(n1r11,r2,r3)+(n1r1,r21,r3)+(n1r1,r2,r31).{{n}\choose{r_1, r_2, r_3}} = {{n-1}\choose{r_1-1,r_2,r_3}} + {{n-1}\choose{r_1,r_2-1,r_3}} + {{n-1}\choose{r_1,r_2,r_3-1}}.

There is, accordingly, a multinomial theorem:

Theorem 12.2.2 Let nn and kk be positive integers. Then (x1+x2++xk)n=(nr1,r2,,rk)x1r1x2r2xkrk, (x_1 + x_2 + \cdots + x_k)^n = \sum {{n}\choose{r_1, r_2, \ldots, r_k}} x_1^{r_1} x_2^{r_2} \cdots x_k^{r_k}, where the sum runs over all tuples (r_1, r_2, \ldots, r_k) of natural numbers with r_1 + r_2 + \cdots + r_k = n.

This theorem tells us that when we expand the polynomial (x1+x2++xk)n(x_1 + x_2 + \cdots + x_k)^n, we get one term for each multiset of cardinality nn from [k][k], and the coefficient of each term is the corresponding multinomial coefficient.

Example. There are C(15,10)=3,003C(15,10) = 3,003 terms of the polynomial (a+b+c+d+e+f)10(a+b+c+d+e+f)^{10}. Obviously we will not list them all. The a5bc2efa^5bc^2ef term has the coefficient (105,1,2,0,1,1)=10!5!1!2!0!1!1!=15,120. {{10}\choose{5,1,2,0,1,1}} = \frac{10!}{5!1!2!0!1!1!} = 15,120.

Takeaways

Chapter 12 problems

13. Probability

Review.

The time has come to end our short tour through combinatorics with a conversation about probability, the study of random phenomena. There is a lot more to say about both topics, but that’s for other books to do.

13.1 Introduction to probability

Speaking very loosely, an event’s probability is the “chance” it will occur. There are two kinds of probabilities: empirical probability, the proportion of times an event occurred over a number of trials; and theoretical probability, the proportion of times an event should occur over the long run. We assign theoretical probabilities according to mathematical principles so that the long-run empirical probability tends towards the theoretical probability. It will not surprise you to learn, then, that probability is a function of sets.

Definition 13.1.1 Let EE be an event contained in the universe Ω\Omega (in probability-theoretic contexts the universe is called the sample space). If every outcome of EE is equally likely, then the probability of EE is P(E)=EΩ.P(E) = \frac{|E|}{|\Omega|}.

To brush some interesting technical details away, we are assuming that EE and Ω\Omega are finite sets and doing discrete probability with something called the “counting measure.” There are other ways to measure sets, and those measures have associated probabilities as well; but this is beyond our scope.

Example. Two dice are rolled. They are distinguishable; you may suppose one is red and the other is blue, or that the person rolling kept track of which was rolled first. The sample space consists of 3636 possible rolls; the sum of the two dice for each roll is tabulated below.

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

Let EE be the event that the sum is prime and let FF be the event that the first roll was even. (You may choose the row or column to be the first roll; it won’t matter in this case.) Then, P(E)=1536P(E) = \frac{15}{36} and P(F)=1836.P(F) = \frac{18}{36}. Check these counts yourself.
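If you would rather let a machine do the counting, the sample space is small enough to enumerate outright. A short Python sketch (the event encodings are my own):

    from fractions import Fraction

    # All 36 ordered rolls; E = "sum is prime", F = "first roll is even".
    primes = {2, 3, 5, 7, 11}
    omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
    E = [roll for roll in omega if sum(roll) in primes]
    F = [roll for roll in omega if roll[0] % 2 == 0]
    print(Fraction(len(E), len(omega)))  # 5/12, i.e. 15/36
    print(Fraction(len(F), len(omega)))  # 1/2,  i.e. 18/36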

Sometimes it is desirable to know how one event affects another. For example: does the first roll being even make it more or less likely that the sum will be prime? Or is there no effect? Conditional probability answers this question for us.

Definition 13.1.2 Let EE and FF be events. The conditional probability of EE given that FF is known to occur is P(EF)=EFF. P(E|F) = \frac{|E \cap F|}{|F|}.

A very good interpretation of the above definition is that the original sample space is replaced with, or “localized” to, FF.

Example. Suppose the rectangle below is a dartboard. We have two players: a novice who throws the dart perfectly randomly, and someone with a little more experience who will always land the dart within quadrilateral FF. Who is more likely to land in the target EE? The more experienced player will land in EE more often on average. In probabilistic notation, P(EF)>P(E)P(E|F) > P(E).

The set E is visualized as a small circle in the middle of a large rectangle, enclosed in a smaller quadrilateral F.


Example. Revisiting our dice question, does the first roll being even make it more or less likely the sum will be prime?

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

There are 1818 outcomes where the first roll is even. Counting, we see 77 outcomes where the sum is prime and the first roll is even. (Check this.) By definition, P(EF)=718=1436, P(E|F) = \frac{7}{18} = \frac{14}{36}, which is less than P(E)=15/36.P(E) = 15/36. So, prime sums are less likely when the first roll is even (so, more likely when the first roll is odd).

13.2 Algebraic rules

The probability that at least one of EE or FF occurs is denoted P(EF)P(E \cup F), and the probability that both events occur is denoted P(EF)P(E \cap F).

Theorem 13.2.1 If EE and FF are events, then P(EF)=P(E)P(FE)=P(F)P(EF)P(E \cap F) = P(E)P(F|E) = P(F)P(E|F) and P(EF)=P(E)+P(F)P(EF).P(E \cup F) = P(E) + P(F) - P(E \cap F).

Proof. Fundamental counting principles, together with the definition of conditional probability: counting EFE \cap F amounts to first counting FF and then counting the part of EE inside FF. \square

Definition 13.2.2 If P(EF)=0P(E \cap F) = 0, the events EE and FF are disjoint.

Example. The events “the first roll is odd” and “the first roll is even” are disjoint since they cannot both happen at the same time.

Theorem 13.2.3 The following are equivalent: P(EF)=P(E)P(F)P(E \cap F) = P(E)P(F); P(EF)=P(E)P(E|F) = P(E); and P(FE)=P(F)P(F|E) = P(F). When any (and hence all) of these hold, the events EE and FF are said to be independent.

Example. Any event involving just the first die and any event involving just the second die will be independent.

Example. Again with the dice:

+ 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

The probability that a sum is prime and a first roll is even is, by counting, P(EF)=736.P(E \cap F) = \frac{7}{36}. We may also use the multiplication rule to calculate this probability: P(EF)=P(F)P(EF)=1836718=736. P(E \cap F) = P(F)P(E|F) = \frac{18}{36}\frac{7}{18} = \frac{7}{36}.

The probability that a sum is prime or a first roll is even is, again by counting first, P(EF)=2636.P(E \cup F) = \frac{26}{36}. We may also use the addition rule: P(EF)=P(E)+P(F)P(EF)=1536+1836736=2636. P(E \cup F) = P(E) + P(F) - P(E \cap F) = \frac{15}{36} + \frac{18}{36} - \frac{7}{36} = \frac{26}{36}.
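Both rules can be confirmed with the same enumeration used in Section 13.1. A brief sketch, reusing the omega, E, and F lists from that earlier snippet:

    from fractions import Fraction

    both = [roll for roll in E if roll in F]                      # E ∩ F
    either = [roll for roll in omega if roll in E or roll in F]   # E ∪ F
    n = len(omega)
    # Multiplication rule: P(E ∩ F) = P(F) P(E|F)
    print(Fraction(len(both), n) ==
          Fraction(len(F), n) * Fraction(len(both), len(F)))      # True
    # Addition rule: P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
    print(Fraction(len(either), n) ==
          Fraction(len(E) + len(F) - len(both), n))               # True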

Takeaways

Chapter 13 problems

14. Boolean matrices

Review.

Relations allow us to mathematically encode various relationships between sets. There are many such relationships we may be interested in: graphs, functions, equivalences, and orderings. But before we tell that story, we have to tell this one: the story of Boolean matrices.

14.1 Matrices and Boolean algebras

Did you notice at the beginning of the book that logic and set theory were incredibly similar to one another? They are indeed different implementations of the same idea, Boolean algebra. As you read the following technical details, just keep in mind that a Boolean algebra is formalizing the relationships between logical connectives and set operations.

Definition 14.1.1 A bit is an element of the set {0,1}\{0,1\} together with commutative binary operations join \vee and meet \wedge and unary operation complement ¬\neg such that ab={0a=b=01otherwise, a \vee b = \begin{cases} 0 & a=b=0 \\ 1 & \text{otherwise} \end{cases}, ab={1a=b=10otherwise, a \wedge b = \begin{cases} 1 & a=b=1 \\ 0 & \text{otherwise} \end{cases}, and ¬a={1a=00a=1. \neg a = \begin{cases} 1 & a=0 \\ 0 & a=1 \end{cases}.

Definition 14.1.2 A Boolean algebra is a set A\mathcal{A} together with commutative binary operations \vee and \wedge such that a0=a,a1=a, a \vee 0 = a, \qquad a \wedge 1 = a, a(bc)=(ab)(ac),a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c), and a(bc)=(ab)(ac), a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c), and a unary operation ¬\neg such that a¬a=1,a¬a=0. a \vee \neg a = 1, \qquad a \wedge \neg a = 0.

Examples. The set {0,1}\{0,1\} is the smallest non-degenerate Boolean algebra. Here are two other Boolean algebras: the collection of statements, where \vee and \wedge are disjunction and conjunction, ¬\neg is negation, 11 is any tautology, and 00 is any contradiction; and the power set of a universe Ω\Omega, where \vee and \wedge are union and intersection, ¬\neg is set complement, 11 is Ω\Omega itself, and 00 is \varnothing. You will be asked to verify that these are Boolean algebras in the exercises.

Loosely speaking, a matrix is a rectangular array of data. This data can be anything with an algebraic structure, including bits and numbers. While matrices serve as a convenient way to store information, they also have applications in solving systems of linear equations and in encoding relations and functions. It is this last usage in which we are most interested. Before you read the definition below, remember that [m]={1,2,,m}[m]=\{1, 2, \ldots, m\} when mm is a positive integer.

Definition 14.1.3 An m×nm \times n matrix is a function A:[m]×[n]XA:[m]\times[n] \to X where XX is a set with certain algebraic structure (for us, bits or numbers) such that the entry A(i,j)A(i,j) in the ii-th row and jj-th column is written Ai,jA_{i,j}. To specify the elements that belong to the matrix we sometimes write A=(Ai,j)A=(A_{i,j}). We say AA has mm rows and nn columns and can represent the matrix visually as A=(A1,1A1,2A1,nA2,1A2,2A2,nAm,1Am,2Am,n). A = \left( \begin{array}{cccc} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ & & \ddots & \\ A_{m,1} & A_{m,2} & \cdots & A_{m,n} \end{array} \right).

More than you wanted to know: The desired properties of the set XX are the presence of an operation playing the role of addition and an operation playing the role of multiplication, such that subtraction is possible and otherwise these operations satisfy all the usual properties that addition and multiplication do. Such a structure is called a ring. The integers Z\mathbf{Z} are the classic example as the smallest ring containing all the elements of N\mathbf{N}, and the Boolean algebra {0,1}\{0,1\} is another. If all the non-zero elements of the ring can be inverted – i.e., you can divide as well – the ring is called a field and then matrix division may be possible as well.

Definition 14.1.4 A Boolean matrix is a matrix whose entries are bits.

Examples. Consider the Boolean matrix A=(101011). A = \left( \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right). This is a matrix with 22 rows and 33 columns, so we would say it is a 2×32 \times 3 matrix. The entry A1,3=1A_{1,3}=1, the entry A2,1=0A_{2,1}=0, and the entry A3,1A_{3,1} does not exist for this matrix.

What follows are some worthwhile definitions for studying matrices and relations.

Definition 14.1.5 An m×nm \times n matrix is square if m=nm=n.

Definition 14.1.6 Let AA be a square n×nn \times n matrix. The entries Ai,jA_{i,j} where i=ji=j are called diagonal entries and entries that are not diagonal are off-diagonal entries.

Definition 14.1.7 A square matrix whose only nonzero entries are diagonal is a diagonal matrix.

Definition 14.1.8 The identity matrix InI_n of dimension nn is the n×nn\times n diagonal matrix whose diagonal entries are all 11.

Examples. The matrix A=(101011100) A = \left( \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0\end{array} \right) is square but not diagonal, the matrix B=(100010000) B =\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0\end{array} \right) is diagonal but not I3I_3, and I3=(100010001).I_3=\left( \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right).

Definition 14.1.9 Let AA and BB both be m×nm \times n Boolean matrices. We say ABA \le B if Ai,jBi,jA_{i,j} \le B_{i,j} (as numbers) for all (i,j)[m]×[n](i,j) \in [m]\times[n]. Equivalently, we can say that ABA \le B if BB never has a 00 in a position where AA has a 11.

Example. For instance, (100110)(101111) \left( \begin{array}{cc} 1 & 0\\ 0 & 1 \\ 1 & 0\end{array} \right) \le \left( \begin{array}{cc} 1 & 0\\ 1 & 1 \\ 1 & 1\end{array} \right) but (100110)≰(010111).\left( \begin{array}{cc} 1 & 0\\ 0 & 1 \\ 1 & 0\end{array} \right) \not\le \left( \begin{array}{cc} 0 & 1\\ 0 & 1 \\ 1 & 1\end{array} \right).

14.2 Basic matrix operations

In this section we will explore basic operations involving matrices: meet, join, complement, and transposition. Multiplication requires enough work to necessitate its own section.

Definition 14.2.1 Let AA and BB be m×nm\times n matrices. Their meet (or join) is the m×nm\times n matrix AB=(Ai,jBi,j)A \wedge B = (A_{i,j} \wedge B_{i,j}) (or AB=(Ai,jBi,j)A \vee B=(A_{i,j} \vee B_{i,j})) whose i,ji,j-th entry is the meet (join) of the i,ji,j-th entries of AA and BB.

Examples. (010101)(001011)=(000001) \left (\begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right) \wedge \left (\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right) = \left (\begin{array}{ccc} 0 & 0 & 0 \\ 0& 0 & 1 \end{array} \right)

(010101)(001011)=(011111) \left (\begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right) \vee \left (\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right) = \left (\begin{array}{ccc} 0 & 1 & 1 \\ 1 & 1 & 1 \end{array} \right)

Definition 14.2.2 Let A=(Ai,j)A=(A_{i,j}) be an m×nm\times n Boolean matrix. Its complement is the m×nm\times n Boolean matrix ¬A=(¬Ai,j)\neg A = (\neg A_{i,j}).

Example.

¬(001011)=(110100) \neg \left (\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right) = \left (\begin{array}{ccc} 1 & 1 & 0 \\ 1 & 0 & 0 \end{array} \right)

The last operation we need to look at in this section is transposition, where a matrix’s rows and columns are interchanged.

Definition 14.2.3 Let A=(Ai,j)A=(A_{i,j}) be an m×nm\times n Boolean matrix. Its transpose is the n×mn \times m Boolean matrix AT=(Aj,i)A^T = (A_{j,i}).

Example.
(111001)T=(110101) \left (\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{array} \right)^T = \left (\begin{array}{ccc} 1&1&0 \\ 1&0&1 \end{array} \right)
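All four operations in this section are elementwise (or, for transposition, index-swapping) and fit in a few lines of code. A minimal Python sketch using lists of rows (the representation and names are my own):

    # Boolean matrices as lists of rows of 0/1 entries.
    def meet(A, B):
        return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def join(A, B):
        return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def complement(A):
        return [[1 - a for a in row] for row in A]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    A = [[0, 1, 0], [1, 0, 1]]
    B = [[0, 0, 1], [0, 1, 1]]
    print(meet(A, B))     # [[0, 0, 0], [0, 0, 1]]
    print(join(A, B))     # [[0, 1, 1], [1, 1, 1]]
    print(complement(B))  # [[1, 1, 0], [1, 0, 0]]
    print(transpose(A))   # [[0, 1], [1, 0], [0, 1]]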

14.3 Multiplication of Boolean matrices

Multiplication of matrices in general, and Boolean matrices in particular, is not as straightforward as you would hope at first. As we will see in the next chapter, each Boolean matrix represents a relation, and multiplication of matrices should correspond to a combination of relations.

Real matrices, i.e. matrices with values in R\mathbf{R}, correspond to linear transformations, functions that rotate, stretch, or shear space. Multiplication of real matrices corresponds to composition of these functions.

Definition 14.3.1 Let A=(A1A2An)A = (\begin{array}{cccc}A_1 & A_2 & \ldots & A_n\end{array}) be a 1×n1\times n Boolean row matrix and B=(B1B2Bn)TB = (\begin{array}{cccc}B_1 & B_2 & \ldots & B_n\end{array})^T be an n×1n\times 1 Boolean column matrix. Their Boolean dot product is the bit AB=(A1B1)(A2B2)(AnBn). A \odot B = (A_1 \wedge B_1) \vee (A_2 \wedge B_2) \vee \cdots \vee (A_n \wedge B_n). We may also write this in “big vee” notation: AB=i=1n(AiBi). A \odot B = \bigvee_{i=1}^n (A_i \wedge B_i).

Example. Let A=(0101) A=\left(\begin{array}{cccc} 0 & 1 & 0 & 1\end{array}\right) and B=(1100).B=\left( \begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array} \right). Then,
AB=(01)(11)(00)(10).A \odot B = (0 \wedge 1) \vee (1 \wedge 1) \vee (0 \wedge 0) \vee (1 \wedge 0). Evaluating each of the conjunctions, we have AB=0100.A \odot B = 0 \vee 1 \vee 0 \vee 0. Since ABA \odot B is a join with at least one 11, AB=1.A \odot B = 1. In fact, ABA\odot B is only 11 because there is an entry (the second one) where both AiA_i and BiB_i are 1. If there are no such entries, the Boolean dot product will always be 0. The following theorem records this observation and is highly useful in quickly calculating Boolean products.

Theorem 14.3.2 Let A=(Ai)A=(A_i) be a 1×n1\times n Boolean matrix and B=(Bi)B=(B_i) be an n×1n\times 1 Boolean matrix. Their Boolean dot product ABA \odot B is 11 if and only if there exists an integer kk such that AkA_k and BkB_k are both 11.

Reflect for a moment that this definition does not drop out of thin air, as most don’t. Suppose we have predicates A(i)A(i) and B(i)B(i) whose domains are both [n][n]. The Boolean dot product ABA\odot B is 1 if and only if there exists an object in the domain for which A(i)B(i)A(i) \wedge B(i) is true. As an example, let A(i)A(i) be the statement "ii is even" and B(i)B(i) be the statement “i2i \le 2”, both over the domain [4][4]. Then the above example answers the question, “Is there an element of [4][4] that is both even and less than or equal to 22?”

Definition 14.3.3 Let A=(Ai,j)A=(A_{i,j}) be an m×km\times k Boolean matrix and B=(Bi,j)B=(B_{i,j}) be a k×nk \times n Boolean matrix. Their Boolean product is the m×nm\times n matrix AB=(r=1k(Ai,rBr,j));A\odot B = \left( \bigvee_{r=1}^k (A_{i,r} \wedge B_{r,j})\right); that is, the i,ji,j-th entry of ABA \odot B is the Boolean dot product of the ii-th row of AA with the jj-th column of BB.

Note that AA must have exactly as many columns as BB has rows, else the Boolean dot products will be undefined.

Example. Let A=(010110) A = \left(\begin{array}{cc} 0 & 1 \\ 0 & 1\\ 1 & 0 \end{array}\right) and B=(110010). B = \left(\begin{array}{ccc} 1 & 1 & 0 \\ 0 & 1 & 0 \end{array}\right). Their Boolean product ABA \odot B is the matrix whose entries are the Boolean dot products of the appropriate rows of AA with the corresponding columns of BB.

For example, the entry in the first row and first column is (01)(10)=00=0, (0 \wedge 1) \vee (1 \wedge 0) = 0 \vee 0 = 0, as evidenced by the fact that the first row (0,1)(0,1) of AA and the first column (1,0)T(1,0)^T of BB do not have any 11's in the same spot. Meanwhile, the 1,21,2-th entry of the product is 11, because the first row and the second column both have a 11 in the second position. Here is the full product: AB=(010010110) A \odot B = \left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{array} \right)

As an exercise, compute BAB \odot A and verify that it is not equal to ABA \odot B.
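The Boolean product is also short to implement, and having it in code makes the exercise above painless to check. A minimal Python sketch on the same list-of-rows representation (the function name is my own):

    # The (i,j) entry of the Boolean product is the Boolean dot
    # product of row i of A with column j of B.
    def bool_product(A, B):
        k = len(B)  # A must have k columns and B must have k rows
        return [[int(any(A[i][r] and B[r][j] for r in range(k)))
                 for j in range(len(B[0]))]
                for i in range(len(A))]

    A = [[0, 1], [0, 1], [1, 0]]
    B = [[1, 1, 0], [0, 1, 0]]
    print(bool_product(A, B))  # [[0, 1, 0], [0, 1, 0], [1, 1, 0]]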

From the logical perspective, a Boolean matrix is a family of statements. The Boolean product then gives a matrix whose entries answer the question, “Is there a subject for which the statement represented by this row and the statement represented by that column are both true simultaneously?”

Takeaways

Chapter 14 problems

15. Relations

Review.

As previously stated, a relation is a way to mathematically encode the idea of a relationship between objects. What does that mean? When we “encode” an idea mathematically, we are defining a mathematical object that captures the important data of that idea. Because the thing is now a mathematical object, we can do mathematics to it. If the thing has become an expression involving real numbers, it is now subject to algebra. If the thing has become a set, it is now subject to set operations, logic, proof, combinatorics, and so on. One may imagine importing a package into a programming language so that one has new objects to “play with” in that language.

In this chapter we will define relations, discuss various ways to represent them, and discuss some properties of interesting relations.

15.1 Basic notions and representation

Remember that the product X×YX \times Y of two sets is the collection of all pairs (x,y)(x,y) whose first component is from XX and whose second component is from YY. Therefore, this is a good place to start thinking about connecting two sets.

Definition 15.1.1 Let XX and YY be sets. A relation RR from XX to YY is any subset RX×YR \subseteq X \times Y. We say that XX is the domain and YY is the codomain of RR. When (x,y)R(x,y) \in R, we say "xx is related to yy" and may write xRyxRy.

Example. Let X={x,y,z}X = \{x,y,z\} and Y={1,2}Y=\{1,2\}. One of the many relations from XX to YY is the set {(x,1),(x,2),(y,2)}. \{ (x,1), (x,2), (y,2) \}. We would say (for instance) that xx is related to 11, or that zz is related to nothing.

Some relations have their own symbols. We will be a little sloppy and sometimes identify the statement xRyxRy with the relation itself.

Example. The relation Δ={(x,x)    xX}\Delta=\{(x,x) \;|\; x \in X\} is the relation of all pairs (x,y)X×X(x,y) \in X \times X where x=yx=y. (When the domain and codomain of a relation are the same we say it is a relation on XX.) This relation is called the diagonal relation on the set XX. (Exercise: Figure out why.) It may be denoted with the symbol Δ\Delta (capital delta), or it may be denoted with the statement characterizing its elements. In other words, we might say “Consider the relation x=yx=y on a set XX.”

You have seen functions in algebra, or perhaps calculus, and we have used them in various definitions. Recall that a function f:XYf:X \to Y is an assignment that pairs every element xXx \in X with a unique element f(x)Yf(x) \in Y. A function is a relation, subject to the extra criterion that every element of XX is paired to exactly one element of YY, whereas a relation can have elements of XX related to many elements of YY, or to no elements of YY.

If XX and YY are finite sets there are two convenient ways to represent a relation between XX and YY. Without loss of generality, label the elements of XX as x1,x2,,xmx_1, x_2, \ldots, x_m and those of YY as y1,y2,,yny_1, y_2, \ldots, y_n. Consider that each pair (xi,yj)(x_i, y_j) induces the statement "xix_i is related to yjy_j". Remembering that Boolean matrices encode families of statements, we may define the m×nm\times n matrix MR=(mi,j) M_R = (m_{i,j}) where mi,j=1m_{i,j} = 1 if and only if xix_i is related to yjy_j.

Example. Earlier we had the relation {(x,1),(x,2),(y,2)} \{ (x,1), (x,2), (y,2) \} from X={x,y,z}X = \{x,y,z\} to Y={1,2}Y=\{1,2\}. There are many different matrices representing this relation depending on how the sets XX and YY are ordered. When there is an “obvious” ordering, say numerical or alphabetical, we will use that one. Under these orderings the matrix for the relation is (110100) \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{array} \right) .
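Producing MRM_R from a relation is mechanical once the orderings are fixed. A short Python sketch using the alphabetical and numerical orderings above:

    # Entry (i, j) of the matrix is 1 exactly when (X[i], Y[j]) is in R.
    X = ['x', 'y', 'z']
    Y = [1, 2]
    R = {('x', 1), ('x', 2), ('y', 2)}
    M = [[1 if (x, y) in R else 0 for y in Y] for x in X]
    print(M)  # [[1, 1], [0, 1], [0, 0]]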

Consider a relation RR on a set XX (meaning domain and codomain are the same set). We can also represent RR visually with something called a directed graph or digraph for short. From this perspective, the elements of XX are called the vertices of the graph and the pairs (x,y)(x,y) in the relation are called edges or arrows. An arrow is drawn from one vertex to another if those vertices are related.

Example. Let X={a,b,c,d}X = \{a,b,c,d\}, and consider the relation R={(a,b),(a,d),(b,b),(b,c),(d,a)}. R = \{ (a,b), (a,d), (b,b), (b,c), (d,a) \}. The directed graph for this relation is shown below.

The directed graph for the relation is shown.

Figure. The directed graph for the relation is shown.

It is possible to draw many graphs for one relation, so why can we say “the directed graph”? Observe that any two graphs drawn for the relation will have the same arrows between the same vertices. Two graphs for which there is a 1-1 correspondence that preserves the edges are isomorphic, and isomorphic graphs are functionally the same. Therefore, we say “the” directed graph for a relation.

15.2 Properties of relations

As we study relations in the remaining chapters of this book there are a few properties that will make for valuable terminology. For the remainder of the chapter, every relation has the same domain and codomain.

Definition 15.2.1 A relation RR on a set XX is reflexive if every element is related to itself, in other words (x,x)R(x,x) \in R for all xXx \in X. The pair (x,x)(x,x) is a diagonal pair.

Definition 15.2.2 A relation RR on a set XX is antireflexive if no element is related to itself, in other words (x,x)∉R(x,x) \not\in R for all xXx \in X.

So a relation is reflexive if every element is related to itself, and antireflexive if none are. A relation can be neither, if some elements are related to themselves and some aren’t! (One relation is both, but that’s an exercise.)

In all of this section’s examples you should draw a digraph for each relation yourself and make sure you agree with which properties each relation has.

Examples. Let X={a,b,c,d}X = \{a,b,c,d\}. Consider the relations R1={(a,b),(a,d),(b,c),(b,d),(d,b),(d,c)}, R_1 = \{ (a,b), (a,d), (b,c), (b,d), (d,b), (d,c) \}, R2={(a,a),(a,d),(b,b),(b,d),(c,c),(c,d),(d,d)}, R_2 = \{ (a,a), (a,d), (b,b), (b,d), (c,c), (c,d), (d,d) \}, R3={(a,b),(b,b),(b,c),(c,d),(d,a),(d,d)},R_3 = \{(a,b), (b,b), (b,c),(c,d),(d,a), (d,d)\}, as well as the full relation X×XX \times X and the empty relation \varnothing.

The relations R2R_2 and X×XX\times X are reflexive as they contain all pairs of the form (x,x)(x,x). The relations R1R_1 and \varnothing are likewise antireflexive. The relation R3R_3 is neither reflexive nor antireflexive, as it contains some of these “diagonal pairs” but not all.

Definition 15.2.3 A relation RR on a set XX is symmetric if for all x,yXx, y \in X where (x,y)R(x,y) \in R, then (y,x)R(y,x) \in R. Such a pair (x,y)(x,y) is said to be invertible.

Definition 15.2.4 A relation RR on a set XX is antisymmetric if for all x,yXx, y \in X where (x,y)R(x,y) \in R, then (y,x)∉R(y,x) \not\in R unless x=yx=y. Equivalently the only invertible pairs in an antisymmetric relation are the diagonal pairs.

It is crucial to understand that symmetry and antisymmetry are two ends of a spectrum, measuring how many of the pairs in the relation are invertible. If they all are, the relation is symmetric. We would like to begin the next sentence with “if none are” but this is actually impossible, because if (x,x)R(x,x) \in R then it is automatically invertible. In other words, the diagonal pairs are the only pairs that are unavoidably invertible. So, an antisymmetric relation is one where the only invertible pairs are on the diagonal. Finally, a relation may be neither or both.

Examples. As before let X={a,b,c,d}X = \{a,b,c,d\} and consider the relations R1={(a,b),(a,d),(b,c),(b,d),(d,b),(d,c)}, R_1 = \{ (a,b), (a,d), (b,c), (b,d), (d,b), (d,c) \}, R2={(a,a),(a,d),(b,b),(b,d),(c,c),(c,d),(d,d)}, R_2 = \{ (a,a), (a,d), (b,b), (b,d), (c,c), (c,d), (d,d) \}, R3={(a,b),(b,b),(b,c),(c,d),(d,a),(d,d)},R_3 = \{(a,b), (b,b), (b,c),(c,d),(d,a), (d,d)\}, as well as the full relation X×XX \times X and the empty relation \varnothing.

The relation X×XX \times X is symmetric, because it contains all the possible pairs; so every (x,y)(x,y) will find its (y,x)(y,x). It is less obvious but equally true that \varnothing is symmetric. The definition is that every pair must be invertible; if there are no pairs then the definition is satisfied automatically.

The relations R1R_1, R2R_2, and R3R_3 are each not symmetric because they contain a pair that is not invertible. The relation R1R_1 has (a,d)(a,d) but no (d,a)(d,a); R2R_2 has (b,d)(b,d) but no (d,b)(d,b); and R3R_3 has (a,b)(a,b) but no (b,a)(b,a), for example.

Of these, R1R_1 is not antisymmetric, because it has “some” symmetry (the pairs (b,d)(b,d) and (d,b)(d,b)). The relations R2R_2 and R3R_3 are antisymmetric. The empty set \varnothing manages to be both symmetric and antisymmetric because just as there are no pairs to invert, there are also no pairs to not invert. Vacuousness!

The final property to discuss is transitivity. You have heard this word before, both without this book (transitivity of equality is used to solve equations in algebra) and within (inclusion, implication, and divisibility are all transitive).

Definition 15.2.5 A relation is transitive if for all x,y,zXx, y, z \in X, whenever (x,y)R(x,y) \in R and (y,z)R(y,z) \in R we also have (x,z)R(x,z)\in R.

There are two implications of this definition. The first is that any local symmetry within a transitive relation – that is, the presence of an invertible pair – must involve a loop. To see this, suppose we have (x,y)R(x,y) \in R and (y,x)R(y,x) \in R in our relation. Replacing zz with xx in the above definition forces us to include the loop (x,x)(x,x) in our relation.

The next implication involves a new definition.

Definition 15.2.6 Consider a relation RR on a set XX. A walk of length nn is a sequence (x1,x2,,xn,xn+1)(x_1, x_2, \ldots, x_n, x_{n+1}) of elements of XX, not necessarily distinct, where (xi,xi+1)R(x_i, x_{i+1}) \in R for all 1in1 \le i \le n.

(The length of the walk is a measure of the number of arrows involved, not the number of elements in the sequence.)

The second implication is that in a transitive relation, if (xi)(x_i) is a walk beginning at x1x_1 and ending at xn+1x_{n+1}, then x1x_1 must be related to xn+1x_{n+1}.

Examples. One more time: Let X={a,b,c,d}X = \{a,b,c,d\} and consider the relations R1={(a,b),(a,d),(b,c),(b,d),(d,b),(d,c)}, R_1 = \{ (a,b), (a,d), (b,c), (b,d), (d,b), (d,c) \}, R2={(a,a),(a,d),(b,b),(b,d),(c,c),(c,d),(d,d)}, R_2 = \{ (a,a), (a,d), (b,b), (b,d), (c,c), (c,d), (d,d) \}, R3={(a,b),(b,b),(b,c),(c,d),(d,a),(d,d)},R_3 = \{(a,b), (b,b), (b,c),(c,d),(d,a), (d,d)\}, as well as the full relation X×XX \times X and the empty relation \varnothing.

The relation R1R_1 is not transitive: it is missing, for example, the edge (a,c)(a,c) (forced by (a,b)(a,b) and (b,c)(b,c)) and the loop (b,b)(b,b) (forced by (b,d)(b,d) and (d,b)(d,b)).

However, R2R_2 is transitive. There are no pairs (x,y)(x,y) and (y,z)(y,z) missing their (x,z)(x,z). A common mistake is to ask, “Shouldn’t we require (a,b)(a,b) for transitivity?” but neither aa nor bb is actually related to the other. Both are related to dd, but the direction of the arrow matters. (If the (b,d)(b,d) arrow were reversed, then an (a,b)(a,b) arrow would be necessary.)

The relation R3R_3 is not transitive. It is missing many edges required for transitivity; (b,d)(b,d) is one example. As an exercise, try to find all of the missing edges that keep R1R_1 and R3R_3 from being transitive.

The relation X×XX\times X is transitive because it contains all possible pairs. No pairs can possibly be missing, so it is transitive “by default.” Likewise \varnothing is transitive because there are no pairs (x,y)(x,y) and (y,z)(y,z) that can fail to have their (x,z)(x,z).
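Each of these properties is a finite check over pairs, so a machine can verify them for you. A minimal Python sketch (the predicate names are my own), run against R2R_2:

    # A relation is a set of pairs on the set X.
    def reflexive(R, X):
        return all((x, x) in R for x in X)

    def symmetric(R):
        return all((y, x) in R for (x, y) in R)

    def antisymmetric(R):
        return all(x == y for (x, y) in R if (y, x) in R)

    def transitive(R):
        return all((x, z) in R
                   for (x, y) in R for (w, z) in R if y == w)

    X = {'a', 'b', 'c', 'd'}
    R2 = {('a', 'a'), ('a', 'd'), ('b', 'b'), ('b', 'd'),
          ('c', 'c'), ('c', 'd'), ('d', 'd')}
    print(reflexive(R2, X), symmetric(R2),
          antisymmetric(R2), transitive(R2))  # True False True True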

Takeaways

Chapter 15 problems

16. Closure and composition

Review.

Many times we will have a mathematical object in hand but desire it to have a property it does not have. For example, suppose I have the natural numbers N\mathbf{N}, but would like to be able to subtract these numbers. Sometimes I will be successful. For instance, 53=2N5-3=2 \in \mathbf{N}. But other times I will not; say, in the case of 35=23-5=-2, which is not a natural number.

Then we construct a new object, the closure of the original, that contains the same data and the minimum amount of extra data required to allow the desired property. In the case of closing N\mathbf{N} under subtraction, the result is Z\mathbf{Z}, the smallest set containing both the natural numbers and the results of subtracting any two natural numbers.

A closure YY of the set XX under the property PP must satisfy the following three conditions:

  1. YY must have property PP.
  2. XYX \subseteq Y. In other words, YY must be a superset of XX. (“Subset” and “superset” are converses.)
  3. For any ZZ such that ZZ has property PP and XZX \subseteq Z, it must also be the case that YZY \subseteq Z. In this sense YY is minimal among the supersets of XX that have PP. (Chapter 18 will make more sense of this idea, but not specifically as it pertains to closure.)

In this chapter we will close a relation under reflexiveness, symmetry, and transitivity. The third of these will require a detour into the operation of composing relations, and finally provide the response to Chapter 14’s call.

16.1 Reflexive and symmetric closures

In this section, RR is a relation on the set XX. The set XX need not be finite unless we want to talk about a matrix or a directed graph.

Example. Suppose we have the relation R={(a,b),(a,c),(b,b),(c,b)}R = \{(a,b), (a,c), (b,b), (c,b)\} on {a,b,c}\{a,b,c\}. To construct the reflexive closure of RR, we must add the pairs (a,a)(a,a) and (c,c)(c,c). (Notice (b,b)R(b,b)\in R already.) So the reflexive closure of RR is the new relation {(a,a),(a,b),(a,c),(b,b),(c,b),(c,c)}. \{(a,a),(a,b),(a,c),(b,b),(c,b),(c,c)\}.

Definition 16.1.1 Given a set XX, the diagonal relation Δ\Delta on XX is the relation Δ={(x,x)    xX}. \Delta = \{(x,x) \;|\; x \in X \}.

The diagonal relation is the smallest reflexive relation on a set.

Definition 16.1.2 Given a relation RR on a set XX, the reflexive closure of RR is the relation RΔR \cup \Delta.

In other words, to close a relation under reflexiveness it is only necessary to ensure it contains all the pairs in the diagonal relation.

Are these really definitions? Well, no. In the Form of the discrete math textbook in the library of Heaven, the definition of closure is the one given in the preamble to the section – in this case, the smallest reflexive superset of RR – and then it is a theorem that RΔR \cup \Delta is the reflexive closure. (It is also a theorem that a closure is necessarily unique, but this is easier to prove and only needs proving once for all closures.) Fortunately for us, this is not the Form of the discrete math textbook in the library of Heaven; so we will take these as definitions. Prove that they are truly what we say they are, as an exercise, if you like.

Next we will construct the symmetric closure of any relation.

Example. Again take R={(a,b),(a,c),(b,b),(c,b)}R = \{(a,b), (a,c), (b,b), (c,b)\} on {a,b,c}\{a,b,c\}. To construct the symmetric closure of RR, we must add the pairs (b,a)(b,a), (c,a)(c,a), and (b,c)(b,c). (As previously discussed, (b,b)(b,b) is invertible automatically.) Therefore the symmetric closure of RR is {(a,b),(a,c),(b,a),(b,b),(b,c),(c,a),(c,b)}. \{(a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b)\}.

Definition 16.1.3 Given a relation RR on a set XX, its inverse relation is the relation R1={(y,x)    (x,y)R}. R^{-1} = \{ (y,x) \; | \; (x,y) \in R \}.

Example. You have seen this before. If ff is a function that has an inverse and (x,y)f(x,y) \in f (meaning y=f(x)y=f(x)), then (y,x)f1(y,x) \in f^{-1} (meaning x=f(y)x=f(y)).

Definition 16.1.4 Given a relation RR on a set XX, the symmetric closure of RR is the relation RR1R \cup R^{-1}.
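In code, both closures are one-line unions, matching Definitions 16.1.2 and 16.1.4. A brief Python sketch on the set-of-pairs representation:

    # Reflexive closure: R ∪ Δ.  Symmetric closure: R ∪ R^(-1).
    def reflexive_closure(R, X):
        return R | {(x, x) for x in X}

    def symmetric_closure(R):
        return R | {(y, x) for (x, y) in R}

    R = {('a', 'b'), ('a', 'c'), ('b', 'b'), ('c', 'b')}
    print(sorted(reflexive_closure(R, {'a', 'b', 'c'})))
    print(sorted(symmetric_closure(R)))
    # Both match the closures computed by hand above.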

16.2 Towards a transitive closure

We can’t define a transitive closure yet, but we can probably build one. Let’s do an example that is particularly relevant at the time this book was written.

Example. In the past two weeks, Alicia has hung out with Bob. Bob and Carlos are roommates, and Davina was Carlos’s cashier at the grocery store the other day. Davina lives with her boyfriend, Erik. If Alicia gets sick, who is at risk?

The underlying relation is
R={(a,b),(b,c),(c,d),(d,e)} R = \{ (a,b), (b,c), (c,d), (d,e) \}
on the set {a,b,c,d,e}\{a,b,c,d,e\} (where aa refers to Alicia, etc.), the set of all pairs (x,y)(x,y) where xx interacted with yy over the past two weeks. However, this is not the relation that totally represents the spread of the illness, because if yy was exposed to the illness by xx, and zz was exposed by yy, then zz could have caught something that originated with xx. So if (x,y)(x,y) and (y,z)(y,z) are in RR, we would like (x,z)(x,z) to be in our relation. Therefore we are looking for the transitive closure of RR.

Transitivity is best understood visually. Draw a directed graph with vertices representing aa, bb, cc, dd, and ee, and draw arrows between aa and bb, bb and cc, cc and dd, and dd and ee. These are the arrows in RR. If our relation is to be transitive, these arrows will induce arrows between aa and cc, between bb and dd, and between cc and ee, i.e. the arrows corresponding to the paths of length 2 in the original relation. Then we will get arrows between aa and dd and between bb and ee (paths of length 3). Finally, we will draw one last arrow between aa and ee (paths of length 4). You can verify for yourself that the relation R={(a,b),(a,c),(a,d),(a,e),(b,c),(b,d),(b,e),(c,d),(c,e),(d,e)} R^* = \{(a,b), (a,c), (a,d), (a,e), (b,c), (b,d), (b,e), (c,d), (c,e), (d,e)\} is transitive.

If you understand that example, then you understand the idea of a transitive closure. It remains to put some formal machinery behind the idea.

16.3 Composition

Composition may be a familiar idea from algebra. Suppose you have functions ff and gg such that f(x)=yf(x)=y and g(y)=zg(y)=z. Then you have a third function fgfg such that fg(x)=zfg(x) = z, called the composition of ff and gg.

You are extremely likely to have seen fgfg written as gfg\circ f before, and probably again outside of this book. We will have a material reason to prefer the fgfg notation very shortly. (And I prefer it for aesthetic reasons. - DW)

A composition of relations is the identical idea, but without the extra restrictions that make a relation into a function.

Definition 16.3.1 Let RR be a relation from XX to YY, and SS be a relation from YY to ZZ. Their composition is the relation RS={(x,z)    y s.t. (x,y)R and (y,z)S}.RS = \{(x,z) \; | \; \exists y \text{ s.t. } (x,y) \in R \text { and } (y,z) \in S \}. The composition of RR with itself nn times is denoted RnR^n.

Again, elsewhere you may see RSRS written as SRS \circ R.

Example. Let R={(x1,y2),(x2,y3),(x3,y1)}R = \{ (x_1,y_2), (x_2,y_3), (x_3,y_1) \} be a relation from X={x1,x2,x3}X=\{x_1,x_2,x_3\} to Y={y1,y2,y3}Y=\{y_1,y_2,y_3\}, and S={(y1,z1),(y1,z2),(y2,z2)}S = \{(y_1,z_1), (y_1,z_2), (y_2,z_2)\} be a relation from YY to Z={z1,z2}Z=\{z_1,z_2\}. What is RSRS?

The relation RSRS is the set of all pairs (x,z)(x,z) for which there is a yy “connecting” the xx and the zz. Looking at the given relations we may identify the following pairs: (x1,z2)(x_1,z_2) via y2y_2, (x3,z1)(x_3,z_1) via y1y_1, and (x3,z2)(x_3,z_2) also via y1y_1. Therefore RS={(x1,z2),(x3,z1),(x3,z2)}.RS = \{(x_1, z_2), (x_3,z_1), (x_3,z_2)\}.

The following diagram illustrates this composition.

A picture of the composition of two relations is shown. Two elements are related in the composition if a third in the middle set 'bridges' them.

Figure. A picture of the composition of two relations is shown. Two elements are related in the composition if a third in the middle set "bridges" them.

Observe that SRSR does not exist because the codomain of SS is not the domain of RR.
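Before turning to matrices, note that the bridging condition in Definition 16.3.1 is already a direct filter in code. A minimal Python sketch (the function name is my own):

    # (x, z) is in RS when some middle element bridges x to z.
    def compose(R, S):
        return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

    R = {('x1', 'y2'), ('x2', 'y3'), ('x3', 'y1')}
    S = {('y1', 'z1'), ('y1', 'z2'), ('y2', 'z2')}
    print(sorted(compose(R, S)))
    # [('x1', 'z2'), ('x3', 'z1'), ('x3', 'z2')]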

You may tell yourself, “I don’t want to draw this diagram every time I compute a composition.” And you may tell yourself, “If the relations involved were large, I would like to be able to make a computer do this job for me.” Fortunately this apparatus is already in place! Recall the criterion for a pair (xi,zj)(x_i,z_j) appearing in the composition for sets whose elements are labeled numerically:

"xix_i is related to zjz_j if there exists a kk such that xkx_k is related to yky_k and yky_k is related to zkz_k."

You may recall this exact language has appeared before:

“The Boolean dot product of a row matrix (Ai)(A_i) and a column matrix (Cj)(C_j) is 11 if there exists a kk such that AkA_k and CkC_k are both 11.”

It is precisely the multiplication of Boolean matrices that will find the necessary yky_k's for us. Indeed,

Theorem 16.3.2 Let RR and SS be relations such that the composition RSRS exists. Then MRS=MRMS, M_{RS} = M_R \odot M_S, i.e. the matrix for RSRS is the product of the matrix for RR with the matrix for SS.

Example. In the last example we let R={(x1,y2),(x2,y3),(x3,y1)}R = \{ (x_1,y_2), (x_2,y_3), (x_3,y_1) \} be a relation from X={x1,x2,x3}X=\{x_1,x_2,x_3\} to Y={y1,y2,y3}Y=\{y_1,y_2,y_3\}, and S={(y1,z1),(y1,z2),(y2,z2)}S = \{(y_1,z_1), (y_1,z_2), (y_2,z_2)\} be a relation from YY to Z={z1,z2}Z=\{z_1,z_2\}.

The matrix for RR is MR=(010001100) M_R = \left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right) and the matrix for SS is MS=(110100). M_S = \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{array} \right). Their product is (010001100)(110100)=(010011), \left( \begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right) \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{array} \right) = \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \\ 1 & 1 \end{array} \right), which is also the matrix for RSRS.

16.4 Transitive closure

Composition takes the pairs (x,y)(x,y) and (y,z)(y,z) and returns (x,z)(x,z), so you can already imagine how this would relate to transitivity. A transitive relation is one that contains all of its self-compositions.

Theorem 16.4.1 A relation RR on a set XX is transitive if and only if RnRR^n \subseteq R for all positive integers nn.

Therefore, the smallest transitive relation containing RR is the union of all of the RnR^n. You will recall a very, very long time ago that we defined the infinite union n=1Xn=X1X2 \bigcup_{n=1}^\infty X_n = X_1 \cup X_2 \cup \cdots to be the set of all elements that are in at least one of the XiX_i, and said that this definition would be needed exactly once in the book. Well, here we are!

Definition 16.4.2 The transitive closure of a relation RR on a set XX is the relation R=n=1Rn. R^* = \bigcup_{n=1}^\infty R^n.

Of course, in practice, we will not need to calculate infinitely many composites. You will be able to tell that there are no new pairs that would arise from a new composition.

The following theorem makes it easy to conceptualize RnR^n. Recall a walk of length nn is a sequence (x1,x2,,xn+1)(x_1, x_2, \ldots, x_{n+1}) where (xi,xi+1)R(x_i,x_{i+1}) \in R. We can call these nn-walks for short.

Theorem 16.4.3 Given a relation RR on a set XX, the relation RnR^n is precisely the set of all pairs (x,y)(x,y) such that there is an nn-walk in RR beginning at xx and ending at yy.

Example. Consider the set X={w,x,y,z}X=\{w,x,y,z\} and the relation R={(w,x),(x,y),(y,z),(z,y)}R = \{(w,x), (x,y), (y,z), (z,y) \}. Construct RR^*.

We will find R2R^2 and R3R^3, and the union of these with RR will be the transitive closure. (We will verify that R4R^4 will yield nothing new.)

To find R2R^2, find all of the pairs that are “two steps away” in the original relation. You may immediately name (w,y)(w,y) and (x,z)(x,z) as two such pairs, but there are two less obvious pairs: (y,y)(y,y) and (z,z)(z,z). Recall from the last chapter that any time you have invertible pairs in a transitive relation, that symmetry must produce loops; because yy is related to zz and zz is related to yy, for the relation to be transitive yy must be related to yy. (Same for zz.) Therefore R2={(w,y),(x,z),(y,y),(z,z)}.R^2 = \{(w,y), (x,z), (y,y), (z,z)\}.

The triple composite R3R^3 consists of all the pairs that are “three steps away,” which at first glance is just (w,z)(w,z). In fact, if we stopped here we would be correct about RR^*, but we should be careful: because of the loops in R2R^2 there are more pairs in R3R^3. For example, one iteration of RR may take xx to yy; another takes yy to zz; and a third takes zz back to yy. Therefore the pair (x,y)(x,y) is in R3R^3, even though it is also in RR. Check for yourself that R3={(w,z),(x,y),(y,z),(z,y)} R^3 = \{(w,z), (x,y), (y,z), (z,y) \} is the set of all pairs that start and end 33-walks in the original relation.

Finally, we take the union of these three sets to make R={(w,x),(w,y),(w,z),(x,y),(x,z),(y,y),(y,z),(z,y),(z,z)}.R^*= \{(w,x), (w,y), (w,z), (x,y), (x,z), (y,y), (y,z), (z,y), (z,z)\}.
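In code, Definition 16.4.2 becomes a loop: union in successive composites until a pass contributes nothing new, which must happen on a finite set. A minimal Python sketch, reusing the compose function from the earlier snippet:

    # Transitive closure: accumulate R, R^2, R^3, ... until the next
    # composite is already contained in what we have.
    def transitive_closure(R):
        closure, power = set(R), set(R)
        while True:
            power = compose(power, R)  # the next composite R^n
            if power <= closure:       # no new pairs: done
                return closure
            closure |= power

    R = {('w', 'x'), ('x', 'y'), ('y', 'z'), ('z', 'y')}
    print(sorted(transitive_closure(R)))
    # Matches the nine pairs of R* computed above.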

Takeaways

Chapter 16 problems

17. Equivalence relations

Review:

17.1 Introduction

Consider the digraph generated by the relation RR on X={a,b,c,d,e,f,g}X=\{a,b,c,d,e,f,g\}, where R = \{ (a,a), (a,d), (a,g), (b,b), (b,c), (b,e), (b,f), (c,b), (c,c), (c,e), (c,f), (d,a), (d,d), (d,g), (e,b), (e,c), (e,e), (e,f), (f,b), (f,c), (f,e), (f,f), (g,a), (g,d), (g,g) \}. The digraph is drawn below:

The directed graph of the relation. Three elements occupy their own triangle, while the other four have their own square.

Figure. The directed graph of the relation. Three elements occupy their own triangle, while the other four have their own square.

The most visually striking feature of this directed graph is that the set XX is partitioned into separated pieces: the piece with aa, dd, and gg; and the piece with bb, cc, ee, and ff.

To understand this phenomenon, what properties does RR have? It is reflexive, it is symmetric, and it is transitive. (Verify these for yourself.) Taken together they imply a “fullness” of the relation; the arrow (a,d)(a,d) brings with it the arrows (a,a)(a,a), (d,a)(d,a), and (d,d)(d,d); if we add (d,g)(d,g), we get (a,g)(a,g), (g,d)(g,d), (g,a)(g,a), and (g,g)(g,g).

Such a relation gets its own title.

Definition 17.1.1 An equivalence relation is any relation that is reflexive, symmetric, and transitive.

Examples. The relation given by the statement x=yx=y on any set XX is an equivalence relation. In fact it is the prototypical equivalence relation, in the sense that the point of an equivalence relation is to capture the idea of equality in a less strict way.

For example, a typical way to define equivalence of sets is to say that two sets XX and YY are isomorphic if X=Y|X|=|Y|. This relation is an equivalence relation, but XX and YY do not need to actually be equal to be equivalent. This is the reason we freely rename the elements of a set whenever we like.

Another example is the equivalence of statements from Chapter 2, where ϕψ\phi \equiv \psi if ϕψ\phi \leftrightarrow \psi is a tautology.

Definition 17.1.2 Let RR be an equivalence relation on a set XX. The equivalence class of xXx \in X is the set of all elements to which it is equivalent: [x]={yX    (x,y)R}. [x] = \{ y \in X \; | \; (x,y) \in R \}.

Example. Let again R = \{ (a,a), (a,d), (a,g), (b,b), (b,c), (b,e), (b,f), (c,b), (c,c), (c,e), (c,f), (d,a), (d,d), (d,g), (e,b), (e,c), (e,e), (e,f), (f,b), (f,c), (f,e), (f,f), (g,a), (g,d), (g,g) \} be the relation on X={a,b,c,d,e,f,g}X = \{a,b,c,d,e,f,g\} whose directed graph is drawn above. This relation divides its underlying set into the equivalence classes [a]={a,d,g} [a] = \{a,d,g\} and [b]={b,c,e,f},[b]=\{b,c,e,f\}, precisely the discrete “sections” of the digraph. Note that the name of the equivalence class is an arbitrarily chosen representative; [b][b], [c][c], [e][e], and [f][f] all refer to the same set.
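Equivalence classes are likewise a one-line filter on the set-of-pairs representation. A short Python sketch, restricted to the {a,d,g}\{a,d,g\} block of the relation above for brevity:

    # The equivalence class [x] is the set of everything related to x.
    def equivalence_class(R, x):
        return {b for (a, b) in R if a == x}

    R = {('a', 'a'), ('a', 'd'), ('a', 'g'),
         ('d', 'a'), ('d', 'd'), ('d', 'g'),
         ('g', 'a'), ('g', 'd'), ('g', 'g')}
    print(sorted(equivalence_class(R, 'a')))  # ['a', 'd', 'g']
    print(sorted(equivalence_class(R, 'd')))  # ['a', 'd', 'g'], the same set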

17.2 Modular arithmetic

Now we consider a special equivalence relation on the set of integers.

Definition 17.2.1 Fix a nonzero integer aa. We declare mm and nn to be congruent modulo a\boldsymbol{a}, and write mn(mod a)m \equiv n \,(\text{mod }a), if a(mn)a|(m-n).

We are functionally declaring mm and nn “the same” if they differ by a multiple of aa. In a way, we are “destroying” the information of aa. Throughout even very different areas of mathematics, the word “modulo” or “mod” refers to collapsing some information in a structure, to be very vague.

Theorem 17.2.2 For a fixed nonzero integer aa, congruence mod aa defines an equivalence relation on Z\mathbf{Z}.

Proof. Up to you, in the exercises.

Examples. In the right context this relation is a familiar one. Suppose it is 10:00 in the morning. What time will it be 37 hours from now? Of course we subtract a full day, or 24 hours, from 37; so what time is it 13 hours from now? It will be 23:00 or 11:00 in the evening, depending on how you tell time. This addition is done modulo 24 (or 12).

A lot of trigonometry is done modulo 2π2\pi, but since 2π2\pi is not an integer, we don’t care about it in this book.

1733(mod 8)17 \equiv 33 \,(\text{mod }8), because 3317=1633-17=16 and 8168|16. You may also think of this as being true because the remainder of both 1717 and 3333 is 11 upon being divided by 88.

However, 23 \not\equiv 49 \,(\text{mod }5), because 49-23=26, which is not divisible by 5.
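
Definition 17.2.1 translates directly into a one-line test; a minimal sketch (the function name congruent is mine):

    def congruent(m, n, a):
        # m ≡ n (mod a) exactly when a divides m - n
        return (m - n) % a == 0

    print(congruent(17, 33, 8))   # True: 33 - 17 = 16 is divisible by 8
    print(congruent(23, 49, 5))   # False: 49 - 23 = 26 is not
    print((10 + 37) % 24)         # 23, the clock arithmetic from above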

The equivalence class of a given element mod a is not difficult to generate, as the following example shows. Suppose we want to know the equivalence class of 11 modulo 6. We are looking for integers m satisfying 6|(11-m); equivalently, integers m for which there exists a k such that 11-m=6k, or m=11-6k. Letting k run through all of its options, both positive and negative, the integers in the equivalence class of 11 mod 6 are just the integers that differ from 11 by a multiple of 6: [11] = \{\ldots, -7, -1, 5, 11, 17, 23, \ldots \}. Another equivalence class mod 6 is [3] = \{\ldots, -9, -3, 3, 9, 15, \ldots \}. Observe that there is one equivalence class per possible remainder when dividing by 6. The smallest non-negative member of each class is the remainder, upon division by 6, shared by all the members of that class.
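
To watch the class of 11 mod 6 materialize, let k run over a small window of integers; a quick sketch:

    # members of [11] mod 6 are 11 - 6k as k ranges over the integers;
    # here k runs over a finite window for display purposes
    print(sorted(11 - 6 * k for k in range(-2, 4)))
    # [-7, -1, 5, 11, 17, 23]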

Finally, we will note some useful algebraic properties of this relation.

Theorem 17.2.3 Suppose u \equiv v and x \equiv y modulo a. Then (u+x) \equiv (v+y) \,(\text{mod }a) and ux \equiv vy \,(\text{mod }a).

Example. In certain cryptography contexts it is helpful to know the remainders of large integers mod something, but it may be expensive to calculate those large integers. Instead we exploit the above theorem. Suppose we want to know the value of (19^{12} - 27 \times 48) \,(\text{mod }9).

First, observe that since 27 is divisible by 9, 27 \equiv 0. (It is okay to drop the "\text{mod }9" here because it is clear from context after the above paragraph.) Since 0 times anything is 0, (19^{12} - 27 \times 48) \equiv 19^{12}.

Next, observe 19 \equiv 1. Then 19^{12} \equiv 1^{12}, which is just 1.

So, (19^{12} - 27 \times 48) \equiv 1 \,(\text{mod }9). In fact, since 1 is the smallest nonnegative member of its equivalence class, it is the remainder we would get if we divided whatever 19^{12} - 27 \times 48 is by 9.
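
You can confirm this on a computer. Python's built-in pow accepts a third argument that performs the exponentiation mod 9, which amounts to applying Theorem 17.2.3 at every step:

    print((19**12 - 27 * 48) % 9)   # 1, by brute force
    print(pow(19, 12, 9))           # 1, without ever forming the huge 19**12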

Takeaways

Chapter 17 problems

18. Partial orders

Review.

In the last chapter we looked at equivalence relations, which generalize the idea of equality. Partial orders will do the same for inequalities. How can we say one object is “less than” another when the two objects are not numbers? You may be able to think of a way to order sets, or strings, or matrices. That way is probably a partial order.

18.1 Introduction

The secret ingredient to order is that an order must only go one way. If x is less than y, the only way that y can also be less than x is if the two are equal (here, by "less than" we really mean "less than or equal to"). Fortunately, we have a word for this: antisymmetry!

Definition 18.1.1 A relation is a partial order if it is reflexive, antisymmetric, and transitive. If R is a partial order on a set X and (x,y) \in R, we write x \le_R y, x \preceq y, or simply x \le y. The third is only used when the partial order is clear from context.

Examples. The prototypical partial order is the relation x \le y on the integers. This relation is reflexive (any integer is less than or equal to itself), antisymmetric (x \le y and y \le x means x = y), and transitive (as we saw in Chapter 8).

Given a family of sets, there are two "natural" ways to order them: by inclusion, or by size. The first is the relation where sets E and F are related if and only if E \subseteq F. The other is given by |E| \le |F|. In Chapter 15 you verified that the former was a partial order; you will investigate the latter in this chapter.

Given a set of positive integers we may say that m is related to n if and only if m|n. In Chapter 4 you proved that this is a partial order (even if you didn't know it at the time).
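
Like equivalence relations, partial orders can be machine-checked on finite sets. A sketch using the divisibility order on \{1, \ldots, 8\} (the helper names are mine):

    def is_antisymmetric(R):
        # x ≼ y and y ≼ x together force x == y
        return all(x == y for (x, y) in R if (y, x) in R)

    def is_partial_order(X, R):
        reflexive = all((x, x) in R for x in X)
        transitive = all((x, w) in R
                         for (x, y) in R
                         for (z, w) in R if y == z)
        return reflexive and is_antisymmetric(R) and transitive

    X = set(range(1, 9))
    R = {(m, n) for m in X for n in X if n % m == 0}   # m related to n when m | n
    print(is_partial_order(X, R))   # True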

Why the word "partial"? Take the last example, the order defined by divisibility, and suppose our underlying set is all of the positive integers. Neither 3|5 nor 5|3. Since we have two elements that cannot be ordered, the order is only partial. The following definitions formalize the notion.

Definition 18.1.2 Suppose that R is a partial order on a set X. If x \preceq y or y \preceq x in the partial order, we say x and y are comparable; if not, they are incomparable.

Definition 18.1.3 An order on a set X is total or linear if x and y are comparable for every x, y \in X.

Example. The usual order on the integers is a total order. The divisibility order is total if restricted to the right set; for example, the set of powers of 2.
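
Totality is likewise a finite check; a sketch restricting divisibility to the powers of 2:

    def comparable(x, y, R):
        return (x, y) in R or (y, x) in R

    powers = {2**k for k in range(6)}   # {1, 2, 4, 8, 16, 32}
    R = {(m, n) for m in powers for n in powers if n % m == 0}
    print(all(comparable(x, y, R) for x in powers for y in powers))   # True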

Definition 18.1.4 A set X is well-ordered if it has a total order \preceq under which every nonempty subset A \subseteq X contains an element x_0 such that x_0 \preceq x for all x \in A.

In language we will learn later, a well-ordered set is a set with a total order in which every nonempty subset has a "minimum" element.

Example. The poster child for well-ordered sets is \mathbf{N}, with the usual total order and the least element 0. Observe that one of the most important features of \mathbf{N}, the validity of mathematical induction, is not special to \mathbf{N} at all: all we need is somewhere to start and a way to move "forward" through the set. So, well-ordered sets are precisely those for which mathematical induction is possible.

18.2 Hasse diagrams

As relations, partial orders can be represented with directed graphs. Let's give it a try with a simple partial order: the inclusion order on the power set of \{a,b,c\}.


Figure. The directed graph for the power set of a three-element set, ordered under inclusion. The number of arrows makes it incomprehensible.

This visual is, in a word, awful. There are too many arrows! However, the fact that we are dealing with a partial order implies that certain arrows are redundant. Since every partial order is reflexive, we can lose the loops. Transitivity says that the edges from \varnothing to \{a\} and from \{a\} to \{a,b\} are sufficient; we do not need the edge from \varnothing to \{a,b\} (and so on). Finally, if we impose the restriction that all edges flow upward, we can draw the following, much more pleasant, picture.


Figure. The Hasse diagram of the previous digraph, where the redundant arrows (those implied by reflexivity and transitivity) are removed.

Such a diagram is called a Hasse diagram.

Definition 18.2.1 Suppose X is partially ordered and x, y \in X are distinct. If x \le y and there is no intermediate element z, distinct from x and y, such that x \le z and z \le y, we say y covers x.

Definition 18.2.2 The Hasse diagram of a partial order on a set X is a graph whose vertices are the elements of X, with an edge from a vertex x up to a vertex y if and only if y covers x in the order.

Example. Let X = \{a,b,c,d,e\} and R = \{(a,a), (a,b), (a,e), (b,b), (b,e), (c,b), (c,c), (c,e), (d,b), (d,c), (d,d), (d,e), (e,e)\}. Draw the Hasse diagram for this partial order.

To draw a Hasse diagram out of a set of pairs, observe that the elements that are less than the most other elements should be on the bottom of the diagram, while the elements that are less than the fewest should be near the top. You can typically then "fit in" the others. Here is the diagram for the above relation (a short code sketch after the figure computes the edges mechanically):


Figure. The Hasse diagram for the example order.
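
If you would rather compute the edges than eyeball them, the covering relation can be extracted mechanically from the pairs; a sketch (the function name covers is mine):

    def covers(X, R):
        # y covers x: x ≼ y, x != y, and no z strictly between them
        return {(x, y) for (x, y) in R if x != y
                and not any(z not in (x, y) and (x, z) in R and (z, y) in R
                            for z in X)}

    X = {"a", "b", "c", "d", "e"}
    R = {("a","a"), ("a","b"), ("a","e"), ("b","b"), ("b","e"), ("c","b"),
         ("c","c"), ("c","e"), ("d","b"), ("d","c"), ("d","d"), ("d","e"),
         ("e","e")}
    print(sorted(covers(X, R)))
    # [('a', 'b'), ('b', 'e'), ('c', 'b'), ('d', 'c')] -- the Hasse edges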

18.3 Extreme features of a partial order

We will close by discussing various “extreme” features a partial order can have.

Definition 18.3.1 Let X be a partially ordered set. We say that x is less than y (in the partial order) if x \preceq y, and then y is greater than x.

This definition might seem weird, but remember that we are not necessarily talking about numbers. Once everyone knows which partial order is being discussed, we use “less than” and “greater than” to describe relations within the order.

Definition 18.3.2 An element x in a partially ordered set X is minimal/maximal if it is not greater/less than any other element. If the minimal/maximal element is unique, it is called the minimum/maximum.

Example. It is important to note that minimal does not mean less than everything; it means not greater than anything, and these are not the same! Consider the order \{(w,w), (w,y), (x,x), (x,y), (x,z), (y,y), (z,z)\}. Its Hasse diagram is shown below:


Figure. The Hasse diagram for the example order, with two minimal elements and two maximal elements.

Above, w is minimal, but it is not less than z. (Likewise z is maximal even though it is not greater than w.)
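
In code, "not greater/less than any other element" becomes a direct quantifier; a sketch over the order above:

    X = {"w", "x", "y", "z"}
    R = {("w","w"), ("w","y"), ("x","x"), ("x","y"), ("x","z"),
         ("y","y"), ("z","z")}

    # minimal: no *other* element lies below it; maximal: none lies above it
    minimal = {x for x in X if not any((y, x) in R and y != x for y in X)}
    maximal = {x for x in X if not any((x, y) in R and y != x for y in X)}
    print(sorted(minimal))   # ['w', 'x']
    print(sorted(maximal))   # ['y', 'z']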

Definition 18.3.3 Fix a subset A of the partially ordered set X. A lower bound/upper bound of A is any element of X that is less/greater than every element of A.

Example. Consider the order \{(a,a), (a,b), (a,c), (a,d), (a,e), (a,f), (b,b), (b,c), (b,d), (b,e), (b,f), (c,c), (c,e), (c,f), (d,d), (d,e), (d,f), (e,e), (f,f), (g,d), (g,e), (g,f), (g,g)\} and the subset A = \{b,c,d\} of the ordered set.


Figure. A Hasse diagram with a three-element subset highlighted.

The upper bounds of A are the elements e and f, because both are greater than every element in A.

The lower bounds of A are a and b. Since partial orders are reflexive, b is "less than" (remember, really "less than or equal to") itself. Because g is less than only d among the elements of A, it is not a lower bound of A.
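
Bounds are quantifiers over the relation as well; a sketch using this example's pairs:

    X = {"a", "b", "c", "d", "e", "f", "g"}
    R = {("a","a"), ("a","b"), ("a","c"), ("a","d"), ("a","e"), ("a","f"),
         ("b","b"), ("b","c"), ("b","d"), ("b","e"), ("b","f"),
         ("c","c"), ("c","e"), ("c","f"),
         ("d","d"), ("d","e"), ("d","f"),
         ("e","e"), ("f","f"),
         ("g","d"), ("g","e"), ("g","f"), ("g","g")}
    A = {"b", "c", "d"}

    lower = {x for x in X if all((x, a) in R for a in A)}
    upper = {x for x in X if all((a, x) in R for a in A)}
    print(sorted(lower))   # ['a', 'b']
    print(sorted(upper))   # ['e', 'f']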

Definition 18.3.4 Let A be a subset of a partially ordered set. The greatest lower bound of A, if it exists, is the maximum element of the set of lower bounds of A. The greatest lower bound is denoted \text{glb} and sometimes called the infimum, or \text{inf}, of the set. (The least upper bound, \text{lub}, supremum, or \text{sup}, is the minimum upper bound.)

Sound familiar? Indeed, we have talked about finding “tightest bounds” when we discussed growth rates of sequences all the way back in Chapter 7.

Example. As before, take the order \{(a,a), (a,b), (a,c), (a,d), (a,e), (a,f), (b,b), (b,c), (b,d), (b,e), (b,f), (c,c), (c,e), (c,f), (d,d), (d,e), (d,f), (e,e), (f,f), (g,d), (g,e), (g,f), (g,g)\} whose Hasse diagram is shown above, and again fix the set A = \{b,c,d\}. The greatest lower bound is the element b; so, \text{glb}(A) = b.

The least upper bound of A does not exist: the set of upper bounds of A is \{e,f\}, neither of which is less than the other.
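
Greatest lower bounds and least upper bounds can be computed by searching the bounds for a maximum or minimum, returning None when none exists; a sketch (the names glb and lub follow the definition, but the code is my own):

    def glb(A, X, R):
        lbs = {x for x in X if all((x, a) in R for a in A)}
        # the maximum of the lower bounds, if one exists
        tops = [x for x in lbs if all((y, x) in R for y in lbs)]
        return tops[0] if tops else None

    def lub(A, X, R):
        ubs = {x for x in X if all((a, x) in R for a in A)}
        bottoms = [x for x in ubs if all((x, y) in R for y in ubs)]
        return bottoms[0] if bottoms else None

    X = {"a", "b", "c", "d", "e", "f", "g"}
    R = {("a","a"), ("a","b"), ("a","c"), ("a","d"), ("a","e"), ("a","f"),
         ("b","b"), ("b","c"), ("b","d"), ("b","e"), ("b","f"),
         ("c","c"), ("c","e"), ("c","f"),
         ("d","d"), ("d","e"), ("d","f"),
         ("e","e"), ("f","f"),
         ("g","d"), ("g","e"), ("g","f"), ("g","g")}
    print(glb({"b", "c", "d"}, X, R))   # b
    print(lub({"b", "c", "d"}, X, R))   # None: e and f are incomparable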

You may have noticed that some Hasse diagrams tend to “sprawl,” while others have pleasing geometric shapes. Our final definition puts some structure on that idea.

Definition 18.3.5 A partially ordered set is a lattice if any pair of elements has both a least upper bound and a greatest lower bound.

Examples. The set \{a,b,c,d,e,f,g\} with the order \{(a,a), (a,b), (a,c), (a,d), (a,e), (a,f), (b,b), (b,c), (b,d), (b,e), (b,f), (c,c), (c,e), (c,f), (d,d), (d,e), (d,f), (e,e), (f,f), (g,d), (g,e), (g,f), (g,g)\} from the last two examples is not a lattice, because (for instance) the pair \{e,f\} fails to have a least upper bound.

An example of a lattice is the inclusion order on any power set; see Section 18.2 for an example. Surprisingly (or maybe not), any linear order is a lattice, even though it doesn't visually "look like" one. Take a pair \{a,b\} in a linear order where (say) a \preceq b (it's either that or the other way around). Then the greatest lower bound of the pair is a and the least upper bound of the pair is b.
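
The lattice property itself is a finite check. The sketch below builds the inclusion order on the power set of \{a,b,c\} and verifies that every pair of subsets has both a glb and a lub; in fact they are intersection and union (the helpers are repeated from the previous sketch):

    from itertools import combinations

    S = {"a", "b", "c"}
    P = [frozenset(c) for r in range(len(S) + 1)
         for c in combinations(sorted(S), r)]
    R = {(E, F) for E in P for F in P if E <= F}   # E <= F tests inclusion

    def glb(A, X, R):
        lbs = {x for x in X if all((x, a) in R for a in A)}
        tops = [x for x in lbs if all((y, x) in R for y in lbs)]
        return tops[0] if tops else None

    def lub(A, X, R):
        ubs = {x for x in X if all((a, x) in R for a in A)}
        bottoms = [x for x in ubs if all((x, y) in R for y in ubs)]
        return bottoms[0] if bottoms else None

    print(all(glb({E, F}, P, R) == E & F and lub({E, F}, P, R) == E | F
              for E in P for F in P))   # True: the power set is a lattice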

Any Boolean algebra yields a lattice; this is an exercise. (The converse is more delicate: only especially nice lattices, the complemented distributive ones, yield Boolean algebras.)

Takeaways

Chapter 18 problems

ONE MORE FINAL: Oh, the places you’ll go

Congratulations! You have either read all of, or scrolled to the end of, Elementary discrete mathematics. What next?

If you are a computer science student in your first year, i.e. a member of the target audience of this book, you either will take calculus or have taken it recently. Asymptotics can be phrased in terms of limits. The binomial theorem will give you a powerful tool in computing difficult integrals. You will look at infinite sequences and sums. Most importantly, you have hopefully gained some confidence in mathematics, and you will go on with a greater ability to read and do math.

With perseverance and a good group of people to work with, there’s not a lot you can’t do after reading this book. I don’t claim this book is magic, or has taught you everything you need to know. What I do claim is that once you know about sets and how to read and write a proof, and maybe have some familiarity with recursion, the tapestry of mathematics is open to you, especially after learning the language of calculus and the examples therein. I am particularly fond of combinatorics and category theory.

Don’t want to do any more math? That’s fine too, I guess. Maybe this book helped you organize your mental machinery for other areas of your life. Maybe you have found within you a confidence to solve problems you did not think you could.

Whatever powers, if any, this book gave you, use them for good, and live well.