Example of adding two numbers a and b with the result s:
c 101
a 404
b 678
s 1082
Here, c stands for "carry"(2).
The general schema for adding two non-negative integers(1) a and b looks like this: the digits are added position by position, starting from the least significant position, and at each position a carry digit is passed on to the position to the left.
Below, we shall benefit from the definition of x modulo 10(4):
x mod 10 = x_1
and from the definition of x // 10:
x // 10 = (x - x_1) / 10
where x stands for a non-negative integer(1) with x_1 as its least significant digit.
We note that:
x mod 10 picks out the least significant digit of x
x // 10 shifts the digits of x one position to the right and discards the digit to the right of the decimal point
Let n be a natural number(1).
Let the non-negative integers(1) c, a, b, and s be represented by their sequences(5) of digits(3):

c = c_(n+1) c_n ... c_2 c_1
a = a_n ... a_2 a_1
b = b_n ... b_2 b_1
s = s_(n+1) s_n ... s_2 s_1
The number of digits is set to n for both a and b.
If one of a and b should at the outset have fewer than n digits, its sequence of digits is extended to the left with digits equal to 0 until its number of digits equals n.
According to the general schema for adding two non-negative integers, we define the digits of c like this:

c_1 = 0
c_(i+1) = (a_i + b_i + c_i) // 10 for i = 1, 2, ..., n

Then:

s_i = (a_i + b_i + c_i) mod 10 for i = 1, 2, ..., n
s_(n+1) = c_(n+1)
We take a closer look at s:
Let n be a natural number(1).
Let x be a non-negative integer(1) represented by its sequence(5) of digits(3):

x = x_n ... x_2 x_1

Then:

x = x_n * 10^(n-1) + ... + x_2 * 10 + x_1

We have:

s = s_(n+1) * 10^n + s_n * 10^(n-1) + ... + s_1
  = c_(n+1) * 10^n + Σ_(i=1..n) (a_i + b_i + c_i - 10 * c_(i+1)) * 10^(i-1)
  = a + b + c_(n+1) * 10^n + Σ_(i=1..n) c_i * 10^(i-1) - Σ_(i=1..n) c_(i+1) * 10^i
  = a + b + c_1
  = a + b

since the carry terms telescope and c_1 = 0. This shows that the schema indeed produces s = a + b.
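To make the schema concrete, here is a minimal sketch in Python (my own illustration, not part of the definitions above; the list layout, least significant digit first, is an assumption):

    def add_digits(a_digits, b_digits):
        # a_digits and b_digits hold the digits a_1 ... a_n and b_1 ... b_n,
        # least significant digit first, both padded to the same length n.
        n = len(a_digits)
        c = [0] * (n + 1)        # carry digits c_1 ... c_(n+1), with c_1 = 0
        s = [0] * (n + 1)        # sum digits s_1 ... s_(n+1)
        for i in range(n):
            t = a_digits[i] + b_digits[i] + c[i]
            s[i] = t % 10        # s_i = (a_i + b_i + c_i) mod 10
            c[i + 1] = t // 10   # c_(i+1) = (a_i + b_i + c_i) // 10
        s[n] = c[n]              # s_(n+1) = c_(n+1)
        return s, c

    # The example above: a = 404, b = 678.
    s, c = add_digits([4, 0, 4], [8, 7, 6])
    print(s)   # [2, 8, 0, 1], i.e. s = 1082 read from the most significant end
    print(c)   # [0, 1, 0, 1], the carries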
References:
(1) Natural number (non-negative integers) https://en.wikipedia.org/wiki/Natural_number
(2) Carry (arithmetic) https://en.wikipedia.org/wiki/Carry_(arithmetic)
(3) Numerical digit https://en.wikipedia.org/wiki/Numerical_digit
(4) Modulo Operation https://www.mathsisfun.com/numbers/modulo.html
(5) Sequences https://www.mathsisfun.com/algebra/sequences-series.html
We intend to prove that 2 + 2 = 4.
We start by defining the sequence of natural numbers (1):
1, 2, 3, ..., s, t, ...
Here, s and t represent two consecutive natural numbers.
So far in our presentation, the natural numbers are only symbols.
We define their values by introducing the value equation, which uniquely defines the value of every natural number except 1 in terms of the value of 1:
for every pair of consecutive natural numbers s and t: t = s + 1
The natural number s runs through the natural numbers sequentially from 1, so that the right-hand side of the value equation is defined for every value of s.
To make the definition of the value equation complete, we should define the +-operator.
However, we choose instead to introduce an axiom of the natural numbers, the associativity axiom (2):
for every natural number a, b, and c: a + (b + c) = (a + b) + c
We are now in a position to prove:
Theorem:
2 + 2 = 4
Proof:
We operate on the left-hand side of the equation 2 + 2 = 4:
2 + 2
= 2 + (2) as a parenthesis can be put around a single natural number
= 2 + (1 + 1) as 2 = 1 + 1 by the value equation
= (2 + 1) + 1 by the associativity axiom
= (3) + 1 as 3 = 2 + 1 by the value equation
= 3 + 1 as a parenthesis around a single natural number can be removed
= 4 as 4 = 3 + 1 by the value equation
We see from this that we have proved that 2 + 2 = 4.
This is what we intended to prove, so the proof is complete.
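As an aside, a step-by-step derivation of this kind can be checked mechanically by a proof assistant. A minimal sketch in Lean 4 (my own aside, not part of the proof above):

    -- 2 + 2 = 4 holds by rfl: unfolding the definition of + on the
    -- natural numbers plays the role of the value equation above.
    example : 2 + 2 = 4 := rfl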
References:
(1) Natural number https://en.wikipedia.org/wiki/Natural_number
(2) Associative Laws https://www.mathsisfun.com/associative-commutative-distributive.html
In short. Polar coordinates are used to prove the residue theorem for functions represented by a single Laurent series. The path of integration may have a general shape. The reader should be familiar with complex analysis.
Background. The proof emerged from the idea that the proof should contain terms with no contribution to the integral, corresponding to the fact that there is no "height difference between the starting and ending points when walking around a mountain top and ending at the starting point". This, and a wish to use as little as possible of the theory of complex analysis, led to the use of Laurent series and the introduction of a polar coordinate representation in the complex plane.
Complex analysis is the study of complex functions mapping the complex space ℂ to ℂ. A number z in ℂ has two components, a real component and an imaginary component, which is a real number multiplied by i where i=√-1. Nevertheless, algebraically, we regard z as we do a number in ℝ, i.e. as a single entity.
An analytic function is a function that can be represented by a Taylor series. A meromorphic function is a function that is analytic except at isolated points where it has poles. Locally around a point in ℂ, a meromorphic function can be represented by a Laurent series about that point. A Laurent series is a Taylor series to which terms with negative exponents are added. To represent a meromorphic function throughout its domain, several Laurent series may be required.
An integral of a complex function is like a line integral of a scalar real function mapping ℝ² into ℝ.
Laurent series. We identify a meromorphic function f(z) by its Laurent series(1) about a point c,

f(z) = Σ_(n=-∞..∞) a_n (z - c)^n, z ∈ D - {c},

where D is a disc about c.
A Laurent series can be viewed as the sum of two series. The part with index n ranging from -∞ to -1 converges on ℂ - {c}, and the other part with n ranging from 0 to ∞ converges on D. These two series are power series, in 1/(z-c) and z-c respectively.
A Laurent series is integrable term by term along rectifiable paths within closed and bounded subsets of D - {c}.(2)
Parameterised integration path. Let P be a rectifiable (i.e. P has a finite length)(3), closed, and connected path, which is contained in a closed and bounded subset of D - {c}. Let a parameterisation of P be given,
P: z=z(t), t∈[α,β]⊂ℝ, α<β,
z(β)=z(α),
the angular part of z(t)-c increases by 2π as t varies from α to β,
z(t) is continuous,
z(t) is piecewise continuously differentiable(4) on [α,β].
The equation
z(t) = r(t)e^(iθ(t)) + c
defines the radial function r(t) = |z(t)-c| and the angular function(5) θ(t) = arg(z(t)-c). The properties of r(t) and θ(t) correspond to the ones of z(t) (stated without proof) and are,
r(β)=r(α),
r(t) is continuous,
r(t) is piecewise continuously differentiable(4) on [α,β]
and
θ(β)=θ(α)+2π,
θ(t) is continuous,
θ(t) is piecewise continuously differentiable(4) on [α,β].
Polar coordinates in ℂ. We calculate the derivative of z(t) - c, which equals z'(t),

z'(t) = (r'(t) + i r(t) θ'(t)) e^(iθ(t)).
The residue theorem for a function represented by a single Laurent series. The theorem states that

∮_P f(z) dz = 2πi a_(-1). [1]

Now we introduce polar coordinates in the left-hand side of [1],

∮_P f(z) dz = ∫_α^β f(z(t)) z'(t) dt = Σ_(n=-∞..∞) a_n ∫_α^β r(t)^n e^(inθ(t)) (r'(t) + i r(t) θ'(t)) e^(iθ(t)) dt,

where the term-by-term integration is justified by (2). For n ≠ -1, the integrand of term n is the derivative of r(t)^(n+1) e^(i(n+1)θ(t)) / (n+1), so, since r(β) = r(α) and θ(β) = θ(α) + 2π, term n contributes 0 to the integral; these are the terms with no "height difference" between the starting and ending points. For n = -1, the integral is ∫_α^β (r'(t)/r(t) + iθ'(t)) dt = ln r(β) - ln r(α) + i(θ(β) - θ(α)) = 2πi, which establishes [1].
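As a numerical sanity check (no part of the proof; the path, the point c, and the coefficients are my own choices), we can integrate a short Laurent series along a non-circular closed path and compare the result with 2πi a_(-1). A Python sketch:

    import cmath

    def contour_integral(f, c, num=200000):
        # Approximate the integral of f along an ellipse about c, a
        # rectifiable closed path winding once around c, parameterised as
        # z(t) = c + 2*cos(t) + i*sin(t), t in [0, 2*pi].
        total = 0.0
        h = 2 * cmath.pi / num
        for k in range(num):
            t = k * h
            dz = (-2 * cmath.sin(t) + 1j * cmath.cos(t)) * h   # z'(t) dt
            total += f(c + 2 * cmath.cos(t) + 1j * cmath.sin(t)) * dz
        return total

    c = 0.3 + 0.1j
    f = lambda z: 5 / (z - c) + (z - c) ** 2   # Laurent series with a_(-1) = 5
    print(contour_integral(f, c))              # close to 2*pi*i*5
    print(2j * cmath.pi * 5)                   # exact value, about 31.4159j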
Appendix.
Lemma 1 (integration by parts)(6). For complex-valued functions u(t) and v(t) that are continuous and piecewise continuously differentiable(4) on [α,β],

∫_α^β u(t) v'(t) dt = u(β)v(β) - u(α)v(α) - ∫_α^β u'(t) v(t) dt.
We substitute real-valued functions for the complex-valued functions on the left-hand side of the equation in the lemma, writing u = u_1 + i u_2 and v = v_1 + i v_2, and then apply the real integration by parts formula(6) to each of the resulting real integrals.
References.
(1) Laurent series http://www.encyclopediaofmath.org/index.php/Laurent_series
(2) Laurent series (Uniqueness) http://en.wikipedia.org/wiki/Laurent_series#Uniqueness
(3) Arc length https://en.wikipedia.org/wiki/Arc_length
(4) Reinhold Remmert, Theory of Complex Functions http://books.google.no/books?id=uP8SF4jf7GEC&pg=PA173&lpg=PA173&dq=integrate+piecewise+continuously+differentiable&source=bl&ots=yMrRr6wE4v&sig=izem7hy8bHrw8TkKS-NEvhujMrs&hl=no&sa=X&ei=dpPWUbSzGoOv4QS_poC4DA&redir_esc=y#v=onepage&q=integrate%20piecewise%20continuously%20differentiable&f=false, p. 173-174.
(5) Argument (complex analysis) http://en.wikipedia.org/wiki/Argument_(complex_analysis)
(6) Integration by parts http://en.wikipedia.org/wiki/Integration_by_parts
In short. The divisibility by 3 rule is shown by using proof by induction. The objects of induction are sets consisting of three consecutive integers.
Background. While out walking, I tried from time to time over a period of several years to figure out how to prove that a number is divisible by 3 if and only if the sum of its digits is divisible by 3. I was not able to do it. It seemed obvious that proof by induction would have to be employed. The first step of the induction was easy and natural, but the next step seemed difficult. I found that it was necessary to sit down and concentrate in order to be able to write a proof.
Now I know that the proof is naively simple. The proof is also smart, I think, a proof that I would not be likely to find myself unless by accident.
The divisibility by 3 rule was obviously discovered by accident. I think no one suspected it to be true in the first place, and therefore no one started looking for a proof. Had that been the situation, my idea of there being two kinds of mathematicians, the exploring mathematicians who pioneer the results and the smart mathematicians who write the elegant proofs that survive to the text books (see my thread "Exploring mathematicians and smart mathematicians"), might apply.
Admittedly, against this background it looks stupid to make an "exploring mathematician's" proof. Nevertheless, I wanted to try to construct such a proof because I found the work challenging. In any case, we get an exercise in making an exploring trip into the realm of mathematics. Below, Proof 1 is the smart proof and Proof 2 is my proof.
Theorem 1. An integer is divisible by 3 if and only if the sum of its digits is divisible by 3.
Proof 1¹. I'll only sketch the proof here. One observes that any integer x can be written as a sum of products, every product being a digit of x multiplied by 1, 10, 100, ..., or 10 ... 0, where the last number, 10 ... 0, has n digits, n being the number of digits of x. One then observes that the numbers 10, 100, ..., 10 ... 0 can be written as 9 + 1, 99 + 1, ..., 9 ... 9 + 1 respectively. When these sums are multiplied by their respective digits of x and everything is summed, one gets one sum whose products all contain a factor 9, 99, ..., or 9 ... 9, plus another sum. The first sum is clearly divisible by 3, so the divisibility by 3 of x is the same as the divisibility by 3 of the second sum plus the least significant digit of x, and this quantity proves to be the sum of the digits of x.
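A quick numerical check of this conclusion (my own sketch, not part of either proof):

    # Check that x and the sum of its digits leave the same remainder
    # when divided by 3 (in fact by 9, since 9, 99, 999, ... are all
    # divisible by 9).
    def digit_sum(x):
        return sum(int(d) for d in str(x))

    assert all(x % 3 == digit_sum(x) % 3 for x in range(100000))
    assert all(x % 9 == digit_sum(x) % 9 for x in range(100000))
    print("checked x = 0 ... 99999")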
Proof 2. We limit ourselves to non-negative integers. The proof for non-positive integers is similar.
First we observe that the theorem is true for a number consisting of one digit because, in this case, the number itself and the sum of its digit(s) have the same value and therefore have the same divisibility by 3 property, i.e. they are either both divisible by 3 or both not divisible by 3. Now, we intend to use proof by induction. We must decide what objects the proof by induction shall refer to. I suggest these objects to be sets of three consecutive integers. The first set consists of 0, 1, and 2, call this set S_1. The maximum element of S_i, where i ≥ 1, is 1 less than the least element of S_(i+1) so that the union of all the S_i sets is the set of the non-negative integers, the set referred to in the theorem. Let us call S_i valid if the theorem is proved for all three numbers in it.
We have already seen that the theorem is true for each of the numbers in S_1, so S_1 is valid. Suppose S_i is valid for some i ≥ 1. We have to prove that S_(i+1) is valid, as well.
Definitions:
x is the least number in S_(i+1).
s() is a function that gives the sum of the digits of its non-negative integer argument.
Now take a look at "A proof diagram for the proof of Theorem 1" in the appendix.
Let us start with x, the least number of S_(i+1). It is easy to see that x equals 3*i, which is clearly divisible by 3. We must show that the sum of the digits of x, s(x), also is divisible by 3.
x-3 in S_i is divisible by 3 because x is.
Since S_i is valid, s(x-3) is divisible by 3 because x-3 is divisible by 3.
Lemma 1 gives us that s(x) is divisible by 3 because s(x-3) is divisible by 3. This concludes the proof for x.
Let us consider x+1, the middle number of S_(i+1). x+1 is 1 higher than a number divisible by 3 and is therefore itself not divisible by 3. We must show that s(x+1) is not divisible by 3 either.
We send x+1 down to S_i by subtracting 3: x+1 - 3 = x-2. x-2 is not divisible by 3 because x is divisible by 3 while the difference between x and x-2, 2, is not divisible by 3.
Since S_i is valid, s(x-2) is not divisible by 3 because x-2 is not divisible by 3.
Lemma 1 gives us that s(x+1) is not divisible by 3 because s(x-2) is not divisible by 3. This concludes the proof for x+1.
Let us consider x+2, the highest number of S_(i+1) and the last one we have to consider. Its case is similar to the case of x+1, and the proof is left to the reader. QED.
Appendix.
Lemma 1. If the sum of the digits of a non-negative integer x is divisible by 3, then so is the sum of the digits of x + 3; and if the sum of the digits of x is not divisible by 3, then neither is the sum of the digits of x + 3.
Proof. Definitions:
d_j() is a function that produces digit number j of its non-negative integer argument where j = 1 for the least significant digit of the argument.
s() is a function that gives the sum of the digits of its non-negative integer argument.
We need to show that s(x+3) and s(x) are either both divisible by 3 or both not divisible by 3.
If d_1(x) ≤ 6 then d_1(x+3) = d_1(x) + 3 and d_j(x+3) = d_j(x) for j > 1. Therefore, for such an x, s(x+3) = s(x) + 3. Since s(x+3) - s(x) is divisible by 3, s(x+3) and s(x) are either both divisible by 3 or both not divisible by 3, and this finishes the treatment of this case.
If d_1(x) ≥ 7 then d_1(x+3) = d_1(x) - 7 and there is a carry equal to 1. Now suppose, without loss of generality, that d_j(x) = 9 for j = 2, 3, ..., k and d_(k+1)(x) < 9 for some k ≥ 1. Then d_j(x+3) = 0 for j = 2, 3, ..., k and d_(k+1)(x+3) = d_(k+1)(x) + 1. We find s(x+3) = s(x) - 7 - 9*(k-1) + 1 = s(x) - 9*(k-1) - 6. Since s(x+3) - s(x) is divisible by 3, s(x+3) and s(x) share the same divisibility by 3 property, and this concludes the proof.
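The lemma can also be checked mechanically. A small Python sketch (the function name s mirrors the definition above; the test range and example are my own choices):

    def s(x):
        # The sum of the digits of the non-negative integer x.
        return sum(int(d) for d in str(x))

    # Lemma 1: s(x + 3) - s(x) is always divisible by 3, so s(x) and
    # s(x + 3) are either both divisible by 3 or both not divisible by 3.
    assert all((s(x + 3) - s(x)) % 3 == 0 for x in range(200000))

    # The carry case d_1(x) >= 7, here with k = 6: x = 1999997.
    print(s(1999997), s(2000000))   # 53 and 2; the difference 51 is divisible by 3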
A proof diagram for the proof of Theorem 1.
Below is a proof diagram where y = x, x+1, or x+2. All y, y-3, s(y-3), and s(y) in the diagram have the same divisibility by 3 property.
S_(i+1): y s(y)
| ∧
Subtract 3 -------> | | <------- Lemma 1
∨ |
S_i: y-3 ----> s(y-3)
∧
|
S_i is valid
Reference.
1. http://mathforum.org/k12/mathtips/division.tips.html. (This web page does not exist any more.)
Thanks for your post, cmowla.
To be honest, I guess I used the idea of there being smart mathematicians and exploring mathematicians as a way of marketing my proof, hehe. However, a physics professor at the university where I studied once said to me that he liked an A- (or B+) student, a not quite first-rate student, better than an A student, because an A- student is more innovative while an A student is conservative. An A- student is familiar with making mistakes and is not afraid of making them. An A student, on the other hand, is afraid of making mistakes and does not dare say things that might prove to be false.
I have posted a comment on your post about the basic limit laws after having spent some time trying to understand your post. I hope you appreciate my comment although I understand that you are not into calculus at the moment.
Well, I think there is such a thing as mathematical maturity. I remember taking a course in linear algebra once. It was a spring course, and I had the feeling of understanding nothing before Easter and everything after Easter.
I cannot comment on Rubik's cube as I am not into that area.
Regarding your item [1]: I guess we have to accept that mathematics is difficult; there seems to be general acceptance of this. This causes papers to be poorly understood, which is sad. However, maybe someone someday will develop a mathematics language that would make mathematics easy to understand.
Regarding your item [2]: I think an idea behind listing in a paper the prerequisite knowledge is to suggest to the unqualified reader to stop reading the paper so as to avoid wasting his or her time reading it. By the way, I have now included the text "(calculus)" in the name of this thread so that readers not interested in calculus can skip the thread.
By the way, perhaps there is a simple contribution in my post #1 with regard to the effort of making papers easy to read. There, I try to align identical parts of mathematical expressions in successive lines. That way, it is easy for the reader to see what is unchanged between a pair of lines and to find the most interesting parts of the lines fast.
In your post #1 you state that you evaluated the absolute value(s) in the second quote to produce the statement in the third quote. However, I have a counterexample. Set x = a + δ/2 and f(a + δ/2) = L - 2ε. In this case the statement in the second quote is false, whereas the statement in the third quote is true.
Also, in the proof of Sum Law, the theorem statement has ε whereas the last statement in the proof has ε_2. Shouldn't these two epsilons have been the same? The same or something similar goes for Constant Law and Product Law.
In short. This post presents the idea that there are two kinds of mathematicians, the exploring mathematicians who pioneer the results, and the smart mathematicians who follow and produce the elegant proofs that survive to the text books. A theorem with a proof that is considered smart is chosen from a text book, and then a more elementary proof is presented.
Background. One day, not so long ago, I started wondering about how to prove that a continuous function defined on a closed interval is bounded. I remembered that I had been thinking about the same problem many years earlier and that I had not been able to find a proof. I remembered having a runaround experience while trying to locate a point where the function approached infinity. So I gave up trying to construct a proof and looked up the proof in a text book. I found that the proof was smart, a proof that I could not depend upon being able to find myself.
In spite of this, I continued to think that a straightforward proof ought to exist for this theorem. I fantasised about there being two kinds of mathematicians, the exploring mathematician and the smart mathematician; the exploring mathematician being the pioneer who is the first to prove a theorem. The exploring mathematician would apply engineering-like methods to construct a proof, which might prove to be long and complicated. Then a smart mathematician would follow and replace the exploring mathematician's proof with a smarter and more elegant proof. The new proof would stand the test of time better than the original proof and survive to the text books. Whether this distinction has anything to do with reality, I don't know.
Nevertheless, I wanted to try to construct a more elementary proof of the theorem. So I did (or tried to), and I posted this article to demonstrate what such an exploring mathematician's proof might look like. Below, Proof 1 is the "smart" proof and Proof 2 is my proof.
The theorem¹. A continuous function defined on a closed interval is bounded.
Proof 1. I'll only sketch the proof here. One assumes the contrary, that the continuous function is unbounded. One divides the interval into two equal parts. One observes that the function has to be unbounded on at least one of the two parts. One chooses one of the parts, divides that interval into two equal parts, chooses the part where the function is unbounded, and so on. The lengths of the chosen intervals approach zero, and the leftmost endpoints of the intervals approach a point in the function's interval of definition. One uses the concept of supremum to prove that the sequence of endpoints converges. The function's continuity at that point is then shown to imply that the function cannot be unbounded on an interval around that point after all.
Comment. One needs to verify that the unboundedness property and the continuity property collide. I think a difficulty lies in the fact that the unboundedness property is the property of an area or interval whereas the continuity property is the property of a point. (The unboundedness property is not the property of a point in the interval of definition because if a function were unbounded or infinite at a point it would be undefined there.) I guess the smartness of Proof 1 lies in the fact that the function's unboundedness property is included in every step of the process. So to speak, one keeps the unboundedness property in hand all the way while one searches for a point in order to use the function's continuity there.
Proof 2. Let the function's name be f and its closed interval of definition be [a,b]. As in Proof 1, we assume the opposite that f is unbounded, and aim at a contradiction.
We start looking for f's unboundedness at a. Since f is defined at a, it is finite there, so a is not the point we are searching for. If a = b, f cannot be unbounded, so we deduce that b must be greater than a.
Now let the argument of f, call it x, increase continuously from the point a while we look for a point where |f(x)| = |f(a)| + 1. Since f is continuous, f(x) varies continuously as x increases, so there are no jumps in the function value. The first point we encounter that satisfies |f(x)| = |f(a)| + 1 we call x_1. We proceed from x_1, moving continuously towards higher values of x, looking for a point where |f(x)| = |f(x_1)| + 1, and call the first such point x_2. Proceeding in this way we get a sequence of points {x_1, x_2, ...}.
Suppose we reach b, and we find that our sequence contains a finite number of elements. If the sequence is empty, f is bounded on [a,b] by |f(a)| + 1, otherwise f is bounded by |f(x_n)| + 1 where x_n is the last element in {x_1, x_2, ...}. In both cases f is bounded. We conclude that the number of points in {x_1, x_2, ...} is infinite.
The function value at a point x_i in {x_1, x_2, ...} for an i > 1 satisfies |f(x_i)| = |f(x_(i-1))| + 1. We find |f(x_i)| = |f(a)| + i for i ≥ 1 which implies that |f(x_i)| approaches ∞ as i increases.
Since the monotonic sequence {x_1, x_2, ...} is bounded (by b), it converges². Call the point of convergence x_0.
Choose an ε > 0. We have:
1. Since f is continuous at x_0, there is a δ > 0 so that |f(x) - f(x_0)| < ε for all x satisfying |x - x_0| < δ. Now
|f(x) - f(x_0)| < ε ⇒
|f(x) - f(x_0)| + |f(x_0)| < ε + |f(x_0)| ⇒
|f(x) - f(x_0) + f(x_0)| ≤ |f(x) - f(x_0)| + |f(x_0)| < ε + |f(x_0)| ⇒
|f(x)| ≤ |f(x) - f(x_0)| + |f(x_0)| < ε + |f(x_0)| ⇒
|f(x)| < ε + |f(x_0)|
We conclude:
|f(x)| < |f(x_0)| + ε whenever |x - x_0| < δ (1).
2. Since {x_1, x_2, ...} converges towards x_0, there is an integer m1 so that
|x_i - x_0| < δ whenever i > m1 (2).
3. Since |f(x_i)| approaches ∞ with i, there is an integer m2 so that
|f(x_i)| > |f(x_0)| + ε whenever i > m2 (3).
Now choose m = max (m1,m2). This gives us new versions of (2) and (3):
|x_i - x_0| < δ whenever i > m (2')
|f(x_i)| > |f(x_0)| + ε whenever i > m (3').
The continuity condition (1) expresses that f is bounded by |f(x_0)| + ε on the open interval (x_0 - δ, x_0 + δ) whereas the unboundedness or divergence condition (3') expresses that |f(x_i)| is greater than this bound for an i satisfying i > m, call it j. Now since x_j lies in (x_0 - δ, x_0 + δ) according to the convergence condition (2'), we deduce from (1):
|f(x_j)| < |f(x_0)| + ε
and from (3'):
|f(x_j)| > |f(x_0)| + ε.
These two inequalities are inconsistent, and this contradiction allows us to conclude that our original assumption of f being unbounded is false. QED.
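To illustrate the construction of {x_1, x_2, ...} in Proof 2, here is a small Python sketch. Since no continuous function on a closed interval actually is unbounded, it uses my own example f(x) = 1/x on the half-open interval (0, 1], where the hypothesis of the theorem fails; there the sequence really is infinite and accumulates at the bad point 0 (the march direction is reversed: x decreases):

    # Walk from a = 1 towards 0, collecting the points where |f| first
    # exceeds its previous record by 1, mirroring {x_1, x_2, ...} in Proof 2.
    f = lambda x: 1.0 / x
    a = 1.0
    level = abs(f(a)) + 1.0
    x, step = a, 1e-6
    points = []
    while x > step and len(points) < 10:
        x -= step
        if abs(f(x)) >= level:
            points.append(x)
            level = abs(f(x)) + 1.0
    print(points)   # approximately 1/2, 1/3, 1/4, ..., converging towards 0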
References.
1. Tom M. Apostol, Calculus, Volume 1, One-Variable Calculus, with an Introduction to Linear Algebra, Second edition, 1967, p. 150.
2. Ibid., p. 381.
In short. We start with an example: A ⇒ B, B ⇒ C, C ⇒ D, where A is an assumed condition and D is a condition contradictory to A. According to proof by contradiction(a), we conclude ¬A. We demonstrate the truth of this conclusion by proving that A ⇒ B ∧ C ∧ D and then showing that this implication reduces to ¬A.
Background. When we construct proofs, we often use the technique of assuming something to be true while suspecting it is false, and then proceeding to derive a contradiction(b), which allows us to conclude (and prove) that what we assumed to be true is instead false.
I often wondered what the logic behind this kind of reasoning was like. Maybe I learned it some long time ago, but I am not sure. As a matter of fact, I think I never learned it. Therefore, I thought it might be fun to do a bit of exploring in order to find out what the logic of proof by contradiction might look like, and this thread is the result.
Outline. Section A shows an example of the use of proof by contradiction, and section B reveals the logic behind the results in section A.
Proven statements are identified by numbers within parentheses.
The right part of the line containing a statement may contain a comment indicating how the statement was produced.
A. An example of the use of proof by contradiction.
I guess there are many ways a proof by contradiction(a) can go. I have chosen to demonstrate one of them.
First we assume A to be true. Note that this does not mean that we consider A to be a proven statement. We proceed by deducing something from A:
A ⇒ B (1)
Then we deduce something from B:
B ⇒ C (2)
Then another deducing step:
C ⇒ D (3)
where D is the contradictory condition(b) we are seeking. We conclude from this by the principle of proof by contradiction(a) that:
¬A (4)
B. An analysis of the proof by contradiction in section A.
From the combined implications in section A, we here produce what we might call the grand formula of proof by contradiction. Then we proceed by showing that this formula reduces to (4) which means that ¬A is indeed true.
1. First we combine (1) and (2) into one condition:
(A ⇒ B) ∧ (B ⇒ C) (5) (1) and (2)
which is true because (1) and (2) are true.
Next we manipulate (5). Note: I use = instead of ⇔ since the two mean the same on the set Bool(e).
(A ⇒ B) ∧ (B ⇒ C) = (5)
(¬A ∨ B) ∧ (B ⇒ C) = by A ⇒ B is defined as ¬A ∨ B(d)
¬A ∧ (B ⇒ C) ∨ B ∧ (B ⇒ C) = by law of distributivity(c)
¬A ∧ (true) ∨ B ∧ (¬B ∨ C) = by (2) and B ⇒ C is defined as ¬B ∨ C(d)
¬A ∨ (B ∧ ¬B ∨ B ∧ C) = by law of identity(c) and law of distributivity(c)
¬A ∨ (false ∨ B ∧ C) = by law of complements(c)
¬A ∨ B ∧ C = by law of identity(c)
A ⇒ B ∧ C (6) by A ⇒ B ∧ C is defined as ¬A ∨ B ∧ C(d)
2. Now we combine (6) and (3):
(A ⇒ B ∧ C) ∧ (C ⇒ D) (7) (6) and (3)
which is true because (6) and (3) are true.
By using (3), we treat (7) in the same way as we used (2) to treat (5):
(A ⇒ B ∧ C) ∧ (C ⇒ D) = (7)
(¬A ∨ B ∧ C) ∧ (C ⇒ D) = by A ⇒ B ∧ C is defined as ¬A ∨ B ∧ C(d)
¬A ∧ (C ⇒ D) ∨ B ∧ C ∧ (C ⇒ D) = by law of distributivity(c)
¬A ∧ (true) ∨ B ∧ C ∧ (¬C ∨ D) = by (3) and C ⇒ D is defined as ¬C ∨ D(d)
¬A ∨ (B ∧ C ∧ ¬C ∨ B ∧ C ∧ D) = by law of identity(c) and law of distributivity(c)
¬A ∨ (false ∨ B ∧ C ∧ D) = by law of complements(c)
¬A ∨ B ∧ C ∧ D = by law of identity(c)
A ⇒ B ∧ C ∧ D (8) by A ⇒ B ∧ C ∧ D is defined as ¬A ∨ B ∧ C ∧ D(d)
(8) is an example of what I call the grand formula of proof by contradiction. It expresses that we can combine the statements (1), (2), and (3) into one statement, i.e. (A ⇒ B) ∧ (B ⇒ C) ∧ (C ⇒ D), replacing "B) ∧ (B ⇒" with "B ∧" and "C) ∧ (C ⇒" with "C ∧".
3. Let us study the contradictory(b) situation between A and D a bit closer. This situation means that A and D cannot be true at the same time. Therefore, if A is true then D is false and vice versa.
Let us have a closer look at (8):
A ⇒ B ∧ C ∧ D = (8)
¬A ∨ B ∧ C ∧ D = by A ⇒ B ∧ C ∧ D is defined as ¬A ∨ B ∧ C ∧ D(d)
true ∧ (¬A ∨ B ∧ C ∧ D) = by law of identity(c)
(A = true ∨ A = false) ∧ (¬A ∨ B ∧ C ∧ D) = as (A = true ∨ A = false) is true as A is either true or false
(A = true) ∧ (¬A ∨ B ∧ C ∧ D) ∨ (A = false) ∧ (¬A ∨ B ∧ C ∧ D) =
by law of distributivity(c)
(A = true) ∧ (false ∨ B ∧ C ∧ false) ∨ (A = false) ∧ (true ∨ B ∧ C ∧ D) =
as D (the first one) is false when A is true
(A = true) ∧ (false ∨ false) ∨ (A = false) ∧ (true) =
as B ∧ C ∧ false = false(f1) and true ∨ B ∧ C ∧ D = true(f2)
false ∨ ¬A = by law of identity(c) and (A = true) ∧ false = false(f1) and (A = false) = ¬A
¬A by law of identity(c)
which is the same as (4), which is what we wanted to prove.
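Both the grand formula and the conclusion (4) can be verified by brute force over the set Bool(e). A small Python sketch (my own check):

    from itertools import product

    def implies(p, q):
        # p => q is defined as (not p) or q, material implication(d).
        return (not p) or q

    for A, B, C, D in product([False, True], repeat=4):
        chain = implies(A, B) and implies(B, C) and implies(C, D)   # (1), (2), (3)
        grand = implies(A, B and C and D)                           # (8)
        assert implies(chain, grand)   # the grand formula follows from the chain
        if chain and not (A and D):    # premises hold and A, D are contradictory
            assert not A               # ... which forces (4): not A
    print("all 16 assignments checked")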
References.
(a) Proof by contradiction, https://en.wikipedia.org/wiki/Proof_by_contradiction.
(b) Contradiction, https://en.wikipedia.org/wiki/Contradiction.
(c) Boolean algebra (structure), https://en.wikipedia.org/wiki/Boolean_algebra_(structure).
(d) Material implication (rule of inference), https://en.wikipedia.org/wiki/Material_implication_(rule_of_inference).
(e) Logical biconditional, https://en.wikipedia.org/wiki/Logical_biconditional (see Truth table, the column A↔B).
(f) Boolean algebra, https://en.wikipedia.org/wiki/Boolean_algebra.
(f1) See Logical operation Conjunction.
(f2) See Logical operation Disjunction.
Now I have updated post #4. I have removed the word differentiable and also made other changes in order to make the mathematics more precise.
I see, Bob. I agree that f is not differentiable at x_0. I'm sorry about that. I propose to replace in post #4:
"Suppose f is differentiable at a point x_0 and that its derivative at x_0 is infinite"
by:
"Suppose f is not differentiable at a point x_0 because its derivative is infinite there"
Well, anonimnystefy, let's see. We go to the Wikipedia page named "Continuous function", the paragraph "Weierstrass and Jordan definitions (epsilon–delta) of continuous functions" and the sentence "For any positive real number ..." This sentence expresses something like: given an ε > 0, a δ > 0 exists ..., where ε is part of a condition in the codomain and δ is part of a condition in the domain. Topsy-turvy continuity, on the other hand, looks like: given a δ > 0, a k > 0 exists ..., where δ is part of a condition in the domain and k is part of a condition in the codomain.
To summarise:
The continuity of a function f at x_0 is defined:
Given an ε > 0, a δ > 0 exists so that ∣f(x) - f(x_0)∣ < ε whenever ∣x - x_0∣ < δ.
The topsy-turvy continuity of a function f at x_0 is defined:
Given a δ > 0 a k > 0 exists so that |f(x) - f(x_0)| ≤ k |x - x_0| whenever |x - x_0| < δ.
No, Bob, I haven't. I suppose you are thinking about whether continuity implies topsy-turvy continuity. I don't think this is true, so a counterexample would be called for. I believe the problem is connected to points in f's domain where f has an infinite derivative. Let's have a closer look.
Let f have an infinite derivative at a point x_0. This means that (f(x) - f(x_0)) / (x - x_0) approaches infinity as x approaches x_0, or, in other words, that for every M > 0 a δ > 0 exists so that (f(x) - f(x_0)) / (x - x_0) > M whenever |x - x_0| < δ and x ≠ x_0. (We need the condition x ≠ x_0 to avoid 0/0, which is undefined.) Now we take the absolute value of the left hand side of this inequality and multiply both sides by |x - x_0|, and we get
|f(x) - f(x_0)| > M |x - x_0| whenever |x - x_0| < δ and x ≠ x_0 (1)
Now, topsy-turvy continuity at x_0 means: given a δ' > 0, a k > 0 exists so that
|f(x) - f(x_0)| ≤ k |x - x_0| whenever |x - x_0| < δ' (2)
Let us try to prove that f is not topsy-turvy continuous at x_0. We do that by proposing (2) to be true and see if we can derive a contradiction.
We choose M = k + 1 in (1). Consider the point x_1 = x_0 + min(δ', δ)/2. Since |x_1 - x_0| < δ and |x_1 - x_0| < δ', x_1 fits into both (1) and (2). From (1) we get:
|f(x_1) - f(x_0)| > (k + 1) |x_1 - x_0|,
and from (2) we get:
|f(x_1) - f(x_0)| ≤ k |x_1 - x_0|.
These two inequalities are contradictory to each other, and we conclude that our proposition of (2) being true is false. Therefore f is not topsy-turvy continuous at x_0.
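A concrete instance (my own example) is f(x) = sqrt(|x|) at x_0 = 0, which is continuous but has an infinite derivative there. A small Python sketch showing that the candidate slopes |f(x) - f(0)| / |x - 0| are unbounded near 0, so no k can work:

    import math

    f = lambda x: math.sqrt(abs(x))
    # The candidate slopes |f(x) - f(0)| / |x - 0| blow up near x_0 = 0, so no
    # k can satisfy |f(x) - f(0)| <= k |x - 0| on any interval around 0:
    for x in [1e-2, 1e-4, 1e-6, 1e-8]:
        print(x, abs(f(x) - f(0.0)) / abs(x))   # 10, 100, 1000, 10000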
Background. As you would know, the continuity(1)(2) of a function f at x_0, a point in the domain of f, is defined using a condition in f's codomain first ("given ε > 0, ∣f(x) - f(x_0)∣ < ε") and then proceeding with a condition in f's domain ("then there exists a δ > 0 so that ∣x - x_0∣ < δ").
Once a friend of mine remarked that he felt that the definition of continuity is unnatural in that it starts in a function's codomain instead of in its domain. I remember that I had the same feeling myself when I learned about continuity. I guess that the natural approach, as indicated by my friend, was adopted first, and that the standard definition came later and proved to be more fruitful than the natural approach in some way. (This seems indeed to have been the case.(2))
I thought that it might be fun to see what such a natural approach would look like. (If you disagree this is the place to stop reading this thread.) So I put my mathematical explorer's boots on and dived into the jungle of mathematics. I wanted to see if I could build myself a natural definition of continuity and compare this concept with the existing definition of continuity. I decided to call this natural definition of continuity topsy-turvy continuity because it does things in the opposite way as compared to the existing definition of continuity. The result follows below.
Definition. A function f is said to be topsy-turvy continuous at a point x_0 if for a given δ > 0 a k > 0 exists such that whenever |x - x_0| < δ, it is true that |f(x) - f(x_0)| ≤ k |x - x_0|.
Comments:
- This is just a simple definition of topsy-turvy continuity.
- Note that k may depend on x_0 and δ.
- Note that ≤ is used instead of <. This is to cover the case x = x_0 (a < would have given 0 < 0).
Now we must check that this definition relates to the standard definition of continuity in a proper way. A topsy-turvy continuous function should be continuous. However, the topsy-turvy continuity concept may be stronger than standard continuity, so that a continuous function need not be topsy-turvy continuous at all points. Therefore, topsy-turvy continuity should imply continuity, but not necessarily the other way around. Indeed it does, as is stated in the following theorem.
Theorem. A topsy-turvy continuous function f is continuous.
Proof: Let ε > 0 be given. Choose some point x_0 in the domain of f. Since f is topsy-turvy continuous at x_0, taking δ = ε in the definition gives a positive k so that when |x - x_0| < ε it is true that |f(x) - f(x_0)| ≤ k |x - x_0|.
Define δ = min (ε, ε/k). We observe that δ > 0. Since we have δ ≤ ε, the topsy-turvy continuity condition |f(x) - f(x_0)| ≤ k |x - x_0| still holds with the same constant k when |x - x_0| < δ.
When |x - x_0| < δ it is also true that |x - x_0| < ε/k since δ ≤ ε/k, and multiplying both sides by k yields k |x - x_0| < k*ε/k = ε. Using this in |f(x) - f(x_0)| ≤ k |x - x_0| gives |f(x) - f(x_0)| < ε still maintaining |x - x_0| < δ.
We conclude that for an ε > 0 a δ > 0 exists so that |f(x) - f(x_0)| < ε whenever |x - x_0| < δ, and this means that f is continuous at x_0. QED.
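A numerical illustration of the proof's choice δ = min(ε, ε/k) (the function f(x) = x², the point x_0 = 1, and the constant k = 3, which works since |x + x_0| ≤ 3 when |x - x_0| < 1, are my own example):

    f = lambda x: x * x
    x0, k, eps = 1.0, 3.0, 0.01
    delta = min(eps, eps / k)   # the delta constructed in the proof
    # Sample points with |x - x0| <= delta and confirm |f(x) - f(x0)| < eps.
    n = 1000
    ok = all(
        abs(f(x0 + (2.0 * j / n - 1.0) * delta) - f(x0)) < eps
        for j in range(n + 1)
    )
    print(delta, ok)   # 0.00333..., True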
References:
(1) Tom M. Apostol, Calculus, Volume 1, One-Variable Calculus, with an Introduction to Linear Algebra, Second edition, 1967, p. 130-131.
(2) Continuous function, Wikipedia, https://en.wikipedia.org/wiki/Continuous_function.
Thanks for your welcome, bobbym. I am glad to be here.
Thanks for your post, scientia. I am not as familiar with the various kinds of mappings as you are. As you have probably observed, I do not use mappings in my proof; I simply use the fact that if A is a subset of B and B is a subset of A, then A and B are identical.
I'll see if I can squeeze my brain into solving one of your exercises.
Suppose you have two lists of elements, and you are wondering if they contain the same elements. You observe that the lists have the same lengths. You check that every element of the first list exists among the elements of the second list. To complete the process, you must find out if every element of the second list exists among the elements of the first list. Now you stop and think: is it necessary to do this last step? No, it is not because the following theorem exists:
Theorem: If two finite sets have the same number of elements and one set is a subset of the other, the two sets are equal.
Proof: We name the sets A and B, and let A be the subset. We argue by contradiction: we assume that the theorem is false, meaning that A and B are not equal. Since A is a subset of B, there must be an element of B which is not in A, because otherwise A and B would be equal. Since B contains A and an element that does not exist in A, the number of elements of B is ≥ the number of elements of A + 1. However, this contradicts the requirement that the number of elements of B equals the number of elements of A. We have thus arrived at a contradiction and have proved the theorem.
This theorem has some practical benefit. Suppose, for example, that you are considering two lists of file names. They are sorted differently, and you suspect them to contain the same elements. Seeing that the lists have the same lengths is often easy. If the lists have, say, 3 elements, glancing back and forth between them a few times is sufficient to compare them, but if the lists have 5-6 elements or more, this eye method becomes unreliable and a more systematic method, like one of the two indicated above, must be used. However, both of these methods share the weakness that comparing the two lists element by element is often a complicated, tedious, and time-consuming task. It is the amount of time spent comparing the lists that the theorem helps you reduce.
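A small Python sketch of the shortcut the theorem justifies (the helper and the file names are hypothetical; it assumes the entries within each list are distinct, as file names are):

    def same_elements(list_a, list_b):
        # By the theorem: if the lengths agree and every element of one list
        # occurs in the other, the two element sets are equal; the reverse
        # inclusion need not be checked. Assumes distinct entries per list.
        if len(list_a) != len(list_b):
            return False
        return all(name in list_b for name in list_a)

    files_1 = ["notes.txt", "proof.tex", "diagram.png"]
    files_2 = ["diagram.png", "notes.txt", "proof.tex"]
    print(same_elements(files_1, files_2))   # True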