# Calculus

100% UNDER CONSTRUCTION

{ BEWARE: I am not a mathematician, this will be dumbed down for noobs and [programmers](programming.md) like me, actual mathematicians may suffer brain damage reading this. ~drummyfish }

Calculus is a somewhat infamous but hugely important area of advanced [mathematics](math.md) whose focus lies in studying **continuous change**: for example how quickly a [function](function.md) grows, how fast its growth "accelerates", in which direction a multidimensional function grows the fastest etc. This means in calculus we stop being preoccupied with actual immediate values and start focusing on their CHANGE: things like velocity, acceleration, slopes, gradients etc., in a highly generalized way. Calculus is one of the first disciplines one gets confronted with in higher math, i.e. when starting university, and for some reason it's a very feared subject among students, to whom the name sounds like a curse, although the basics aren't more difficult than other areas of math (that's not to say it shouldn't be feared, just that other areas should be feared equally so).

Although from high school textbooks it's easy to acquire the impression that all problems can be solved without calculus and that it will therefore be of little practical use, the opposite is in fact true: in [real world](irl.md) EVERYTHING is about change, proof of which is the fact that in [physics](physics.md) the most important phenomena are described by **[differential equations](differential_equation.md)**, i.e. basically "calculus equations" -- it turns out that many things depend on the rate of change of some variable rather than on the variable's direct value: for example air friction depends on how fast we are moving (how quickly our position is changing), our ears hear thanks to CHANGE in air pressure, electric current gets generated by CHANGE of magnetic field etc.

Calculus is very similar to (and the term is sometimes used interchangeably with) *mathematical analysis* (the difference is basically that analysis tries to [prove](prove.md) what calculus does, at least according to the "[Internet](internet.md)"). The word *calculus* is also sometimes used to signify any "system for making calculations", for example [lambda calculus](lambda_calculus.md).

Is this of any importance to a programmer? Fucking YES, you can't avoid it. [Physics engines](physics_engine.md), [machine learning](machine_learning.md), smooth [curves](curve.md) and surfaces in computer graphics, [interpolation](interpolation.md) and animation, scientific simulations, [electronics](electronics.md), [robotics](robotics.md), [signal](signal.md) processing and all kinds of other shit REQUIRE at least the basics of calculus.

In essence there are two main parts to calculus, two mathematical "operations" that work with functions and are opposite to each other:

- **Derivative** (differentiation): says how (how fast and in which direction) a given function changes.
- **Integral** (integration): opposite of derivative -- given a function of "change" we get back the original function (well, this is just one possible way to view it, but sufficient for now).

One thing shows here: one of the reasons why calculus is considered advanced is probably that instead of simple numbers we suddenly start working with whole [functions](function.md), i.e. we have operators that we apply to functions and we get new functions -- this requires some more [abstract](abstraction.md) thinking as a function is harder to imagine than a number. But then again it's not anything too difficult, it just requires some preliminary study to get familiar with what a function actually is etc.

Now listen up, here comes the truth about calculus. Doing it correctly and precisely is difficult and sometimes literally impossible, and this is left for mathematicians.
Programmers and engineers HAVE TO know the basic theory, but we are largely saved by one excellent thing: **[numerical](numerical.md) methods**. We can compute derivatives and integrals only [approximately](approximation.md), with algorithms that work for practically any function we can evaluate and which will be [good enough](good_enough.md) for almost everything we ever encounter in practice. Besides, in [digital](digital.md) computers we deal almost exclusively with non-continuous functions anyway, we just have very dense discrete sets of points because in the end we only have finite memory, integer values and sampled data, so there is nothing more natural than numerical methods here. So where a mathematician spends years trying to figure out how to precisely sum up infinitely many infinitely small parts of some weird function, we just write a program that sums up a very big number of very tiny parts and call it a day. Still there exist programs for so called *symbolic computation* that try to automatically do what the mathematician does, i.e. apply reasoning to get precise results, but these belong to some quite specialized areas.

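
To make the "sum up a very big number of very tiny parts" idea a bit more concrete, here is a minimal sketch in C. It assumes the so called midpoint rule (the height of each strip is sampled in the middle of the strip), which is just one of many possible choices, and the names `integrateMidpoint` and `square` as well as the parameter `n` are made up for this illustration.

```c
/* Approximates the definite integral of f over the interval [a,b] by summing
   the areas of n thin rectangles, each sampled in the middle of its strip
   (midpoint rule). Name and parameters are illustrative assumptions only. */
double integrateMidpoint(double (*f)(double), double a, double b, int n)
{
  double width = (b - a) / n; // width of one strip
  double sum = 0;

  for (int i = 0; i < n; ++i)
    sum += f(a + (i + 0.5) * width) * width; // strip height times its width

  return sum;
}

double square(double x) // sample function to integrate: x^2
{
  return x * x;
}
```

For instance `integrateMidpoint(square,0,1,1000)` should come out very close to 1/3, which is the exact area under *x^2* between 0 and 1. More strips generally give a better approximation at the cost of more computation.
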
*[Figure: ASCII graph showing a function (`x`), its derivative (`*`) and (one of) its integral(s) (`#`).]*

The basics of calculus aren't that hard, however it can go deeper and deeper and one can probably dedicate a whole life just to learning more and more; as you learn the basic derivatives and integrals, you move on to multidimensional calculus, vector calculus, integrating over curves and surfaces, various esoteric methods of analytical and numerical integration etc.

## Derivative

Derivative finds how **quickly a function grows** at any given point. DOING derivatives is called **differentiation** (confusingly, because differential is a term distinct from derivative).

Since derivative and integral are opposite operations, one would assume they'd be equally difficult to handle, but no, derivative is the **easier** part! So it's always taught first. It's kind of like multiplication and division -- multiplication is a bit easier (division has remainders, undefined division by zero etc.).

NOTE on notation: there are several notations used for derivatives. We will use a very simple one here: *f'(x)* to us is the derivative of a function *f(x)*. Mathematicians will probably rather like to write *d/dx f(x)*. Just know that this is a thing.

OK, BUT **what exactly IS this "derivative"? What does it say?** Basically derivative is the slope of the **tangent** to the graph of a function at given point.
Derivative of function *f(x)* is a new function *f'(x)* which for given *x* says the **slope** of the graph of function *f(x)* at the point *x*. Slope here means literally the [tangent](tan.md) function of the angle at which the function is increasing (or decreasing). Tangent is defined as the (unitless) ratio of vertical change to horizontal change (for example if a plane is ascending with tangent equal to 2, we know that for every horizontal meter it gains two meters of height). Note that this is mathematically idealized so that no matter how quickly the function changes we really mean the slope at the exact single point, i.e. imagine drawing a tangent line to the graph of the function and then measuring how quickly it changes vertically versus how quickly it changes horizontally. Mathematicians define this using [limits](limit.md) and infinitesimal intervals, but we don't have to care too much about that now, let's just assume it [magically](magic.md) all works now. Here it is shown graphically:

*[Figure: tangent line touching the graph of f(x) at point A on the x axis, with a small right triangle whose horizontal side is dx and vertical side is dy.]*

Here we see a tangent line drawn at the graph of function *f(x)* at point *A*. We can draw the small right triangle as shown -- the derivative at point *A* is now literally computed by dividing *dy* by *dx*.

We can actually try to approximate the ideal derivative (and this is kind of how computers do it with the numerical methods) by computing *(f(x + C) - f(x)) / C* where *C* we set to some small number, for example 10^-10 (on a real computer it must not be too small though, otherwise rounding errors take over). It's basically how it's mathematically defined too, mathematicians just set the *C* to "infinitely small distance".

From this notice that the derivative will be:

- 0 if the function is stationary at the point (i.e. going "horizontally", neither increasing nor decreasing). This is because *dy* will be 0 and 0 divided by any *dx* will be 0.
  This fact is used especially when we're finding where functions have minimum and maximum values, as we know that at these extreme values the function will be stationary.
- Greater than 0 if the function is increasing. This is because *dy* will be positive and since *dx* is always positive, we'll get a positive number by dividing them.
- Less than 0 if the function is decreasing. This is because *dy* will be negative and negative divided by positive *dx* is negative.

Now it's important to say that derivatives can only be done with **differentiable** functions, i.e. ones that in fact DO have a derivative. This cyclic definition only says there indeed exist functions which are NOT differentiable -- imagine for example a function *f(x)* that gives 0 for every *x* except when *x = 1* where *f(1) = 1* -- what's the slope of such function at *x = 1*? How the hell do you wanna differentiate that? Firstly it's infinite (the tangent line goes completely vertically and here computing *dy/dx* just results in division by zero), but we don't even know if it's going up or down (it goes up from left but down to the right), it's just fucked up. Also a function that has holes (is not defined everywhere) clearly isn't differentiable in those holes because if there's nothing to differentiate then what do you wanna do? A function that's not differentiable everywhere may still be differentiable in certain parts of course, but in general if we claim a function is differentiable we imply it's differentiable everywhere. It may also be the case that a function is differentiable but its derivative is not. Actually it further gets a bit more complicated, it is possible that a derivative exists only from "one side" (a one-sided derivative), but we won't go into this. There exist conditions that must hold in order for a function to be differentiable, for example it must be continuous and must not have sharp corners and whatever, just look that up if you need.

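
The difference quotient *(f(x + C) - f(x)) / C* mentioned earlier can also serve as a crude numerical sanity check of differentiability: we compute the slope once from the right and once from the left and look at whether the two agree. The following is only a sketch under assumptions: the step `STEP`, the tolerance parameter `tol` and all function names are invented for this illustration, and two numbers agreeing of course proves nothing mathematically.

```c
#define STEP 1e-6 // assumed small step, not a carefully tuned value

/* Slope estimated from the right: (f(x + STEP) - f(x)) / STEP */
double slopeRight(double (*f)(double), double x)
{
  return (f(x + STEP) - f(x)) / STEP;
}

/* Slope estimated from the left: (f(x) - f(x - STEP)) / STEP */
double slopeLeft(double (*f)(double), double x)
{
  return (f(x) - f(x - STEP)) / STEP;
}

/* Returns 1 if both one-sided slopes differ by less than tol, i.e. if the
   function at least numerically LOOKS differentiable at x, otherwise 0. */
int looksDifferentiable(double (*f)(double), double x, double tol)
{
  double d = slopeRight(f,x) - slopeLeft(f,x);
  return d < tol && d > -tol;
}

double square(double x) // nice smooth function: x^2
{
  return x * x;
}

double spike(double x) // the function from the text: 0 everywhere except f(1) = 1
{
  return x == 1.0 ? 1 : 0;
}
```

With these definitions `looksDifferentiable(square,2.0,1e-3)` should give 1 (both slopes are close to 4), while `looksDifferentiable(spike,1.0,1e-3)` should give 0 because the slope from the left is hugely positive and the one from the right hugely negative, which is exactly the "up from left but down to the right" problem.
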
OK so to actually compute a derivative of a function we can use some of the following rules:

| *f(x)*        | *f'(x)*                       | comment         |
| ------------- | ----------------------------- | --------------- |
| *n*           | *0*                           | additive const. |
| *x^n*         | *n * x^(n-1)*                 | var. to power   |
| *e^x*         | *e^x*                         |                 |
| *sin(x)*      | *cos(x)*                      |                 |
| *cos(x)*      | *-sin(x)*                     |                 |
| *ln(x)*       | *1/x*                         |                 |
| *a * g(x)*    | *a * g'(x)*                   |                 |
| *g(x) + h(x)* | *g'(x) + h'(x)*               |                 |
| *g(x) * h(x)* | *g'(x) * h(x) + g(x) * h'(x)* |                 |
| *g(h(x))*     | *g'(h(x)) * h'(x)*            | chain rule      |

**Monkey example**: let's try to find the derivative of this super simple function:

*f(x) = x^2 - 2 * x + 3*

Its graph looks like this:

*[Figure: graph of f(x), a parabola with its minimum value 2 at x = 1, passing through value 3 at x = 0 and x = 2.]*

To differentiate this function we only need to know (from the table above) that a derivative of a sum equals the sum of derivatives, that multiplicative constants just stay in place, and then invoke a simple rule: derivative of *x^N* is *N * x^(N-1)*. We have very little [work](work.md) to do here because there are no composed functions and similar shit, so we simply get:

*f'(x) = 2 * x - 2*

So *x^2* became *2 * x*, *-2 * x* became just -2 (because *x^0 = 1*) and *3* just disappeared (this always happens to additive constants -- notice that such constants don't affect the function's slope in any way, so that's why). The graph of the derivative looks like this:

*[Figure: graph of f'(x), a straight line with slope 2 that crosses the y axis at -2 and the x axis at x = 1.]*

Things to notice here are:

- The derivative has value 0 at *x = 1*, which means the function is stationary at this point -- checking out the graph of the original function we see it really is so, the function turns there from decreasing to increasing.
- Before *x = 1* the derivative is negative, meaning the function is decreasing (checks out). The slope is also increasing gradually, meaning the function slows down in decreasing its value.
- After *x = 1* the opposite is true: the slope is positive and keeps increasing, i.e.
  the function starts increasing AND it keeps increasing faster and faster.
- ...

**OK but what if we differentiate the derivative lol?** This is legit, it will give us a **higher order derivative** and it is very useful and common. When we see the first derivative as the "speed" of the function's change, the second order derivative gives us the "speed" of the speed of the function's change, i.e. basically its acceleration. We will write the second order derivative of function *f(x)* as *f''(x)*. This can for example tell us where the function is convex versus concave (how it is "bent"), which again helps with finding minimum and maximum values etc. Of course we may continue and make third order derivative, fourth etc.

Next we must mention **partial derivatives** which are basically **multidimensional** derivatives, i.e. ones we do with functions of multiple variables. There is one important thing to mention: when differentiating a function of multiple variables, we have to say with respect to which variable we are differentiating, which is an equivalent of choosing the axis along which we differentiate. Practically this will result in us treating the non-chosen variables as if they were constants. So say we have a function of two variables *f(x,y)*: we can differentiate it with respect to the variable *x* and also *y*, i.e. we get two different derivatives. If we imagine the function *f(x,y)* as a two dimensional [heightmap](heightmap.md), then the derivative with respect to *x* means we are getting a slope as if we're going in the *x* axis direction (and accordingly the same holds for *y*). This is why they are called *partial* derivatives: there are multiple derivatives, multiple *parts*. Making a [vector](vector.md) out of all partial derivatives will give us a **[gradient](gradient.md)** which is kind of an "arrow" that tells us in which direction the function increases the fastest (the opposite direction is then that of the fastest decrease).
This is very important for example for machine learning where we are trying to minimize the error function by repeatedly stepping against the gradient (i.e. "downhill") etc. All this is beyond the scope of this article though.

## Integral

Integral is the opposite of derivative. There are usually two main ways to interpret what an integral means:

- Literally the opposite of derivative, i.e. it takes a function, which is interpreted as the rate of change, and gives us back the original function.
- Geometric interpretation: integral gives the [area](area.md) under the graph of a function, while taking the area below zero to be negative. This is subsequently seen as a **[sum](sum.md)** of infinitely many small "strips" into which we cut the graph of the function. All in all integral can be thought of as a kind of fancy sum, and even the symbol for it is a big weird *S*.

Both of these interpretations are equivalent in that we will compute the same thing, they only differ in how we think of what we are computing.

As already claimed in the section on derivative, integrating is **more difficult** than differentiation. Some reasons for this are:

- There is no simple [algorithm](algorithm.md) for integrating a general function (only for some specific cases) and many functions do NOT have an integral that could be written down as a formula at all! I.e. while we can make a derivative of any (differentiable) function by just following simple rules, getting an integral of a function is often a matter of trial and error, integrating is a kind of [art](art.md) that has to be learned. This may come as a surprise but it is so, it is similar to how for example factoring a number is much more difficult than multiplying the factors back.
- Unlike with derivatives there are infinitely many integrals of a given function because functions that only differ by an added constant will give the same derivative (for example the functions *f(x) = x* and *g(x) = x + 1* both have the same derivative) -- so when we're integrating we always get a function that has a variable additive constant in it.
- Integrals don't have some nice mathematical properties that derivatives have, so we can't assume as much, for example a derivative of an elementary function is always an elementary function (the set of elementary functions is closed under differentiation) but this is not the case for an integral.
- Integrating a function typically makes it more complex (e.g. the exponents of variables increase), unlike with derivatives where we are often simplifying the function.
- Integrals don't usually make sense at single points, they are related to [intervals](interval.md). While with derivatives it's completely fine to ask "what's the derivative of this function at this single point", with integrals we always have to ask "what's the integral between points A and B".
- Related to the previous point is also the fact that derivative is basically a local operator concerned only with a single point and a small area around it, while integral is accumulating information over a bigger area, i.e. it's more complex in that we have to consider the function more globally.
- As another consequence of the non-local properties of integral there are actually TWO types of integrals: definite and indefinite.
- There exist quite simple functions whose integral simply can't be written as a formula made of elementary functions (for example *sin(x)/x*).

So due to these complications we still have to explain the two different types of integrals:

- **indefinite integral**: This is the FUNCTION we get by performing integration, i.e. the result of indefinite integral is a mathematical expression with variables in it.
  In fact this expression represents an infinite set of functions because it always has the additive constant *C* in it (as hinted above) -- we can kind of ignore this for now. The important gist is this: indefinite integral kind of gives us a general FORMULA that can further be used to compute definite integrals. For example an indefinite integral of function *f(x) = 1* will be *x + C*. In practice the result we are searching for is often a definite integral (a single value), but to compute that precisely we have to start by computing the indefinite integral. However it's often very hard to calculate indefinite integrals -- they are the precise solution and holy grail of integration but in practice we can't always get them and have to resort to approximations.
- **definite integral**: This is a single [NUMBER](number.md) which (applying the geometric interpretation of integral) tells us the AREA below the function graph (with area below zero counting as negative) over some specific INTERVAL, i.e. between two given points A and B. This means that definite integral doesn't give us an expression but rather a quantity. For example a definite integral of function *f(x) = 1* over interval [0,1] will give us 1 (imagine the graph: the area is simply that of a square with side 1). Definite integrals are computed from the indefinite integral by plugging the upper interval number into the indefinite integral (in the place of the variable), then plugging in the lower interval number, and then subtracting the latter from the former. With numeric methods (computer integration) we always only get definite integrals (and actually only their approximate values) -- the computer here skips computing the indefinite integral (as that's hard) and rather like a dumb machine LITERALLY goes by small steps and computes the area below the function graph.
  Small note to this: the computer still can draw a graph of a function's integral by plotting the definite integral value for the interval 0 to *x* for every plotted *x*, because when we think about it, the indefinite integral kind of gives us a function of how a definite integral grows; so the computer can give us a picture of a graph but it generally cannot give us an analytically computed formula of the indefinite integral.

Fun fact: before digital computers engineers used very clever methods to find definite integrals of general functions. [Analog](analog.md) computers were particularly good at integrating, their continuous nature makes them a quite elegant solution to the problem, however a perhaps even more genius method in its [simplicity](kiss.md) was the following: the engineer would draw the function he wanted to integrate on a sheet of paper (or maybe more preferably some kind of heavier material), then cut it out and simply weigh it -- this would give him the fraction of the weight of the whole sheet of paper and so also the fraction of the area below the function graph.

**Example**: we will now try to make an indefinite integral of the function:

*f(x) = 2 * x - 2*

This is the derivative we got in the example of differentiation, so by integrating we should get back the original function we differentiated there.

Now for the **notation**: the symbol for integral is kind of a big italic *S* ([Unicode](unicode.md) U+222B), but for [simplicity](kiss.md) we will just use the uppercase letter *I* here. With indefinite integrals only the symbol alone is used. For definite integrals we additionally write the interval over which we make the integral, i.e. *I(A,B)* (normally *A* is written at the bottom and *B* at the top), where *A* and *B* say the interval. So we will now write our indefinite integral like this:

*I (2 * x - 2) dx*

**Wait dude WHAT THE FUCK is this dx shit at the end?** This question is expected.
Look: it has to do with the theory behind what the integral mathematically means, for starters one can just ignore it and remember that integral starts with *I*, then the integrated function follows, and then there is *dx* at the end. But to give a bit of explanation: firstly notice the *dx* tells us what the integrated variable is -- usually we have a function with single variable *x* and so it's pretty clear, but once we move to more dimensions we'll have more variables and this *dx* tells us what is a variable (i.e. along which axis we are integrating) and what is to be treated as a constant (maybe this doesn't yet make much sense but with integration there is a big difference between a variable and a constant, even if they are both represented by a letter).

The real reason for *dx* is that the integral really represents an **infinite sum**. Have you ever seen that big sigma symbol for a sum? The integral symbol (here *I*) is like this, it likewise says "make an infinite sum of what will follow". But if we take a function and make infinitely many steps and keep summing the values the function gives us, we will just get [infinity](infinity.md) as the sum, so something is missing. In fact we don't want to sum the function values but rather areas of "tiny strips" we are kind of drawing below the function graph -- now a strip is basically a rectangle: area of a rectangle is computed as its height times its width. Height of the rectangle is the function value (here *2 * x - 2*) and width is *dx*, which represents the "infinitely narrow" interval. This is just to give some idea about WHY it looks like this, but it's cool to ignore it for now.

So now the fuck we can finally move on. Our integral is really easy because it's just a sum of two expressions (and an integral of a sum thankfully equals a sum of integrals) that can be integrated easily.
So from the rule *I x^N dx = x^(N + 1) / (N + 1) + C* (which holds for any *N* other than -1) we deduce that the integral of *2 * x* is *2 * x^2 / 2 = x^2* and the integral of *-2* is *-2 * x*, so we get:

*I (2 * x - 2) dx = x^2 - 2 * x + C*

A few things to note here now:

- Notice the additive constant *C* at the end. We always have to include this constant in the result of indefinite integral, like already mentioned. For example imagine if we set *C = 0*, then we'll get a function *x^2 - 2 * x*, and if we differentiate this back, we'll get the function we integrated: *2 * x - 2*. But we will also get the same function no matter what *C* we set because, like explained in the derivative section, additive constants disappear in differentiation. So just never forget this constant. We didn't obtain a single function but an infinite set of functions that differ just by the value of *C* (i.e. their graphs are just vertically shifted).
- We in fact DID receive back the original function from the derivative example, which was *x^2 - 2 * x + 3*, which confirms our result as correct. Or, as per above, we should rather say again that this function is a part of the set of functions we computed, one with *C = 3*.

Our example integral wasn't that hard, right? Yes, this was extremely easy, but once you start integrating something with composed functions (functions inside other functions) you'll get into all sorts of trouble.

Now let's finish with computing a definite integral, OK? Let's say we want to compute the integral over interval 0 to 1, i.e. we'll write:

*I(0,1) (2 * x - 2) dx*

Above we said this is done by computing the indefinite integral (already done), then plugging in the upper and lower bound and subtracting, so let's do it:

*I(0,1) (2 * x - 2) dx = (1^2 - 2 * 1 + C) - (0^2 - 2 * 0 + C) = -1*

Things to notice here:

- The constants *C* nicely subtract and disappear, and they always will, so we don't have to worry about assigning them any values or stuff like that.
- The area we got is negative and its absolute size is 1, does this make sense? YES. Take a look at the graph of the function *2 * x - 2* up above and pay attention to the interval 0 to 1. The function's value is below zero and we said that area below zero will be negative, so this checks out. Also we can see that geometrically the area is a triangle, i.e. a half of a rectangle of height 2 and width 1, which is exactly 1. So all in all we're cool.

For completeness here are some rules for integration:

| *f(x)*                | *I f(x) dx*                                           | comment                          |
| --------------------- | ----------------------------------------------------- | -------------------------------- |
| *a * x^n*             | *a * (x^(n+1))/(n+1) + C*                             | *n* other than -1                |
| *cos(x)*              | *sin(x) + C*                                          |                                  |
| *sin(x)*              | *-cos(x) + C*                                         |                                  |
| *e^x*                 | *e^x + C*                                             |                                  |
| *1/x*                 | *ln(x) + C*                                           | for *x > 0*                      |
| *a * g(x) + b * h(x)* | *a * (I g(x) dx) + b * (I h(x) dx) + C*               |                                  |
| *g(x) * h(x)*         | *g(x) * (I h(x) dx) - (I g'(x) * (I h(x) dx) dx) + C* | per partes (integration by parts) |

However note that applying these rules is generally not as simple as with differentiation, there exist methods such as *per partes* or *substitution* that don't tell you exactly how or when to apply them, so you have to experiment -- like said, this is an entertainment left to those who just enjoy doing math.

**Can we do higher order integrals and partial integrals?** Yes, of course, just like with derivatives we can do both of these.

## Super Simple Numerical Calculus Example

Here is a small [C](c.md) code that produces the image at the top showing a graph of a function, its derivative and integral. Please keep in mind this is the most naive example using the simplest algorithm that in practice would be too inaccurate and/or inefficient, but it's good for demonstration. For shorter code we resort to using [floating point](float.md) but of course we can always avoid it with [fixed point](fixed_point.md). You can try to play around with the function and see how its derivative and integral change.
Note that the plotted integral is indeed just one of the infinitely many integrals that would be differently vertically shifted by the constant *C* -- here we just plot the one that at *x = 0* goes through 0.

```
#include <stdio.h>
#include <math.h>

#define GRAPH_RESX 64  // ASCII graph resolution
#define GRAPH_RESY 28
#define GRAPH_SIZE 2.5 // interval shown in the graph
#define DX 0.01        // for numeric methods

double f(double x) // our function
{
  return 1 + sin(2 * x) + 0.2 * x * x;
}

// approximates the slope of f at x with a forward difference
double derivative(double (*f)(double), double x)
{
  return (f(x + DX) - f(x)) / DX;
}

// approximates the definite integral of f from 0 to x by summing thin strips
double integral(double (*f)(double), double x)
{
  int steps = x / DX;
  double r = 0;
  int flip = x < 0; // for negative x we sum from x to 0 and flip the sign

  if (x < 0)
    steps *= -1;
  else
    x = 0;

  while (steps)
  {
    r += f(x) * DX; // area of one strip: height times width
    steps--;
    x += DX;
  }

  return flip ? -1 * r : r;
}

char graphImage[GRAPH_RESX * GRAPH_RESY];

// puts character c to the graph image at graph coordinates [x,y]
void graphDraw(double x, double y, char c)
{
  int
    drawX = ((x + GRAPH_SIZE) / (2 * GRAPH_SIZE)) * GRAPH_RESX,
    drawY = GRAPH_RESY - ((y + GRAPH_SIZE) / (2 * GRAPH_SIZE)) * GRAPH_RESY;

  if (drawX >= 0 && drawX < GRAPH_RESX && drawY >= 0 && drawY < GRAPH_RESY)
    graphImage[drawY * GRAPH_RESX + drawX] = c;
}

int main(void)
{
  // clear the graph image:
  for (int i = 0; i < GRAPH_RESX * GRAPH_RESY; ++i)
    graphImage[i] = (i % GRAPH_RESX) == GRAPH_RESX / 2 ? ':' :
      ((i / GRAPH_RESX) == GRAPH_RESY / 2 ? '-' : ' ');

  // now plot the function, its derivative and integral
  for (double x = -1 * GRAPH_SIZE; x < GRAPH_SIZE;
    x += GRAPH_SIZE / (2 * GRAPH_RESX))
  {
    graphDraw(x,integral(f,x),'#');
    graphDraw(x,derivative(f,x),'*');
    graphDraw(x,f(x),'x');
  }

  // draw the graph:
  for (int i = 0; i < GRAPH_RESX * GRAPH_RESY; ++i)
  {
    putchar(graphImage[i]);

    if ((i + 1) % GRAPH_RESX == 0)
      putchar('\n');
  }

  return 0;
}
```

## See Also

- [differential equation](differential_equation.md)