Update

2024-01-04 16:45:30 +01:00 · 2024-01-04 16:45:30 +01:00 · 9635bbfa85
commit 9635bbfa85
parent 690e441493
7 changed files with 32 additions and 23 deletions
--- a/computational_complexity.md
+++ b/computational_complexity.md
@ -17,13 +17,13 @@ FIRSTLY let's make it more clear what *f(N)* returns exactly -- when computing a

 - **best case** scenario: Here we assume *f(N)* always returns the best possible value for given *N*, usually the lowest (i.e. least number of steps, least amount of memory etc.). So e.g. with array sorting for each array length we will assume the input array has such values that the given algorithm will achieve its best result (fastest sorting, best memory usage, ...). I.e. this is the **lower bound** for all possible values the function could give for given *N*.
 - **average case** scenario: Here *f(N)* returns the average, i.e. taking all possible inputs for given input size *N*, we just average the performance of our algorithm and this is what the function tells us.
- **worst case** scenario: Here *f(N)* return the worst possible value for given *N*, i.e. the opposite of best case scenario. This is the **upper bound** for all possible value the function could give for given *N*.
+- **worst case** scenario: Here *f(N)* return the worst possible values for given *N*, i.e. the opposite of best case scenario. This is the **upper bound** for all possible value the function could give for given *N*.

-This just deals with the fact that some algorithms may perform vastly different for different data -- imagine e.g. linear searching of a specific value in a list; if the searched value is always at the beginning, the algorithm always performs just one step, no matter how long list is, on the other hand if the searched value is at the end, the number of steps will increase with the list size. So when analyzing an algorithm **we always specify which kind of case we are analyzing** (WATCH OUT: do not confuse these cases with differences between big O, big Omega and big Theta defined below). So let's say from now on we'll be implicitly examining worst case scenarios.
+This just deals with the fact that some algorithms may perform vastly different for different data -- imagine e.g. linear searching of a specific value in a list; if the searched value is always at the beginning, the algorithm always performs just one step, no matter how long the list is, on the other hand if the searched value is at the end, the number of steps will increase with the list size. So when analyzing an algorithm **we always specify which kind of case we are analyzing** (WATCH OUT: do not confuse these cases with differences between big O, big Omega and big Theta defined below). So let's say from now on we'll be implicitly examining worst case scenarios.

 SECONDLY rather than being interested in PRECISE complexity functions we will rather focus on so called **asymptotic complexity** -- this kind of complexity is only concerned with how fast the resource usage generally GROWS as the size of input data approaches big values (infinity). So again, taking the example of array sorting, we don't really need to know exactly how many steps we will need to sort any given array, but rather how the time needed to sort bigger and bigger arrays will grow. This is also aligned with practice in another way: we don't really care how fast our program will be for small amount of data, it doesn't matter if it takes 1 or 2 microseconds to sort a small array, but we want to know how our program will [scale](scalability.md) -- if we have 10 TB of data, will it take 10 minutes or half a year to sort? If this data doubles in size, will the sorting time also double or will it increase 1000 times? This kind of complexity also no longer depends on what machine we use, the rate of growth will be the same on fast and slow machine alike, so we can conveniently just consider some standardized computer such as [Turing machine](turing_machine.md) to mathematically study complexity of algorithms.

-Rather than exact value of resource usage (such as exact number of steps or exact number of bytes in RAM) asymptotic complexity tells us a **[class](class.md)** into which our complexity falls. These classes are given by mathematical functions that grow as fast as our complexity function. So basically we get kind of "tiers", like *constant*, *linear*, *logarithmic* *quadratic* etc., and our complexity simply falls under one of them. Some common complexity classes, from "best" to "worst", are following (note this isn't an exhaustive list):
+Rather than exact value of resource usage (such as exact number of steps or exact number of bytes in RAM) asymptotic complexity tells us a **[class](class.md)** into which our complexity falls. These classes are given by mathematical functions that grow as fast as our complexity function. So basically we get kind of "tiers", like *constant*, *linear*, *logarithmic*, *quadratic* etc., and our complexity simply falls under one of them. Some common complexity classes, from "best" to "worst", are following (note this isn't an exhaustive list):

 - **constant**: Given by function *f(x) = 1* (i.e. complexity doesn't depend on input data size). Best.
 - **[logarithmic](log.md)**: Given by function *f(x) = log(x)*. Note the base of logarithm doesn't matter.
@ -35,13 +35,13 @@ Rather than exact value of resource usage (such as exact number of steps or exac

 Now we just put all the above together, introduce some formalization and notation that computer scientists use to express algorithm complexity, you will see it anywhere where this is discussed. There are the following:

- **big O (Omicron) notation**, written as *O(f(N))*: Says the algorithm complexity (for whatever we are measuring, i.e. time, space etc. and also the specific kind of case, i.e. worst/best/average) is asymptotically bounded from ABOVE by function *f(N)*, i.e. says the **upper bound** of complexity. This is probably the most common information regarding complexity you will encounter (we usually want this "pessimistic" view). More mathematically: complexity *f(x)* belongs to class *O(g(y))* if from some *N0* (we ignore some initial oscillations before this value) the function always stays under function *g* multiplied by any non-zero constant *C*. Formally: *f(x) belongs to O(g(y)) => exists C > 0 and N0 > 0: for all n >= N0: 0 <= f(n) <= C * g(n)*.
+- **big O (Omicron) notation**, written as *O(f(N))*: Says the algorithm complexity (for whatever we are measuring, i.e. time, space etc. and also the specific kind of case, i.e. worst/best/average) is asymptotically bounded from ABOVE by function *f(N)*, i.e. says the **upper bound** of complexity. This is probably the most common information regarding complexity you will encounter (we usually want this "pessimistic" view). More mathematically: complexity *f(x)* belongs to class *O(g(y))* if from some *N0* (we ignore some initial oscillations before this value) the function *f* always stays under function *g* multiplied by some positive constant *C*. Formally: *f(x) belongs to O(g(y)) => exists C > 0 and N0 > 0: for all n >= N0: 0 <= f(n) <= C * g(n)*.
 - **big Omega notation**, written as *Omega(f(N))*: Says the algorithm complexity **lower bound** is given by function *f(N)*. Formally: *f(x) belongs to Omega(g(y)) => exists C > 0 and N0 > 0: for all n >= N0: 0 <= C * g(n) <= f(n)*.
 - **big Theta notation**, written as *Theta(f(N))*: This just means the complexity is both *O(f(N))* and *Omega(f(N))*, i.e. the complexity is tightly bounded by given function.

 Please note that big O/Omega/Theta are a different thing than analyzing best/worst/average case! We can compute big O, big Omega and big Theta for all best, worst and average case, getting 9 different "complexities".

-Now notice (also check by the formal definitions) that we simply don't care about additive and multiplicative constant and we also don't care about some initial oscillations of the complexity function -- it doesn't matter if the complexity function is *f(x) = x* or *f(x) = 100000000 + 100000000 * x*, it still falls under linear complexity! If we have algorithm *A* and *B* and *A* has better complexity, *A* doesn't necessarily ALWAYS perform better, it's just that as we scale our data size to very high values, *A* will prevail in the end.
+Now notice (also check by the formal definitions) that we simply don't care about additive and multiplicative constants and we also don't care about some initial oscillations of the complexity function -- it doesn't matter if the complexity function is *f(x) = x* or *f(x) = 100000000 + 100000000 * x*, it still falls under linear complexity! If we have algorithm *A* and *B* and *A* has better complexity, *A* doesn't necessarily ALWAYS perform better, it's just that as we scale our data size to very high values, *A* will prevail in the end.

 Another thing we have to clear up: **what does input size really mean?** I.e. what exactly is the *N* in *f(N)*? We've said that e.g. with array sorting we saw *N* as the length of the array to be sorted, but there are several things to additionally talk about. Firstly it usually doesn't matter if we measure the size of input in bits, bytes or number of items -- note that as we're now dealing with asymptotic complexity, i.e. only growth rate towards infinity, we'll get the same complexity class no matter the units (e.g. a linear growth will always be linear, no matter if our *x* axis measures meters or centimeters or light years). SECONDLY however it sometimes DOES matter how we define the input size, take e.g. an algorithm that takes a square image with resolution *R * R* on the input, iterates over all pixels and find the brightest one; now we can define the input size either as the total number of pixels of the image (i.e. *N = R * R*) OR the length of the image side (i.e. *N = R*) -- with the former definition we conclude the algorithm to have linear time complexity (for *N* input pixels the algorithm makes roughly *N* steps), with the latter definition we get QUADRATIC time complexity (for image with side length *N* the algorithm makes roughly *N * N* steps). What now, how to solve this? Well, this isn't such an issue -- we can define the input size however we want, we just have to **stay consistent** so that we are able to compare different algorithms (i.e. it holds that if algorithm *A* has better complexity than algorithm *B*, it will stay so under whatever definition of input size we set), AND when mentioning complexity of some algorithm we should mention how we define the input size so as to prevent confusion.

@ -60,7 +60,7 @@ Similarly to algorithm complexity, with problems we again define **[classes](cla
 - **DSpace(f(x))**: Same as DTime but for space complexity.
 - **NSpace(f(x))**: Same as NTime but for space complexity.
 - **[P](p.md)**: Union of all classes *DTime(n^k)*, i.e. all problems that can be solved by a DETERMINISTIC Turing machine with [polynomial](polynomial.md) time complexity.
- **[NP](np.md)**: Union of all classes *NTime(n^k)*, i.e. the same as above but for NON-DETERMINISTIC Turing machine. It is currently not know [if classes P and NP are the same](p_vs_np.md), though it's believed to be not the case -- in fact this is probably the most famous yet unsolved problem of computer science.
+- **[NP](np.md)**: Union of all classes *NTime(n^k)*, i.e. the same as above but for NON-DETERMINISTIC Turing machine. It is currently not know [if classes P and NP are the same](p_vs_np.md), though it's believed to not be the case -- in fact this is probably the most famous yet unsolved problem of computer science.
 - **EXP**: Union of all classes *DTime(2^n^k)*.
 - ...