# Computational Complexity
Computational complexity is a formal ([mathematical](math.md)) study of resource usage (usually time and memory) by [computers](computer.md) as they're solving various types of problems. For example when using computers to [sort](sorting.md) arrays of [numbers](number.md), computational complexity allows us to tell which [algorithm](algorithm.md) will be fastest as the size of the array grows, which one will demand the least amount of memory ([RAM](ram.md)) and even what's generally the fastest way in which this can be done. While time ("speed", number of steps) and memory (also *space*) are generally our primary resources of interest, others can be considered too, e.g. network or power usage. Complexity theory is invaluable and extremely important, it belongs among the most essential disciplines of [computer science](compsci.md); it is also immensely practically important as it helps us [optimize](optimization.md) our programs, it teaches us useful knowledge such as that we can trade time and space complexity (i.e. make a program run faster at the expense of memory and vice versa) etc.
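
As a tiny taste of the time/space tradeoff mentioned above, here is a small illustrative C sketch (just an ad hoc example, not part of any formal definition): we may either compute the number of 1 bits in a byte on demand (no extra memory, more steps) or precompute all 256 answers into a table (256 bytes of extra memory, a single lookup).

```
#include <stdio.h>

/* Trading memory for speed: counting the 1 bits of a byte either on demand
   (no extra memory, a loop of steps) or via a precomputed 256 entry table
   (256 bytes of extra memory, a single lookup). */

unsigned char popcountSlow(unsigned char x)
{
  unsigned char count = 0;

  while (x)
  {
    count += x & 1;
    x >>= 1;
  }

  return count;
}

unsigned char popcountTable[256]; /* the extra memory we pay with */

void initTable(void)
{
  for (int i = 0; i < 256; ++i)
    popcountTable[i] = popcountSlow(i);
}

int main(void)
{
  initTable();

  /* both print 3, one computes it, the other just looks it up */
  printf("%d %d\n", popcountSlow(200), popcountTable[200]);

  return 0;
}
```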
Primarily we have to distinguish between two basic types of complexity:
- **[algorithm](algorithm.md) complexity**: Complexity of a specific algorithm. For example [quick sort](quick_sort.md) and [bubble sort](bubble_sort.md) are both algorithms for sorting arrays but quick sort has better time complexity, as the small counting experiment below this list also illustrates. (Sometimes we may extend this meaning and talk e.g. about memory complexity of a [data structure](data_structure.md) etc.)
- **problem complexity**: Complexity of the best algorithm that can solve a particular problem; e.g. time complexity of sorting an array is given by time complexity of the algorithm that can sort arrays the fastest.
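
The following is a small illustrative C sketch (the exact numbers will differ between runs and machines, it's only to show the principle): it counts comparisons made by bubble sort and by the standard library *qsort* on the same arrays of growing size, demonstrating how two algorithms solving the very same problem may need hugely different numbers of steps.

```
#include <stdio.h>
#include <stdlib.h>

/* Counts comparisons made by bubble sort and by the standard library qsort
   on the same data, for growing input sizes N. */

unsigned long comparisons = 0;

void bubbleSort(int *a, int n)
{
  for (int i = 0; i < n - 1; ++i)
    for (int j = 0; j < n - 1 - i; ++j)
    {
      comparisons++;

      if (a[j] > a[j + 1])
      {
        int tmp = a[j];
        a[j] = a[j + 1];
        a[j + 1] = tmp;
      }
    }
}

int countingCompare(const void *x, const void *y)
{
  comparisons++;
  return *((const int *) x) - *((const int *) y);
}

int main(void)
{
  for (int n = 1000; n <= 8000; n *= 2)
  {
    int *a = malloc(n * sizeof(int)), *b = malloc(n * sizeof(int));

    for (int i = 0; i < n; ++i)
      a[i] = b[i] = rand();

    comparisons = 0;
    bubbleSort(a, n);
    unsigned long bubbleComparisons = comparisons;

    comparisons = 0;
    qsort(b, n, sizeof(int), countingCompare);

    printf("N = %5d: bubble sort %12lu comparisons, qsort %12lu comparisons\n",
      n, bubbleComparisons, comparisons);

    free(a);
    free(b);
  }

  return 0;
}
```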
## Algorithm Complexity
Let us now focus on algorithm complexity, as problem complexity follows from it. OK, so **what really is the "algorithm complexity"?** Given resource *R* -- let's consider e.g. time, or the number of steps the algorithm needs to finish solving a problem -- let's say that complexity of a specific algorithm is a [function](function.md) *f(N)* where *N* is the size of input data (for example length of the array to be sorted), which returns the amount of the resource (here number of steps of the algorithm). However we may spot issues emerging here, most importantly that the number of steps may not only depend on the size of input data but also the data itself (e.g. with sorting it may take less time to sort an already sorted array) and on the computer we use (for example some computers may be unable to perform multiplication natively and will emulate it with SEVERAL additions, increasing the number of steps), and also the exact complexity function will be pretty messy (it likely won't be a nice smooth function but rather something that jumps around a bit). So this kind of [sucks](suck.md). We have to make several steps to get a nice, usable theory.
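
To see that the exact number of steps really isn't a function of the input size alone, consider this small illustrative C sketch (the step counting is ad hoc, just counting inner loop iterations): insertion sort run on an already sorted array and on a reversed array of the very same length gives wildly different step counts.

```
#include <stdio.h>

#define N 1000

/* Insertion sort with a step counter: the same N, very different step
   counts depending on whether the input is already sorted or reversed. */

unsigned long insertionSortSteps(int *a, int n)
{
  unsigned long steps = 0;

  for (int i = 1; i < n; ++i)
  {
    int key = a[i], j = i - 1;

    while (j >= 0 && a[j] > key)
    {
      a[j + 1] = a[j];
      j--;
      steps++;
    }

    a[j + 1] = key;
  }

  return steps;
}

int main(void)
{
  int sorted[N], reversed[N];

  for (int i = 0; i < N; ++i)
  {
    sorted[i] = i;           /* already sorted: best data for this algorithm */
    reversed[i] = N - 1 - i; /* reversed: worst data for this algorithm      */
  }

  printf("N = %d, steps on sorted array:   %lu\n", N, insertionSortSteps(sorted, N));
  printf("N = %d, steps on reversed array: %lu\n", N, insertionSortSteps(reversed, N));

  return 0;
}
```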
The **solution** to the presented problems will be achieved in several steps.
FIRSTLY let's clarify with better precision what *f(N)* returns exactly -- when computing algorithm complexity we will always be interested in one of the following:
- **best case** scenario: Here we assume *f(N)* always returns the best possible value for given *N*, usually the lowest (i.e. least number of steps, least amount of memory etc.). So e.g. with array sorting for each array length we will assume the input array has such values that the given algorithm will achieve its best result (fastest sorting, best memory usage, ...). I.e. this is the **lower bound** for all possible values the function could give for given *N*.
- **average case** scenario: Here *f(N)* returns the average, i.e. taking all possible inputs for given input size *N*, we just average the performance of our algorithm and this is what the function tells us.
- **worst case** scenario: Here we assume *f(N)* always returns the worst possible value for given *N*, usually the highest (i.e. the most steps, the most memory etc.). So e.g. with array sorting for each array length we will assume the input array has such values that the given algorithm will achieve its worst result. I.e. this is the **upper bound** for all possible values the function could give for given *N*.

This just deals with the fact that some algorithms may perform vastly different for different data -- imagine e.g. linear searching of a specific value in a list; if the searched value is always at the beginning, the algorithm always performs just one step, no matter how long the list is, on the other hand if the searched value is at the end, the number of steps will increase with the list size. So when analyzing an algorithm **we always specify which kind of case we are analyzing** (WATCH OUT: do not confuse these cases with differences between big O, big Omega and big Theta defined below). So let's say from now on we'll be implicitly examining worst case scenarios.
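
A tiny illustrative C sketch of exactly this (the list is just an array filled with known values so we can control where the searched value sits): linear search with a step counter, once with the value at the very beginning (best case) and once at the very end (worst case).

```
#include <stdio.h>

#define N 1000

/* Linear search with a step counter: best case (value at the start) takes
   1 step, worst case (value at the end) takes N steps. */

int linearSearch(const int *a, int n, int value, unsigned long *steps)
{
  *steps = 0;

  for (int i = 0; i < n; ++i)
  {
    (*steps)++;

    if (a[i] == value)
      return i;
  }

  return -1;
}

int main(void)
{
  int a[N];
  unsigned long steps;

  for (int i = 0; i < N; ++i)
    a[i] = i;

  linearSearch(a, N, 0, &steps);     /* best case */
  printf("searching first element: %lu steps\n", steps);

  linearSearch(a, N, N - 1, &steps); /* worst case */
  printf("searching last element:  %lu steps\n", steps);

  return 0;
}
```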
SECONDLY rather than being interested in PRECISE complexity functions we will focus on so called **asymptotic complexity** -- this kind of complexity is only concerned with how fast the resource usage generally GROWS as the size of input data approaches big values (infinity). So again, taking the example of array sorting, we don't really desire to know exactly how many steps we will need to sort any given array, but rather how the time needed to sort bigger and bigger arrays will grow. This is also aligned with practice in another way: we don't really care how fast our program will be for a small amount of data, it doesn't matter if it takes 1 or 2 microseconds to sort a small array, but we want to know how our program will [scale](scalability.md) -- if we have 10 TB of data, will it take 10 minutes or half a year to sort? If this data doubles in size, will the sorting time also double or will it increase 1000 times? This kind of complexity also no longer depends on what machine we use, the rate of growth will be the same on fast and slow machines alike, so we can conveniently just consider some standardized computer such as [Turing machine](turing_machine.md) to mathematically study complexity of algorithms.
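
Just to get a feel for how much the rate of growth matters, here is a trivial illustrative C sketch (the picked functions are arbitrary examples of growth rates, compile with *-lm*): it prints how a few functions of *N* blow up as *N* keeps doubling.

```
#include <stdio.h>
#include <math.h>

/* Prints how a few growth rates blow up as N keeps doubling
   (compile with -lm). */

int main(void)
{
  printf("%10s %12s %15s %22s\n", "N", "N*log2(N)", "N^2", "2^N");

  for (double n = 8; n <= 64; n *= 2)
    printf("%10.0f %12.0f %15.0f %22.0f\n", n, n * log2(n), n * n, pow(2, n));

  return 0;
}
```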
Rather than exact value of resource usage (such as exact number of steps or exact number of bytes in RAM) asymptotic complexity tells us a **[class](class.md)** into which our complexity falls. These classes are given by mathematical functions that grow as fast as our complexity function. So basically we get kind of "tiers", like *constant*, *linear*, *logarithmic*, *quadratic* etc., and our complexity simply falls under one of them. Some common complexity classes, from "best" to "worst", are the following (note this isn't an exhaustive list):

- **constant**: Resource usage doesn't grow with *N* at all (e.g. reading an array item at a given index).
- **logarithmic**: Grows with *log(N)* (e.g. binary search in a sorted array).
- **linear**: Grows proportionally to *N* (e.g. finding the maximum value of an unsorted array).
- **linearithmic**: Grows with *N * log(N)* (e.g. the fastest general sorting algorithms).
- **quadratic**: Grows with *N^2* (e.g. naive sorting algorithms such as bubble sort).
- **exponential**: Grows with *2^N* (e.g. brute force checking of all subsets), typically unusable for larger *N*.

Another thing we have to clear up: **what does input size really mean?** I.e. what exactly is the *N* in *f(N)*? We've said that e.g. with array sorting we saw *N* as the length of the array to be sorted, but there are several things to additionally talk about. Firstly it usually doesn't matter if we measure the size of input in bits, bytes or number of items -- note that as we're now dealing with asymptotic complexity, i.e. only growth rate towards infinity, we'll get the same complexity class no matter the units (e.g. a linear growth will always be linear, no matter if our *x* axis measures meters or centimeters or light years). SECONDLY however it sometimes DOES matter how we define the input size, take e.g. an algorithm that takes a square image with resolution *R * R* on the input, iterates over all pixels and finds the brightest one; now we can define the input size either as the total number of pixels of the image (i.e. *N = R * R*) OR the length of the image side (i.e. *N = R*) -- with the former definition we conclude the algorithm to have linear time complexity (for *N* input pixels the algorithm makes roughly *N* steps), with the latter definition we get QUADRATIC time complexity (for image with side length *N* the algorithm makes roughly *N * N* steps). What now, how to solve this? Well, this isn't such an issue -- we can define the input size however we want, we just have to **stay consistent** so that we are able to compare different algorithms (i.e. it holds that if algorithm *A* has better complexity than algorithm *B*, it will stay so under whatever definition of input size we set), AND when mentioning complexity of some algorithm we should mention how we define the input size so as to prevent confusion.
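
A small illustrative C sketch of the example above (the image is made up, represented simply as an array of brightness values): the algorithm does roughly one step per pixel, so whether we call it linear or quadratic depends purely on whether we take *N* to be the total number of pixels or the length of the image side.

```
#include <stdio.h>

#define R 256 /* image side length, the image has R * R pixels */

unsigned char image[R * R]; /* made up square grayscale image */

/* Finds the brightest pixel: roughly one step per pixel, i.e. R * R steps.
   Linear in the number of pixels (N = R * R), quadratic in the side (N = R). */
int brightestPixel(void)
{
  int brightest = 0;

  for (int i = 1; i < R * R; ++i)
    if (image[i] > image[brightest])
      brightest = i;

  return brightest;
}

int main(void)
{
  for (int i = 0; i < R * R; ++i)
    image[i] = (i * 13) % 256; /* fill with some arbitrary values */

  int b = brightestPixel();

  printf("brightest pixel at x = %d, y = %d, value = %d\n",
    b % R, b / R, image[b]);

  return 0;
}
```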
With memory complexity we face a similar issue -- we may define memory consumption either as EXTRA memory or as TOTAL memory that includes the memory storing the input data. For example with array sorting if an algorithm works [in situ](in_situ.md) (in place, needing no extra memory), considering the former we conclude memory complexity to be constant (extra memory needed doesn't depend on size of input array), considering the latter we conclude the memory complexity to be linear (total memory needed grows linearly with the size of input array). The same thing as above holds: whatever definition we choose, we should just mention which one we chose and stay consistent in using it.
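
Here is one more small illustrative C sketch (array reversal instead of sorting, just to keep it short): reversing an array in situ needs only a constant amount of extra memory (a single temporary variable), while reversing into a new array needs extra memory proportional to *N*; under the TOTAL memory definition both are at least linear since the input array itself already occupies *N* items.

```
#include <stdio.h>
#include <stdlib.h>

#define N 8

/* Reversing an array in situ (constant extra memory: a single temporary
   variable) versus reversing into a newly allocated copy (extra memory
   proportional to N). Counting TOTAL memory, both are at least linear,
   because the input array itself already takes N items. */

void reverseInPlace(int *a, int n)
{
  for (int i = 0; i < n / 2; ++i)
  {
    int tmp = a[i]; /* the only extra memory needed */
    a[i] = a[n - 1 - i];
    a[n - 1 - i] = tmp;
  }
}

int *reverseCopy(const int *a, int n)
{
  int *r = malloc(n * sizeof(int)); /* extra memory grows with N */

  for (int i = 0; i < n; ++i)
    r[i] = a[n - 1 - i];

  return r;
}

int main(void)
{
  int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
  int *copy = reverseCopy(a, N);

  reverseInPlace(a, N);

  for (int i = 0; i < N; ++i)
    printf("%d %d\n", a[i], copy[i]);

  free(copy);

  return 0;
}
```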
## Problem Complexity