Update

2024-07-17 20:31:45 +02:00 · 2024-07-17 20:31:45 +02:00 · cc40dcb437
commit cc40dcb437
parent dd3842ae42
19 changed files with 1826 additions and 1819 deletions
--- a/optimization.md
+++ b/optimization.md
@ -15,14 +15,14 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
 - **`<stdint.h>` has fast type nicknames**, types such as `uint_fast32_t` which picks the fastest type of at least given width on given platform.
 - **Actually measure the performance** to see if your optimizations work or not. Sometimes things behave counterintuitively and you end up making your program perform worse by trying to optimize it! Also make sure that you MEASURE THE PERFORMANCE CORRECTLY, many beginners for example just try to measure run time of a single simple function call which doesn't really work, you want to try to measure something like a million of such function calls in a loop and then average the time.
 - **Keywords such as `inline`, `static`, `const` and `register` can help compiler optimize well**.
- **Optimize the [bottlenecks](bottleneck.md)!** Optimizing in the wrong place is a complete waste of time. If you're optimizing a part of code that's taking 1% of your program's run time, you will never speed up your program by more than that 1% even if you speed up the specific part by 10000%. Bottlenecks are usually inner-most loops of the main program loop, you can identify them with [profiling](profiling.md). Generally initialization code that runs only once in a long time doesn't need much optimization -- no one is going to care if a program starts up 1 millisecond faster (but of course in special cases such as launching many processes this may start to matter).
- **You can almost always trade space (memory usage) for time (CPU demand) and vice versa** and you can also fine-tune this. You typically gain speed by precomputation (look up tables, more demanding on memory) and memory with [compression](compression.md) (more demanding on CPU).
+- **Optimize the [bottlenecks](bottleneck.md)!** Optimizing in the wrong place is a complete waste of time. If you're optimizing a part of code that's taking 1% of your program's run time, you will never speed up your program by more than that 1% even if you speed up the specific part by 10000%. Bottlenecks are usually inner-most loops of the main program loop, you can identify them with [profiling](profiling.md). A typical bottleneck code is for example a [shader](shader.md) that processes millions of pixels per second. Generally initialization code that runs only once in a long time doesn't need much optimization -- no one is going to care if a program starts up 1 millisecond faster (but of course in special cases such as launching many processes this may start to matter).
+- **You can almost always trade space (memory usage) for time (CPU demand) and vice versa** and you can also fine-tune this. You typically gain speed by [precomputation](precomputation.md) ([look up tables](lut.md), more demanding on memory) and memory with [compression](compression.md) (more demanding on CPU).
 - **[Static](static.md) things are faster and smaller than [dynamic](dynamic.md) things.** This means that things that are somehow fixed/unchangeable are better in terms of performance (and usually also safer and better testable) than things that are allowed to change during [run time](runtime.md) -- for example calling a function directly (e.g. `myVar = myFunc();`) is both faster and requires fewer instructions than calling a function by pointer (e.g. `myVar = myFuncPointer();`): the latter is more flexible but for the price of performance, so if you don't need flexibility (dynamic behavior), use static behavior. This also applies to using [constants](constant.md) (faster/smaller) vs [variables](variable.md), static vs dynamic [typing](typing.md), normal vs dynamic [arrays](array.md) etc.
 - **Be smart, use [math](math.md)**, for example simplify [expressions](expression.md) using known formulas, minimize [logic circuits](logic_circuit.md) etc. Example: let's say you want to compute the radius of a zero-centered [bounding sphere](bounding_sphere.md) of an *N*-point [point cloud](point_cloud.md). Naively you might be computing the Euclidean distance (*sqrt(x^2 + y^2 + z^2)*) to each point and taking a maximum of them, however you can just find the maximum of squared distances (*x^2 + y^2 + z^2*) and return a square root of that maximum. This saves you a computation of *N - 1* square roots.
 - **Fancier algorithms will very often be slower than simpler ones, even if they are theoretically faster**, i.e. don't get too smart and first try the simple algorithm, greater complexity has to be justified. This was commented on e.g. by [Rob Pike](rob_pike.md) who said that "fancy algorithms are slow when n is small, and n is usually small", i.e. if you're sorting an array of 10 items, just use bubble sort, not quick sort etc.
 - **Learn about [dynamic programming](dynamic_programming.md)**.
 - **Avoid branches (ifs)** if you can (remember [ternary operators](ternary_operator.md), loop conditions etc. are branches as well). They break prediction in CPU pipelines and instruction preloading and are often source of great performance losses. Don't forget that you can many times compare and use the result of operations without using any branching (e.g. `x = (y == 5) + 1;` instead of `x = (y == 5) ? 2 : 1;`).
- **Use iteration instead of [recursion](recursion.md)** if possible (calling a function costs something).
+- **Use iteration instead of [recursion](recursion.md)** if possible (calling a function costs something). Know that it is ALWAYS possible to remove recursive function calls.
 - **Use [good enough](good_enough.md) [approximations](approximation.md) instead of completely accurate calculations**, e.g. taxicab distance instead of Euclidean distance, capsule shape to represent the player's collision shape rather than the 3D model's mesh etc. With a physics engine instead of running the simulation at the same FPS as rendering, you may just run it at half and [interpolate](interpolation.md) between two physics states at every other frame. Nice examples can also be found in [computer graphics](graphics.md), e.g. some [software renderers](sw_rendering.md) use perspective-correct texturing only for large near triangles and cheaper affine texturing for other triangles, which mostly looks OK.  
 - **Use quick [bailout](bailout.md) conditions**: many times before performing some expensive calculation you can quickly check whether it's even worth performing it and potentially skip it. For example in physics [collision detections](collision_detection.md) you may first quickly check whether the bounding spheres of the bodies collide before running an expensive precise collision detection -- if bounding spheres of objects don't collide, it is not possible for the bodies to collide and so we can skip further collision detection. 
 - **Operations on static data can be accelerated with accelerating structures** ([look-up tables](lut.md) for functions, [indices](indexing.md) for database lookups, spatial grids for collision checking, various [trees](tree.md) ...).