Update

2023-07-29 11:26:00 +02:00 · 2023-07-29 11:26:00 +02:00 · 89693a5451
commit 89693a5451
parent e105277181
13 changed files with 49 additions and 15 deletions
--- a/optimization.md
+++ b/optimization.md
@ -6,9 +6,10 @@ Optimization means making a program more efficient in terms of consumption of so

 These are mainly for [C](c.md), but may be usable in other languages as well.

- **Tell your compiler to actually optimize** (`-O3`, `-Os` flags etc.).
+- **Tell your compiler to actually optimize** (`-O3`, `-Os` flags etc.). Also check out further compiler flags that may help you turn off unnecessary things you don't need, AND try out different compilers, some may just produce better code.
 - **[gprof](gprof.md) is a utility you can use to profile your code**.
 - **`<stdint.h>` has fast type nicknames**, types such as `uint_fast32_t` which picks the fastest type of at least given width on given platform.
+- **Actually measure the performance** to see if your optimizations work or not. Sometimes things behave counterintuitively and you end up making your program perform worse by trying to optimize it! Also make sure that you MEASURE THE PERFORMANCE CORRECTLY, many beginners for example just try to measure run time of a single simple function call which doesn't really work, you want to try to measure something like a million of such function calls in a loop and then average the time.
 - **Keywords such as `inline`, `static` and `const` can help compiler optimize well**.
 - **Optimize the [bottlenecks](bottleneck.md)!** Optimizing in the wrong place is a complete waste of time. If you're optimizing a part of code that's taking 1% of your program's run time, you will never speed up your program by more than that 1% even if you speed up the specific part by 10000%. Bottlenecks are usually inner-most loops of the main program loop, you can identify them with [profiling](profiling.md). Generally initialization code that runs only once in a long time doesn't need much optimization -- no one is going to care if a program starts up 1 millisecond faster (but of course in special cases such as launching many processes this may start to matter).
 - **You can almost always trade space (memory usage) for time (CPU demand) and vice versa** and you can also fine-tune this. You typically gain speed by precomputation (look up tables, more demanding on memory) and memory with [compression](compression.md) (more demanding on CPU).
@ -21,6 +22,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
 - **Use quick opt-out conditions**: many times before performing some expensive calculation you can quickly check whether it's even worth performing it and potentially skip it. For example in physics [collision detections](collision_detection.md) you may first quickly check whether the bounding spheres of the bodies collide before running an expensive precise collision detection -- if bounding spheres of objects don't collide, it is not possible for the bodies to collide and so we can skip further collision detection. 
 - **Operations on static data can be accelerated with accelerating structures** ([look-up tables](lut.md) for functions, [indices](indexing.md) for database lookups, spatial grids for collision checking, various [trees](tree.md) ...).
 - **Use powers of 2** (1, 2, 4, 8, 16, 32, ...) whenever possible, this is efficient thanks to computers working in [binary](binary.md). Not only may this help nice utilization and alignment of memory, but mainly multiplication and division can be optimized by the compiler to mere bit shifts which is a tremendous speedup.
+- **Memory alignment usually helps speed**, i.e. variables at "nice addresses" (usually multiples of the platform's native integer size) are faster to access, but this may cost some memory (the gaps between aligned data).
 - **Write [cache-friendly](cache-friendly.md) code** (minimize long jumps in memory).
 - **Compare to [0](zero.md) rather than other values**. There's usually an instruction that just checks the zero flag which is faster than loading and comparing two arbitrary numbers.
 - **Use [bit tricks](bit_hack.md)**, hacks for manipulating binary numbers in clever ways only using very basic operations without which one might naively write complex inefficient code with loops and branches. Example of a simple bit trick is checking if a number is power of two as `!(x & (x - 1)) && x`.
@ -43,6 +45,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
 - **Else should be the less likely branch**, try to make if conditions so that the if branch is the one with higher probability of being executed -- this can help branch prediction.
 - Similarly **order if-sequences and switch cases from most probable**: If you have a sequences of ifs such as `if (x) ... else if (y) ... else if (z) ...`, make it so that the most likely condition to hold gets checked first, then second most likely etc. Compiler most likely can't know the probabilities of the conditions so it can't automatically help with this. Do the same with the `switch` statement -- even though switch typically gets compiled to a table of jump addresses, in which case order of the cases doesn't matter, it may also get compiled in a way similar to the if sequence (e.g. as part of size optimization if the cases are sparse) and then it may matter again.
 - **You can save space by "squeezing" variables** -- this is a space-time tradeoff, it's a no brainer but nubs may be unaware of it -- for example you may store 2 4bit values in a single `char` variable (8bit data type), one in the lower 4bits, one in the higher 4bits (use bit shifts etc.). So instead of 16 memory-aligned booleans you may create one `int` and use its individual bits for each boolean value. This is useful in environments with extremely limited RAM such as 8bit Arduinos.
+- **Consider [lazy evaluation](lazy_evaluation.md)** (only evaluate what's actually needed).
 - **You can optimize critical parts of code in [assembly](assembly.md)**, i.e. manually write the assembly code that takes most of the running time of the program, with as few and as inexpensive instructions as possible (but beware, popular compilers are very smart and it's often hard to beat them). But note that such code loses [portability](portability.md)! So ALWAYS have a C (or whatever language you are using) [fallback](fallback.md) code for other platforms, use [ifdefs](ifdef.md) to switch to the fallback version on platforms running on different assembly languages.
 - **Loop unrolling/splitting/fusion, function inlining etc.**: there are optimizations that are usually done by high level languages at [assembly](assembly.md) level (e.g. loop unrolling physically replaces a loop by repeated commands which gains speed but also makes the program bigger). However if you're writing in assembly or have a dumb compiler (or are even writing your own) you may do these manually, e.g. with macros/templates etc. Sometimes you can hint a compiler to perform these optimizations, so look this up.
 - **[Parallelism](parallelism.md) ([multithreading](multithreading.md), [compute shaders](compute_shader.md), ...) can astronomically accelerate many programs**, it is one of the most effective techniques of speeding up programs -- we can simply perform several computations at once and save a lot of time -- but there are a few notes. Firstly not all problems can be parallelized, some problem are sequential in nature, even though most problems can probably be parallelized to some degree. Secondly it is hard to do, opens the door for many new types of bugs, requires hardware support (software simulated parallelism can't work here of course) and introduces [dependencies](dependency.md); in other words it is huge [bloat](bloat.md), we don't recommend parallelization unless a very, very good reason is given. Optional use of [SIMD](simd.md) instructions can be a reasonable midway to going full parallel computation.