Miloslav Ciz 2023-03-13 21:33:27 +01:00
parent b999f33fd4
commit 1e00167b4a
7 changed files with 37 additions and 19 deletions

@ -17,7 +17,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- **Learn about [dynamic programming](dynamic_programming.md)**.
- **Avoid branches (ifs)** if you can (remember [ternary operators](ternary_operator.md), loop conditions etc. are branches as well). They break prediction in CPU pipelines and instruction preloading and are often a source of great performance losses. Don't forget that you can often compare and use the result of operations without using any branching (e.g. `x = (y == 5) + 1;` instead of `x = (y == 5) ? 2 : 1;`).
- **Use iteration instead of [recursion](recursion.md)** if possible (calling a function costs something).
- **You can use good-enough [approximations](approximation.md) instead of completely accurate calculations**, e.g. taxicab distance instead of Euclidean distance, and gain speed or memory at little cost in accuracy.
- **You can use good-enough [approximations](approximation.md) instead of completely accurate calculations**, e.g. taxicab distance instead of Euclidean distance, and gain speed or memory at little cost in accuracy. Nice examples can be found in [computer graphics](graphics.md), e.g. some [software renderers](sw_rendering.md) use perspective-correct texturing only for large near triangles and cheaper affine texturing for other triangles, which mostly looks OK.
- **Use quick opt-out conditions**: many times before performing some expensive calculation you can quickly check whether it's even worth performing it and potentially skip it. For example in physics [collision detections](collision_detection.md) you may first quickly check whether the bounding spheres of the bodies collide before running an expensive precise collision detection -- if bounding spheres of objects don't collide, it is not possible for the bodies to collide and so we can skip further collision detection.
- **Operations on static data can be accelerated with accelerating structures** ([look-up tables](lut.md) for functions, [indices](indexing.md) for database lookups, spatial grids for collision checking, various [trees](tree.md) ...).
- **Use powers of 2** (1, 2, 4, 8, 16, 32, ...) whenever possible; this is efficient thanks to computers working in [binary](binary.md). Not only may this help with good utilization and alignment of memory, but more importantly the compiler can optimize multiplication and division to mere bit shifts, which can be a considerable speedup.
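
The quick opt-out tip above can be sketched in C roughly as follows. This is only an illustration under assumptions: the `Body` type, the function names and the `preciseChecks` counter are hypothetical, and `collidePrecisely` is a placeholder standing in for whatever expensive exact test a real engine would run.

```c
#include <assert.h>

/* Hypothetical body type, for illustration only. */
typedef struct
{
  int x, y, z; /* center of the bounding sphere */
  int radius;  /* radius of the bounding sphere */
} Body;

/* Counts how many times the expensive test actually ran. */
static unsigned preciseChecks = 0;

/* Placeholder for an expensive exact collision test (assumption: a real
   one would inspect the actual shapes); this one just reports a hit. */
static int collidePrecisely(const Body *a, const Body *b)
{
  (void) a;
  (void) b;
  preciseChecks++;
  return 1;
}

/* Cheap test: spheres overlap if the squared distance of their centers
   is at most the squared sum of their radii, so no square root is needed. */
static int boundingSpheresCollide(const Body *a, const Body *b)
{
  long long dx = (long long) a->x - b->x;
  long long dy = (long long) a->y - b->y;
  long long dz = (long long) a->z - b->z;
  long long r = (long long) a->radius + b->radius;

  return dx * dx + dy * dy + dz * dz <= r * r;
}

int bodiesCollide(const Body *a, const Body *b)
{
  assert(a->radius >= 0 && b->radius >= 0);

  if (!boundingSpheresCollide(a, b))
    return 0; /* quick opt-out: the bodies can't possibly collide */

  return collidePrecisely(a, b); /* only now pay for the precise test */
}
```

For two bodies that are far apart `bodiesCollide` returns 0 without ever touching the expensive function, which is where the time is saved when most pairs of bodies don't collide.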
@ -40,6 +40,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- Similarly **order if-sequences and switch cases from most probable**: If you have a sequence of ifs such as `if (x) ... else if (y) ... else if (z) ...`, make it so that the most likely condition to hold gets checked first, then second most likely etc. The compiler most likely can't know the probabilities of the conditions so it can't automatically help with this. Do the same with the `switch` statement -- even though switch typically gets compiled to a table of jump addresses, in which case order of the cases doesn't matter, it may also get compiled in a way similar to the if sequence (e.g. as part of size optimization if the cases are sparse) and then it may matter again.
- **You can save space by "squeezing" variables** -- this is a space-time tradeoff, it's a no brainer but nubs may be unaware of it -- for example you may store 2 4bit values in a single `char` variable (8bit data type), one in the lower 4bits, one in the higher 4bits (use bit shifts etc.). So instead of 16 memory-aligned booleans you may create one `int` and use its individual bits for each boolean value. This is useful in environments with extremely limited RAM such as 8bit Arduinos.
- **You can optimize critical parts of code in [assembly](assembly.md)**, i.e. manually write the assembly code that takes most of the running time of the program, with as few and as inexpensive instructions as possible (but beware, popular compilers are very smart and it's often hard to beat them). But note that such code loses [portability](portability.md)! So ALWAYS have a C (or whatever language you are using) [fallback](fallback.md) code for other platforms, use [ifdefs](ifdef.md) to switch to the fallback version on platforms running on different assembly languages.
- **Loop unrolling/splitting/fusion, function inlining etc.**: these are optimizations that are usually done at [assembly](assembly.md) level (e.g. loop unrolling physically replaces a loop by repeated commands which gains speed but also makes the program bigger) and higher level languages try to perform them automatically. However if you're writing in assembly or have a dumb compiler (or are even writing a compiler) you may do these yourself, e.g. with macros/templates etc. Sometimes you can hint a compiler to perform these optimizations, so look this up.
- **[Parallelism](parallelism.md) ([multithreading](multithreading.md), [compute shaders](compute_shader.md), ...) can astronomically accelerate many programs**, it is one of the most effective techniques of speeding up programs -- we can simply perform several computations at once and save a lot of time -- but there are a few notes. Firstly not all problems can be parallelized, some problems are sequential in nature, even though most problems can probably be parallelized to some degree. Secondly it is hard to do, opens the door for many new types of bugs, requires hardware support (software simulated parallelism can't work here of course) and introduces [dependencies](dependency.md); in other words it is huge [bloat](bloat.md), we don't recommend parallelization unless a very, very good reason is given. Optional use of [SIMD](simd.md) instructions can be a reasonable middle ground short of full parallel computation.
- **Specialized hardware (e.g. a [GPU](gpu.md)) astronomically accelerates programs**, but as with the previous point, portability and simplicity greatly suffer, your program becomes bloated and gains dependencies. Always consider carefully whether to use specialized hardware, and offer software fallbacks.
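
The "squeezing" of variables mentioned above might look roughly like this in C. It is a minimal sketch with hypothetical helper names; it assumes the values to be packed fit in 4 bits and uses `uint16_t` in place of `int` just to make the 16 bit width explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Packs two 4bit values (0 to 15) into one byte: `low` goes to the lower
   4 bits, `high` to the higher 4 bits. Bits above the 4th are discarded. */
uint8_t packNibbles(uint8_t low, uint8_t high)
{
  return (uint8_t) ((low & 0x0f) | ((high & 0x0f) << 4));
}

uint8_t lowNibble(uint8_t packed)
{
  return packed & 0x0f;
}

uint8_t highNibble(uint8_t packed)
{
  return packed >> 4;
}

/* Stores up to 16 boolean values in the individual bits of one 16bit
   variable; returns the updated variable. */
uint16_t setFlag(uint16_t flags, unsigned index, int value)
{
  assert(index < 16);

  uint16_t mask = (uint16_t) (1u << index);

  return value ? (uint16_t) (flags | mask) : (uint16_t) (flags & ~mask);
}

int getFlag(uint16_t flags, unsigned index)
{
  assert(index < 16);
  return (flags >> index) & 1;
}
```

Every read and write now costs a few extra shift and mask operations, which is the time part of the space-time tradeoff.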