Update

2024-09-06 15:31:02 +02:00 · 2024-09-06 15:31:02 +02:00 · a179d394ea
commit a179d394ea
parent 9c9ff9934c
14 changed files with 1823 additions and 1811 deletions
--- a/optimization.md
+++ b/optimization.md
@@ -57,7 +57,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
 - **You can optimize critical parts of code in [assembly](assembly.md)**, i.e. manually write the assembly code that takes most of the running time of the program, with as few and as inexpensive instructions as possible (but beware, popular compilers are very smart and it's often hard to beat them). But note that such code loses [portability](portability.md)! So ALWAYS have a C (or whatever language you are using) [fallback](fallback.md) code for other platforms, use [ifdefs](ifdef.md) to switch to the fallback version on platforms running on different assembly languages.
 - **Loop unrolling/splitting/fusion, function inlining etc.**: there are optimizations that are usually done by high level languages at [assembly](assembly.md) level (e.g. loop unrolling physically replaces a loop by repeated commands which gains speed but also makes the program bigger). However if you're writing in assembly or have a dumb compiler (or are even writing your own) you may do these manually, e.g. with macros/templates etc. Sometimes you can hint a compiler to perform these optimizations, so look this up.
 - **[Parallelism](parallelism.md) ([multithreading](multithreading.md), [compute shaders](compute_shader.md), ...) can astronomically accelerate many programs**, it is one of the most effective techniques of speeding up programs -- we can simply perform several computations at once and save a lot of time -- but there are a few notes. Firstly not all problems can be parallelized, some problems are sequential in nature, even though most problems can probably be parallelized to some degree. Secondly it is hard to do, opens the door for many new types of bugs, requires hardware support (software simulated parallelism can't work here of course) and introduces [dependencies](dependency.md); in other words it is huge [bloat](bloat.md), we don't recommend parallelization unless a very, very good reason is given. Optional use of [SIMD](simd.md) instructions can be a reasonable midway to going full parallel computation.
- **Optimizing [data](data.md)**: it's important to remember we can optimize both algorithm AND data, for example in a 3D game we may simplify our 3D models, remove parts of a level that will never be seen etc.
+- **Optimizing [data](data.md)**: it's important to remember we can optimize both algorithm AND data, for example in a 3D game we may simplify our 3D models, remove parts of a level that will never be seen etc. Ordering, grouping, aligning, reorganizing the data, changing number formats, adding indices and so on may help us achieve cache friendliness and simpler and/or faster algorithms. For example a color [palette](palette.md) may be constructed so that certain desired operations are faster; this is seen e.g. in [Anarch](anarch.md) where colors are arranged so that darkening/brightening is done just by decrementing/incrementing the color index. In [raycasting](raycasting.md) engines it is common to store images by columns rather than by rows as they will be drawn by columns -- this simple change of how data is ordered increases cache friendliness. And so on.
 - **Specialized hardware (e.g. a [GPU](gpu.md)) astronomically accelerates programs**, but as with the previous point, portability and simplicity greatly suffer, your program becomes bloated and gains dependencies, always consider using specialized hardware and offer software fallbacks.
 - **Smaller code may also be faster** as it allows more instructions to fit into [cache](cache.md).
 - Do not optimize everything and at any cost: optimization often makes the code more cryptic, it may [bloat](bloat.md) it, bring in more bugs etc. Only optimize if it is worth the reward. { from *Game Programming Gurus* -drummyfish }
@@ -88,7 +88,6 @@ The following are some common methods of automatic optimization (also note that
 - **Removing instructions that do nothing**: generated code may contain instructions that just do nothing, e.g. NOPs that were used as placeholders that never got replaced; these can be just removed.
 - **Register allocation**: most frequently used variables should be kept in CPU registers for fastest access.
 - **Removing branches**: branches are often expensive due to not being CPU pipeline friendly, they can sometimes be replaced by a branch-free code, e.g. `if (a == b) c = 1; else c = 0;` can be replaced with `c = a == b;`.
- **Memory alignment, reordering etc.**: data stored in memory may be reorganized for better efficiency, e.g. an often accessed array of bytes may actually be made into array of ints so that each item resides exactly on one address (which takes fewer instructions to access and is therefore faster). Data may also be reordered to be more [cache](cache.md) friendly.
 - **Generating [lookup tables](lut.md)**: if the optimizer judges some function to be critical in terms of speed, it may auto generate a lookup table for it, i.e. precompute its values and so sacrifice some memory for making it run extremely fast.
 - **Dead code removal**: parts of code that aren't used can be just removed, making the generated program smaller -- this includes e.g. functions that are present in a [library](library.md) which however aren't used by the specific program or blocks of code that become unreachable e.g. due to some `#define` that makes an if condition always false etc.
 - **[Compression](compression.md)**: compression methods may be applied to make data smaller and optimize for size (for the price of increased CPU usage).
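
The data optimizations added in this commit (a palette ordered so that darkening/brightening is an index decrement/increment, and images stored by columns for a column-drawing renderer) could look roughly like the following C sketch. The palette layout (32 hues times 8 brightness levels), the image dimensions and all function names are assumptions made up for illustration; this is not the actual Anarch palette or code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical palette layout (an assumption, not Anarch's real one):
   256 colors = 32 hues x 8 brightness levels, index = hue * 8 + level,
   where level 0 is the darkest shade of a hue. */
#define HUES 32
#define LEVELS 8

/* Hypothetical image size used by pixelIndex below. */
#define IMG_W 4
#define IMG_H 3

/* Builds a palette index from hue and brightness level. */
uint8_t colorIndex(uint8_t hue, uint8_t level)
{
  assert(hue < HUES && level < LEVELS);
  return (uint8_t) (hue * LEVELS + level);
}

/* Darkens by one step: thanks to the layout this is a mere decrement,
   we only make sure not to fall into the neighboring hue. */
uint8_t colorDarken(uint8_t color)
{
  return (color % LEVELS) != 0 ? (uint8_t) (color - 1) : color;
}

/* Brightens by one step: a mere increment, clamped at the brightest level. */
uint8_t colorBrighten(uint8_t color)
{
  return (color % LEVELS) != LEVELS - 1 ? (uint8_t) (color + 1) : color;
}

/* Column-major pixel index: pixels of one column lie next to each other
   in memory, so drawing a column walks the array sequentially. */
unsigned int pixelIndex(unsigned int x, unsigned int y)
{
  return x * IMG_H + y;
}
```

With this layout `colorDarken(colorIndex(2,3))` needs no table lookup or RGB arithmetic, and stepping down a column with `pixelIndex(x,y + 1)` moves by exactly one array item, which tends to be friendlier to the cache than jumping by a whole row.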