This commit is contained in:
Miloslav Ciz 2023-10-28 14:40:53 +02:00
parent 2cf32776c0
commit 767b52f354
8 changed files with 121 additions and 60 deletions

View file

@ -10,7 +10,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- **[gprof](gprof.md) is a utility you can use to profile your code**.
- **`<stdint.h>` has fast type nicknames**, types such as `uint_fast32_t` which picks the fastest type of at least given width on given platform.
- **Actually measure the performance** to see if your optimizations work or not. Sometimes things behave counterintuitively and you end up making your program perform worse by trying to optimize it! Also make sure that you MEASURE THE PERFORMANCE CORRECTLY, many beginners for example just try to measure run time of a single simple function call which doesn't really work, you want to try to measure something like a million of such function calls in a loop and then average the time.
- **Keywords such as `inline`, `static` and `const` can help compiler optimize well**.
- **Keywords such as `inline`, `static`, `const` and `register` can help compiler optimize well**.
- **Optimize the [bottlenecks](bottleneck.md)!** Optimizing in the wrong place is a complete waste of time. If you're optimizing a part of code that's taking 1% of your program's run time, you will never speed up your program by more than that 1% even if you speed up the specific part by 10000%. Bottlenecks are usually inner-most loops of the main program loop, you can identify them with [profiling](profiling.md). Generally initialization code that runs only once in a long time doesn't need much optimization -- no one is going to care if a program starts up 1 millisecond faster (but of course in special cases such as launching many processes this may start to matter).
- **You can almost always trade space (memory usage) for time (CPU demand) and vice versa** and you can also fine-tune this. You typically gain speed by precomputation (look up tables, more demanding on memory) and memory with [compression](compression.md) (more demanding on CPU).
- **[Static](static.md) things are faster and smaller than [dynamic](dynamic.md) things.** This means that things that are somehow fixed/unchangeable are better in terms of performance (and usually also safer and better testable) than things that are allowed to change during [run time](runtime.md) -- for example calling a function directly (e.g. `myVar = myFunc();`) is both faster and requires fewer instructions than calling a function by pointer (e.g. `myVar = myFuncPointer();`): the latter is more flexible but for the price of performance, so if you don't need flexibility (dynamic behavior), use static behavior. This also applies to using [constants](constant.md) (faster/smaller) vs [variables](variable.md), static vs dynamic [typing](typing.md), normal vs dynamic [arrays](array.md) etc.
@ -26,9 +26,10 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- **Write [cache-friendly](cache-friendly.md) code** (minimize long jumps in memory).
- **Compare to [0](zero.md) rather than other values**. There's usually an instruction that just checks the zero flag which is faster than loading and comparing two arbitrary numbers.
- **Use [bit tricks](bit_hack.md)**, hacks for manipulating binary numbers in clever ways only using very basic operations without which one might naively write complex inefficient code with loops and branches. Example of a simple bit trick is checking if a number is power of two as `!(x & (x - 1)) && x`.
- **Consider moving computation from run time to compile time**. E.g. if you make a resolution of your game constant (as opposed to a variable), the compiler will be able to partially precompute expressions with the display dimensions and so speed up your program (but you won't be able to dynamically change resolution).
- **Consider moving computation from run time to compile time**, see [preprocessor](preprocessor.md), [macros](macro.md) and [metaprogramming](metaprogramming.md). E.g. if you make a resolution of your game constant (as opposed to a variable), the compiler will be able to partially precompute expressions with the display dimensions and so speed up your program (but you won't be able to dynamically change resolution).
- On some platforms such as [ARM](arm.md) the first **arguments to a function may be passed via registers**, so it may be better to have fewer parameters in functions.
- **Optimize when you already have a working code**. As [Donald Knuth](knuth.md) put it: "premature optimization is the root of all evil". Nevertheless you should get used to simple nobrainer efficient patterns by default and just write them automatically.
- **Passing arguments costs something**: passing a value to a function requires a push onto the stack and later its pop, so minimizing the number of parameters a function has, using global variables to pass arguments and doing things like passing structs by pointers rather than by value can help speed. { from *Game Programming Gurus* -drummyfish }
- **Optimize when you already have a working code**. As [Donald Knuth](knuth.md) put it: "premature optimization is the root of all evil". Nevertheless you should get used to simple nobrainer efficient patterns by default and just write them automatically. Also do one optimization at a time, don't try to put in more optimizations at once.
- **Use your own [caches](cache.md) where they help**, for example if you're frequently working with some database item you better pull it to memory and work with it there, then write it back once you're done (as opposed to communicating with the DB there and back).
- **[Single compilation unit](single_compilation_unit.md) (one big program without [linking](linking.md)) can help compiler optimize better** because it can see the whole code at once, not just its parts. It will also make your program compile faster.
- Search literature for **algorithms with better [complexity class](complexity_class.md)** ([sorts](sorting.md) are a nice example).
@ -44,6 +45,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- **Mental calculation tricks**, e.g. multiplying by one less or more than a power of two is equal to multiplying by power of two and subtracting/adding once, for example *x * 7 = x * 8 - x*; the latter may be faster as a multiplication by power of two (bit shift) and addition/subtraction may be faster than single multiplication, especially on some primitive platform without hardware multiplication. However this needs to be tested on the specific platform. Smart compilers perform these optimizations automatically, but not every compiler is high level and smart.
- **Else should be the less likely branch**, try to make if conditions so that the if branch is the one with higher probability of being executed -- this can help branch prediction.
- Similarly **order if-sequences and switch cases from most probable**: If you have a sequences of ifs such as `if (x) ... else if (y) ... else if (z) ...`, make it so that the most likely condition to hold gets checked first, then second most likely etc. Compiler most likely can't know the probabilities of the conditions so it can't automatically help with this. Do the same with the `switch` statement -- even though switch typically gets compiled to a table of jump addresses, in which case order of the cases doesn't matter, it may also get compiled in a way similar to the if sequence (e.g. as part of size optimization if the cases are sparse) and then it may matter again.
- **Variable aliasing**: If in a function you are often accessing a variable through some complex dereference of multiple pointers, it may help to rather load it to a local variable at the start of the function and then work with that variable, as dereferencing pointers costs something. { from *Game Programming Gurus* -drummyfish }
- **You can save space by "squeezing" variables** -- this is a space-time tradeoff, it's a no brainer but nubs may be unaware of it -- for example you may store 2 4bit values in a single `char` variable (8bit data type), one in the lower 4bits, one in the higher 4bits (use bit shifts etc.). So instead of 16 memory-aligned booleans you may create one `int` and use its individual bits for each boolean value. This is useful in environments with extremely limited RAM such as 8bit Arduinos.
- **Consider [lazy evaluation](lazy_evaluation.md)** (only evaluate what's actually needed).
- **You can optimize critical parts of code in [assembly](assembly.md)**, i.e. manually write the assembly code that takes most of the running time of the program, with as few and as inexpensive instructions as possible (but beware, popular compilers are very smart and it's often hard to beat them). But note that such code loses [portability](portability.md)! So ALWAYS have a C (or whatever language you are using) [fallback](fallback.md) code for other platforms, use [ifdefs](ifdef.md) to switch to the fallback version on platforms running on different assembly languages.
@ -51,6 +53,7 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- **[Parallelism](parallelism.md) ([multithreading](multithreading.md), [compute shaders](compute_shader.md), ...) can astronomically accelerate many programs**, it is one of the most effective techniques of speeding up programs -- we can simply perform several computations at once and save a lot of time -- but there are a few notes. Firstly not all problems can be parallelized, some problem are sequential in nature, even though most problems can probably be parallelized to some degree. Secondly it is hard to do, opens the door for many new types of bugs, requires hardware support (software simulated parallelism can't work here of course) and introduces [dependencies](dependency.md); in other words it is huge [bloat](bloat.md), we don't recommend parallelization unless a very, very good reason is given. Optional use of [SIMD](simd.md) instructions can be a reasonable midway to going full parallel computation.
- **Specialized hardware (e.g. a [GPU](gpu.md)) astronomically accelerates programs**, but as with the previous point, portablity and simplicity greatly suffers, your program becomes bloated and gains dependencies, always consider using specialized hardware and offer software fallbacks.
- **Smaller code may also be faster** as it allows to fit more instructions into [cache](cache.md).
- Do not optimize everything and for any cost: optimization often makes the code more cryptic, it may [bloat](bloat.md) it, bring in more bugs etc. Only optimize if it is worth the prize. { from *Game Programming Gurus* -drummyfish }
## When To Actually Optimize?