This commit is contained in:
Miloslav Ciz 2022-10-22 16:12:50 +02:00
parent 17341ead82
commit 01ab666f94
10 changed files with 194 additions and 23 deletions

View file

@ -7,11 +7,12 @@ Optimization means making a program more efficient in terms of consumption of so
These are mainly for [C](c.md), but may be usable in other languages as well.
- **Tell your compiler to actually optimize** (`-O3`, `-Os` etc.).
- **gprof is a utility you can use to profile your code**.
- **[gprof](gprof.md) is a utility you can use to profile your code**.
- **`<stdint.h>` has fast type nicknames**, types such as `uint_fast32_t` which picks the fastest type of at least given width on given platform.
- **Keywords such as `inline`, `static` and `const` can help compiler optimize well**.
- **Optimize the [bottlenecks](bottleneck.md)!** Optimizing in the wrong place is a complete waste of time. If you're optimizing a part of code that's taking 1% of your program's run time, you will never speed up your program by more than that 1% even if you speed up the specific part by 10000%. Bottlenecks are usually inner-most loops of the main program loop, you can identify them with [profiling](profiling.md). Generally initialization code that runs only once in a long time doesn't need much optimization -- no one is going to care if a program starts up 1 millisecond faster (but of course in special cases such as launching many processes this may start to matter).
- **You can almost always trade space (memory usage) for time (CPU demand) and vice versa** and you can also fine-tune this. You typically gain speed by precomputation (look up tables, more demanding on memory) and memory with compression (more demanding on CPU).
- **[Static](static.md) things are faster and smaller than [dynamic](dynamic.md) things.** This means that things that are somehow fixed/unchangeable are better in terms of performance (and usually also safer and better testable) than things that are allowed to change during [run time](runtime.md) -- for example calling a function directly (e.g. `myVar = myFunc();`) is both faster and requires fewer instructions than calling a function by pointer (e.g. `myVar = myFuncPointer();`): the latter is more flexible but for the price of performance, so if you don't need flexibility (dynamic behavior), use static behavior. This also applies to using [constants](constant.md) (faster/smaller) vs [variables](variable.md), static vs dynamic [typing](typing.md), normal vs dynamic [arrays](array.md) etc.
- **Be smart, use [math](math.md)**. Example: let's say you want to compute the radius of a zero-centered [bounding sphere](bounding_sphere.md) of an *N*-point [point cloud](point_cloud.md). Naively you might be computing the Euclidean distance (*sqrt(x^2 + y^2 + z^2)*) to each point and taking a maximum of them, however you can just find the maximum of squared distances (*x^2 + y^2 + z^2*) and return a square root of that maximum. This saves you a computation of *N - 1* square roots.
- **Learn about [dynamic programming](dynamic_programming.md)**.
- **Avoid branches (ifs)** if you can (remember [ternary operators](ternary_operator.md), loop conditions etc. are branches as well). They break prediction in CPU pipelines and instruction preloading and are often source of great performance losses. Don't forget that you can many times compare and use the result of operations without using any branching (e.g. `x = (y == 5) + 1;` instead of `x = (y == 5) ? 2 : 1;`).
@ -23,20 +24,20 @@ These are mainly for [C](c.md), but may be usable in other languages as well.
- **Write [cache-friendly](cache-friendly.md) code** (minimize long jumps in memory).
- **Compare to [0](zero.md) rather than other values**. There's usually an instruction that just checks the zero flag which is faster than loading and comparing two arbitrary numbers.
- **Consider moving computation from run time to compile time**. E.g. if you make a resolution of your game constant (as opposed to a variable), the compiler will be able to partially precompute expressions with the display dimensions and so speed up your program (but you won't be able to dynamically change resolution).
- On some platforms such as ARM the first **arguments to a function may be passed via registers**, so it may be better to have fewer parameters in functions.
- **Optimize when you already have a working code**. As Donald Knuth put it: "premature optimization is the root of all evil". Nevertheless you should get used to simple nobrainer efficient patterns by default and just write them automatically.
- **Use your own caches where they help**, for example if you're frequently working with some database item you better pull it to memory and work with it there, then write it back once you're done (as opposed to communicating with the DB there and back).
- **[Single compilation unit](single_compilation_unit.md) (one big program without linking) can help compiler optimize better** because it can see the whole code at once, not just its parts. It will also make your program compile faster.
- On some platforms such as [ARM](arm.md) the first **arguments to a function may be passed via registers**, so it may be better to have fewer parameters in functions.
- **Optimize when you already have a working code**. As [Donald Knuth](knuth.md) put it: "premature optimization is the root of all evil". Nevertheless you should get used to simple nobrainer efficient patterns by default and just write them automatically.
- **Use your own [caches](cache.md) where they help**, for example if you're frequently working with some database item you better pull it to memory and work with it there, then write it back once you're done (as opposed to communicating with the DB there and back).
- **[Single compilation unit](single_compilation_unit.md) (one big program without [linking](linking.md)) can help compiler optimize better** because it can see the whole code at once, not just its parts. It will also make your program compile faster.
- Search literature for **algorithms with better [complexity class](complexity_class.md)** (sorts are a nice example).
- For the sake of embedded platforms **avoid [floating point](floating_point.md)** as that is often painfully slowly emulated in software. Use [fixed point](fixed_point.md).
- **Early branching can create a speed up** (instead of branching inside the loop create two versions of the loop and branch in front of them). This is a kind of space-time tradeoff.
- For the sake of simple computers such as [embedded](embedded.md) platforms **avoid [floating point](floating_point.md)** as that is often painfully slowly emulated in software. Use [fixed point](fixed_point.md), or at least offer it as a [fallback](fallback.md). This also applies to other hardware requirements such as [GPU](gpu.md) or sound cards: while such hardware accelerates your program on computers that have the hardware, making use of it may lead to your program being slower on computers that lack it.
- **[Early branching](early_branching.md) can create a speed up** (instead of branching inside the loop create two versions of the loop and branch in front of them). This is a kind of space-time tradeoff.
- **Reuse variable to save space**. A warning about this one: readability may suffer, mainstreamers will tell you you're going against "good practice", and some compilers may do this automatically anyway. Be sure to at least make this clear in your comments. Anyway, on a lower level and/or with dumber compilers you can just reuse variables that you used for something else rather than creating a new variable that takes additional RAM; the only prerequisite for "merging" variables is that the variables aren't used at the same time.
- **What's fast on one platform may be slow on another**. This depends on the instruction set as well as on compiler, operating system, quirks of the hardware and other details. You always need to test on the hardware itself.
- **You can optimize critical parts of code in [assembly](assembly.md)**, i.e. manually write the assembly code that takes most of the running time of the program, with as few and as inexpensive instructions as possible (but beware, popular compilers are very smart and it's often hard to beat them). But note that such code loses portability! So ALWAYS have a C (or whatever language you are using) [fallback](fallback.md) code for other platforms, use [ifdefs](ifdef.md) to switch to the fallback version on platforms running on different assembly languages.
- **What's fast on one platform may be slow on another**. This depends on the instruction set as well as on compiler, operating system, available hardware, [driver](driver.md) implementation and other details. In the end you always need to test on the specific platform to be sure about how fast it will run.
- **You can optimize critical parts of code in [assembly](assembly.md)**, i.e. manually write the assembly code that takes most of the running time of the program, with as few and as inexpensive instructions as possible (but beware, popular compilers are very smart and it's often hard to beat them). But note that such code loses [portability](portability.md)! So ALWAYS have a C (or whatever language you are using) [fallback](fallback.md) code for other platforms, use [ifdefs](ifdef.md) to switch to the fallback version on platforms running on different assembly languages.
## When To Actually Optimize?
Nubs often ask this. Generally fine, sophisticated optimization should come as one of the last steps in development, when you actually have a working thing. These are optimizations requiring significant energy/time to implement -- you don't want to spend resources on this at the stage when they may well be dropped in the end, or they won't matter because they'll be outside the bottleneck. However there are two "exceptions".
Nubs often ask this and this can also be a very nontrivial question. Generally fine, sophisticated optimization should come as one of the last steps in development, when you actually have a working thing. These are optimizations requiring significant energy/time to implement -- you don't want to spend resources on this at the stage when they may well be dropped in the end, or they won't matter because they'll be outside the bottleneck. However there are two "exceptions".
The highest-level optimization is done as part of the initial design of the program, before any line of code gets written. This includes the choice of data structures and mathematical models you're going to be using, the very foundation around which you'll be building your castle. This happens in your head at the time you're forming an idea for a program, e.g. you're choosing between [server-client](server_client.md) or [P2P](p2p.md), [monolithic or micro kernel](kernel.md), [raytraced](ray_tracing.md) or [rasterized](rasterization.md) graphics etc. These choices affect greatly the performance of your program but can hardly be changed once the program is completed, so they need to be made beforehand. **This requires wide knowledge and experience** as you work by intuition.