Update tuto

Miloslav Ciz 2022-04-02 22:19:38 +02:00
parent d4d83bf6c1
commit decd2bd9d5
6 changed files with 296 additions and 27 deletions


@ -163,7 +163,7 @@ Let's quickly mention how you can read and write values in C so that you can beg
- `printf("%d ",x);`: Same as above but without a newline.
- `scanf("%d",&x);`: Read a number from input to the variable `x`. Note there has to be `&` in front of `x`.
## Branches and Loops
## Branches and Loops (If, While, For)
When creating [algorithms](algorithm.md), it's not enough to just write linear sequences of commands. Two things (called [control structures](control_structure.md)) are very important to have in addition:
@ -200,7 +200,17 @@ X is not greater than 10.
And it is also smaller than 5.
```
A note on **conditions** in C: a condition is just an expression (variables/functions along with arithmetic operators). The expression is evaluated (computed) and the number that is obtained is interpreted as *true* or *false* like this: in C 0 means false and anything non-0 means true. Even comparison operators like `<` and `>` are technically arithmetic, they compare numbers and yield either 1 or 0.
About **conditions** in C: a condition is just an expression (variables/functions along with arithmetic operators). The expression is evaluated (computed) and the number that is obtained is interpreted as *true* or *false* like this: **in C 0 means false, anything else means true**. Even comparison operators like `<` and `>` are technically arithmetic: they compare numbers and yield either 1 or 0. Some operators commonly used in conditions are:
- `==` (equals): yields 1 if the operands are equal, otherwise 0.
- `!=` (not equal): yields 1 if the operands are NOT equal, otherwise 0.
- `<` (less than): yields 1 if the first operand is smaller than the second, otherwise 0.
- `<=` (less than or equal): yields 1 if the first operand is smaller than or equal to the second, otherwise 0.
- `&&` (logical [AND](and.md)): yields 1 if both operands are non-0, otherwise 0.
- `||` (logical [OR](or.md)): yields 1 if at least one operand is non-0, otherwise 0.
- `!` (logical [NOT](not.md)): yields 1 if the operand is 0, otherwise 0.
E.g. the condition of an if statement starting as `if (x == 5 || x == 10)` will be true if `x` is either 5 or 10.
Next we have **loops**. There are multiple kinds of loops even though in theory it is enough to only have one kind of loop (there are multiple types out of convenience). The loops in C are:
@ -332,7 +342,7 @@ int main(void)
- `scanf("%c",&answer);` reads a single character from input to the `answer` variable.
- `if (answer == 'n') break;` is a branch that exits the infinite loop with `break` statement if the answer entered was `n` (*no*).
## Functions
## Functions (Subprograms)
Functions are extremely important, no program besides the most primitive ones can be made without them.
@ -492,19 +502,277 @@ Another local variable is `number` -- it is a local variable both in `main` and
And a last thing: keep in mind that not every command you write in C program is a function call. E.g. control structures (`if`, `while`, ...) and special commands (`return`, `break`, ...) are not function calls.
## Header Files, Libraries, Compilation
## Additional Details (Global, Switch, Float, Forward Decls, ...)
## Advanced Data Types
We've skipped a lot of details and small tricks for simplicity. Let's go over some of them. Many of the following things are so called [syntactic sugar](sugar.md): convenient syntax shorthands for common operations.
## Macros
Multiple variables can be defined and assigned like this:
```
int x = 1, y = 2, z;
```
The meaning should be clear, but let's mention that `z` doesn't generally have a defined value here -- it will have a value but you don't know what it is (this may differ between different computers and platforms). See [undefined behavior](undefined_behavior.md).
The following is a shorthand for using operators:
```
x += 1; // same as: x = x + 1;
x -= 10; // same as: x = x - 10;
x *= x + 1; // same as: x = x * (x + 1);
x++; // same as: x = x + 1;
x--; // same as: x = x - 1;
// etc.
```
The last two constructs are called **[incrementing](increment.md)** and **[decrementing](decrement.md)**. This just means adding/subtracting 1.
In C there is a pretty unique operator called the **[ternary operator](ternary_operator.md)** (ternary for having three [operands](operand.md)). It can be used in expressions just as any other operators such as `+` or `-`. Its format is:
```
CONDITION ? VALUE1 : VALUE2
```
It evaluates the `CONDITION` and if it's true (non-0), this whole expression will have the value of `VALUE1`, otherwise its value will be `VALUE2`. It allows for not using so many `if`s. For example instead of
```
if (x >= 10)
  x -= 10;
else
  x = 10;
```
we can write
```
x = x >= 10 ? x - 10 : 10;
```
**[Global variables](global_variable.md)**: we can create variables even outside function bodies. Recall that variables inside functions are called *local*; variables outside functions are called *global* -- they can basically be accessed from anywhere and can sometimes be useful. For example:
```
#include <stdio.h>
#include <stdlib.h> // for rand()

int money = 0; // total money, global variable

void printMoney(void)
{
  printf("I currently have $%d.\n",money);
}

void playLottery(void)
{
  puts("I'm playing lottery.");

  money -= 10; // price of lottery ticket

  if (rand() % 5) // true whenever the remainder is non-0
  {
    money += 100;
    puts("I've won!");
  }
  else
    puts("I've lost!");

  printMoney();
}

void work(void)
{
  puts("I'm going to work :(");

  money += 200; // salary

  printMoney();
}

int main(void)
{
  work();
  playLottery();
  work();
  playLottery();

  return 0;
}
```
In C programs you may encounter a **switch** statement -- it is a control structure similar to a branch `if` which can have more than two branches. It looks like this:
```
switch (x)
{
  case 0:  puts("X is zero. Don't divide by it."); break;
  case 69: puts("X is 69, haha."); break;
  case 42: puts("X is 42, the answer to everything."); break;
  default: puts("I don't know anything about X."); break;
}
```
Switch can only compare exact values; it can't e.g. check if a value is greater than something. Each branch starts with the keyword `case`, then the match value follows, then there is a colon (`:`) and the branch commands follow. IMPORTANT: there has to be the `break;` statement at the end of each case branch (we won't go into details). A special branch is the one starting with the word `default` that is executed if no case label was matched.
Let's also mention some additional data types we can use in programs:
- `char`: A single text character such as *'a'*, *'G'* or *'_'*. We can assign characters as `char c = 'a';` (single characters are enclosed in apostrophes similarly to how text strings are inside quotes). We can read a character as `c = getchar();` and print it as `putchar(c);`. Special characters that can be used are `\n` (newline) or `\t` (tab). Characters are in fact small numbers (usually with 256 possible values) and can be used basically anywhere a number can be used (for example we can compare characters, e.g. `if (c < 'b') ...`). Later we'll see characters are basic building blocks of text strings.
- `unsigned int`: Integer that can only take positive values or 0 (i.e. no negative values). It can store higher positive values than normal `int` (which is called a *signed int*).
- `long`: Big integer, takes more memory but can store numbers in the range of at least a few billion.
- `float` and `double`: [Floating point](float.md) number (`double` is bigger and more precise than `float`) -- an approximation of [real numbers](real_number.md), i.e. numbers with a fractional part such as 2.5 or 0.0001.
In the section about functions we said a function can only call a function that has been defined before it in the source code -- this is because the compiler reads the file from start to finish and if you call a function that hasn't been defined yet, it simply doesn't know what to call. But sometimes we need to call a function that will be defined later, e.g. in cases where two functions call each other (function *A* calls function *B* in its code but function *B* also calls function *A*). For this there exist so called **[forward declarations](forward_decl.md)** -- a forward declaration informs the compiler that a function of a certain name (and with certain parameters etc.) will be defined later in the code. A forward declaration looks the same as a function definition, but it doesn't have a body (the part between `{` and `}`), instead it is terminated with a semicolon (`;`). Here is an example:
```
#include <stdio.h>

void printDecorated2(int x, int fancy); // forward declaration

void printDecorated1(int x, int fancy)
{
  putchar('~');

  if (fancy)
    printDecorated2(x,0); // would be error without f. decl.
  else
    printf("%d",x);

  putchar('~');
}

void printDecorated2(int x, int fancy)
{
  putchar('>');

  if (fancy)
    printDecorated1(x,0);
  else
    printf("%d",x);

  putchar('<');
}

int main()
{
  printDecorated1(10,1);
  putchar('\n'); // newline
  printDecorated2(20,1);
}
```
which prints
```
~>10<~
>~20~<
```
The functions `printDecorated1` and `printDecorated2` call each other, so this is the case when we have to use a forward declaration of `printDecorated2`. Also note the condition `if (fancy)` which is the same thing as `if (fancy != 0)` (think about `fancy` being 1 or 0 and what the condition evaluates to in each case).
## Header Files, Libraries, Compilation/Building
So far we've only been writing programs into a single source code file (such as `program.c`). More complicated programs consist of multiple files and libraries -- we'll take a look at this now.
In C we normally deal with two types of source code files:
- *.c files*: These files contain so called **[implementation](implementation.md)** of algorithms, i.e. code that translates into actual program instructions. These files are what's handed to the compiler.
- *.h files*, or **[header files](header_file.md)**: These files typically contain **declarations** such as constants and function headers (but not their bodies, i.e. implementations).
When we have multiple source code files, we typically have pairs of *.c* and *.h* files. E.g. if there is a library called *mathfunctions*, it will consist of files *mathfunctions.c* and *mathfunctions.h*. The *.h* file will contain the function headers (in the same manner as with forward declarations) and constants such as [pi](pi.md). The *.c* file will then contain the implementations of all the functions declared in the *.h* file. But why do we do this?
Firstly *.h* files may serve as a nice documentation of the library for programmers: you can simply open the *.h* file and see all the functions the library offers without having to skim over thousands of lines of code. Secondly this separation is needed because of how multiple source code files are compiled into a single executable program.
Suppose now we're compiling a single file named *program.c* as we've been doing until now. The compilation consists of several steps:
1. The compiler reads the file *program.c* and makes sense of it.
2. It then creates an intermediate file called *program.o*. This is called an [object file](object_file.md) and is a binary compiled file which however cannot yet be run because it is not *linked* -- in this code all memory addresses are relative and it doesn't yet contain the code from external libraries (e.g. the code of `printf`).
3. The compiler then runs a **[linker](linker.md)** which takes the file *program.o* and the object files of libraries (such as the *stdio* library) and it puts them all together into the final executable file called *program*. This is called **linking**; the code from the libraries is copied to complete the code of our program and the memory addresses are settled to some specific values.
So realize that when the compiler is compiling our program (*program.c*), which calls functions such as `printf` from a separate library, it doesn't have the code of these functions available -- this code is not in our file. Recall that if we want to call a function, it must have been defined before and so in order for us to be able to call `printf`, the compiler must know about it. This is why we include the *stdio* library at the top of our source code with `#include <stdio.h>` -- this basically copy-pastes the content of the header file of the *stdio* library to the top of our source code file. In this header there are forward declarations of functions such as `printf`, so the compiler now knows about them (it knows their name, what they return and what parameters they take) and we can call them.
Let's see a small example. We'll have the following files (all in the same directory).
*library.h* (the header file):
```
// Returns the square of n.
int square(int n);
```
*library.c* (the implementation file):
```
int square(int x)
{
  // function implementation
  return x * x;
}
```
*program.c* (main program):
```
#include <stdio.h>
#include "library.h"
int main(void)
{
  int n = square(5);
  printf("%d\n",n);
  return 0;
}
```
Now we will manually compile the library and the final program. First let's compile the library, in command line run:
```
gcc -c -o library.o library.c
```
The `-c` flag tells the compiler to only compile the file, i.e. only generate the object (*.o*) file without trying to link it. After this command a file *library.o* should appear. Next we compile the main program in the same way:
```
gcc -c -o program.o program.c
```
This will generate the file *program.o*. Note that during this process the compiler is working only with the *program.c* file, it doesn't know the code of the function `square`, but it knows this function exists, what it returns and what parameter it has thanks to us including the library header *library.h* with `#include "library.h"` (quotes are used instead of `<` and `>` to tell the compiler to look for the files in the current directory).
Now we have the file *program.o* in which the compiled `main` function resides and file *library.o* in which the compiled function `square` resides. We need to link them together. This is done like this:
```
gcc -o program program.o library.o
```
For linking we don't need to use any special flag, the compiler knows that if we give it several *.o* files, it is supposed to link them. An executable file named *program* should appear; we can now run it and it should print
```
25
```
This is the principle of compiling multiple C files (and it also allows for combining C with other languages). This process is normally automated, but you should know how it works. The systems that automate it are called **[build systems](build_system.md)**; examples are [Make](make.md) and [CMake](cmake.md). When using e.g. the Make system, the whole codebase can be built with a single command `make` in the command line.
Some programmers simplify this whole process further so that they don't even need a build system, e.g. with so called [header-only libraries](header_only.md), but this is outside the scope of this tutorial.
As a bonus, let's see a few useful compiler flags:
- `-O1`, `-O2`, `-O3`: Optimize for speed (higher number means more aggressive optimization). Adding `-O3` usually makes your program noticeably faster without any change to the code. This is recommended.
- `-Os`: Optimize for size, the same as above but the compiler will try to make the executable as small as possible.
- `-Wall -Wextra -pedantic`: The compiler will write more warnings and will be more strict. This can help spot many bugs.
- `-c`: Compile only (generate object files, do not link).
- `-g`: Include debug symbols, this will be important for [debugging](debugging.md).
## Advanced Data Types and Variables (Structs, Arrays)
## Macros/Preprocessor
## Pointers
## More on Functions
## More on Functions (Recursion, Function Pointers)
## Dynamic Allocation
## Dynamic Allocation (Malloc)
## Debugging
## Debugging, Optimization
## Advanced Stuff