# C Tutorial

This is a relatively quick WIP [C](c.md) tutorial.

You should probably know at least the completely basic ideas of programming before reading this (what's a [programming language](programming_language.md), [source code](source_code.md), [command line](cli.md) etc.). If you're as far as already knowing another language, this should be pretty easy to understand.

## About C and Programming

[C](c.md) is

- A **[programming language](programming_language.md)**, i.e. a language that lets you express [algorithms](algorithm.md).
- A [compiled](compiled.md) language (as opposed to [interpreted](interpreted.md)), i.e. you have to compile the code you write (with a [compiler](compiler.md)) in order to obtain a [native](native.md) executable program (a binary file that you can run directly).
- Extremely **fast and efficient**.
- Very **widely supported and portable** to almost anything.
- **[Low level](low_level.md)**, i.e. there is relatively little [abstraction](abstraction.md) and not much comfortable built-in functionality such as [garbage collection](garbage_collection.md), you have to write many things yourself, you will deal with [pointers](pointer.md), [endianness](endianness.md) etc.
- [Imperative](imperative.md) (based on sequences of commands), without [object oriented programming](oop.md).
- Considered **hard**, but in certain ways it's simple, it lacks the [bloat](bloat.md) and [bullshit](bullshit.md) of "[modern](modern.md)" languages, which is an essential thing. It will take long to learn (don't worry, not nearly as long as learning a foreign language) but it's the most basic thing you should know if you want to create good software. You won't regret it.
- **Not holding your hand**, i.e. you may very easily "shoot yourself in the foot" and crash your program. This is the price for the language's power.
- Very old, well established and tested by time.
- Recommended by us for serious programs.

If you come from a language like [Python](python.md) or [JavaScript](javascript.md), you may be shocked that C doesn't come with its own [package manager](package_manager.md), [debugger](debugger.md) or [build system](build_system.md), and that it doesn't have [modules](module.md), [generics](generics.md), [garbage collection](garbage_collection.md), [OOP](oop.md), [hashmaps](hashmap.md), dynamic [lists](list.md), [type inference](type_inference.md) and similar "[modern](modern.md)" features. When you truly get into C, you'll find it's a good thing.

Programming in C works like this:

1. You write C source code into a file.
2. You compile the file with a C [compiler](compiler.md) such as [gcc](gcc.md) (which is just a program that turns source code into a runnable program). This gives you the executable program.
3. You run the program, test it, see how it works and potentially get back to modifying the source code (step 1).

So, for writing the source code you'll need a [text editor](text_editor.md); any [plain text](plain_text.md) editor will do but you should use one that can highlight C [syntax](syntax.md) -- this helps very much when programming and is practically a necessity. The ideal editor is [vim](vim.md) but it's a bit difficult to learn so you can use something as simple as [Gedit](gedit.md) or [Geany](geany.md). We do NOT recommend using huge programming [IDEs](ide.md) such as "VS Code" and whatnot. You definitely can NOT use an advanced document editor that can format text such as [LibreOffice](libreoffice.md) or that [shit](shit.md) from Micro$oft, this won't work because it's not plain text.

Next you'll need a C [compiler](compiler.md), the program that will turn your source code into a runnable program. We'll use the most commonly used one called [gcc](gcc.md) (you can try different ones such as [clang](clang.md) or [tcc](tcc.md) if you want).
If you're on a [Unix](unix.md)-like system such as [GNU](gnu.md)/[Linux](linux.md) (which you probably should be), gcc is probably already installed. Open up a terminal and write `gcc` to see if it's installed -- if not, then install it (e.g. with `sudo apt install build-essential` if you're on a Debian-based system).

If you're extremely lazy, there are online web C compilers that work in a web browser (find them with a search engine). You can use these for quick experiments but note there are some limitations (e.g. not being able to work with files), and you should definitely know how to compile programs yourself.

Last thing: there are multiple standards of C. Here we will be covering [C99](c99.md), but this likely doesn't have to bother you at this point.

## First Program

Let's quickly try to compile a tiny program to test everything and see how everything works in practice.

Open your text editor and paste this code:

```
/* simple C program! */

#include <stdio.h> // include IO library

int main(void)
{
  puts("It works.");

  return 0;
}
```

Save this file and name it `program.c`. Then open a terminal emulator (or an equivalent command line interface), navigate to the directory where you saved the file (e.g. `cd somedirectory`) and compile the program with the following command:

```
gcc -o program program.c
```

The program should compile and the executable `program` should appear in the directory. You can run it with

```
./program
```

And you should see

```
It works.
```

written in the command line.

Now let's see what the source code means:

- `/* simple C program! */` is a so called *block comment*, it does nothing, it's here only for the humans that will read the source code. Such comments can be almost anywhere in the code. The comment starts at `/*` and ends with `*/`.
- `// include IO library` is another comment, but this is a *line comment*, it starts with `//` and ends with the end of line.
- `#include <stdio.h>` tells the compiler we want to include a library named *stdio* (the weird [syntax](syntax.md) will be explained in the future). This is a standard library with input/output functions, we need it to be able to use the function `puts` later on. We can include more libraries if we want to. These includes are almost always at the very top of the source code.
- `int main(void)` is the start of the main program. What exactly this means will be explained later, for now just remember there has to be this function named `main` in basically every program -- inside it there are commands that will be executed when the program is run. Note that the curly brackets that follow (`{` and `}`) denote the block of code that belongs to this function, so we need to write our commands between these brackets.
- `puts("It works.");` is a "command" for printing text strings to the command line (it's a command from the `stdio` library included above). Why exactly this is written like this will be explained later, but for now notice the following. The command starts with its name (`puts`, for *put string*), then there are left and right brackets (`(` and `)`) between which there are arguments to the command, in our case there is one, the text string `"It works."`. Text strings have to be put between quotes (`"`), otherwise the compiler would think the words are other commands (the quotes are not part of the string itself, they won't be printed out). The command is terminated by `;` -- all "normal" commands in C have to end with a semicolon.
- `return 0;` is another "command", it basically tells the operating system that the program terminated successfully (`0` is a code for success). This command is an exception in that it doesn't have to have brackets (`(` and `)`). This doesn't have to bother us too much now, let's just remember this will always be the last command in our program.

Also notice how the source code is formatted, e.g.
the indentation of code within the `{` and `}` brackets. Whitespace characters (spaces, new lines, tabs) are ignored by the compiler so we can theoretically write our program on a single line, but that would be unreadable. We use indentation, spaces and empty lines to format the code to be well readable.

To sum up let's see a general structure of a typical C program. You can just copy paste this for any new program and then just start writing commands in the `main` function.

```
#include <stdio.h> // include the I/O library

// more libraries can be included here

int main(void)
{
  // write commands here

  return 0; // always the last command
}
```

## Variables, Arithmetic, Data Types

Programming is a lot like mathematics, we compute equations and transform numerical values into other values. You probably know in mathematics we use *variables* such as *x* or *y* to denote numerical values that can change (hence variables). In programming we also use variables -- here **[variable](variable.md) is a place in memory which has a name**.

We can create variables named `x`, `y`, `myVariable` or `score` and then store specific values (for now let's only consider numbers) into them. We can read from and write to these variables at any time. These variables physically reside in [RAM](ram.md), but we don't really care where exactly (at which address) they are located -- this is e.g. similar to houses, in common talk we normally say things like *John's house* or *the pet store* instead of *house with address 3225*.

Variable names can't start with a digit (and they can't be any of the [keywords](keyword.md) reserved by C). By convention they also shouldn't be all uppercase or start with uppercase (these are normally used for other things). Normally we name variables like this: `myVariable` or `my_variable` (pick one style, don't mix them).

In C as in other languages each variable has a certain **[data type](data_type.md)**; that is each variable has associated with it information about what kind of data is stored in it. This can be e.g. a *whole number*, *fraction*, a *text character*, *text string* etc. Data types are a more complex topic that will be discussed later, for now we'll start with the most basic one, the **integer type**, in C called `int`. An `int` variable can store whole numbers in the range of at least -32767 to 32767 (but usually much more).

Let's see an example.

```
#include <stdio.h>

int main(void)
{
  int myVariable;

  myVariable = 5;

  printf("%d\n",myVariable);

  myVariable = 8;

  printf("%d\n",myVariable);
}
```

- `int myVariable;` is a so called **variable declaration**, it tells the compiler we are creating a new variable with the name `myVariable` and data type `int`. Variables can be created almost anywhere in the code (even outside the `main` function) but that's a topic for later.
- `myVariable = 5;` is a so called **variable assignment**, it stores the value 5 into the variable named `myVariable`. IMPORTANT NOTE: the `=` does NOT signify mathematical equality but an assignment (equality in C is written as `==`); when the compiler encounters `=`, it simply takes the value on the right of it and writes it to the variable on the left side of it. Sometimes people confuse assignment with an equation that the compiler solves -- this is NOT the case, assignment is much simpler, it simply writes a value into a variable. So `x = x + 1;` is a valid command even though mathematically it would be an equation without a solution.
- `printf("%d\n",myVariable);` prints out the value currently stored in `myVariable`. Don't get scared by this complicated command, it will be explained later (once we learn about [pointers](pointer.md)). For now only know this prints the variable content.
- `myVariable = 8;` assigns a new value to `myVariable`, overwriting the old.
- `printf("%d\n",myVariable);` again prints the value in `myVariable`.

After compiling and running the program you should see:

```
5
8
```

Last thing to learn is **arithmetic operators**. They're just normal math operators such as `+`, `-`, `*` and `/`. You can use these along with brackets (`(` and `)`) to create **[expressions](expression.md)**. Expressions can contain variables and can themselves be used in many places where variables can be used (but not everywhere, e.g. on the left side of variable assignment, that would make no sense). E.g.:

```
#include <stdio.h>

int main(void)
{
  int heightCm = 175;
  int weightKg = 75;
  int bmi = (weightKg * 10000) / (heightCm * heightCm);

  printf("%d\n",bmi);
}
```

calculates and prints your BMI (body mass index).

Let's quickly mention how you can read and write values in C so that you can begin to experiment with your own small programs. You don't have to understand the following [syntax](syntax.md) as of yet, it will be explained later, now simply copy-paste the commands:

- `puts("hello");`: Prints a text string with newline.
- `printf("hello");`: Same as above but without newline.
- `printf("%d\n",x);`: Prints the value of variable `x` with newline.
- `printf("%d ",x);`: Same as above but without a newline (a space is printed instead).
- `scanf("%d",&x);`: Reads a number from input to the variable `x`. Note there has to be `&` in front of `x`.

## Branches and Loops (If, While, For)

When creating [algorithms](algorithm.md), it's not enough to just write linear sequences of commands. Two things (called [control structures](control_structure.md)) are very important to have in addition:

- **[branches](branch.md)**: Conditionally executing or skipping certain commands (e.g. if a user enters a password we want to either log him in if the password was correct or write an error if the password was incorrect). This is informally known as **"if-then-else"**.
- **[loops](loop.md)** (also called **iteration**): Repeating certain commands a given number of times or as long as some condition holds (e.g. when searching a text we repeatedly compare words one by one to the searched word until a match is found or end of text is reached).

Let's start with **branches**. In C the command for a branch is `if`. E.g.:

```
if (x > 10)
  puts("X is greater than 10.");
```

The [syntax](syntax.md) is given, we start with `if`, then brackets (`(` and `)`) follow inside which there is a condition, then a command or a block of multiple commands (inside `{` and `}`) follows. If the condition in brackets holds, the command (or block of commands) gets executed, otherwise it is skipped.

Optionally there may be an *else* branch which gets executed only if the condition does NOT hold. It is denoted with the `else` keyword which is again followed by a command or a block of multiple commands. Branching may also be nested, i.e. branches may be inside other branches. For example:

```
if (x > 10)
  puts("X is greater than 10.");
else
{
  puts("X is not greater than 10.");

  if (x < 5)
    puts("And it is also smaller than 5.");
}
```

So if `x` is equal to e.g. 3, the output will be:

```
X is not greater than 10.
And it is also smaller than 5.
```

About **conditions** in C: a condition is just an expression (variables/functions along with arithmetic operators). The expression is evaluated (computed) and the number that is obtained is interpreted as *true* or *false* like this: **in C 0 means false, anything else means true**. Even comparison operators like `<` and `>` are technically arithmetic, they compare numbers and yield either 1 or 0. Some operators commonly used in conditions are:

- `==` (equals): yields 1 if the operands are equal, otherwise 0.
- `!=` (not equal): yields 1 if the operands are NOT equal, otherwise 0.
- `<` (less than): yields 1 if the first operand is smaller than the second, otherwise 0.
- `<=`: yields 1 if the first operand is smaller than or equal to the second, otherwise 0.
- `&&` (logical [AND](and.md)): yields 1 if both operands are non-0, otherwise 0.
- `||` (logical [OR](or.md)): yields 1 if at least one operand is non-0, otherwise 0.
- `!` (logical [NOT](not.md)): yields 1 if the operand is 0, otherwise 0.

E.g. an if statement starting as `if (x == 5 || x == 10)` will be true if `x` is either 5 or 10.

Next we have **loops**. There are multiple kinds of loops even though in theory it is enough to only have one kind of loop (there are multiple types out of convenience). The loops in C are:

- **while**: Loop with condition at the beginning.
- **do while**: Loop with condition at the end, not used so often so we'll ignore this one.
- **for**: Loop executed a fixed number of times. This is a very common case, that's why there is a special loop for it.

The **while** loop is used when we want to repeat something without knowing in advance how many times we'll repeat it (e.g. searching a word in text). It starts with the `while` keyword, is followed by brackets with a condition inside (same as with branches) and finally a command or a block of commands to be looped. For instance:

```
while (x > y) // as long as x is greater than y
{
  printf("%d %d\n",x,y); // prints x and y

  x = x - 1; // decrease x by 1
  y = y * 2; // double y
}

puts("The loop ended.");
```

If `x` and `y` were to be equal to 100 and 20 (respectively) before the loop is encountered, the output would be:

```
100 20
99 40
98 80
The loop ended.
```

The **for** loop is executed a fixed number of times, i.e. we use it when we know in advance how many times we want to repeat our commands. The [syntax](syntax.md) is a bit more complicated: it starts with the keyword `for`, then brackets (`(` and `)`) follow and then the command or a block of commands to be looped.
The inside of the brackets consists of an initialization, condition and action separated by semicolons (`;`) -- don't worry, it is enough to just remember the structure. A for loop may look like this:

```
puts("Counting until 5...");

for (int i = 0; i < 5; ++i)
  printf("%d\n",i); // prints i
```

`int i = 0` creates a new temporary variable named `i` (name normally used by convention) which is used as a **counter**, i.e. this variable starts at 0 and increases with each iteration (cycle), and it can be used inside the loop body (the repeated commands). `i < 5` says the loop continues to repeat as long as `i` is smaller than 5 and `++i` says that `i` is to be increased by 1 after each iteration (`++i` is basically just a shorthand for `i = i + 1`). The above code outputs:

```
Counting until 5...
0
1
2
3
4
```

IMPORTANT NOTE: in programming we **count from 0**, not from 1 (this is convenient e.g. in regards to [pointers](pointer.md)). So if we count to 5, we get 0, 1, 2, 3, 4. This is why `i` starts with value 0 and the end condition is `i < 5` (not `i <= 5`).

Generally if we want to repeat the `for` loop *N* times, the format is `for (int i = 0; i < N; ++i)`.

Any loop can be exited at any time with a special command called `break`. This is often used with a so called infinite loop, a *while* loop that has `1` as a condition; recall that 1 means true, i.e. the loop condition always holds and the loop never ends. `break` allows us to place conditions in the middle of the loop and into multiple places. E.g.:

```
while (1) // infinite loop
{
  x = x - 1;

  if (x == 0)
    break; // this exits the loop!

  y = y / x;
}
```

The code above places a condition in the middle of an infinite loop to prevent division by zero in `y = y / x`.

Again, loops can be nested (we may have loops inside loops) and also loops can contain branches and vice versa.

## Simple Game: Guess a Number

With what we've learned so far we can already make a simple [game](game.md): guess a number.
The computer thinks of a random number in range 0 to 9 and the user has to guess it. The source code follows.

```
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
  srand(clock()); // random seed

  while (1) // infinite loop
  {
    int randomNumber = rand() % 10;

    puts("I'm thinking of a number. What is it?");

    int guess;

    scanf("%d",&guess); // read the guess

    getchar();

    if (guess == randomNumber)
      puts("You guessed it!");
    else
      printf("Wrong. The number was %d.\n",randomNumber);

    puts("Play on? [y/n]");

    char answer;

    scanf("%c",&answer); // read the answer

    if (answer == 'n')
      break;
  }

  puts("Bye.");

  return 0; // return success, always here
}
```

- `#include <stdlib.h>`, `#include <time.h>`: we're including additional libraries because we need some specific functions from them (`rand`, `srand`, `clock`).
- `srand(clock());`: don't mind this line too much, its purpose is to [seed](seed.md) a pseudorandom number generator. Without doing this the game would always generate the same sequence of random numbers when run again.
- `while (1)` is an infinite game loop -- it runs over and over, in each cycle we perform one game round. The loop can be exited with the `break` statement later on (if the user answers he doesn't want to continue playing).
- `int randomNumber = rand() % 10;`: this line declares a variable named `randomNumber` and immediately assigns a value to it. The value is a random number from 0 to 9. This is achieved with a function `rand` (from the above included `stdlib` library) which returns a random number, and with the modulo (remainder after division) arithmetic operator (`%`) which ensures the number is in the correct range (less than 10).
- `int guess;` creates another variable in which we'll store the user's guessed number.
- `scanf("%d",&guess);` reads a number from the input to the variable named `guess`. Again, don't be bothered by the complicated structure of this command, for now just accept that this is how it's done.
- `getchar();`: don't mind this line, it just discards a newline character read from the input.
- `if (guess == randomNumber) ...`: this is a branch which checks if the user's guess is equal to the generated random number. If so, a success message is printed out. If not, a fail message is printed out along with the secret number. Note we use the `puts` function for the first message as it only prints a text string, while for the latter message we have to use `printf`, a more complex function, because that requires inserting a number into the printed string. More on these functions later.
- `char answer;` declares a variable to store the user's answer to the question of whether to play on. It is of `char` data type which can store a single text character.
- `scanf("%c",&answer);` reads a single character from input to the `answer` variable.
- `if (answer == 'n') break;` is a branch that exits the infinite loop with the `break` statement if the answer entered was `n` (*no*).

## Functions (Subprograms)

Functions are extremely important, no program besides the most primitive ones can be made without them.

**[Function](function.md) is a subprogram** (in other languages functions are also called procedures or subroutines), i.e. it is code that solves some smaller subproblem that you can repeatedly invoke, for instance you may have a function for computing a [square root](sqrt.md), for encrypting data or for playing a sound from speakers. We have already met functions such as `puts`, `printf` or `rand`.

Functions are similar to but **NOT the same as mathematical functions**. A mathematical function (simply put) takes a number as input and outputs another number computed from the input number, and this output number depends only on the input number and nothing else.
C functions can do this too but they can also do additional things such as modify variables in other parts of the program or make the computer do something (such as play a sound or display something on the screen) -- these are called **[side effects](side_effect.md)**; things done besides computing an output number from an input number. For distinction mathematical functions are called *pure* functions and functions with side effects are called non-pure.

**Why are functions so important?** Firstly they help us divide a big problem into small subproblems and make the code better organized and readable, but mainly they help us respect the [DRY](dry.md) (*Don't Repeat Yourself*) principle -- this is extremely important in programming. Imagine you need to solve a [quadratic equation](quadratic_equation.md) in several parts of your program; you do NOT want to solve it in each place separately, you want to make a function that solves a quadratic equation and then only invoke (call) that function anywhere you need to solve your quadratic equation. This firstly saves space (source code will be shorter and the compiled program will be smaller), but it also makes your program manageable and eliminates bugs -- imagine you find a better (e.g. faster) way of solving quadratic equations; without functions you'd have to go through the whole code and change the algorithm in each place separately which is impractical and increases the chance of making errors. With functions you only change the code in one place (in the function) and in any place where your code invokes (calls) this function the new, better and updated version of the function will be used.

Besides writing programs that can be directly executed programmers write **[libraries](library.md)** -- collections of functions that can be used in other projects.
We have already seen libraries such as *stdio*, *standard input/output library*, a standard (official, bundled with every C compiler) library for input/output (reading and printing values); *stdio* contains functions such as `puts` which is used for printing out text strings. Examples of other libraries are the standard *math* library containing functions for e.g. computing [sine](sine.md), or [SDL](sdl.md), a 3rd party multimedia library for such things as drawing to screen, playing sounds and handling keyboard and mouse input.

Let's see a simple example of a function that writes out a temperature in degrees Celsius as well as in Kelvin:

```
#include <stdio.h>

void writeTemperature(int celsius)
{
  int kelvin = celsius + 273;
  printf("%d C (%d K)\n",celsius,kelvin);
}

int main(void)
{
  writeTemperature(-50);
  writeTemperature(0);
  writeTemperature(100);

  return 0;
}
```

The output is

```
-50 C (223 K)
0 C (273 K)
100 C (373 K)
```

Now imagine we decide we also want our temperatures in Fahrenheit. We can simply edit the code in the `writeTemperature` function and the program will automatically be writing temperatures in the new way.

Let's see how to create and invoke functions. Creating a function in code is done between the inclusion of libraries and the `main` function, and we formally call this **defining a function**. The function definition format is as follows:

```
RETURN_TYPE FUNCTION_NAME(FUNCTION_PARAMETERS)
{
  FUNCTION_BODY
}
```

- `RETURN_TYPE` is the [data type](data_type.md) the function returns. A function may or may not return a certain value, just as the pure mathematical functions do. This may for example be `int`, if the function returns an integer number. If the function doesn't return anything, this type is `void`.
- `FUNCTION_NAME` is the name of the function, it follows the same rules as the names for variables.
- `FUNCTION_PARAMETERS` specifies the input values of the function. The function can take any number of parameters (e.g.
a function `playBeep` may take 0 arguments, `sine` function takes 1, `logarithm` may take two etc.). This list is comma-separated and each item consists of the parameter data type and name. If there are 0 parameters, there should be the word `void` inside the brackets, but compilers tolerate just having empty brackets.
- `FUNCTION_BODY` are the commands executed by the function, just as we know them from the *main* function.

Let's see another function:

```
#include <stdio.h>

int power(int x, int n)
{
  int result = 1;

  for (int i = 0; i < n; ++i) // repeat n times
    result = result * x;

  return result;
}

int main(void)
{
  for (int i = 0; i < 5; ++i)
  {
    int powerOfTwo = power(2,i);
    printf("%d\n",powerOfTwo);
  }

  return 0;
}
```

The output is:

```
1
2
4
8
16
```

The function `power` takes two parameters: `x` and `n`, and returns `x` raised to the `n`th power. Note that unlike the first function we saw, here the return type is `int` because this function does return a value. **Notice the command `return`** -- it is a special command that causes the function to terminate and return a specific value. In functions that return a value (their return type is not `void`) there has to be a `return` command. In functions that return nothing there may or may not be one, and if there is, it has no value after it (`return;`).

Let's focus on how we invoke the function -- in programming we say we **call the function**. The function call in our code is `power(2,i)`. If a function returns a value (return type is not `void`), its function call can be used in any expression, i.e. almost anywhere where we can use a variable or a numerical value -- just imagine the function computes a return value and this value is **substituted to the place where we call the function**. For example we can imagine the expression `power(3,1) + power(3,0)` as simply `3 + 1`.

If a function returns nothing (return type is `void`), it can't be used in expressions, it is used "by itself"; e.g. `playBeep();`.
(Functions that do return a value can also be used like this -- their return value is in this case simply ignored.)

We call a function by writing its name (`power`), then adding brackets (`(` and `)`) and inside them we put **arguments** -- specific values that will substitute the corresponding parameters inside the function (here `x` will take the value `2` and `n` will take the current value of `i`). If the function takes no parameters (the parameter list is `void`), we simply put nothing inside the brackets (e.g. `playBeep();`).

Here comes the nice thing: **we can nest function calls**. For example we can write `x = power(3,power(2,1));` which will result in assigning the variable `x` the value of 9. **Functions can also call other functions** (even themselves, see [recursion](recursion.md)), but only those that have been defined before them in the source code (this can be fixed with so called [forward declarations](forward_decl.md)).

Notice that the `main` function we always have in our programs is also a function definition. The definition of this function is required for runnable programs, its name has to be `main` and it has to return `int` (an error code where 0 means no error). It can also take parameters but more on that later.

This is the most basic knowledge to have about C functions. Let's see one more example with some peculiarities that aren't so important now, but will be later.

```
#include <stdio.h>

void writeFactors(int x) // writes divisors of x
{
  printf("factors of %d:\n",x);

  while (x > 1) // keep dividing x by its factors
  {
    for (int i = 2; i <= x; ++i) // search for a factor
      if (x % i == 0) // i divides x without remainder?
      {
        printf(" %d\n",i); // i is a factor, write it
        x = x / i; // divide x by i
        break; // exit the for loop
      }
  }
}

int readNumber(void)
{
  int number;

  puts("Please enter a number to factor (0 to quit).");
  scanf("%d",&number);

  return number;
}

int main(void)
{
  while (1) // infinite loop
  {
    int number = readNumber(); // <- function call

    if (number == 0) // 0 means quit
      break;

    writeFactors(number); // <- function call
  }

  return 0;
}
```

We have defined two functions: `writeFactors` and `readNumber`. `writeFactors` returns no value but it has side effects (it prints text to the command line). `readNumber` takes no parameters but returns a value; it prompts the user to enter a value and returns the read value.

Notice that inside `writeFactors` we modify its parameter `x` inside the function body -- this is okay, it won't affect the argument that was passed to this function (the `number` variable inside the `main` function won't change after this function call). `x` can be seen as a **[local variable](local_variable.md)** of the function, i.e. a variable that's created inside this function and can only be used inside it -- when `writeFactors` is called inside `main`, a new local variable `x` is created inside `writeFactors` and the value of `number` is copied to it.

Another local variable is `number` -- it is a local variable both in `main` and in `readNumber`. Even though the names are the same, these are two different variables, each one is local to its respective function (modifying `number` inside `readNumber` won't affect `number` inside `main` and vice versa).

And a last thing: keep in mind that not every command you write in a C program is a function call. E.g. control structures (`if`, `while`, ...) and special commands (`return`, `break`, ...) are not function calls.

## More Details (Globals, Switch, Float, Forward Decls, ...)

We've skipped a lot of details and small tricks for simplicity. Let's go over some of them.
Many of the following things are so called [syntactic sugar](sugar.md): convenient syntax shorthands for common operations.

Multiple variables can be defined and assigned like this:

```
int x = 1, y = 2, z;
```

The meaning should be clear, but let's mention that `z` doesn't generally have a defined value here -- it will have a value but you don't know what it is (this may differ between different computers and platforms). See [undefined behavior](undefined_behavior.md).

The following is a shorthand for using operators:

```
x += 1;      // same as: x = x + 1;
x -= 10;     // same as: x = x - 10;
x *= x + 1;  // same as: x = x * (x + 1);
x++;         // same as: x = x + 1;
x--;         // same as: x = x - 1;
// etc.
```

The last two constructs are called **[incrementing](increment.md)** and **[decrementing](decrement.md)**. This just means adding/subtracting 1.

In C there is a pretty unique operator called the **[ternary operator](ternary_operator.md)** (ternary for having three [operands](operand.md)). It can be used in expressions just as any other operator such as `+` or `-`. Its format is:

```
CONDITION ? VALUE1 : VALUE2
```

It evaluates the `CONDITION` and if it's true (non-0), this whole expression will have the value of `VALUE1`, otherwise its value will be `VALUE2`. It allows for not using so many `if`s. For example instead of

```
if (x >= 10)
  x -= 10;
else
  x = 10;
```

we can write

```
x = x >= 10 ? x - 10 : 10;
```

**[Global variables](global_variable.md)**: we can create variables even outside function bodies. Recall that variables inside functions are called *local*; variables outside functions are called *global* -- they can basically be accessed from anywhere and can sometimes be useful.
For example:

```
#include <stdio.h>
#include <stdlib.h> // for rand()

int money = 0; // total money, global variable

void printMoney(void)
{
  printf("I currently have $%d.\n",money);
}

void playLottery(void)
{
  puts("I'm playing lottery.");

  money -= 10; // price of lottery ticket

  if (rand() % 5 == 0) // 1 in 5 chance
  {
    money += 100;
    puts("I've won!");
  }
  else
    puts("I've lost!");

  printMoney();
}

void work(void)
{
  puts("I'm going to work :(");

  money += 200; // salary

  printMoney();
}

int main()
{
  work();
  playLottery();
  work();
  playLottery();

  return 0;
}
```

In C programs you may encounter a **switch** statement -- it is a control structure similar to a branch `if` which can have more than two branches. It looks like this:

```
switch (x)
{
  case 0:
    puts("X is zero. Don't divide by it.");
    break;

  case 69:
    puts("X is 69, haha.");
    break;

  case 42:
    puts("X is 42, the answer to everything.");
    break;

  default:
    puts("I don't know anything about X.");
    break;
}
```

Switch can only compare exact values, it can't e.g. check if a value is greater than something. Each branch starts with the keyword `case`, then the match value follows, then there is a colon (`:`) and the branch commands follow. IMPORTANT: there has to be the `break;` statement at the end of each case branch (we won't go into details). A special branch is the one starting with the word `default` that is executed if no case label was matched.

Let's also mention some additional data types we can use in programs:

- `char`: A single text character such as *'a'*, *'G'* or *'_'*. We can assign characters as `char c = 'a';` (single characters are enclosed in apostrophes similarly to how text strings are inside quotes). We can read a character as `c = getchar();` and print it as `putchar(c);`. Special characters that can be used are `\n` (newline) or `\t` (tab). Characters are in fact small numbers (usually with 256 possible values) and can be used basically anywhere a number can be used (for example we can compare characters, e.g. `if (c < 'b') ...`).
Later we'll see characters are basic building blocks of text strings.
- `unsigned int`: Integer that can only take positive values or 0 (i.e. no negative values). It can store higher positive values than normal `int` (which is called a *signed int*).
- `long`: Big integer, takes more memory but can store numbers in the range of at least a few billion.
- `float` and `double`: [Floating point](float.md) number (`double` is bigger and more precise than `float`) -- an approximation of [real numbers](real_number.md), i.e. numbers with a fractional part such as 2.5 or 0.0001. You can print these numbers as `printf("%lf\n",x);` (plain `%f` works as well) and read a `float` as `scanf("%f",&x);` (for a `double` you have to use `%lf` here).

Here is a short example with the new data types:

```
#include <stdio.h>

int main(void)
{
  char c;
  float f;

  puts("Enter character.");
  c = getchar(); // read character

  puts("Enter float.");
  scanf("%f",&f);

  printf("Your character is: %c.\n",c);
  printf("Your float is %lf\n",f);

  float fSquared = f * f;
  int wholePart = f; // this can be done

  printf("Its square is %lf.\n",fSquared);
  printf("Its whole part is %d.\n",wholePart);

  return 0;
}
```

Notice mainly how we can assign a `float` value into the variable of `int` type (`int wholePart = f;`). This can be done even the other way around and with many other types. C can do automatic **type conversions** (*[casting](cast.md)*), but of course, some information may be lost in this process (e.g. the fractional part).

In the section about functions we said a function can only call a function that has been defined before it in the source code -- this is because the compiler reads the file from start to finish and if you call a function that hasn't been defined yet, it simply doesn't know what to call. But sometimes we need to call a function that will be defined later, e.g. in cases where two functions call each other (function *A* calls function *B* in its code but function *B* also calls function *A*).
For this there exist so called **[forward declarations](forward_decl.md)** -- a forward declaration is informing that a function of certain name (and with certain parameters etc.) will be defined later in the code. A forward declaration looks the same as a function definition, but it doesn't have a body (the part between `{` and `}`), instead it is terminated with a semicolon (`;`). Here is an example:

```
#include <stdio.h>

void printDecorated2(int x, int fancy); // forward declaration

void printDecorated1(int x, int fancy)
{
  putchar('~');

  if (fancy)
    printDecorated2(x,0); // would be error without f. decl.
  else
    printf("%d",x);

  putchar('~');
}

void printDecorated2(int x, int fancy)
{
  putchar('>');

  if (fancy)
    printDecorated1(x,0);
  else
    printf("%d",x);

  putchar('<');
}

int main()
{
  printDecorated1(10,1);
  putchar('\n'); // newline
  printDecorated2(20,1);
}
```

which prints

```
~>10<~
>~20~<
```

The functions `printDecorated1` and `printDecorated2` call each other, so this is the case when we have to use a forward declaration of `printDecorated2`. Also note the condition `if (fancy)` which is the same thing as `if (fancy != 0)` (imagine `fancy` being 1 and 0 and think about what the condition evaluates to in each case).

## Header Files, Libraries, Compilation/Building

So far we've only been writing programs into a single source code file (such as `program.c`). More complicated programs consist of multiple files and libraries -- we'll take a look at this now.

In C we normally deal with two types of source code files:

- *.c files*: These files contain so called **[implementation](implementation.md)** of algorithms, i.e. code that translates into actual program instructions. These files are what's handed to the compiler.
- *.h files*, or **[header files](header_file.md)**: These files typically contain **declarations** such as constants and function headers (but not their bodies, i.e. implementations).

When we have multiple source code files, we typically have pairs of *.c* and *.h* files.
E.g. if there is a library called *mathfunctions*, it will consist of files *mathfunctions.c* and *mathfunctions.h*. The *.h* file will contain the function headers (in the same manner as with forward declarations) and constants such as [pi](pi.md). The *.c* file will then contain the implementations of all the functions declared in the *.h* file. But why do we do this?

Firstly *.h* files may serve as a nice documentation of the library for programmers: you can simply open the *.h* file and see all the functions the library offers without having to skim over thousands of lines of code.

Secondly this has to do with how multiple source code files are compiled into a single executable program.

Suppose now we're compiling a single file named *program.c* as we've been doing until now. The compilation consists of several steps:

1. The compiler reads the file *program.c* and makes sense of it.
2. It then creates an intermediate file called *program.o*. This is called an [object file](object_file.md) and is a binary compiled file which however cannot yet be run because it is not *linked* -- in this code all memory addresses are relative and it doesn't yet contain the code from external libraries (e.g. the code of `printf`).
3. The compiler then runs a **[linker](linker.md)** which takes the file *program.o* and the object files of libraries (such as the *stdio* library) and it puts them all together into the final executable file called *program*. This is called **linking**; the code from the libraries is copied to complete the code of our program and the memory addresses are settled to some specific values.

So realize that when the compiler is compiling our program (*program.c*), which contains functions such as `printf` from a separate library, it doesn't have the code of these functions available -- this code is not in our file.
Recall that if we want to call a function, it must have been defined before and so in order for us to be able to call `printf`, the compiler must know about it. This is why we include the *stdio* library at the top of our source code with `#include <stdio.h>` -- this basically copy-pastes the content of the header file of the *stdio* library to the top of our source code file. In this header there are forward declarations of functions such as `printf`, so the compiler now knows about them (it knows their name, what they return and what parameters they take) and we can call them.

Let's see a small example. We'll have the following files (all in the same directory).

*library.h* (the header file):

```
// Returns the square of n.
int square(int n);
```

*library.c* (the implementation file):

```
int square(int x)
{
  // function implementation
  return x * x;
}
```

*program.c* (main program):

```
#include <stdio.h>
#include "library.h"

int main(void)
{
  int n = square(5);

  printf("%d\n",n);

  return 0;
}
```

Now we will manually compile the library and the final program. First let's compile the library, in command line run:

```
gcc -c -o library.o library.c
```

The `-c` flag tells the compiler to only compile the file, i.e. only generate the object (*.o*) file without trying to link it. After this command a file *library.o* should appear. Next we compile the main program in the same way:

```
gcc -c -o program.o program.c
```

This will generate the file *program.o*. Note that during this process the compiler is working only with the *program.c* file, it doesn't know the code of the function `square`, but it knows this function exists, what it returns and what parameter it has thanks to us including the library header *library.h* with `#include "library.h"` (quotes are used instead of `<` and `>` to tell the compiler to look for the file in the current directory first).
Now we have the file *program.o* in which the compiled `main` function resides and file *library.o* in which the compiled function `square` resides. We need to link them together. This is done like this:

```
gcc -o program program.o library.o
```

For linking we don't need to use any special flag, the compiler knows that if we give it several *.o* files, it is supposed to link them. The file *program* should appear that we can already run and it should print

```
25
```

This is the principle of compiling multiple C files (and it also allows for combining C with other languages). This process is normally automated, but you should know how it works. The systems that automate this action are called **[build systems](build_system.md)**, they are for example [Make](make.md) and [CMake](cmake.md). When using e.g. the Make system, the whole codebase can be built with a single command `make` in the command line.

Some programmers simplify this whole process further so that they don't even need a build system, e.g. with so called [header-only libraries](header_only.md), but this is outside the scope of this tutorial.

As a bonus, let's see a few useful compiler flags:

- `-O1`, `-O2`, `-O3`: Optimize for speed (higher number means more aggressive optimization). Adding `-O3` normally instantly speeds up your program. This is recommended.
- `-Os`: Optimize for size, the same as above but the compiler will try to make as small executable as possible.
- `-Wall -Wextra -pedantic`: The compiler will write more warnings and will be more strict. This can help spot many bugs.
- `-c`: Compile only (generate object files, do not link).
- `-g`: Include debug symbols, this will be important for [debugging](debugging.md).

## Advanced Data Types and Variables (Structs, Arrays, Strings)

Until now we've encountered simple data types such as `int`, `char` or `float`. Variables of these types hold single atomic values (e.g. numbers or text characters).
Such data types are called **[primitive types](primitive_type.md)**.

Above these there exist **[compound data types](compound_type.md)** (also *complex* or *structured*) which are composed of multiple primitive types. They are necessary for any advanced program.

The first compound type is a structure, or **[struct](struct.md)**. It is a collection of several values of potentially different data types (primitive or compound). The following code shows how a struct can be created and used.

```
#include <stdio.h>

typedef struct
{
  char initial; // initial of name
  int weightKg;
  int heightCm;
} Human;

int bmi(Human human)
{
  return (human.weightKg * 10000) / (human.heightCm * human.heightCm);
}

int main(void)
{
  Human carl;

  carl.initial = 'C';
  carl.weightKg = 100;
  carl.heightCm = 180;

  if (bmi(carl) > 25)
    puts("Carl is fat.");

  return 0;
}
```

The part of the code starting with `typedef struct` creates a new data type that we call `Human` (one convention for data type names is to start them with an uppercase character). This data type is a structure consisting of three members, one of type `char` and two of type `int`. Inside the `main` function we create a variable `carl` which is of `Human` data type. Then we set the specific values -- we see that each member of the struct can be accessed using the dot character (`.`), e.g. `carl.weightKg`; this can be used just as any other variable. Then we see the type `Human` being used in the parameter list of the function `bmi`, just as any other type would be used.

What is this good for? Why don't we just create global variables such as `carl_initial`, `carl_weightKg` and `carl_heightCm`? In this simple case it might work just as well, but in a more complex code this would be burdening -- imagine we wanted to create 10 variables of type `Human` (`john`, `becky`, `arnold`, ...).
We would have to painstakingly create 30 variables (3 for each person), the function `bmi` would have to take two parameters (`height` and `weight`) instead of one (`human`) and if we wanted to e.g. add more information about every human (such as `hairLength`), we would have to manually create another 10 variables and add one parameter to the function `bmi`, while with a struct we only add one member to the struct definition and create more variables of type `Human`.

**Structs can be nested**. So you may see things such as `myHouse.groundFloor.livingRoom.ceilingHeight` in C code.

Another extremely important compound type is **[array](array.md)** -- a sequence of items, all of which are of the same data type. Each array is specified with its length (number of items) and the data type of the items. We can have, for instance, an array of 10 `int`s, or an array of 235 `Human`s. The important thing is that we can **index** the array, i.e. we access the individual items of the array by their position, and this position can be specified with a variable. This allows for **looping over array items** and performing certain operations on each item. Demonstration code follows:

```
#include <stdio.h>
#include <math.h> // for sqrt()

int main(void)
{
  float vector[5];

  vector[0] = 1;
  vector[1] = 2.5;
  vector[2] = 0;
  vector[3] = 1.1;
  vector[4] = -405.054;

  puts("The vector is:");

  for (int i = 0; i < 5; ++i)
    printf("%lf ",vector[i]);

  putchar('\n'); // newline

  /* compute vector length with
     Pythagorean theorem: */

  float sum = 0;

  for (int i = 0; i < 5; ++i)
    sum += vector[i] * vector[i];

  printf("Vector length is: %lf\n",sqrt(sum));

  return 0;
}
```

We've included a new library called `math.h` so that we can use a function for square root (`sqrt`). (If you have trouble compiling the code, add `-lm` flag to the compile command.)

`float vector[5];` is a declaration of an array of length 5 whose items are of type `float`.
When the compiler sees this, it creates a contiguous area in memory long enough to store 5 numbers of `float` type, the numbers will reside here one after another.

After doing this, we can **index** the array with square brackets (`[` and `]`) like this: `ARRAY_NAME[INDEX]` where `ARRAY_NAME` is the name of the array (here `vector`) and `INDEX` is an expression that evaluates to integer, **starting with 0** and going up to the array length minus one (remember that **programmers count from zero**). So the first item of the array is at index 0, the second at index 1 etc. The index can be a numeric constant like `3`, but also a variable or a whole expression such as `x + 3 * myFunction()`. Indexed array can be used just like any other variable, you can assign to it, you can use it in expressions etc. This is seen in the example. Trying to access an item beyond the array's bounds (e.g. `vector[100]`) is an error that will likely crash your program or make it misbehave.

Especially important are the parts of code starting with `for (int i = 0; i < 5; ++i)`: this is an iteration over the array. It's a very common pattern that we use whenever we need to perform some action with every item of the array.

Arrays can also be multidimensional, but we won't bother with that right now.

Why are arrays so important? They allow us to work with great amounts of data, not just a handful of numeric variables. We can create an array of a million structs and easily work with all of them thanks to indexing and loops, this would be practically impossible without arrays. Imagine e.g. a game of [chess](chess.md); it would be very silly to have 64 plain variables, one for each square of the board (`squareA1`, `squareA2`, ..., `squareH8`), it would be extremely difficult to work with such code. With an array we can represent the whole board as a single array, we can iterate over all the squares easily etc.

One more thing to mention about arrays is how they can be passed to functions.
A function can have as a parameter an array of fixed or unknown length. There is also one exception with arrays as opposed to other types: **if a function has an array as parameter and the function modifies this array, the array passed to the function (the argument) will be modified as well** (we say that arrays are *passed by reference* while other types are *passed by value*). We know this wasn't the case with other parameters such as `int` -- for these the function makes a local copy that doesn't affect the argument passed to the function. The following example shows what's been said:

```
#include <stdio.h>

// prints an int array of length 10
void printArray10(int array[10])
{
  for (int i = 0; i < 10; ++i)
    printf("%d ",array[i]);
}

// prints an int array of arbitrary length
void printArrayN(int array[], int n)
{
  for (int i = 0; i < n; ++i)
    printf("%d ",array[i]);
}

// fills an array with numbers 0, 1, 2, ...
void fillArrayN(int array[], int n)
{
  for (int i = 0; i < n; ++i)
    array[i] = i;
}

int main(void)
{
  int array10[10];
  int array20[20];

  fillArrayN(array10,10);
  fillArrayN(array20,20);

  printArray10(array10);
  putchar('\n');
  printArrayN(array20,20);

  return 0;
}
```

The function `printArray10` has a fixed length array as a parameter (`int array[10]`) while `printArrayN` takes as a parameter an array of unknown length (`int array[]`) plus one additional parameter to specify this length (so that the function knows how many items of the array it should print). The function `fillArrayN` is important because it shows how a function can modify an array: when we call `fillArrayN(array10,10);` in the `main` function, the array `array10` will actually be modified when the function finishes (it will be filled with numbers 0, 1, 2, ...). This can't be done with other data types (though there is a trick involving [pointers](pointer.md) which we will learn later).

Now let's finally talk about **text [strings](string.md)**.
We've already seen strings (such as `"hello"`), we know we can print them, but what are they really? A string is a data type, and from C's point of view strings are nothing but **arrays of `char`s** (text characters), i.e. sequences of `char`s in memory. **In C every string has to end with a 0 `char`** -- this is NOT `'0'` (whose [ASCII](ascii.md) value is 48) but the direct value 0 (remember that `char`s are really just numbers). The 0 `char` cannot be printed out, it is just a helper value to terminate strings. So to store a string `"hello"` in memory we need an array of length at least 6 -- one for each character plus one for the terminating 0. These types of string are called **zero terminated strings** (or *C strings*).

When we write a string such as `"hello"` in our source, the C compiler creates an array in memory for us and fills it with characters `'h'`, `'e'`, `'l'`, `'l'`, `'o'`, 0. In memory this may look like a sequence of numbers 104, 101, 108, 108, 111, 0.

Why do we terminate strings with 0? Because functions that work with strings (such as `puts` or `printf`) don't know what length the string is. We can call `puts("abc");` or `puts("abcdefghijk");` -- the string passed to `puts` has different length in each case, and the function doesn't know this length. But thanks to these strings ending with 0, the function can compute the length, simply by counting characters from the beginning until it finds 0 (or more efficiently it simply prints characters until it finds 0).

The [syntax](syntax.md) that allows us to create strings with double quotes (`"`) is just a helper (*syntactic sugar*); we can create strings just as any other array, and we can work with them the same. Let's see an example:

```
#include <stdio.h>

int main(void)
{
  char alphabet[27]; // 26 places for letters + 1 for terminating 0

  for (int i = 0; i < 26; ++i)
    alphabet[i] = 'A' + i;

  alphabet[26] = 0; // terminate the string

  puts(alphabet);

  return 0;
}
```

`alphabet` is an array of `char`s, i.e.
a string. Its length is 27 because we need 26 places for letters and one extra space for the terminating 0. Here it's important to remind ourselves that we count from 0, so the alphabet can be indexed from 0 to 26, i.e. 26 is the last index we can use, doing `alphabet[27]` would be an error! Next we fill the array with letters (see how we can treat `char`s as numbers and do `'A' + i`). We iterate while `i < 26`, i.e. we will fill all the places in the array up to the index 25 (including) and leave the last place (with index 26) empty for the terminating 0. That we subsequently assign. And finally we print the string with `puts(alphabet)` -- here note that there are no double quotes around `alphabet` because it's a variable name. Doing `puts("alphabet")` would cause the program to literally print out `alphabet`. Now the program outputs:

```
ABCDEFGHIJKLMNOPQRSTUVWXYZ
```

In C there is a standard library for working with strings called *string* (`#include <string.h>`), it contains functions such as `strlen` for computing string length or `strcmp` for comparing strings.
One final example -- a creature generator -- will show all the three new data types in action:

```
#include <stdio.h>
#include <stdlib.h> // for rand()

typedef struct
{
  char name[4]; // 3 letter name + 1 place for 0
  int weightKg;
  int legCount;
} Creature; // some weird creature

Creature creatures[100]; // global array of Creatures

void printCreature(Creature c)
{
  printf("Creature named %s ",c.name); // %s prints a string
  printf("(%d kg, ",c.weightKg);
  printf("%d legs)\n",c.legCount);
}

int main(void)
{
  // generate random creatures:

  for (int i = 0; i < 100; ++i)
  {
    Creature c;

    c.name[0] = 'A' + (rand() % 26);
    c.name[1] = 'a' + (rand() % 26);
    c.name[2] = 'a' + (rand() % 26);
    c.name[3] = 0; // terminate the string

    c.weightKg = 1 + (rand() % 1000);
    c.legCount = 1 + (rand() % 10); // 1 to 10 legs

    creatures[i] = c;
  }

  // print the creatures:

  for (int i = 0; i < 100; ++i)
    printCreature(creatures[i]);

  return 0;
}
```

When run you will see a list of 100 randomly generated creatures which may start e.g. as:

```
Creature named Nwl (916 kg, 4 legs)
Creature named Bmq (650 kg, 2 legs)
Creature named Cda (60 kg, 4 legs)
Creature named Owk (173 kg, 7 legs)
Creature named Hid (430 kg, 3 legs)
...
```

## Macros/Preprocessor

The C language comes with a feature called *preprocessor* which is necessary for some advanced things. It allows automated modification of the source code before it is compiled.

Remember how we said that compiler compiles C programs in several steps such as generating object files and linking? There is one more step we didn't mention: **[preprocessing](preprocessing.md)**. It is the very first step -- the source code you give to the compiler first goes to the preprocessor which modifies it according to special commands in the source code called **preprocessor directives**. The result of preprocessing is a pure C code without any more preprocessing directives, and this is handed over to the actual compilation.
The preprocessor is like a **mini language on top of the C language**, it has its own commands and rules, but it's much simpler than C itself, for example it has no data types or loops.

Each directive begins with `#`, is followed by the directive name and continues until the end of the line (`\` can be used to extend the directive to the next line).

We have already encountered one preprocessor directive: the `#include` directive when we included library header files. This directive pastes the text of the named file in place of the directive.

Another directive is `#define` which creates so called [macro](macro.md) -- in its basic form a macro is nothing else than an alias, a nickname for some text. This is used to create constants. Consider the following code:

```
#include <stdio.h>

#define ARRAY_SIZE 10

int array[ARRAY_SIZE];

void fillArray(void)
{
  for (int i = 0; i < ARRAY_SIZE; ++i)
    array[i] = i;
}

void printArray(void)
{
  for (int i = 0; i < ARRAY_SIZE; ++i)
    printf("%d ",array[i]);
}

int main()
{
  fillArray();
  printArray();
  return 0;
}
```

`#define ARRAY_SIZE 10` creates a macro that can be seen as a constant named `ARRAY_SIZE` which stands for `10`. From this line on any occurrence of `ARRAY_SIZE` that the preprocessor encounters in the code will be replaced with `10`. The reason for doing this is obvious -- we respect the [DRY](dry.md) (don't repeat yourself) principle, if we didn't use a constant for the array size and used the direct numeric value `10` in different parts of the code, it would be difficult to change them all later, especially in a very long code, there's a danger we'd miss some. With a constant it is enough to change one line in the code (e.g. `#define ARRAY_SIZE 10` to `#define ARRAY_SIZE 20`).

The macro substitution is literally a copy-paste text replacement, there is nothing very complex going on.
This means you can create a nickname for almost anything (for example you could do `#define when if` and then also use `when` in place of `if` -- but it's probably not a very good idea). By convention macro names are to be `ALL_UPPER_CASE` (so that whenever you see an all upper case word in the source code, you know it's a macro).

Macros can optionally take parameters similarly to functions. There are no data types, just parameter names. The usage is demonstrated by the following code:

```
#include <stdio.h>

#define MEAN3(a,b,c) (((a) + (b) + (c)) / 3)

int main()
{
  int n = MEAN3(10,20,25);

  printf("%d\n",n);

  return 0;
}
```

`MEAN3` computes the mean of 3 values. Again, it's just text replacement, so the line `int n = MEAN3(10,20,25);` becomes `int n = (((10) + (20) + (25)) / 3);` before code compilation. Why are there so many brackets in the macro? It's always good to put brackets around a macro and all its parameters because the parameters are again a simple text replacement; consider e.g. a macro `#define HALF(x) x / 2` -- if it was invoked as `HALF(5 + 1)`, the substitution would result in the final text `5 + 1 / 2`, which gives 5 (instead of the intended value 3).

You may be asking why would we use a macro when we can use a function for computing the mean? Firstly macros don't just have to work with numbers, they can be used to generate parts of the source code in ways that functions can't. Secondly using a macro may sometimes be simpler, it's shorter and may be faster to execute because there is no function call (which has a slight overhead) and because the macro expansion may lead to the compiler precomputing expressions at compile time. But beware: macros are usually worse than functions and should only be used in very justified cases.
For example macros don't know about data types and cannot check them, and they also result in a bigger compiled executable (function code is in the executable only once whereas the macro is expanded in each place where it is used and so the code it generates multiplies).

Another very useful directive is `#if` for conditional inclusion or exclusion of parts of the source code. It is similar to the C `if` command. The following example shows its use:

```
#include <stdio.h>

#define RUDE 0

void printNumber(int x)
{
  puts(
#if RUDE
    "You idiot, the number is:"
#else
    "The number is:"
#endif
  );

  printf("%d\n",x);
}

int main()
{
  printNumber(3);
  printNumber(100);

#if RUDE
  puts("Bye bitch.");
#endif

  return 0;
}
```

When run, we get the output:

```
The number is:
3
The number is:
100
```

And if we change `#define RUDE 0` to `#define RUDE 1`, we get:

```
You idiot, the number is:
3
You idiot, the number is:
100
Bye bitch.
```

We see the `#if` directive has to have a corresponding `#endif` directive that terminates it, and there can be an optional `#else` directive for an *else* branch. The condition after `#if` can use similar operators as those in C itself (`+`, `==`, `&&`, `||` etc.). There also exists an `#ifdef` directive which is used the same and checks if a macro of given name has been defined.

`#if` directives are very useful for conditional compilation, they allow for creation of various "settings" and parameters that can fine-tune a program -- you may turn specific features on and off with this directive. It is also helpful for [portability](portability.md); compilers may automatically define specific macros depending on the platform (e.g. `_WIN64`, `__APPLE__`, ...) based on which you can trigger different code. E.g.:

```
#ifdef _WIN64
  puts("Your OS sucks.");
#endif
```

## Pointers

Pointers are an advanced topic that many people fear -- many complain they're hard to learn, others complain about memory unsafety and potential dangers of using pointers.
These people are stupid, pointers are great. But beware, there may be too much new information in the first read. Don't get scared, give it some time. Pointers allow us to do certain advanced things such as allocate dynamic memory, return multiple values from functions, inspect content of memory or use functions in similar ways in which we use variables. A **[pointer](pointer.md)** is nothing complicated: it is a **data type that can hold a memory address** (plus the information of what data type should be stored at that address). An address is simply a number. Why can't we simply use an `int` for an address? Because the size of `int` and a pointer may differ, the size of pointer depends on each platform's address width. It is also good when the compiler knows a certain variable is supposed to point to a memory (and to which type) -- this can prevent bugs. It's important to remember that a pointer is not a pure address but it also knows about the data type it is pointing to, so there are many kinds of pointers: a pointer to `int`, a pointer to `char`, a pointer to a specific struct type etc. A variable of pointer type is created similarly to a normal variable, we just add `*` after the data type, for example `int *x` is a variable named `x` that is a pointer to `int`. But how do we assign a value to the pointer? To do this, we need an address of something, e.g. of some variable. To get an address of a variable we use the `&` character, i.e. `&a` is the address of a variable `a`. The last basic thing we need to know is how to **[dereference](dereference.md)** a pointer. Dereferencing means accessing the value at the address that's stored in the pointer, i.e. working with the pointed to value. This is again done (maybe a bit confusingly) with `*` character in front of a pointer, e.g. if `x` is a pointer to `int`, `*x` is the `int` value to which the pointer is pointing. An example can perhaps make it clearer. 
```
#include <stdio.h>

int main(void)
{
  int normalVariable = 10;
  int *pointer;

  pointer = &normalVariable;

  // %p expects a pointer to void, hence the cast
  printf("address in pointer: %p\n",(void *) pointer);
  printf("value at this address: %d\n",*pointer);

  *pointer = *pointer + 10;

  printf("normalVariable: %d\n",normalVariable);

  return 0;
}
```

This may print e.g.:

```
address in pointer: 0x7fff226fe2ec
value at this address: 10
normalVariable: 20
```

`int *pointer;` creates a pointer to `int` with name `pointer`. Next we make the pointer point to the variable `normalVariable`, i.e. we get the address of the variable with `&normalVariable` and assign it normally to `pointer`. Next we first print the address in the pointer (accessed with `pointer`) and then the value at this address, for which we use dereference as `*pointer`. At the next line we see that we can also use dereference for writing to the pointed-to address, i.e. doing `*pointer = *pointer + 10;` is the same as doing `normalVariable = normalVariable + 10;`. The last line shows that the value in `normalVariable` has indeed changed.

IMPORTANT NOTE: **You cannot write to random addresses**! This will typically crash your program. To be able to write to a certain address it must be *[allocated](allocation.md)*, i.e. reserved for use. Addresses of variables are allocated by the compiler and can be written to.

There's a special value called `NULL` (a macro defined in the standard library) that is meant to be assigned to pointers that point to "nothing". So when we have a pointer `p` that's currently not supposed to point to anything, we do `p = NULL;`. In safe code we should always check (with `if`) whether a pointer is `NULL` before dereferencing it, and if it is, then NOT dereference it. This isn't required but is considered a "good practice" in safe code, storing `NULL` in pointers that point nowhere prevents dereferencing random or unallocated addresses which would crash the program.

But what can pointers be good for?
Many things, for example we can kind of "store variables in variables", i.e. a pointer is a variable which says which variable we are now using, and we can switch between variables at any time. E.g.:

```
#include <stdio.h>

int bankAccountMonica = 1000;
int bankAccountBob = -550;
int bankAccountJose = 700;

int *payingAccount; // pointer to who's currently paying

void payBills(void)
{
  *payingAccount -= 200;
}

void buyFood(void)
{
  *payingAccount -= 50;
}

void buyGas(void)
{
  *payingAccount -= 20;
}

int main(void)
{
  // let Jose pay first
  payingAccount = &bankAccountJose;

  payBills();
  buyFood();
  buyGas();

  // that's enough, now let Monica pay
  payingAccount = &bankAccountMonica;

  buyFood();
  buyGas();
  buyFood();
  buyFood();

  // now it's Bob's turn
  payingAccount = &bankAccountBob;

  payBills();
  buyFood();
  buyFood();
  buyGas();

  printf("Monica has $%d left.\n",bankAccountMonica);
  printf("Jose has $%d left.\n",bankAccountJose);
  printf("Bob has $%d left.\n",bankAccountBob);

  return 0;
}
```

Well, this could be similarly achieved with arrays, but pointers have more uses. For example they allow us to **return multiple values from a function**. Remember that we said that (with the exception of arrays) a function cannot modify a variable passed to it because it always makes its own local copy of it? We can bypass this by giving the function the address of the variable instead of its value. The function can read the value of that variable (with dereference) but it can also CHANGE the value, it simply writes a new value to that address (again, using dereference).
This example shows it:

```
#include <stdio.h>
#include <math.h>

#define PI 3.141592

// returns 2D coordinates of a point on a unit circle
void getUnitCirclePoint(float angle, float *x, float *y)
{
  *x = sin(angle);
  *y = cos(angle);
}

int main(void)
{
  for (int i = 0; i < 8; ++i)
  {
    float pointX, pointY;

    getUnitCirclePoint(i * 0.125 * 2 * PI,&pointX,&pointY);

    printf("%lf %lf\n",pointX,pointY);
  }

  return 0;
}
```

Function `getUnitCirclePoint` doesn't return any value in the strict sense, but thanks to pointers it effectively returns two `float` values via its parameters `x` and `y`. These parameters are of the data type pointer to `float` (as there's `*` in front of them). When we call the function with `getUnitCirclePoint(i * 0.125 * 2 * PI,&pointX,&pointY);`, we hand over the addresses of the variables `pointX` and `pointY` (which belong to the `main` function and couldn't normally be accessed in `getUnitCirclePoint`). The function can then compute values and write them to these addresses (with dereference, `*x` and `*y`), changing the values in `pointX` and `pointY`, effectively returning two values.

Now let's take a look at pointers to structs. Everything basically works the same here, but there's one thing to know about, a [syntactic sugar](sugar.md) known as an arrow (`->`). Example:

```
#include <stdio.h>

typedef struct
{
  int a;
  int b;
} SomeStruct;

SomeStruct s;
SomeStruct *sPointer;

int main(void)
{
  sPointer = &s;

  (*sPointer).a = 10; // without arrow
  sPointer->b = 20;   // same as (*sPointer).b = 20

  printf("%d\n",s.a);
  printf("%d\n",s.b);

  return 0;
}
```

Here we are writing values to a struct through a pointer. Without using the arrow we can simply dereference the pointer with `*`, put brackets around it and access the member of the struct normally. This is shown by the line `(*sPointer).a = 10;`. Using an arrow achieves the same thing but is perhaps a bit more readable, as seen in the line `sPointer->b = 20;`. The arrow is simply a special shorthand and doesn't need any brackets.
Now let's talk about arrays -- these are a bit special. The important thing is that **an array is itself basically a pointer**. What does this mean? If we create an array, let's say `int myArray[10];`, then `myArray` is basically a pointer to `int` in which the address of the first array item is stored. (Strictly speaking an array and a pointer are different types, e.g. `sizeof` tells them apart, but in almost every expression the array name is automatically converted to a pointer to the first item.) When we index the array, e.g. like `myArray[3] = 1;`, behind the scenes there is basically a dereference because the index 3 means: 3 places after the address pointed to by `myArray`. So when we index an array, the compiler takes the address stored in `myArray` (the address of the array start) and adds 3 to it (well, kind of) by which it gets the address of the item we want to access, and then dereferences this address.

Arrays and pointers are kind of a duality -- we can also use array indexing with pointers. For example if we have a pointer declared as `int *x;`, we can access the value `x` points to with a dereference (`*x`), but ALSO with indexing like this: `x[0]`. Accessing index 0 simply means: take the value stored in the variable and add 0 to it, then dereference it. So it achieves the same thing. We can also use higher indices (e.g. `x[10]`), BUT ONLY if `x` actually points to memory that has at least 11 allocated places.

This leads to a concept called **[pointer arithmetic](pointer_arithmetic.md)**. Pointer arithmetic simply means we can add numbers to pointer values or subtract them. If we continue with the same pointer as above (`int *x;`), we can actually add numbers to it like `*(x + 1) = 10;`. What does this mean?! It means exactly the same thing as `x[1] = 10;`. Adding a number to a pointer shifts that pointer by the given number of *places* forward. We use the word *places* because each data type takes a different space in memory, for example `char` takes one byte of memory while `int` usually takes 4 (but not always), so shifting a pointer by *N* places means adding *N* times the size of the pointed-to data type to the address stored in the pointer.
This may be a lot of information to digest. Let's provide an example to show all this in practice:

TODO

## More on Functions (Recursion, Function Pointers)

## Dynamic Allocation (Malloc)

## Debugging, Optimization

## Advanced Stuff

## Under The Hood