less_retarded_wiki/c.md
2023-12-26 23:42:01 +01:00

192 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# C
{ We have a [C tutorial](c_tutorial.md)! ~drummyfish }
C is an [old](old.md) [low level](low_level.md) structured [statically typed](static_typing.md) [imperative](imperative.md) compiled [programming language](programming_language.md), the language that's currently mostly used by [less retarded software](lrs.md). Though by very strict standards it would still be considered [bloated](bloat.md), compared to any mainstream [modern](modern.md) language it is very bullshitless and [KISS](kiss.md), so it is also the go-to language of the [suckless](suckless.md) community as well as most true experts, for example the [Linux](linux.md) and [OpenBSD](openbsd.md) developers, because of its good, relatively simple design, uncontested performance, wide support, great number of compilers, level of control and a greatly established and tested status. C is perhaps the most important language in history, it influenced, to smaller or greater degree, basically all of the widely used languages today such as [C++](c.md), [Java](java.md), [JavaScript](javascript.md) etc., however it is not a thing of the past -- in the area of low level programming C is still the number one unsurpassed language. C is by no means perfect but it is currently probably the best choice of a programming language (along with [comun](comun.md), of course).
{ Look up *The Ten Commandments for C Programmers* by Henry Spencer. ~drummyfish }
It is usually not considered an easy language to learn because of its low level nature: it requires good understanding of how a [computer](computer.md) actually works and doesn't prevent the programmer from shooting himself in the foot. Programmer is given full control (and therefore responsibility). There are things considered "tricky" which one must be aware of, such as undefined behavior of certain operators and raw pointers. This is what can discourage a lot of modern "coding monkeys" from choosing C, but it's also what inevitably allows such great performance -- undefined behavior allows the compiler to choose the most efficient implementation. On the other hand, C as a language is pretty simple without [modern](modern.md) bullshit concepts such as [OOP](oop.md), it is not as much hard to learn but rather hard to master, as any other true art.
{ Though C is almost always compiled, there have appeared some C interpreters. ~drummyfish }
C is said to be a **"[portable](portability.md) [assembly](assembly.md)"** because of its low level nature, great performance etc. -- though C is structured (has control structures such as branches and loops) and can be used in a relatively high level manner, it is also possible to write assembly-like code that operates directly with bytes in memory through [pointers](pointer.md) without many safety mechanisms, so C is often used for writing things like hardware [drivers](driver.md). On the other hand some restrain from likening C to assembly because C compilers still perform many transformations of the code and what you write is not necessarily always what you get.
Mainstream consensus acknowledges that C is among the best languages for writing low level code and code that requires performance, such as [operating systems](operating_system.md), [drivers](driver.md) or [games](game.md). Even scientific libraries with normie-language interfaces -- e.g. various [machine learning](machine_learning.md) [Python](python.md) libraries -- usually have the performance critical core written in [C](c.md). Normies will tell you that for things outside this scope C is not a good language, with which we disagree -- [we](lrs.md) recommend using C for basically everything that's supposed to last, i.e. if you want to write a good website, you should write it in C etc.
## History and Context
C was developed in 1972 at [Bell Labs](bell_labs.md) alongside the [Unix](unix.md) operating system by [Dennis Ritchie](dennis_ritchie.md) and [Brian Kerninghan](brian_kerninghan.md), as a successor to the [B](b.md) language ([portable](portability.md) language with [recursion](recursion.md)) written by Denis Ritchie and [Ken Thompson](ken_thompson.md), which was in turn inspired by the the [ALGOL](algol.md) language (code blocks, lexical [scope](scope.md), ...).
In 1973 Unix was rewritten in C. In 1978 Keninghan and Ritchie published a book called *The C Programming Language*, known as *K&R*, which became something akin the C specification. In 1989, the [ANSI C](ansi_c.md) standard, also known as C89, was released by the American ANSI. The same standard was also adopted a year later by the international ISO, so C90 refers to the same language. In 1999 ISO issues a new standard that's known as C99.
TODO
## Standards
C is not a single language, there have been a few standards over the years since its inception in 1970s. The notable standards and versions are:
- **K&R C**: C as described by its inventors in the book *The C Programming Language*, before official standardization. This is kind of too ancient nowadays.
- **C89/C90 (ANSI/ISO C)**: First fully standardized version, usable even today, many hardcore C programmers stick to this version so as to enjoy maximum compiler support.
- **C95**: A minor update of the previous standard, adds wide character support.
- **C99**: Updated standard from the year 1999, striking a nice balance between "[modern](modern.md)" and "good old". This is a good version to use in [LRS](lrs.md) programs, but will be a little less supported than C89, even though still very well supported. Notable new features against C89 include `//` comments, [stdint](stdint.md) library (fixed-width integer types), [float](float.md) and `long long` type, variable length stack-allocated [arrays](array.md), variadic [macros](macro.md) and declaration of variables "anywhere" (not just at function start).
- **C11**: Updated standard from the year 2011. This one is too [bloated](bloat.md) and isn't worth using.
- **C17/C18**: Yet another update, yet more bloated and not worth using anymore.
- ...
Quite nice online reference to all the different standards (including C++) is available at https://en.cppreference.com/w/c/99.
[LRS](lrs.md) should use C99 or C89 as the newer versions are considered [bloat](bloat.md) and don't have such great support in compilers, making them less portable and therefore less free.
The standards of C99 and older are considered pretty [future-proof](future_proof.md) and using them will help your program be future-proof as well. This is to a high degree due to C having been established and tested better than any other language; it is one of the oldest languages and a majority of the most essential software is written in C, C compiler is one of the very first things a new hardware platform needs to implement, so C compilers will always be around, at least for historical reasons. C has also been very well designed in a relatively minimal fashion, before the advent of modern feature-creep and and bullshit such as [OOP](oop.md) which cripples almost all "modern" languages.
## Compilers
- [gcc](gcc.md): The main "big name" that can compile all kinds of languages including C, used by default in many places, very [bloated](bloat.md) and can take long to compile big programs, but is pretty good at [optimizing](optimization.md) the code and generating fast code. Also has number of frontends and can compile for many platforms. Uses GENERIC/GIMPLE [intermediate representation](intermediate_representation.md).
- [clang](clang.md): Another big bloated compiler, kind of competes with gcc, is similarly good at optimization etc. Uses [LLVM](llvm.md) intermediate representation.
- [tcc](tcc.md): Tiny C compiler, [suckless](suckless.md), orders of magnitude smaller (currently around 25 KLOC) and simpler than gcc and clang, cannot optimize nearly as well as the big compilers so the generated executables can be a bit slower and bigger, however besides its internal simplicity there are many advantages, mainly e.g. fast compilation (claims to be 9 times faster than gcc) and small tcc executable (about 100 kB). Seems to only support x86.
- [scc](scc.md): Another small/suckless C compiler, currently about 30 KLOC.
- [DuskCC](duskcc.md): [Dusk OS](duskos.md) C compiler written in [Forth](forth.md), focused on extreme simplicity, probably won't adhere to standards completely.
- [8c](8c.md), [8cc](8cc.md)
- ...
## Standard Library
Besides the pure C language the C standard specifies a set of [libraries](library.md) that have to come with a standard-compliant C implementation -- so called standard library. This includes e.g. the *stdio* library for performing standard [input/output](io.md) (reading/writing to/from screen/files) or the *math* library for mathematical functions. It is usually relatively okay to use these libraries as they are required by the standard to exist so the [dependency](dependency.md) they create is not as dangerous, however many C implementations aren't completely compliant with the standard and may come without the standard library. Also many stdlib implementations suck or you just can't be sure what the implementation will prefer (size? speed?) etc. So for sake of [portability](portability.md) it is best if you can avoid using standard library.
The standard library (libc) is a subject of live debate because while its interface and behavior are given by the C standard, its implementation is a matter of each compiler; since the standard library is so commonly used, we should take great care in assuring it's extremely well written, however we ALWAYS have to choose our priorities and make tradeoffs, there just mathematically CANNOT be an ultimate implementation that will be all extremely fast and extremely memory efficient and extremely portable and extremely small. So choosing your C environment usually comprises of choosing the C compiler and the stdlib implementation. As you probably guessed, the popular implementations ([glibc](glibc.md) et al) are [bloat](bloat.md) and also often just [shit](shit.md). Better alternatives thankfully exist, such as:
- [musl](musl.md)
- [uclibc](uclibc.md)
- [not using](dependency.md) the standard library :)
- ...
## Bad Things About C
Nothing is [perfect](perfect.md), not even C; it was one of the first relatively higher level languages and even though it has showed to have been designed extremely well, some things didn't age great, or were simply bad from the start. We still prefer this language as usually the best choice, but it's good to be aware of its downsides or smaller issues, if only for the sake of one day designing a better language. Keep in mind all here are just suggestions, they made of course be a subject to counter arguments and further discussion. So, let's go:
- **C specification (the ISO standard) is [proprietary](proprietary.md)** :( The language itself probably can't be copyrighted, nevertheless this may change in the future, and a proprietary specs lowers C's accessibility and moddability (you can't make derivative versions of the spec).
- **The specification is also long as fuck**, indicating [bloat](bloat.md)/complexity/obscurity. A good, free language should have a simple definition. It could be simplified a lot by simplifying the language itself as well as dropping some truly legacy considerations (like [BCD](bcd.md) systems?) and removing a lot of undefined behavior.
- **Some behavior is weird and has exceptions**, for example a function can return anything, including a `struct`, except for an array. This makes it awkward to e.g. implement vectors which would best be made as arrays but you want functions to return them, so you may do hacks like wrapping them instide a struct just for this.
- **Some things could be made simpler**, e.g. using [reverse polish](reverse_polish.md) notation for expressions, rather than expressions with brackets and operator precedence, would make implementations much simpler, increasing sucklessness (of course readability is an argument).
- **Some things could be dropped entirely** ([enums](enum.md), [bitfields](bitfield.md), possibly also unions etc.), they can be done and imitated in other ways without much hassle.
- **The preprocessor isn't exactly elegant**, it has completely different syntax and rules from the main language, not very suckless -- ideally preprocessor uses the same language as the base language.
- **The syntax is sucky sometimes**, e.g. data type names may consist of multiple tokens (`long long int` etc.), multiplication uses the same symbol as pointer dereference (`*`), also it's pretty weird that the condition after `if` has to be in brackets etc., it could be designed better. Keywords also might be better being single chars, like `?` instead of `if` etc. A shorter, natural-language-neutral source code would be probably better. Both line and block comments could be implemented with a single character (e.g. `#` for line comment, ending with a newline or another `#`, `##` for block comment ending with another `##`?).
- **Some undefined/unspecified behavior would maybe be better defined/specified** -- undefined behavior isn't bad in general, it is what allows C to be so fast and efficient in the first place, but some of it has shown to be rather cumbersome; for example the unspecified representation of integers, their binary size and behavior of floats leads to a lot of trouble (unknown upper bounds, sizes, undefined behavior of many operators etc.) while practically all computers have settled on using 8 bit bytes, [two's complement](twos_complement.md) and IEEE754 for [floats](float.md) -- this could easily be made a mandatory assumption which would simplify great many things without doing basically any harm. New versions of C actually already settle on two's complement. This doesn't mean C should be shaped to reflect the degenerate "[modern](modern.md)" trends in programming though!
- Some basic things that are part of libraries or extensions, like fixed width types and binary literals and possibly very basic I/O (putchar/readchar), could be part of the language itself rather than provided by libraries.
- All that stuff with *.c* and *.h* files is unnecessary, there should just be one file type -- this isn't part of the language per se, but it's part of its culture.
- TODO: moar
## Basics
This is a quick overview, for a more in depth tutorial see [C tutorial](c_tutorial.md).
A simple program in C that writes "welcome to C" looks like this:
```
#include <stdio.h> // standard I/O library
int main(void)
{
// this is the main program
puts("welcome to C");
return 0; // end with success
}
```
You can simply paste this code into a file which you name e.g. `program.c`, then you can compile the program from command line like this:
`gcc -o program program.c`
Then if you run the program from command line (`./program` on Unix like systems) you should see the message.
## Cheatsheet
It's pretty important you learn C, so here's a little cheat sheet for you.
**data types** (just some):
- **int**: signed integer, at least 16 bits (-32767 to 32767) but usually more
- **unsigned int**: unsigned integer, at least 16 bit (0 to 65535) but usually more
- **char**: smallest integer of at least 8 bits (1 byte, 256 values), besides others used for containing [ASCII](ascii.md) characters
- **unsigned char**: like char but unsigned (0 to 255)
- **float**: [floating point](float.md) number (usually 32 bit)
- **double**: like float but higher precision (usually 64 bit)
- **short**: smaller signed integer, at least 16 bits (32767 to 32767)
- **long**: bigger signed integer, at least 32 bits (-2147483647 to 2147483647)
- **pointer**: memory address (size depends on platform), always tied to a specific type, e.g. a pointer to integer: `*int`, pointer to double: `*double` etc.
- **array**: a sequence of values of some type, e.g. an array of 10 integers: `int[10]`
- **struct**: structure of values of different types, e.g. `struct myStruct { int myInt; chat myChar; }`
- note: header *stdint.h* contains fixed-width data types such as *uint32_t* etc.
- note: there is no **string**, a string is an array of *char*s which must end with a value 0 (string terminator)
- note: there is no real **bool** (actually it is in header *stdbool*), integers are used instead (0 = false, 1 = true)
**branching aka if-then-else**:
```
if (CONDITION)
{
// do something here
}
else // optional
{
// do something else here
}
```
**for loop** (repeat given number of times):
```
for (int i = 0; i < MAX; ++i)
{
// do something here, you can use i
}
```
**while loop** (repeat while CONDITION holds):
```
while (CONDITION)
{
// do something here
}
```
**do while loop** (same as *while* but CONDITION at the end):
```
do
{
// do something here
} while (CONDITION);
```
**function definition**:
```
RETURN_TYPE myFunction (TYPE1 param1, TYPE2 param2, ...)
{ // return type can be void
// do something here
}
```
## See Also
- [B](b.md)
- [D](d.md)
- [comun](comun.md)
- [C tutorial](c_tutorial.md)
- [C pitfalls](c_pitfalls.md)
- [C programming style](programming_style.md)
- [C++](cpp.md)
- [IOCCC](ioccc.md)
- [HolyC](holyc.md)
- [QuakeC](quakec.md)
- [Pascal](pascal.md)
- [Fortran](fortran.md)
- [LISP](lisp.md)
- [FORTH](forth.md)
- [memory management](memory_management.md) in C