less_retarded_wiki/unix.md
2025-03-04 21:04:02 +01:00

Unix

"Those who don't know Unix are doomed to reinvent it, poorly." --obligatory quote by Henry Spencer

Unix (plural Unixes or Unices) is an old operating system developed since the late 1960s as a research project of Bell Labs, which has become one of the most influential pieces of software in history and whose principles (e.g. the Unix philosophy, everything is a file, ...) live on in many so called Unix-like operating systems such as Linux and BSD (at least to some degree). The original system itself is no longer in use (it was later followed by a new project, Plan 9, which itself is now pretty old), the name UNIX is nowadays a trademark and a certification. However, as someone once said, Unix is not so much an operating system as a way of thinking.

In one aspect Unix has reached the highest level a software can strive for: it has transcended its implementation and become a de facto standard. This means it has become a set of interface conventions, "paradigms", cultural and philosophical ideas rather than being a single system, it lives on as a concept that has many implementations. This is extremely important as we don't depend on any single Unix implementation but have a great variety of choice between which we can switch without great issues. This is very important for freedom -- it prevents monopolization -- and it's one of the important reasons to use Unix-like systems.

The main highlights of Unix are possibly these:

  • Unix philosophy: a kind of general mindset of software development, usually summed up as "do one thing well" (rather than "do everything but poorly") and "make programs work in collaboration with other programs", advising on using universal text interfaces for communication etc. This often comes with the idea of pipes, a way of chaining programs (typically using the pipe | operator, hence the name) by sending one program's output to another program's input.
  • everything is a file: Unix chose to use the file abstraction to enable universal communication of programs with hardware and among themselves, i.e. on unices most things such as printing, reading keyboard, networking etc. will likely be implemented as reading or writing to/from some special (sometimes just virtual) file. This has the advantage of being able to just use some file reading library or syscall instead of having to directly access bits in physical memory, which may be difficult, unsafe etc.
  • Text centrism (great command line preference), value on portability (even over performance), sharing of source code, freedom of information and openness, connection to hacker culture, valuing human time over machine time, ...
  • ...
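As a tiny taste of the pipe idea mentioned above, here is a sketch (the file names are made up) of three small programs cooperating, each doing one thing well:

```shell
# printf generates three lines (standing in for some program's output),
# grep keeps only the lines ending in ".c", wc -l counts them
printf 'main.c\nnotes.txt\nutil.c\n' | grep '\.c$' | wc -l   # prints 2
```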

Unix is greatly connected to software minimalism, however most unices are still not minimalist to the absolute extreme and many Unix forks (e.g. GNU/Linux) just abandon minimalism as a priority. So the question stands: is Unix LRS or is it too bloated? The answer to this will be similar to our stance towards the C language (which itself was developed alongside Unix); from our point of view Unix -- i.e. its concepts and some of their existing implementations -- is relatively good, there is a lot of wisdom to take away (e.g. "do one thing well", modularity, "use text interfaces", ...), however these are intermixed with things which under more strict minimalism we may want to abandon (e.g. multiple users, file permissions and ownership; also "everything is a file" requires we buy into the file abstraction and will often also imply existence of a file system etc., which may be unnecessary, even multitasking could be dropped), so in some ways we see Unix as a temporary "least evil" tool on our way to truly good, extremely minimalist technology. DuskOS is an example of an operating system closer to the final idea of LRS. But for now Unix is very cool, some Unix-like systems are definitely a good choice nowadays.

There is a semi humorous group called the UNIX HATERS that has a mailing list and a whole book criticizing Unix, arguing that the systems that came before it were much better -- though it's mostly just joking, they sometimes give good points. It's like they are the biggest boomers for whom Unix is what Windows is to Unix people.

History

In the 1960s, Bell Labs along with other groups were developing Multics, a kind of operating system -- however the project failed and was abandoned for its complexity and the expense of its development. In 1969 two Multics developers, Ken Thompson and Dennis Ritchie, then started to create their own system, this time with a different philosophy; that of simplicity (see Unix philosophy). They weren't alone in developing the system, a number of other hackers helped program such things as a file system, a shell and simple utility programs. At VCF East 2019 Thompson said that they developed Unix as a working system in three weeks. At this point Unix was written in assembly.

In the early 1970s the system got funding as well as its name Unix (a pun on Multics). By now Thompson and Ritchie were developing a new language for Unix which would eventually become the C language. In version 4 (1973) Unix was rewritten in C.

Unix then started being sold commercially. This led to its fragmentation into different versions such as the BSD or Solaris. In 1983 a version called System V was released which would become one of the most successful. The fragmentation and a lack of a unified standard led to so called Unix Wars in the late 1980s, which led to a few Unix standards such as POSIX and Single Unix Specification.

For zoomers and other noobs: Unix wasn't like Windows, it was more like DOS, things were done in text interface only (even a TUI or just colorful text was a luxury) -- if you use the command line in "Linux" nowadays, you'll get an idea of what it was like, except it was all even more primitive. Things we take for granted such as a mouse, copy-pastes, interactive text editors, having multiple user accounts or running multiple programs at once were either non-existent or advanced features in the early days. There weren't even personal computers back then, people accessed shared computers over terminals. Anything these guys did you have to see as done with stone tools -- they didn't have GPUs, gigahertz CPUs, gigabytes of RAM, scripting languages like Python or JavaScript, Google, Stack Overflow, wifi, mice, IDEs, multiple HD screens all around, none of that -- and yet they programmed faster, less buggy software that was much more efficient. If this doesn't make you think, then probably nothing will.

How To For Noobs

UNDER CONSTRUCTION

Note: here by "Unix" we will more or less assume a system conforming to some version of the POSIX standard.

This should help complete noobs kickstart their journey with a Unix-like system such as GNU/Linux or BSD. Please be aware that each system has its additional specifics, for example package managers, init systems, GUI and so on -- these you must learn about elsewhere as here we may only cover the core parts those systems inherited from the original Unix. Having learned this though you should be able to somewhat fly any Unix like system. Obviously we'll be making some simplifications here too, don't be too pedantic if you're a pro Unix guru please.

Also a NOTE: terms such as command line, terminal or shell have different meanings, but for simplicity we'll be treating them more or less as synonyms here.

Learning to use Unix in practical terms firstly means learning the command line and then a few extra things (various concepts, philosophies, conventions, file system structure etc.). Your system will have a way for you to enter the command line that allows you to interact with it only through textual commands (i.e. without GUI). Sometimes the system boots up to command line, other times you must click an icon somewhere (called terminal, term, shell, command line etc.), sometimes you can switch TTYs with CTRL+ALT+Fkeys etc. To command line virgins this will seem a little intimidating but it's absolutely necessary to know at least the basics; on Unices the command line is extremely powerful and efficient, and much can only ever be achieved through the command line.

The gist: unsurprisingly in command line you write commands -- many of these are actually tiny programs called Unix utilities (or just "utils"). These are installed by default; they're tools for you to do whatever you want (including stuff that on normie systems you usually do by clicking with a mouse). For example ls is a program that writes out a list of files in the working directory, cd is a program that changes working directory etc. There are many more such programs and you must learn at least the most commonly used ones. Good news is that the programs are more or less the same on every Unix system so you just learn this once. There also exist other kinds of commands -- those defined by the shell language (shell is basically a fancy word for the textual interface), which allow us to combine the utilities together and even program the shell (we call this scripting). First learn the utils (see the list below).

PRO TIP: convenient features are often implemented, the most useful ones being going through the history of previously typed commands with the UP/DOWN keys and completing commands with the TAB key, which you'll find yourself using very frequently. Try it. It's enough to type just the first few letters and then press TAB, the command will be completed (at least as far as it can be guessed).

You run a utility simply by writing its name, for example typing ls will show you a list of files in your current directory. Very important is the man command that shows you a manual page for another command, e.g. typing man ls should display a page explaining the ls utility in detail. Short help for a utility can also usually be obtained with the --help flag, for example grep --help.

Unix utilities (and other programs) can also be invoked with arguments that specify more detail about what should be done. Arguments are written after the utility name and are separated by spaces (if the argument itself should contain a space, it must be enclosed between double quotes, e.g.: "abc def" is a single argument containing a space, but abc def are two arguments). For example the cd (change directory) utility must be given the name of a directory to go to, e.g. cd mydirectory.
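To see the difference quoting makes, one can define a throwaway shell function (count_args is just a made-up name) that prints how many arguments it received:

```shell
count_args() { echo $#; }   # $# holds the number of arguments given

count_args "abc def"   # one argument containing a space: prints 1
count_args abc def     # two separate arguments: prints 2
```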

Some arguments start with one or two minus characters (-), for example -h or --help. These are usually called flags and serve either to turn something on/off or to name other parameters. For example many utilities accept a -s flag which means "silent" and tells the utility to shut up and not write anything out. A flag oftentimes has a short and long form (the long one starting with two minus characters), so -s and --silent are the same thing. The other type of flag says what kind of argument the following argument is going to be -- for example a common one is --output (or -o) with which we specify the name of the output file, so for instance running a C compiler may look like c99 mysourcecode.c --output myprogram (we tell the compiler to name the final program "myprogram"). Short flags can usually be combined like so: instead of -a -b -c we can write just -abc. Flags accepted by utilities along with their meaning are documented in the manual pages (see above).
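A quick way to convince yourself that short flags can be combined is to compare the two invocations below (the directory and file names are invented for the demonstration):

```shell
mkdir -p /tmp/flagdemo && cd /tmp/flagdemo
touch .hidden visible                # create a hidden and a normal file
ls -a -l /tmp/flagdemo > /tmp/out1   # short flags given separately
ls -al /tmp/flagdemo > /tmp/out2     # the same flags combined
cmp -s /tmp/out1 /tmp/out2 && echo same   # prints: same
```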

To run a program that's present in the current directory as a file you can't just write its name (like you could e.g. in DOS), it MUST be prefixed with ./ (shorthand for current directory), otherwise the shell thinks you're trying to run an INSTALLED program, i.e. it will be looking for the program in a directory where programs are installed. For example having a program named "myprogram" in current directory it will be run with ./myprogram. Also note that to be able to run a file as a program it must have the executable mode set, which is done with chmod +x myprogram (you may have to do this if you e.g. download the program from the Internet). Programs can also take arguments just like we saw with the built-in utilities, so you can run a program like ./myprogram abc def --myflag.
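A minimal sketch of the whole thing (the name hello.sh is made up): create a tiny script, make it executable and run it with the ./ prefix:

```shell
printf '#!/bin/sh\necho hello\n' > hello.sh   # create a two line script
chmod +x hello.sh                             # set the executable mode
./hello.sh                                    # run it; prints: hello
```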

Now to the very basic stuff: browsing directories, moving and deleting files etc. This is done with the following utils: ls (prints files in current directory), pwd (prints path to current directory), cd (travels to given directory, cd .. travels back), cat (outputs content of given file), mkdir (creates directory), rm (removes given file; to remove a directory the -r flag must be used), cp (copies file), mv (moves file, including directory -- note that moving also serves for renaming). As an exercise try these out (careful with rm -rf) and read manual pages of the commands (you'll find that ls can also tell you for example the file sizes and so on).
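The following practice session (all names invented) exercises these utilities; try it line by line:

```shell
mkdir practice                 # create a new directory
cd practice                    # enter it
echo "some text" > note.txt    # create a small file
cat note.txt                   # show its content: some text
cp note.txt copy.txt           # copy the file
mv copy.txt renamed.txt        # rename (move) the copy
ls                             # lists note.txt and renamed.txt
cd ..                          # go back up
rm -r practice                 # remove the whole directory (careful!)
```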

Files and file system: On Unices the whole filesystem hierarchy starts with a directory called just / (the root directory), i.e. every absolute (full) path will always start with slash (don't confuse / with \). For example pictures belonging to the user john may live under /home/john/pictures. It's also possible to use relative paths, i.e. ones that are considered to start in the current (working) directory. A dot (.) stands for current directory and two dots (..) for the directory "above" the current one. I.e. if our current directory is /home/john, we can list the pictures with ls pictures as well as ls /home/john/pictures or ls ./pictures. Absolute and relative paths are distinguished by the fact the absolute one always starts with / while relative ones don't. There are several types of files, most importantly regular files (the "normal" files) and directories (there are more, such as symbolic links, sockets, block special files etc., but for now we'll be ignoring these). Unix has a paradigm stating that everything's a file, so notably accessing e.g. hardware devices is done by accessing special device files (placed in /dev). Just remember this concept, you'll hear about it often.
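A small sketch of the equivalence of relative and absolute paths (directory and file names made up); all three ls commands below list the same directory:

```shell
mkdir -p demo/sub          # create a nested directory
cd demo
touch sub/file.txt         # put an empty file inside
ls sub                     # relative path
ls ./sub                   # the same with an explicit .
ls "$(pwd)/sub"            # absolute path built from pwd
cd ..                      # .. takes us back above demo
```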

NOTE: On Unices files often don't have extensions as the system often relies on a so called magic number (the first few bytes of the file) to decide what kind of file it's dealing with. You will see files with extensions (.sh, .txt, ...) but notably for example compiled programs typically don't have any (unlike for example on Windows).

Files additionally have attributes, importantly so called permissions -- unfortunately these are a bit complicated, but as a mere user working with your own files you won't have to deal too much with them, only remember if you encounter issues with accessing files, it's likely due to this. In short: each file has an owner and then also a set of permissions that say who's allowed to do what with the file. There are three kinds of permissions: read (r), write (w) and execute (x), and ALL THREE are defined for the file's owner, for the file's group and for everyone else, plus there is a magical value suid/sgid/sticky we won't delve into. All of this is then usually written either as a 4 digit octal number (each digit expresses three permission bits) or as a string of r/w/x/- characters (as seen in ls -l output). Well, let's not dig much deeper now.
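A brief illustration (the file name is made up): set permissions with an octal number and inspect them with ls -l:

```shell
touch secret.txt
chmod 640 secret.txt    # octal: owner rw (6), group r (4), others nothing (0)
ls -l secret.txt        # the first column shows: -rw-r-----
```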

PRO TIP: there is a very useful feature called wildcard characters that helps us handle many files at once. Most commonly used are the * and ? wildcards -- if we use these in a program argument, the argument will be expanded so that we get a list of files matching a certain pattern. This sounds complicated but let's see an example. If we write let's say rm *.jpg, we are going to remove all files in current directory whose name ends with .jpg. This is because * is a wildcard character that matches any string and when we execute the command, the shell actually replaces our argument with all files that match our pattern, so the command may actually internally look like rm picture1.jpg picture2.jpg picture3.jpg. The ? character is similar but matches exactly one character (whatever it is), so to list for example all files whose name is exactly three characters long we can write ls ???.
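To experiment with wildcards safely, one can create a few empty files (names invented) and just list them instead of removing anything:

```shell
mkdir -p wildcards && cd wildcards
touch a.jpg b.jpg c.txt ab.txt
ls *.jpg     # * matches any string: lists a.jpg and b.jpg
ls ??.txt    # ? matches exactly one character: lists ab.txt only
```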

Here is a quick cheatsheet of the most common Unix utilities:

name    function                                                      possible arguments (just some)
alias   create or display alias (nickname for another command)        alias=command
awk     text processing language (advanced)
bc      interactive calculator
c99     C language compiler (advanced)                                file, -o (output file)
cd      change directory                                              directory name (.. means back)
chmod   change file mode                                              +x (execute), +w (write), +r (read), file
cmp     compare files                                                 -s (silent), file1, file2
cp      copy files                                                    -r (recursive, for dirs), file, newfile
date    write date and/or time                                        format
df      report free space on disk                                     -k (use KiB units)
du      estimate size of file (useful for directories)                -k (use KiB units), -s (only total), file
echo    write out string (usually for scripts)
ed      ed is the standard text editor
expr    evaluate expression (simple calculator)                       expression (as separate arguments)
false   return false value
grep    search for pattern in file                                    pattern, file, -i (case insensitive)
head    show first N lines of a file                                  -n (count), file
kill    terminate process or send a signal to it                      processid, -9 (kill), -15 (terminate)
ls      list directory (shows files in current dir.)                  -s (show file sizes in blocks)
man     show manual page for topic                                    topic
mkdir   make directory                                                name
mv      move (rename) file                                            -i (ask before rewrite), file, newfile
pwd     print working directory
rm      remove files                                                  -r (recursive, for dirs), -f (force)
sed     stream editing util (replacing text etc.), see also regex     script, file
sh      shell (the command line interpreter, usually for scripting)   -c (command string)
sort    sort lines in file                                            -r (reverse), -u (unique), file
tail    show last N lines of a file                                   -n (count), file
true    return true value
uname   output system name and info                                   -a (all, output everything)
vi      advanced text editor
wc      word count (count characters or lines in file, can tell exact file size)   -c (characters), -l (lines), file

NOTES on the above table:

  • Typically there are two ways of feeding input data to a utility: either by specifying a file to read from or by feeding the data to the utility's standard input. This also applies to the output. Using standard input/output is a more "Unix" way as it allows us to chain the utilities with pipes, making one program feed its output to another as input.
  • Utilities try to follow common conventions so that it's easier to guess and remember what flags mean etc., for example -h is commonly a flag for getting help, -o is one for specifying output file etc.
  • Specific Unix systems will normally have more feature rich utilities, supporting additional flags and even adding new utilities. Check out manual pages on your system. You'll probably have to learn about common utils that aren't part of POSIX, e.g. wget, history, ssh, curl, sudo, apt and more. And of course there are thousands and thousands of additional utilities/programs you can download, program or otherwise install on your system.
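The first note above can be demonstrated with grep (the file name is made up): the same search can read a named file, or its standard input, which may in turn come from a redirection or a pipe:

```shell
printf 'cat\ndog\ncat\n' > animals.txt
grep cat animals.txt            # input given as a file argument
grep cat < animals.txt          # input redirected to standard input
printf 'cat\ndog\n' | grep cat  # input piped from another program
```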

Now on to a key feature of Unix: pipelines and redirects. Processes (running programs) on Unix have so called standard input (stdin) and standard output (stdout) -- these are streams of data (often textual but also binary) that the process takes on input and output respectively. There may also exist more streams (notably e.g. standard error output) but again, we'll ignore this now. When you run a program (utility etc.) in the command line, standard input will normally come from your keyboard and standard output will be connected to the terminal (i.e. you'll see it being written out in the command line). However sometimes you may want the program to take input from a file and/or to write its output to a file (imagine e.g. keeping logs), or you may even want one program to feed its output as an input to another program! This is very powerful as you may combine the many small utilities into more powerful units. See also Unix philosophy.

Most commonly used redirections are done like this:

  • command > file: redirects output of command to file file (rewriting its content if there is any).
  • command < file: redirects input of command to come from file.
  • command >> file: output of command will be appended to file (i.e. added at its end).
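A quick sketch of all three redirections together (log.txt is a made-up name):

```shell
echo "first" > log.txt     # > creates the file (or overwrites its content)
echo "second" >> log.txt   # >> appends to its end
wc -l < log.txt            # < feeds the file to wc's input; prints 2
```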

Pipelines are similar: they are chains of several programs separated by a "pipe" character: |. This makes a program feed its output to the input of the next program. For example ls | grep \.html will run the ls command and pass its output (list of files in current directory) to grep, which will print only the lines (file names) containing the ".html" string.

Several commands can also be written on a single line, they just have to be separated with ;.

Example of doing stuff in a Unix terminal (# character starts a comment -- these are here only to describe what's happening in the example):

> pwd
/home/drummyfish
> ls
Pictures    Documents    Downloads
git         Videos
> cd Downloads
> ls
free_software_song.midi  hentai_porn.mp4
lrs_wiki.txt
> rm hentai_porn.mp4     # oh noes, quickly delete this
> cd ../git; ls
Anarch      comun       Licar
raycastlib  small3dlib
> cd Anarch
> wc -l *.h *.c | tail -n 1    # count lines of code in .h and .c files
14711 total
> cat *.h *.c | grep "TODO"    # show all TODOs in code
      (SFG_game.backgroundScaleMap[(pixel->position.y          // ^ TODO: get rid of mod?
  RCL_Unit     direction;  // TODO: rename to "angle" to keep consistency
  TODO: maybe array functions should be replaced by defines of funtion names
  /* FIXME/TODO: The adjusted (=orthogonal, camera-space) distance could
  RCL_Unit limit1, // TODO: int16_t?
  RCL_Unit depth = 0; /* TODO: this is for clamping depth to 0 so that we don't
         increment == -1 ? i >= limit : i <= limit; /* TODO: is efficient? */\
  RCL_Unit limit1, // TODO: int16_t?
       increment == -1 ? i >= limit : i <= limit; // TODO: is efficient?
  // TODO: probably doesn't work
#if 1 // TODO: add other options for input handling (SDL, xinput, ...)
> echo "Remember to fix the TODOs in code!" >> TODO.txt  # add note to TODO file

See Also