less_retarded_wiki/unix_philosophy.md
2024-03-23 00:26:32 +01:00

Unix Philosophy

Unix philosophy is one of the most important and essential approaches to programming (and by extension all technology design). It advocates great minimalism and is best known by the saying that a program should only do one thing and do it well. Unix philosophy is a collective wisdom, a set of design recommendations that evolved during the development of one of the earliest (and historically most important) operating systems, Unix, hence the name. Having been defined by hackers (the true, old style ones), the philosophy naturally advises providing a set of many highly effective tools that can be combined in various ways, i.e. to perform hacking, rather than being restricted by the fixed, intended functionality of huge do-it-all programs. Unix philosophy advocates simplicity, clarity, modularity, reusability and composition of larger programs out of very small programs rather than designing huge monolithic programs as a whole. Unix philosophy, at least partially, lives on in many projects and Unix-like operating systems such as Linux (though Linux is distancing itself from Unix more and more), has been wholly adopted by groups such as suckless and LRS (us), and is even being reiterated in projects such as plan9.

NOTE: see also everything is a file, another famous design principle of Unix -- this one is usually seen as a Unix-specific design choice rather than part of the general Unix philosophy itself, but it helps paint the whole picture.

As written in the GNU coreutils introduction, a Swiss army knife (a universal tool that does many things at once) can be useful, but it's not a good tool for experts at work: a professional carpenter will rather use a set of relatively simple, highly specialized tools, each of which is extremely efficient at its job. Unix philosophy brings this observation over to the world of expert programmers.

In 1978 Douglas McIlroy wrote a short overview of the Unix system (UNIX Time-Sharing System) in which he gave the main points of the system's style; this can be seen as a summary of the Unix philosophy (the following is paraphrased):

  1. Each program should do one thing and do it well. Overcomplicating existing programs isn't good; for new functionality create a new program.
  2. Output of a program should be easy to interpret by another program. In Unix, programs are chained by so called pipes, in which one program sends its output as input to another, so a programmer should bear this in mind. Interactive programs should be avoided if possible. Make your program a filter if possible, as that helps exactly this case.
  3. Program so that you can test early, and don't be afraid to throw away code and rewrite it from scratch.
  4. Write and use tools, even short-lived ones; they're better than manual work. Unix-like systems are known for their high scriptability.

This was later condensed into: do one thing well, write programs to work together, make programs communicate via text streams, a universal interface.
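The points above can be sketched with a handful of standard tools. What follows is only an illustration, not anything taken from McIlroy's text: top_words is a made-up name, and the function assumes plain ASCII words. Each stage does one small job and hands a text stream to the next one.

```shell
# top_words N: hypothetical helper, for illustration only.
# Reads text on stdin and prints the N most frequent words with their counts.
top_words() {
    tr -cs 'A-Za-z' '\n' |   # turn every run of non-letters into one newline
        tr 'A-Z' 'a-z' |     # fold everything to lower case
        sort |               # bring equal words next to each other
        uniq -c |            # collapse each run into "count word"
        sort -rn |           # highest count first
        head -n "$1"         # keep only the first N lines
}

# Example use: the two most frequent words of a short sentence.
printf 'the cat and the dog and the bird\n' | top_words 2
```

None of the six programs knows anything about word statistics; the result comes purely from how they are combined, and any stage can be swapped for another tool without touching the rest.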

To what extent/extreme this minimalism ("doing only one thing") should be taken is of course a hot topic of countless debates and opinions. The original Unix hackers are often highly strict, a famous example being the "cat -v considered harmful" presentation, which bashes a relatively simple function added to the cat program, a program that should only ever concatenate files. Some tolerate adding a few convenience functions to trivial programs, especially nowadays.

Simple example: likely the most common practical example that can be given is piping small command line utility programs. Inside a Unix system there live a number of small programs that do only one thing but do it well, for example the cat program that only concatenates and outputs the content of selected files, the grep program that searches for patterns in text etc. On the command line we may use so called pipes to chain some of these simple programs into more complex processing pipelines by redirecting one program's output stream to another one's input. Let's say we want to, for example, automatically list all first and second level headings on a given webpage and write them out alphabetically sorted. We can do it with a command such as this one:

wget -q -O - "http://www.tastyfish.cz/lrs/main.html" | grep -i -o "<h[12][^>]*>[^<]*<" | sed "s/[^>]*> *\([^ ][^<]*[^ ]\) *<.*/\1/g" | sort

Which may output for example:

Are You A Noob?
Did You Know
less_retarded_wiki
Topics
Wanna Help?
Welcome To The Less Retarded Wiki
What Is Less Retarded Software/Society/Wiki?

In the command the pipes (|) chain multiple programs together so that the output of one becomes the input of the next. The first command, wget, downloads the HTML content of the webpage and passes it to the second command, grep, which filters the text and prints only the parts of lines that match the heading pattern (written as a so called regular expression). This is passed to sed, which removes the HTML code, and the result is passed to sort, which sorts the lines alphabetically -- as this is the last command, the result is then printed out, but we could also e.g. add > output.txt at the end to save the result into a text file instead. We also use flags to modify the behavior of the programs, for example -i tells grep to work in case-insensitive mode, -o tells it to print only the matching part of each line, and -q tells wget to be silent and not print things such as download progress.

This whole wiki is basically made on top of a few scripts like this (compare e.g. to MediaWiki software), so you literally see the manifestation of the presented concepts as you're reading this. This kind of "workflow" is a fast, powerful and very flexible way of processing data for anyone who knows the Unix tools.

Notice the relative simplicity of each command and how each one works as a text filter; text is a universal communication interface and behaving as a filter makes intercommunication easy and efficient, utilizing the principle of a pipeline. A filter simply takes an input stream of data and outputs another stream of data; it ideally works on-the-go (without having to load the whole input in order to produce the output), which has numerous advantages, for example requiring only a small amount of memory (which may become significant when we are running many programs at once in the pipeline, imagine e.g. a server with 10000 users, each one running his own commands like this) and decreasing latency (the next pipe stage may start processing the data before the previous stage finishes). When you're writing a program, such as for example a compression tool, make it work like this.
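To make the idea of an on-the-go filter concrete, here is a minimal sketch written in shell itself. The name long_lines and its behavior are invented purely for illustration; a real tool of this kind would more likely be written in C. The function holds only one line in memory at a time and prints each result as soon as it is known.

```shell
# long_lines LIMIT: hypothetical filter, for illustration only.
# Copies to stdout every stdin line that is longer than LIMIT characters.
long_lines() {
    limit=$1
    # Read one line at a time; the second test also catches a last line
    # that is not terminated by a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -gt "$limit" ]; then
            printf '%s\n' "$line"   # emitted right away, before input ends
        fi
    done
}

# Example use: the first line shows up immediately, two seconds before the
# producer has finished writing its output.
{ echo 'a fairly long line'; sleep 2; echo 'short'; } | long_lines 10
```

Because the function only talks to stdin and stdout, it can sit anywhere in a pipeline, for example after the sed stage of the command above.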

Compare this to the opposing Windows philosophy, in which combining programs into collaborating units is not intended, is possibly even purposefully prevented, and is therefore very difficult, slow and impractical to do. Such programs are designed for manually performing some predefined actions, mostly using a GUI, e.g. painting pictures with a mouse, but they aren't designed to collaborate with each other or to be automated, and they can rarely be used in the unintended, inventive ways needed for powerful hacking. Returning to the example of a compression tool, on Windows such a program would be a large GUI program that requires the user to open up a file dialog and manually select a file to compress, and which then might even do nasty things like load the whole file into memory (because anyone who can afford Windows can also afford a lot of RAM), perform the compression there, and then write the data back to some other file. Need to use the program on a computer without a graphical display? Automate it to work with other programs? Run it from a script? Run 10000 instances of it at the same time along with 10000 other similar programs? Bad luck, Windows philosophy doesn't allow this.
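For contrast, here is a rough sketch of how a compression tool that behaves as a filter gets used. It relies on gzip, which compresses its standard input to standard output when called with -c and decompresses with -dc; the helper name archive_dir is hypothetical and serves only as an illustration.

```shell
# archive_dir DIR: hypothetical helper, for illustration only.
# tar writes the archive to stdout ("-" as the archive name), gzip compresses
# the stream as it flows through, and the result ends up on our stdout.
archive_dir() {
    tar -cf - "$1" | gzip -c
}

# Round trip on a plain text stream: no GUI, no file dialog, no temporary files.
printf 'hello world\n' | gzip -c | gzip -dc
```

Such a helper can be called from a script, run on a machine with no display, or have its output piped further, e.g. archive_dir somedir > somedir.tar.gz (somedir being a placeholder name).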

Watch out! Do not misunderstand Unix philosophy. There are many extremely dangerous cases of misunderstanding Unix philosophy by modern wannabe programmers who can't tell pseudominimalism apart from true minimalism. One example is the hilarious myth about "React following Unix philosophy" (LMAO this); the devs just show so many misunderstandings here. Firstly, of course, JavaScript itself is extremely bloated, as it's a language aiming for things like comfort, rapid development, "safety" and beginner friendliness, to which it sacrifices performance and elegance. An expert hacker trying to write a highly thought through, optimized program is not its target group, therefore nothing based on JavaScript can ever be compatible with the Unix way in the first place. Secondly, they seem to imply that basically any system of modules follows Unix philosophy -- that's of course wrong: modularity far predates Unix philosophy and Unix philosophy is more than that. Merely having a package system of libraries, each of which focuses on some thing (even a very broad one like a highly complex GUI), doesn't mean those tools are simple (both internally and externally), efficient, communicating in good ways and so on.

Does Unix philosophy imply universality is always bad? Well, most likely no, not in general at least -- it simply tells us that for an expert to create art that reaches the peak of his potential it seems best in most cases if he lives in an environment with many small, highly efficient tools that he can tinker with and combine, even (and especially) in unforeseen ways -- to do hacking. Universal tools, however, are great as well, either as a supplement or for other use cases (non-experts, quick dirty jobs and so on) -- after all a general purpose programming language such as C, another creation of the Unix creators themselves, is a universal tool that prefers generality over effectiveness at one specific task (for example you can use C to process text but you likely won't match the efficiency of sed, etc.). Nevertheless let us realize an important thing: a universal tool can still be implemented in a minimalist way, therefore never confuse a universal tool with a bloated monolith encumbered by feature creep!

{ One possible practical interpretation of Unix philosophy I came up with is this: there's an upper but also a lower limit on complexity. "Do one thing" means the program shouldn't be too complex; we can simplify this to e.g. "Your program shouldn't surpass 10 KLOC". "Do it well" means the program shouldn't be too trivial, because then it is hardly doing it well; we could e.g. say "Your program shouldn't be shorter than 10 LOC". E.g. we shouldn't literally make a separate program for printing each ASCII symbol, such programs would be too simple and not doing a thing well. We rather make a cat program, that's neither too complex nor too trivial, which can really print any ASCII symbol. From this point of view Unix philosophy is really about a balance between triviality and huge complexity, but it hints that the right balance tends to be much closer to triviality than we humans are tempted to intuitively choose. Without guidance we tend to make programs too complex, and so the philosophy exists to remind us to force ourselves to minimize our programs in order to strike the correct balance. ~drummyfish }
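Taken literally, the two example bounds from the note above can be checked with a few lines of shell. This is only a toy sketch: loc_verdict is a made-up name, and a raw line count (wc -l also counts blank lines and comments) is just a crude stand-in for real LOC.

```shell
# loc_verdict: hypothetical helper, for illustration only.
# Reads source code on stdin and compares its line count with the example
# bounds from the note above (at least 10 LOC, at most 10 KLOC).
loc_verdict() {
    loc=$(wc -l | tr -d ' ')   # some wc implementations pad with spaces
    if [ "$loc" -lt 10 ]; then
        echo "$loc LOC: too trivial"
    elif [ "$loc" -gt 10000 ]; then
        echo "$loc LOC: too complex"
    else
        echo "$loc LOC: OK"
    fi
}
```

A possible use would be cat *.c *.h | loc_verdict, run in a project's source directory.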

See Also