Bootstrap/Boot
In general bootstrapping (from the idiom "pull yourself up by your bootstraps"), sometimes shortened to just booting, refers to a clever process of self-establishing some relatively complex system starting from something very small, without much external help. As an example, imagine a "civilization bootstrapping kit" that contains only a few primitive tools along with instructions on how to use those tools to mine ore and turn it into metal, out of which one makes more tools that are used to obtain more material, and so on, up until basically all modern technology and factories are set up in a relatively short time (civboot is a project like this). The term bootstrapping is, however, especially relevant in relation to computer technology, where it has two main meanings:
- The process by which a computer starts and sets up the operating system after being powered on, which often involves several stages of loading various modules, running several bootloaders etc. This is traditionally called booting (rebooting means restarting the computer).
- Utilizing the principle of bootstrapping for making highly independent software, i.e. software that doesn't depend on other software because it can set itself up. This is usually what bootstrapping (the longer term) means. It is also closely related to self hosting, another principle whose idea is to "implement technology using itself".
Bootstrapping: Making Dependency-Free Software
TODO
Why be concerned with bootstrapping when we already have our systems set up? There are many reasons. A notable one is that we may lose our current technology due to a societal collapse, which is not improbable as it keeps happening over and over throughout history. Many people therefore fear (rightfully so) that if by whatever disaster we lose our current computers, Internet etc., we will also lose with them all the modern art, data, software we so painfully developed, digitized books, inventions and so on, not to mention the horrors that will follow if we're unable to quickly reestablish the computer networks we are so dependent on. Setting up what we currently have completely from scratch would be extremely difficult, a task for centuries: just take a while to consider all the activity and knowledge required around the globe to create a single computer with the billions of lines of code worth of software that make it work. Knowledge of old technology gets lost: to make modern computers we first needed older, primitive computers, but now that we only have modern computers, no one remembers anymore how to make the older ones, so if we lose the current ones, we won't be able to make new ones because we will lack the tools. Another reason for bootstrapping is independence of technology, which brings e.g. freedom (many pursue an operating system that can be set up anywhere without some corporation's proprietary driver or hardware unit), robustness, simplicity, the ability to bring existing software to new platforms and so on, i.e. things that are practical even in the current world.
Forth has traditionally been used for making bootstrapping environments; Dusk OS is an example of such a project. Similarly simple languages such as Lisp and comun will probably work too.
How to do this then? To make a computing environment that can bootstrap itself, you can proceed like this:
- Make a simple programming language L. You can choose e.g. the mentioned Forth, but you can even make your own; just remember to keep it extremely simple, as simplicity of the base language is the key feature here. The language will serve as a tool for writing software for your platform, i.e. it will provide some comfort in programming (so that you don't have to write in assembly), but mainly it will be an abstraction layer that allows programs to run on any hardware/platform. The language therefore has to be portable: it should probably abstract things like endianness, native integer size, control structures etc. so as to work nicely on all CPUs, but it also mustn't have too much abstraction (such as OOP), otherwise it will quickly get complicated. The language can compile e.g. to some kind of very simple bytecode that will be easy to translate to any assembly. At first you'll have to temporarily implement L in some already existing language, e.g. C. NOTE: in theory you could just make the bytecode without making L and write your software directly in that bytecode, but the bytecode has to focus on being simple to translate, i.e. it will likely have few opcodes, which conflicts with making it at least somewhat comfortable to program your platform. However, one can try to make a compromise, which saves the complexity of translating a language to bytecode, so this option can be considered (uxn seems to be doing this).
- Write L in itself, i.e. self host it. This means you'll use L to write a compiler of L that outputs L's bytecode. Once you do this, you have a completely independent language and can throw away the original compiler of L written in another language. Now compile L with itself: you'll get the bytecode of the L compiler. At this point you can bootstrap L on any platform as long as you can execute the L bytecode on it, which is why it was crucial to make L and its bytecode very simple. In theory it's enough to just interpret the bytecode, but it's better to translate it to the platform's native machine code so that you get maximum efficiency (the nature of the bytecode should make it no more difficult to translate than to interpret). If for example you want to bootstrap on an x86 CPU, you'll have to write a program that translates the bytecode to x86 assembly; if we suppose that at the time of bootstrapping you will only have this x86 computer, you will have to write the translator manually in x86 assembly. If your bytecode really is simple and well made, this shouldn't be hard though (you will mostly be replacing your bytecode opcodes with the given platform's machine code opcodes).
- Further help make L bootstrappable. This means making it even easier to execute the L bytecode on any given platform: you may for example write bytecode translators for common platforms like x86, ARM, RISC-V and so on. At this point L can be bootstrapped without any work on the platforms you have translators for, and on others it will only take a little work to write a new translator.
- Write everything else in L. This means writing the platform itself and software such as various tools and libraries. You can potentially even use L to write a higher level language for yet more comfort in programming. Since everything here is written in L and L can be bootstrapped, everything here can be bootstrapped as well.
Booting: Computer Starting Up
Booting as in "starting a computer up" is also a kind of setting up a system from the ground up. We take it for granted, but remember it takes some work to get a computer from being powered off with all RAM empty to having an operating system loaded, hardware checked and initialized, devices mounted etc.
Starting up a simple computer -- such as some MCU-based embedded open console that runs bare metal programs -- isn't as complicated as booting up a mainstream PC with an operating system.
First let's take a look at the simple computer. It may work e.g. like this: upon start the CPU initializes its registers and simply starts executing instructions from some given memory address, let's suppose 0 (you will find this in your CPU's data sheet). The memory here is often e.g. flash ROM to which we can externally upload a program from another computer before we turn the CPU on; in game consoles this can often be done through USB. So we basically upload the program (e.g. a game) we want to run, turn the console on and it starts running it. However, further steps are often added: for example there may be a small, permanently flashed initial boot program at the initial execution address that handles things like initializing hardware (screen, speaker, ...), setting up interrupts and so on (which would otherwise always have to be done by the main program itself), and it can also offer some functionality, for example a simple menu through which the user can choose to load a program from an SD card to flash memory (thanks to which we won't need an external computer to reload programs). In this case we won't upload our main program to the initial execution address but rather somewhere else; the initial bootloader will jump to this address once it has done its work.
Now for the PC (the "IBM compatibles"): here things are more complicated due to the complexity of the whole platform, i.e. because we have to load an operating system first, of which there can be several, each of which may be loadable from different storage (harddisk, USB stick, network, ...); we also have a more complex CPU that has to be set into a certain operation mode, complex peripherals that need complex initialization etcetc. Generally there's a huge bloated boot sequence and PCs infamously take longer and longer to start up despite skyrocketing hardware improvements, which says something about the state of technology. Anyway, it usually works like this:
{ I'm not terribly experienced with this, verify everything. ~drummyfish }
- Computer is turned on, the CPU starts executing at some initial address (same as with the simple computer).
- From here the CPU jumps to an address at which the stage one bootloader is located (a bootloader is just a program that does the booting, and as this is the first one in a line of potentially multiple bootloaders, it's called stage one). This address is in the motherboard ROM, which typically holds the BIOS (or something similar that may be called e.g. UEFI, depending on what standard it adheres to), i.e. BIOS is the stage one bootloader. BIOS is the first software (we may also call it firmware) that gets run; it's flashed onto the motherboard by the manufacturer and isn't supposed to be rewritten by the user, though some based people still rewrite it (ignoring the "read only" label :D), often to replace it with something more free (e.g. libreboot). BIOS is the most basic software that makes us able to use the computer at the most basic level without having to flash programs externally, i.e. it lets us use the keyboard and monitor, install an operating system from a CD drive etc. (It also offers a basic environment for programs that want to run before the operating system, but that's not important now.) BIOS is generally different on each computer model; it normally allows us to set which device the computer will try to load next, for example we may choose to boot from a harddisk, a USB flash drive or a CD. There is often a countdown during which, if we don't intervene, the BIOS automatically tries to boot according to its current settings. Let's suppose it is set to boot from harddisk.
- BIOS performs the power-on self test (POST): basically it makes sure everything is OK, i.e. that the hardware works etc. If so, it continues on (otherwise it halts).
- BIOS loads the master boot record (MBR, the first sector of the device) from the harddisk (or from another mass storage device, depending on its settings) into RAM (traditionally at address 0x7C00) and executes it, i.e. passes control to it. This will typically lead to loading the second stage bootloader.
- The code loaded from the MBR is limited in size as it has to fit in one HDD sector (which used to be only 512 bytes for a long time), so this code usually just loads the bigger code of the second stage bootloader from somewhere else and then again passes control to it.
- Now the second stage bootloader starts: this is a bootloader whose job is normally to finally load the actual operating system. Unlike BIOS, this bootloader may quite easily be reinstalled by the user; installing an operating system often also installs some kind of second stage bootloader, an example being GRUB, which is typically installed with GNU/Linux systems. This kind of bootloader may offer the user a choice of multiple operating systems and possibly have other settings. In any case, here the OS kernel code is loaded and run.
- Voila, the kernel now starts running and is free to do its own initializations and manage everything, e.g. Linux will start the PID 1 process, which will mount filesystems, run initial scripts etcetc.