Miloslav Ciz 2024-08-25 01:56:24 +02:00
parent 4bff69ec4a
commit 8e2f22bfc7
20 changed files with 1913 additions and 1808 deletions

@ -15,13 +15,19 @@ Bootstrapping has to start with some initial prerequisite machine dependent bina
[Forth](forth.md) is a language that has traditionally been used for making bootstrapping environments -- its paradigm and philosophy are ideal for bootstrapping as it's based on the concept of building a computing environment practically from nothing just by defining more and more new words using previously defined simpler words, fitting the definition of bootstrapping perfectly. [Dusk OS](duskos.md) is a project demonstrating this. Similarly simple languages such as [Lisp](lisp.md) and [comun](comun.md) can work too (GNU Mes uses a combination of [Scheme](scheme.md) and C).
**How to do this then?** To make a computing environment that can bootstrap itself you can do it like this:
**How to do this then?** To make a computing environment that can bootstrap itself this approach is often used:
1. **Make a [simple](kiss.md) [programming language](programming_language.md) L**. You can choose e.g. the mentioned [Forth](forth.md) but you can even make your own, just remember to keep it extremely simple -- simplicity of the base language is the key feature here. If you also need a more complex language, write it in L. The language L will serve as a tool for writing software for your platform, i.e. it will provide some comfort in programming (so that you don't have to write in assembly) but mainly it will be an **[abstraction](abstraction.md) layer** for the programs, it will allow them to run on any hardware/platform. The language therefore has to be **[portable](portability.md)**; it should probably abstract things like [endianness](byte_sex.md), native integer size, control structures etc., so as to work nicely on all [CPUs](cpu.md), but it also mustn't have too much abstraction (such as [OOP](oop.md)) otherwise it will quickly get complicated. The language can compile e.g. to some kind of very simple [bytecode](bytecode.md) that will be easy to translate to any [assembly](assembly.md). Make the bytecode very simple (and document it well) as its complexity will later on determine the complexity of the bootstrap binary seed. At first you'll have to temporarily implement L in some already existing language, e.g. [C](c.md). NOTE: in theory you could just make bytecode, without making L, and just write your software in that bytecode, but the bytecode has to focus on being simple to translate, i.e. it will probably have few opcodes for example, which will be in conflict with making it at least somewhat comfortable to program on your platform. However one can try to make some compromise and it will save the complexity of translating the language to bytecode, so it can be considered ([uxn](uxn.md) seems to be doing this).
2. **Write L in itself, i.e. [self host](self_hosting.md) it**. This means you'll use L to write a [compiler](compiler.md) of L that outputs L's bytecode. Once you do this, you have a completely independent language and can start using it instead of the original compiler of L written in another language. Now compile L with itself -- you'll get the bytecode of the L compiler. At this point you can bootstrap L on any platform as long as you can execute the L bytecode on it -- this is why it was crucial to make L and its bytecode very simple. In theory it's enough to just interpret the bytecode but it's better to translate it to the platform's native machine code so that you get maximum efficiency (the nature of bytecode should make it so that it isn't really more difficult to translate it than to interpret it). If for example you want to bootstrap on an [x86](x86.md) CPU, you'll have to write a program (L compiler [backend](backend.md)) that translates the bytecode to x86 assembly; if we suppose that at the time of bootstrapping you will only have this x86 computer, you will have to write the translator in x86 assembly manually. If your bytecode really is simple and well made, it shouldn't be hard though (you will mostly be replacing your bytecode opcodes with the given platform's machine code opcodes). Once you have the x86 backend, you can completely bootstrap L's compiler on any x86 computer.
3. **Further help make L bootstrappable**. This means making it even easier to execute the L bytecode on any given platform -- you may for example write backends (the bytecode translators) for common platforms like x86, ARM, RISC-V, C, Lisp and so on. You can also provide tests that will help check newly written backends for correctness. At this point you have L bootstrappable without any [work](work.md) on the platforms for which you provide backends and on others it will just take a tiny bit of work to write its own translator.
4. **Write everything else in L**. This means writing the platform itself and software such as various tools and libraries. You can potentially even use L to write a higher level language (e.g. C) for yet more comfort in programming. Since everything here is written in L and L can be bootstrapped, everything here can be bootstrapped as well.
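
To illustrate why a simple bytecode keeps the bootstrap seed small, here is a minimal sketch of a bytecode plus both ways of executing it. Everything in it is an assumption made up for illustration: the six opcodes, the names `interpret` and `translate_to_c`, and the choice of C as the backend target. It is written in Python only for brevity (a real seed would be written in assembly or similar) and it leaves out jumps, so it only handles straight-line programs. It suggests that a backend is mostly a table replacing each opcode with a fixed snippet of the target language, which is about as much work as interpreting.

```python
# Hypothetical bytecode of an imagined language L: a stack machine, six opcodes.
PUSH, ADD, MUL, DUP, PRINT, HALT = range(6)

def interpret(code):
    """Execute the bytecode directly; returns the list of printed values."""
    stack, out, pc = [], [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == PRINT:
            out.append(stack.pop())
        elif op == HALT:
            return out
        else:
            raise ValueError("unknown opcode " + str(op))

def translate_to_c(code):
    """Backend sketch: replace each opcode with a fixed snippet of C."""
    lines = ["#include <stdio.h>", "int main(void) {", "  long s[256]; int n = 0;"]
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == PUSH:
            lines.append("  s[n++] = " + str(code[pc]) + ";")
            pc += 1
        elif op == ADD:
            lines.append("  n--; s[n-1] += s[n];")
        elif op == MUL:
            lines.append("  n--; s[n-1] *= s[n];")
        elif op == DUP:
            lines.append("  s[n] = s[n-1]; n++;")
        elif op == PRINT:
            lines.append('  printf("%ld\\n", s[--n]);')
        elif op == HALT:
            lines.append("  return 0;")
        else:
            raise ValueError("unknown opcode " + str(op))
    lines.append("}")
    return "\n".join(lines)

# (3 + 4) squared, written directly in the bytecode
program = [PUSH, 3, PUSH, 4, ADD, DUP, MUL, PRINT, HALT]
result = interpret(program)
c_source = translate_to_c(program)
```

Here `result` holds the values the program printed and `c_source` holds C text that a C compiler could turn into a native program doing the same thing. A backend for another platform would have the same shape with different snippets.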
However, a possibly even better way may be the [Forth](forth.md)-style **incremental programming** way, which works like this (see also [Macrofucker](macrofucker.md) and [portability](portability.md) for explanation of some of the concepts):
1. **Start with a trivially simple language.** It must be one that's easy to implement from scratch on any computer without any extra tools -- something maybe just a little bit more sophisticated than [Brainfuck](brainfuck.md). This language may even be a machine specific [assembly](assembly.md), let's say [x86](x86.md), that's using just a small subset of the simplest instructions, as long as it's easy to replace these instructions with other instructions on another hardware architecture. There should basically only be as many commands as are needed to ensure [Turing Completeness](turing_complete.md) and good performance (i.e. while an increment instruction may be enough for Turing completeness, we should probably also include an instruction performing general addition, because adding two numbers in a loop using just the increment instruction would be painfully slow). The goal here is of course to build the foundations for the rest of our platform -- one that's simple enough to be easily replaced.
2. **Build a more complex language on top of it.** I.e. now use this simple language ALONE to build a more complex, practically usable language. Again, take inspiration from Forth -- you may for example introduce something like procedures, [macros](macro.md) or words to your simple language, which will allow you to keep adding new useful things such as arrays or more complex control structures. To add the system of macros for example just write a [preprocessor](preprocessor.md) in the base language that will take the new, macro-enabled language source code and convert it to the plain base language; with macros at your disposal now you can start expanding the language more and more just by writing new macros. I.e. expanding the base language should be done in small steps, incrementally -- that is don't build C out of Brainfuck right away; instead first build just a tiny bit more complex language on top of the initial language, then a bit more complex one on top of that etc. -- in Forth this happens by defining new words and expanding the language's dictionary.
3. **Now build everything else with the complex language.** This is already straightforward (though time consuming). First you may even build more language extensions and development tools like a debugger or [text editor](text_editor.md) for example. The beauty of this approach is really that to allow yourself to program on the system, you are building the system itself on the go, i.e. you are creating a development environment and also a user environment for yourself, AND everything you make is bootstrappable from the original simple language. This is a very elegant, natural way -- you are setting up a complex system, building a road which is subsequently easy to walk again from the start, i.e. bootstrap. This is probably how it should ideally be done.
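
The incremental steps above can be sketched with a toy example. All of it is hypothetical: the four base commands (`1`, `+`, `dup`, `.`), the `def NAME ... end` macro syntax and the function names `expand` and `run_base` are invented here just for illustration. In a real bootstrap the preprocessor would be written in the base language itself; Python is used only to keep the sketch short. The point is that each macro is defined purely from the base commands and earlier macros, so the final program always reduces to the plain base language.

```python
# Hypothetical base language: four stack commands, nothing else.
BASE = {"1", "+", "dup", "."}

def expand(source):
    """Preprocessor: turn macro-enabled source into plain base-language tokens."""
    macros, out = {}, []
    tokens = iter(source.split())
    for tok in tokens:
        if tok == "def":
            name = next(tokens)
            body = []
            for t in tokens:  # keeps consuming the same token stream
                if t == "end":
                    break
                # earlier macros are expanded right away, at definition time
                body.extend(macros.get(t, [t]))
            macros[name] = body
        else:
            out.extend(macros.get(tok, [tok]))
    unknown = [t for t in out if t not in BASE]
    if unknown:
        raise ValueError("not in base language: " + " ".join(unknown))
    return out

def run_base(tokens):
    """Reference interpreter of the base language; returns the output values."""
    stack, out = [], []
    for t in tokens:
        if t == "1":
            stack.append(1)
        elif t == "+":
            stack.append(stack.pop() + stack.pop())
        elif t == "dup":
            stack.append(stack[-1])
        elif t == ".":
            out.append(stack.pop())
    return out

# Each macro builds only on the base commands and on macros defined before it.
source = """
def two    1 1 +              end
def double dup +              end
def eight  two double double  end
eight .
"""

plain = expand(source)   # only base commands remain
output = run_base(plain)
```

After running this, `plain` contains only base commands and `output` holds what the expanded program printed. Porting the whole thing to a new machine would then only require reimplementing the four base commands there.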
## Booting: Computer Starting Up
Booting as in "starting a computer up" is also a kind of setting up a system from the ground up -- we take it for granted but remember it takes some [work](work.md) to get a computer from being powered off and having all RAM empty to having an operating system loaded, hardware checked and initialized, devices mounted etc.