Miloslav Ciz 2025-06-19 02:56:49 +02:00
parent f216e89f03
commit 0e12dc9efe
18 changed files with 2071 additions and 1974 deletions


@@ -1,12 +1,12 @@
# Formal Language
The field of formal languages tries to [mathematically](math.md) and rigorously view problems as languages; this includes probably most structures we can think of, from human languages and computer languages to visual patterns and other highly abstract structures. Formal languages are at the root of theoretical [computer science](compsci.md) and are important e.g. for the theory of [computability](computability.md)/decidability, computational complexity, [security](security.md) and [compilers](compiler.md), but they also find use in linguistics and other fields of [science](science.md).
The field of formal languages attempts to [mathematically](math.md) and rigorously view problems as languages; this includes probably most structures we can think of, from human and computer languages to visual patterns, sequences of moves in the game of [chess](chess.md) and other highly abstract sequences of symbols. Formal languages sit near the foundations of theoretical [computer science](compsci.md) and are important e.g. for the theory of [information](information.md), [computability](computability.md)/decidability, computational complexity, [security](security.md) and [compilers](compiler.md), but they also find use in linguistics and other fields of [science](science.md).
A **formal language** is defined as a (potentially infinite) set of strings (which are finite but unlimited in length) over some alphabet (which is finite). I.e. a language is a subset of E* where E is a finite alphabet (a set of *letters*). (* is a *Kleene Star* and signifies a set of all possible strings over E). The string belonging to a language may be referred to as a *word* or perhaps even *sentence*, but this word/sentence is actually a whole kind of *text* written in the language, if we think of it in terms of our natural languages. The [C](c.md) programming language can be seen as a formal language which is a set of all strings that are a valid C program that compiles without errors etc.
A **formal language** is defined as a (potentially infinite) [set](set.md) of strings (which are finite but unlimited in length) over some alphabet (which is finite). I.e. a language is a subset of E* where E is a finite alphabet (a set of *letters*). (* is a *Kleene Star* and signifies a set of all possible strings over E). The string belonging to a language may be referred to as a *word* or perhaps even *sentence*, but this word/sentence is actually a whole kind of *text* written in the language, if we think of it in terms of our natural languages. The [C](c.md) programming language can be seen as a formal language which is a set of all strings that are a valid C program that compiles without errors etc.
**For example**, given an alphabet [a,b,c], a possible formal language over it is [a,ab,bc,c]. Another, different possible language over this alphabet is an infinite language [b,ab,aab,aaab,aaaab,...] which we can also write with a [regular expression](regex.md) as a*b. We can also see e.g. English as being a formal language equivalent to a set of all texts over the English alphabet (along with symbols like space, dot, comma etc.) that we would consider to be in English as we speak it.
**What is this all good for?** This mathematical formalization allows us to classify languages and understand their structure, which is necessary e.g. for creating efficient compilers, but also to understand computers as such, their power and limits, as computers can be viewed as machines for processing formal languages. With these tools researchers are able to come up with [proofs](proof.md) of different properties of languages, which we can exploit. For example, within formal languages, it has been proven that certain languages are uncomputable, i.e. there are some problems which a computer cannot ever solve (a typical example is the [halting problem](halting_problem.md)) and so we don't have to waste time on trying to create such algorithms as we will never find any. The knowledge of formal languages can also guide us in designing computer languages: e.g. we know that regular languages are extremely simple to implement and so, if we can, we should prefer our languages to be regular.
**What is this all [good](good.md) for?** This mathematical formalization allows us to classify languages and understand their structure, which is necessary e.g. for creating efficient compilers, but also to understand computers as such, their power and limits, as computers can be viewed as machines for processing formal languages. With these tools researchers are able to come up with [proofs](proof.md) of different properties of languages, which we can exploit. For example, within formal languages, it has been proven that certain languages are uncomputable, i.e. there are some problems which a computer cannot ever solve (a typical example is the [halting problem](halting_problem.md)) and so we don't have to waste time on trying to create such algorithms as we will never find any. The knowledge of formal languages can also guide us in designing computer languages: e.g. we know that regular languages are extremely simple to implement and so, if we can, we should prefer our languages to be regular.
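To give a rough idea of how simple a regular language can be to implement, here is a minimal sketch (not any canonical implementation) of a membership test for the language a*b from the example above, written as a tiny hand-made state machine. The function name `in_a_star_b` and the state names are made up just for this illustration.

```python
def in_a_star_b(word: str) -> bool:
    """Return True if word belongs to the language a*b over the alphabet [a,b,c]."""
    state = "start"  # hypothetical state name: still reading the leading run of 'a'
    for letter in word:
        if state == "start" and letter == "a":
            state = "start"   # any number of 'a' letters is allowed at the beginning
        elif state == "start" and letter == "b":
            state = "accept"  # the single 'b' must be the last letter
        else:
            state = "reject"  # 'c', or anything following the 'b', is not allowed
            break
    return state == "accept"


if __name__ == "__main__":
    for w in ["b", "ab", "aaab", "", "abb", "ba", "c"]:
        print(repr(w), in_a_star_b(w))
```

The whole test needs only one pass over the string and a constant amount of memory (the current state), which is the kind of simplicity meant when saying regular languages are easy to implement.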
## Classification