Miloslav Ciz 2023-03-23 17:16:00 +01:00
parent f48a2a80fa
commit c4013d7128
4 changed files with 5 additions and 11 deletions

@@ -1,8 +1,8 @@
# Formal Language
The field of formal languages tries to [mathematically](math.md) and rigorously examine and describe anything that can be viewed as a language, which probably includes most structures we can think of, from human languages and computer languages to visual patterns and other highly abstract structures. Formal languages are at the root of theoretical [computer science](compsci.md) and are important e.g. for [computability](computability.md)/decidability, computational complexity, [security](security.md) and [compilers](compiler.md), but they also find use in linguistics and other fields of [science](science.md).
The field of formal languages tries to [mathematically](math.md) and rigorously view problems as languages; this probably includes most structures we can think of, from human languages and computer languages to visual patterns and other highly abstract structures. Formal languages are at the root of theoretical [computer science](compsci.md) and are important e.g. for the theory of [computability](computability.md)/decidability, computational complexity, [security](security.md) and [compilers](compiler.md), but they also find use in linguistics and other fields of [science](science.md).
A **formal language** is defined as a (potentially infinite) set of strings (which are finite but unlimited in length) over some alphabet (which is finite). I.e. a language is a subset of E* where E is a finite alphabet (a set of *letters*). (* is a *Kleene Star* and signifies a set of all possible strings over E). The string belonging to a language may be referred to as a *word* or perhaps even *sentence*, but this word/sentence is actually a whole kind of *text* written in the language, if we think of it in terms of our natural languages.
A **formal language** is defined as a (potentially infinite) set of strings (which are finite but unlimited in length) over some alphabet (which is finite). I.e. a language is a subset of E* where E is a finite alphabet (a set of *letters*). (* is a *Kleene Star* and signifies a set of all possible strings over E). The string belonging to a language may be referred to as a *word* or perhaps even *sentence*, but this word/sentence is actually a whole kind of *text* written in the language, if we think of it in terms of our natural languages. The [C](c.md) programming language can be seen as a formal language: the set of all strings that are valid C programs, i.e. that compile without errors.
**For example**, given an alphabet [a,b,c], a possible formal language over it is [a,ab,bc,c]. Another, different possible language over this alphabet is an infinite language [b,ab,aab,aaab,aaaab,...] which we can also write with a [regular expression](regex.md) as a*b. We can also see e.g. English as being a formal language equivalent to a set of all texts over the English alphabet (along with symbols like space, dot, comma etc.) that we would consider to be in English as we speak it.
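The two example languages above can be illustrated with a small membership test. This is only a minimal sketch: the names `FINITE_LANGUAGE`, `in_finite_language` and `in_a_star_b` are made up for illustration, and Python's standard `re` module stands in for a regular expression engine.

```python
import re

# The finite example language over the alphabet {a, b, c}.
FINITE_LANGUAGE = {"a", "ab", "bc", "c"}

def in_finite_language(word):
    # A finite language can simply be stored as a set of its strings.
    return word in FINITE_LANGUAGE

def in_a_star_b(word):
    # The infinite language {b, ab, aab, ...} cannot be listed, but the
    # regular expression a*b describes it; fullmatch requires the whole
    # string to match, not just a prefix.
    return re.fullmatch(r"a*b", word) is not None
```

The point of the sketch is that a language is nothing more than a yes/no question about strings: the finite one is answered by lookup, the infinite one by a finite description (here a regular expression).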
@@ -13,7 +13,7 @@ A **formal language** is defined as a (potentially infinite) set of strings (whi
We usually classify formal languages according to the **[Chomsky](chomsky.md) hierarchy**, by their computational "difficulty". Each level of the hierarchy has associated models of computation ([grammars](grammar.md), [automatons](automaton.md), ...) that are able to compute **all** languages of that level (remember that a level of the hierarchy is a superset of the levels below it and so also includes all the "simpler" languages). The hierarchy is more or less as follows:
- **all languages**: This includes all possible languages, even those that computers cannot analyze (e.g. the language representing the [halting problem](halting_problem.md)). These languages can only be computed by theoretical computers that cannot physically exist in our universe.
- **type 0**, **recursively enumerable languages**: Most "difficult"/general languages that computers in our universe can analyze. These languages can be computed e.g. by a **[Turing machine](turing_machine.md)**, [lambda calculus](lambda_calculus.md) or a general unrestricted [grammar](grammar.md). Example language: all strings encoding a [Game of Life](game_of_life.md) run which ends in finite time. { At least I think :) ~drummyfish }
- **type 0**, **recursively enumerable languages**: Most "difficult"/general languages that computers in our universe can analyze. These languages can be computed e.g. by a **[Turing machine](turing_machine.md)**, [lambda calculus](lambda_calculus.md) or a general unrestricted [grammar](grammar.md). Example language: a^n where *n* is not a [prime](prime.md).
- **type 1**, **context sensitive languages**: Computed e.g. by a linear bounded non-deterministic Turing machine or a context sensitive grammar. Example language: a^(n)b^(n)c^(n), n >= 0 (strings of *n* *a*s, followed by *n* *b*s, followed by *n* *c*s).
- **type 2**, **context free languages**: Computed by e.g. non-deterministic pushdown automata or context free grammars. (Deterministic pushdown automata compute a class of languages that is between type 2 and type 3).
- **type 3**, **regular languages**: The *easiest*, *weakest* kind of languages, computed e.g. by [finite state automata](finite_state_automaton.md) or [regular expressions](regexp.md). This class also includes all finite languages.
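The example languages named in the hierarchy above can be made concrete with simple membership tests. The following is a rough sketch with hypothetical function names. Note that a general-purpose Python function is as powerful as a Turing machine, so the code only shows *what* each language contains, not the weakest model of computation able to recognize it. Counting the empty string (n = 0) as a member of the non-prime language is an assumption of this sketch.

```python
def is_anbncn(word):
    # a^n b^n c^n, n >= 0: rebuild the only candidate string of this
    # length and compare (the empty string is accepted for n = 0).
    n = len(word) // 3
    return word == "a" * n + "b" * n + "c" * n

def is_a_nonprime(word):
    # a^n where n is not a prime: only the letter 'a' is allowed and
    # the length of the string must not be a prime number.
    if set(word) - {"a"}:
        return False
    n = len(word)
    if n < 2:
        return True  # 0 and 1 are not primes (assumption: n = 0 counts)
    # n is composite if some d with 2 <= d <= sqrt(n) divides it
    return any(n % d == 0 for d in range(2, int(n ** 0.5) + 1))
```

For instance `is_anbncn("aabbcc")` holds while `is_anbncn("abcabc")` does not, which hints at why a finite state automaton is not enough here: the recognizer has to compare three unbounded counts.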