Regular Expression

A regular expression (shortened regex or regexp) is a kind of mathematical expression, very often used in programming, that defines simple patterns in strings of characters (usually text). Regular expressions are typically used for searching for patterns (i.e. not just exact matches but sequences of characters that follow some rules, e.g. numeric values), substitution (replacement) of such patterns, describing the syntax of computer languages, parsing them etc. (though they may also be used in wilder ways, e.g. for generating strings). A regular expression is itself a string of symbols, but one that describes potentially many (even infinitely many) other strings thanks to special symbols that stand for repetition, alternatives etc. For example a.*.b is a regular expression describing a string that starts with the letter a, continues with a sequence of at least one arbitrary character and ends with b (so e.g. aab, abbbb, acaccb etc.); here . stands for any single character and * means zero or more repetitions of the preceding item.
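To make this concrete, here is a minimal sketch using Python's standard `re` module (the choice of Python and the helper names `matches` and `replace_numbers` are just assumptions for illustration). It checks whole strings against a.*.b and shows a simple substitution of numeric values. Note that in Python `.` by default does not match a newline.

```python
import re

# The pattern from the text: 'a', then any characters (.*),
# then one more character (.), then 'b'.
PATTERN = re.compile(r"a.*.b")

def matches(s: str) -> bool:
    """Return True if the WHOLE string s is described by the pattern."""
    return PATTERN.fullmatch(s) is not None

def replace_numbers(text: str) -> str:
    """Substitute every run of decimal digits with the letter N."""
    return re.sub(r"[0-9]+", "N", text)

if __name__ == "__main__":
    for s in ["aab", "abbbb", "acaccb", "ab", "abc"]:
        print(s, matches(s))
    print(replace_numbers("room 12, floor 3"))
```

The first three strings are accepted; ab is rejected because nothing stands between a and b, and abc is rejected because it does not end with b.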

Regular expressions are computationally weak: they are equivalent to the weakest models of computation such as finite state machines -- in fact regular expressions are often implemented as finite state machines. This means that regular expressions can NOT describe every possible pattern, only relatively simple ones (for example they cannot check that parentheses are correctly nested to arbitrary depth); however it turns out that very many commonly encountered patterns are simple enough to be described this way, so we have a good enough tool. The advantage of regular expressions is exactly that they are simple, yet very often sufficient.
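The equivalence with finite state machines can be sketched by hand for the example a.*.b. The following is only an illustrative machine written for this article, not how any particular regex library works internally; the state names and the functions `step` and `accepts` are made up for the example. It treats "any character" literally, i.e. including newlines.

```python
# States of a hand-written finite state machine for the pattern a.*.b
START, AFTER_A, MIDDLE, ACCEPT, DEAD = range(5)

def step(state: int, ch: str) -> int:
    """Return the next state after reading one character."""
    if state == START:
        # The string has to begin with 'a'.
        return AFTER_A if ch == "a" else DEAD
    if state == AFTER_A:
        # Any character is fine as the first one after 'a'.
        return MIDDLE
    if state in (MIDDLE, ACCEPT):
        # A 'b' may be the final character; anything else means the
        # string has to continue until another 'b' arrives.
        return ACCEPT if ch == "b" else MIDDLE
    return DEAD  # once dead, always dead

def accepts(s: str) -> bool:
    """Run the machine over s; accept if it stops in the ACCEPT state."""
    state = START
    for ch in s:
        state = step(state, ch)
    return state == ACCEPT

if __name__ == "__main__":
    for s in ["aab", "abbbb", "acaccb", "ab", "abc"]:
        print(s, accepts(s))
```

The machine only ever remembers which of its five states it is in, never how many characters it has read; this fixed, finite memory is exactly what makes the model weak but also simple and fast.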

TODO