regular expression


[′reg·yə·lər ik′spresh·ən] (computer science) A formal description of a language accepted by a finite automaton, or of the behavior of a sequential switching circuit.

regular expression

(text, operating system) (regexp, RE) One of the wild card patterns used by Perl and other languages, following Unix utilities such as grep, sed, and awk and editors such as vi and Emacs. Regular expressions use conventions similar to but more elaborate than those described under glob. A regular expression is a sequence of characters with the following meanings:

An ordinary character (not one of the special characters discussed below) matches that character.

A backslash (\) followed by any special character matches the special character itself. The special characters are:

"." matches any character except NEWLINE; "RE*" (where the "*" is called the "Kleene star") matches zero or more occurrences of RE. If there is any choice, the longest leftmost matching string is chosen, in most regexp flavours.

"^" at the beginning of an RE matches the start of a line and "$" at the end of an RE matches the end of a line.

[string] matches any one character in that string. If the first character of the string is a "^" it matches any character except the remaining characters in the string (and also usually excluding NEWLINE). "-" may be used to indicate a range of consecutive ASCII characters.

\( RE \) matches whatever RE matches and \n, where n is a digit, matches whatever was matched by the RE between the nth \( and its corresponding \) earlier in the same RE. Many flavours use ( RE ) instead of \( RE \).

The concatenation of REs is a RE that matches the concatenation of the strings matched by each RE. RE1 | RE2 matches whatever RE1 or RE2 matches.

\< matches the beginning of a word and \> matches the end of a word. In many flavours of regexp, \> and \< are replaced by "\b", the special character for "word boundary".

RE\{m\} matches m occurrences of RE. RE\{m,\} matches m or more occurrences of RE. RE\{m,n\} matches between m and n occurrences.
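The constructs listed above can be tried out in any regexp flavour. The following is a minimal sketch that assumes Python's standard re module, which is only one flavour among many: it writes groups as ( RE ), counted repetition as {m}, {m,} and {m,n} without backslashes, and uses \b for word boundaries. The sample strings are made up for illustration.

```python
import re

# "." matches any character except a newline; "*" means zero or more.
assert re.fullmatch(r"a.c", "abc")
assert re.fullmatch(r"a.c", "a\nc") is None
assert re.fullmatch(r"ab*", "a")            # zero occurrences of "b"
assert re.fullmatch(r"ab*", "abbb")

# A backslash makes a special character literal.
assert re.fullmatch(r"1\.5", "1.5")
assert re.fullmatch(r"1\.5", "1x5") is None

# "^" and "$" anchor to line starts and ends (per line with re.MULTILINE).
lines = "one\ntwo words\nthree"
assert re.findall(r"^\w+$", lines, re.MULTILINE) == ["one", "three"]

# Bracket expressions: a set, a negated set, and a range.
assert re.findall(r"[aeiou]", "regexp") == ["e", "e"]
assert re.findall(r"[^aeiou]", "regexp") == ["r", "g", "x", "p"]
assert re.findall(r"[0-9]", "a1b22") == ["1", "2", "2"]

# Grouping and backreference: \1 must repeat what group 1 matched.
assert re.search(r"(ab)\1", "xxababxx").group(0) == "abab"
assert re.search(r"\b(\w+) \1\b", "it is is here").group(1) == "is"

# Alternation and concatenation.
assert re.fullmatch(r"cat|dog", "dog")
assert re.fullmatch(r"(cat|dog)s", "cats")

# Word boundary: \b covers both the start and the end of a word.
assert re.findall(r"\bcat\b", "cat concat cats cat.") == ["cat", "cat"]

# Counted repetition.
assert re.fullmatch(r"a{3}", "aaa")
assert re.fullmatch(r"a{2,}", "aaaaa")
assert re.fullmatch(r"a{2,4}", "aaaaa") is None

# Python returns the leftmost match and tries alternatives in order,
# so the result is not always the longest possible string.
assert re.match(r"a|ab", "ab").group(0) == "a"
assert re.match(r"a*", "aaab").group(0) == "aaa"   # "*" itself is greedy
```

The last two assertions show one place where flavours differ: a flavour that always picks the longest leftmost match would return "ab" for the alternation, whereas Python stops at the first alternative that succeeds.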

The exact details of how regexp will work in a given application vary greatly from flavour to flavour. A comprehensive survey of regexp flavours is found in Friedl 1997 (see below).

[Jeffrey E.F. Friedl, "Mastering Regular Expressions", O'Reilly, 1997].

regular expression

(2) Any description of a pattern composed from combinations of symbols and the three operators:

Concatenation - pattern A concatenated with B matches a match for A followed by a match for B.

Or - pattern A-or-B matches either a match for A or a match for B.

Closure - zero or more matches for a pattern.
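The three operators are enough to build a small matcher. The sketch below is illustrative only and is not taken from any of the cited works: patterns are written as nested tuples, and the node names "sym", "cat", "or" and "star", as well as the functions ends and matches, are hypothetical names chosen for this example.

```python
def ends(node, text, starts):
    """Return every position where a match of node can end, given the
    set of positions in text where the match is allowed to start."""
    kind = node[0]
    if kind == "sym":      # a single symbol
        return {i + 1 for i in starts if i < len(text) and text[i] == node[1]}
    if kind == "cat":      # concatenation: A, then B from wherever A ended
        return ends(node[2], text, ends(node[1], text, starts))
    if kind == "or":       # or: anything A matches or anything B matches
        return ends(node[1], text, starts) | ends(node[2], text, starts)
    if kind == "star":     # closure: zero or more matches of A
        reached = set(starts)          # zero repetitions
        frontier = set(starts)
        while frontier:                # add one more repetition at a time
            frontier = ends(node[1], text, frontier) - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown node kind: {kind!r}")


def matches(node, text):
    """True if the pattern matches the whole of text."""
    return len(text) in ends(node, text, {0})


# (a|b)*c written with the three operators only.
pattern = ("cat",
           ("star", ("or", ("sym", "a"), ("sym", "b"))),
           ("sym", "c"))

assert matches(pattern, "abbac")
assert matches(pattern, "c")
assert not matches(pattern, "abca")
assert not matches(pattern, "")
```

Tracking a set of positions rather than backtracking means each operator is a simple set operation, and the closure loop stops as soon as a further repetition reaches no new position.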

The earliest form of regular expressions (and the term itself) was invented by mathematician Stephen Cole Kleene in the mid-1950s, as a notation to easily manipulate "regular sets", formal descriptions of the behaviour of finite state machines, in regular algebra.

[S.C. Kleene, "Representation of events in nerve nets and finite automata", 1956, Automata Studies. Princeton].

[J.H. Conway, "Regular algebra and finite machines", 1971, Eds Chapman & Hall].

[Sedgewick, "Algorithms in C", page 294].

regular expression

In programming, a set of symbols used to search for occurrences of text or to search and replace text. The simplest patterns of this kind are DOS/Windows wildcards; for example, *.html refers to all file names with the .html extension. Regular expression functions, which are available in many programming languages, go further and allow for complex pattern matching and text manipulation. For example, replacing specific text within a sentence when the sentence begins with a certain word can be performed with a regular expression. See expression.
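As a rough illustration of both points, the sketch below assumes Python's standard re and fnmatch modules. The function name replace_if_starts_with and the sample sentences are hypothetical. The first part replaces a word only when the sentence begins with a given word; the second part turns the *.html wildcard into an equivalent regular expression.

```python
import fnmatch
import re


def replace_if_starts_with(sentence, first_word, old, new):
    """Replace whole-word occurrences of old with new, but only when
    sentence begins with first_word; otherwise return it unchanged."""
    if re.match(rf"{re.escape(first_word)}\b", sentence):
        # A function replacement keeps backslashes in new literal.
        return re.sub(rf"\b{re.escape(old)}\b", lambda m: new, sentence)
    return sentence


changed = replace_if_starts_with(
    "Note: the colour is red.", "Note", "colour", "color")
unchanged = replace_if_starts_with(
    "The colour is red.", "Note", "colour", "color")

assert changed == "Note: the color is red."
assert unchanged == "The colour is red."

# fnmatch.translate converts a shell-style wildcard into a regular
# expression string that the re module can compile.
wildcard_re = re.compile(fnmatch.translate("*.html"))
assert wildcard_re.match("index.html")
assert wildcard_re.match("index.htm") is None
```

re.escape is used so that any special characters in the words being searched for are treated as ordinary characters.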