Regular expressions

Editor regular expressions

Wildcard regular expressions are useful for selecting files, but they cannot search the text within files. For that, you need editor regular expressions. Their metacharacters are as follows:


.
Matches any single character.

This is equivalent to the wildcard ``?''. For example, .iddle will match ``diddle'', ``middle'', or any other string consisting of a single character followed by ``iddle''.
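
As a rough illustration, the behavior of the period can be tried out with Python's standard re module. Python is not one of the editor programs discussed here and is used only as a stand-in; the word list is hypothetical apart from ``diddle'' and ``middle''.

```python
import re

# Hypothetical word list; only "diddle" and "middle" come from the text above.
words = ["diddle", "middle", "iddle", "fiddled"]

# re.search looks for the pattern anywhere in the string, much as an
# editor searches anywhere within a line.
dot_matches = [w for w in words if re.search(r".iddle", w)]
print(dot_matches)  # ['diddle', 'middle', 'fiddled']
```

Here ``iddle'' on its own is rejected, because the period must consume exactly one character before the ``i''.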


*
Matches zero or more repeating instances of the regular expression immediately preceding it. (See also ``?'' below.)

For example, .*iddle* (in which the final asterisk applies only to the ``e'') matches:

iddle
middle
twiddle

As a single character is taken to be a literal regular expression matching only itself, this means that a character followed by an asterisk matches zero or more instances of itself. Consequently, ``.*'' matches zero or more repeating instances of any character, and ``a*'' matches zero or more ``a''s in a row.

Note that this behavior is not the same as that of the asterisk wildcard character. The shell interprets the asterisk wildcard to mean ``zero or more characters''; in an editor regular expression, the asterisk matches zero or more instances of the preceding regular expression.
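
The difference between the two meanings of the asterisk can be sketched in Python, assuming the standard re module as a stand-in for an editor regular expression and the standard fnmatch module as a stand-in for shell wildcard matching. The strings ``twiddl'' and ``apple'' are hypothetical test inputs.

```python
import fnmatch
import re

words = ["iddle", "middle", "twiddle"]

# ".*" absorbs any leading characters; the final "*" applies only to "e".
star_matches = [w for w in words if re.fullmatch(r".*iddle*", w)]

# Because "e*" may match zero "e"s, "twiddl" is accepted as well.
zero_e = re.fullmatch(r".*iddle*", "twiddl") is not None

# Editor regex: "a*" means zero or more "a"s, so "apple" is not a full match.
regex_apple = re.fullmatch(r"a*", "apple") is not None

# Shell wildcard: "a*" means "a" followed by any characters.
glob_apple = fnmatch.fnmatchcase("apple", "a*")

print(star_matches, zero_e, regex_apple, glob_apple)
```

re.fullmatch is used so that the whole string must be accounted for by the pattern, which makes the contrast with the wildcard easier to see.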


?
Matches zero or one occurrence of the regular expression immediately preceding it.

Note that, like the asterisk, this editor regular expression metacharacter does not have the same effect as its wildcard counterpart, which matches a single character, not an instance of a preceding regular expression.
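
A short Python sketch may make the contrast concrete. It assumes the standard re module as a stand-in for an editor regular expression and the standard fnmatch module as a stand-in for the shell wildcard; all of the test words are hypothetical.

```python
import fnmatch
import re

# Editor regex: "r?" makes the preceding "r" optional.
candidates = ["fed", "fred", "frred"]
optional_r = [w for w in candidates if re.fullmatch(r"fr?ed", w)]

# Shell wildcard: "?" stands for exactly one character of its own.
names = ["fd", "frd", "fred"]
glob_names = [w for w in names if fnmatch.fnmatchcase(w, "fr?d")]

print(optional_r)  # ['fed', 'fred']
print(glob_names)  # ['fred']
```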


+
Matches one or more (but not zero) occurrences of the regular expression immediately preceding it. (This feature is not available in all of the editor programs: see ``Regular expression summary''.)

There is a subtle difference between the interpretation of regular expressions containing a ``*'' and a ``+''. For example, suppose we have the word list:

fred
frog
figment
fuddled
ford

The expression ``fr+'' will match only ``fred'' and ``frog'', because it is constrained to match an ``f'' followed by at least one ``r''. However, ``fr*'' will match all of these words, because it matches an ``f'' followed by zero or more instances of the letter ``r''.
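
The word list above can be checked with a few lines of Python. This is only a sketch: Python's re module stands in for an editor that supports ``+'', and re.search is assumed to approximate searching within a line.

```python
import re

words = ["fred", "frog", "figment", "fuddled", "ford"]

# "fr+" needs an "f" followed by at least one "r".
plus_matches = [w for w in words if re.search(r"fr+", w)]

# "fr*" needs only an "f"; the run of "r"s may be empty.
star_matches = [w for w in words if re.search(r"fr*", w)]

print(plus_matches)  # ['fred', 'frog']
print(star_matches)  # ['fred', 'frog', 'figment', 'fuddled', 'ford']
```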


[ ... ]
Matches any one of the characters enclosed in the brackets. If the first character in the set is a circumflex (^), it matches any one character that is not in the set. A hyphen between two characters in the set indicates a range; for example, [a-d] matches any one of the first four letters of the alphabet. You can include a literal closing bracket (]) in a set only if it is the first character after the opening bracket (or after the circumflex, in a negated set).

If you are not certain of the spelling of a word that you are searching for, this construction comes in handy. For example, rel[ae]v[ae]nt matches any of:

relavant
relavent
relevant
relevent
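
The bracket rules described above can be exercised in Python, assuming its standard re module handles sets, ranges, negation, and a leading ``]'' in the same way for these simple cases. The misspelling ``relivant'' and the single-character inputs are hypothetical.

```python
import re

candidates = ["relavant", "relavent", "relevant", "relevent", "relivant"]
spellings = [w for w in candidates if re.fullmatch(r"rel[ae]v[ae]nt", w)]

# A range, and the same range negated with a leading circumflex.
in_range = [c for c in "abcde" if re.fullmatch(r"[a-d]", c)]
not_in_range = [c for c in "abcde" if re.fullmatch(r"[^a-d]", c)]

# A "]" placed first in the set is taken literally.
bracket_first = [c for c in "]x[" if re.fullmatch(r"[]x]", c)]

print(spellings)      # the four spellings listed above; "relivant" is rejected
print(in_range)       # ['a', 'b', 'c', 'd']
print(not_in_range)   # ['e']
print(bracket_first)  # [']', 'x']
```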


^
Matches the beginning of a line if specified at the beginning of a regular expression; otherwise, it matches itself. The following specification uses ^ as a metacharacter:

^This is a nightmare

In the next specification, the ^ is a literal:

The ^ character is octal ASCII 136
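
The anchoring behavior can be sketched in Python, with one caveat: Python's re module is only a stand-in, and unlike the editor expressions described here it treats ``^'' as an anchor wherever it appears outside brackets, so the literal case below is written with a backslash. The second test line is hypothetical.

```python
import re

lines = ["This is a nightmare", "Is This is a nightmare?"]

# "^" at the start of the pattern ties the match to the start of the line.
at_start = [line for line in lines if re.search(r"^This is a nightmare", line)]

# In Python's dialect a literal circumflex has to be escaped.
text = "The ^ character is octal ASCII 136"
literal = re.search(r"The \^ character", text) is not None

# The character code quoted in the text can be confirmed directly.
octal_ok = ord("^") == 0o136

print(at_start, literal, octal_ok)
```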


$
Matches the end of a line if specified at the end of a regular expression; otherwise, it matches itself.

In the following, the dollar is used to match a string occurring at the end of a line:

It's the end of the line, folks$

In the next example, $ is a literal:

He stole $50000
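
A similar Python sketch covers the dollar sign. Again, Python's re module is only a stand-in: it treats an unescaped ``$'' as an anchor even in the middle of a pattern, so the literal case is written with a backslash. The second test line, with a trailing exclamation mark, is hypothetical.

```python
import re

lines = [
    "It's the end of the line, folks",
    "It's the end of the line, folks!",
]

# "$" at the end of the pattern ties the match to the end of the line.
pattern = re.compile(r"It's the end of the line, folks$")
at_end = [line for line in lines if pattern.search(line)]

# In Python's dialect a literal dollar sign has to be escaped.
literal = re.search(r"He stole \$50000", "He stole $50000") is not None

print(at_end, literal)
```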


\{n,m\}
Matches a range of occurrences of the regular expression immediately preceding it. n and m are non-negative decimal integers less than 256. For example, \{5\} matches exactly five occurrences of the preceding expression, \{5,\} matches five or more occurrences of the preceding expression, and \{5,10\} matches between five and ten occurrences.
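
The three interval forms can be tried out in Python, with the caveat that Python's re module writes intervals with plain braces rather than \{ and \}. The helper bre_to_python below is hypothetical and handles only that one difference; it is a sketch, not a general converter between dialects.

```python
import re

def bre_to_python(pattern):
    # Hypothetical helper: rewrites backslash-brace intervals as plain braces.
    return pattern.replace(r"\{", "{").replace(r"\}", "}")

exactly_5 = re.compile(bre_to_python(r"a\{5\}"))
five_plus = re.compile(bre_to_python(r"a\{5,\}"))
five_to_ten = re.compile(bre_to_python(r"a\{5,10\}"))

# For each run length, record whether each pattern matches the whole run.
results = {
    n: (
        bool(exactly_5.fullmatch("a" * n)),
        bool(five_plus.fullmatch("a" * n)),
        bool(five_to_ten.fullmatch("a" * n)),
    )
    for n in (4, 5, 10, 11)
}
print(results)
```

A run of four ``a''s fails all three patterns, a run of five passes all three, and a run of eleven passes only the open-ended form.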

© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003