Leverage Regular Expressions (Continued)
Make Your Match
The code in Listing 1 also shows an interesting technique that's especially appropriate with large source strings. In such cases, the Matches method can keep you waiting until all matches have been found (reading the Count property of the returned MatchCollection, for example, forces the engine to scan the entire string), an operation that might take several seconds. A better approach is to use the Regex.Match method to find the first match, then call the Match.NextMatch method repeatedly to find each subsequent occurrence, until the Match.Success property returns False to indicate there are no more matches. This approach is also effective when you don't need to iterate over all the occurrences, for example, when you want to retrieve a given employee's salary and can ignore all the employees listed after him or her.
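Here's a minimal sketch of the Match/NextMatch loop applied to the salary lookup. The "Name=Salary;" record layout and the FindSalary function are hypothetical, invented for this example; the point is that the loop returns as soon as it finds the employee, so the records after that one are never scanned:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstMatchDemo
    ' Hypothetical record layout for this sketch: "Name=Salary;" repeated.
    ' Group 1 captures the name, group 2 the salary.
    Private ReadOnly recordRe As New Regex("(\w+)=(\d+);")

    ' Returns the salary of the named employee, or Nothing if not found.
    ' The loop stops at the first hit, so later records are never scanned.
    Function FindSalary(records As String, employee As String) As String
        Dim m As Match = recordRe.Match(records)
        Do While m.Success
            If m.Groups(1).Value = employee Then
                Return m.Groups(2).Value
            End If
            m = m.NextMatch()   ' resume searching right after this match
        Loop
        Return Nothing          ' Success was False: no more matches
    End Function

    Sub Main()
        Dim records As String = "Ann=52000;Bob=48000;Carl=61000;"
        Console.WriteLine(FindSalary(records, "Bob"))   ' 48000
    End Sub
End Module
```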
Numbered or named groups are useful for referring to a previous match in the pattern itself and provide a means to tell the regular-expression engine to "match that substring again." Consider the problem of parsing a set of strings that can be enclosed in either single or double quotes. The problem: You want to search for a closing quote that's identical to the opening one. You can find such strings with this pattern:
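("|')(?:(?!\1).)+\1
The ("|') group matches the opening single or double quote; the sequence (?:(?!\1).)+ matches one or more characters, each of which must differ from the opening quote (a backreference isn't recognized inside a character class, so the tempting [^\1] wouldn't exclude the quote); finally, the \1 reference matches the closing quote, be it single or double.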
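The following sketch puts the quoted-string pattern to work, once with the numbered group and \1, and once with a named group and the equivalent \k<name> backreference. The function names are hypothetical. Note that in a VB string literal "" stands for a single double-quote character, and that the dot doesn't match newlines unless you pass RegexOptions.Singleline:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuoteDemo
    ' Group 1 captures the opening quote; (?!\1) rejects it inside the body,
    ' and the final \1 requires the same quote to close the string.
    Private ReadOnly quotedRe As New Regex("(""|')((?:(?!\1).)+)\1")

    ' The same pattern with named groups and the \k<name> backreference.
    Private ReadOnly quotedNamedRe As New Regex( _
        "(?<q>""|')(?<body>(?:(?!\k<q>).)+)\k<q>")

    ' Returns the text between the quotes for every quoted substring.
    Function QuotedStrings(text As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In quotedRe.Matches(text)
            result.Add(m.Groups(2).Value)
        Next
        Return result
    End Function

    Function QuotedStringsNamed(text As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In quotedNamedRe.Matches(text)
            result.Add(m.Groups("body").Value)
        Next
        Return result
    End Function

    Sub Main()
        Dim text As String = "He said ""it's fine"" and 'a ""quoted"" word'."
        For Each s As String In QuotedStrings(text)
            Console.WriteLine(s)
        Next
        ' it's fine
        ' a "quoted" word
    End Sub
End Module
```

Because each closing quote must match its opening one, the apostrophe in it's doesn't end the double-quoted string, and the double quotes inside the single-quoted string are kept as ordinary text.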
Regular expressions can replace text too. To do this, you use the Regex.Replace method, which replaces each match of the pattern in the string passed as the first argument with the string passed in the second argument:
' Replace all Windows version names
' with "Windows XP"
Dim re As New Regex("Windows (95|98|NT|2000)")
' "text" is the source string
Console.WriteLine(re.Replace(text, "Windows XP"))
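Regex.Replace also comes as a Shared method, where the argument order differs: input, pattern, replacement. This sketch (function names are hypothetical) compares the two forms and shows the instance overload that takes a maximum number of replacements:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    Private ReadOnly versionRe As New Regex("Windows (95|98|NT|2000)")

    ' Instance method: (input, replacement); the pattern lives in the Regex object.
    Function Modernize(text As String) As String
        Return versionRe.Replace(text, "Windows XP")
    End Function

    ' Shared method: (input, pattern, replacement); no Regex object needed.
    Function ModernizeShared(text As String) As String
        Return Regex.Replace(text, "Windows (95|98|NT|2000)", "Windows XP")
    End Function

    ' Instance overload with a count: replaces at most the first N matches.
    Function ModernizeFirst(text As String) As String
        Return versionRe.Replace(text, "Windows XP", 1)
    End Function

    Sub Main()
        Dim text As String = "Tested on Windows 98 and Windows 2000."
        Console.WriteLine(Modernize(text))       ' Tested on Windows XP and Windows XP.
        Console.WriteLine(ModernizeShared(text)) ' Tested on Windows XP and Windows XP.
        Console.WriteLine(ModernizeFirst(text))  ' Tested on Windows XP and Windows 2000.
    End Sub
End Module
```

The instance form is the better choice when you apply the same pattern many times, because the pattern is parsed once when you create the Regex object.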
Replacement patterns can include a reference to a numbered or named group defined in the search pattern. This is useful for arranging portions of the searched string in a different order. For example, this code searches for all names in the format "title firstname lastname" and converts them to the "title lastname, firstname" format:
Dim txt As String = "Mr. Joe Doe and Mrs. Anne Smith"
Dim re As New Regex( _
"(Mr\.|Mrs\.)\s+(\w+)\s+(\w+)")
Console.WriteLine( _
re.Replace(txt, "$1 $3, $2"))
'=> Mr. Doe, Joe and Mrs. Smith, Anne
You can achieve the same result by using named groups:
Dim re As New Regex( _
"(?<title>Mr\.|Mrs\.)\s+" & _
"(?<first>\w+)\s+(?<last>\w+)")
Console.WriteLine(re.Replace(txt, _
"${title} ${last}, ${first}"))
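If you want to check that the two versions really are interchangeable, here's a self-contained sketch that runs both side by side; SwapNumbered and SwapNamed are hypothetical names. Named groups make the replacement string self-documenting, which pays off as the number of groups grows:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SwapNamesDemo
    ' $1 = title, $2 = first name, $3 = last name
    Function SwapNumbered(txt As String) As String
        Dim re As New Regex("(Mr\.|Mrs\.)\s+(\w+)\s+(\w+)")
        Return re.Replace(txt, "$1 $3, $2")
    End Function

    ' ${name} refers to the group defined with (?<name>...)
    Function SwapNamed(txt As String) As String
        Dim re As New Regex( _
            "(?<title>Mr\.|Mrs\.)\s+" & _
            "(?<first>\w+)\s+(?<last>\w+)")
        Return re.Replace(txt, "${title} ${last}, ${first}")
    End Function

    Sub Main()
        Dim txt As String = "Mr. Joe Doe and Mrs. Anne Smith"
        Console.WriteLine(SwapNumbered(txt))  ' Mr. Doe, Joe and Mrs. Smith, Anne
        Console.WriteLine(SwapNamed(txt))     ' Mr. Doe, Joe and Mrs. Smith, Anne
    End Sub
End Module
```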