Leverage Regular Expressions (Continued)
Atomic zero-width assertions are special: they specify where the match must appear in the source string, yet they don't match any characters; put differently, they don't consume any characters in the source text. The most common construct of this type is \b, which marks a word boundary without matching the characters immediately before or after the word. For example, this regular expression finds all five-character words in the source string without matching the characters before and after each word:
\b\w\w\w\w\w\b
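A minimal sketch of that pattern in use, assuming a plain console program. The FindFiveCharWords helper and the sample sentence are hypothetical, chosen only for illustration. Keep in mind that \w also matches digits and the underscore, so a "word" here is any run of word characters.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module FiveCharWords
    ' Hypothetical helper: returns every five-character word in the input.
    Function FindFiveCharWords(ByVal input As String) As List(Of String)
        Dim words As New List(Of String)
        For Each m As Match In Regex.Matches(input, "\b\w\w\w\w\w\b")
            ' m.Value holds only the word; the \b assertions consume nothing.
            words.Add(m.Value)
        Next
        Return words
    End Function

    Sub Main()
        Dim sample As String = "The quick brown fox jumps over lazy dogs"
        ' Prints quick, brown, and jumps, one per line.
        For Each word As String In FindFiveCharWords(sample)
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

Because the boundaries are zero-width, each returned value is exactly five characters long, with no surrounding spaces.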
The ^ and $ assertions stand for the beginning and the end of the source string. You often use them with the Regex.IsMatch method to test whether a field's entire contents match the expected format (rather than merely contain a matching substring). For example, suppose TextBox1 is expected to contain a three-digit number; you can validate the control's value with regular expressions like this:
Dim re As New Regex("^\d\d\d$")
If re.IsMatch(TextBox1.Text) Then
' value is ok
End If
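The effect of the anchors is easier to see outside a form. The following sketch assumes a console program and uses a hypothetical IsThreeDigits helper in place of the TextBox1 check. It also illustrates one detail of the .NET engine: $ matches before a single trailing newline as well as at the very end, so \z is the stricter choice when a trailing newline must be rejected.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ThreeDigitCheck
    ' Hypothetical helper: True when the value is three digits
    ' (optionally followed by one trailing newline, because of $).
    Function IsThreeDigits(ByVal value As String) As Boolean
        Return Regex.IsMatch(value, "^\d\d\d$")
    End Function

    Sub Main()
        Console.WriteLine(IsThreeDigits("123"))      ' True
        Console.WriteLine(IsThreeDigits("1234"))     ' False
        Console.WriteLine(IsThreeDigits("abc123"))   ' False

        ' Without anchors, the pattern only has to occur somewhere.
        Console.WriteLine(Regex.IsMatch("abc1234", "\d\d\d"))          ' True

        ' $ tolerates one trailing newline; \z does not.
        Console.WriteLine(IsThreeDigits("123" & vbLf))                  ' True
        Console.WriteLine(Regex.IsMatch("123" & vbLf, "^\d\d\d\z"))    ' False
    End Sub
End Module
```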
By default, the Regex object treats the source text as one continuous stream of characters, as is the case with a text document, so ^ and $ match only at the beginning and the end of the whole string. The two assertions have a slightly different meaning when you use the regular expression in multiline mode: they match the beginning and the end of individual lines. Multiline mode is useful when you parse a document line by line, such as a file containing one record per line, with each field delimited by tab or comma characters. You specify multiline mode by passing the RegexOptions.Multiline enumerated value as the Regex constructor's second argument:
' find all 4-char words at the
' beginning of each line
Dim re As New Regex("^\w\w\w\w\b", _
RegexOptions.Multiline)
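To compare the two modes side by side, here is a small sketch that runs the same pattern with and without RegexOptions.Multiline. The LineStartWords helper and the four-line sample are hypothetical; the lines are joined with vbLf for simplicity.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultilineDemo
    ' Hypothetical helper: four-character words found at a ^ position.
    Function LineStartWords(ByVal input As String, _
            ByVal options As RegexOptions) As List(Of String)
        Dim re As New Regex("^\w\w\w\w\b", options)
        Dim words As New List(Of String)
        For Each m As Match In re.Matches(input)
            words.Add(m.Value)
        Next
        Return words
    End Function

    Sub Main()
        Dim sample As String = "John Smith" & vbLf & "Anna Lee" & vbLf & _
            "Robert Hall" & vbLf & "Mary Poppins"

        ' Default mode: ^ matches only at the start of the string.
        Console.WriteLine(LineStartWords(sample, RegexOptions.None).Count)       ' 1

        ' Multiline mode: ^ matches at the start of every line.
        Console.WriteLine(LineStartWords(sample, RegexOptions.Multiline).Count)  ' 3
    End Sub
End Module
```

The third line contributes nothing in either mode, because "Robert" has more than four characters and \b cannot match in the middle of a word.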
The search pattern can include quantifiers too. A quantifier specifies how many times the construct that precedes it can be repeated in the source string. For example, this regular expression matches all words with three, four, or five characters:
\b\w{3,5}\b
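As a quick illustration of the {min,max} quantifier, the sketch below applies the pattern to a hypothetical sample sentence; the FindShortWords helper name is likewise an assumption made for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RangeQuantifierDemo
    ' Hypothetical helper: words that are three to five characters long.
    Function FindShortWords(ByVal input As String) As List(Of String)
        Dim words As New List(Of String)
        For Each m As Match In Regex.Matches(input, "\b\w{3,5}\b")
            words.Add(m.Value)
        Next
        Return words
    End Function

    Sub Main()
        ' "An" and "on" are too short; "branch" is too long.
        Dim sample As String = "An owl sat on the quiet branch"
        ' Prints owl, sat, the, and quiet, one per line.
        For Each word As String In FindShortWords(sample)
            Console.WriteLine(word)
        Next
    End Sub
End Module
```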
The * construct indicates zero or more occurrences; ? stands for zero or one occurrence, so you can use it for optional elements in the matched string; + stands for one or more occurrences. For example, the \w+ regular expression matches any word, so you can count the words in a source string with only two statements:
Dim text As String = "Here is a sample sentence"
Dim re As New Regex("\w+")
Console.WriteLine( _
re.Matches(text).Count) ' => 5
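The other two quantifiers can be tried the same way. This sketch uses a hypothetical CountMatches helper and made-up sample strings to show ? marking an optional character, * allowing a character to be absent or repeated, and + requiring at least one occurrence.

```vb
Imports System
Imports System.Text.RegularExpressions

Module QuantifierDemo
    ' Hypothetical helper: number of matches of a pattern in the input.
    Function CountMatches(ByVal input As String, _
            ByVal pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        ' ? makes the "u" optional: matches "color" and "colour", not "colr".
        Console.WriteLine(CountMatches("color colour colr", "\bcolou?r\b"))  ' 2

        ' * allows zero or more "b" characters between "a" and "c".
        Console.WriteLine(CountMatches("ac abc abbbc", "\bab*c\b"))          ' 3

        ' + requires at least one digit, so each number is one match.
        Console.WriteLine(CountMatches("10 apples, 200 pears", "\d+"))       ' 2
    End Sub
End Module
```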