
Leverage Regular Expressions (Continued)

This pattern defines four groups: the keyword used to declare the variable, the variable's name, the optional New keyword, and the variable's type. You can use these groups to extract additional information from each Match object:

Dim m As Match
For Each m In re.Matches(source)
   Console.WriteLine( _
      "variable {0} of type {1}", _
      m.Groups(2).Value, _
      m.Groups(4).Value)
Next

In this example, the third group would contain the "New " string if this keyword appears in the match; otherwise, it would contain an empty string. You can make your code more readable by assigning a name to each group, because you can then reference the group by its name instead of its position. Named groups also protect you from renumbering problems: adding or nesting a pair of parentheses shifts the positions of the numbered groups that follow it, whereas a group's name stays the same.
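
As a minimal sketch of the named-group approach, the following code rewrites the loop above so that it reads groups by name. The pattern in DeclPattern is a hypothetical reconstruction written for this example and is not necessarily the pattern discussed earlier; the module name, the function name, and the sample source text are assumptions as well.

```vb
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupsDemo
   ' Hypothetical pattern (an assumption for illustration only).
   ' The four groups are named keyword, name, new, and type.
   Public Const DeclPattern As String = _
      "(?<keyword>Dim|Private|Public)\s+(?<name>\w+)\s+As\s+" & _
      "(?<new>New\s+)?(?<type>\w+)"

   ' Returns one description for each variable declaration in source.
   Public Function DescribeVariables(ByVal source As String) As String()
      Dim re As New Regex(DeclPattern, RegexOptions.IgnoreCase)
      Dim matches As MatchCollection = re.Matches(source)
      Dim result(matches.Count - 1) As String
      Dim i As Integer
      For i = 0 To matches.Count - 1
         Dim m As Match = matches(i)
         ' Groups are read by name, so their position no longer matters.
         result(i) = String.Format("variable {0} of type {1}", _
            m.Groups("name").Value, m.Groups("type").Value)
      Next
      Return result
   End Function

   Sub Main()
      Dim source As String = "Dim count As Integer" & _
         Environment.NewLine & "Private list As New ArrayList"
      Dim line As String
      For Each line In DescribeVariables(source)
         Console.WriteLine(line)
      Next
   End Sub
End Module
```

With the sample text above, the sketch prints "variable count of type Integer" and "variable list of type ArrayList". To find out whether the optional New keyword was present in a match, you can test m.Groups("new").Success instead of comparing the group's value with an empty string.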

Named or numbered groups are especially useful when you're reading comma- or tab-delimited text files such as those you can export from Microsoft Excel and many database apps. For example, consider a text file containing employee names and salaries:

"John", "Doe", 55000
"Mary Ann", "Smith" , 58125.50

This format poses a few challenges that might not be apparent immediately. For example, each element can be preceded or followed by optional spaces or tab characters; also, both the first and the last name can contain multiple words, so you can't search for them with the \w+ sequence (which doesn't match the space). Similarly, the salary information might or might not contain a decimal portion, so you can't match it against the \d+ sequence. Here's an admittedly nontrivial regular expression based on named groups that fits the bill:

^\s*"(?<first>[^"]+)"\s*,
\s*"(?<last>[^"]+)"\s*,
\s*(?<salary>\d+(\.\d\d)?)\s*$

You can match the first and the last name correctly by looking for any character sequence that doesn't contain a double quote, and you can match the salary portion by adding a nested pair of parentheses that makes the two-digit decimal portion optional. The ^ and $ symbols anchor the pattern to the beginning and the end of a line, so each match must span an entire line; if you apply the pattern to the whole file rather than to one line at a time, you need the RegexOptions.Multiline option so that the anchors work on every line. The several \s* constructs ignore any additional space and tab characters surrounding the commas (see Listing 1).
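
The following sketch applies the pattern to the two sample lines shown earlier. It is an illustration under stated assumptions rather than a copy of Listing 1: the module name, the ParseEmployees function, and the output format are hypothetical. The three pattern lines are joined into a single string, and each double quote is doubled as VB string literals require.

```vb
Imports System
Imports System.Text.RegularExpressions

Module EmployeeParserDemo
   ' The pattern from the text, joined into one string.
   Public Const EmployeePattern As String = _
      "^\s*""(?<first>[^""]+)""\s*," & _
      "\s*""(?<last>[^""]+)""\s*," & _
      "\s*(?<salary>\d+(\.\d\d)?)\s*$"

   ' Returns "Last, First: salary" for each line that fits the pattern.
   ' Lines that don't fit the pattern are skipped.
   Public Function ParseEmployees(ByVal fileText As String) As String()
      ' Multiline lets ^ and $ match at the start and end of every line.
      Dim re As New Regex(EmployeePattern, RegexOptions.Multiline)
      Dim matches As MatchCollection = re.Matches(fileText)
      Dim result(matches.Count - 1) As String
      Dim i As Integer
      For i = 0 To matches.Count - 1
         Dim m As Match = matches(i)
         result(i) = String.Format("{0}, {1}: {2}", _
            m.Groups("last").Value, _
            m.Groups("first").Value, _
            m.Groups("salary").Value)
      Next
      Return result
   End Function

   Sub Main()
      ' The two sample lines from the text.
      Dim data As String = """John"", ""Doe"", 55000" & _
         Environment.NewLine & """Mary Ann"", ""Smith"" , 58125.50"
      Dim line As String
      For Each line In ParseEmployees(data)
         Console.WriteLine(line)
      Next
   End Sub
End Module
```

For the sample data, this prints "Doe, John: 55000" and "Smith, Mary Ann: 58125.50". The salary group is returned as text; to work with it as a number, you could pass it to Decimal.Parse together with CultureInfo.InvariantCulture, so that the period is read as the decimal separator whatever the machine's regional settings are.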
