Validate With Regular Expressions (Continued)

Match Optional Delimiters
You must modify this expression now so that the user can insert optional spacing delimiters between the groups of four digits. The delimiters can be hyphens (-), spaces, or nothing. To match a single character from a set, you place the set of characters inside square brackets ([]); [ab] matches either a or b. You use \s to match any whitespace character. The expression [-\s] matches the possible delimiters.
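To see the character class on its own, here is a minimal sketch in Java, chosen only because its java.util.regex engine accepts the syntax used in this article. The class and method names are hypothetical. Note that the backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks whether a one-character string is a delimiter.
final class DelimiterClassDemo {
    // [-\s] matches exactly one character: a hyphen or any whitespace character.
    // The backslash is doubled because this is a Java string literal.
    private static final Pattern DELIMITER = Pattern.compile("[-\\s]");

    static boolean isDelimiter(String s) {
        // matches() requires the whole string to match the pattern.
        return DELIMITER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDelimiter("-"));   // hyphen is in the set
        System.out.println(isDelimiter(" "));   // space is whitespace
        System.out.println(isDelimiter("\t"));  // tab is whitespace too
        System.out.println(isDelimiter("/"));   // slash is not in the set
        System.out.println(isDelimiter("--"));  // two characters, so no match
    }
}
```

Because \s covers every whitespace character, a tab counts as a delimiter as well as a space.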

But wait, there's a bit more. You want the user to type either no delimiter or exactly one. You could use {0,1}: [-\s]{0,1}, but matching zero or one copy of a subexpression is so common that there's a simpler way to specify this: the question mark (?). You use [-\s]? to match zero or one delimiter.
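The equivalence of {0,1} and ? can be checked with a short Java sketch. The pattern here is deliberately cut down to one digit on each side of the delimiter, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; one digit per side is a simplification for illustration.
final class OptionalDelimiterDemo {
    // Explicit quantifier: between zero and one delimiter.
    static final Pattern WITH_BRACES = Pattern.compile("\\d[-\\s]{0,1}\\d");
    // Shorthand: ? means the same as {0,1}.
    static final Pattern WITH_QUESTION_MARK = Pattern.compile("\\d[-\\s]?\\d");

    static boolean acceptsWithBraces(String input) {
        return WITH_BRACES.matcher(input).matches();
    }

    static boolean acceptsWithQuestionMark(String input) {
        return WITH_QUESTION_MARK.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Both patterns should give the same verdict for every input.
        for (String input : new String[] {"12", "1-2", "1 2", "1--2"}) {
            System.out.println("\"" + input + "\" braces=" + acceptsWithBraces(input)
                + " questionMark=" + acceptsWithQuestionMark(input));
        }
    }
}
```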

Your version of the regular expression (which splits here because of line-width constraints) now looks like this:

^[\d]{4}[-\s]?[\d]{4}[-\s]?
[\d]{4}[-\s]?[\d]{4}$
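As a rough check of this version, the following Java sketch joins the two halves of the expression back into one pattern and tries a few inputs. The class name and sample numbers are made up for illustration, and the backslashes are doubled only because the pattern sits in a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical validator for the draft expression with optional delimiters.
final class CardFormatDraft {
    // The expression is split across two string fragments purely for line width.
    static final Pattern DRAFT = Pattern.compile(
        "^[\\d]{4}[-\\s]?[\\d]{4}[-\\s]?"
        + "[\\d]{4}[-\\s]?[\\d]{4}$");

    static boolean looksValid(String input) {
        // matches() already requires the whole input to match;
        // ^ and $ are kept to mirror the article's expression.
        return DRAFT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksValid("1234567890123456"));      // no delimiters
        System.out.println(looksValid("1234-5678-9012-3456"));   // hyphens
        System.out.println(looksValid("1234 5678 9012 3456"));   // spaces
        System.out.println(looksValid("1234--5678-9012-3456"));  // doubled delimiter
        System.out.println(looksValid("1234-5678-9012"));        // too few digits
    }
}
```

The first three inputs are accepted; the last two are rejected because each delimiter position allows at most one character and all sixteen digits are required.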

You're almost there, but the preceding regular expression has one small bug: The user could use different delimiter characters between different four-digit groups. For example, 1234 5678-90123456 would be valid input. You need to make sure the user places the same delimiter between each of the groups. You use two features of regular expressions, grouping and backreferences, to do this. A group captures the part of the input string that a subexpression matched, so the regular expression processor can remember it. A backreference matches that same remembered text again later in the expression. First, modify your expression to remember which delimiter the user typed first:

^[\d]{4}([-\s]?)

A group is any expression in parentheses. Simply place a subexpression in parentheses to create a numbered group. The entire match is number 0, and each group is numbered from left to right, starting at 1. However, I prefer to avoid numbered expressions, because they can be difficult to understand later, especially if they involve multiple or nested groups. You can use named groups instead. You name a group by adding a question mark and a name in angle brackets after the opening parenthesis:

^[\d]{4}(?<grpdel>[-\s]?)
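To make the remembered text visible, this Java sketch applies the partial expression to the start of an input and reads the group back, both by name and by number. The class and method names are hypothetical; lookingAt() is used because the partial expression covers only the beginning of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: inspects what the grpdel group captured.
final class DelimiterGroupDemo {
    static final Pattern PREFIX = Pattern.compile("^[\\d]{4}(?<grpdel>[-\\s]?)");

    // Returns the delimiter captured after the first four digits, an empty
    // string if there is none, or null if the input does not start with four digits.
    static String firstDelimiter(String input) {
        Matcher m = PREFIX.matcher(input);
        // lookingAt() matches at the start of the input without requiring
        // the pattern to consume all of it.
        return m.lookingAt() ? m.group("grpdel") : null;
    }

    // Same lookup by number: grpdel is the only group, so it is group 1.
    static String firstDelimiterByNumber(String input) {
        Matcher m = PREFIX.matcher(input);
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("[" + firstDelimiter("1234-5678") + "]");  // hyphen captured
        System.out.println("[" + firstDelimiter("1234 5678") + "]");  // space captured
        System.out.println("[" + firstDelimiter("12345678") + "]");   // empty capture
        System.out.println(firstDelimiter("12-34"));                  // no match at all
    }
}
```

When the user types no delimiter, the group still takes part in the match and captures an empty string.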

The remembered delimiter is now named grpdel. To limit the user to the same delimiter throughout, every later delimiter position must match the remembered text. Use a backreference to match a remembered group:

^[\d]{4}(?<grpdel>[-\s]?)
[\d]{4}\k<grpdel>
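A small Java sketch of this intermediate step: it joins the two lines above into one pattern and checks only whether the first two delimiter positions agree. The class name, method name, and sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the partial expression with one backreference.
final class BackreferenceStepDemo {
    static final Pattern FIRST_TWO = Pattern.compile(
        "^[\\d]{4}(?<grpdel>[-\\s]?)"
        + "[\\d]{4}\\k<grpdel>");

    // True when the input starts with two four-digit groups and the text after
    // the second group repeats whatever followed the first group.
    static boolean firstTwoDelimitersAgree(String input) {
        // lookingAt() is used because the partial expression has no $ yet.
        return FIRST_TWO.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(firstTwoDelimitersAgree("1234-5678-9012"));  // hyphen, hyphen
        System.out.println(firstTwoDelimitersAgree("123456789012"));    // nothing, nothing
        System.out.println(firstTwoDelimitersAgree("1234-5678 9012"));  // hyphen, space
        System.out.println(firstTwoDelimitersAgree("1234-56789012"));   // hyphen, nothing
    }
}
```

The first two inputs pass and the last two fail, because the backreference accepts only the exact text the group captured.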

The \k<grpdel> string matches the text remembered from the group named grpdel. (To reference a numbered group, use \1 and \2 for the numbered groups 1 and 2, respectively.) Here's the final regular expression (split once again to fit this column width):

^[\d]{4}(?<grpdel>[-\s]?)[\d]{4}
\k<grpdel>[\d]{4}\k<grpdel>[\d]{4}$
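Here is one way the final expression might be wrapped in a validator, sketched in Java with hypothetical class and method names. It also includes the numbered-backreference form (\1) for comparison, and the earlier draft without backreferences so you can see the mixed-delimiter input change from accepted to rejected.

```java
import java.util.regex.Pattern;

// Hypothetical validator built around the final expression.
final class CardFormat {
    // Final expression: every delimiter must repeat the first one.
    static final Pattern FINAL = Pattern.compile(
        "^[\\d]{4}(?<grpdel>[-\\s]?)[\\d]{4}"
        + "\\k<grpdel>[\\d]{4}\\k<grpdel>[\\d]{4}$");

    // Same logic with a numbered group and \1 instead of a named group.
    static final Pattern NUMBERED = Pattern.compile(
        "^[\\d]{4}([-\\s]?)[\\d]{4}\\1[\\d]{4}\\1[\\d]{4}$");

    // Earlier draft: each delimiter is chosen independently.
    static final Pattern DRAFT = Pattern.compile(
        "^[\\d]{4}[-\\s]?[\\d]{4}[-\\s]?[\\d]{4}[-\\s]?[\\d]{4}$");

    static boolean isValid(String input) {
        return FINAL.matcher(input).matches();
    }

    static boolean isValidNumbered(String input) {
        return NUMBERED.matcher(input).matches();
    }

    static boolean isValidDraft(String input) {
        return DRAFT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String mixed = "1234 5678-90123456";  // the article's problem input
        System.out.println(isValidDraft(mixed));             // draft accepts it
        System.out.println(isValid(mixed));                  // final version rejects it
        System.out.println(isValid("1234-5678-9012-3456"));  // consistent hyphens
        System.out.println(isValid("1234567890123456"));     // consistently no delimiter
    }
}
```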

If you're confused at this point, walk through each step. ^ matches the beginning of the input. [\d]{4} matches exactly four digits. (?<grpdel>[-\s]?) matches zero or one hyphen or whitespace character and remembers it as grpdel. The next [\d]{4} matches four digits again. \k<grpdel> matches whatever was captured for the group named grpdel; if the user typed no delimiter, the group captured an empty string, so the backreference matches an empty string too. The digits-and-backreference pair repeats, a final [\d]{4} matches the last four digits, and $ matches the end of the input string.
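The same walk-through can be written into the pattern itself. This Java sketch uses the Pattern.COMMENTS flag, which ignores whitespace in the pattern and treats # as the start of a comment; the class name and layout are illustrative assumptions, not part of the original expression.

```java
import java.util.regex.Pattern;

// Hypothetical annotated version of the final expression.
final class AnnotatedCardPattern {
    static final Pattern ANNOTATED = Pattern.compile(
          "^                    # beginning of the input\n"
        + "[\\d]{4}             # exactly four digits\n"
        + "(?<grpdel>[-\\s]?)   # zero or one hyphen or whitespace, remembered as grpdel\n"
        + "[\\d]{4}             # four more digits\n"
        + "\\k<grpdel>          # the same delimiter text again (possibly empty)\n"
        + "[\\d]{4}             # third group of digits\n"
        + "\\k<grpdel>          # the same delimiter text once more\n"
        + "[\\d]{4}             # last group of digits\n"
        + "$                    # end of the input\n",
        Pattern.COMMENTS);

    static boolean isValid(String input) {
        return ANNOTATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1234-5678-9012-3456"));  // consistent hyphens
        System.out.println(isValid("1234 5678-90123456"));   // mixed delimiters
    }
}
```

Laying the pattern out this way trades compactness for readability, which can help when you return to the expression later.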
