Those Regular Expressions again…

I’ve been growing fonder every day of the following principle: Code Talks! The main idea is that well-written code is self-documenting and should not need a lot of inline comments to be understood.
 
That said, when I see code like the one below, I think to myself: “Well, if code talks, this one swears…” 🙂

public bool IsValidPhoneNumber(string number)
{ 
   return Regex.IsMatch(number, @"^\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");
} 
 
A while back I posted something about putting comments in regular expressions:
http://claudiolassala.spaces.live.com/Blog/cns!E2A4B22308B39CD2!117.entry
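
For readers who can't follow the link, here is a minimal sketch of one way to comment a pattern in .NET, using RegexOptions.IgnorePatternWhitespace. It is not necessarily what that post showed; the class name CommentedPhoneRegex and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name, used only for this illustration.
public static class CommentedPhoneRegex
{
    // Same pattern as above, spread over several lines. With
    // RegexOptions.IgnorePatternWhitespace, unescaped white space in the
    // pattern is ignored and everything from # to the end of a line is a comment.
    private const string Pattern = @"
        ^          # start of the string
        \(?        # optional opening parenthesis
        (\d{3})    # exactly three digits
        \)?        # optional closing parenthesis
        [\s\-]?    # optional white space or hyphen
        (\d{3})    # exactly three digits
        \-?        # optional hyphen
        (\d{4})    # exactly four digits
        $          # end of the string (or just before a final newline)
    ";

    public static bool IsValidPhoneNumber(string number)
    {
        return Regex.IsMatch(number, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidPhoneNumber("(123) 456-7890"));
        Console.WriteLine(IsValidPhoneNumber("12-3456-7890"));
    }
}
```

The comments sit right next to each piece, but the reviewer still has to check that every comment matches the symbols beside it.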

When I look back, that still seems a bit cryptic, though. Last week I was thinking: “How the heck can a developer do a code review when there’s a regular expression involved?” Given the sample code above, I’ve been thinking about splitting the RegEx into something that’s easier to read and, therefore, easier to review. The code would look something like this:

/// <summary>
/// Checks whether a given number is a valid phone number (according to the common format).
/// </summary>
/// <param name="number">The phone number.</param>
/// <returns>True if the number is valid, or false if it is invalid.</returns>
/// <remarks>
/// Examples of valid phone numbers:
///    (123)456-7890
///    (123) 456-7890
///    123-456-7890
///    1234567890
/// </remarks>
public bool IsValidPhoneNumber(string number)
{ 
    return Regex.IsMatch(number, REGEX_VALID_PHONE_NUMBER);
}

Notice that I’ve replaced the RegEx with a constant that is much easier to read. That constant is defined as follows:

private const string REGEX_VALID_PHONE_NUMBER = 
  MATCHES_BEGINNING + 
  MATCHES_OPTIONAL_OPENING_PARENTHESIS + 
  MATCHES_EXACTLY_THREE_NUMERIC_DIGITS + 
  MATCHES_OPTIONAL_CLOSING_PARENTHESIS + 
  MATCHES_EITHER_SPACE_OR_HYPHEN + 
  MATCHES_EXACTLY_THREE_NUMERIC_DIGITS + 
  MATCHES_OPTIONAL_HYPHEN + 
  MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT;

That’s a lot more verbose, but in this case I’d rather have something verbose than the cryptic RegEx. The other constants are defined like so:
 
private const string MATCHES_BEGINNING = "^";
private const string MATCHES_OPTIONAL_OPENING_PARENTHESIS = @"\(?";
private const string MATCHES_EXACTLY_THREE_NUMERIC_DIGITS = @"\d{3}";
private const string MATCHES_OPTIONAL_CLOSING_PARENTHESIS = @"\)?";
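// The trailing ? keeps the separator optional, as in the original RegEx;
// without it, (123)456-7890 and 1234567890 would be rejected.
private const string MATCHES_EITHER_SPACE_OR_HYPHEN = @"[\s\-]?";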
private const string MATCHES_OPTIONAL_HYPHEN = @"\-?";
private const string MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT = @"\d{4}$";  
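
One cheap safety net when splitting an expression like this is to run the composed constant and the original pattern side by side over a few inputs. The sketch below is self-contained and only illustrative: the class name PhoneNumberPatternCheck, the ORIGINAL_PATTERN constant, and the last (invalid) sample are assumptions of mine, and the space-or-hyphen piece is written with a trailing ? so that it stays optional, as in the original RegEx.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class PhoneNumberPatternCheck
{
    private const string MATCHES_BEGINNING = "^";
    private const string MATCHES_OPTIONAL_OPENING_PARENTHESIS = @"\(?";
    private const string MATCHES_EXACTLY_THREE_NUMERIC_DIGITS = @"\d{3}";
    private const string MATCHES_OPTIONAL_CLOSING_PARENTHESIS = @"\)?";
    // Optional separator, matching the [\s\-]? of the original pattern.
    private const string MATCHES_EITHER_SPACE_OR_HYPHEN = @"[\s\-]?";
    private const string MATCHES_OPTIONAL_HYPHEN = @"\-?";
    private const string MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT = @"\d{4}$";

    public const string REGEX_VALID_PHONE_NUMBER =
        MATCHES_BEGINNING +
        MATCHES_OPTIONAL_OPENING_PARENTHESIS +
        MATCHES_EXACTLY_THREE_NUMERIC_DIGITS +
        MATCHES_OPTIONAL_CLOSING_PARENTHESIS +
        MATCHES_EITHER_SPACE_OR_HYPHEN +
        MATCHES_EXACTLY_THREE_NUMERIC_DIGITS +
        MATCHES_OPTIONAL_HYPHEN +
        MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT;

    // The one-line expression from the first listing, kept for comparison.
    public const string ORIGINAL_PATTERN = @"^\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$";

    // True when both patterns give the same answer for the given input.
    public static bool Agree(string input)
    {
        return Regex.IsMatch(input, REGEX_VALID_PHONE_NUMBER)
            == Regex.IsMatch(input, ORIGINAL_PATTERN);
    }

    public static void Main()
    {
        string[] samples =
        {
            "(123)456-7890", "(123) 456-7890", "123-456-7890", "1234567890",
            "123-45-67890" // made-up input that should be rejected
        };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0,-16} valid={1} agrees={2}",
                sample, Regex.IsMatch(sample, REGEX_VALID_PHONE_NUMBER), Agree(sample));
        }
    }
}
```

A handful of samples proves nothing about equivalence in general, but it does catch the easy mistakes, such as dropping a ? while copying a piece into its constant.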

This does seem a lot easier to review, but there’s one part I’m not sure would work: when we’re building and testing a RegEx, we normally use a tool such as Regulator or RegexBuddy, where the expression lives as one string. I’m thinking I need some little tool where I can select the pieces of a RegEx and then create the declarations for the constants out of them; otherwise it’d be painful to do for a long and complex expression.
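
To make the idea concrete, here is a rough sketch of the generating half of such a tool. It assumes the pieces have already been selected and named by hand; RegexConstantGenerator and its Generate method are hypothetical names, not an existing tool or library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical generator, used only for this illustration.
public static class RegexConstantGenerator
{
    // Turns (name, pattern piece) pairs into C# constant declarations, followed
    // by one constant that concatenates the pieces in the order given.
    // Assumes a repeated name always carries the same pattern piece.
    public static string Generate(string combinedName, IList<KeyValuePair<string, string>> pieces)
    {
        var declared = new HashSet<string>();
        var names = new List<string>();
        var code = new StringBuilder();

        foreach (KeyValuePair<string, string> piece in pieces)
        {
            names.Add(piece.Key);
            // A piece that is used more than once is declared only once.
            if (!declared.Add(piece.Key)) continue;
            // Inside a verbatim string literal, a double quote is written as "".
            string literal = "@\"" + piece.Value.Replace("\"", "\"\"") + "\"";
            code.AppendLine("private const string " + piece.Key + " = " + literal + ";");
        }

        code.AppendLine("private const string " + combinedName + " =");
        code.AppendLine("    " + string.Join(" +" + Environment.NewLine + "    ", names) + ";");
        return code.ToString();
    }

    public static void Main()
    {
        var pieces = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("MATCHES_BEGINNING", "^"),
            new KeyValuePair<string, string>("MATCHES_EXACTLY_THREE_NUMERIC_DIGITS", @"\d{3}"),
            new KeyValuePair<string, string>("MATCHES_OPTIONAL_HYPHEN", @"\-?"),
            new KeyValuePair<string, string>("MATCHES_EXACTLY_THREE_NUMERIC_DIGITS", @"\d{3}")
        };
        Console.Write(Generate("REGEX_SAMPLE", pieces));
    }
}
```

The hard part, picking and naming the pieces of a long expression, is still manual here; the sketch only saves the typing.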

I’m wondering what other developers are doing out there. Any thoughts?  

Even though some RegEx developers out there may think this is silly, most of the developers I’ve encountered aren’t that familiar with even the simplest expressions, so I don’t think I’m alone in the frustration of trying to understand those cartoon-swearing expressions. 🙂

  1. #1 by Simon on February 13, 2007 - 9:47 am

    I think the need for comments boils down to what you are comfortable with. If you don’t use regular expressions very often, even the simplest patterns seem incomprehensible. However, after you’ve used them for a while it all starts to make sense. Your idea of breaking the expression into its constituents makes sense, especially when you are dealing with a complex expression. I don’t know that I would go the route of using constants; I would instead use inline comments à la your earlier post, but that’s more of a personal preference.
     
    But perhaps you’re thinking about creating a regex library where you could reuse the constants to build other expressions? Hmm, maybe Milos should add a regex library for the common validations? Then you could check your phone number like so:
     
      PhoneNumberRegex phoneRegex = new PhoneNumberRegex();
        if (phoneRegex.Match(blah).Success) …
     
    and we could keep the swearing to a minimum. 😉
     

  2. #2 by cyrus on March 7, 2007 - 3:59 pm

    Have you taken a look at the ASP.NET RegularExpressionValidator? I am just now getting into 2005 and was going through some intros when I remembered your post. The RegularExpressionValidator has a property named "ValidationExpression". When you edit that property through the Properties window, you get a "Regular Expression Editor" that looks like it stores common regular expressions under usable names (for example, US Phone number). It also allows you to make your own. I don’t know if it is usable in other areas of .NET, but it sounds like you want an add-in or something similar that allows placement of the common regular expressions built into the "Regular Expression Editor".
    Good luck.

  3. #3 by Claudio on March 8, 2007 - 12:27 am

    Yup, that’s definitely something cool to have. Like Simon suggested, we’ll probably have a library of expressions in our framework for things we tend to use quite often. That sure helps a lot for developers who use an expression but shouldn’t have to know the nasty details of it. For a developer reviewing the expression, though, that’s where I’m trying to find good ways to improve the readability of those monsters. 🙂
