Posts Tagged regular expression

Those Regular Expressions again…

Every day I grow fonder of the following principle: Code Talks! The main idea is that well-written code is self-documenting and should not need a lot of inline comments to be understood.
 
That said, if code talks, when I see code like the one below, I think to myself: “Well, if code talks, this one swears…” 🙂

public bool IsValidPhoneNumber(string number)
{ 
   return Regex.IsMatch(number, @"^\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");
} 
 
A while back I posted something about putting comments in regular expressions:
http://claudiolassala.spaces.live.com/Blog/cns!E2A4B22308B39CD2!117.entry

When I look back, that still seems a bit cryptic, though. Last week I was thinking: “How the heck can a developer do a code review when there’s a regular expression involved?” Given the sample code above, I’ve been thinking about splitting the RegEx into something that’s easier to read and, therefore, easier to review. The code would look something like this:

/// <summary>
/// Checks whether a given number is a valid phone number (according to the common format).
/// </summary>
/// <param name="number">The phone number.</param>
/// <returns>True if the number is valid, or false if it is invalid.</returns>
/// <remarks>
/// Examples of valid phone numbers:
///    (123)456-7890
///    (123) 456-7890
///    123-456-7890
///    1234567890
/// </remarks>
public bool IsValidPhoneNumber(string number)
{ 
    return Regex.IsMatch(number, REGEX_VALID_PHONE_NUMBER);
}

Notice that I’ve replaced the RegEx with a constant that is just easier to read. That constant is defined as follows:

private const string REGEX_VALID_PHONE_NUMBER = 
  MATCHES_BEGINNING + 
  MATCHES_OPTIONAL_OPENING_PARENTHESIS + 
  MATCHES_EXACTLY_THREE_NUMERIC_DIGITS + 
  MATCHES_OPTIONAL_CLOSING_PARENTHESIS + 
  MATCHES_EITHER_SPACE_OR_HYPHEN + 
  MATCHES_EXACTLY_THREE_NUMERIC_DIGITS + 
  MATCHES_OPTIONAL_HYPHEN + 
  MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT;

That’s a lot more verbose, but in this case I’ll take something more verbose over the cryptic RegEx. The other constants are defined like so:
 
private const string MATCHES_BEGINNING = "^";
private const string MATCHES_OPTIONAL_OPENING_PARENTHESIS = @"\(?";
private const string MATCHES_EXACTLY_THREE_NUMERIC_DIGITS = @"\d{3}";
private const string MATCHES_OPTIONAL_CLOSING_PARENTHESIS = @"\)?";
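// The separator is optional, as in the original expression, so that 1234567890 and (123)456-7890 are accepted.
private const string MATCHES_EITHER_SPACE_OR_HYPHEN = @"[\s\-]?";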
private const string MATCHES_OPTIONAL_HYPHEN = @"\-?";
private const string MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT = @"\d{4}$";  
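
As a quick sanity check, the pieces can be assembled in one small class and run against the examples listed in the doc comments. This is only a sketch under a couple of assumptions: the PhoneNumberValidator class name and the Main method are made up for illustration, and the separator piece is written here as optional ([\s\-]?), matching the one-line expression at the top of the post, so that all four documented examples pass.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only to make the snippet runnable.
public static class PhoneNumberValidator
{
    private const string MATCHES_BEGINNING = "^";
    private const string MATCHES_OPTIONAL_OPENING_PARENTHESIS = @"\(?";
    private const string MATCHES_EXACTLY_THREE_NUMERIC_DIGITS = @"\d{3}";
    private const string MATCHES_OPTIONAL_CLOSING_PARENTHESIS = @"\)?";
    // Assumption: optional separator, as in the original one-line expression.
    private const string MATCHES_EITHER_SPACE_OR_HYPHEN = @"[\s\-]?";
    private const string MATCHES_OPTIONAL_HYPHEN = @"\-?";
    private const string MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT = @"\d{4}$";

    private const string REGEX_VALID_PHONE_NUMBER =
        MATCHES_BEGINNING +
        MATCHES_OPTIONAL_OPENING_PARENTHESIS +
        MATCHES_EXACTLY_THREE_NUMERIC_DIGITS +
        MATCHES_OPTIONAL_CLOSING_PARENTHESIS +
        MATCHES_EITHER_SPACE_OR_HYPHEN +
        MATCHES_EXACTLY_THREE_NUMERIC_DIGITS +
        MATCHES_OPTIONAL_HYPHEN +
        MATCHES_STRING_ENDS_WITH_FOUR_NUMERIC_DIGITS_REQUIREMENT;

    public static bool IsValidPhoneNumber(string number)
    {
        return Regex.IsMatch(number, REGEX_VALID_PHONE_NUMBER);
    }

    public static void Main()
    {
        // The first four samples come from the doc comments; the last one is malformed.
        string[] samples =
        {
            "(123)456-7890", "(123) 456-7890", "123-456-7890", "1234567890", "12-3456-7890"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, IsValidPhoneNumber(sample));
        }
    }
}
```

The four documented formats should come back as valid and the malformed sample as invalid. A check like this is also a cheap thing to hand to a reviewer along with the constants.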

Splitting the expression this way does seem a lot easier to review, but there’s one part I’m not sure would work: when we’re building and testing a RegEx, we normally use a tool such as Regulator or RegexBuddy. I’m thinking I need some little tool where I can select the pieces of a RegEx and then create the constant declarations out of them; otherwise it’d be painful to do that for a long and complex expression.
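
Such a tool would not need to be fancy. Below is a rough sketch of the idea, assuming the pieces have already been picked out by hand: it takes name/pattern pairs and produces the constant declarations plus the composed constant. The RegexConstantGenerator class and its Generate method are hypothetical names invented for this example, not part of any existing tool, and the sketch assumes that a repeated name always carries the same pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns named regex pieces into C# constant declarations.
public static class RegexConstantGenerator
{
    public static string Generate(string combinedName, IList<KeyValuePair<string, string>> pieces)
    {
        if (pieces == null || pieces.Count == 0)
        {
            throw new ArgumentException("At least one piece is required.", "pieces");
        }

        var code = new StringBuilder();
        var declared = new HashSet<string>();

        foreach (KeyValuePair<string, string> piece in pieces)
        {
            // A piece that is reused (e.g. "three digits") is declared only once.
            if (!declared.Add(piece.Key))
            {
                continue;
            }

            // In a verbatim string only the double quote needs escaping (by doubling it).
            string literal = "@\"" + piece.Value.Replace("\"", "\"\"") + "\"";
            code.AppendLine("private const string " + piece.Key + " = " + literal + ";");
        }

        code.AppendLine();
        code.AppendLine("private const string " + combinedName + " =");
        for (int i = 0; i < pieces.Count; i++)
        {
            string terminator = (i == pieces.Count - 1) ? ";" : " +";
            code.AppendLine("    " + pieces[i].Key + terminator);
        }

        return code.ToString();
    }

    public static void Main()
    {
        var pieces = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("MATCHES_BEGINNING", "^"),
            new KeyValuePair<string, string>("MATCHES_EXACTLY_THREE_NUMERIC_DIGITS", @"\d{3}"),
            new KeyValuePair<string, string>("MATCHES_OPTIONAL_HYPHEN", @"\-?"),
            new KeyValuePair<string, string>("MATCHES_EXACTLY_THREE_NUMERIC_DIGITS", @"\d{3}"),
            new KeyValuePair<string, string>("MATCHES_END", "$")
        };

        Console.WriteLine(Generate("REGEX_TWO_GROUPS_OF_THREE_DIGITS", pieces));
    }
}
```

The hard part, picking sensible boundaries and names for the pieces, still has to be done by a person; the sketch only removes the typing.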

I’m wondering what other developers are doing out there. Any thoughts?  

Even though some RegEx developers out there may think this is silly, most of the developers I’ve encountered aren’t that familiar with even the simplest expressions, so I don’t think I’m alone in the frustration of trying to understand those cartoon swear expressions. 🙂


Can you please put some comment on that Regular Expression?!

Regular expressions are one of those things in the software development world that give me nausea when I come across them. Reading something like this in code just makes me shiver:
 
^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
 
It seems to me like cartoon characters swearing: &$^%#&%@&$#**^&$. 
 
I just happened to run across a little article that taught me something I didn’t know about regular expressions: one can put comments in them!
 
So, instead of having some cryptic code like this one:
Regex regex = new Regex(@"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}\s$");
One could do the world a favor and rewrite that code like so:
Regex regex = new Regex(@"
    ^              # anchor at the start
    (?=.*\d)       # must contain at least one numeric character
    (?=.*[a-z])    # must contain at least one lowercase character
    (?=.*[A-Z])    # must contain at least one uppercase character
    .{8,10}        # from 8 to 10 characters of any kind
    \s             # followed by a single whitespace character
    $              # anchor at the end",
    RegexOptions.IgnorePatternWhitespace);
 
That way, even my little brain can understand a freakin’ regular expression.
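
To see the commented form behave like the one-liner, here is a minimal runnable sketch. The CommentedRegexDemo class, its Matches method, and the sample inputs are assumptions made for illustration. One thing to keep in mind with RegexOptions.IgnorePatternWhitespace: unescaped whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so a literal space or # in the pattern has to be escaped.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, used only to make the snippet runnable.
public static class CommentedRegexDemo
{
    private static readonly Regex Pattern = new Regex(@"
        ^              # anchor at the start
        (?=.*\d)       # must contain at least one numeric character
        (?=.*[a-z])    # must contain at least one lowercase character
        (?=.*[A-Z])    # must contain at least one uppercase character
        .{8,10}        # from 8 to 10 characters of any kind
        \s             # followed by a single whitespace character
        $              # anchor at the end",
        RegexOptions.IgnorePatternWhitespace);

    public static bool Matches(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("Passw0rd "));  // 8 characters plus the trailing space
        Console.WriteLine(Matches("Passw0rd"));   // same text without the trailing space
        Console.WriteLine(Matches("password1 ")); // no uppercase character
    }
}
```

Only the first input should match: the second lacks the trailing whitespace the pattern requires, and the third has no uppercase character.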
 
