FastSharp - Rapid C# Scripting
Nick Clarke | February 17, 2008
Today I had to update a regular expression that I had not touched in two years!
At first glance I got the old "hmm, where do I start?" feeling:
(?<Protocol>\w+):\/\/(?<Subdomain>\w+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)
This matches:
http://subdomain.url.com/Default.aspx
And breaks it into:
Protocol: http
Subdomain: subdomain
Domain: url
tlDomain: com
File: Default.aspx
But the problem starts when you have a - (dash) in the subdomain:
http://a-subdomain.url.com/Default.aspx
This of course fails because I use \w to match the subdomain, and \w only matches word characters (letters, digits and underscore), not a dash. All I need to do is allow - alongside \w.
The final expression was:
(?<Protocol>\w+):\/\/(?<Subdomain>[\w-]+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)
- The only change is \w+ becoming [\w-]+ in the Subdomain group.
A simple change, but testing it takes some time, as I either have to run my complete application or write a small test program.
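For reference, here is a minimal sketch of the kind of small test program I mean, run against both URLs from above. The class name UrlRegexDemo and the helper Part are made up for illustration; only the pattern comes from this post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegexDemo
{
    // The updated expression from the post; [\w-] lets the subdomain contain dashes.
    public const string Pattern =
        @"(?<Protocol>\w+):\/\/(?<Subdomain>[\w-]+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)";

    // Hypothetical helper: returns the value of a named group,
    // or null when the URL does not match the pattern at all.
    public static string Part(string url, string groupName)
    {
        Match match = Regex.Match(url, Pattern);
        return match.Success ? match.Groups[groupName].Value : null;
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://subdomain.url.com/Default.aspx",
            "http://a-subdomain.url.com/Default.aspx"
        };
        string[] groups = { "Protocol", "Subdomain", "Domain", "tlDomain", "File" };

        // Print every named group for every test URL.
        foreach (string url in urls)
        {
            Console.WriteLine(url);
            foreach (string name in groups)
            {
                Console.WriteLine("  " + name + ": " + Part(url, name));
            }
        }
    }
}
```

Swapping [\w-]+ back to \w+ in Pattern should make the second URL stop matching, which is the failure described above.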
Last week Matt Manela shared a great application on the MSDN blog that lets you test C# code without even having to write a class or create a project.
FastSharp is a great tool for testing out some code. It even goes as far as checking for compilation errors.

The compilation error I hit was caused by my not importing the namespace for the Regex class.
To fix this, all I had to do was click Settings and add the using statement:
using System.Text.RegularExpressions;

My little code snippet then ran fine, and I was able to test and adapt my change very quickly.
Great tool; be sure to check it out. For more depth on why and how it was built, see Matt’s post.






Sebastien Lambla | February 20, 2008
The \w use is factually incorrect. As per the RFC on URIs, domain names follow this pattern:
(?[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9]))?\.)*(?[A-Za-z]([A-Za-z-]*[A-Za-z])?)
Your current regex would happily parse completely invalid URLs made up of Unicode characters.
That said, this does not include any support for IRI and IDNs where more stop points are to be recognized…
What’s wrong with Uri.TryParse?
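The Unicode point is easy to check with a short sketch. By default, .NET's \w matches Unicode letters and digits as well as ASCII ones; RegexOptions.ECMAScript narrows it to [a-zA-Z_0-9]. The class name WordClassDemo and the helper IsWord are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Hypothetical helper: true when the whole string consists of \w characters
    // under the given regex options.
    public static bool IsWord(string text, RegexOptions options)
    {
        return Regex.IsMatch(text, @"^\w+$", options);
    }

    public static void Main()
    {
        // "s\u00fcbdomain" is "subdomain" with a u-umlaut, written as an escape
        // so the source file encoding does not matter.
        string label = "s\u00fcbdomain";

        Console.WriteLine(IsWord(label, RegexOptions.None));       // True
        Console.WriteLine(IsWord(label, RegexOptions.ECMAScript)); // False
    }
}
```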
Nick Clarke | February 20, 2008
Hi Sebastien,
You’re right that the regex does not cover all of the valid URL characters and patterns, nor does it deal properly with Unicode characters etc. In my situation it works, as I have control over the URLs and it's only used on a site with limited exposure, so this simple approach is enough.
If someone wanted to break up arbitrary URLs, this expression would not be the way to go, at least in its current state.
I looked at the .NET docs and could not find Uri.TryParse; there was a Uri.Parse, but it is marked as obsolete. Is this in a special library, or am I overlooking it?
Cheers for your feedback
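For anyone with the same question: the closest built-in appears to be Uri.TryCreate (in the System namespace) rather than a method called Uri.TryParse. Below is a rough sketch of splitting the same URL that way. The class name UriDemo and the helper HostLabels are made up for illustration, and note that Uri only gives back the whole host, so splitting it into subdomain, domain and top-level domain is still up to you.

```csharp
using System;

public static class UriDemo
{
    // Hypothetical helper: splits the host into its dot-separated labels,
    // or returns null when the string is not a valid absolute URI.
    public static string[] HostLabels(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return null;
        }
        return uri.Host.Split('.');
    }

    public static void Main()
    {
        string url = "http://a-subdomain.url.com/Default.aspx";
        Uri uri;

        if (Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            Console.WriteLine("Scheme: " + uri.Scheme);      // http
            Console.WriteLine("Host: " + uri.Host);          // a-subdomain.url.com
            Console.WriteLine("Path: " + uri.AbsolutePath);  // /Default.aspx
            Console.WriteLine("Labels: " + string.Join(", ", HostLabels(url)));
        }
        else
        {
            Console.WriteLine("Not a valid absolute URI: " + url);
        }
    }
}
```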