Nicholas Clarke


FastSharp - Rapid C# Scripting

Nick Clarke | February 17, 2008

Today I had to update a regular expression that I have not touched in two years!

At first glance I got the old "hmm, where do I start?" :)

(?<Protocol>\w+):\/\/(?<Subdomain>\w+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)

This matches:

http://subdomain.url.com/Default.aspx

And breaks it into:

Protocol: http
Subdomain: subdomain
Domain: url
tlDomain: com
File: Default.aspx
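As a rough sketch of how that breakdown can be reproduced in C#, the snippet below runs the expression above through Regex.Match and reads the named groups back out. The class name UrlPartsDemo and the Split helper are made up for illustration; only the pattern itself comes from this post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlPartsDemo
{
    // The expression from the post, written as a verbatim string so the
    // backslashes do not need to be doubled.
    public const string Pattern =
        @"(?<Protocol>\w+):\/\/(?<Subdomain>\w+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)";

    // Hypothetical helper: returns the five named groups in order,
    // or null when the URL does not match the pattern.
    public static string[] Split(string url)
    {
        Match m = Regex.Match(url, Pattern);
        if (!m.Success)
        {
            return null;
        }

        return new[]
        {
            m.Groups["Protocol"].Value,
            m.Groups["Subdomain"].Value,
            m.Groups["Domain"].Value,
            m.Groups["tlDomain"].Value,
            m.Groups["File"].Value
        };
    }

    public static void Main()
    {
        string[] names = { "Protocol", "Subdomain", "Domain", "tlDomain", "File" };
        string[] parts = Split("http://subdomain.url.com/Default.aspx");

        // Print each group name next to the value it captured.
        for (int i = 0; i < names.Length; i++)
        {
            Console.WriteLine(names[i] + ": " + parts[i]);
        }
    }
}
```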

But the problem starts when you have a - (dash) in the subdomain:

http://a-subdomain.url.com/Default.aspx

This of course fails because I use \w to match the subdomain, and \w only matches word characters (letters, digits and the underscore), not the dash. All I need to do is allow - as well as \w.

The final expression was:

(?<Protocol>\w+):\/\/(?<Subdomain>[\w-]+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)

- The only change is in the Subdomain group, where \w+ became [\w-]+
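A small test along these lines is one way to confirm the change: it runs the old and the new expression against the same URLs and reports what each captures for the subdomain. The class SubdomainFixDemo and its Subdomain helper are hypothetical names used only for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubdomainFixDemo
{
    // Original expression: the subdomain is limited to \w.
    public const string OldPattern =
        @"(?<Protocol>\w+):\/\/(?<Subdomain>\w+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)";

    // Updated expression: the subdomain may also contain a dash.
    public const string NewPattern =
        @"(?<Protocol>\w+):\/\/(?<Subdomain>[\w-]+)\.(?<Domain>\w+)\.(?<tlDomain>[\w.]+)/(?<File>.*)";

    // Hypothetical helper: returns the captured subdomain,
    // or null when the pattern does not match the URL at all.
    public static string Subdomain(string pattern, string url)
    {
        Match m = Regex.Match(url, pattern);
        return m.Success ? m.Groups["Subdomain"].Value : null;
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://subdomain.url.com/Default.aspx",
            "http://a-subdomain.url.com/Default.aspx"
        };

        foreach (string url in urls)
        {
            // A null result is shown as "no match".
            Console.WriteLine(url);
            Console.WriteLine("  old: " + (Subdomain(OldPattern, url) ?? "no match"));
            Console.WriteLine("  new: " + (Subdomain(NewPattern, url) ?? "no match"));
        }
    }
}
```

With the dashed URL, the old expression should report no match while the new one captures a-subdomain.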

A simple change, but testing it takes some time, as I either have to run my complete application or write a small test program.

Last week Matt Manela shared a great application on the MSDN blog that lets you test C# code without even having to write a class or create a project.

FastSharp is a great tool for testing out some code. It even goes as far as checking for compilation errors.

[Screenshot: FastSharp Compile Error]

This was caused by me not importing the namespace for the Regex class.

To fix this all I had to do was click settings and then add the using statement.

using System.Text.RegularExpressions;

[Screenshot: FastSharp Success]

My little code snippet then ran fine and I was able to test and adapt my change very fast.

Great tool, be sure to check it out. For more depth on why and how it was coded, see Matt’s post.

Categories
Development, Microsoft

2 responses

Sebastien Lambla | February 20, 2008

The \w use is factually incorrect. As per the RFC on URIs, domain names
(?[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9]))?\.)*(?[A-Za-z]([A-Za-z-]*[A-Za-z])?)

Your current regex would happily parse completely invalid urls formed of unicode characters.

That said, this does not include any support for IRI and IDNs where more stop points are to be recognized…

What’s wrong with Uri.TryParse? :)

Nick Clarke | February 20, 2008

Hi Sebastien,

You’re right that the regex does not cover all of the valid URL characters/patterns, nor would it allow unicode characters etc. In my situation it works as I have control over the URL, and it's only used in a site with limited exposure, so this simple approach works.

If someone wanted to break up their URL this expression would not be the way to go, at least in its current state.

I looked at the .Net docs and I could not find Uri.TryParse, there was a Uri.Parse but this is marked as obsolete. Is this in a special library or am I overlooking it?

Cheers for your feedback
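On the Uri question above: as far as I can tell there is no Uri.TryParse in the .NET Framework, and the closest built-in method is Uri.TryCreate. The sketch below assumes that is the method meant, and uses it to split a URL without any regex. The class UriSplitDemo and the TrySplit helper are hypothetical names for illustration.

```csharp
using System;

public static class UriSplitDemo
{
    // Hypothetical helper: splits an absolute URL using System.Uri.
    // Returns false (and null outputs) when the string is not an absolute URI.
    public static bool TrySplit(string url, out string scheme, out string host, out string path)
    {
        Uri uri;
        if (Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            scheme = uri.Scheme;       // e.g. "http"
            host = uri.Host;           // full host name, dashes included
            path = uri.AbsolutePath;   // path with its leading slash
            return true;
        }

        scheme = null;
        host = null;
        path = null;
        return false;
    }

    public static void Main()
    {
        string scheme, host, path;
        if (TrySplit("http://a-subdomain.url.com/Default.aspx", out scheme, out host, out path))
        {
            Console.WriteLine("Scheme: " + scheme);
            Console.WriteLine("Host: " + host);
            Console.WriteLine("Path: " + path);
        }
    }
}
```

Note that Uri does not separate the subdomain from the domain; the host comes back as one string, so splitting it further would still be up to the caller.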
