Ramakant Yadav the Internet edition

Random Ramblings Of Ramakant Yadav

7 Jun 2011

Regular expressions in C# are a fun topic. For some reason, most regular folks just don't like to use them. The first programming language I learnt was C++, on Visual Studio 6.0 (that was on my own; then, like everybody else, I learnt C once I reached college, but that is another story).

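Regex is a great tool for text processing, as long as people do not use it to build HTML parsers. The characters and constructs used inside a regular expression (character classes, anchors, quantifiers, grouping constructs and so on) are called language elements. A pattern built from these elements is the regular expression itself, and the regular expression engine brings it all together by parsing the pattern and matching it against the input text.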
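To make the terminology concrete, here is a small sketch, assuming .NET 5 or later so that top-level statements are available. The date pattern and the sample input are made up for illustration; the comments label which kind of language element each piece of the pattern is.

```csharp
using System;
using System.Text.RegularExpressions;

// A pattern built from several language elements:
//   \d            character class (any decimal digit)
//   {4}, {2}      quantifiers (exactly n occurrences)
//   (?<name>...)  named capturing groups
//   -             a literal character
string pattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";

// The Regex object hands the pattern to the regular expression engine,
// which parses it and then searches the input text for a match.
var regex = new Regex(pattern);
Match match = regex.Match("Posted on 2011-06-07 by Ramakant");

if (match.Success)
{
    // Prints: year=2011, month=06, day=07
    Console.WriteLine($"year={match.Groups["year"].Value}, month={match.Groups["month"].Value}, day={match.Groups["day"].Value}");
}
```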

The complete list of regular expression language elements is here.

The interesting thing about regular expressions is the sheer range of options available. For example, using backreferences, one can identify a repeated character in a string. The pattern does not need to know in advance which character repeats: a group captures whatever character it finds and the backreference matches that same character again, so a single expression finds every doubled character in the string.
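A minimal sketch of the backreference idea, assuming .NET 5 or later for top-level statements; the input word "bookkeeper" is just an example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (\w) captures any word character into group 1;
// \1 is a backreference that matches the same text again.
// So the pattern matches any character immediately followed by itself,
// whichever character that happens to be.
var doubled = new Regex(@"(\w)\1");

string input = "bookkeeper";
string[] repeats = doubled.Matches(input)
                          .Cast<Match>()
                          .Select(m => m.Groups[1].Value)
                          .ToArray();

// Prints: o, k, e
Console.WriteLine(string.Join(", ", repeats));
```

Matches do not overlap, so a run such as "aaa" is reported only once as "aa"; changing the pattern to (\w)\1+ matches the whole run of a repeated character instead.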

Or one can use substitutions in the replacement string, which insert the text before the match ($`), the text after the match ($') or the entire match ($&).
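The following sketch (again assuming .NET 5 or later for top-level statements, with made-up inputs) shows each of those substitutions with Regex.Replace, plus the numbered-group substitution $1/$2 for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "first-second";

// $&  inserts the entire match
string whole = Regex.Replace(input, "-", "[$&]");    // "first[-]second"
// $`  inserts all of the input text before the match
string before = Regex.Replace(input, "-", "($`)");   // "first(first)second"
// $'  inserts all of the input text after the match
string after = Regex.Replace(input, "-", "($')");    // "first(second)second"

// $1, $2 insert the text captured by numbered groups
string swapped = Regex.Replace("Yadav, Ramakant", @"(\w+), (\w+)", "$2 $1");  // "Ramakant Yadav"

Console.WriteLine(whole);
Console.WriteLine(before);
Console.WriteLine(after);
Console.WriteLine(swapped);
```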

One may also use anchors such as ^, $ and \b to specify where in the string a match must occur.
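A short sketch of how anchors change what counts as a match, assuming .NET 5 or later for top-level statements; the inputs are arbitrary examples.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ anchor the match to the start and end of the input,
// so the whole string must consist of digits.
bool allDigits = Regex.IsMatch("12345", @"^\d+$");        // true
bool notAllDigits = Regex.IsMatch("12a45", @"^\d+$");     // false

// Without anchors the engine may match anywhere in the string.
bool unanchored = Regex.IsMatch("12a45", @"\d+");         // true

// \b is a word boundary: "cat" only as a whole word.
bool wholeWord = Regex.IsMatch("the cat sat", @"\bcat\b");   // true
bool insideWord = Regex.IsMatch("concatenate", @"\bcat\b");  // false

Console.WriteLine($"{allDigits} {notAllDigits} {unanchored} {wholeWord} {insideWord}");
```

In .NET, $ also matches just before a trailing newline; \z can be used when the match must end at the very end of the string.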

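By default, regular expressions are interpreted. To improve run-time performance, one may instead compile a regular expression to MSIL using RegexOptions.Compiled; this makes constructing the Regex object slower, so it pays off mainly for patterns that are reused many times.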
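A minimal sketch of the usual pattern for RegexOptions.Compiled, assuming .NET 5 or later for top-level statements: build the compiled Regex once and reuse it across many inputs. The word-counting task and the sample lines are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Construct the compiled Regex once and reuse it: compiling to MSIL
// makes construction slower, but each subsequent match is faster.
var wordRegex = new Regex(@"\b\w+\b", RegexOptions.Compiled);

string[] lines =
{
    "Regular expressions are interpreted by default",
    "RegexOptions.Compiled trades start-up time for speed",
};

// Reuse the same compiled instance for every line.
int totalWords = 0;
foreach (string line in lines)
{
    totalWords += wordRegex.Matches(line).Count;
}

// Prints: 14 ("." and "-" are not word characters, so they split words)
Console.WriteLine(totalWords);
```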

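One may also compile regular expressions ahead of time into a separate assembly using the Regex.CompileToAssembly method. The generated assembly contains predefined, compiled regular expression classes that other code can reference. This method is supported only on .NET Framework; on .NET Core and .NET 5 and later it throws PlatformNotSupportedException.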

Regular expressions are useful not only for manipulating strings while processing text files; they also find application in SSRS (SQL Server Reporting Services) expressions. For SSRS functions and expressions, look here.

Regular expressions should not be used to build HTML parsers. Building an HTML parser is not like building a compiler, even if one decides to do type checking and reformatting before lexical analysis. HTML parsers should not be used if one just wants to extract data from a web page.

Regular expressions should be used with caution when dealing with multiple cultures. For case-insensitive matching, one should switch to the invariant culture using something like this: Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant).
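The usual illustration of the problem is the Turkish (tr-TR) culture, where the uppercase I does not lowercase to i, so a culture-sensitive case-insensitive comparison of "FILE" with "file" can fail. Below is a sketch of the invariant approach, assuming .NET 5 or later for top-level statements; IsKeyword is a hypothetical helper name, not part of the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: checks a keyword case-insensitively, independent of
// the current thread culture. Regex.Escape treats the keyword as literal text,
// and ^...$ require the whole input to be the keyword.
static bool IsKeyword(string input, string keyword) =>
    Regex.IsMatch(input,
                  "^" + Regex.Escape(keyword) + "$",
                  RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

Console.WriteLine(IsKeyword("FILE", "file"));   // True
Console.WriteLine(IsKeyword("files", "file"));  // False
```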



All Rights Reserved ramakantyadav.com