regular expressions for XML tags
Regular expressions are powerful and can substitute for XML libraries in simple tasks. Say you need to select all elements with a specific name from an XML file. Below is a sample that does this. The program reads XMLFile1.xml, selects elements with three different names, and prints each element and its content to the console.
XMLFile1:
<?xml version="1.0" encoding="utf-8" ?>
<Main>
  <Item Name="item1"/>
  <Item Name ="item2">
    <Components>
      <Component/>
      <Component/>
      <Component Name="component10">
        <SubComponent/>
      </Component>
    </Components>
  </Item>
  <Item>
    <Item></Item>
    <Item></Item>
  </Item>
</Main>
C# code:
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    string _xml;

    public Program(string xml)
    {
        _xml = xml;
    }

    void PrintTags(string tagName)
    {
        string expression =
            @"(<{0}\/>)|" +                          // <tagName/>
            @"(<{0}\s[^>]*?\/>)|" +                  // <tagName[space]BlaBla.../>
            @"(<{0}>[\s\S]*?<\/{0}\s*>)|" +          // <tagName>BlaBla...</tagName>
            @"(<{0}\s[\s\S]*?>[\s\S]*?<\/{0}\s*>)";  // <tagName[space]BlaBla...>BlaBla...</tagName>

        // Escape the tag name so characters such as '.' are matched literally.
        Regex regex = new Regex(String.Format(expression, Regex.Escape(tagName)));

        // A while loop (instead of do/while) prints nothing when there is no match.
        Match match = regex.Match(_xml);
        while (match.Success)
        {
            Console.WriteLine("tag: {0}", tagName);
            Console.WriteLine(match.Value);
            match = match.NextMatch();
        }
    }

    void Run()
    {
        PrintTags("Item");
        PrintTags("Component");
        PrintTags("Components");
    }

    static void Main(string[] args)
    {
        Program program = new Program(File.ReadAllText("XMLFile1.xml"));
        program.Run();
        Console.Read();
    }
}
I had to define expressions for four cases (see the comments in the PrintTags method).
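To make the four cases concrete, here is a minimal sketch that reports which alternative of the expression matched a given fragment. The class name TagCases and the method MatchedCase are hypothetical helpers introduced only for this illustration; the expression itself is the one from PrintTags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
static class TagCases
{
    // Same four alternatives as in PrintTags; each one is a capture group.
    const string Expression =
        @"(<{0}\/>)|" +
        @"(<{0}\s[^>]*?\/>)|" +
        @"(<{0}>[\s\S]*?<\/{0}\s*>)|" +
        @"(<{0}\s[\s\S]*?>[\s\S]*?<\/{0}\s*>)";

    // Returns 1..4 for the alternative that matched, or 0 if nothing matched.
    public static int MatchedCase(string tagName, string fragment)
    {
        Regex regex = new Regex(String.Format(Expression, Regex.Escape(tagName)));
        Match match = regex.Match(fragment);
        if (!match.Success)
            return 0;
        for (int i = 1; i <= 4; i++)
        {
            if (match.Groups[i].Success)
                return i;
        }
        return 0;
    }

    public static void Main()
    {
        string[] samples =
        {
            "<Item/>",                   // case 1
            "<Item Name=\"a\"/>",        // case 2
            "<Item>x</Item>",            // case 3
            "<Item Name=\"a\">x</Item>"  // case 4
        };
        foreach (string sample in samples)
            Console.WriteLine("case {0}: {1}", MatchedCase("Item", sample), sample);
    }
}
```

Because the alternatives are tried from left to right, the two self-closing forms are checked before the forms that need a closing tag.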
The program prints the following output:
tag: Item
<Item Name="item1"/>
tag: Item
<Item Name ="item2">
  <Components>
    <Component/>
    <Component/>
    <Component Name="component10">
      <SubComponent/>
    </Component>
  </Components>
</Item>
tag: Component
<Component/>
tag: Component
<Component/>
tag: Component
<Component Name="component10">
  <SubComponent/>
</Component>
tag: Components
<Components>
  <Component/>
  <Component/>
  <Component Name="component10">
    <SubComponent/>
  </Component>
</Components>
However, this code won’t work if the XML file contains nested elements with identical names. See the example below.
XML:
<Item>
<Item></Item>
<Item></Item>
</Item>
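If you call the PrintTags method with “Item”, you will get <Item><Item></Item>: the outer opening tag paired with the first inner closing tag. This happens because the lazy [\s\S]*? stops at the first closing tag it finds; the regular expression doesn’t count opening and closing tags.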
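One way to make the expression count tags is the balancing group construct of the .NET regex engine. The following is only a sketch under assumptions, not a replacement for a parser: it assumes attribute values never contain ">", and that comments or CDATA sections contain no tag-like text. The names NestedTags and Build are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
static class NestedTags
{
    // Builds a regex that matches a whole element named tagName and keeps
    // track of nested elements with the same name on the "depth" stack.
    public static Regex Build(string tagName)
    {
        string name = Regex.Escape(tagName);
        string open = "<" + name + @"(?=[\s/>])[^>]*(?<!/)>";     // <tag ...> but not <tag .../>
        string empty = "<" + name + @"(?=[\s/>])[^>]*/>";         // <tag .../>
        string close = "</" + name + @"\s*>";                     // </tag>
        string other = @"(?!</?" + name + @"(?=[\s/>]))[\s\S]";   // any char that does not start such a tag

        string pattern =
            empty + "|" +
            open +
            "(?:" +
                "(?<depth>" + open + ")|" +     // nested opening tag: push
                "(?<-depth>" + close + ")|" +   // nested closing tag: pop (fails if the stack is empty)
                empty + "|" +
                other +
            ")*" +
            "(?(depth)(?!))" +                  // fail if an opening tag is still unclosed
            close;
        return new Regex(pattern);
    }

    public static void Main()
    {
        string xml = "<Item>\n<Item></Item>\n<Item></Item>\n</Item>";
        foreach (Match match in Build("Item").Matches(xml))
            Console.WriteLine(match.Value);
    }
}
```

Note that Matches returns non-overlapping matches, so this sketch reports only the outermost element; the two inner Item elements are part of that match and are not listed separately.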
2 Comments
Hi.
Please don’t use this method even for the simplest of tasks related to XML. Regular expressions are powerful, true, but when you have to work with XML, DO USE an XML library! In .NET 3.5 you can use LINQ to XML. It’s so easy to work with XML using LINQ to XML.
Using RegEx will create more problems than benefits.
You are right. I said “they can substitute” but I didn’t recommend doing it if it’s possible to use a good library. LINQ to XML is my favorite one.
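For comparison, selecting elements by name with LINQ to XML might look like the sketch below. The names LinqToXmlTags and FindTags are hypothetical; XDocument.Parse and Descendants are the standard System.Xml.Linq calls. Because the XML is parsed, nested elements with identical names are handled correctly.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original program.
static class LinqToXmlTags
{
    // Returns the serialized text of every element named tagName,
    // in document order, including nested ones.
    public static string[] FindTags(string xml, string tagName)
    {
        return XDocument.Parse(xml)
            .Descendants(tagName)
            .Select(element => element.ToString())
            .ToArray();
    }

    public static void Main()
    {
        string xml = "<Item>\n<Item></Item>\n<Item></Item>\n</Item>";
        foreach (string tag in FindTags(xml, "Item"))
        {
            Console.WriteLine("tag: Item");
            Console.WriteLine(tag);
        }
    }
}
```

Unlike the regular expression, this lists the outer Item and each of the two inner Item elements as separate results.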