Regular expressions for XML tags
Regular expressions are powerful and can substitute for XML libraries in simple tasks. Say you need to select all elements with a specific name from an XML file. Below is a sample that does that. The program reads XMLFile1.xml, selects the elements named Item, Component and Components, and prints each match with its content to the console.
XMLFile1:
<?xml version="1.0" encoding="utf-8" ?>
<Main>
  <Item Name="item1"/>
  <Item Name ="item2">
    <Components>
      <Component/>
      <Component/>
      <Component Name="component10">
        <SubComponent/>
      </Component>
    </Components>
  </Item>
  <Item>
    <Item></Item>
    <Item></Item>
  </Item>
</Main>
C# code:
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    public Program(string xml)
    {
        _xml = xml;
    }

    string _xml;

    void PrintTags(string tagName)
    {
        // {0} is replaced with the tag name by String.Format below.
        string expression =
            @"(<{0}\/>)|" +                          // <tagName/>
            @"(<{0}\s[^>]*?\/>)|" +                  // <tagName[space]BlaBla.../>
            @"(<{0}>[\s\S]*?<\/{0}\s*>)|" +          // <tagName>BlaBla...</tagName>
            @"(<{0}\s[\s\S]*?>[\s\S]*?<\/{0}\s*>)";  // <tagName[space]BlaBla...>BlaBla...</tagName>

        // Regex.Escape keeps characters such as '.' in the tag name literal.
        Regex regex = new Regex(String.Format(expression, Regex.Escape(tagName)));

        // Check Success before printing, so a tag that does not occur prints nothing
        // (a do...while loop would print the label and an empty match once).
        for (Match match = regex.Match(_xml); match.Success; match = match.NextMatch())
        {
            Console.WriteLine("tag: {0}", tagName);
            Console.WriteLine(match.Value);
        }
    }

    void Run()
    {
        PrintTags("Item");
        PrintTags("Component");
        PrintTags("Components");
    }

    static void Main(string[] args)
    {
        Program program = new Program(File.ReadAllText("XMLFile1.xml"));
        program.Run();
        Console.Read(); // keep the console window open
    }
}
I had to define expressions for four cases (see the comments in the PrintTags method).
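To see which of the four alternatives handles which form of a tag, here is a small sketch that reuses the same expression and reports the number of the capture group that matched. The class and method names (TagCases, WhichCase) are hypothetical, made up for this illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagCases
{
    // The same four alternatives as in PrintTags; each one is a numbered capture group.
    const string Expression =
        @"(<{0}\/>)|" +                          // group 1: <tagName/>
        @"(<{0}\s[^>]*?\/>)|" +                  // group 2: <tagName attr.../>
        @"(<{0}>[\s\S]*?<\/{0}\s*>)|" +          // group 3: <tagName>...</tagName>
        @"(<{0}\s[\s\S]*?>[\s\S]*?<\/{0}\s*>)";  // group 4: <tagName attr...>...</tagName>

    // Returns 1-4 for the alternative that matched at the start of the input,
    // or 0 if no alternative matched there.
    public static int WhichCase(string input, string tagName)
    {
        var regex = new Regex(String.Format(Expression, Regex.Escape(tagName)));
        Match m = regex.Match(input);
        if (!m.Success || m.Index != 0)
            return 0;
        for (int i = 1; i <= 4; i++)
        {
            if (m.Groups[i].Success)
                return i;
        }
        return 0;
    }
}
```

For example, WhichCase("<Item Name=\"a\">text</Item>", "Item") returns 4, while "<Items/>" returns 0, because every alternative requires the name to be followed by "/", ">" or whitespace, so Item does not match the longer name Items.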
The following is the output:
Output:
tag: Item
<Item Name="item1"/>
tag: Item
<Item Name ="item2">
    <Components>
      <Component/>
      <Component/>
      <Component Name="component10">
        <SubComponent/>
      </Component>
    </Components>
  </Item>
tag: Component
<Component/>
tag: Component
<Component/>
tag: Component
<Component Name="component10">
        <SubComponent/>
      </Component>
tag: Components
<Components>
      <Component/>
      <Component/>
      <Component Name="component10">
        <SubComponent/>
      </Component>
    </Components>
However, this code won’t work if the XML file contains nested elements with identical names. See the example below.
XML:
<Item>
<Item></Item>
<Item></Item>
</Item>
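If you call the PrintTags method with “Item”, you will get <Item><Item></Item>, that is, the outer opening tag followed only by the first inner element, and the second inner <Item></Item> is reported as a separate match. This happens because the lazy [\s\S]*? stops at the first </Item> it reaches: the regular expression doesn’t count opening and closing tags, so it cannot tell which </Item> belongs to which <Item>.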
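One way to make a .NET regular expression count tags is a balancing group, a .NET-specific feature that pushes a capture on each nested opening tag and pops it on each closing tag. The sketch below is an illustration under stated assumptions, not part of the original program: NestedTagMatcher, Build and Find are hypothetical names, and it assumes there is no ">" inside attribute values and no comments or CDATA sections in the XML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NestedTagMatcher
{
    // Builds a .NET-only pattern that matches an element together with any
    // nested elements of the same name, using the balancing group "depth".
    // Assumes no '>' inside attribute values, no comments, no CDATA.
    public static Regex Build(string tagName)
    {
        string t = Regex.Escape(tagName);
        string pattern =
            "<" + t + @"\b[^>]*/>" +                      // self-closing <tag .../>
            "|<" + t + @"\b[^>]*(?<!/)>" +                // or an opening <tag ...>
            "(?>" +
                "<" + t + @"\b[^>]*/>" +                  // nested self-closing: depth unchanged
                "|<" + t + @"\b[^>]*(?<!/)>(?<depth>)" +  // nested opening: push
                "|</" + t + @"\s*>(?<-depth>)" +          // nested closing: pop (fails at depth 0)
                "|(?!</?" + t + @"\b)[\s\S]" +            // any other single character
            ")*" +
            "(?(depth)(?!))" +                            // every nested tag must be closed
            "</" + t + @"\s*>";                           // closing tag of the outer element
        return new Regex(pattern);
    }

    // Returns the text of every top-level element with the given name.
    public static List<string> Find(string xml, string tagName)
    {
        var result = new List<string>();
        foreach (Match m in Build(tagName).Matches(xml))
            result.Add(m.Value);
        return result;
    }
}
```

Called on the nested example above with "Item", Find returns a single match that spans the whole outer element, because the final </Item> is only accepted once every inner <Item> has been popped off the "depth" stack.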