Reading XML with PowerShell – Why Most Examples You See Are Wrong

There are thousands of articles on the internet and dozens of books about how to read an XML file (or other source) using the built-in [xml] capabilities of PowerShell (version 2.0, the most widely used).

The best way to show why so many of those articles and books are flat out wrong is with an example that is loosely based on my real-world experience.

Let’s say you have a file, named d:\projects\myexample.xml, that contains the following XML:

<SomeTopLevel>
  <Categories>
    <Category>
      <Title>Fruit</Title>
      <Description>All types of fruit</Description>
      <SubCategories>
        <SubCategory>
          <Title>Apples</Title>
          <Description>Various types of apples</Description>
        </SubCategory>
        <SubCategory>
          <Title>Oranges</Title>
          <Description>Various types of oranges</Description>
        </SubCategory>
      </SubCategories>
    </Category>
    <Category>
      <Title>Vegetables</Title>
      <Description>All types of vegetables</Description>
      <SubCategories>
        <SubCategory>
          <Title>Carrots</Title>
          <Description>All varieties of carrots</Description>
        </SubCategory>
        <SubCategory>
          <Title>Peas</Title>
          <Description>All varieties of peas</Description>
        </SubCategory>
      </SubCategories>
    </Category>
  </Categories>
</SomeTopLevel>

Now suppose you want to read the above file using PowerShell’s built-in [xml] capabilities.

Where most examples go bad is when they tell you to do something like the following to read the above XML file and loop through the <Category> elements beneath the <Categories> element:

param([string] $xmlFilePath='D:\projects\myexample.xml')

[xml] $xmlContent = [xml] (Get-Content -Path $xmlFilePath)
[System.Xml.XmlElement] $categories = $xmlContent.Categories
[System.Xml.XmlElement] $category = $null
foreach($category in $categories.ChildNodes)
{
    [string] $title = $category.Title
    [string] $description = $category.Description
    Write-Host ("Title={0},Description={1}" -f $title,$description)
}

If you run the above script, you will see the following pathetic-looking (and equally useless) output:

Title=,Description=
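
A quick way to see what is going on is to inspect the intermediate values. The following is only a minimal sketch: it assumes a trimmed inline copy of the XML above (so it runs without the file on disk), and the variable names $fromDocument, $isMissing and $rootName are made up for illustration.

```powershell
# Trimmed inline copy of the sample XML (an assumption for illustration,
# so the check does not depend on D:\projects\myexample.xml existing).
[xml] $xmlContent = @'
<SomeTopLevel>
  <Categories>
    <Category>
      <Title>Fruit</Title>
      <Description>All types of fruit</Description>
    </Category>
  </Categories>
</SomeTopLevel>
'@

# Ask the document object for Categories directly, as the failing script does.
$fromDocument = $xmlContent.Categories
$isMissing = ($null -eq $fromDocument)

# Ask the document which element is actually at the top.
$rootName = $xmlContent.DocumentElement.Name

Write-Host ("Categories found on document: {0}" -f (-not $isMissing))
Write-Host ("Root element name: {0}" -f $rootName)
```

If the first line reports False and the second reports SomeTopLevel, the lookup never reached <Categories> at all, which would explain the empty Title and Description.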

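Examples like the one shown above fail because $xmlContent is the XML document itself, not its root element: <Categories> is a child of <SomeTopLevel>, so $xmlContent.Categories returns nothing. 99% of the online and book-based examples I have seen are missing one ABSOLUTELY VITAL line of code that fetches the root element first (and a corresponding change to the line below it):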

param([string] $xmlFilePath='D:\projects\myexample.xml')

[xml] $xmlContent = [xml] (Get-Content -Path $xmlFilePath)
# the next line is missing in 99% of all examples I have seen
[System.Xml.XmlElement] $root = $xmlContent.get_DocumentElement()
# notice the corresponding change in the next line
[System.Xml.XmlElement] $categories = $root.Categories
[System.Xml.XmlElement] $category = $null
foreach($category in $categories.ChildNodes)
{
    [string] $title = $category.Title
    [string] $description = $category.Description
    Write-Host ("Title={0},Description={1}" -f $title,$description)
}

The output from the corrected example above is what you would expect:

Title=Fruit,Description=All types of fruit

Title=Vegetables,Description=All types of vegetables
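
For comparison, here are two other ways of reaching the same <Category> elements that should give the same result. This is a hedged sketch rather than a recommendation: it assumes an inline, trimmed copy of the sample XML, and the names $byProperty and $byXPath are invented for illustration. The first option names the root element in the property path; the second uses the .NET SelectNodes method with an XPath expression.

```powershell
# Trimmed inline copy of the sample XML (assumption: sub-categories omitted).
[xml] $xmlContent = @'
<SomeTopLevel>
  <Categories>
    <Category>
      <Title>Fruit</Title>
      <Description>All types of fruit</Description>
    </Category>
    <Category>
      <Title>Vegetables</Title>
      <Description>All types of vegetables</Description>
    </Category>
  </Categories>
</SomeTopLevel>
'@

# Option 1: start the property path at the root element by name.
$byProperty = @()
foreach ($category in $xmlContent.SomeTopLevel.Categories.Category) {
    $byProperty += ("Title={0},Description={1}" -f $category.Title, $category.Description)
}

# Option 2: select the same elements with an absolute XPath expression.
$byXPath = @()
foreach ($category in $xmlContent.SelectNodes('/SomeTopLevel/Categories/Category')) {
    $byXPath += ("Title={0},Description={1}" -f $category.Title, $category.Description)
}

# Print the lines collected by the first option.
$byProperty | ForEach-Object { Write-Host $_ }
```

Both options depend on knowing that the root element is called SomeTopLevel, whereas get_DocumentElement() works whatever the root happens to be named.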

I hope the above “What Mother Never Told You About The Proper Way To Read XML With PowerShell” tip will save you a lot of grief.  My mistake was actually believing the books and online articles contained examples that really worked (until I tried them and found 99% of them to be flat out wrong).

It’s a shame that a lot of blogs, online articles and even some of the famous “Recipes” and “Cookbooks” on PowerShell are full of bad examples.  Do the people who write those articles actually run PowerShell in real-life situations, or are they just theoreticians who don’t get “real code under their fingernails”?

One thought on “Reading XML with PowerShell – Why Most Examples You See Are Wrong”

  1. I got a script working without the “extra line”. It looks something like this:
    [xml]$notecategories = Get-Content $categorysource
    $notecategories.dataroot.NoteCategory | ForEach-Object -Process {
    Write-Host ("Title={0}" -f $_.Name)
    }
    The key for me was using the dataroot property, not trying to get directly to the child nodes. Anyway, I hope this helps.
