Vista - Parse XML files from Powershell?

Old 03-20-2007   #1 (permalink)
Duncan Smith


 
 

Parse XML files from Powershell?

Say I have an xml file like the one below:

<root>
<tag id="1"/>
<tag id="2"/>
<tag id="3"/>
<tag id="4"/>
<tag id="5"/>
</root>

...and I want to query the document for a subset based on an XPath
query to only return tag elements whose id attribute is > 2, i.e.:

<root>
<tag id="3"/>
<tag id="4"/>
<tag id="5"/>
</root>

In the past, I've had to create a script to instantiate an MSXML COM
DOM object, load the file, apply the x-path and then save the results
to a new file.

I'm hoping that the Powershell will simply let me apply the XPath to
the file on the command line and spit out the results to stdout - that
would be a real time saver! I guess it might involve invoking
the .NET XML classes somehow, or maybe there are specific powershell
xml commands?

Can it be done?

Please follow up posts to microsoft.public.windows.powershell

Many thanks,

Duncan.


Old 03-20-2007   #2 (permalink)
/\/\o\/\/ [MVP]


 
 

RE: Parse XML files from Powershell?

PoSH> $xml = @'
>> <root>
>> <tag id="1"/>
>> <tag id="2"/>
>> <tag id="3"/>
>> <tag id="4"/>
>> <tag id="5"/>
>> </root>
>> '@
>>


( [xml]$xml ).SelectNodes('/root/tag[@id>2]')

id
--
3
4
5

Greetings /\/\o\/\/

Old 03-20-2007   #3 (permalink)
/\/\o\/\/ [MVP]


 
 

RE: Parse XML files from Powershell?

Or output first, then filter (casting id to [int] so the comparison is numeric):

PoSH> ([xml]$xml).root.tag |? {[int]$_.id -gt 2}

id
--
3
4
5
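
One caveat with filtering on the attribute value, as a hedged sketch: the [xml] adapter hands back attribute values as strings, and PowerShell converts the right-hand operand of -gt to the type of the left-hand one, so an uncast comparison is a string comparison. The sample below is made up for illustration (the id="10" element is an assumption, not something from the thread).

```powershell
# Hypothetical sample: same shape as the thread's file, plus a two-digit id.
$sample = [xml]'<root><tag id="1"/><tag id="2"/><tag id="3"/><tag id="10"/></root>'

# Uncast: "10" is compared with "2" as text, so id 10 is dropped.
$asText = @($sample.root.tag | Where-Object { $_.id -gt 2 })

# Cast to [int] first so the comparison is numeric.
$asNumber = @($sample.root.tag | Where-Object { [int]$_.id -gt 2 })

"uncast: " + (($asText | ForEach-Object { $_.id }) -join ',')
"cast:   " + (($asNumber | ForEach-Object { $_.id }) -join ',')
```

With single-digit ids, as in the original question, both forms give the same answer; the difference only shows once an id reaches two digits.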

Greetings /\/\o\/\/

Old 03-20-2007   #4 (permalink)
Duncan Smith


 
 

Re: Parse XML files from Powershell?


>
> ( [xml]$xml ).SelectNodes('/root/tag[@id>2]')
>
> id
> --
> 3
> 4
> 5
>


Thanks, that's a good start in that it returns the correct ids.. but I
thought (at least in MSXML) that SelectNodes returned a collection of
elements (or nodes) and I was hoping to see the full content so I
could direct it to another file, something like...

$test.SelectNodes('/root/tag[@id>2]') > results.txt

and end up with:

<root>
<tag id="3"/>
<tag id="4"/>
<tag id="5"/>
</root>

I somehow need to tell the Powershell that I'm interested in the
'node.xml' contents and not just the value of the id attribute?

Thanks,

Duncan.

Old 03-20-2007   #5 (permalink)
Oisin Grehan


 
 

Re: Parse XML files from Powershell?



$o="<root>";foreach ($n in ([xml]$x).selectnodes("/root/tag[@id > 2]")) { $o += $n.get_outerxml() };$o+="</root>"

$o > out.xml


[xml] objects in PowerShell are a little bit different from other
objects: the properties you see are generated from the XML source
itself (child elements and attributes), so to reach the underlying
.NET properties such as OuterXml you call the getter and setter
methods, e.g. get_OuterXml().
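
For readers on later releases, a hedged sketch: to the best of my knowledge, from PowerShell 2.0 onward the underlying .NET properties such as OuterXml can be read directly on [xml] nodes, so the get_ methods are needed mainly on 1.0. Treat that version boundary as an assumption to check on your own install. The document is the sample from the first post; the variable names are made up.

```powershell
# Sample document from the first post in this thread.
$doc = [xml]@'
<root>
<tag id="1"/>
<tag id="2"/>
<tag id="3"/>
<tag id="4"/>
<tag id="5"/>
</root>
'@

# Select the matching elements with XPath.
$nodes = $doc.SelectNodes('/root/tag[@id > 2]')

# Assumption: PowerShell 2.0 or later, where OuterXml is readable as a property.
# On 1.0, use $_.get_OuterXml() instead.
$lines = @('<root>') + @($nodes | ForEach-Object { $_.OuterXml }) + @('</root>')

# Write to the pipeline; append "> out.xml" to save it to a file.
$lines
```

Building an array of lines and emitting it once keeps the wrapper element and the selected nodes on separate lines in the output file.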

Old 03-21-2007   #6 (permalink)
Duncan Smith


 
 

Re: Parse XML files from Powershell?




That's tantalizingly useful, and the Powershell works well for small
noddy XML files, but when I scale up to real-world data (an 11.5MB XML
file) I get the following results:

'[xml]$race=gc race.xml' takes 20s. Not too great, but I can live
with that...

'foreach($n in ([xml]$race).selectnodes("//*")) { $r += $n.get_outerxml() }'

This has been consuming 50% CPU for over ten minutes now and there's
still no end in sight. Would I be better off using Xerces from the
command prompt, or going back to driving MSXML from a VBS script?

Thanks,

Duncan.
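
A hedged sketch of one way to trim both timings, without claiming it accounts for all of the slowdown: 'gc' hands the cast one string per line, whereas XmlDocument.Load reads the file directly, and appending to a string with += copies the growing string on every pass, which a StringBuilder avoids. The file name, its contents, and the XPath below are placeholders for illustration, not the real race.xml.

```powershell
# Hypothetical stand-in for the large file: a small generated document.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'race-sample.xml'
$tags = 1..1000 | ForEach-Object { "<tag id=`"$_`"/>" }
Set-Content -Path $path -Value (@('<root>') + $tags + @('</root>'))

# Load straight from disk instead of casting Get-Content output.
# Pass a full path: .NET resolves relative paths against the process
# directory, not PowerShell's current location.
$race = New-Object System.Xml.XmlDocument
$race.Load($path)

# Collect the output in a StringBuilder rather than with +=.
$sb = New-Object System.Text.StringBuilder
foreach ($n in $race.SelectNodes('/root/tag[@id > 998]')) {
    [void]$sb.AppendLine($n.get_OuterXml())
}
$r = $sb.ToString()
$r
```

Whether this helps enough on an 11.5MB file is something to measure rather than assume; Measure-Command around each step would show where the time goes.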

Old 03-21-2007   #7 (permalink)
Andrew Savinykh


 
 

Re: Parse XML files from Powershell?

> 'foreach($n in ([xml]$race).selectnodes("//*")) { $r += $n.get_outerxml() }'
>
> This has been consuming 50% cpu for over ten minutes now and there's
> still no end in sight..


I'm not surprised, considering that "//*" returns all nodes on all
levels along with all their children =)

It is usually useful to see what output your command is producing, just
to make sure that it's doing what you think it's doing.

Try this:

([xml]$race).selectnodes("//*") | %{ $_.get_outerxml() } > out.txt

Cancel this after some time and look at out.txt, you'll see what I mean.

//Andrew
Old 03-21-2007   #8 (permalink)
Duncan Smith


 
 

Re: Parse XML files from Powershell?


>
> Try this:
>
> ([xml]$race).selectnodes("//*") | %{ $_.get_outerxml() } > out.txt
>
> Cancel this after some time and look at out.txt, you'll see what I mean.
>


Thanks, in this case (just for a test) I was intending to get the
whole file - hence '//*' but it was still running after half an hour.
Obviously there are better ways to replicate the file (such as 'copy
fn1 fn2'), but was still surprised at just how long it was taking -
maybe there is a better choice of XPath query than //*?

Anyway, if I narrow down the filter a little more to something like:

$race.selectnodes("/root/Elem1[Elem2/@Value='n']") | %{ $_.get_outerxml() } > out.txt

Then it is very quick (and useful).

Thanks,

Duncan.

ps If I start a command that will take a long time to display, e.g.
'type verybigfile.txt', then Ctrl-C or Esc does not cancel the command
and return to the prompt; only Ctrl-Break does, which terminates the
whole PowerShell session - effective, but a little brutal...?


Old 03-21-2007   #9 (permalink)
Andrew Savinykh


 
 

Re: Parse XML files from Powershell?

Duncan,

'//*' does not replicate the file. If you look at out.txt, you'll see
that its size is enormous, much bigger than the original file's size.
Let's look at a smaller example. If your input is

<root>
<tag id="1">
<subtag sub="1" />
</tag>
<tag id="2">
<subtag sub="2" />
</tag>
<tag id="3">
<subtag sub="3" />
</tag>
<tag id="4">
<subtag sub="4" />
</tag>
<tag id="5">
<subtag sub="5" />
</tag>
</root>

'//*' will return eleven different nodes, and each node returned will
include all of its children. The first node returned will be the root
node with all its descendants, which already gives you the length of
your original file. Then <tag id="1"> is returned along with its
children, then <subtag sub="1" />, and so on, until all eleven distinct
nodes in this file have been returned, each with its descendants. As you
can see, the output is much larger than the original file, because every
element is written out once for itself and once more for each of its
ancestors. So if you start out with an 11MB file containing XML that is
several levels deep and wide, you can expect *very* large output, and it
is no wonder that it takes lots and lots of time to produce.

What you wanted is probably '/*'.

Andrew.
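
The node counts described in this post can be checked with a short sketch. It uses the small example document from the post; the variable names are made up, and get_OuterXml() is used so that it does not depend on which PowerShell version is installed.

```powershell
# The small example document from this post.
$doc = [xml]@'
<root>
<tag id="1">
<subtag sub="1" />
</tag>
<tag id="2">
<subtag sub="2" />
</tag>
<tag id="3">
<subtag sub="3" />
</tag>
<tag id="4">
<subtag sub="4" />
</tag>
<tag id="5">
<subtag sub="5" />
</tag>
</root>
'@

$all = $doc.SelectNodes('//*')   # every element at every depth
$top = $doc.SelectNodes('/*')    # only the document element

# Total characters written if every selected node is dumped with its children.
$sizes = $all | ForEach-Object { $_.get_OuterXml().Length }
$total = ($sizes | Measure-Object -Sum).Sum

"//* nodes: $($all.Count)"
"/*  nodes: $($top.Count)"
"characters written by //*: $total (whole document: $($doc.get_OuterXml().Length))"
```

On this input '//*' selects the root, five tag elements, and five subtag elements, while '/*' selects the root alone, so dumping '/*' reproduces the document once instead of repeating every nested element.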
