#1 (permalink)
Parse XML files from Powershell?

Say I have an XML file like the one below:

    <root>
      <tag id="1"/>
      <tag id="2"/>
      <tag id="3"/>
      <tag id="4"/>
      <tag id="5"/>
    </root>

...and I want to query the document for a subset based on an XPath query, returning only the tag elements whose id attribute is > 2, i.e.:

    <root>
      <tag id="3"/>
      <tag id="4"/>
      <tag id="5"/>
    </root>

In the past, I've had to create a script to instantiate an MSXML COM DOM object, load the file, apply the XPath and then save the results to a new file.

I'm hoping that PowerShell will simply let me apply the XPath to the file on the command line and spit out the results to stdout - that would be a real time saver! I guess it might involve invoking the .NET XML classes somehow, or maybe there are specific PowerShell XML commands?

Can it be done?

Please follow up posts to microsoft.public.windows.powershell

Many thanks,

Duncan.

#2 (permalink)
RE: Parse XML files from Powershell?

    PoSH> $xml = @'
    >> <root>
    >>   <tag id="1"/>
    >>   <tag id="2"/>
    >>   <tag id="3"/>
    >>   <tag id="4"/>
    >>   <tag id="5"/>
    >> </root>
    >> '@
    >>
    PoSH> ( [xml]$xml ).SelectNodes('/root/tag[@id>2]')

    id
    --
    3
    4
    5

Greetings /\/\o\/\/

#3 (permalink)
RE: Parse XML files from Powershell?

Or output first, then filter:

    PoSH> ([xml]$xml).root.tag |? {$_.id -gt 2}

    id
    --
    3
    4
    5

Greetings /\/\o\/\/

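A side note on the filter above, offered as a sketch rather than a correction: attribute values come back from an `[xml]` object as strings, so `$_.id -gt 2` compares text, not numbers. That gives the right answer for single-digit ids, but a value such as 10 would be compared as the text "10" against "2". The sample document below is made up for illustration (it is not from the thread), and the `[int]` cast is just one way to force a numeric comparison. The XPath form `@id>2` converts to numbers by itself, so it does not have this problem.

```powershell
# Hypothetical sample data (not from the thread): includes a two-digit id.
$xml = [xml]'<root><tag id="1"/><tag id="3"/><tag id="10"/></root>'

# String comparison: the left operand is a string, so 2 is converted to "2"
# and "10" is compared as text.
$asText = @($xml.root.tag | Where-Object { $_.id -gt 2 })

# Numeric comparison: cast the attribute value to [int] first.
$asNumber = @($xml.root.tag | Where-Object { [int]$_.id -gt 2 })

"String compare kept:  " + (($asText   | ForEach-Object { $_.id }) -join ', ')
"Numeric compare kept: " + (($asNumber | ForEach-Object { $_.id }) -join ', ')
```

With this sample, the string comparison should keep only id 3, while the numeric comparison keeps 3 and 10.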
#4 (permalink)
Re: Parse XML files from Powershell?

> ( [xml]$xml ).SelectNodes('/root/tag[@id>2]')
>
> id
> --
> 3
> 4
> 5

Thanks, that's a good start in that it returns the correct ids.. but I thought (at least in MSXML) that SelectNodes returned a collection of elements (or nodes) and I was hoping to see the full content so I could direct it to another file, something like...

    $test.SelectNodes('/root/tag[@id>2]') > results.txt

and end up with:

    <root>
      <tag id="3"/>
      <tag id="4"/>
      <tag id="5"/>
    </root>

I somehow need to tell the Powershell that I'm interested in the 'node.xml' contents and not just the value of the id attribute?

Thanks,

Duncan.

#5 (permalink)
Re: Parse XML files from Powershell?

On Mar 20, 2:26 pm, "Duncan Smith" <DSmith1...@googlemail.com> wrote:
> I somehow need to tell the Powershell that I'm interested in the
> 'node.xml' contents and not just the value of the id attribute?

    $o="<root>";foreach ($n in ([xml]$x).selectnodes("/root/tag[@id > 1]")) { $o += $n.get_outerxml() };$o+="</root>"
    $o > out.xml

[xml] objects in PowerShell are a little bit different from other objects in that you must access properties with the getter and setter methods. All properties are generated from the XML source itself.

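As an alternative to gluing strings together, the result can also be built as a real XML document. The following is a minimal sketch, assuming the sample document from the first post and the `@id > 2` filter asked for there; the output file name `out.xml` in the temp folder is a placeholder. It uses the .NET `XmlDocument` methods `CreateElement`, `ImportNode` and `Save`, so it sidesteps the question of getter methods altogether.

```powershell
# Sample document from the first post.
$xml = [xml]@'
<root>
  <tag id="1"/>
  <tag id="2"/>
  <tag id="3"/>
  <tag id="4"/>
  <tag id="5"/>
</root>
'@

# Start a new, empty document with its own <root> element.
$result = New-Object System.Xml.XmlDocument
$root = $result.AppendChild($result.CreateElement('root'))

# ImportNode makes a deep copy ($true) of each match that belongs to $result.
foreach ($node in $xml.SelectNodes('/root/tag[@id > 2]')) {
    [void]$root.AppendChild($result.ImportNode($node, $true))
}

# Placeholder output location: out.xml in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'out.xml'
$result.Save($path)
Get-Content $path
```

Because the output goes through `Save`, the file is guaranteed to be well-formed, which string concatenation cannot promise once element content contains characters that need escaping.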
#6 (permalink)
Re: Parse XML files from Powershell?

> $o="<root>";foreach ($n in ([xml]$x).selectnodes("/root/tag[@id > 1]")) { $o += $n.get_outerxml() };$o+="</root>"
> $o > out.xml

That's tantalizingly useful, and PowerShell works well for small noddy XML files, but when I scale up to real-world data (an 11.5 MB XML file) I get the following results:

    [xml]$race=gc race.xml

takes 20 s. Not too great, but I can live with that...

    foreach($n in ([xml]$race).selectnodes("//*")) { $r += $n.get_outerxml() }

This has been consuming 50% cpu for over ten minutes now and there's still no end in sight..

Would I be better off using Xerces from the command prompt, or going back to driving MSXML from a VBS script?

Thanks,

Duncan.

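Two things in the commands above may contribute to the slowness, independent of the XPath query; neither was measured here. `gc` (Get-Content) hands the file over one line at a time before the `[xml]` cast reassembles it, and `$r +=` builds a new, longer string on every pass through the loop. The sketch below shows one way to avoid both: let `XmlDocument.Load` read the file directly and send each node's markup down the pipeline. The small file it writes to the temp folder is only a stand-in for race.xml, and its contents are made up.

```powershell
# Stand-in for race.xml (hypothetical contents): a small sample file in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'race-sample.xml'
'<root><tag id="1"/><tag id="2"/><tag id="3"/></root>' | Set-Content -Path $path

# Let the XML parser read the file itself instead of going through Get-Content.
$race = New-Object System.Xml.XmlDocument
$race.Load($path)

# Emit each node's markup as its own pipeline item instead of growing one string with +=.
$lines = @($race.SelectNodes('/root/tag[@id > 1]') | ForEach-Object { $_.get_OuterXml() })
$lines
```

The pipeline output can be redirected to a file with `> out.txt` in the same way as the commands in the thread.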
#7 (permalink)
Re: Parse XML files from Powershell?

> 'foreach($n in ([xml]$race).selectnodes("//*")) { $r +=
> $n.get_outerxml() }'
>
> This has been consuming 50% cpu for over ten minutes now and there's
> still no end in sight..

I'm not surprised, considering that "//*" returns all nodes on all levels along with all their children =)

It's usually useful to see what output your command is producing, just to make sure that it's doing what you think it's doing.

Try this:

    ([xml]$race).selectnodes("//*") | %{ $_.get_outerxml() } > out.txt

Cancel this after some time and look at out.txt, you'll see what I mean.

//Andrew

#8 (permalink)
Re: Parse XML files from Powershell?

> Try this:
>
> ([xml]$race).selectnodes("//*") | %{ $_.get_outerxml() } > out.txt
>
> Cancel this after some time and look at out.txt, you'll see what I mean.

Thanks, in this case (just for a test) I was intending to get the whole file - hence '//*' but it was still running after half an hour. Obviously there are better ways to replicate the file (such as 'copy fn1 fn2'), but I was still surprised at just how long it was taking - maybe there is a better choice of XPath query than //*?

Anyway, if I narrow down the filter a little more to something like:

    $race.selectnodes("/root/Elem1[Elem2/@Value='n']") | % { $_.get_outerxml() } > out.txt

Then it is very quick (and useful).

Thanks,

Duncan.

ps If I start a command that will take a long time to display, i.e. 'type verybigfile.txt' then Ctrl-C or Esc does not cancel the command and return to the prompt - only Ctrl-Break which terminates the whole PowerShell session - effective, but a little brutal...?

#9 (permalink)
Re: Parse XML files from Powershell?

Duncan,

'//*' does not replicate the file. If you look at out.txt, you'll see that its size is enormous, much bigger than the original file's size.

Let's look at a smaller example. If your input is

    <root>
      <tag id="1"> <subtag sub="1" /> </tag>
      <tag id="2"> <subtag sub="2" /> </tag>
      <tag id="3"> <subtag sub="3" /> </tag>
      <tag id="4"> <subtag sub="4" /> </tag>
      <tag id="5"> <subtag sub="5" /> </tag>
    </root>

'//*' will return eleven different nodes, and each node returned will include all of its children. The first node returned will be the root node with all its subchildren. This alone already gives you the length of your original file. Then <tag id="1"> is returned along with all its subchildren. Then <subtag sub="1" />. And so on, until all eleven distinct nodes in this file, each with its respective subchildren, are returned.

As you can see, this produces output much larger than the original file: every element is written out once for itself and once more inside each of its ancestors. So if you start out with an 11 MB file containing XML that is several levels deep and wide, you can expect *very* large output, and it's no wonder that it takes lots and lots of time to produce it.

What you wanted is probably '/*'.

Andrew.

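The difference between the two queries can be checked on the small sample from the post above. This is an illustrative sketch only: it counts how many nodes each XPath expression selects and how many characters of markup are produced when every selected node is written out in full. The variable names are made up for the example.

```powershell
# Sample document from the post above.
$xml = [xml]@'
<root>
  <tag id="1"> <subtag sub="1" /> </tag>
  <tag id="2"> <subtag sub="2" /> </tag>
  <tag id="3"> <subtag sub="3" /> </tag>
  <tag id="4"> <subtag sub="4" /> </tag>
  <tag id="5"> <subtag sub="5" /> </tag>
</root>
'@

$all = $xml.SelectNodes('//*')   # every element at every level
$top = $xml.SelectNodes('/*')    # only the document element

# Total characters written when each selected node is serialized with its children.
$allChars = 0
foreach ($n in $all) { $allChars += $n.get_OuterXml().Length }
$topChars = 0
foreach ($n in $top) { $topChars += $n.get_OuterXml().Length }

"'//*' selects $($all.Count) nodes and writes $allChars characters"
"'/*'  selects $($top.Count) node and writes $topChars characters"
```

On this three-level sample, '//*' selects eleven nodes against one for '/*', and the gap in characters written grows with the nesting depth of the real file.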