> I've got a web page I want to download. It contains roughly 100 records.
> I want to sort and group the data based on certain fields. I also want to
> calculate medians based on some of the records grouped together.
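For the grouping and median part, here is a minimal sketch. It assumes PowerShell 3.0 or later and that each record has already been turned into an object. The property names `Category` and `Value` and the helper `Get-Median` are hypothetical, used only for illustration. `Group-Object` does the grouping, and the median is the middle value of the sorted list, or the mean of the two middle values when the count is even.

```powershell
# Hypothetical records; Category and Value are illustrative property names.
$records = @(
    [pscustomobject]@{ Category = 'A'; Value = 10 }
    [pscustomobject]@{ Category = 'A'; Value = 30 }
    [pscustomobject]@{ Category = 'A'; Value = 20 }
    [pscustomobject]@{ Category = 'B'; Value = 5 }
    [pscustomobject]@{ Category = 'B'; Value = 7 }
)

# Hypothetical helper: middle value of the sorted list,
# or the mean of the two middle values for an even count.
function Get-Median([double[]]$Numbers) {
    $sorted = $Numbers | Sort-Object
    $n = $sorted.Count
    if ($n % 2) { return $sorted[($n - 1) / 2] }
    return ($sorted[$n / 2 - 1] + $sorted[$n / 2]) / 2
}

# One summary object per group, sorted by group name.
$summary = $records | Group-Object Category | Sort-Object Name | ForEach-Object {
    [pscustomobject]@{
        Category = $_.Name
        Count    = $_.Count
        Median   = (Get-Median $_.Group.Value)
    }
}

$summary | Format-Table -AutoSize
```

With this sample, group A has median 20 and group B has median 6.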
Thanks Jeffery and dreeschkind for your suggestions.
I've retrieved the page and cut out most of the surrounding markup by
finding an opening and closing tag pair that occurs only once, around the
data I want.
Now there are a few remaining things I need to resolve:
1. I should probably remove the remaining <*> HTML tags, along with everything
inside the angle brackets. That should be easy enough; I'll look for a .NET
regular expression to strip or replace them.
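A minimal sketch of the tag stripping, assuming the tags are simple (no `>` inside attribute values, no comments or script blocks, which a regex cannot handle reliably). The sample HTML string is hypothetical. PowerShell's `-replace` operator uses .NET regular expressions, so the same pattern works with `[regex]::Replace`. Each tag becomes a space so neighbouring cells don't run together, then runs of whitespace are collapsed.

```powershell
# Hypothetical sample; in practice this is the trimmed page text.
$html = '<tr><td class="a">Datarow1col1</td><td>Datarow1col2</td></tr>'

# '<[^>]+>' matches '<', one or more characters other than '>', then '>'.
# Replace each tag with a space, then collapse whitespace and trim the ends.
$text = ($html -replace '<[^>]+>', ' ') -replace '\s+', ' '
$text = $text.Trim()

$text
```

Entities such as `&amp;` are left as-is; `[System.Net.WebUtility]::HtmlDecode($text)` can decode them if needed.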
2. My data isn't very orderly. It looks something like this:
Header1 Header2
Header3 Datarow1col1 Datarow1col2
Datarow1col3...
Using something like this, I've been able to split all the entries:
[regex]::Matches((gc file),"<xx>(.*?)</xx>")
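One detail that may help here: each `Match` object's `.Value` still includes the `<xx>` tags, while `.Groups[1].Value` holds only the text captured by `(.*?)`. The sketch below assumes PowerShell 3.0 or later, where `Get-Content file -Raw` reads the file as a single string; a hypothetical here-string stands in for that file, and `<xx>` remains the placeholder tag from this post. `(?s)` lets `.` cross line breaks in case a cell's text wraps.

```powershell
# Hypothetical stand-in for (Get-Content file -Raw).
$content = @'
<xx>Header1</xx> <xx>Header2</xx>
<xx>Header3</xx> <xx>Datarow1col1</xx> <xx>Datarow1col2</xx>
<xx>Datarow1col3</xx>
'@

# Keep only the captured text of each match, trimmed, in document order.
$cells = [regex]::Matches($content, '(?s)<xx>(.*?)</xx>') |
    ForEach-Object { $_.Groups[1].Value.Trim() }

$cells
```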
The problem is that I need it formatted like this:
Datarow1col1 Datarow1col2 Datarow1col3
Datarow2col1 Datarow2col2 Datarow2col3
etc.
I also need to omit the column headers, but I have some ideas for that I
haven't tried yet.
How can I take all of this poorly formatted data and turn it into a clean
3-column by N-row table, so I can continue working with it?
I've tried to explain this as best I can.
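One possible approach, sketched under assumptions: the matches have already been reduced to a flat list of cell values (the captured text, one value per cell) in document order, the first three values are the headers, and every row has exactly three cells. The property names `Col1` to `Col3` are hypothetical. `Select-Object -Skip` drops the headers, and a loop stepping by three builds one object per row. Requires PowerShell 3.0 or later for `[pscustomobject]`.

```powershell
# Hypothetical flat list of cell values: three headers, then data in row order.
$cells = 'Header1', 'Header2', 'Header3',
         'Datarow1col1', 'Datarow1col2', 'Datarow1col3',
         'Datarow2col1', 'Datarow2col2', 'Datarow2col3'

$columns = 3
$data = @($cells | Select-Object -Skip $columns)   # drop the header cells

# Step through the list $columns cells at a time; each step becomes one row.
$rows = for ($i = 0; $i -lt $data.Count; $i += $columns) {
    [pscustomobject]@{
        Col1 = $data[$i]
        Col2 = $data[$i + 1]
        Col3 = $data[$i + 2]
    }
}

$rows | Format-Table -AutoSize
```

Because each row is an object, the result can be piped straight into `Sort-Object` or `Group-Object` for the sorting and grouping steps.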
Marco