
Sorting, grouping data


Marco Shaw

#1
I've got a web page I want to download. It contains roughly 100 records. I
want to sort and group the data based on certain fields. I also want to
calculate medians based on some of the records grouped together.

Any hints/tips to get me going? Do I want to pull this into Excel to
manipulate?

Marco
 



Jeffery Hicks

#2
On Fri, 3 Nov 2006 15:58:50 -0400, Marco Shaw wrote:

> I've got a web page I want to download. It contains roughly 100 records. I
> want to sort and group the data based on certain fields. I also want to
> calculate medians based on some of the records grouped together.
>
> Any hints/tips to get me going? Do I want to pull this into Excel to
> manipulate?
>
> Marco


Use an ADO Recordset.
--
Jeffery Hicks - www.ScriptingAnswers.com
SAPIEN Technologies - Scripting, Simplified. www.SAPIEN.com
Scripting books: www.SAPIENPress.com
 



dreeschkind

#3
"Marco Shaw" wrote:

> I've got a web page I want to download.


Well, first you need to get the web page. If it is just a plain HTML file that
doesn't involve any interaction such as filling out forms, then it should be
sufficient to use:

$url = "http://www.example.com/data.html"
$data = (new-object System.Net.WebClient).DownloadString($url)


> It contains roughly 100 records.


Then you need to crack that string into the field/value pairs that you are
interested in.
If the web page is written in proper XHTML, casting the $data string to
[XML] will probably make this part easier. Otherwise you will probably need
to use regular expressions and/or string split methods to extract the
relevant information.
Depending on the format of the records it might make sense to create a
"synthetic class" for these records and store them all in an array. (See
thread "PowerShell Class") This will make sorting and grouping easier.
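To make the [XML] route concrete, here is a minimal sketch. It assumes the page is well-formed XHTML holding one table whose first row is headers and whose other rows have three cells; the sample markup and the property names Name, City and Score are hypothetical. Because XHTML declares a default namespace, the XPath matches on local-name() rather than a bare //tr. Each row becomes an object with named properties, which plays the role of the "synthetic class" idea.

```powershell
# Hypothetical sample page; in practice this would be the $data string
# returned by WebClient.DownloadString().
$data = @'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table>
      <tr><th>Name</th><th>City</th><th>Score</th></tr>
      <tr><td>alpha</td><td>Moncton</td><td>12</td></tr>
      <tr><td>beta</td><td>Halifax</td><td>7</td></tr>
      <tr><td>gamma</td><td>Moncton</td><td>9</td></tr>
    </table>
  </body>
</html>
'@

# Casting to [xml] only works if the markup is well-formed (XHTML).
$doc = [xml]$data

# local-name() sidesteps the XHTML default namespace, which would
# otherwise make a plain '//tr' match nothing.
$rows = $doc.SelectNodes("//*[local-name()='tr']")

# Turn each data row into an object with named properties so that
# Sort-Object and Group-Object can work on them later.
$records = foreach ($row in $rows) {
    # @() turns the XmlNodeList into a plain array for easy indexing.
    $cells = @($row.SelectNodes("*[local-name()='td']"))
    if ($cells.Count -ne 3) { continue }   # header row uses th, not td
    [pscustomobject]@{
        Name  = $cells[0].InnerText
        City  = $cells[1].InnerText
        Score = [int]$cells[2].InnerText   # hypothetical numeric column
    }
}

$records
```

If the page is not valid XHTML, the same kind of objects can be built from regex matches instead.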

> I want to sort and group the data based on certain fields.


See the help for the Sort-Object and Group-Object cmdlets.
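A short sketch of both cmdlets, run against a few hypothetical in-memory records (the Name, City and Score properties are made up and would be replaced by the fields parsed from the real page):

```powershell
# Hypothetical records standing in for the data parsed from the page.
$records = @(
    [pscustomobject]@{ Name = 'alpha'; City = 'Moncton'; Score = 12 }
    [pscustomobject]@{ Name = 'beta';  City = 'Halifax'; Score = 7 }
    [pscustomobject]@{ Name = 'gamma'; City = 'Moncton'; Score = 9 }
    [pscustomobject]@{ Name = 'delta'; City = 'Halifax'; Score = 15 }
)

# Sort on several fields: City ascending, then Score descending.
$sorted = $records |
    Sort-Object City, @{ Expression = 'Score'; Descending = $true }

# Group on one field; each group has Name, Count and Group (the members).
$groups = $records | Group-Object City

foreach ($g in $groups) {
    '{0}: {1} record(s)' -f $g.Name, $g.Count
}
```

The Group property of each group holds the original objects, so further per-group calculations can be piped from there.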

> I also want to calculate medians based on some of the records grouped together.


Pipe the grouped objects to the ForEach-Object cmdlet and use its
begin/process/end scriptblocks, or write your own function that calculates
medians from $input/$_.
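A minimal sketch of the "write your own function" option, assuming the values are numeric: the hypothetical Get-Median function collects piped values in its process block and computes the median in its end block, and each group's Score values are piped into it. The records and property names are made up for illustration.

```powershell
# Hypothetical helper: gathers piped numbers in the process block and
# computes the median in the end block.
function Get-Median {
    begin   { $values = New-Object System.Collections.Generic.List[double] }
    process { $values.Add([double]$_) }
    end {
        if ($values.Count -eq 0) { return }
        $sorted = @($values | Sort-Object)
        $mid = [int][math]::Floor($sorted.Count / 2)
        if ($sorted.Count % 2 -eq 1) {
            $sorted[$mid]                              # odd count: middle value
        } else {
            ($sorted[$mid - 1] + $sorted[$mid]) / 2    # even count: mean of the two middle values
        }
    }
}

# Hypothetical records standing in for the parsed page data.
$records = @(
    [pscustomobject]@{ Name = 'alpha';   City = 'Moncton'; Score = 12 }
    [pscustomobject]@{ Name = 'beta';    City = 'Halifax'; Score = 7 }
    [pscustomobject]@{ Name = 'gamma';   City = 'Moncton'; Score = 9 }
    [pscustomobject]@{ Name = 'delta';   City = 'Halifax'; Score = 15 }
    [pscustomobject]@{ Name = 'epsilon'; City = 'Moncton'; Score = 20 }
)

# One median per group.
$medians = $records | Group-Object City | ForEach-Object {
    $group  = $_    # keep the group; the inner ForEach-Object rebinds $_
    $scores = $group.Group | ForEach-Object { $_.Score }
    [pscustomobject]@{ City = $group.Name; Median = ($scores | Get-Median) }
}

$medians
```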

> Any hints/tips to get me going? Do I want to pull this into Excel to
> manipulate?


See the help for the Export-Csv cmdlet; Excel can open the CSV file it writes.
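For the Excel route, a minimal sketch that writes hypothetical records to a CSV file and reads it back. The temp-folder location and the file name records.csv are assumptions; any path would do.

```powershell
# Hypothetical records standing in for the sorted/grouped data.
$records = @(
    [pscustomobject]@{ Name = 'alpha'; City = 'Moncton'; Score = 12 }
    [pscustomobject]@{ Name = 'beta';  City = 'Halifax'; Score = 7 }
    [pscustomobject]@{ Name = 'gamma'; City = 'Moncton'; Score = 9 }
)

# Hypothetical output location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'records.csv'

# -NoTypeInformation omits the "#TYPE ..." first line that older
# PowerShell versions write by default.
$records | Sort-Object City, Name | Export-Csv -Path $path -NoTypeInformation

# Read it back to check the round trip (all values come back as strings).
$back = Import-Csv -Path $path
$back
```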

--
greetings
dreeschkind
 



Marco Shaw

#4
> I've got a web page I want to download. It contains roughly 100 records.
> I want to sort and group the data based on certain fields. I also want to
> calculate medians based on some of the records grouped together.


Thanks Jeffery and dreeschkind for your suggestions.

I've retrieved the page and was able to cut out a lot of unneeded content by
finding an opening and closing tag pair that occurs only once, around the data I want.

Now, some remaining things I need to resolve:
1. I should probably remove some of the <*> HTML tags and everything in
them. That should be easy enough, and I'll go searching for a .NET regular
expression for stripping or replacing them.
2. My data isn't very orderly... It looks something like:

Header1 Header2
Header3 Datarow1col1 Datarow1col2
Datarow1col3...

Using something like this, I've been able to split all the entries:
[regex]::Matches((gc file),"<xx>(.*?)</xx>")

The problem is that I need to format it like this:
Datarow1col1 Datarow1col2 Datarow1col3
Datarow2col1 Datarow2col2 Datarow2col3
etc.

I also need to omit the column headers, but I have some ideas I haven't
tried yet.

How can I take all of this poorly formatted data and turn it into a tidy
3-column by N-row table, so I can continue massaging the data?

I've tried to explain as best I could...

Marco
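One possible approach, sketched under assumptions: every cell, headers included, is wrapped in the same placeholder <xx> tag from the question, the first three matches are the headers, and the sample values are made up. The script strips any tags nested inside a cell with a simple <[^>]+> regex (fine for simple markup, not a general HTML parser), then walks the remaining matches three at a time and builds one object per row. Using the headers as property names keeps them out of the data rows.

```powershell
# Hypothetical flat markup: headers followed by data cells, all in <xx> tags.
$html = '<xx>Header1</xx><xx>Header2</xx><xx>Header3</xx>' +
        '<xx><b>r1c1</b></xx><xx>r1c2</xx><xx>r1c3</xx>' +
        '<xx>r2c1</xx><xx><i>r2c2</i></xx><xx>r2c3</xx>'

$columns = 3

# Pull out each cell's inner text and strip any tags nested inside it.
$cells = @([regex]::Matches($html, '<xx>(.*?)</xx>') |
    ForEach-Object { $_.Groups[1].Value -replace '<[^>]+>', '' })

# The first $columns cells are the headers; everything after is data.
$headers = $cells[0..($columns - 1)]
$data    = $cells[$columns..($cells.Count - 1)]

# Walk the data in steps of $columns, building one object per row.
$table = for ($i = 0; $i -lt $data.Count; $i += $columns) {
    $row = [ordered]@{}
    for ($j = 0; $j -lt $columns; $j++) {
        $row[$headers[$j]] = $data[$i + $j]
    }
    [pscustomobject]$row
}

$table | Format-Table
```

The resulting objects can go straight into Sort-Object, Group-Object or Export-Csv.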
 



Lee Holmes [MSFT]

#5
See if this gives you any pointers, Marco:

http://www.leeholmes.com/blog/PowerShellTheOracleInstantAnswersFromYourPrompt.aspx

--
Lee Holmes [MSFT]
Windows PowerShell Development
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.
 

