Vista - Searching large text files

06-17-2007   #11
Kiron

Re: Searching large text files

So whenever -readCount is set to a value -gt 1 (because the file's content
is very large), and these collections are sent through the pipeline to
Where-Object, one could:
a) split each collection passed through and filter them again to get each
item that matches the first filter criteria;
b) set -readCount to 1, or
c) avoid Where-Object and filter the collections in a Foreach-Object loop.
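
Options (a) and (c) can be sketched as follows. This is only an illustration: the temp file, its contents, and the pattern 'error' are assumptions, not taken from the thread.

```powershell
# Hypothetical sample file and pattern, for illustration only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.txt'
'error one', 'ok', 'error two', 'ok', 'fine' | Set-Content $path

# With -ReadCount 2, Where-Object sees each batch (an array of lines) as $_.
# -match on an array returns the matching elements, so a batch passes whenever
# at least one of its lines matches, and the whole batch is sent on.
$batches = @(Get-Content $path -ReadCount 2 | Where-Object { $_ -match 'error' })

# (a) unroll the batches that passed, then filter the individual lines again
$a = @($batches | ForEach-Object { $_ } | Where-Object { $_ -match 'error' })

# (c) skip Where-Object and filter each batch inside ForEach-Object;
# @($_) keeps -match in array mode even if a batch holds a single line
$c = @(Get-Content $path -ReadCount 2 | ForEach-Object { @($_) -match 'error' })

$a
$c
```

Both $a and $c should end up holding only the matching lines, while $batches still contains the non-matching lines that shared a batch with a match.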

--
Kiron


06-17-2007   #12
Keith Hill [MVP]

Re: Searching large text files

"Kiron" <Kiron@discussions.microsoft.com> wrote in message news:B400D5CC-303C-4DA3-86AA-FFCD62A36FE5@microsoft.com...
> So whenever -readCount is set to a value -gt 1 (because the file's content
> is very large), and these collections are sent through the pipeline to
> Where-Object, one could:
> a) split each collection passed through and filter them again to get each
> item that matches the first filter criteria;
> b) set -readCount to 1, or
> c) avoid Where-Object and filter the collections in a Foreach-Object loop.


You've got it. BTW, A and C look good to me, but I would avoid B on large files: a readCount of 1 just kills performance. On a 75 MB text file on my machine, reading the file line by line and counting the lines takes over 3 minutes:

3> measure-command { gc large.txt -read 1 | measure }


Days : 0
Hours : 0
Minutes : 3
Seconds : 22
Milliseconds : 574
Ticks : 2025743467
TotalDays : 0.00234461049421296
TotalHours : 0.0562706518611111
TotalMinutes : 3.37623911166667
TotalSeconds : 202.5743467
TotalMilliseconds : 202574.3467

Getting effectively the same count with a -readCount of 1000 takes less than half that time:

5> measure-command { gc large.txt -read 1000 | %{$_} | measure }


Days : 0
Hours : 0
Minutes : 1
Seconds : 28
Milliseconds : 501
Ticks : 885018447
TotalDays : 0.00102432690625
TotalHours : 0.02458384575
TotalMinutes : 1.475030745
TotalSeconds : 88.5018447
TotalMilliseconds : 88501.8447
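
The %{$_} stage in the second command is what keeps the two counts comparable: with a readCount greater than 1, each pipeline object is an array of lines, so measure on its own would count batches rather than lines. A small sketch, assuming a hypothetical 2500-line temp file (not the 75 MB file above):

```powershell
# Hypothetical demo file with 2500 lines.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
1..2500 | ForEach-Object { "line $_" } | Set-Content $path

# Without unrolling, Measure-Object counts the batches (1000 + 1000 + 500 lines).
$batchCount = (Get-Content $path -ReadCount 1000 | Measure-Object).Count

# ForEach-Object { $_ } emits each array's elements one by one, so lines are counted.
$lineCount = (Get-Content $path -ReadCount 1000 | ForEach-Object { $_ } | Measure-Object).Count

"$batchCount batches, $lineCount lines"
```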

FYI, I decided to benchmark various readCount values, and it seems that for my 75 MB text file a readCount of 1000 was optimal:

ReadCount  ElapsedTime
---------  ----------------
1          00:03:07.4161690
10         00:01:38.5779661
100        00:01:17.9219998
1000       00:01:14.9202370
10000      00:01:22.1434037
100000     00:01:17.8457756
1000000    00:01:17.9850525
10000000   00:01:19.0217524

Here's the script I used to test this:

23> $ht = @{};for ($i = 1; $i -le 10MB; $i *= 10) {
>> write-progress "Measuring gc -readCount $i" "% Complete" `
>> -perc ([math]::log10($i)*100/[math]::log10(10MB))
>> $ts = measure-command { gc large.txt -read $i | %{$_} | measure }
>> $ht[$i] = $ts
>> }
>>


24> $ht.Keys | sort | select @{n='ReadCount';e={$_}}, @{n='ElapsedTime';e={$ht[$_].ToString()}} | ft -a
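
For reuse, the same measurement could be wrapped in a function. This is only a sketch: Measure-ReadCount and its parameter names are hypothetical, not something from the thread or from PowerShell itself, and timings will vary by machine and file.

```powershell
# Hypothetical helper: times Get-Content for each -ReadCount value given.
function Measure-ReadCount {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [int[]]$ReadCount = @(1, 10, 100, 1000, 10000)
    )
    foreach ($rc in $ReadCount) {
        # Time reading the whole file in batches of $rc lines, unrolled and counted.
        $elapsed = Measure-Command {
            Get-Content $Path -ReadCount $rc | ForEach-Object { $_ } | Measure-Object
        }
        # Emit one object per setting so the results can be sorted or formatted.
        New-Object PSObject -Property @{ ReadCount = $rc; ElapsedTime = $elapsed }
    }
}
```

Usage would look something like: Measure-ReadCount large.txt | sort ElapsedTime | ft ReadCount, ElapsedTime -a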

And if you have PowerGadgets, you can chart this like so:

33> $ht.Keys | sort | select @{n='ReadCount';e={"RC: $_"}}, @{n='ElapsedTime';e={$ht[$_].TotalSeconds}} | out-chart -title 'Optimal ReadCount for 75MB Text File'



--
Keith

06-17-2007   #13
Kiron

Re: Searching large text files

Thanks for the benchmark script and the PowerGadgets chart command, but no
PowerGadgets here... not yet, although it is a nice tool.
Definitely, -readCount 1 won't do on large files. I brought this up because
I thought something wasn't right.
Thanks again!

--
Kiron
