#1 (permalink)
Searching large text files

I have very large text files (100MB-500MB+) that I need to process in order to extract useful pieces of information. Unfortunately, I can't find an efficient way of doing this with powershell as get-content tries to pull the entire contents of the file into memory and doesn't seem to store it there very efficiently.

While I have 2GB of memory on my machine, I keep getting System.OutOfMemoryException errors from get-content. When I look at the powershell.exe process in task manager I see it using over 1.5GB of memory.

Is there a more efficient way to do this with Powershell?

Thanks,
Chris

#2 (permalink)
Re: Searching large text files

Chris Harris wrote:
> I have very large text files (100MB-500MB+) that I need to process in order
> to extract useful pieces of information. Unfortunately, I can't find an
> efficient way of doing this with powershell as get-content tries to pull the
> entire contents of the file into memory and doesn't seem to store it there
> very efficiently.
>
> While I have 2GB of memory on my machine, I keep getting
> System.OutOfMemoryException errors from get-content. When I look at the
> powershell.exe process in task manager I see it using over 1.5GB of memory.
>
> Is there a more efficient way to do this with Powershell?
>
> Thanks,
> Chris

Can you be more specific about what you are trying to accomplish? PowerShell may not be the best in this case (until v.Next).

Marco

#3 (permalink)
Re: Searching large text files

Use Get-Content's -ReadCount parameter. Set it to 1 to send one line at a time through the pipeline, but don't assign the result to a variable first. Instead, redirect the output to a file, e.g.:

    gc c:\largeFile.txt -read 1 | ? {<filters>} > c:\filteredFile.txt

    # don't assign the output to a variable like this
    $filterContent = gc c:\largeFile.txt -read 1 | ? {<filters>}

--
Kiron
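
For anyone who wants to try the streaming approach from this post end to end, here is a minimal sketch. The sample file names, the sample lines, and the `'ERROR*'` filter are all made up for illustration; it also writes with Set-Content rather than the `>` redirection used above, which is an assumption on my part and not something the post prescribes.

```powershell
# Hypothetical sample data that stands in for the large file discussed above.
$tempDir    = [System.IO.Path]::GetTempPath()
$inputPath  = Join-Path $tempDir 'largeFile-sample.txt'
$outputPath = Join-Path $tempDir 'filteredFile-sample.txt'
'INFO start', 'ERROR disk full', 'INFO done', 'ERROR timeout' |
    Set-Content -Path $inputPath

# Stream one line at a time and write the matches straight to disk,
# so the filtered result is never collected in a variable.
Get-Content -Path $inputPath -ReadCount 1 |
    Where-Object { $_ -like 'ERROR*' } |
    Set-Content -Path $outputPath

# Read the (small) result back only to inspect it, then clean up.
$filtered = @(Get-Content -Path $outputPath)
$filtered
Remove-Item -Path $inputPath, $outputPath
```

The point of the design is that only the current line and the output stream are alive at any moment; the memory cost should stay roughly flat no matter how big the input is.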

#4 (permalink)
Re: Searching large text files

"Kiron" <Kiron@discussions.microsoft.com> wrote in message
news:67AC4B33-13F4-4A00-BD9F-080549674FDE@microsoft.com...
> Use Get-Content's -ReadCount parameter, set it to 1 to send a line at a
> time through the pipeline but don't assign this to a variable before,
> instead redirect the output to a file, e.g.:
>
> gc c:\largeFile.txt -read 1 | ? {<filters>} > c:\filteredFile.txt
>
> # don't assign the out to variable like this
> $filterContent = gc c:\largeFile.txt -read 1 | ? {<filters>}

Alternatively, you can use the System.IO.File class:

    [io.file]::ReadAllLines("c:\largeFile.txt")

It is definitely faster than Get-Content, and it may also make better use of memory.

Jacques
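
A related .NET option, not mentioned in the post above, is System.IO.StreamReader, which reads one line per call instead of returning the whole file as an array. The sketch below assumes PowerShell 2.0 or later (for try/finally); the sample file name, its contents, and the `'dg\s*$'` pattern are placeholders for illustration only.

```powershell
# Hypothetical sample file; substitute the path of the real large file.
# Use a full path: .NET resolves relative paths against the process
# working directory, which can differ from the PowerShell location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'streamreader-sample.txt'
'alpha', 'beta dg', 'gamma', 'delta dg  ' | Set-Content -Path $path

$reader = New-Object System.IO.StreamReader($path)
$hits = @()
try {
    # ReadLine returns $null at end of file, so only the current line
    # (plus any collected matches) is held in memory.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line -match 'dg\s*$') { $hits += $line }
    }
}
finally {
    # Release the file handle even if the loop throws.
    $reader.Dispose()
}
$hits
Remove-Item -Path $path
```

If the number of matches is itself large, writing each match to a file inside the loop would avoid growing `$hits`.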

#5 (permalink)
Re: Searching large text files

Thanks for the tip. The [IO.File] method does get the contents faster, but I suppose the memory overflow issue remains because the contents would go through the pipeline as one big chunk.

Get-Content's -ReadCount could be set to a value higher than 1 to get larger chunks of data (and therefore run faster) without overflowing memory. Unfortunately, the comparison operators (-like, -notlike, -match, -notmatch) don't work efficiently then: many lines are skipped, missed or ignored.

--
Kiron

#6 (permalink)
Re: Searching large text files

"Kiron" <Kiron@discussions.microsoft.com> wrote in message
news 1FF3ADA-7974-4805-BFFC-6E668C42DAA4@microsoft.com...
> Thanks for the tip. The [IO.File] Method does get the contents faster but
> I suppose the memory overflow issue remains because it would go through
> the pipeline as a big chunk.
> Get-Content's -ReadCount could be set to a higher value than 1 to get
> larger chunks of data -therefore faster- without overflowing the memory,
> unfortunately, the comparison operators
> (-like, -notlike, -match, -notmatch)
> don't work efficiently then, many lines are skipped, missed or ignored.

Even though "get-content -readcount 1000" reads 1000 lines at a time and sends them down the pipeline, the next stage of the pipeline still sees each individual line. So that should not impact operators like -like and -notlike. This would matter for -match *if* you needed to use singleline/multiline regex mode, in which case you need all the contents as a single string.

--
Keith
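
To illustrate the singleline case mentioned at the end of this post, here is a small sketch, assuming the file is small enough to hold in memory as one string (which is exactly what the original poster's files are not). The file name, the BEGIN/END markers, and the pattern are invented for the example.

```powershell
# Hypothetical sample file with records that span several lines.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'multiline-sample.txt'
"BEGIN`nfirst`nEND`nnoise`nBEGIN`nsecond`nEND" | Set-Content -Path $path

# Load the whole file as a single string; line-by-line input cannot
# satisfy a pattern that has to cross line boundaries.
$text = [System.IO.File]::ReadAllText($path)

# (?s) is singleline mode: '.' also matches newlines, so one match
# can cover a whole BEGIN...END block. '.*?' keeps each match minimal.
$blocks = [regex]::Matches($text, '(?s)BEGIN.*?END')
$blocks.Count
Remove-Item -Path $path
```

For files in the hundreds of megabytes this trades away the memory savings discussed earlier, so it is only a fit when the pattern genuinely has to span lines.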

#7 (permalink)
Re: Searching large text files

Thanks Keith. That's what I thought Where-Object would do (filter one object at a time), but when the objects are sent through the pipeline from Get-Content with the -ReadCount parameter set to other than 0 or 1, lines are skipped.

Try this, it's pretty simple, ten lines, but the Count varies instead of constantly being 10:

    @'
    a
    ab
    abc
    abcd
    abcde
    abcdef
    abcdefg
    abcdefgh
    abcdefghi
    abcdefghij
    '@ > test.txt

    gc test.txt

    (gc test.txt -read 1 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 2 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 3 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 4 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 5 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 6 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 7 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 8 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 9 | ? {$_ -like '*a*'} | mo).count
    (gc test.txt -read 10 | ? {$_ -like '*a*'} | mo).count

    (gc test.txt -read 1 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 2 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 3 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 4 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 5 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 6 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 7 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 8 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 9 | ? {$_ -match 'a'} | mo).count
    (gc test.txt -read 10 | ? {$_ -match 'a'} | mo).count

    # delete when done
    ri test.txt

--
Kiron

#8 (permalink)
Re: Searching large text files

mo is an alias for Measure-Object, oops!

Try this, it's pretty simple, ten lines, but the Count varies instead of constantly being 10:

    @'
    a
    ab
    abc
    abcd
    abcde
    abcdef
    abcdefg
    abcdefgh
    abcdefghi
    abcdefghij
    '@ > test.txt

    gc test.txt

    (gc test.txt -read 1 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 2 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 3 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 4 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 5 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 6 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 7 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 8 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 9 | ? {$_ -like '*a*'} | Measure-Object).count
    (gc test.txt -read 10 | ? {$_ -like '*a*'} | Measure-Object).count

    (gc test.txt -read 1 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 2 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 3 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 4 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 5 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 6 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 7 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 8 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 9 | ? {$_ -match 'a'} | Measure-Object).count
    (gc test.txt -read 10 | ? {$_ -match 'a'} | Measure-Object).count

    # delete when done
    ri test.txt

--
Kiron

#9 (permalink)
Re: Searching large text files

Now try filtering each object with an If statement inside a Foreach-Object scriptblock. Count is constantly 10 as expected.
Where-Object and Get-Content's -ReadCount <-gt 1> don't get along:

    @'
    a
    ab
    abc
    abcd
    abcde
    abcdef
    abcdefg
    abcdefgh
    abcdefghi
    abcdefghij
    '@ > test.txt

    gc test.txt

    (gc test.txt -read 1 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 2 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 3 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 4 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 5 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 6 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 7 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 8 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 9 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count
    (gc test.txt -read 10 | % {if ($_ -like '*a*') {$_}} | Measure-Object).count

    (gc test.txt -read 1 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 2 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 3 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 4 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 5 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 6 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 7 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 8 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 9 | % {if ($_ -match 'a') {$_}} | Measure-Object).count
    (gc test.txt -read 10 | % {if ($_ -match 'a') {$_}} | Measure-Object).count

    # delete when done
    ri test.txt

--
Kiron

#10 (permalink)
Re: Searching large text files

"Kiron" <Kiron@discussions.microsoft.com> wrote in message
news:O2H3piwrHHA.4364@TK2MSFTNGP04.phx.gbl...
> Now try filtering each object with an If statement inside a Foreach-Object
> scriptblock. Count is constantly 10 as expected.
> Where-Object and Get-Content's -ReadCount <-gt 1> don't get along:

Yeah, what I said wasn't quite right. Setting -readcount to something like 5 will read five lines at a time and send them down the pipeline as two array objects, each with 5 strings in it:

    64> gc test.txt -read 5 | get-typename # get-typename from PSCX
    Object[]
    Object[]
    65> gc test.txt -read 5 | %{$_} | get-typename
    String
    String
    String
    String
    String
    String
    String
    String
    String
    String

"Typically" these arrays are dealt with in the same way as if you had sent the strings one at a time, but not in all cases. In the foreach loop above, the scriptblock sends the array down the pipeline, which shreds the array and sends the individual elements. In the case of -like, it will work on an array as well as a scalar:

    66> gc test.txt -read 5 | where {$_ -like 'a*'}
    a
    ab
    abc
    abcd
    abcde
    abcdef
    abcdefg
    abcdefgh
    abcdefghi
    abcdefghij

or

    68> (ql a ab abc abcd) -like "a*" # ql or quote-list from PSCX
    a
    ab
    abc
    abcd

Many cmdlets will accept an array of input and then operate on each element individually. However, in your case, what you are measuring with measure-object is the fact that the Where-Object cmdlet just sends the "original" object (which is an array) on down the pipeline if the expression evaluates to true. Fortunately both -like and -match operate on arrays and return just the elements that match:

    2> (ql ab ba cd af) -match '^a'
    ab
    af
    3> (ql ab ba cd af) -like 'a*'
    ab
    af

What I'm not seeing is get-content ballooning the memory requirements of PowerShell. I ran the following command on a 77 MB text file:

    84> measure-command { gc large.txt | ?{$_ -match 'dg\s*$'} }

    Days              : 0
    Hours             : 0
    Minutes           : 2
    Seconds           : 18
    Milliseconds      : 162
    Ticks             : 1381622340
    TotalDays         : 0.00159909993055556
    TotalHours        : 0.0383783983333333
    TotalMinutes      : 2.3027039
    TotalSeconds      : 138.162234
    TotalMilliseconds : 138162.234

and PowerShell never gets above ~53 MB of private memory.

--
Keith
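
The difference described in this post can be reproduced without PSCX. The following is only a sketch: it rebuilds the ten-line test file from the earlier posts under a temp path of my own choosing (`readcount-test.txt` is a made-up name) and compares what Measure-Object counts in each case.

```powershell
# Rebuild the ten-line test file from the earlier posts in a temp location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-test.txt'
'a', 'ab', 'abc', 'abcd', 'abcde', 'abcdef', 'abcdefg',
    'abcdefgh', 'abcdefghi', 'abcdefghij' | Set-Content -Path $path

# Where-Object passes each 5-line array through whole when the test is
# true, so Measure-Object counts arrays, not lines.
$chunks = (Get-Content -Path $path -ReadCount 5 |
    Where-Object { $_ -like '*a*' } | Measure-Object).Count

# Applying -like inside ForEach-Object emits only the matching elements,
# which the pipeline unrolls, so Measure-Object counts individual lines.
$lines = (Get-Content -Path $path -ReadCount 5 |
    ForEach-Object { $_ -like '*a*' } | Measure-Object).Count

"chunks: $chunks, lines: $lines"
Remove-Item -Path $path
```

So a larger -ReadCount can still be used for speed, as long as the stage after Get-Content either unrolls the array or applies the operator to it directly instead of relying on Where-Object to filter line by line.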