| | #1 (permalink) |
| Guest | System.OutOfMemoryException

After excellent help from Kiron (see my previous posting: Read and search through a binary file), I used the script to read and search through a binary file. Using this script with a binary file of approximately 650 MB, I received a 'System.OutOfMemoryException' (using approximately 2 GB of memory).

I know that PowerShell v1.0 isn't very memory-friendly, so I think one option is to read the file in chunks of bytes. Is it possible to modify the script to read chunks of bytes, or is there an alternative?

Thanks in advance,
Robertico |
| | #2 (permalink) |
| Guest | RE: System.OutOfMemoryException

Have you tried using the -TotalCount parameter on Get-Content?

--
Richard Siddaway
All scripts are supplied "as is" and with no warranty
PowerShell MVP
Blog: http://richardsiddaway.spaces.live.com/
PowerShell User Group: http://www.get-psuguk.org.uk |
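To make the suggestion above concrete, here is a minimal sketch of what -TotalCount does. The demo file name and its contents are made up for illustration. The parameter stops reading after the first N items, so it caps memory use but only ever sees the start of the file.

```powershell
# Hypothetical demo file; the name and contents are invented for this sketch.
$demo = Join-Path ([IO.Path]::GetTempPath()) 'totalcount-demo.txt'
Set-Content -Path $demo -Value 'one', 'two', 'three', 'four', 'five'

# -TotalCount stops reading after the first N items (lines in this case).
$firstTwo = @(Get-Content -Path $demo -TotalCount 2)
$firstTwo

Remove-Item -Path $demo
```

Because only the head of the file is read, this would help with memory but would not find a marker located further into a 650 MB file.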
| | #3 (permalink) |
| Guest | Re: System.OutOfMemoryException

Hi Robertico,

By default Get-Content reads 1 line (or in this case, 1 byte) at a time. You can modify the cmdlet's behavior through its -ReadCount parameter, but when more than 1 item (line or byte) is read, Get-Content outputs an array of items (lines or bytes). In your case you would then have to unravel this array of bytes, format each byte as a hexadecimal number, and join the hexadecimal bytes into a single string, which is piped and processed further.

In this example I'm passing a -ReadCount of 500KB, but you should modify this to fit your memory and performance requirements. Also be aware that when breaking the content into chunks you may split the marker you're searching for across two chunks.

    # for more on Get-Content's -ReadCount parameter
    man gc -p r*

Try this:

    # v1
    $file = <file's path>
    $pattern = '131B1B087C156108AE151B'
    $prevBytes = 8
    $chunkSize = 500kb
    gc $file -en byte -r $chunkSize | % {
      # one hex string per chunk, two characters per byte
      $bytes = [string]::join('', ($_ | % {'{0:X2}' -f $_}))
      [regex]::matches($bytes, $pattern, 'ignoreCase') | % {
        # start of the $prevBytes bytes that precede the match
        $i = $_.index - $prevBytes * 2
        [string]::join('', $bytes[$i..($i + $prevBytes * 2 - 1)]) | % {
          $hexBytes = $_
          $byteArray = 0..($_.length - 1) | ? {!($_ -band 1)} |
            % {"0x$($hexBytes.subString($_,2))"}
          [array]::reverse($byteArray)
          [bitConverter]::toString($byteArray) -replace '-'
        }
      }
    }

--
Kiron |
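The warning above about chunking affecting the marker can be shown with a tiny in-memory sketch. The sample bytes, the 4-byte marker and the 5-byte chunk size are all invented for the demo, and `ConvertTo-HexString` is a hypothetical helper, not part of the script in the thread.

```powershell
# Made-up sample data: the marker DE AD BE EF straddles a chunk border.
[byte[]]$data = 0x01, 0x02, 0x03, 0xDE, 0xAD, 0xBE, 0xEF, 0x04
$pattern   = 'DEADBEEF'
$chunkSize = 5   # bytes per chunk, tiny on purpose

function ConvertTo-HexString([byte[]]$bytes) {
    # BitConverter yields 'DE-AD-BE-EF'; dropping the dashes gives 'DEADBEEF'.
    [BitConverter]::ToString($bytes) -replace '-', ''
}

# Searching the whole buffer at once finds the marker.
$wholeHits = [regex]::Matches((ConvertTo-HexString $data), $pattern).Count

# Searching chunk by chunk, with no overlap between chunks, does not.
$chunkHits = 0
for ($offset = 0; $offset -lt $data.Length; $offset += $chunkSize) {
    $end   = [Math]::Min($offset + $chunkSize, $data.Length) - 1
    $chunk = [byte[]]$data[$offset..$end]
    $chunkHits += [regex]::Matches((ConvertTo-HexString $chunk), $pattern).Count
}

"whole buffer: $wholeHits match(es); chunked: $chunkHits match(es)"
```

The first chunk ends with `DEAD` and the second begins with `BEEF`, so neither chunk contains the full marker on its own.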
| | #4 (permalink) |
| Guest | Re: System.OutOfMemoryException

Kiron,

The -ReadCount parameter works fine for memory. But indeed, when the marker lies on the border between two chunks, the script passes over it.

So, in my opinion, I need to read a chunk of bytes (e.g. 500 KB), but the next chunk must include at least 7 bytes from the previous chunk. So I have to use the file offset (the position within the file), or something like that.

Can PowerShell handle this? I didn't find such a thing in the documentation (or the books).

Robertico |
| | #5 (permalink) |
| Guest | Re: System.OutOfMemoryException

Robertico,

Your idea of attaching the last bytes from the previous chunk should work, but I think this byte array's length should be equal to the number of bytes in the pattern, plus the number of bytes before the pattern you want to retrieve, minus one. Also, the chunk size should _always_ be -gt this array's length.

In this script $extraBytes contains the number of bytes to retrieve from the previous chunk and $rimBytes contains the retrieved bytes, which is $null in the first loop; note the negative indexing used to retrieve the last elements of the array. Then $rimBytes is added to the byte chunk piped from Get-Content and all bytes are joined. Finally a new $rimBytes is created from the current chunk to be used in the next loop. The rest is unchanged. Also, play around with the chunk size to find the optimum amount.

Try this:

    # v1
    $file = <file's path>
    $pattern = '131B1B087C156108AE151B'
    $prevBytes = 8
    # bytes carried over from the previous chunk
    $extraBytes = $pattern.length / 2 + $prevBytes - 1
    $chunkSize = 500kb
    gc $file -en byte -r $chunkSize | % {
      # prepend the tail of the previous chunk ($null on the first pass)
      $bytes = [string]::join('', ($rimBytes + ($_ | % {'{0:X2}' -f $_})))
      # keep the last $extraBytes bytes (two hex characters each) for the next pass
      $rimBytes = $bytes[(-$extraBytes * 2)..-1]
      [regex]::matches($bytes, $pattern, 'ignoreCase') | % {
        $i = $_.index - $prevBytes * 2
        [string]::join('', $bytes[$i..($i + $prevBytes * 2 - 1)]) | % {
          $hexBytes = $_
          $byteArray = 0..($_.length - 1) | ? {!($_ -band 1)} |
            % {"0x$($hexBytes.subString($_,2))"}
          [array]::reverse($byteArray)
          [bitConverter]::toString($byteArray) -replace '-'
        }
      }
    }

--
Kiron |
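The overlap rule described above (pattern bytes + preceding bytes - 1) can be checked on a small in-memory example. This is only a sketch under stated assumptions: the sample bytes, the 4-byte marker, `$prevBytes = 2` and the 6-byte chunk size are invented, and the index filter is an addition that is not in the posted script. It skips hits that start inside the first `$prevBytes` bytes of the joined string, since with an overlap of exactly this size those hits lie wholly inside the carried-over tail (so they were already seen in the previous chunk), and it skips hits at odd character positions, which are not aligned on a byte boundary.

```powershell
# Made-up sample data: one marker straddles chunks 1 and 2, a second one
# sits at the very end of chunk 3 and so reappears in the carried-over tail.
[byte[]]$data = 0x01, 0x02, 0x03, 0x04, 0xDE, 0xAD,
                0xBE, 0xEF, 0x05, 0x06, 0x07, 0x08,
                0x09, 0x0A, 0xDE, 0xAD, 0xBE, 0xEF,
                0x0B, 0x0C
$pattern    = 'DEADBEEF'
$prevBytes  = 2
$chunkSize  = 6
# Same formula as in the post: pattern bytes + preceding bytes - 1.
$extraBytes = $pattern.Length / 2 + $prevBytes - 1   # 4 + 2 - 1 = 5

$rim   = ''    # hex tail of everything read so far
$found = @()
for ($offset = 0; $offset -lt $data.Length; $offset += $chunkSize) {
    $end   = [Math]::Min($offset + $chunkSize, $data.Length) - 1
    $chunk = [byte[]]$data[$offset..$end]
    $hex   = $rim + ([BitConverter]::ToString($chunk) -replace '-', '')

    foreach ($m in [regex]::Matches($hex, $pattern)) {
        # Skip hits without $prevBytes bytes in front of them, and hits
        # that start in the middle of a byte (odd character index).
        if (($m.Index -lt $prevBytes * 2) -or ($m.Index -band 1)) { continue }
        $found += $hex.Substring($m.Index - $prevBytes * 2, $prevBytes * 2)
    }

    # Carry the last $extraBytes bytes (two hex characters each) forward.
    $keep = [Math]::Min($hex.Length, $extraBytes * 2)
    $rim  = $hex.Substring($hex.Length - $keep)
}
$found   # the two bytes preceding each marker, in file order: 0304 and 090A
```

Without the index filter, the marker at the end of chunk 3 would be matched a second time when its bytes are carried into chunk 4, and the negative start index would then pick characters from the end of the string.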
| | #6 (permalink) |
| Guest | Re: System.OutOfMemoryException

Works fine now... but very slow :-))

I hope Microsoft will work on performance for the next versions of PowerShell!

Robertico |
| | #7 (permalink) |
| Guest | Re: System.OutOfMemoryException

Kiron,

With a small file the script finishes, but with a large file (e.g. 650 MB) the script does not seem to end (not even after several hours!).

Is it possible to monitor the progress of this script, or are there debugging options to find out what's wrong?

Thanks in advance,
Robertico |
| | #8 (permalink) |
| Guest | Re: System.OutOfMemoryException

Yes, it's very slow. The amount of bytes is extreme (if I had a cent for each one...). I tested other methods of reading blocks of bytes, formatting each byte in hexadecimal notation and creating a joined string from them. On my system the [IO.File]::Read() method (that is, Read() on the stream returned by [IO.File]::OpenRead()) is the fastest and the Get-Content & [String]::Join() method is next in line. It will definitely improve the script's performance, but it _will_ take a while to process 650mb.

Here are two functions to demonstrate and evaluate the performance of five different methods and to get an estimate in hours of the execution time of the two fastest methods. Both functions have the block size set at 100kb as default, but you can pass any size of at least 1kb.

These are the test results I get for a 650mb file broken up in blocks of 500kb and 1mb, quite a while!

    # Test-ExecutionTime HugeFile.txt 500kb
    File size: 681,574,400 bytes
    Approximate execution time:
    Get-Content & [String]::Join() -------> 19.66 hours
    [IO.File]::Read() & [String]::Join() -> 16.75 hours
    Block size: 512,000 bytes
    Total blocks 1,332

    # Test-ExecutionTime HugeFile.txt 1mb
    File size: 681,574,400 bytes
    Approximate execution time:
    Get-Content & [String]::Join() -------> 19.60 hours
    [IO.File]::Read() & [String]::Join() -> 16.68 hours
    Block size: 1,048,576 bytes
    Total blocks 650

    # call them like this:
    Test-Methods HugeFile.txt 100kb
    Test-ExecutionTime HugeFile.txt 250kb

    function Test-Methods (
      [string]$path = $(throw '-path can''t be $null'),
      [long]$block = 100kb
    ) {
      $path = $(if (test-path $path) {(rvpa $path).path} else {throw "Cannot find $path"})
      if ($block -lt 1kb) {throw '-block must be at least 1kb'}
      $strSize = $block * 2
      if ($strSize -ge 2gb) {throw 'Block size is too large'}
      $fs = [io.file]::OpenRead($path)
      # each label is followed by the seconds its method took
      # watch out for word wrapping
      $items = 'Get-Content & [String]::Join()',
        $((measure-command {$str1 = [string]::join('', (gc $path -en byte -t $block | % {'{0:X2}' -f $_}))}).totalSeconds),
        '[IO.File]::ReadByte(), While loop & [String]::Join()',
        $((measure-command {$c = 0; $str2 = [string]::join('', $(while ($c -lt $block) {$fs.ReadByte() | % {'{0:X2}' -f $_}; $c++}))}).totalSeconds),
        '[IO.File]::Read() & [String]::Join()',
        $($fs.position = 0; $buffer1 = new-object byte[] $block; (measure-command {$fs.read($buffer1, 0, $block); $str3 = [string]::join('', ($buffer1 | % {'{0:X2}' -f $_}))}).totalSeconds),
        'Get-Content & [Text.StringBuilder]::Append()',
        $($str4 = new-object text.stringBuilder; (measure-command {gc $path -en byte -t $block | % {[void]$str4.Append('{0:X2}' -f $_)}}).totalSeconds),
        '[IO.File]::Read() & [Text.StringBuilder]::Append()',
        $($fs.position = 0; $str5 = new-object text.stringBuilder; $buffer2 = new-object byte[] $block; (measure-command {$fs.read($buffer2, 0, $block); $buffer2 | % {[void]$str5.Append('{0:X2}' -f $_)}}).totalSeconds)
      &{
        $x = 0
        # member type 8 is NoteProperty
        0..4 | % {
          new-object psObject |
            add-member 8 Methods $items[$x++] -p |
            add-member 8 TotalSeconds $items[$x++] -p
        }
      } | sort TotalSeconds | ft -a
      # the parentheses are required to actually call Close(), which also releases the stream
      $fs.Close()
      # verify all $str are same
      &{
        $ofs = ''
        "$($str1[-74..-1])`n$($str2[-74..-1])`n$($str3[-74..-1])`n$($str4.toString()[-74..-1])`n$($str5.toString()[-74..-1])"
        "$($str1[0..73])`n$($str2[0..73])`n$($str3[0..73])`n$($str4.toString()[0..73])`n$($str5.toString()[0..73])"
      }
    }

    function Test-ExecutionTime (
      [string]$path = $(throw '-path can''t be $null'),
      [long]$block = 100kb
    ) {
      $path = $(if (test-path $path) {(rvpa $path).path} else {throw "Cannot find $path"})
      if ($block -lt 1kb) {throw '-block must be at least 1kb'}
      $fs = [io.file]::OpenRead($path)
      $size = 650mb # (gi $path).length
      # time one block with each method, then scale up to the whole file
      "File size: {0:n0} bytes
    Approximate execution time:
    Get-Content & [String]::Join() -------> {1:n2} hours
    [IO.File]::Read() & [String]::Join() -> {2:n2} hours
    Block size: {3:n0} bytes
    Total blocks {4:n0}" -f $size,
        ((measure-command {$str = [string]::join('', (gc $path -en byte -t $block | % {'{0:X2}' -f $_}))}).totalHours * $size / $block),
        $($buffer1 = new-object byte[] $block
          (measure-command {$fs.read($buffer1, 0, $block)
            $str3 = [string]::join('', ($buffer1 | % {'{0:X2}' -f $_}))}).totalHours * $size / $block),
        $block, [math]::ceiling($size / $block)
      $fs.Close()
    }

    # I'll post the modified version of the enhanced script in a bit.

--
Kiron |
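All five methods timed above format one byte at a time in script code. As a sketch of a sixth option that the thread does not test, the whole buffer can be handed to `[BitConverter]::ToString()` in a single .NET call and the dashes stripped afterwards. The 64 KB test buffer and its contents are arbitrary assumptions; no timings are claimed here, so run it to compare on your own system.

```powershell
# Arbitrary test buffer: 64 KB of repeating byte values (assumed for the demo).
$buffer = New-Object byte[] 64kb
for ($i = 0; $i -lt $buffer.Length; $i++) { $buffer[$i] = $i % 256 }

# Per-byte formatting through the pipeline, as in the functions above.
$sw = [Diagnostics.Stopwatch]::StartNew()
$hexA = [string]::Join('', ($buffer | ForEach-Object { '{0:X2}' -f $_ }))
$sw.Stop()
$perByteMs = $sw.Elapsed.TotalMilliseconds

# One call for the whole buffer: 'DE-AD-BE-EF' style output, dashes removed.
$sw = [Diagnostics.Stopwatch]::StartNew()
$hexB = [BitConverter]::ToString($buffer) -replace '-', ''
$sw.Stop()
$oneCallMs = $sw.Elapsed.TotalMilliseconds

"Same result: $($hexA -ceq $hexB)"
"Per-byte pipeline: $perByteMs ms; single call: $oneCallMs ms"
```

Both approaches produce the same uppercase hex string, so the single call could be swapped in without changing the pattern search.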
| | #9 (permalink) |
| Guest | Re: System.OutOfMemoryException

Here's the modified script. I have tested it some, but not thoroughly. Note the Write-Progress statement; it displays a progress bar to indicate the script's status.

    $file = <file's path>
    $fs = [io.file]::OpenRead($file)
    $pattern = '131B1B087C156108AE151B'
    $prevBytes = 8
    $extraBytes = $pattern.length / 2 + $prevBytes - 1
    $chunkSize = 500kb
    if ($fs.length -lt $chunkSize) {$chunkSize = $fs.length}
    $total = [math]::ceiling($fs.length / $chunkSize)
    $buffer = new-object byte[] $chunkSize
    $block = 1
    while ($block -le $total) {
      write-progress Processing "Block $block of $total" -pe ($block/$total*100)
      $read = $fs.read($buffer, 0, $chunkSize)
      # the last block is usually shorter; ignore stale bytes left in the buffer
      $data = $(if ($read -lt $chunkSize) {$buffer[0..($read - 1)]} else {$buffer})
      $bytes = [string]::join('', ($rimBytes + ($data | % {'{0:X2}' -f $_})))
      # tail of this block, regrouped into two-character hex bytes for the next pass
      $chars = $bytes[(-$extraBytes * 2)..-1]
      $rimBytes = 0..($chars.Length - 1) | ? {!($_ -band 1)} |
        % {[string]::join('', $chars[$_++..$_])}
      [regex]::matches($bytes, $pattern, 'ignoreCase') | % {
        $i = $_.index - $prevBytes * 2
        [string]::join('', $bytes[$i..($i + $prevBytes * 2 - 1)]) | % {
          $hexBytes = $_
          $byteArray = 0..($_.length - 1) | ? {!($_ -band 1)} |
            % {"0x$($hexBytes.subString($_,2))"}
          [array]::reverse($byteArray)
          [bitConverter]::toString($byteArray) -replace '-'
        }
      }
      $block++
    }
    # the parentheses are required to actually call Close(), which also releases the stream
    $fs.Close()

--
Kiron |
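For comparison, here is a compact sketch of the same chunked scan wrapped in a function, so the stream is closed even if an error occurs. It is not Kiron's script: `Find-PrecedingBytes` and its parameter names are hypothetical, the hex conversion uses a single `[BitConverter]::ToString()` call per chunk, the result is returned in file order (the script above additionally reverses the bytes), and hits without enough preceding bytes are skipped. Pass a full path, because .NET file methods do not follow PowerShell's current location.

```powershell
function Find-PrecedingBytes {
    param(
        [string]$Path,             # full path to the file to scan
        [string]$Pattern,          # marker as hex digits, e.g. 'DEADBEEF'
        [int]$PrevBytes = 8,       # bytes to return from before each marker
        [int]$ChunkSize = 500kb
    )
    # Overlap carried between chunks: pattern bytes + preceding bytes - 1.
    $extra = $Pattern.Length / 2 + $PrevBytes - 1
    if ($ChunkSize -le $extra) { throw 'ChunkSize must be larger than the overlap' }

    $buffer = New-Object byte[] $ChunkSize
    $rim    = ''
    $fs     = [IO.File]::OpenRead($Path)
    try {
        # Read() returns the number of bytes actually read, 0 at end of file.
        while (($read = $fs.Read($buffer, 0, $ChunkSize)) -gt 0) {
            # Convert only the bytes just read, so stale buffer content is ignored.
            $hex = $rim + ([BitConverter]::ToString($buffer, 0, $read) -replace '-', '')
            foreach ($m in [regex]::Matches($hex, $Pattern, 'IgnoreCase')) {
                # Skip hits already covered by the previous chunk or lacking
                # $PrevBytes bytes in front, and hits not aligned on a byte.
                if (($m.Index -lt $PrevBytes * 2) -or ($m.Index -band 1)) { continue }
                $hex.Substring($m.Index - $PrevBytes * 2, $PrevBytes * 2)
            }
            $keep = [Math]::Min($hex.Length, $extra * 2)
            $rim  = $hex.Substring($hex.Length - $keep)
        }
    }
    finally {
        $fs.Close()
    }
}

# Usage on a throwaway file with made-up content; the marker straddles two chunks.
$tmp = [IO.Path]::GetTempFileName()
[IO.File]::WriteAllBytes($tmp, [byte[]](0x01, 0x02, 0x03, 0x04, 0xDE, 0xAD, 0xBE, 0xEF, 0x05))
$result = @(Find-PrecedingBytes -Path $tmp -Pattern 'DEADBEEF' -PrevBytes 2 -ChunkSize 6)
Remove-Item -Path $tmp
$result
```

The `try`/`finally` pairing is the main design point: the file handle is released whether the scan completes or throws part way through.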
| | #10 (permalink) |
| Guest | Re: System.OutOfMemoryException

I forgot to remove a testing value in function Test-ExecutionTime.

    # the line:
    $size = 650mb # (gi $path).length
    # should be:
    $size = $fs.length

--
Kiron |