#1
Search and Replace text file very slow

Hello,

I have text files of approx. 16-20 MB each. They are flat files with approx. 18-20 thousand records each. Every night I have to search them for invalid ASCII characters and replace them with spaces. Call it a filter, if you will. What I wrote was:

(gc filein.txt) -replace "[^\u0020-\u007F]"," " | sc fileout.txt

Although this seems to do the job, it runs VERY SLOWLY! I have tried adjusting the -read parameter:

(gc filein.txt -read 5000) -replace "[^\u0020-\u007F]"," " | sc fileout.txt

This runs much faster. However, it then seems to miss the line endings.

Thanks in advance,
Jeremy
#2
Re: Search and Replace text file very slow

Wrapping the Get-Content statement in an expression ( ) and passing the resulting [String[]] to -replace _is_ a good technique, especially when you want to overwrite the file being read. It is _not_ recommended for huge files, though, because it uses a lot of RAM and takes forever, if the shell doesn't crash first.

Since you're writing the output to a different file, pipe each String to ForEach-Object, do the replacement there, and pipe the new String to Set-Content:

gc filein.txt | % {$_ -replace "[^\u0020-\u007F]"," "} | sc fileout.txt

--
Robert
#3
Re: Search and Replace text file very slow

Hi Robert,

Thanks for the reply. I did as you suggested, but PowerShell is still very slow. If I write the code in Perl, it takes only approx. 2 seconds to run. The PowerShell example takes 10-20 seconds (approx. 10 times as long). I imagine this is because PowerShell processes the file on a "line by line" basis, whereas in Perl I can redirect STDIN, do a tr, then print out.

Would there be any way to process the whole file in a few iterations by making use of the -read parameter? How about switching the gc mode to binary?

Jeremy

In Perl

"Robert Robelo" wrote:
> Since you're writing the output to a different file, pipe each String to
> ForEach-Object, do the replacement and pipe the new String to Set-Content
>
> gc filein.txt | % {$_ -replace "[^\u0020-\u007F]"," "} | sc fileout.txt
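On the -read question, here is a minimal sketch of one way to batch the reads without losing line endings, assuming the goal is to replace every character outside U+0020-U+007F with a space. The function name Convert-InvalidAsciiBatched and its parameters are made up for illustration. With -ReadCount, Get-Content emits arrays of lines; applying -replace to each array inside ForEach-Object keeps every line as a separate string. Wrapping the whole call in ( ) instead yields an array of arrays, and each inner array appears to get joined into one string before the replacement, which is the likely reason the line endings went missing. Whether batching actually beats the Perl timing would need to be measured.

```powershell
# Hypothetical helper: the name and parameters are illustrative, not from the thread.
function Convert-InvalidAsciiBatched {
    param(
        [string]$InPath,
        [string]$OutPath,
        [int]$BatchSize = 5000
    )

    # With -ReadCount, each pipeline object is an array of up to $BatchSize lines.
    # -replace applied to an array processes every element and returns an array,
    # so each line is still a separate string when it reaches Set-Content.
    Get-Content -Path $InPath -ReadCount $BatchSize |
        ForEach-Object { $_ -replace '[^\u0020-\u007F]', ' ' } |
        Set-Content -Path $OutPath
}
```

Usage would look like `Convert-InvalidAsciiBatched -InPath filein.txt -OutPath fileout.txt`.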
#4
Re: Search and Replace text file very slow

Oops, forgot to attach the Perl:

binmode STDIN;
binmode STDOUT;
while (<STDIN>) {
    tr/\040-\176\012\015/ /c;
    print $_;
}

"jerschmidt14" wrote:
> If I write the code in perl, it only takes approx 2 seconds to run.
> The powershell example takes 10-20 seconds to run (approx 10 times as long).
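For comparison, a rough PowerShell counterpart to the Perl filter above, reading the whole file in one call instead of line by line. This is a sketch under assumptions: the function name Convert-InvalidAsciiWholeFile is hypothetical, and ISO-8859-1 is chosen only because it maps each byte to the code point of the same value, which approximates Perl's binmode. The character class keeps 0x20-0x7E plus LF and CR, the same set as the tr/// expression.

```powershell
# Hypothetical helper: the name and parameters are illustrative, not from the thread.
function Convert-InvalidAsciiWholeFile {
    param(
        [string]$InPath,   # pass full paths: .NET resolves relative paths against
        [string]$OutPath   # the process directory, not the PowerShell location
    )

    # ISO-8859-1 decodes byte N to code point N, so every byte survives the
    # round trip unless the regex below replaces it.
    $latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

    # Read the entire file into one string.
    $text = [System.IO.File]::ReadAllText($InPath, $latin1)

    # Replace everything except printable ASCII (0x20-0x7E), LF, and CR with a space.
    $clean = [regex]::Replace($text, '[^\x20-\x7E\n\r]', ' ')

    [System.IO.File]::WriteAllText($OutPath, $clean, $latin1)
}
```

This holds the whole file in memory, which should be acceptable for 16-20 MB inputs but is the same trade-off Robert warns about for much larger files.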
#5
Re: Search and Replace text file very slow

On Jun 2, 10:18 am, jerschmidt14 <jerschmid...@xxxxxx> wrote:
> Would there be any way to process the whole file in a few iterations by making
> use of the -read parameter? How about switching the gc mode to binary?

http://msdn.microsoft.com/en-us/libr...amreader..aspx

Also, you could try compiling a regex ahead of time and then using its Replace() method. It's possible that -replace compiles the regex each time.
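A minimal sketch of what that suggestion could look like, combining a StreamReader with a Regex object built once and reused for every line. The function name Convert-InvalidAsciiStreamed is hypothetical, and the ISO-8859-1 encoding is an assumption made so that each input byte becomes exactly one character; the thread does not say how the files are encoded. Note that WriteLine emits the platform's line terminator, so the output line endings may differ from the input's.

```powershell
# Hypothetical helper: the name and parameters are illustrative, not from the thread.
function Convert-InvalidAsciiStreamed {
    param(
        [string]$InPath,   # pass full paths: .NET resolves relative paths against
        [string]$OutPath   # the process directory, not the PowerShell location
    )

    # Build the Regex once, with the Compiled option, instead of relying on -replace.
    $regex = New-Object System.Text.RegularExpressions.Regex -ArgumentList `
        '[^\u0020-\u007F]', ([System.Text.RegularExpressions.RegexOptions]::Compiled)

    # Assumption: ISO-8859-1, so one byte in the file is one character in memory.
    $latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
    $reader = New-Object System.IO.StreamReader -ArgumentList $InPath, $latin1
    $writer = New-Object System.IO.StreamWriter -ArgumentList $OutPath, $false, $latin1

    try {
        # ReadLine returns $null at end of file and strips the line terminator.
        while ($null -ne ($line = $reader.ReadLine())) {
            $writer.WriteLine($regex.Replace($line, ' '))
        }
    }
    finally {
        # Close both streams even if the loop throws, so the output is flushed.
        $reader.Close()
        $writer.Close()
    }
}
```

This avoids the pipeline entirely, which is where much of the per-line overhead is likely to come from; actual timings would need to be measured against the Perl version.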