Windows Vista Forums

Vista - Search and Replace text file very slow

05-27-2009   #1
jerschmidt14


 
 

Search and Replace text file very slow

Hello,
I have text files approx 16-20 MB in size. They are flat files with
approx 18-20 thousand records each. Every night I have to search them for
invalid ASCII characters and replace them with spaces. Call it a filter if
you will. What I wrote was:

(gc filein.txt) -replace "[^\u0020-\u007F]"," " | sc fileout.txt

Although this seems to do the job, it runs VERY SLOWLY!

I have tried adjusting the -read (ReadCount) value by using

(gc filein.txt -read 5000) -replace "[^\u0020-\u007F]"," " | sc fileout.txt

This runs much faster. However, it then seems to miss the line endings.

Thanks in advance,

Jeremy
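
A note on the pattern, offered as a minimal sketch rather than a diagnosis: in a .NET regex the position of `^` relative to the brackets matters. Before the brackets it anchors the match to the start of the line; inside them it negates the character class. The sample string below is made up for illustration.

```powershell
# Made-up sample line containing one non-ASCII character (U+00E9).
$line = "caf$([char]0x00E9) au lait"

# '^' before the brackets anchors at the start of the line:
# only the first character is replaced, and only if it is in the range.
$anchored = $line -replace '^[\u0020-\u007F]', ' '

# '^' inside the brackets negates the class:
# every character outside 0x20-0x7F is replaced.
$negated = $line -replace '[^\u0020-\u007F]', ' '

$anchored   # the leading "c" became a space, U+00E9 is untouched
$negated    # U+00E9 became a space, the rest is untouched
```

Single-quoted strings are used so that PowerShell passes the backslash escapes to the regex engine unchanged.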


05-27-2009   #2
Robert Robelo


 
 

Re: Search and Replace text file very slow

Wrapping the Get-Content statement in parentheses ( ) and passing the resulting [String[]] to -replace _is_ a good technique, especially when you want to overwrite the file being read, but it is _not_ recommended for huge files because it hogs lots of RAM and takes forever, if the shell doesn't crash first.

Since you're writing the output to a different file, pipe each String to ForEach-Object, do the replacement, and pipe the new String to Set-Content:

gc filein.txt | % {$_ -replace "[^\u0020-\u007F]"," "} | sc fileout.txt

--
Robert
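
The streaming pipeline can also be combined with -ReadCount so that lines travel down the pipeline in batches. The following is a minimal sketch, not a measured fix: the temp files, the sample lines, and the batch size of 2 are all made up for illustration. The parenthesized form with -ReadCount likely loses line breaks because -replace then treats each whole batch as a single element and flattens it into one string; inside ForEach-Object, -replace receives the batch itself and returns one string per line.

```powershell
# Hypothetical temp files so the sketch is self-contained.
$in  = [System.IO.Path]::GetTempFileName()
$out = [System.IO.Path]::GetTempFileName()

# Three sample lines; the middle one contains a tab (0x09), which is outside 0x20-0x7F.
Set-Content -Path $in -Value @("abc", "d`tf", "ghi")

# -ReadCount sends arrays of up to N lines down the pipeline.
# -replace applied to an array returns an array, one result per line,
# and Set-Content writes each element on its own line.
Get-Content -Path $in -ReadCount 2 |
    ForEach-Object { $_ -replace '[^\u0020-\u007F]', ' ' } |
    Set-Content -Path $out

Get-Content -Path $out   # three lines; the tab is now a space
```

For real files, a larger batch size such as the 5000 tried earlier in the thread would be the natural starting point; how much it helps is not measured here.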
06-02-2009   #3
jerschmidt14


 
 

Re: Search and Replace text file very slow

Hi Robert,
Thanks for the reply. I did as you suggested, but I still find PowerShell
to be very slow. If I write the code in Perl, it takes only approx 2 seconds
to run. The PowerShell example takes 10-20 seconds to run (approx 5-10 times
as long). I imagine this is because PowerShell is processing the file on a
"line by line" basis, whereas in Perl I can redirect STDIN, do a tr, then print out.
Would there be any way to process the whole file in a few iterations by making
use of the -read parameter? How about switching the gc mode to binary?

Jeremy

In Perl
06-02-2009   #4
jerschmidt14


 
 

Re: Search and Replace text file very slow

Oops, forgot to attach the Perl:

binmode STDIN;
binmode STDOUT;

while (<STDIN>) {
    # Replace every byte outside 0x20-0x7E, LF, and CR with a space
    # (the /c flag complements the listed set).
    tr/\040-\176\012\015/ /c;
    print $_;
}
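
On the question of processing the whole file at once in a "binary" fashion, one possible PowerShell counterpart to the Perl filter is sketched below. It is an assumption-laden illustration, not a benchmarked answer: the temp files and sample bytes are made up, the whole file is held in memory, and ISO-8859-1 is chosen only because it maps each byte to the character with the same code, so nothing is lost in decoding. The kept set mirrors the Perl tr: 0x20-0x7E plus LF and CR.

```powershell
# Hypothetical temp files so the sketch is self-contained.
$in  = [System.IO.Path]::GetTempFileName()
$out = [System.IO.Path]::GetTempFileName()

# Made-up sample bytes: 'A', NUL, 'B', CR, LF, 0xE9, '~', DEL.
[System.IO.File]::WriteAllBytes($in, [byte[]](0x41, 0x00, 0x42, 0x0D, 0x0A, 0xE9, 0x7E, 0x7F))

# ISO-8859-1 maps every byte 0x00-0xFF to the same code point.
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# Read the whole file as a single string; CR and LF stay in the text.
$text = [System.IO.File]::ReadAllText($in, $latin1)

# One regex pass over the whole string: anything that is not
# 0x20-0x7E, CR, or LF becomes a space.
$clean = [regex]::Replace($text, '[^\x20-\x7E\r\n]', ' ')

[System.IO.File]::WriteAllText($out, $clean, $latin1)
```

Because the line terminators are never split off, they pass through unchanged. Note that ReadAllText still honors a byte order mark if the file starts with one, and that the .NET methods resolve relative paths against the process working directory rather than the current PowerShell location, so full paths are safer.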
06-02-2009   #5
tojo2000


 
 

Re: Search and Replace text file very slow

You might want to check out System.IO.StreamReader:
http://msdn.microsoft.com/en-us/libr...amreader..aspx

Also, you could try compiling a regex ahead of time and then using the
Replace() method. It's possible that using -replace might be
compiling the regex each time.
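
Both suggestions can be combined in a short sketch. This is one possible reading of the advice, under stated assumptions: the temp files and sample lines are made up, the character range 0x20-0x7E is borrowed from the Perl filter in this thread, and StreamReader and StreamWriter are left at their default UTF-8 encoding (pass an encoding explicitly if the real files use something else). Whether precompiling actually beats -replace is not measured here.

```powershell
# Hypothetical temp files so the sketch is self-contained.
$in  = [System.IO.Path]::GetTempFileName()
$out = [System.IO.Path]::GetTempFileName()
Set-Content -Path $in -Value @("abc", "d`tf", "ghi")

# Build the regex object once, outside the loop, with the Compiled option.
$options = [System.Text.RegularExpressions.RegexOptions]::Compiled
$regex   = New-Object System.Text.RegularExpressions.Regex -ArgumentList '[^\u0020-\u007E]', $options

$reader = New-Object System.IO.StreamReader -ArgumentList $in
$writer = $null
try {
    $writer = New-Object System.IO.StreamWriter -ArgumentList $out
    # ReadLine strips the line terminator and returns $null at end of file.
    while (($line = $reader.ReadLine()) -ne $null) {
        # WriteLine appends the platform's default line terminator.
        $writer.WriteLine($regex.Replace($line, ' '))
    }
}
finally {
    # Close both streams even if the loop throws.
    $reader.Close()
    if ($writer) { $writer.Close() }
}
```

Since ReadLine drops the terminator and WriteLine adds the platform default, line endings in the output are normalized rather than copied byte for byte.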