Windows Vista Forums

Vista - Select-String problem with not English ASCII text

Old 01-16-2007   #1 (permalink)
Roman Kuzmin


 
 

Select-String problem with not English ASCII text

I seem to have run into a disappointing problem: if I use Select-String on
ASCII text files that contain non-English text, the non-English text
disappears (replaced with spaces?) in the result strings.

In my case the text files contain Russian text. Other console applications
(including .NET ones) output Russian text from those files just fine.
PowerShell's Get-Content works fine, too.

So, is it an issue of Select-String, or am I missing something?

--
Thanks,
Roman

Old 01-16-2007   #2 (permalink)
Andrew Watt [MVP]


 
 

Re: Select-String problem with not English ASCII text

Roman,

The following is my understanding but I'm no expert in this stuff.

ASCII covers up to U+0255, I think.

Russian characters begin at U+0402 or thereabouts.

So I don't see how an "ASCII" file can contain Cyrillic characters.

So, is it possible that you "lost" the Cyrillic characters (they were
replaced by spaces or some other character) when the ASCII files were
created? i.e. before you used select-string on the files?

Or are the files not ASCII files?

Andrew Watt MVP


Old 01-16-2007   #3 (permalink)
Roman Kuzmin


 
 

Re: Select-String problem with not English ASCII text

> So I don't see how an "ASCII" file can contain Cyrillic characters.
Do you think that Russian text exists only as Unicode? Of course, that is
not true (thank God). Believe me, Russian text exists in ASCII files, and
Get-Content proves it. Unfortunately, I guess Select-String is not as
accurate as Get-Content.

--
Thanks,
Roman


Old 01-16-2007   #4 (permalink)
Lee Holmes [MSFT]


 
 

Re: Select-String problem with not English ASCII text

Roman -- does

Get-Content cyrillic.txt | select-string <whatever>

work?

I assume the problem you're running into is:

Select-String <whatever> cyrillic.txt

--
Lee Holmes [MSFT]
Windows PowerShell Development
Microsoft Corporation
This posting is provided "AS IS" with no warranties, and confers no rights.

Old 01-16-2007   #5 (permalink)
Marcel J. Ortiz [MSFT]


 
 

Re: Select-String problem with not English ASCII text

Hi Roman,

Any chance you could give us the commands you are running and, if it's not
too large, the ASCII file?

Thanks
Marcel



Old 01-16-2007   #6 (permalink)
Roman Kuzmin


 
 

Re: Select-String problem with not English ASCII text

> Any chance you could give us the commands you are running and if its not
> too
> large the ASCII file?


Get-Content cyrillic.txt | select-string far
- works like a charm, but the result is just strings, not what I want.

Select-String far cyrillic.txt
- does not work properly, i.e. the Cyrillic text in the result strings is mangled.

Please find cyrillic.txt attached.

--
Thanks,
Roman




Old 01-16-2007   #7 (permalink)
Roman Kuzmin


 
 

Re: Select-String problem with not English ASCII text

Just an observation:

[System.IO.File]::ReadAllLines('cyrillic.txt')
- has the same problems as: Select-String far cyrillic.txt

[System.IO.File]::ReadAllLines('cyrillic.txt', [System.Text.Encoding]::Default)
- works fine, cyrillic text is as it should be
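
This observation is consistent with how .NET picks an encoding: ReadAllLines with a single argument assumes UTF-8 unless it finds a byte order mark, so bytes from a single-byte code page do not come back as Cyrillic. A minimal sketch, assuming the file uses Windows-1251 (the thread never states the code page) and using a hypothetical temporary file rather than the attached cyrillic.txt:

```powershell
# Assumption: the text is stored in code page 1251 (Cyrillic "ANSI").
$cp1251 = [System.Text.Encoding]::GetEncoding(1251)

# Build a "far" line with a Cyrillic word from code points, so the encoding
# of this script file itself does not matter.
$text = 'far ' + (-join [char[]](0x043F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442))

# Hypothetical sample file. A full path is used because .NET methods resolve
# relative paths against the process directory, not the PowerShell location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'cyrillic-sample.txt'
[System.IO.File]::WriteAllLines($path, [string[]]@($text), $cp1251)

# No encoding argument: the bytes are read as UTF-8 and the Cyrillic is lost.
$mangled = [System.IO.File]::ReadAllLines($path)

# Explicit encoding: the Cyrillic part survives.
$lines = [System.IO.File]::ReadAllLines($path, $cp1251)

# Piping already-decoded lines keeps Select-String away from the file's
# encoding; the match objects still carry LineNumber and Line.
$lines | Select-String far | ForEach-Object { '{0}: {1}' -f $_.LineNumber, $_.Line }
```

The last line is one possible workaround for the pipeline form: the matches no longer know the file name, but the line number and the full line are still available.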

--
Thanks,
Roman


Old 01-16-2007   #8 (permalink)
Keith Hill [MVP]


 
 

Re: Select-String problem with not English ASCII text

ASCII really only defines the code points from 0 to 127 (decimal), or U+0000
to U+007F. For single-byte character sets, Windows uses ANSI code pages to
define the byte values from 0x80 to 0xFF. Localized versions of Windows
would most likely use a code page that maps the upper half of the byte range
to the native characters for the locale.
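
To make the byte-level picture concrete, here is a small sketch that decodes the same five bytes under three encodings. The sample bytes and the choice of code pages 1251 and 1252 are illustrative assumptions, not taken from Roman's file:

```powershell
# "far " in ASCII followed by one byte from the upper half of the range.
$bytes = [byte[]](0x66, 0x61, 0x72, 0x20, 0xF4)

$ascii  = [System.Text.Encoding]::ASCII.GetString($bytes)
$cp1251 = [System.Text.Encoding]::GetEncoding(1251).GetString($bytes)
$cp1252 = [System.Text.Encoding]::GetEncoding(1252).GetString($bytes)

# Print the Unicode code point each decoder produced for the last byte.
foreach ($s in $ascii, $cp1251, $cp1252) {
    'U+{0:X4}' -f [int]($s[4])
}
```

The ASCII decoder has no mapping for 0xF4 and substitutes a question mark (U+003F), code page 1251 yields U+0444 (a Cyrillic letter), and code page 1252 yields U+00F4 (a Latin letter with a circumflex). The bytes are identical in all three cases; only the assumed code page changes.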

By the time Select-String sees the text it has been converted to Unicode,
because .NET strings are strictly Unicode. It does seem like a limitation
that Select-String's path-oriented parameter set doesn't let you indicate
the encoding to use when reading the file. However, if the file is either
Unicode or UTF-8, it should just work. If you open the text file in
Notepad and go to Save As, what does the "Encoding" drop-down say? If it is
ANSI, try changing it to UTF-8, save the file, and run Select-String on the
UTF-8 encoded file.
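
The Notepad round trip can also be scripted. A hedged sketch, assuming the source file is in code page 1251 and using hypothetical temporary file names: it re-encodes the file as UTF-8 with a byte order mark and then runs Select-String against the new file by path.

```powershell
# Assumption: the original file is Windows-1251; adjust for other code pages.
$ansi = [System.Text.Encoding]::GetEncoding(1251)
$tmp  = [System.IO.Path]::GetTempPath()
$src  = Join-Path $tmp 'cyrillic-ansi.txt'   # hypothetical stand-in for cyrillic.txt
$dst  = Join-Path $tmp 'cyrillic-utf8.txt'   # hypothetical output name

# Create the stand-in: "far" plus a Cyrillic word built from code points.
$line = 'far ' + (-join [char[]](0x0442, 0x0435, 0x043A, 0x0441, 0x0442))
[System.IO.File]::WriteAllText($src, $line + "`r`n", $ansi)

# Decode with the assumed code page, then write UTF-8.
# [System.Text.Encoding]::UTF8 writes a byte order mark at the start.
$text = [System.IO.File]::ReadAllText($src, $ansi)
[System.IO.File]::WriteAllText($dst, $text, [System.Text.Encoding]::UTF8)

# Path-based Select-String can now detect the encoding from the BOM.
$match = Select-String -Pattern far -Path $dst
$match.Line
```

Unlike the pipeline form, the match here still reports the file name and line number, because Select-String reads the file itself.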

--
Keith


Old 01-16-2007   #9 (permalink)
Roman Kuzmin


 
 

Re: Select-String problem with not English ASCII text

Keith,

Thank you for the nice explanation; it looks like I did not use the term
ASCII precisely. Meanwhile, it looks like Select-String really lacks encoding
flexibility, and moreover its default behavior is rather disappointing
(Get-Content's default behavior is just fine).

--
Thanks,
Roman


Old 01-16-2007   #10 (permalink)
Keith Hill [MVP]


 
 

Re: Select-String problem with not English ASCII text

Roman, just out of curiosity, if you save the file from Notepad using
Unicode (or UTF-8), does Select-String work on that file? I would expect
that to work.

--
Keith

