#1
Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bugs

For proper character support, I need UTF8 or Unicode support. PowerShell provides this when writing to text files via Add-Content -Encoding UTF8.

Set-Content -Encoding UTF8 works great, but Add-Content -Encoding UTF8 adds strange characters to the beginning of lines in the text file!

These extra characters look like little squares. They are not visible when I use Set-Content -Encoding UTF8, only when I use Add-Content -Encoding UTF8, so I am assuming they are not the end-of-line characters.

I am using MS Notepad to look at the *.txt file, with PowerShell 1 on XP.

Can anyone explain this? How can I get around it? Is it going to be fixed soon?
#2
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bugs

On Jul 3, 5:10 pm, ioioio322 <gu...@xxxxxx-email.com> wrote:
> Add-Content -Encoding UTF8.....adds strange chars to the beginning of
> lines in the text file!!

[...] data to a file that was created as a Windows UTF-8 file? There's a well-known issue with Windows UTF-8 files: they include a BOM (byte order mark), which most Linux/Unix and other utilities may not be expecting.

If you're adding to an existing file and you're not sure whether it has the BOM, it might be best to read the old file, add your data, and then overwrite the file.
#3
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bugs

^Sometimes I'm adding a line to an already existing file (that's what Add-Content is for). Other times I use it to construct a complex txt file one line at a time from scratch. Same effect. It only happens with Unicode or UTF-8 files, and these files were not made on the UNIX side.

I haven't tried the txt file on the UNIX side. So Notepad should be able to open it without special characters appearing (like empty squares at the beginning of lines).

Has this bug been addressed in newer versions? Does MS know about it?
#4
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bu

I understand how easy it is to point the finger at Microsoft; however, this time the bug is with whatever tool you are using in the *nix world. I suggest you bring this up with the vendor of the tool you're having problems with.

See this FAQ, which explains why and when you need the BOM:
http://www.unicode.org/faq/utf_bom.html

There is a very good reason why PowerShell and Notepad use the BOM: it lets a reader tell which encoding a file uses. You've named 3 (is this ASCII, is this UTF-8, is this Unicode big-endian?), but there are more.
#5
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bugs

^This has nothing to do with *nix. You misread my post. I said I am NOT using unix\linux...

I specifically said what I was using:
- PowerShell 1
- MS Notepad
- Win XP Pro
#6
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bu

My apologies, I misread the Unix part. I guess it was tojo who said some Linux tools are not aware of this.

Nevertheless, what I said applies to whatever "tool" is reading the file. There is a section in this FAQ, "How I should deal with BOMs", which explains how you or whoever should handle this.

The BOM is only written once regardless of the number of times Add-Content is used, and is absolutely required in order for any text processor to be able to interpret the character encoding. Both Add-Content and Set-Content add a BOM unless the character encoding is ASCII.

Look:

    Set-Content -Encoding UTF8 -Path sc.txt 'hello world'
    Get-Content -Encoding Byte sc.txt | format-hex
    ef bb bf 68 65 6c 6c 6f 20 77 6f 72 6c 64 0d 0a   hello.world..
    ^^^^^^^^
    BOM = UTF-8

    Add-Content -Path ac.txt -Encoding UTF8 "hello world"
    Add-Content -Path ac.txt -Encoding UTF8 "good bye"
    Get-Content -Encoding Byte ac.txt | format-hex
    ef bb bf 68 65 6c 6c 6f 20 77 6f 72 6c 64 0d 0a   hello.world..
    67 6f 6f 64 20 62 79 65 0d 0a                     good.bye..

    ## here GC is reading the BOM and interpreting the text according to the character encoding
    Get-Content ac.txt
    hello world
    good bye

If what I said above has nothing to do with what you're asking, repost and I promise to stay away from this thread.
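Since format-hex is not a stock cmdlet, the byte-level check described in post #6 can also be sketched outside PowerShell. The following is a minimal illustration in Python, not something taken from the thread: the helper name `bom_offsets` and both sample byte strings are made up for the example, and the "broken" sample is only an assumption about what a BOM repeated before a later line would look like on disk.

```python
import codecs

def bom_offsets(data: bytes) -> list:
    """Return the byte offset of every UTF-8 BOM (EF BB BF) found in data."""
    offsets = []
    start = 0
    while True:
        idx = data.find(codecs.BOM_UTF8, start)
        if idx == -1:
            return offsets
        offsets.append(idx)
        start = idx + len(codecs.BOM_UTF8)

# Bytes matching the dump in post #6: one BOM, then two CRLF-terminated lines.
healthy = codecs.BOM_UTF8 + b"hello world\r\ngood bye\r\n"

# Hypothetical bytes for the reported symptom: a second BOM before a later line.
broken = codecs.BOM_UTF8 + b"hello world\r\n" + codecs.BOM_UTF8 + b"good bye\r\n"

print(bom_offsets(healthy))  # [0]
print(bom_offsets(broken))   # [0, 16]
```

A result other than `[0]` would mean the file carries BOM bytes past the start, which is the situation the original poster appears to be describing.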
#7
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bu

On Jul 4, 1:21 pm, Bob Landau <BobLan...@xxxxxx> wrote:
> The BOM is only written once regardless of # of times Add-Content is used
> [...]

[...] only be added to the beginning of the file, but the original post sounds like it's adding it to the beginning of each line, which would explain why Notepad can't figure out what to do with it. The Linux thing is a red herring in this situation, I think.

BOM issues with UTF-8 are usually associated with Linux because the BOM is almost never used there (it interferes with the shebang line of executable files).
#8
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bugs

Unfortunately, "format-hex is not recognized as a cmdlet" in my version of PowerShell (v1).

But the empty square characters appeared on EVERY text line except the first line of text. And Linux/Unix is not involved in any way.
#9
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bugs

On Jul 4, 4:48 pm, ioioio322 <gu...@xxxxxx-email.com> wrote:
> but the empty square chars happened on EVERY text line....except... the
> first line of text.

[...] because Notepad would correctly identify that as the BOM and not show it in the output. Are you going to submit that as a bug? Someone should.
#10
Re: Add-Content -Encoding UTF8 and -Encoding Unicode Powershell bu

Unfortunately, it's not part of v2 either. Format-hex is a work in progress. I've been writing it because I've not found one that worked quite the way I wanted. I thought showing the command line and output in v2 would be clearer than putting it in words.

If you are seeing the BOM being added to each line, then I would also call this a bug. However, given that this has been fixed in v2, you may find it difficult to convince them to fix it in v1.

A workaround would be either to use the string member StartsWith or a regex ^efbbbf to find and eliminate these.
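The "find and eliminate" workaround from post #10 can be sketched at the byte level as follows. This is an illustrative Python sketch under stated assumptions, not the poster's code: `strip_stray_boms` is a hypothetical helper, the sample bytes are invented, and it assumes the file is valid UTF-8. Note that once UTF-8 bytes are decoded, each BOM shows up as the single character U+FEFF, so a text-level check would look for that character rather than for the hex digits.

```python
import codecs

BOM_CHAR = "\ufeff"  # what a UTF-8 BOM becomes once the bytes are decoded

def strip_stray_boms(data: bytes) -> bytes:
    """Remove every BOM from UTF-8 data, then put a single BOM back at the start."""
    # Python's plain "utf-8" codec keeps BOMs, so each one decodes to U+FEFF.
    text = data.decode("utf-8")
    cleaned = text.replace(BOM_CHAR, "")
    return codecs.BOM_UTF8 + cleaned.encode("utf-8")

# Hypothetical file content with a BOM repeated before the second line.
broken = codecs.BOM_UTF8 + b"hello world\r\n" + codecs.BOM_UTF8 + b"good bye\r\n"

fixed = strip_stray_boms(broken)
print(fixed)  # b'\xef\xbb\xbfhello world\r\ngood bye\r\n'
```

Keeping one BOM at the start follows the behaviour post #6 describes for Set-Content, so Notepad should still detect the file as UTF-8; dropping the final `codecs.BOM_UTF8 +` would give a BOM-less file instead.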