Windows Vista Forums

Vista - Request.Form + XML + UTF-8 problem

11-19-2008   #1
nicolas.lethierry

Request.Form + XML + UTF-8 problem

Hi,

In ASP/vbScript I'm facing a problem of what seems to be double
encoded UTF-8 when I serialize an XML file with some content read from
the Request.Form collection. I attach a simple script showing this
behaviour. Here's what it does:

- There's a form with one field ("body"): type in the character
é (UTF-8 bytes C3 A9, which display as Ã© when read as ANSI) and submit the form.
- This value is read with Request.Form("body") and is inserted as a
text node in a simple (UTF-8) XML document (serialized as A.xml)
- A similar XML document is created with the same content (é) hard-
coded in the script. Serialized as B.xml

Both XML documents should be identical, but when you open them in
an XML viewer (that interprets UTF-8), you get the following:

A.xml : <?xml version="1.0" encoding="utf-8" ?><doc>Ã©</doc>
B.xml : <?xml version="1.0" encoding="utf-8" ?><doc>é</doc>

The byte value of the é character is C3 83 C2 A9 in A.xml and C3 A9 in B.xml.
So it appears that the Request object is doing something with the
encoding.
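One way to confirm what is actually stored in each file is to look at the raw bytes of the text node rather than trusting a viewer. The Python sketch below is only an illustration: it assumes A.xml holds the double-encoded form (C3 83 C2 A9) and B.xml the plain UTF-8 form (C3 A9), the file contents are typed in by hand rather than read from disk, and `text_node_bytes` is a hypothetical helper name.

```python
import re


def text_node_bytes(xml_bytes: bytes) -> bytes:
    """Return the raw bytes found between <doc> and </doc>."""
    match = re.search(rb"<doc>(.*?)</doc>", xml_bytes, re.DOTALL)
    if match is None:
        raise ValueError("no <doc> element found")
    return match.group(1)


# Assumed file contents, typed by hand for illustration only.
PROLOG = b'<?xml version="1.0" encoding="utf-8" ?>'
a_xml = PROLOG + b"<doc>\xc3\x83\xc2\xa9</doc>"  # double-encoded
b_xml = PROLOG + b"<doc>\xc3\xa9</doc>"          # plain UTF-8

for name, data in (("A.xml", a_xml), ("B.xml", b_xml)):
    raw = text_node_bytes(data)
    # Decoding as UTF-8 shows how many characters a UTF-8 viewer will display.
    chars = len(raw.decode("utf-8"))
    print(f"{name}: {raw.hex(' ')} ({chars} character(s))")
```

Under these assumptions the text node in A.xml decodes to two characters and the one in B.xml to a single character, which matches what the viewer displays.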

I'd be very happy if someone could explain this issue and help me
solve it.

Cheers,

Nicolas

<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<% Response.Buffer = True %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
</head>
<body>
<%
Dim XMLDoc1
' Run only after the form has been submitted
If Not(Request.Form("Submit") = Empty) Then
    Set XMLDoc1 = Server.CreateObject("MSXML2.DOMDocument")
    XMLDoc1.Async = False
    ' A.xml: text node taken from the posted form field
    XMLDoc1.LoadXML("<?xml version=""1.0"" encoding=""utf-8""?><doc>" & _
        Request.Form("body") & "</doc>")
    XMLDoc1.Save(Server.MapPath("A.xml"))
    ' B.xml: same document with the character hard-coded in the script
    XMLDoc1.LoadXML("<?xml version=""1.0"" encoding=""utf-8""?><doc>" & _
        "é" & "</doc>")
    XMLDoc1.Save(Server.MapPath("B.xml"))
    Set XMLDoc1 = Nothing
End If
%>
<form action="test.asp" method="post">
<fieldset>
<textarea name="body"></textarea>
<input type="submit" name="Submit" value="OK" />
</fieldset>
</form>
</body>
</html>

11-20-2008   #2
nicolas.lethierry

Re: Request.Form + XML + UTF-8 problem

OK, I'll answer myself. Add:

Response.CodePage = 65001

I had never heard of it before, but the line must be added (before
Request.Form is read) to specify that the strings within the intrinsic
objects are to be treated as UTF-8 (code page 65001). Request.Form is
URL-encoded, and when an item of the collection is read, its bytes are
by default decoded using the ANSI code page. A UTF-8 character encoded
with 2 bytes, such as é in my example, was thus treated as 2 distinct
characters, each of which was subsequently encoded to UTF-8 when placed
in the XML document.
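
The mechanism can be reproduced outside ASP. The Python sketch below is a model of the behaviour, not ASP's actual implementation: it assumes the server's ANSI code page is Windows-1252 (`cp1252`), and `read_form_value` and `save_as_utf8` are hypothetical names standing in for reading a Request.Form item and saving the XML document.

```python
def read_form_value(raw_bytes: bytes, codepage: str) -> str:
    """Decode submitted bytes into a string using the given code page."""
    return raw_bytes.decode(codepage)


def save_as_utf8(text: str) -> bytes:
    """Serialize a string as UTF-8, as the XML save step does."""
    return text.encode("utf-8")


# The browser submits é as its two UTF-8 bytes, C3 A9.
submitted = "\u00e9".encode("utf-8")

# Default case: the bytes are read as ANSI (Windows-1252 assumed),
# so each byte becomes its own character before being re-encoded.
wrong = save_as_utf8(read_form_value(submitted, "cp1252"))

# With the code page set to UTF-8 (65001), the two bytes are read
# as one character and survive the round trip unchanged.
right = save_as_utf8(read_form_value(submitted, "utf-8"))

print(wrong.hex(" "))  # c3 83 c2 a9
print(right.hex(" "))  # c3 a9
```

The first result corresponds to the double-encoded A.xml, the second to what A.xml contains once the code page is 65001.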