Fetch special characters like "Ñ" and absolute URL from href attribute of anchors

08-29-2008   #1
jason

Hi there

I used the following code to fetch the source of a web page (the one
assigned to url2 below), but ran into two painful problems.

1. Odd or missing characters. For example, the name "ALBARIÑO ORGANISTRUN"
becomes "ALBARI? ORGANISTRUN" when displayed in Notepad++, or "ALBARI ORGANISTRUN"
when displayed in Notepad. What I want is "ALBARIÑO ORGANISTRUN".

2. Relative URLs. For example, the href value of the anchor "Blancos" is
"prodtype.asp?PT_ID=107&numRecordPosition=1&strPageHistory=cat&strKeywords=&strSearchCriteria="
but what I really want is its absolute address, like
"http://www.elcatavinos.com/tienda/prodtype.asp?PT_ID=107&numRecordPosition=1&strPageHistory=cat&strKeywords=&strSearchCriteria="

Can anyone help me sort these out?

Thanks in advance!

Jason

__________________________________________________________
Code I used:

dim url1
dim url2
dim xmlhttp
dim datafile
dim FS
dim dataFileTs
dim i
dim cookie

url1 = "http://www.elcatavinos.com/tienda/store/dynamicIndex.asp?sm=b1"
url2 = "http://www.elcatavinos.com/tienda/product.asp?numRecordPosition=5&P_ID=25473&strPageHistory=cat&strKeywords=&SearchFor=&PT_ID=107"
datafile = "c:\temp\test.dat"

set FS = Wscript.CreateObject("Scripting.FileSystemObject")
set datafileTs = FS.CreateTextFile(datafile, True, True)
set xmlHTTP = Wscript.CreateObject("MSXML2.XMLHTTP.3.0")
xmlHTTP.Open "HEAD",url1, false
xmlHTTP.Send
i = 0
do until xmlHTTP.readyState = 4
Wscript.Sleep 100
i = i + 1
if i > 1000 then exit do
Loop
cookie = xmlhttp.getResponseHeader("set-cookie")

xmlHTTP.Open "GET",url2, false
xmlHTTP.SetRequestHeader "set-cookie",cookie
xmlHTTP.SetRequestHeader "Content-Type","text/html; charset=iso-8859-1"
xmlHTTP.SetRequestHeader "Content-Location","absoluteURI"
xmlHTTP.Send
i = 0
do until xmlHTTP.readyState = 4
Wscript.Sleep 100
i = i + 1
if i > 1000 then exit do
Loop
datafileTs.Writeline xmlhttp.responseText



08-29-2008   #2
Anthony Jones


"jason" <atechmark@xxxxxx> wrote in message
news:%23G427DZCJHA.1228@xxxxxx
> Hi there
>
> I used the following code to fetch the source of a web page (the one
> assigned to url2 below), but ran into two painful problems.
>
> 1. Odd or missing characters. For example, the name "ALBARIÑO ORGANISTRUN"
> becomes "ALBARI? ORGANISTRUN" when displayed in Notepad++, or "ALBARI ORGANISTRUN"
> when displayed in Notepad. What I want is "ALBARIÑO ORGANISTRUN".
>
> 2. Relative URLs. For example, the href value of the anchor "Blancos" is
> "prodtype.asp?PT_ID=107&numRecordPosition=1&strPageHistory=cat&strKeywords=&strSearchCriteria="
> but what I really want is its absolute address, like
> "http://www.elcatavinos.com/tienda/prodtype.asp?PT_ID=107&numRecordPosition=1&strPageHistory=cat&strKeywords=&strSearchCriteria="
>
> Can anyone help me sort these out?
>
> Thanks in advance!
>
> Jason
>
> __________________________________________________________
> Code I used:
>
> dim url1
> dim url2
> dim xmlhttp
> dim datafile
> dim FS
> dim dataFileTs
> dim i
> dim cookie
>
> url1 = "http://www.elcatavinos.com/tienda/store/dynamicIndex.asp?sm=b1"
> url2 = "http://www.elcatavinos.com/tienda/product.asp?numRecordPosition=5&P_ID=25473&strPageHistory=cat&strKeywords=&SearchFor=&PT_ID=107"
> datafile = "c:\temp\test.dat"
>
> set FS = Wscript.CreateObject("Scripting.FileSystemObject")
> set datafileTs = FS.CreateTextFile(datafile, True, True)
> set xmlHTTP = Wscript.CreateObject("MSXML2.XMLHTTP.3.0")
> xmlHTTP.Open "HEAD",url1, false
> xmlHTTP.Send
> i = 0
> do until xmlHTTP.readyState = 4
> Wscript.Sleep 100
> i = i + 1
> if i > 1000 then exit do
> Loop

What's the loop for? The call to .send is synchronous, so the readyState
isn't going to change after .send returns; if it isn't 4 after the call,
it's never going to be.

> cookie = xmlhttp.getResponseHeader("set-cookie")
>
> xmlHTTP.Open "GET",url2, false
> xmlHTTP.SetRequestHeader "set-cookie",cookie
set-cookie is not a request header; why are you sending it?

> xmlHTTP.SetRequestHeader "Content-Type","text/html; charset=iso-8859-1"
You're not sending any content, so why are you specifying a Content-Type?

> xmlHTTP.SetRequestHeader "Content-Location","absoluteURI"
Content-Location is not a request header, and absoluteURI is not a literal
value; it indicates that a response may supply an alternate absolute URI for
the resource being sent by the server.
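
Since no request header makes the server rewrite its hrefs, a relative href has to be resolved on the client against the URL of the page it came from. The following is only a minimal sketch: ResolveUrl is a hypothetical helper name, it assumes the base is an absolute http or https URL, and it does not collapse "../" or "./" segments or handle protocol-relative ("//host/...") hrefs.

```vbscript
' Hypothetical helper: resolve an href against the URL of the page that
' contained it. Handles absolute hrefs, root-relative paths ("/x.asp"),
' and plain document-relative paths ("x.asp?y=1") only.
Function ResolveUrl(baseUrl, href)
    Dim basePath, cutAt, schemeEnd, hostEnd

    ' An href that already starts with a scheme is returned unchanged.
    If LCase(Left(href, 7)) = "http://" Or LCase(Left(href, 8)) = "https://" Then
        ResolveUrl = href
        Exit Function
    End If

    ' Drop the query string and fragment of the base before looking for
    ' its directory, so slashes inside the query cannot confuse the search.
    basePath = baseUrl
    cutAt = InStr(basePath, "?")
    If cutAt > 0 Then basePath = Left(basePath, cutAt - 1)
    cutAt = InStr(basePath, "#")
    If cutAt > 0 Then basePath = Left(basePath, cutAt - 1)

    ' Find the first "/" after the host name; add one if the base has no path.
    schemeEnd = InStr(basePath, "://") + 2
    hostEnd = InStr(schemeEnd + 1, basePath, "/")
    If hostEnd = 0 Then
        basePath = basePath & "/"
        hostEnd = Len(basePath)
    End If

    If Left(href, 1) = "/" Then
        ' Root-relative: keep scheme and host only.
        ResolveUrl = Left(basePath, hostEnd - 1) & href
    Else
        ' Document-relative: keep everything up to the last "/" of the base.
        ResolveUrl = Left(basePath, InStrRev(basePath, "/")) & href
    End If
End Function
```

Called with the url2 from the original post as baseUrl and the "Blancos" href as the second argument, this should yield the "http://www.elcatavinos.com/tienda/prodtype.asp?..." address the poster asked for. If the page declares a base element, that value would need to be used as the base instead.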

> xmlHTTP.Send
> i = 0
> do until xmlHTTP.readyState = 4
> Wscript.Sleep 100
> i = i + 1
> if i > 1000 then exit do
> Loop
Lose the loop.

> datafileTs.Writeline xmlhttp.responseText
>
My guess is that the server is not specifying a charset in the Content-Type
header it is sending, OR the charset it specifies doesn't match the actual
encoding sent.

Does the HTML returned contain a meta tag specifying the
content-type/charset?

Do you administer the site you are accessing?
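
Putting the comments above together, the two requests might be sketched as follows. This is an untested sketch rather than a verified fix: CookiePair and FetchWithSessionCookie are hypothetical names, the code assumes the server sends a single Set-Cookie header, and whether the component forwards a manually set Cookie header (or already manages cookies on its own) is an assumption that should be checked against the actual site.

```vbscript
' Hypothetical helper: reduce a Set-Cookie response header value to the
' "name=value" pair that belongs in a Cookie request header. Attributes
' such as path or expires are dropped.
Function CookiePair(setCookieValue)
    Dim semi
    semi = InStr(setCookieValue, ";")
    If semi > 0 Then
        CookiePair = Trim(Left(setCookieValue, semi - 1))
    Else
        CookiePair = Trim(setCookieValue)
    End If
End Function

' Hypothetical helper: request firstUrl to pick up a session cookie, then
' request secondUrl with that cookie. Both calls are synchronous (third
' argument False), so no readyState loop is needed after Send.
Function FetchWithSessionCookie(firstUrl, secondUrl)
    Dim http, cookie
    Set http = CreateObject("MSXML2.XMLHTTP.3.0")

    http.Open "HEAD", firstUrl, False
    http.Send
    ' "" & ... guards against a missing header coming back as Null/Empty.
    cookie = CookiePair("" & http.getResponseHeader("Set-Cookie"))

    http.Open "GET", secondUrl, False
    ' Cookie is the request-side header; Set-Cookie, Content-Type and
    ' Content-Location are not sent because the request has no body.
    If Len(cookie) > 0 Then http.setRequestHeader "Cookie", cookie
    http.Send

    Set FetchWithSessionCookie = http
End Function
```

With the variables from the original post, the call would be `Set xmlHTTP = FetchWithSessionCookie(url1, url2)`, after which xmlHTTP.status and the response properties can be inspected.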



--
Anthony Jones - MVP ASP/ASP.NET


08-29-2008   #3
Paul Randall

I'm running a US-English version of WXP SP2 & IE6.

When I manually open a browser window and navigate to your url2, I get a web
page that displays ALBARIÑO ORGANISTRUN.
I can select and copy that two-word phrase and paste it into Notepad, where
it displays properly. If I save it as ANSI text, things become a little
strange. If I open that ansi.txt file in Notepad, I get Chinese-like
characters, but on another WXP SP2 computer I get exactly ten boxes. If I
open it in WordPad or IE, it displays properly.

If I save as Unicode instead of ANSI text, then Notepad displays it properly
when I reopen the file.

I'm thinking that you have an encoding problem. The statement:
datafileTs.Writeline xmlhttp.responseText
gets responseText into a local unnamed variant, and then passes that to
datafileTs.Writeline. Somewhere in this statement, there is a
locale/encoding mismatch which is giving you a problem. Perhaps you could
force datafileTs to be Unicode.
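
The original script already passes True as the third (Unicode) argument of CreateTextFile, so the file itself should be Unicode; the mismatch may instead arise when responseText decodes the bytes. As far as I know, responseText assumes UTF-8 when the response does not say otherwise, which would turn a Latin-1 "Ñ" byte into an undecodable character. One way to test that theory is to decode responseBody explicitly. A minimal sketch, where BytesToText is a hypothetical helper name and the charset "iso-8859-1" is an assumption taken from the header the original poster tried to send:

```vbscript
' Hypothetical helper: decode a byte array (such as xmlHTTP.responseBody)
' into a string using an explicitly named charset, via ADODB.Stream.
Function BytesToText(bytes, charsetName)
    Dim stream
    Set stream = CreateObject("ADODB.Stream")

    stream.Type = 1              ' 1 = adTypeBinary: load the raw bytes
    stream.Open
    stream.Write bytes

    stream.Position = 0          ' Type and Charset can only change at position 0
    stream.Type = 2              ' 2 = adTypeText: reinterpret the bytes as text
    stream.Charset = charsetName ' assumed page encoding, e.g. "iso-8859-1"

    BytesToText = stream.ReadText
    stream.Close
End Function
```

In the original script the last line would then become `datafileTs.WriteLine BytesToText(xmlhttp.responseBody, "iso-8859-1")`. If the page turns out to use a different encoding (for example one named in a meta tag), that name would go in place of "iso-8859-1".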

-Paul Randall

