Windows Vista Forums

Vista - Extracting Links from an HTML document using a Script

08-28-2009   #1
JoJo

Extracting Links from an HTML document using a Script

Folks:

I have an HTML document that is about 100 pages long. I assembled this
document from the "Articles By This Author" section of the following web page:
http://www.tigersharktrading.com/authors/23/Harry-Boxer

Scattered throughout this document are many links to the web. The links of
interest to me all start with the ">>" characters, as seen
at TigerSharkTrading, then the name of the article is given as a link.

* How can I quickly extract these links and copy them to a new file?
* Is there some type of script that can quickly accomplish this task?


Thanks,
JoJo.




08-28-2009   #2
mr_unreliable

Re: Extracting Links from an HTML document using a Script

JoJo wrote:
Quote:

> Folks:
>
> I have an HTML document that is about 100 pages long. I assembled this
> document from the "Articles By
> This Author" section of the following web page:
> http://www.tigersharktrading.com/authors/23/Harry-Boxer
>
> Scattered throughout this document are many links to the web. The links of
> interest to me all start with the ">>" characters, as seen
> at TigerSharkTrading, then the name of the article is given as a link.
>
> * How can I quickly extract these links and transfer same to a new file
> ?
> * Is there some type of script that can quickly accomplish this task ?
>
hi JoJo,

I suggest using the "all" collection (of the document
object).

Let's say that your links appear in an "anchor" (A) tag.

Then you could get your collection of anchor tags like this:

document.all.tags("A")

To get the tags you want, you could "walk-the-list" with
some sort of a loop (your choice, try "For Each").

The individual items would be addressed as:

document.all.tags("A")(i) ' where i is your index

And the number of items would be:

document.all.tags("A").Length

In your discussion, you mentioned the URLs, which
probably appear as the "href" attribute of the "A"
tag. My guess is that you can get the URL as:

document.all.tags("A")(i).href
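
For anyone running this outside a browser, here is a minimal sketch. It assumes the document has been saved locally as articles.html (a hypothetical name) and that the script runs under Windows Script Host (cscript). It loads the text into an in-memory "htmlfile" document so that the all.tags("A") collection described above is available without opening Internet Explorer. ExtractLinks and SOURCE_FILE are illustrative names, not part of any library.

```vbscript
Option Explicit

' Hypothetical file name; point this at your own saved document.
Const SOURCE_FILE = "articles.html"
Const ForReading = 1

' Returns the href of every anchor (A tag) in the HTML text, one per line.
Function ExtractLinks(html)
    Dim doc, links, i, result

    ' "htmlfile" gives an in-memory MSHTML document with the usual DOM.
    Set doc = CreateObject("htmlfile")
    doc.write html
    doc.close

    Set links = doc.all.tags("A")
    result = ""
    For i = 0 To links.length - 1      ' the collection is zero-based
        result = result & links(i).href & vbCrLf
    Next
    ExtractLinks = result
End Function

Dim fso, stream
Set fso = CreateObject("Scripting.FileSystemObject")
If fso.FileExists(SOURCE_FILE) Then
    Set stream = fso.OpenTextFile(SOURCE_FILE, ForReading)
    WScript.Echo ExtractLinks(stream.ReadAll)
    stream.Close
End If
```

Run it with something like "cscript //nologo getlinks.vbs > links.txt" (the script name is also just an example). Since the in-memory document has no base URL, relative links may come back with an "about:" prefix rather than the original site address.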


cheers, jw
____________________________________________________________

You got questions? WE GOT ANSWERS!!! ..(but, no guarantee
the answers will be applicable to the questions)



08-29-2009   #3
Larry Serflaten

Re: Extracting Links from an HTML document using a Script


"JoJo" <swiftTrades@xxxxxx> wrote
Quote:

> I have an HTML document that is about 100 pages long. I assembled this
> document from the "Articles By
> This Author" section of the following web page:
> http://www.tigersharktrading.com/authors/23/Harry-Boxer
>
> Scattered throughout this document are many links to the web. The links of
> interest to me all start with the ">>" characters, as seen
> at TigerSharkTrading, then the name of the article is given as a link.
>
> * How can I quickly extract these links and transfer same to a new file
> ?
> * Is there some type of script that can quickly accomplish this task ?

As mr_unreliable indicated, you will probably want to use the DOM
objects to parse the document. I would add that all the links of interest
appear to be contained in SPAN objects that have a class name of 'title'.
So, instead of grabbing all anchors, you could grab all SPAN objects,
check each for a className of 'title', and then do another grab within
that object for all anchors (of which there is only one, the one you
want).

Something like: (warning - air code)

For Each sp In document.all.tags("SPAN")
    If sp.className = "title" Then
        For Each ref In sp.all.tags("A")
            ' Save the href to the new file, etc.
            AppendToFile ref.href
        Next
    End If
Next

Your own AppendToFile routine might as well make the file an HTML
document, so you can load it in a browser and click on any interesting
links.
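
One possible shape for such an AppendToFile routine, as a sketch only: it assumes the output goes to links.html in the current folder (a hypothetical name) and writes each URL as a clickable anchor on its own line. OUTPUT_FILE is an illustrative constant; the FileSystemObject calls are the standard scripting runtime ones.

```vbscript
Option Explicit

' Hypothetical output file; change it to suit.
Const OUTPUT_FILE = "links.html"
Const ForAppending = 8

' Appends one clickable link to OUTPUT_FILE, creating the file if needed.
Sub AppendToFile(url)
    Dim fso, out, safeUrl

    ' Escape the characters that would break the href attribute.
    safeUrl = Replace(url, "&", "&amp;")
    safeUrl = Replace(safeUrl, """", "&quot;")

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set out = fso.OpenTextFile(OUTPUT_FILE, ForAppending, True)
    out.WriteLine "<a href=""" & safeUrl & """>" & safeUrl & "</a><br>"
    out.Close
End Sub

' Example call, using the page mentioned in the original post.
AppendToFile "http://www.tigersharktrading.com/authors/23/Harry-Boxer"
```

Opening and closing the file on every call is slow for thousands of links but keeps the routine self-contained; for a big run you could open the file once before the loop and close it afterwards.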

Have fun!
LFS

