Vista - Regular expression for nested HTML tags

02-28-2008   #1
Sudheer


 
 

Regular expression for nested HTML tags

I am looking for a regular expression to find certain content
present in an HTML page.

The HTML page looks something like this:

<div class="info">
<h5>Genre:</h5>
<a href="http://www.imdb.com/Sections/Genres/Action/">Action</a> /
<a href="http://www.imdb.com/Sections/Genres/Adventure/">Adventure</a> /
<a href="http://www.imdb.com/Sections/Genres/Crime/">Crime</a> /
<a href="http://www.imdb.com/Sections/Genres/Thriller/">Thriller</a>
<a class="tn15more inline" href="http://www.imdb.com/title/tt0337978/keywords" onclick="(new Image()).src='/rg/title-tease/keywords/images/b.gif?link=/title/tt0337978/keywords';">more</a>
</div>

<div class="info">
<h5>Tagline:</h5>
Yippee Ki Yay Mo - John 6:27
</div>

<div class="info">
<h5>Plot Outline:</h5>
John McClane takes on an Internet-based terrorist organization who is
systematically shutting down the United States.
<a class="tn15more inline" href="http://www.imdb.com/title/tt0337978/plotsummary" onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?link=/title/tt0337978/plotsummary';">more</a>
</div>



Now I need a regular expression that scans the entire HTML and
helps me extract
1. the tagline of the movie
2. the plot outline, etc.


It is assured that they will be present in a div with class="info".

Any help in this regard would be appreciated!

02-28-2008   #2
Jesse Houwing


 
 

Re: Regular expression for nested HTML tags

Hello Sudheer,

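That would be pretty easy to do. Written as C# verbatim strings (so that \s
needs no extra escaping and embedded quotes are doubled):

@"<div class=""info"">\s*<h5>Tagline:</h5>(?<Tagline>((?!</div).)+)"
@"<div class=""info"">\s*<h5>Plot Outline:</h5>(?<Plot>((?!</div).)+)"

Or more generic:
@"<div class=""info"">\s*<h5>(?<Key>[^:]+):</h5>(?<Value>((?!</div).)+)"

Use these with RegexOptions.Singleline, because the text you want sits on
different lines than the <h5> tag and "." does not match a newline by default.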
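
As a rough sketch of how the generic pattern might be used from C#: the class
and method names below (InfoBlockParser, Extract) are made up for illustration,
and the sample HTML is a shortened copy of the page in the question. The
pattern is compiled with RegexOptions.Singleline so that "." can cross line
breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class InfoBlockParser
{
    // The generic pattern from this post, as a verbatim string.
    // Singleline lets "." match newlines; IgnoreCase tolerates <DIV>, <H5>, etc.
    private static readonly Regex InfoBlock = new Regex(
        @"<div class=""info"">\s*<h5>(?<Key>[^:]+):</h5>(?<Value>((?!</div).)+)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Maps each heading (without the colon) to the trimmed raw HTML after it.
    public static Dictionary<string, string> Extract(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in InfoBlock.Matches(html))
        {
            result[m.Groups["Key"].Value.Trim()] = m.Groups["Value"].Value.Trim();
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        string html =
            "<div class=\"info\">\n<h5>Tagline:</h5>\nYippee Ki Yay Mo - John 6:27\n</div>\n" +
            "<div class=\"info\">\n<h5>Plot Outline:</h5>\nJohn McClane takes on an " +
            "Internet-based terrorist organization.\n</div>\n";

        Dictionary<string, string> info = InfoBlockParser.Extract(html);
        Console.WriteLine(info["Tagline"]);       // Yippee Ki Yay Mo - John 6:27
        Console.WriteLine(info["Plot Outline"]);
    }
}
```

Note that the captured value is raw HTML, so on the real page the plot outline
would still contain the trailing "more" link and would need a further cleanup
step.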

Another option, which would be a little more robust, would be to use the HTML
Agility Pack (can be found on www.codeplex.com).
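
For comparison, a minimal sketch of the HTML Agility Pack route, assuming the
library is referenced in the project. InfoBlockReader and Extract are
hypothetical names; the XPath assumes the class attribute is exactly "info",
as in the sample page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper built on the HTML Agility Pack.
public static class InfoBlockReader
{
    // Maps each <h5> heading (without the colon) to the text of its info block.
    public static Dictionary<string, string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new Dictionary<string, string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection blocks = doc.DocumentNode.SelectNodes("//div[@class='info']");
        if (blocks == null)
        {
            return result;
        }

        foreach (HtmlNode block in blocks)
        {
            HtmlNode heading = block.SelectSingleNode("h5");
            if (heading == null)
            {
                continue;
            }

            string key = heading.InnerText.Trim().TrimEnd(':');

            // Drop the heading so that InnerText yields only the block's content.
            heading.Remove();
            result[key] = HtmlEntity.DeEntitize(block.InnerText).Trim();
        }
        return result;
    }
}
```

Because InnerText includes the text of nested links, the plot outline from the
real page would still end with the word "more" from its trailing link.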

--
Jesse Houwing
jesse.houwing at sogeti.nl

