Windows Vista Forums

Vista - Regex Question

12-10-2008   #1
Joerg Battermann


 
 

Regex Question

Hello there,

I am a little confused by a regex I need to run on some strings.
Basically, I want to match all "file:// .... .xls" occurrences, but
only the ones that do NOT start or end with a " (quote). The reason
is that I want to find plain-text occurrences of a file:// link
within an HTML file, and not the
<a href="file:\\....xls">abc</a> ones.


Does anyone know off the top of their head what the correct
regex would be in this situation?


Cheers and thanks,
-Jörg

12-10-2008   #2
Dathan


 
 

Re: Regex Question

On Dec 10, 10:20 am, Joerg Battermann <j...@xxxxxx> wrote:
I think something like [^"](file://.+\.xls)[^"] should do the trick.
Beware, though -- XML and XHTML (and maybe HTML?) allow the use of
single-quoted attributes, too. So you might have to change the regex
to [^'"](file://.+\.xls)[^'"] or something similar.
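
To illustrate the suggestion above, here is a minimal sketch in Python (the thread is about .NET, but the regex syntax used here is the same in both as far as I know). The function name `find_plain_links` and the sample HTML are made up for the example. It also shows one catch with this approach: each `[^'"]` has to consume a real character, so a link at the very start or end of the text is missed.

```python
import re

# Dathan's suggested pattern (single quotes excluded too). The capture group
# holds the link; each surrounding [^'"] consumes one non-quote character.
PATTERN = re.compile(r"""[^'"](file://.+\.xls)[^'"]""")


def find_plain_links(text):
    """Return captured file:// links that have a non-quote character on both sides."""
    return PATTERN.findall(text)


# Hypothetical sample: one plain-text link and one inside an href attribute.
html = 'Report: file://server/share/a.xls <a href="file://server/share/b.xls">b</a>'
print(find_plain_links(html))  # ['file://server/share/a.xls']

# Limitation: with no neighbouring character on either side, nothing matches.
print(find_plain_links("file://server/share/a.xls"))  # []
```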
12-10-2008   #3
Dathan


 
 

Re: Regex Question

On Dec 10, 4:26 pm, Dathan <dat...@xxxxxx> wrote:
May need to use .+? instead of .+, as .+ does greedy matching and .+?
does not. With .+, if you have multiple occurrences of
"file://.......xls" on a single line, it'll include the first file://
and the last .xls and everything between as a single match.

~Dathan
May need to change this to [^'"](file://.+?\.xls)[^'"] to turn off
greedy matching. (I think that's the correct syntax.)
12-11-2008   #4
Andrew Morton


 
 

Re: Regex Question

Joerg Battermann wrote:
Quote:

> I am a little bit confused by a regex I need to run on some strings...
> basically I want to match it all "file:// .... .xls" occurrences, but
Um, that could be file:/// - as in three slashes - in some circumstances. Or
even four:

http://en.wikipedia.org/wiki/File_URI_scheme


Andrew
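
One way to allow for the extra slashes is to quantify the slash instead of hard-coding two. This is only a sketch in Python; the `{2,4}` range is an assumption taken from the "three, or even four" remark above, and the sample URIs are made up.

```python
import re

# Accept two to four slashes after "file:" (range is an assumption).
FILE_LINK = re.compile(r"file:/{2,4}.+?\.xls")

# Hypothetical sample URIs with two, three and four slashes, plus one with
# a single slash that should be rejected.
samples = [
    "file://server/share/a.xls",
    "file:///C:/docs/b.xls",
    "file:////server/share/c.xls",
    "file:/broken.xls",
]

results = [FILE_LINK.fullmatch(s) is not None for s in samples]
print(results)  # [True, True, True, False]
```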


12-11-2008   #5
Jeff Johnson


 
 

Re: Regex Question

"Dathan" <dathan@xxxxxx> wrote in message
news:a52d02e6-53b9-4bff-83b5-018f6cbd89a0@xxxxxx
Quote:

>> I think something like [^"](file://.+\.xls)[^"] should do the trick.
>> Beware, though -- XML and XHTML (and maybe HTML?) allow the use of
>> single-quoted attributes, too. So you might have to change the regex
>> to [^'"](file://.+\.xls)[^'"] or something similar.
>
> May need to use .+? instead of .+, as .+ does greedy matching and .+?
> does not. With .+, if you have multiple occurrences of
> "file://.......xls" on a single line, it'll include the first file://
> and the last .xls and everything between as a single match.
You should probably also use a backreference so that whatever type of
quotation mark you match the first time, you match the second time. Don't
ask me for the syntax, I don't remember; I just know it exists.
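
For reference, the backreference syntax is `\1` (refer back to whatever group 1 matched). A minimal sketch in Python, where the syntax is the same as in .NET as far as I know; the pattern name and sample strings are made up. Note that this form finds the *quoted* links, i.e. the ones the original poster wants to skip, so it would be used to identify or strip those first.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
QUOTED_LINK = re.compile(r"""(['"])(file://.+?\.xls)\1""")

double_quoted = QUOTED_LINK.search('<a href="file://srv/a.xls">a</a>')
single_quoted = QUOTED_LINK.search("<a href='file://srv/b.xls'>b</a>")
mismatched = QUOTED_LINK.search("\"file://srv/c.xls'")

print(double_quoted.group(2))  # file://srv/a.xls
print(single_quoted.group(2))  # file://srv/b.xls
print(mismatched)              # None
```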

The other thing that came to my mind when I saw this question was maybe it
will require lookahead/lookbehind. But maybe that's overkill in this
situation.
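
A sketch of what the lookbehind/lookahead version might look like, in Python for illustration (these constructs are written the same way in .NET as far as I know). Unlike `[^'"]`, the lookarounds do not consume a character, so a link at the very start or end of the text still matches. The function name and sample text are made up, and `.+?` is kept from the earlier posts; a stricter character class may be safer on real HTML.

```python
import re

# (?<!['"]) : the link must not be preceded by a quote.
# (?!['"])  : the link must not be followed by a quote.
PLAIN_LINK = re.compile(r"""(?<!['"])file://.+?\.xls(?!['"])""")


def find_plain_links(text):
    """Return file:// ... .xls links that are not wrapped in quotes."""
    return PLAIN_LINK.findall(text)


# Hypothetical sample: one plain-text link and one inside an href attribute.
html = 'See file://srv/a.xls and <a href="file://srv/b.xls">b</a>'
print(find_plain_links(html))                # ['file://srv/a.xls']
print(find_plain_links("file://srv/c.xls"))  # ['file://srv/c.xls']
```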

