
VBScript String Replace - Remove / Replace Characters in String

dsoutter

#1
VBScript String Replace

http://www.code-tips.com/2009/04/vbscript-string-clean-function-remove.html

Remove or replace specific characters from a string. The article below
provides a function in VBScript to remove or replace characters in a
string.

remove Illegal Characters from a string: VBScript String Replace

http://www.code-tips.com/2009/04/vbscript-string-clean-function-remove.html

VBScript replace characters in string.
 


dsoutter

#2
http://groups.google.com/group/web-programming-seo/browse_thread/thread/9fcc0e6307ccbce0

On Mar 2, 4:11 pm, dsoutter <webmasterhub....@newsgroup> wrote:

> VBScript String Replace
>
> http://www.code-tips.com/2009/04/vbscript-string-clean-function-remov...
>
> Remove or replace specific characters from a string. The article below
> provides a function in VBScript to remove or replace characters in a
> string.
>
> VBScript String Replace
>
> http://www.code-tips.com/2009/04/vbscript-string-clean-function-remov...
>
> remove Illegal Characters from a string: VBScript String Replace
>
> http://www.code-tips.com/2009/04/vbscript-string-clean-function-remov...
>
> VBScript replace characters in string.
 


Al Dunbar

#3
"dsoutter" <webmasterhub.net@newsgroup> wrote in message
news:a49b3803-8e79-46eb-a8c2-61454f499ee7@newsgroup

> http://groups.google.com/group/web-programming-seo/browse_thread/thread/9fcc0e6307ccbce0
>
> On Mar 2, 4:11 pm, dsoutter <webmasterhub....@newsgroup> wrote:

>> VBScript String Replace
>>
>> http://www.code-tips.com/2009/04/vbscript-string-clean-function-remov...
>>
>> Remove or replace specific characters from a string. The article below
>> provides a function in VBScript to remove or replace characters in a
>> string.
>>
>> VBScript String Replace
>>
>> http://www.code-tips.com/2009/04/vbscript-string-clean-function-remov...
>>
>> remove Illegal Characters from a string: VBScript String Replace
>>
>> http://www.code-tips.com/2009/04/vbscript-string-clean-function-remov...
>>
>> VBScript replace characters in string.
>
Here is how I would code this function if I ever needed such a thing:

msgbox clean("C:\<test>&<done>")

function clean (strtoclean)
    strtemp = strtoclean
    badchars = Array("?","/","\",":","*","""","<",">","","&","#","~","%","{","}","+","_",".")
    for each badchar in badchars
        select case badchar
            case "&": goodchar = " and "
            case ":": goodchar = "-"
            case else: goodchar = " "
        end select
        strtemp = replace( strtemp, badchar, goodchar )
    next
    clean = strtemp
end function

IMHO, this has the same result but the logic is somewhat simpler. What
benefit would I get from switching from my version to yours?

/Al
 


dsoutter

#4
On Mar 3, 3:16 pm, "Al Dunbar" <aland...@newsgroup> wrote:

> "dsoutter" <webmasterhub....@newsgroup> wrote in message
> news:a49b3803-8e79-46eb-a8c2-61454f499ee7@newsgroup
>
> > On Mar 2, 4:11 pm, dsoutter <webmasterhub....@newsgroup> wrote:
> >> VBScript String Replace
> >>
> >> Remove or replace specific characters from a string. The article below
> >> provides a function in VBScript to remove or replace characters in a
> >> string.
> >>
> >> VBScript String Replace
> >>
> >> remove Illegal Characters from a string: VBScript String Replace
> >>
> >> VBScript replace characters in string.
>
> Here is how I would code this function if I ever needed such a thing:
>
>     msgbox clean("C:\<test>&<done>")
>
>     function clean (strtoclean)
>         strtemp = strtoclean
>         badchars = Array("?","/","\",":","*","""","<",">","","&","#","~","%","{","}","+","_",".")
>         for each badchar in badchars
>             select case badchar
>                 case "&": goodchar = " and "
>                 case ":": goodchar = "-"
>                 case else: goodchar = " "
>             end select
>             strtemp = replace( strtemp, badchar, goodchar )
>         next
>         clean = strtemp
>     end function
>
> IMHO, this has the same result but the logic is somewhat simpler. What
> benefit would I get from switching from my version to yours?
>
> /Al

Hi Al, the logic is simpler as you are using the replace() function to
perform the string replace, where the function provided takes the left
and right parts of a string, either side of an illegal character. In
many cases, your method would be more suitable mainly due to the
simpler logic, especially when all instances of each character are to
be processed in the same way.

As the method provided parses the string character by character, you
should have greater control over the output when more complex
operations need to be performed, such as removing or replacing a
character only if it is within a specific context:

E.g. replace "&" with " and " if padded with spaces or another specific
character, or with a "+" if not:
"something & something else" would become "something and something
else"
"somethin&something else" would become "somethin+something else".

E.g. replace ":" only if NOT part of a URL:

"the website is http://code-tips.com " would remain "the website is
http://code-tips.com "
"See Here: http://code-tips.com " would become "See Here
http://code-tips.com "

This would be achieved by either checking the previous 3-5 characters
when a ":" is found to see if it is in the context of a URL or not
(http, https, ftp), or by checking whether the characters following the
current ":" are "//", which would indicate that the colon is part of
a URL.

This functionality has not been included in the function provided, but
would be easy to implement, as the string is incrementally parsed and
manipulated using a numeric string position value relative to the
current position/character in the string.
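
A minimal sketch of the two contextual checks described above, written for
illustration rather than taken from the linked article. The function name
CleanInContext and the exact rules are assumptions: an "&" counts as padded
only when it has a space on both sides, and a ":" is kept only when it is
followed by "//".

```vbscript
' Hypothetical helper: walks the string one character at a time and
' decides what to emit based on the neighbouring characters.
Function CleanInContext(sIn)
    Dim i, ch, prevCh, nextCh, sOut

    sOut = ""
    For i = 1 To Len(sIn)
        ch = Mid(sIn, i, 1)

        ' Neighbours; empty strings at either end of the input.
        prevCh = ""
        If i > 1 Then prevCh = Mid(sIn, i - 1, 1)
        nextCh = Mid(sIn, i + 1, 1)

        Select Case ch
            Case "&"
                If prevCh = " " And nextCh = " " Then
                    sOut = sOut & "and"   ' the surrounding spaces are already there
                Else
                    sOut = sOut & "+"
                End If
            Case ":"
                ' Keep the colon only when it starts "://".
                If Mid(sIn, i + 1, 2) = "//" Then sOut = sOut & ch
            Case Else
                sOut = sOut & ch
        End Select
    Next

    CleanInContext = sOut
End Function

WScript.Echo CleanInContext("something & something else")      ' something and something else
WScript.Echo CleanInContext("somethin&something else")         ' somethin+something else
WScript.Echo CleanInContext("See Here: http://code-tips.com")  ' See Here http://code-tips.com
```

String concatenation is used here only to keep the sketch short; on long
inputs it allocates a new string for every character.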

There may also be differences in performance between the two methods,
as the function provided includes the code required to remove or
replace each of the specified characters without calling the replace()
function. I suspect that the replace function uses a similar approach
to replace the specified characters so any difference in performance
would be minimal, unless parsing a large string value. I haven't yet
tested this for performance differences.

Thanks
 


mayayana

#6
This looks like some kind of advertisement
for a blog, but it's an interesting question.
In compiled VB both of the foregoing methods
would be extremely slow on large strings.
The webpage sample is allocating a vast
number of strings to do its job. As the strings
get bigger it would slow to a crawl. The Replace
function looks much better to me, but it's also
fairly slow. (Replace itself is slow.)

Probably none of that matters if the function
is only being used for filename strings of 20+-
characters. And it's not easy to optimize for
speed in VBS anyway. But personally I'd still much
prefer your Replace loop. I don't see the sense of
writing a highly inefficient Replace method in
VBS when the scripting runtime can do it internally.

But in general, why not tokenize? In compiled
code that should be by far the fastest, with much
greater speed achieved if the characters can be
treated as numbers in an array so that the operation
is not allocating new strings or deciphering the Chr
value of each stored numeric value of the string.
In VBS, I don't know whether treating characters as
numbers will help, since it's still a variant that has
to be "parsed". I haven't tested the possibilities.
But I'm using numeric conversion below. I figured that
it should be a little faster than having the function
need to do a string comparison. (In a Select Case
where the character is not an "illegal" there would be
20-30 string comparisons happening if one uses the
string version.)

Another advantage of tokenizing is flexibility.
There can be dozens of Case declares with very
little cost.

' Note: I just wrote this as an "air code" sample.
' I didn't bother to get all of the ascii values since
' it's just a demo.

Function Clean(sIn)
    Dim i2, iChar, A1()

    ' One array slot per input character; Join builds the result once at the end.
    ReDim A1(Len(sIn) - 1)
    For i2 = 1 To Len(sIn)
        iChar = Asc(Mid(sIn, i2, 1))
        Select Case iChar
            ' ? / \ : * < > , . + ~
            Case 63, 47, 92, 58, 42, 60, 62, 44, 46, 43, 126
                A1(i2 - 1) = "-"
            Case Else
                A1(i2 - 1) = Chr(iChar)
        End Select
    Next
    Clean = Join(A1, "")
End Function
 


James

#7
On Mar 5, 1:59 am, "mayayana" <mayay...@newsgroup> wrote:

>   This looks like some kind of advertisement
> for a blog, but it's an interesting question.
> In compiled VB both of the foregoing methods
> would be extremely slow on large strings.
> The webpage sample is allocating a vast
> number of strings to do its job. As the strings
> get bigger it would slow to a crawl. The Replace
> function looks much better to me, but it's also
> fairly slow. (Replace itself is slow.)
>
>    Probably none of that matters if the function
> is only being used for filename strings of 20+-
> characters. And it's not easy to optimize for
> speed in VBS anyway. But personally I'd still much
> prefer your Replace loop. I don't see the sense of
> writing a highly inefficient Replace method in
> VBS when the scripting runtime can do it internally.
>
>    But in general, why not tokenize? In compiled
> code that should be by far the fastest, with much
> greater speed achieved if the characters can be
> treated as numbers in an array so that the operation
> is not allocating new strings or deciphering the Chr
> value of each stored numeric value of the string.
> In VBS, I don't know whether treating characters as
> numbers will help, since it's still a variant that has
> to be "parsed". I haven't tested the possibilities.
> But I'm using numeric conversion below. I figured that
> it should be a little faster than having the function
> need to do a string comparison. (In a Select Case
> where the character is not an "illegal" there would be
> 20-30 string comparisons happening if one uses the
> string version.)
>
>    Another adsvantage of tokenizing is flexibility.
> There can be dozens of Case declares with very
> little cost.
>
> ' Note: I just wrote this as an "air code" sample.
> ' I didn't bother to get all of the ascii values since
> ' it's just a demo.
>
> Function Clean(sIn)
>  Dim i2, iChar, A1()
>
>  ReDim A1(len(sIn) - 1)
>     For i2 = 1 to Len(sIn)
>        iChar = Asc(Mid(sIn, i2, 1))
>       Select Case iChar
>         Case 63, 47, 92, 58, 42, 60, 62, 44, 46, 43, 126
>            A1(i2 - 1) = "-"
>         Case Else
>           A1(i2 - 1) = Chr(iChar)
>       End Select
>     Next
>       Clean = Join(A1, "")
> End Function
Hi Mayayana,

As the "air code" sample of your method parses the string character by
character, I suspect that a combination of your method and the
function provided should allow characters to be replaced, taking into
account the context of each illegal character.

I am using the method to clean a plain text string that may or may not
contain URLs. If there are URLs present in the string, they are later
replaced with an internal URL with parameters pointing to a logging
script that logs and forwards the request to the original URL. The
cleaned string is also used to generate a set of keywords and
keyphrases from the text supplied.

I have based the code below on the "air code" demo, and it has also
not been tested. I have incorporated the contextual tests to only
remove/replace some characters if they are not in a specific context
(using a URL as an example).

The method below must certainly be a better approach than the function
linked from this thread, or the one suggested by Al. What do you think?
Also, is there a better way to incorporate the contextual tests for each
illegal character in the string?

Thanks

James

-------------------------

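Function Clean(sIn)
    Dim i2, iChar, rChars, rChar, lChar, A1()

    ReDim A1(Len(sIn) - 1)
    For i2 = 1 To Len(sIn)
        iChar = Asc(Mid(sIn, i2, 1))
        Select Case iChar
            Case 58 ' ":" is kept only when followed by "//"; otherwise the slot stays empty
                rChars = Mid(sIn, i2 + 1, 2)
                If rChars = "//" Then
                    A1(i2 - 1) = Chr(iChar)
                End If

            Case 47 ' "/" is kept only next to another "/"
                ' Compare characters rather than Asc() codes: Asc("") raises an
                ' error at the end of the string, and Mid() needs a start >= 1.
                rChar = Mid(sIn, i2 + 1, 1)
                lChar = ""
                If i2 > 1 Then lChar = Mid(sIn, i2 - 1, 1)

                If rChar = "/" Or lChar = "/" Then
                    A1(i2 - 1) = Chr(iChar)
                Else
                    A1(i2 - 1) = "-"
                End If

            Case 63, 92, 42, 60, 62 ' ? \ * < >
                A1(i2 - 1) = "-"

            Case 44, 46, 43, 126 ' , . + ~ are removed
                A1(i2 - 1) = ""

            Case Else
                A1(i2 - 1) = Chr(iChar)
        End Select
    Next
    Clean = Join(A1, "")
End Function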
 


mayayana

#8
> The method below must certainly be a better approach to the function
> linked from this thread, or suggested by Al. What do you think? Also,
> is there a better way to incorporate the contextual tests for each
> illegal character the string?

I think that's pretty much what I meant in saying
it's flexible. There's no limit, really. One could even
call separate functions from within the Select Case.

Parsing URLs
sounds tricky, but it can be done. For instance, you
could check each ":" to see if it's part of "http://",
then get the whole URL and write your edited
URL to the array. You'd just have to find the end
of the URL, calculate the offset of the start and end
characters, and keep track of how many characters
you've actually written to the array. With edits involved
you might need to use a bigger array and then Redim
Preserve it at the end before the Join call.
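
The bookkeeping described above could be sketched roughly as follows. This
is illustration code under stated assumptions, not something from the
article: the name CleanWithEdits is made up, a URL is assumed to start with
"http://" and to end at the next space, and the check is made at the "h"
rather than at the ":" to keep the offsets simple. Because one array slot
can hold a whole URL, an array of Len(sIn) slots is already large enough
here; the write position n is tracked separately and the unused tail is
trimmed with ReDim Preserve before the Join.

```vbscript
' Hypothetical sketch: copies URLs through untouched and cleans everything else.
Function CleanWithEdits(sIn)
    Dim i, n, ch, urlEnd, A1()

    If Len(sIn) = 0 Then
        CleanWithEdits = ""
        Exit Function
    End If

    ReDim A1(Len(sIn) - 1)
    n = 0          ' slots written so far
    i = 1          ' read position in sIn
    Do While i <= Len(sIn)
        If LCase(Mid(sIn, i, 7)) = "http://" Then
            ' Assumption: the URL runs up to the next space or the end of the string.
            urlEnd = InStr(i, sIn, " ")
            If urlEnd = 0 Then urlEnd = Len(sIn) + 1
            A1(n) = Mid(sIn, i, urlEnd - i)   ' whole URL goes into one slot
            i = urlEnd                        ' jump the read position past it
        Else
            ch = Mid(sIn, i, 1)
            Select Case ch
                Case ":", "/", "\", "*", "?", "<", ">"
                    A1(n) = "-"
                Case Else
                    A1(n) = ch
            End Select
            i = i + 1
        End If
        n = n + 1
    Loop

    ' Trim the unused tail before joining.
    ReDim Preserve A1(n - 1)
    CleanWithEdits = Join(A1, "")
End Function

WScript.Echo CleanWithEdits("See: http://code-tips.com/a?b for more")
' See- http://code-tips.com/a?b for more
```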

 


James

#9
On Mar 5, 1:55 pm, "mayayana" <mayay...@newsgroup> wrote:

> The method below must certainly be a better approach to the function
> linked from this thread, or suggested by Al. What do you think? Also,
> is there a better way to incorporate the contextual tests for each
> illegal character the string?
>
>
>
>   I think that's pretty much what I meant in saying
> it's flexible. There's no limit, really. One could even
> call separate functions from within the Select Case.
>
>   Parsing URLs
> sounds tricky, but it can be done. For instance, you
> could check each ":" to see if it's part of "http://",
> then get the whole URL and write your edited
> URL to the array. You'd just have to find the end
> of the URL, calculate the offset of the start and end
> characters, and keep track of how many characters
> you've actually written to the array. With edits involved
> you might need to use a bigger array and then Redim
> Preserve it at the end before the Join call.
>
> -------------------------
>
> Function Clean(sIn)
>  Dim i2, iChar, A1()
>
>  ReDim A1(len(sIn) - 1)
>     For i2 = 1 to Len(sIn)
>        iChar = Asc(Mid(sIn, i2, 1))
>       Select Case iChar
> Case 58
> rChars = Mid(sIn, i2+1, 2)
> If rChars = "//" Then
> A1(i2 - 1) = Chr(iChar)
> End If
>
> Case 47
> rChar = Asc(Mid(sIn, i2+1, 1))
> lChar = Asc(Mid(sIn, i2-1, 1))
>
> If rChar = 47 OR lChar = 47 Then
> A1(i2 - 1) = Chr(iChar)
> Else
> A1(i2 - 1) = "-"
> End If
>
> Case 63, 92, 42, 60, 62
>    A1(i2 - 1) = "-"
>
> Case 44, 46, 43, 126
>    A1(i2 - 1) = ""
>
>         Case Else
>           A1(i2 - 1) = Chr(iChar)
>       End Select
>     Next
>       Clean = Join(A1, "")
> End Function
Thanks Mayayana,

The illegal characters are being removed or replaced as expected. I
am using a regular expression with the replace function to remove all
HTML tags except for "a" tags (hyperlinks). I am then removing all "a"
tags so that only the href value is left, which is placed after the
anchor text in brackets.

In the next step I am using the string clean function from the linked
article (now modified to include suggestions in this thread) to remove
all special characters from the string except when they are a component
of a URL.

The final step, which I am currently working on, is to parse the
cleaned string to replace URLs with the internal redirect. It is
working as expected, but there are some cases where URLs are not
followed by a space, depending on the context in the original string.
The problem is that there isn't currently a consistent method to
find the end of each URL. I am working toward adjusting the function
so that all URLs are contained in square brackets [] once processed
using the string clean function, so that they can be found easily when
parsing to update the URLs.

I am replacing all special characters with a space, then re-parsing
the string to remove double (or more) spaces between words / URLs.
This works most of the time, but as I am not removing "." chars (ASCII
# 46), a URL may end up with an additional "." at the end
(http://address.com.). To prevent this, I am replacing all "." with " ."
before parsing URLs to allow URLs to be recognised consistently.
After parsing and converting URLs, I then replace any occurrences of
" ." with the original ".".

This seems to work, but I am not sure that it is the best way to do
this as the same string is parsed a number of times before the desired
results are achieved.
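
One possible way to cut down the number of passes, sketched with the
built-in RegExp object. This is an assumption-laden illustration, not the
code in use here: the function name BracketUrls is made up, only http and
https URLs are handled, and a URL is assumed to end before any trailing
punctuation. The first pattern collapses runs of spaces; the second wraps
each URL in square brackets while leaving a trailing "." outside them, which
may make the " ." substitution unnecessary.

```vbscript
' Hypothetical helper: collapse runs of spaces and wrap URLs in [ ].
Function BracketUrls(sIn)
    Dim re, s

    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True

    ' Collapse runs of two or more spaces into one.
    re.Pattern = " {2,}"
    s = re.Replace(sIn, " ")

    ' A URL runs up to the next whitespace or bracket, but its last
    ' character must not be punctuation, so a trailing "." is left out.
    re.Pattern = "(https?://[^\s\[\]]*[^\s\[\].,;:!?])"
    s = re.Replace(s, "[$1]")

    BracketUrls = s
End Function

WScript.Echo BracketUrls("Visit   http://address.com. Then see http://code-tips.com/page now")
' Visit [http://address.com]. Then see [http://code-tips.com/page] now
```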

The string clean function works well using the tokenizing method.
Thanks again for your suggestion.

James
 


mayayana

#10
> This seems to work, but I am not sure that it is the best way to do
> this as the same string is parsed a number of times before the desired
> results are achieved.

I think if it were me I'd put it *all* in the tokenizer.
For instance, for "<" you could do something like:

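Case 60
    If UCase(Mid(sIn, i2 + 1, 1)) = "A" Then
        'This is an anchor tag, so parse it.
    Else 'drop out all other tags.
        Do
            i2 = i2 + 1
            ' Stop at the closing ">" or at the end of the string
            ' (an unclosed tag would otherwise loop forever).
            If i2 > Len(sIn) Then Exit Do
            If Mid(sIn, i2, 1) = ">" Then Exit Do
        Loop
    End If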

One note with that: You'd want to use Do/Loop
for the main loop so that you can change the
value of i2. The code above would go back to the
start of the main loop and begin processing the next
character after the end of the tag. My original code
used: For i2 = ..... Next
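
A rough sketch of the main loop rewritten as a Do loop, so that a branch can
move i2 forward by more than one character. The function name StripTags is
hypothetical, and for brevity this version drops every tag, anchors
included; it also assumes that any "<" starts a tag.

```vbscript
' Hypothetical sketch: tokenizer with a Do loop and a manually advanced index.
Function StripTags(sIn)
    Dim i2, ch, n, A1()

    If Len(sIn) = 0 Then
        StripTags = ""
        Exit Function
    End If

    ReDim A1(Len(sIn) - 1)
    n = 0
    i2 = 1
    Do While i2 <= Len(sIn)
        ch = Mid(sIn, i2, 1)
        If ch = "<" Then
            ' Skip ahead to the closing ">" (or to the end of the string).
            Do While i2 <= Len(sIn)
                If Mid(sIn, i2, 1) = ">" Then Exit Do
                i2 = i2 + 1
            Loop
        Else
            A1(n) = ch
            n = n + 1
        End If
        i2 = i2 + 1   ' step past the current character or the ">"
    Loop

    ' Slots that were never written are Empty and join as "".
    StripTags = Join(A1, "")
End Function

WScript.Echo StripTags("<p>Some <b>bold</b> text</p>")   ' Some bold text
```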

I guess it all gets down to a matter of personal
preference at some point, though. You're the one
who's going to have to maintain your script. :)
 


Al Dunbar

#11
"dsoutter" <webmasterhub.net@newsgroup> wrote in message
news:2456c1b9-460d-46dd-af7f-62620e277e83@newsgroup

> On Mar 3, 3:16 pm, "Al Dunbar" <aland...@newsgroup> wrote:

>> "dsoutter" <webmasterhub....@newsgroup> wrote in message
<snip>

>> Here is how I would code this function if I ever needed such a thing:
<snip>

>> IMHO, this has the same result but the logic is somewhat simpler. What
>> benefit would I get from switching from my version to yours?
>>
>> /Al
>
> Hi Al, the logic is simpler as you are using the replace() function to
> perform the string replace, where the function provided takes the left
> and right parts of a string, either side of an illegal character.
A nice analysis, and exactly my point. Thanks for making it for me.

> In
> many cases, your method would be more suitable mainly due to the
> simpler logic, especially when all instances of each character are to
> be processed in the same way.
True. But, as written, your function will also only process all instances of
each character in the same way. My method might therefore appear to be
better in all cases in which the functions, as written, could be used. If
you want to compare our methods when applied to a different problem space,
such as you describe here:

> As the method provided parses the string character by character, you
> should have greater control over the output when more complex
> operations need to be performed, such as removing or replacing a
> character only if it within a specific context:
You cannot compare my function as written with your function as modified to
solve some new problem. A better comparison would be to compare your
modified function with a different function I might write to solve that
problem.

> Eg. replace "&" with " and " if padded with spaces or other specific
> character, or with a "+" if not
> "something & something else" would become "something and something
> else"
> "somethin&something else" would become "somethin+something else".
>
> Eg. replace ":" only if NOT part of a url:
>
> "the website is http://code-tips.com " would remain "the website is
> http://code-tips.com "
> "See Here: http://code-tips.com " would become "See Here
> http://code-tips.com
> "
>
> This would be achieved by either checking the previous 3-5 characters
> when a ":" is found to see if it is in the context of a url or not
> (http, https, ftp), or by checking the characters following the
> current ":" is "//" which would indicate that the semicolon is part of
> a url.
There might even be other ways to perform this kind of parsing...

> This functionality has not been included in the function provided, but
> would be easy to implement, as the string is incrementally parsed and
> manipulated using a numeric string position value relative to the
> current position/character in the string.
You seem to be proposing that simple functions be written in such a way that
they are more directly adaptable into more complex ones capable of more
complex operations. I disagree with this approach, UNLESS a function is
coded in such a way that it can be made to perform the more complex work
without first having to be modified to do so by calling it in a different
manner.

I'm not saying that you are wrong to do it your way, just that it may not be
the best approach for others to emulate.

> There may also be differences in performance between the two methods,
> as the function provided includes the code required to remove or
> replace each of the specified characters without calling the replace()
> function.
Yes, you avoid calling replace. But you do that by calling instr for each
possible bad character, plus left, mid, len, and two string
concatenations for each bad character actually present. If you are concerned
with the overhead of calling a built-in function, my method does that fewer
times.

> I suspect
suspect, but do not know...

> that the replace function uses a similar approach
> to replace the specified characters so any difference in performance
> would be minimal, unless parsing a large string value. I haven't yet
> tested this for performance differences.
I haven't tested either, however, the actual logic used by a built-in
function, while possibly logically identical to that of a function written
in vbscript, is more likely to be faster and more efficient. This is mainly
because the built-in functions are coded in a lower level language.

Regardless, no argument over ultimate relative efficiency can really be
resolved without rigorous testing. Since neither of us feels it important
enough to do that, we probably both are willing to accept some
inefficiencies, given that our functions each perform their intended tasks
perfectly! ;-)
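
For anyone who does want numbers, a rough timing harness might look like the
sketch below. Everything in it is an assumption made for illustration: the
article's function is not reproduced in this thread, so a simple
character-by-character tokenizer stands in for it, the names
CleanWithReplace and CleanByChar are made up, and the character list is
shortened. Timer only gives coarse, run-to-run variable results, so no
expected figures are shown.

```vbscript
' Hypothetical stand-in for the Replace-based approach.
Function CleanWithReplace(sIn)
    Dim c, s
    s = sIn   ' work on a copy; arguments are passed ByRef by default
    For Each c In Array("?", "/", "\", ":", "*", "<", ">")
        s = Replace(s, c, "-")
    Next
    CleanWithReplace = s
End Function

' Hypothetical stand-in for a character-by-character approach.
Function CleanByChar(sIn)
    Dim i, ch, A1()
    ReDim A1(Len(sIn) - 1)
    For i = 1 To Len(sIn)
        ch = Mid(sIn, i, 1)
        Select Case ch
            Case "?", "/", "\", ":", "*", "<", ">"
                A1(i - 1) = "-"
            Case Else
                A1(i - 1) = ch
        End Select
    Next
    CleanByChar = Join(A1, "")
End Function

Dim sample, t0, k, r
sample = String(5000, "a") & ":\<test>"

t0 = Timer
For k = 1 To 200
    r = CleanWithReplace(sample)
Next
WScript.Echo "Replace loop: " & FormatNumber(Timer - t0, 3) & " s"

t0 = Timer
For k = 1 To 200
    r = CleanByChar(sample)
Next
WScript.Echo "Char by char: " & FormatNumber(Timer - t0, 3) & " s"
```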

Or do they? I haven't tested your code, but my reading of it suggests to me
that it makes unstated assumptions about the nature of the string it is
processing (does it, for example, presume that the string represents a valid
NTFS, UNC or URL path of some sort?).

If you wouldn't mind, try running your function against a string such as
"C::\". I suspect the result might be "C :\", a string containing an illegal
character. If so, you would have to either include an internal recursive
call, or call your function in a loop until the result no longer changed. Or
you would have to qualify your documentation to explain that it is intended
only to process valid path strings (or whatever the case actually is).

Regardless, another knock against your function as posted, if you are
interested in objective criticism, is that it does not fully document
itself. The nature of an "illegal character" is somewhat inferred, but not
fully explained. If the goal is to convert a valid path to a string that
could be used as a filename, here are a few quirks you appear not to have
addressed:

non-uniqueness: Run your function (or mine, for that matter) on these two
different paths: "C:\documents and settings" and
"C:\documents\and\settings", and you get the same result: "C documents and
settings".

other filename invalidities: run it on one of those huge URL strings and you
might wind up with a filename that was actually too long for the file system
to handle.

the concept of adapting the function to do more comprehensive processing. If
that actually was the reason for your less simple approach, your audience is
not getting the benefit if you do not explain that.

the vagueness of the name of the function itself: clean? there's nothing
dirty here. Calling it Path2Filename might be a more accurate representation
of its purpose (or it might not - I could not tell the purpose from the code
itself without your additional explanation).
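
The non-uniqueness point can be seen with a cut-down cleaner. This is a
sketch only: CleanPath is a hypothetical name, and it replaces just the
three characters needed to show the collision, each with a space.

```vbscript
' Hypothetical cut-down cleaner: replaces "\", "/" and ":" with a space.
Function CleanPath(sIn)
    Dim badChar, s
    s = sIn
    For Each badChar In Array("\", "/", ":")
        s = Replace(s, badChar, " ")
    Next
    CleanPath = s
End Function

Dim a, b
a = CleanPath("C:\documents and settings")
b = CleanPath("C:\documents\and\settings")
WScript.Echo a
WScript.Echo b
If a = b Then WScript.Echo "Collision: two different paths map to the same name"
```

Any scheme that maps several characters onto the same replacement has this
property, so a caller that needs unique filenames would have to add
something extra, such as a counter or a hash of the original path.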

/Al
 


Al Dunbar

#12
"WebmasterHub.net" <webmasterhub.net@newsgroup> wrote in message
news:923bab94-9163-4786-b9f3-c3f283a97ff2@newsgroup

> On Mar 3, 3:16 pm, "Al Dunbar" <aland...@newsgroup> wrote:

>> "dsoutter" <webmasterhub....@newsgroup> wrote in message
<snip>


> Hi Al, the logic is simpler as you are using the replace() function
> to
> perform the string replace, where the function provided takes the
> left
> and right parts of a string, either side of an illegal character. In
> many cases, your method would be more suitable mainly due to the
> simpler logic, especially when all instances of each character are to
> be processed in the same way.
>
> As the method provided parses the string character by character, you
> should have greater control over the output when more complex
> operations need to be performed, such as removing or replacing a
> character only if it within a specific context:
>
>
> Eg. replace "&" with " and " if padded with spaces or other specific
> character, or with a "+" if not
> "something & something else" would become "something and something
> else"
> "somethin&something else" would become "somethin+something else".
>
>
> Eg. replace ":" only if NOT part of a url:
>
>
> "the website is http://code-tips.com " would remain "the website is
> http://code-tips.com "
> "See Here: http://code-tips.com " would become "See Here
> http://code-tips.com
> "
>
>
> This would be achieved by either checking the previous 3-5 characters
> when a ":" is found to see if it is in the context of a url or not
> (http, https, ftp), or by checking the characters following the
> current ":" is "//" which would indicate that the semicolon is part
> of
> a url.
>
>
> This functionality has not been included in the function provided,
> but
> would be easy to implement, as the string is incrementally parsed and
> manipulated using a numeric string position value relative to the
> current position/character in the string.
>
> There may also be differences in performance between the two methods,
> as the function provided includes the code required to remove or
> replace each of the specified characters without calling the
> replace()
> function. I suspect that the replace function uses a similar
> approach
> to replace the specified characters so any difference in performance
> would be minimal, unless parsing a large string value. I haven't yet
> tested this for performance differences.
I already replied to your identical post from your alter ego ;-)

/Al
 


Al Dunbar

#13
"James" <webmasterhub.net@newsgroup> wrote in message
news:f4d5de01-3c8f-430a-8c8c-1fcbd78aa5df@newsgroup

> On Mar 5, 1:59 am, "mayayana" <mayay...@newsgroup> wrote:

>> This looks like some kind of advertisement
>> for a blog, but it's an interesting question.
<snip>

> Hi Mayayana,
>
> As the "air code" sample of your method parses the string character by
> character, I suspect theat a combination of your method and the
> function provided should allow characters to be replaced, taking into
> account the context of each illegal character.
>
> I am using the method to clean a plain text string that may or may not
> contain URLs. If there are URLs present in the string, they are later
> replaced with an internal url with parameters pointing to a logging
> script that logs and forwards the request to the original url. The
> cleaned string is also used to generate a set of keywords and
> keyphrases from the text supplied.
You see, that whole description is not inherent in the listing you have
posted of your clean function.

> I have based the code below on the "air code" demo, which has also
> not been tested. I have incorporated the contextual tests to only
> remove/replace some characters if they are not in a specific context
> (using a URL as an example).
>
> The method below must certainly be a better approach to the function
> linked from this thread, or suggested by Al.
It might indeed be better, but I don't see where this must certainly be so.
Your original function and my "simpler" version never even tried to do the
contextual bit, so saying code that was designed to do so is better is a bit
like saying a hammer is a better tool than a nailfile for nailing things
together.

> What do you think? Also,
> is there a better way to incorporate the contextual tests for each
> illegal character in the string?
My guess: yes, probably there is. I just find your code below even harder to
follow than the original clean function. But as implied previously, it seems
odd to have two functions doing two different things but having the same
name.

/Al

> Thanks
>
> James
>
> -------------------------
>
> Function Clean(sIn)
> Dim i2, iChar, A1()
>
> ReDim A1(len(sIn) - 1)
> For i2 = 1 to Len(sIn)
> iChar = Asc(Mid(sIn, i2, 1))
> Select Case iChar
> Case 58
> rChars = Mid(sIn, i2+1, 2)
> If rChars = "//" Then
> A1(i2 - 1) = Chr(iChar)
> End If
>
> Case 47
> rChar = Asc(Mid(sIn, i2+1, 1))
> lChar = Asc(Mid(sIn, i2-1, 1))
>
> If rChar = 47 OR lChar = 47 Then
> A1(i2 - 1) = Chr(iChar)
> Else
> A1(i2 - 1) = "-"
> End If
>
> Case 63, 92, 42, 60, 62
> A1(i2 - 1) = "-"
>
> Case 44, 46, 43, 126
> A1(i2 - 1) = ""
>
> Case Else
> A1(i2 - 1) = Chr(iChar)
> End Select
> Next
> Clean = Join(A1, "")
> End Function
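
The quoted function is described by its author as untested. As written, the "/" branch would raise a runtime error when a slash is the first character (Mid is called with a start position of 0) or the last character (Asc is given an empty string). The following is only a minimal sketch of the same contextual idea with those boundaries guarded. The name CleanInContext is hypothetical, and it compares one-character strings rather than ASCII codes, purely for readability.

```vbscript
' Sketch only: CleanInContext is a hypothetical name, not the thread's function.
Function CleanInContext(sIn)
    Dim i, ch, prevCh, nextCh, parts()

    If Len(sIn) = 0 Then
        CleanInContext = ""
        Exit Function
    End If

    ReDim parts(Len(sIn) - 1)
    For i = 1 To Len(sIn)
        ch = Mid(sIn, i, 1)

        ' Neighbours default to "" at either end of the string, so Mid is
        ' never called with start position 0 and Asc("") is never needed.
        prevCh = ""
        nextCh = ""
        If i > 1 Then prevCh = Mid(sIn, i - 1, 1)
        If i < Len(sIn) Then nextCh = Mid(sIn, i + 1, 1)

        Select Case ch
            Case ":"
                ' Keep the colon only when "//" follows, as in "http://".
                If Mid(sIn, i + 1, 2) = "//" Then
                    parts(i - 1) = ch
                Else
                    parts(i - 1) = ""
                End If
            Case "/"
                ' Keep a slash that touches another slash; otherwise use "-".
                If prevCh = "/" Or nextCh = "/" Then
                    parts(i - 1) = ch
                Else
                    parts(i - 1) = "-"
                End If
            Case "?", "\", "*", "<", ">"
                parts(i - 1) = "-"
            Case ",", ".", "+", "~"
                parts(i - 1) = ""
            Case Else
                parts(i - 1) = ch
        End Select
    Next

    CleanInContext = Join(parts, "")
End Function
```

With the same character rules as the quoted code, "See Here: http://code-tips.com" comes out as "See Here http://code-tipscom": the "://" survives, but the dot inside the host name is still removed, and a single slash in a URL path would still become "-". Protecting the whole URL needs a wider context test than the neighbouring characters.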
 


Al Dunbar

#14
"mayayana" <mayayana@newsgroup> wrote in message
news:OCgsS8AvKHA.4220@newsgroup
>> The method below must certainly be a better approach to the function
>> linked from this thread, or suggested by Al. What do you think? Also,
>> is there a better way to incorporate the contextual tests for each
>> illegal character in the string?
>
> I think that's pretty much what I meant in saying
> it's flexible. There's no limit, really. One could even
> call separate functions from within the Select Case.
>
> Parsing URLs
> sounds tricky, but it can be done. For instance, you
> could check each ":" to see if it's part of "http://",
> then get the whole URL and write your edited
> URL to the array. You'd just have to find the end
> of the URL, calculate the offset of the start and end
> characters, and keep track of how many characters
> you've actually written to the array. With edits involved
> you might need to use a bigger array and then Redim
> Preserve it at the end before the Join call.
In my opinion, the use of regular expressions seems more likely to be more
efficient than coding all the ifs, ands, and buts in vbscript. But sorry, I'm
not a regular expression kind of guy.

/Al

> -------------------------
>
> Function Clean(sIn)
> Dim i2, iChar, A1()
>
> ReDim A1(len(sIn) - 1)
> For i2 = 1 to Len(sIn)
> iChar = Asc(Mid(sIn, i2, 1))
> Select Case iChar
> Case 58
> rChars = Mid(sIn, i2+1, 2)
> If rChars = "//" Then
> A1(i2 - 1) = Chr(iChar)
> End If
>
> Case 47
> rChar = Asc(Mid(sIn, i2+1, 1))
> lChar = Asc(Mid(sIn, i2-1, 1))
>
> If rChar = 47 OR lChar = 47 Then
> A1(i2 - 1) = Chr(iChar)
> Else
> A1(i2 - 1) = "-"
> End If
>
> Case 63, 92, 42, 60, 62
> A1(i2 - 1) = "-"
>
> Case 44, 46, 43, 126
> A1(i2 - 1) = ""
>
> Case Else
> A1(i2 - 1) = Chr(iChar)
> End Select
> Next
> Clean = Join(A1, "")
> End Function
>
>
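
The approach mayayana describes above (spot a URL, find where it ends, copy it through whole, and keep a separate count of the array slots actually written) could look roughly like the sketch below. It rests on assumptions that are not from the thread: the name CleanKeepUrls is hypothetical, only "http://" is recognised, the test is made at the "h" rather than at the ":", and a URL is assumed to end at the next space.

```vbscript
' Sketch only: CleanKeepUrls is a hypothetical name. Assumes URLs start
' with "http://" and end at the next space (or at the end of the string).
Function CleanKeepUrls(sIn)
    Dim i, n, ch, urlEnd, parts()

    If Len(sIn) = 0 Then
        CleanKeepUrls = ""
        Exit Function
    End If

    ReDim parts(Len(sIn) - 1)   ' upper bound: at most one slot per character
    n = 0                       ' number of slots actually written
    i = 1
    Do While i <= Len(sIn)
        If LCase(Mid(sIn, i, 7)) = "http://" Then
            ' Find the end of the URL and copy it unchanged into one slot.
            urlEnd = InStr(i, sIn, " ")
            If urlEnd = 0 Then urlEnd = Len(sIn) + 1
            parts(n) = Mid(sIn, i, urlEnd - i)
            n = n + 1
            i = urlEnd          ' resume at the space, or past the end
        Else
            ch = Mid(sIn, i, 1)
            Select Case ch
                Case "?", "/", "\", "*", "<", ">"
                    parts(n) = "-"
                    n = n + 1
                Case ":", ",", ".", "+", "~"
                    ' Dropped: nothing is written and n is not advanced.
                Case Else
                    parts(n) = ch
                    n = n + 1
            End Select
            i = i + 1
        End If
    Loop

    If n = 0 Then
        CleanKeepUrls = ""
    Else
        ReDim Preserve parts(n - 1)   ' trim unused slots before Join
        CleanKeepUrls = Join(parts, "")
    End If
End Function
```

Because each slot can hold a whole string, an array of Len(sIn) slots is always large enough here; a bigger array would only be needed if an edit could turn one input character into several slots.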
 


Al Dunbar

#15
"James" <webmasterhub.net@newsgroup> wrote in message
news:cc81aa22-8549-43a8-ac8a-0d96a2bd6314@newsgroup

> On Mar 5, 1:55 pm, "mayayana" <mayay...@newsgroup> wrote:

>> The method below must certainly be a better approach to the function
>> linked from this thread, or suggested by Al. What do you think? Also,
>> is there a better way to incorporate the contextual tests for each
>> illegal character in the string?
<snip>

>
> Thanks Mayayana,
>
> The illegal characters are being removed or replaced as expected. I
> am using a regular expression with the replace function to remove all
> html tags except for "a" tags (hyperlinks). I am then removing all "a"
> tags so that only the href value is left, which is placed after the
> anchor text in brackets.
>
> The next step I am using the string clean function from the linked
> article (now modified to include suggestions in this thread) to remove
> all special characters from the string except when a component of a
> URL.
>
> The final step, which I am currently working on is to parse the
> cleaned string to replace urls with the internal redirect. It is
> working as expected, but there are some cases where URLs are not
> followed by a space depending on the context in the original string.
> The problem being that there isn't currently a consistent method to
> find the end of each URL. I am working toward adjusting the function
> so that all URLs are contained in square brackets [] once processed
> using the string clean function so that they can be found easily when
> parsing to update the URLs.
So I am curious. What was the purpose of your initial post? To get some
feedback on a script you are trying to develop? Or to advertise a site
containing expertly developed code? Or to get feedback on a site purportedly
containing expertly developed code?

/Al
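
For the bracketing step James describes, the regular-expression route mentioned earlier in the thread is one option. The sketch below uses the VBScript RegExp object to locate URLs, wraps each one in square brackets, and cleans only the text between them. The function names CleanPlain and CleanOutsideUrls and the URL pattern are assumptions made for illustration, not anything posted in the thread.

```vbscript
' Sketch only: CleanPlain and CleanOutsideUrls are hypothetical helpers.
Function CleanPlain(ByVal s)
    Dim ch
    ' Same character rules as the Select Case version in this thread.
    For Each ch In Array("?", "/", "\", "*", "<", ">")
        s = Replace(s, ch, "-")
    Next
    For Each ch In Array(":", ",", ".", "+", "~")
        s = Replace(s, ch, "")
    Next
    CleanPlain = s
End Function

Function CleanOutsideUrls(sIn)
    Dim re, matches, m, pos, result

    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    ' Assumed URL shape: scheme, "://", then everything up to whitespace,
    ' a square bracket, an angle bracket, or a double quote.
    re.Pattern = "(?:https?|ftp)://[^\s\[\]<>""]+"

    Set matches = re.Execute(sIn)
    pos = 1          ' 1-based position of the next unprocessed character
    result = ""
    For Each m In matches
        ' m.FirstIndex is 0-based; clean the plain text before this URL.
        result = result & CleanPlain(Mid(sIn, pos, m.FirstIndex + 1 - pos))
        result = result & "[" & m.Value & "]"
        pos = m.FirstIndex + m.Length + 1
    Next

    ' Clean whatever follows the last URL.
    CleanOutsideUrls = result & CleanPlain(Mid(sIn, pos))
End Function
```

This does not solve the problem of URLs that run straight into following text: the pattern treats everything up to the next whitespace or bracket as part of the URL, so trailing punctuation such as a full stop would end up inside the brackets.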
 


James

#16
Hi Al, Thanks for your wise words. The reason for using the function
in this case is not for filenames, although it was written for this
purpose. Your method using the replace function will not work at all
for what I am trying to achieve. If you read the response to your
question, you will actually see that I agreed with you that the
replace method would be more suitable if all illegal characters
are being processed in the same way (remove all / replace all
occurrences with the same char). As I am removing characters from the
text that are not a component of a URL, the replace method in your
function would not be suitable, as it doesn't allow me to test
characters surrounding an illegal character.

> You cannot compare my function as written with your function as modified to
> solve some new problem.
There was no comparison with "some new problem" and your function. I
acknowledged, in the context of the linked article and in response
to your intelligent rhetorical question, that your method would be
better. BUT, in the context of the solution I am working towards, yours
would not be suitable, which is why I needed to explain the scenario
in more detail.

> Regardless, another knock against your function as posted, if you are
> interested in objective criticism, is that it does not fully document
> itself. The nature of an "illegal character" is somewhat inferred, but not
> fully explained. If the goal is to convert a valid path to a string that
> could be used as a filename, here are a few quirks you appear not to have
> addressed:
The term "illegal characters" is used because that is what the article
and function were originally written for: removing characters that are
illegal in filenames. This doesn't mean that the function can only
ever be used to remove characters in filenames. I am not using it for
filenames at all in this case, which makes most of what you have said
irrelevant. Thanks for pointing out this highly important fact.

Sorry that you seem to have gotten your knickers in a knot. If you
are just looking for an argument, then you should find another community
to abuse.

James
 


mayayana

#17
>> I haven't tested the possibilities.

>
> I strongly suspect that the variant thing will
> make most vbscript code less
> efficient than a compiled language, and that
> it might cause the tokenized
> approach to be less efficient than it might be expected to be.
>
There's not much sense in talking about it
if we're all just going to speculate, so I tried
it out. I think you're clearly right. Replace bogs
down in compiled code, but the reverse is the
case with VBS. And a different-length replacement
string doesn't seem to affect the results to
speak of.

While tokenizing provides a very nice way to do a very
complex operation on a string, it doesn't come
close to Replace for speed.

I tried your function, my numeric tokenizer, and
a tokenizer that left each character as a string.
Testing a few large HTML files I found that the
numeric tokenizer was slightly faster than the
string tokenizer, but the Replace method was
about 10 times as fast.

Dim Arg, FSO, TS, s1, i1, i2, s2
Arg = WScript.arguments(0)

Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.OpenTextFile(Arg, 1)
s1 = TS.ReadAll
TS.Close
Set TS = Nothing

i1 = timer
s2 = CleanTok(s1)
i2 = timer
MsgBox "Time for tokenize: " & (i2 - i1) * 1000 & " ms"

i1 = timer
s2 = CleanTokS(s1)
i2 = timer
MsgBox "Time for tokenizeS: " & (i2 - i1) * 1000 & " ms"


i1 = timer
s2 = CleanRep(s1)
i2 = timer
MsgBox "Time for replace: " & (i2 - i1) * 1000 & " ms"

Set FSO = nothing

Function CleanRep (strtoclean)
    ' Replace-based version: one Replace call per unwanted character.
    strtemp = strtoclean
    badchars = Array("?", "/", "\", ":", "*", """", "<", ">", ",", "&", _
                     "#", "~", "%", "{", "}", "+", "_", ".")
    For Each badchar in badchars
        Select Case badchar
            Case "&": goodchar = " and "
            Case ":": goodchar = "-"
            Case Else: goodchar = " "
        End Select
        strtemp = replace( strtemp, badchar, goodchar )
    Next
    cleanRep = strtemp
End Function

Function CleanTokS(sIn)
    ' String tokenizer: each character is compared as a one-character string.
    Dim i2, Char, A1()
    ReDim A1(len(sIn) - 1)
    For i2 = 1 to Len(sIn)
        Char = Mid(sIn, i2, 1)
        Select Case Char
            Case "?", "/", "\", ":", "*", """", "<", ">", ",", "&", "#", "~", _
                 "%", "{", "}", "+", "_", "."
                A1(i2 - 1) = "-"
            Case Else
                A1(i2 - 1) = Char
        End Select
    Next
    CleanTokS = Join(A1, "")
End Function

Function CleanTok(sIn)
    ' Numeric tokenizer: each character is compared by its ASCII code.
    ' The codes are the same characters as in CleanTokS, in the same order.
    Dim i2, iChar, A1()
    ReDim A1(len(sIn) - 1)
    For i2 = 1 to Len(sIn)
        iChar = Asc(Mid(sIn, i2, 1))
        Select Case iChar
            Case 63, 47, 92, 58, 42, 34, 60, 62, 44, 38, 35, 126, 37, 123, _
                 125, 43, 95, 46
                A1(i2 - 1) = "-"
            Case Else
                A1(i2 - 1) = Chr(iChar)
        End Select
    Next
    CleanTok = Join(A1, "")
End Function
 


Al Dunbar

#18
"James" <webmasterhub.net@newsgroup> wrote in message
news:1a1afd8b-2ac6-459a-8be9-f930469d4675@newsgroup

> Hi Al, Thanks for your wise words. The reason for using the function
> in this case is not for filenames, although it was written for this
> purpose. Your method using the replace function will not work at all
> for what I am trying to achieve. If you read the response to your
> question, you will actually see that I agreed with you that the
> replace method would be more suitable if all illegal characters
> are being processed in the same way (remove all / replace all
> occurrences with the same char). As I am removing characters from the
> text that are not a component of a URL, the replace method in your
> function would not be suitable, as it doesn't allow me to test
> characters surrounding an illegal character.
I think we are talking at cross-purposes here. I have been comparing my
replace-based version of your "clean" function with your version. I have not
been saying that one should use replace or that it can be used in every
situation. All I have been saying is that if you have two functions that
produce identical results, the better choice is usually the simpler of the
two.

I misread you as representing your "clean" function as one that you were
making available for others to use, as-is, as an example of a well-written
function. I did not anticipate that this thread would evolve into a
discussion of an application for which neither version of the function would
suffice, but one that would need to be adapted.

>> You cannot compare my function as written with your function as
>> modified to solve some new problem.
>
> There was no comparison with "some new problem" and your function.
Thanks for putting me straight on that. This goes to my upthread comment
about talking at cross-purposes.

> I acknowledged, in the context of the linked article and in response
> to your intelligent rhetorical question, that your method would be
> better. BUT, in the context of the solution I am working towards, yours
> would not be suitable, which is why I needed to explain the scenario
> in more detail.
I never suggested that my version of your function would do anything
different than it does. But at least I think I am starting to understand
where you are coming from...

>> Regardless, another knock against your function as posted, if you are
>> interested in objective criticism, is that it does not fully document
>> itself. The nature of an "illegal character" is somewhat inferred, but
>> not fully explained. If the goal is to convert a valid path to a string
>> that could be used as a filename, here are a few quirks you appear not
>> to have addressed:
>
> The term "illegal characters" is used because that is what the article
> and function were originally written for: removing characters that are
> illegal in filenames. This doesn't mean that the function can only
> ever be used to remove characters in filenames. I am not using it for
> filenames at all in this case, which makes most of what you have said
> irrelevant. Thanks for pointing out this highly important fact.
Not so important a fact, just a comment made with constructive intent on the
assumption that you were, indeed, looking for comment.

> Sorry that you seem to have gotten your knickers in a knot. If you
> are just looking for an argument, then you should find another community
> to abuse.
If my knickers were in a knot over this teapot tempest (which they aren't)
that would be my fault, not yours. I apologize for seeming to be taking an
abusive approach here, as that was truly not my intent.

/Al
 
