
To: oro-user@jakarta.apache.org, 10 June 2006, 3:24 AM -0400

Odd Regex behavior in oro 2.0.8 lib
by CJ Jouhal


Hi,
  I am seeing some odd regex behavior.

Using the demo applet:
http://jakarta.apache.org/oro/demo.html

I try the following pattern:
<(script|object|applet|style|noscript)[^>]*>[\s\S]*?</\1[^>]*>
or an alternate version of it (with the single-line flag):
<(script|object|applet|style|noscript)[^>]*>.*?</\1[^>]*>
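The `\1` in both patterns is a backreference: the closing tag has to repeat the element name that group 1 captured at the opening tag. A minimal sketch of that behavior, using the JDK's `java.util.regex` as a stand-in for ORO (an assumption: ORO's Perl5 syntax is taken to treat `\1` the same way for this pattern; `BackrefDemo` and `matches` are hypothetical names):

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // Same expression as the first pattern above. In Java source the
    // backslashes are doubled, so the regex engine sees \s, \S and \1.
    static final Pattern TAG_WITH_CONTENT = Pattern.compile(
            "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</\\1[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns true if the pattern matches somewhere inside the input.
    static boolean matches(String input) {
        return TAG_WITH_CONTENT.matcher(input).find();
    }

    public static void main(String[] args) {
        // Closing tag repeats the captured name: matches.
        System.out.println(matches("<script>\nalert(1);\n</script>"));
        // Closing tag names a different element: no match.
        System.out.println(matches("<script>\nalert(1);\n</style>"));
    }
}
```

Running `main` should print `true` and then `false`.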

With the following test input:
   <td height="35" colspan="2" align="center"
class="style1">
    
<script type="text/javascript">
function spawn(fileName,width,height) {
window.open(fileName,'new','toolbar=0,location=0,directories=0,status=0,menubar=0,scrollbars=0,width='+width+',height='+height+',resizable=0');
}
</script>
<style type="text/css">
.Copyright { font-size: 10px; font-family: Verdana,
Arial; color: #FFF; padding:2px; margin:0px;
vertical-align:1px; line-height:11px; }
.Copyright A { color: #FFF; }
</style>
<span class="Copyright">© 2006 <a
href="http://www.domain.com/" target="_blank">Vantage
Media Corporation</a> - <a
href="JavaScript:spawn('http://www.domain.com/privacy.html','770','501');">Privacy
Statement</a> - <a
href="JavaScript:spawn('http://www.domain.com/feedback/?data=aHR0cDovL2NvbGxlZ2UudXMuY29tL2NlYy9mdXR1cmVkZWdyZWUvZGVzaWduLnBocA','460','520');">Send
Us Feedback</a></span>    </td>
    <td valign="top"> </td>
  </tr>
</table>
=================================================

The first pattern matches twice. The second pattern doesn't match in the applet, which is expected, since the applet doesn't apply the single-line flag.
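The applet result can be reproduced outside the browser. A sketch, again using `java.util.regex` rather than ORO (an assumption), with `Pattern.DOTALL` playing the role of the single-line flag; `AppletRepro`, `countMatches` and the trimmed `SAMPLE` string are illustrative, not part of the original code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppletRepro {
    // Trimmed copy of the test input: one <script> and one <style>
    // block, each spanning several lines.
    static final String SAMPLE =
            "<td class=\"style1\">\n"
            + "<script type=\"text/javascript\">\n"
            + "function spawn(fileName,width,height) {\n"
            + "}\n"
            + "</script>\n"
            + "<style type=\"text/css\">\n"
            + ".Copyright A { color: #FFF; }\n"
            + "</style>\n"
            + "</td>\n";

    // First pattern: [\s\S]*? crosses newlines without any flag.
    static final String CLASS_VERSION =
            "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</\\1[^>]*>";
    // Second pattern: . only crosses newlines in DOTALL (single-line) mode.
    static final String DOT_VERSION =
            "<(script|object|applet|style|noscript)[^>]*>.*?</\\1[^>]*>";

    // Counts non-overlapping matches, scanning left to right.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int ci = Pattern.CASE_INSENSITIVE;
        System.out.println(countMatches(CLASS_VERSION, ci, SAMPLE));                // 2
        System.out.println(countMatches(DOT_VERSION, ci, SAMPLE));                  // 0
        System.out.println(countMatches(DOT_VERSION, ci | Pattern.DOTALL, SAMPLE)); // 2
    }
}
```

Because `[\s\S]` already matches newlines, the single-line flag cannot change the first pattern's result; it only affects `.` in the second pattern.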

But the following code:

import org.apache.oro.text.regex.*;

Perl5Compiler s_perlCompiler = new Perl5Compiler();
m_matcher = new Perl5Matcher();
m_matcher.setMultiline(false);

Pattern m_forbiddenTagsWithContentPattern =
    s_perlCompiler.compile(
        "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</\1[^>]*>",
        Perl5Compiler.CASE_INSENSITIVE_MASK
            | Perl5Compiler.READ_ONLY_MASK);

// remove content and tags that include script/applet/object etc
StringSubstitution substitution1 = new StringSubstitution(SPACE);
filteredStr = Util.substitute(m_matcher,
                              m_forbiddenTagsWithContentPattern,
                              substitution1,
                              text,
                              Util.SUBSTITUTE_ALL);
// text is set as the above sample text.

The substitution does nothing. I even tried:

PatternMatcherInput input = new PatternMatcherInput(text);
while (m_matcher.contains(input, m_forbiddenTagsWithContentPattern)) {
    System.out.println("In manual strip method - Found match btw:"
        + input.getMatchBeginOffset() + ","
        + input.getMatchEndOffset() + ":"
        + input.substring(input.getMatchBeginOffset(),
                          input.getMatchEndOffset()));
}

And the above logs nothing.
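One thing worth checking in the code above: Java string literals treat `\1` as an octal escape, so the literal `"...</\1[^>]*>"` hands the regex compiler the control character U+0001 rather than a backreference. The literal needs `\\1`, just as `\\s` and `\\S` are already doubled. That would likely explain why both the substitution and the `contains()` loop find nothing, and why changing the single-line flag makes no difference. A plain-JDK sketch of the difference, with `java.util.regex` standing in for ORO (an assumption: this has not been run against ORO 2.0.8 itself; `OctalEscapeDemo` and `logMatches` are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OctalEscapeDemo {
    // As written in the post: "\1" is an octal escape, i.e. the char U+0001.
    static final String AS_POSTED =
            "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</\1[^>]*>";
    // Doubled backslash: the regex engine receives \1, a backreference.
    static final String FIXED =
            "<(script|object|applet|style|noscript)[^>]*>[\\s\\S]*?</\\1[^>]*>";

    // Mirrors the contains() loop: prints each match's offsets, returns the count.
    static int logMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        int count = 0;
        while (m.find()) {
            System.out.println("Found match btw:" + m.start() + "," + m.end()
                    + ":" + text.substring(m.start(), m.end()));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The posted literal contains a raw control character, not "\1".
        System.out.println(AS_POSTED.indexOf((char) 1) >= 0); // true
        String text = "<script>\nvar x = 1;\n</script>\n<style>\np { }\n</style>\n";
        System.out.println(logMatches(AS_POSTED, text)); // 0
        System.out.println(logMatches(FIXED, text));     // 2, after two "Found match" lines
        // Rough equivalent of Util.substitute(..., Util.SUBSTITUTE_ALL) with a single space.
        System.out.println(text.replaceAll("(?i)" + FIXED, " "));
    }
}
```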

I also tried compiling the pattern with Perl5Compiler.SINGLELINE_MASK, but that made no difference.

Any ideas/help would be appreciated.

TIA,
CJ


