Spam Filter ISP Support Forum

RegEx keywords to eliminate junk email with invalid html tags

LogSat
    Posted: 15 June 2003 at 11:02pm

Starting with build v1.2.0.151, SpamFilter can scan the entire email content plus the subject header for RegEx (Regular Expression) keywords.

This allows very powerful keyword searches. Many spammers send html emails containing invalid (thus invisible) html tags or html comments in between letters to avoid normal keyword detection.

For example, the following html source:

<!--fxkbu8116c72f6-->SP<mynqhy2d9bswg-->AM 
    <!--ei2rq7erjldy3y-->MER<!--ywf1ph1zmgcik9-->

will actually display SPAMMER in an email client.
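To make the evasion concrete, here is a minimal Python sketch (Python and its `re` module are stand-ins chosen for illustration, not part of SpamFilter). It takes the first line of the sample above and shows that a plain substring search misses the word, while crudely stripping everything between `<` and `>`, a rough approximation of what a mail client renders, exposes it.

```python
import re

# First line of the sample above, copied verbatim.
source = "<!--fxkbu8116c72f6-->SP<mynqhy2d9bswg-->AM"

# A plain keyword search fails because markup splits the word in two.
found_by_plain_search = "SPAM" in source

# Rough approximation of the rendered text: drop anything between < and >.
visible = re.sub(r"<[^>]*>", "", source)

print(found_by_plain_search)  # False
print(visible)                # SPAM
```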

We've been using the following RegEx search string, which has so far successfully blocked a lot of this spam:

(<[!--]*[a-zA-Z0-9]{11,})

This is what the above expression looks for (remember that SpamFilter requires a RegEx expression to be surrounded by parentheses () in order to distinguish it from regular keywords):

  • <   look for an open tag start character, immediately followed by...
  • [!--]*   zero or more characters from the bracketed class, which covers the ! and - characters that open an html comment, immediately followed by...
  • [a-zA-Z0-9]  any letter or digit...
  • {11,}  repeated at least 11 times. The run may contain only letters and digits. Any space, tab, single quote, double quote, etc. will break the sequence.

For example, <a href="aaaa.htm"> will not cause a trigger since there is a space immediately following the a before href.
We chose a minimum repetition of 11 because <blockquote> is a valid tag whose name is 10 characters long.
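The behaviour described above can be checked with a short sketch. It assumes Python's `re` module as a stand-in for SpamFilter's own RegEx engine, whose dialect may differ in details, and it drops the outer parentheses that SpamFilter uses only as a marker. The sample names in the dictionary are hypothetical labels.

```python
import re
import warnings

# The expression from the post, without SpamFilter's outer parentheses.
EXPRESSION = r"<[!--]*[a-zA-Z0-9]{11,}"

with warnings.catch_warnings():
    # Python may emit a FutureWarning for "--" inside a character class;
    # silence it so the expression can stay exactly as posted.
    warnings.simplefilter("ignore", FutureWarning)
    pattern = re.compile(EXPRESSION)

samples = {
    "spam comment": "<!--fxkbu8116c72f6-->SP",  # 14 letters/digits after <!--
    "invalid tag": "<mynqhy2d9bswg-->AM",       # 13 letters/digits after <
    "normal link": '<a href="aaaa.htm">',       # the space after "a" breaks the run
    "blockquote": "<blockquote>",               # only 10 letters, one short of 11
}

results = {name: bool(pattern.search(text)) for name, text in samples.items()}

for name, matched in results.items():
    print(f"{name}: {'blocked' if matched else 'allowed'}")
```

In this sketch the two spam fragments are flagged, while the ordinary link and the `<blockquote>` tag pass through.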

If anyone has comments, problems, or improvements with this "apparently magic" keyword search, please let us know!

Roberto Franceschetti
LogSat Software

Alan
Posted: 16 June 2003 at 11:25am

When will any of these new features appear in the official RELEASE version?

I do not wish to experiment with the beta release, but feel that I am missing out on all the new features by continuing to use the most current official release (1.1.2.124) when the beta keeps getting all the new features.

LogSat
Posted: 16 June 2003 at 12:49pm

Alan,

We had to make drastic changes in the code to support the new quarantine database and the web functionality. The code was not as stable as we would have liked, so we made our beta test versions public so that more users could test the application and report problems. Had we shipped it as an official release, it would have contained several bugs, and many users would have complained of crashes. We really did not want that.

After two weeks of testing we finally seem to have a much more stable product. Unless any major problems arise in the next few days, we are thinking of making this beta official by the end of this week.

Roberto Franceschetti
LogSat Software

George
Posted: 16 June 2003 at 1:59pm

Roberto,
Thanks for posting the RegEx code for the html comments used to bypass keyword filtering. It works great so far. I was able to eliminate all of the different comment tags I was using and reduce the keyword list to a smaller size.

Great Job,

g

Alan
Posted: 16 June 2003 at 4:14pm

So this will only work with the current beta version?  (not the most recent official release?)

LogSat
Posted: 17 June 2003 at 3:29pm

Currently yes, these features are only available in the beta. But we anticipate releasing it officially within the next few days, so the wait will be very short!

Roberto Franceschetti
LogSat Software

MarvinFS
Posted: 18 June 2003 at 8:08am

Almost all return receipts are being caught by this regexp... so...

LogSat
Posted: 18 June 2003 at 8:24am

Can you post the source of such an email so we can try to find a way around it?

Roberto Franceschetti
LogSat Software

Abdu
Posted: 19 June 2003 at 1:21pm

Can RegEx work with a dictionary list?

LogSat
Posted: 19 June 2003 at 3:30pm

What do you mean exactly by "work with a dictionary list"?

Roberto Franceschetti
LogSat Software

JimMeredith
Posted: 20 June 2003 at 8:11pm

The "invalid html tags" RegEx keyword has been working *almost* perfectly, but there have been a few situations where legitimate emails are being bounced by this rule.

Here's what appears to be happening.  If a message contains a forwarded message within its text, this forwarded message text is likely to include the original from/to email addresses.  In many cases, these email addresses are enclosed in <>'s.  For example:

To: <longusername@earthlink.net>

This matches the RegEx keyword criteria of [a-zA-Z0-9]{11,} so... bounce!

To get it working, I've changed the [!--]* portion of the RegEx keyword (zero or more occurrences of !--) to instead read [!--]+ (one or more occurrences).  This is still very effective, and has eliminated the bounces of legit messages... but is obviously not perfect as it doesn't offer protection from invalid html tags that are not comments.
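The trade-off between the two variants can be sketched as follows, assuming Python's `re` module as a stand-in for SpamFilter's RegEx engine (dialects may differ). The variable names are hypothetical; the sample strings come from this thread.

```python
import re
import warnings

with warnings.catch_warnings():
    # Python may warn about "--" inside a character class; the expressions
    # are kept exactly as posted, so the warning is silenced here.
    warnings.simplefilter("ignore", FutureWarning)
    original = re.compile(r"<[!--]*[a-zA-Z0-9]{11,}")       # zero or more
    comments_only = re.compile(r"<[!--]+[a-zA-Z0-9]{11,}")  # one or more

forwarded_header = "To: <longusername@earthlink.net>"
spam_comment = "<!--fxkbu8116c72f6-->SP"
invalid_tag = "<mynqhy2d9bswg-->AM"

for label, text in [("forwarded header", forwarded_header),
                    ("spam comment", spam_comment),
                    ("invalid tag", invalid_tag)]:
    print(label,
          "| original:", bool(original.search(text)),
          "| comments only:", bool(comments_only.search(text)))
```

With `+`, the forwarded address is no longer flagged and the fake comment still is, but the invalid non-comment tag now slips through, which is the limitation noted above.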

RegEx is new to me, but I might try working with it later and coming up with some sort of logical NOT based on the occurrence of a @ within the string.  If someone more familiar with RegEx could just fire this out and post it here, that would be even better. :)
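One possible shape for such a "NOT followed by @" rule is a negative lookahead, sketched below in Python. This is only an illustration: whether SpamFilter's RegEx engine supports lookahead is an assumption that has not been confirmed in this thread, and the set of characters allowed before the @ (`[a-zA-Z0-9._-]`) is a guess at typical address characters.

```python
import re
import warnings

# Hypothetical variant: the original expression plus a negative lookahead
# that rejects the match when the run continues into an "@".
EXPRESSION = r"<[!--]*[a-zA-Z0-9]{11,}(?![a-zA-Z0-9._-]*@)"

with warnings.catch_warnings():
    # Python may warn about "--" inside a character class; silence it so
    # the first part of the expression stays as originally posted.
    warnings.simplefilter("ignore", FutureWarning)
    not_an_address = re.compile(EXPRESSION)

forwarded_header = "To: <longusername@earthlink.net>"
spam_comment = "<!--fxkbu8116c72f6-->SP"
invalid_tag = "<mynqhy2d9bswg-->AM"

print(bool(not_an_address.search(forwarded_header)))  # False
print(bool(not_an_address.search(spam_comment)))      # True
print(bool(not_an_address.search(invalid_tag)))       # True
```

Unlike the `+` variant, this sketch still flags invalid tags that are not comments, while leaving the bracketed email address alone.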

Jim
