RegEx keywords to eliminate junk email with invalid html tags
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Posted: 15 June 2003 at 11:02pm |
Starting with build v1.2.0.151, SpamFilter can scan the whole email content plus the subject header for RegEx (Regular Expression) keywords, which allows very powerful keyword searches. Many spammers send html emails with invalid (and therefore invisible) html tags or html comments inserted between letters to avoid normal keyword detection. For example, the following html source:

<!--fxkbu8116c72f6-->SP<mynqhy2d9bswg-->AM

will actually display SPAM in an email client. We've been using the following RegEx search string, so far successfully, to block a lot of this spam:

(<[!--]*[a-zA-Z0-9]{11,})

This is what the above expression looks for (remember that SpamFilter requires a RegEx expression to be surrounded by parentheses () in order to distinguish it from regular keywords):

- a < character (the start of an html tag),
- optionally followed by ! and - characters (the start of an html comment),
- immediately followed by 11 or more consecutive letters or digits.

For example, <a href="aaaa.htm"> will not cause a trigger since there is a space immediately following the a before href. If anyone has comments, problems, or improvements with this "apparently magic" keyword search, please let us know!

Roberto Franceschetti
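As a quick check of the expression above, here is a minimal sketch that runs it against the two examples from the post. It uses Python's re module as a stand-in for SpamFilter's RegEx engine, which is an assumption, and it drops the outer parentheses because they are only SpamFilter's marker for a RegEx keyword. Python reads [!--] as the character range from ! to -, which covers ! and - plus a few other punctuation characters, and it emits a FutureWarning for the doubled hyphen, so the sketch silences that warning. The names POSTED and is_flagged are hypothetical.

```python
import re
import warnings

# Expression from the post, without SpamFilter's outer parentheses.
POSTED = r"<[!--]*[a-zA-Z0-9]{11,}"

with warnings.catch_warnings():
    # Python flags the "--" inside the character class; ignore that here.
    warnings.simplefilter("ignore", FutureWarning)
    junk_tag = re.compile(POSTED)


def is_flagged(source):
    """Return True if the text has a '<' followed by a long alphanumeric run."""
    return junk_tag.search(source) is not None


obfuscated = "<!--fxkbu8116c72f6-->SP<mynqhy2d9bswg-->AM"
normal_link = '<a href="aaaa.htm">'

print(is_flagged(obfuscated))   # True
print(is_flagged(normal_link))  # False
```

The ordinary link is not flagged because only a single letter follows the < before the space, well short of the 11 characters the expression requires.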
Alan
Guest Group |
When will these new features appear in the official RELEASE version? I do not wish to experiment with the beta release, but I feel that I am missing out on all the new features by continuing to use the most current official release (1.1.2.124) while the beta keeps getting them.
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Alan,

We had to make drastic changes in the code to support the new quarantine database and the web functionality. The code was not as stable as we would have liked, so we made our beta test versions public so that more users could test the application and report problems. Had we made it an official release, it would have shipped with several bugs and left many users complaining of crashes. We really did not want that.

After two weeks of testing we finally seem to have a much more stable product. Unless any major problems arise in the next few days, we are thinking of making this beta official by the end of this week.

Roberto Franceschetti
George
Guest Group |
Roberto, Great Job, g |
Alan
Guest Group |
So this will only work with the current beta version? (not the most recent official release?)
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Currently yes, these features are only available in the beta. But we do anticipate releasing it officially within the next few days, so the wait will be very short!

Roberto Franceschetti
MarvinFS
Guest Group |
Almost all return receipts are being caught by this regexp... so...
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Can you post the source of such an email so we can try to find a way around it?

Roberto Franceschetti
Abdu
Guest Group |
Can RegEx work with a dictionary list? |
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
What do you mean exactly by "work with a dictionary list"?

Roberto Franceschetti
JimMeredith
Newbie Joined: 27 January 2005 Location: United States Status: Offline Points: 28 |
The "invalid html tags" RegEx keyword has been working *almost* perfectly, but there have been a few situations where legitimate emails are being bounced by this rule. Here's what appears to be happening.

If a message contains a forwarded message within its text, the forwarded text is likely to include the original from/to email addresses. In many cases, these addresses are enclosed in angle brackets. For example:

To: <longusername@earthlink.net>

This matches the RegEx keyword criteria of [a-zA-Z0-9]{11,} so... bounce!

To get it working, I've changed the [!--]* portion of the RegEx keyword (zero or more occurrences of !--) to instead read [!--]+ (one or more occurrences). This is still very effective and has eliminated the bounces of legit messages, but it is obviously not perfect, as it offers no protection from invalid html tags that are not comments.

RegEx is new to me, but I might try working with it later and come up with some sort of logical NOT based on the occurrence of an @ within the string. If someone more familiar with RegEx could just fire this out and post it here, that would be even better. :)

Jim
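The trade-off described above, and one possible form of the "logical NOT on @" idea, can be sketched as follows. This is an illustration in Python's re module, not SpamFilter's own engine: whether SpamFilter supports lookahead syntax such as (?!...) is an assumption that would need testing. The character class is written as [!-] here (just ! and -) rather than the posted [!--], and the names ORIGINAL, COMMENTS_ONLY, SKIP_ADDRESSES, samples, and hits are hypothetical.

```python
import re

# Variants of the keyword, without SpamFilter's outer parentheses.
ORIGINAL = re.compile(r"<[!-]*[a-zA-Z0-9]{11,}")       # zero or more of ! and -
COMMENTS_ONLY = re.compile(r"<[!-]+[a-zA-Z0-9]{11,}")  # one or more of ! and -
# Negative lookahead: skip a '<' whose following run of non-space,
# non-bracket characters contains an @ (i.e. looks like an email address).
SKIP_ADDRESSES = re.compile(r"<(?![^<>\s]*@)[!-]*[a-zA-Z0-9]{11,}")

samples = {
    "forwarded address": "To: <longusername@earthlink.net>",
    "junk comment": "<!--fxkbu8116c72f6-->SP",
    "junk tag": "SP<mynqhy2d9bswg-->AM",
}


def hits(pattern):
    """Map each sample name to whether the pattern triggers on it."""
    return {name: pattern.search(text) is not None
            for name, text in samples.items()}


print(hits(ORIGINAL))        # all three trigger, including the legitimate address
print(hits(COMMENTS_ONLY))   # only the junk comment triggers
print(hits(SKIP_ADDRESSES))  # both junk samples trigger, the address does not
```

The lookahead variant keeps coverage of non-comment junk tags while letting bracketed addresses through. A spammer could still defeat it by putting an @ inside the junk tag, so it is a starting point rather than a complete fix.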