RegEx keywords to eliminate junk email with invalid html tags
LogSat (Admin Group) | Joined: 25 January 2005 | Location: United States | Status: Offline | Points: 4104
Posted: 15 June 2003 at 11:02pm
Starting with build v1.2.0.151, SpamFilter can scan the whole email content plus the subject header for RegEx (Regular Expression) keywords. This allows very powerful keyword searches.

Many spammers send HTML emails containing invalid (and thus invisible) HTML tags or HTML comments in between letters to avoid normal keyword detection. For example, the following HTML source:

<!--fxkbu8116c72f6-->SP<mynqhy2d9bswg-->AM

will actually display SPAM in an email client.

We've been using the following RegEx search string, so far successfully, to block a lot of this spam:

(<[!--]*[a-zA-Z0-9]{11,})

This is what the above expression looks for (remember that SpamFilter requires a RegEx expression to be surrounded by parentheses () in order to distinguish it from regular keywords):

- an opening < character,
- followed by zero or more characters from the class [!--], which covers the ! and - that open an HTML comment,
- followed immediately by 11 or more consecutive letters or digits.

For example, <a href="aaaa.htm"> will not cause a trigger, since there is a space immediately following the a before href.

If anyone has comments, problems, or improvements with this "apparently magic" keyword search, please let us know!

Roberto Franceschetti
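As a rough illustration of what the keyword in this post matches, here is a minimal sketch using Python's `re` module. SpamFilter's own RegEx engine is not Python's, so treat this only as an approximation of its behavior. The function name `is_flagged` is hypothetical, and the outer parentheses are dropped because SpamFilter uses them only to mark a keyword as a RegEx.

```python
import re

# The keyword from the post, minus SpamFilter's outer parentheses.
# Note: Python's re reads [!--] as the character range '!' through '-',
# which includes both '!' and '-'. Other engines may read the class
# differently, so this is an approximation, not SpamFilter's exact behavior.
OBFUSCATION = re.compile(r"<[!--]*[a-zA-Z0-9]{11,}")


def is_flagged(html_source: str) -> bool:
    """Return True if the text contains '<', optional comment punctuation,
    and then at least 11 consecutive letters or digits."""
    return OBFUSCATION.search(html_source) is not None


# Sample from the post: a junk comment and a junk tag splitting a word.
print(is_flagged("<!--fxkbu8116c72f6-->SP<mynqhy2d9bswg-->AM"))  # True

# Ordinary tag: only one letter follows '<' before the space.
print(is_flagged('<a href="aaaa.htm">'))  # False
```

The `{11,}` threshold is what keeps ordinary short tag names from matching; lowering it would catch more junk tags but would presumably also raise the risk of false positives.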
Alan (Guest Group)
When will any of these new features appear in the official RELEASE version? I do not wish to experiment with the beta release, but I feel that I am missing out by continuing to use the most current official release (1.1.2.124) while the beta keeps getting all the new features.
LogSat (Admin Group)
Alan,

We had to make drastic changes in the code to support the new quarantine database and the web functionality. The code was not as stable as we would have liked, so we made our beta test versions public so that more users could test the application and report problems. Had we put out an official release, it would have shipped with several bugs and left many users complaining of crashes. We really did not want that.

After two weeks of testing we finally seem to have a much more stable product. Unless any major problems arise in the next few days, we are thinking of making this beta official by the end of this week.

Roberto Franceschetti
George (Guest Group)
Roberto, great job.

g
Alan (Guest Group)
So this will only work with the current beta version? (not the most recent official release?)
LogSat (Admin Group)
Currently yes, these features are only available in the beta. But we anticipate releasing it officially within the next few days, so the wait will be very short!

Roberto Franceschetti
MarvinFS (Guest Group)
almost all Return receipts are being caught by this regexp... so...
LogSat (Admin Group)
Can you post the source of such an email so we can try to find a way around it?

Roberto Franceschetti
Abdu (Guest Group)
Can RegEx work with a dictionary list?
LogSat (Admin Group)
What exactly do you mean by "work with a dictionary list"?

Roberto Franceschetti
JimMeredith (Newbie) | Joined: 27 January 2005 | Location: United States | Status: Offline | Points: 28
The "invalid html tags" RegEx keyword has been working *almost* perfectly, but there have been a few situations where legitimate emails are being bounced by this rule.

Here's what appears to be happening. If a message contains a forwarded message within its text, the forwarded text is likely to include the original from/to email addresses. In many cases, these email addresses are enclosed in <>'s. For example:

To: <longusername@earthlink.net>

The < followed by longusername satisfies the [a-zA-Z0-9]{11,} part of the RegEx keyword, so... bounce!

To get it working, I've changed the [!--]* portion of the RegEx keyword (zero or more ! or - characters) to instead read [!--]+ (one or more). This is still very effective, and has eliminated the bounces of legit messages... but it is obviously not perfect, as it doesn't offer protection from invalid html tags that are not comments.

RegEx is new to me, but I might try working with it later and coming up with some sort of logical NOT based on the occurrence of a @ within the string. If someone more familiar with RegEx could just fire this out and post it here, that would be even better. :)

Jim
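To make the trade-off in this post concrete, here is a minimal Python sketch comparing the original keyword, the `+` variant, and one possible way to express the "NOT followed by @" idea with a negative lookahead. The lookahead pattern and the names `NOT_EMAIL`, `samples`, and `hits` are assumptions made for illustration, not anything confirmed by LogSat, and whether SpamFilter's RegEx engine supports lookahead at all is not established in this thread.

```python
import re

# Original keyword and the '+' variant from this post (outer parentheses
# omitted). Python reads [!--] as the range '!' through '-'.
ORIGINAL = re.compile(r"<[!--]*[a-zA-Z0-9]{11,}")
COMMENT_ONLY = re.compile(r"<[!--]+[a-zA-Z0-9]{11,}")

# Hypothetical alternative: keep '*', but refuse the match when the run of
# address-like characters that follows leads into an '@'.
NOT_EMAIL = re.compile(r"<[!--]*[a-zA-Z0-9]{11,}(?![a-zA-Z0-9._-]*@)")

samples = {
    "junk comment": "<!--fxkbu8116c72f6-->SP",
    "junk tag": "<mynqhy2d9bswg-->AM",
    "forwarded address": "To: <longusername@earthlink.net>",
}


def hits(pattern: re.Pattern) -> dict:
    """Map each sample name to whether the pattern finds a match in it."""
    return {name: bool(pattern.search(text)) for name, text in samples.items()}


print(hits(ORIGINAL))      # flags all three, including the legitimate address
print(hits(COMMENT_ONLY))  # flags only the junk comment
print(hits(NOT_EMAIL))     # flags both junk samples, skips the address
```

The lookahead only recognizes letters, digits, `.`, `_`, and `-` before the `@`, so an address containing other characters (for example `+`) could still be flagged; it is a starting point rather than a complete fix.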