Keyword may have not been scanned
Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5978
Printed Date: 14 March 2025 at 3:56am
Topic: Keyword may have not been scanned
Posted By: sgeorge
Subject: Keyword may have not been scanned
Date Posted: 16 February 2007 at 9:58am
Hi All, long time no see. :)
One message came through that I was hoping a RegEx blacklist keyword would match. I've checked my logs to see if there was any whitelisting, or if part of the message was skipped for being over the max scan size, and from the logs it looks like neither was the case.
Here's the RegEx keyword:
((?i)\w ?\w ?\w ?\w ?\. ?p ?k)
Here is a copy of the plain-text content of the message:
front of get-smart playtime can create has a A lack of spontaneous three = mornings and parents alike.=20
Nothing could be better than=20 ● CHINA BIOLIFE ENTERP (CBFE.PK) ● STOCK!!!
New CBFE.PK STOCK this is GREAT OPPORTUNITY to BE a rich man!!! Forecasts for YOU is only positive just purchase this CBFE.PK SHARE!!!
Trust us cause we ASSURE U the real profit!!! For more info about CBFE.PK check brokers web-site!!!
Hurry up U must buy this CBFE.PK SHARE on FRIDAY: 02/16/07
a lack of playtime contribute to depression become creative, videos, = enrichment
And here's the relevant snippet from the log files (i.p.s and addresses have been changed...):
02/16/07 04:25:25:984 -- (6628) Connection from: 123.123.123.123 - Originating country : United States
02/16/07 04:25:26:343 -- (6628) Resolving 123.123.123.123 - intrepid.xo.com
02/16/07 04:25:26:718 -- (6628) - SPF analysis for spam.com done: - none
02/16/07 04:25:26:718 -- (6628) Mail from: sender@spam.com
02/16/07 04:25:27:640 -- (6628) - MAPS search done...
02/16/07 04:25:27:640 -- (6628) RCPT TO: recipient@mydomain.com accepted
02/16/07 04:25:27:890 -- (6628) EMail from sender@spam.com to recipient@mydomain.com passes Bayesian filter - 22.3561% spam (31ms)
02/16/07 04:25:27:890 -- (6628) EMail from sender@spam.com to recipient@mydomain.com was queued. Size: 1 KB, 1024 bytes
02/16/07 04:25:27:906 -- (7496) Sending email from sender@spam.com to recipient@mydomain.com --
02/16/07 04:25:27:953 -- (6392) Time to add Msg to Bayes corpus:0
02/16/07 04:25:28:047 -- (6628) Disconnect
02/16/07 04:25:28:218 -- (7496) EMail from sender@spam.com to recipient@mydomain.com -- was forwarded to 10.10.10.1:26
I am running v 3.1.3.615. Also, my max scan setting in SpamFilter.ini is:
MaxMsgSizeForKeywordScan=64
Thanks for your help. I'm hoping that I'm just missing something, but it seems kind of funky.
Stephen
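As a sanity check on the pattern itself, here is a minimal sketch that runs the same expression against a few lines of the sample message using Python's `re` module. This is not the engine SpamFilter uses, so it only shows that the expression is sound in a typical regex implementation; the `TICKER` and `sample` names are made up for illustration, and the case-insensitive flag is passed as an argument instead of inline.

```python
import re

# Pattern from the post; (?i) is replaced by the IGNORECASE flag.
TICKER = re.compile(r"\w ?\w ?\w ?\w ?\. ?p ?k", re.IGNORECASE)

# A few lines copied from the sample message (bullets omitted).
sample = (
    "Nothing could be better than CHINA BIOLIFE ENTERP (CBFE.PK) STOCK!!!\n"
    "New CBFE.PK STOCK this is GREAT OPPORTUNITY to BE a rich man!!!\n"
    "Hurry up U must buy this CBFE.PK SHARE on FRIDAY: 02/16/07\n"
)

# Collect every substring the pattern matches.
matches = [m.group(0) for m in TICKER.finditer(sample)]
print(matches)
```

If the expression matches here but not inside SpamFilter, that points at the engine or its settings and not at the keyword.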
Replies:
Posted By: sgeorge
Date Posted: 16 February 2007 at 10:04am
Also, I meant to mention something interesting I noticed in my "RegEx Test" tab in SpamFilter. If I enter the RegEx search string "(?i)\w ?\w ?\w ?\w ?\. ?p ?k" (no quotes), I found the following...
The pattern was found in this text:
All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. TEST.PK All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy.
But it was not found in this text:
All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. All work and no play makes Jack a dull boy. TEST.PK All work and no play makes Jack a dull boy.
Thanks for listenin'. :)
Stephen
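To narrow down where the match starts failing, one could generate probe strings with a growing number of filler sentences in front of the token and paste each into the "RegEx Test" tab. The sketch below builds such strings; `make_probe` and `FILLER` are hypothetical helper names, and the `found` column reflects Python's own regex engine, not SpamFilter's.

```python
import re

FILLER = "All work and no play makes Jack a dull boy. "
PATTERN = re.compile(r"(?i)\w ?\w ?\w ?\w ?\. ?p ?k")


def make_probe(repeats_before: int, repeats_after: int = 1) -> str:
    """Return filler text with TEST.PK placed after `repeats_before` sentences."""
    return FILLER * repeats_before + "TEST.PK " + FILLER * repeats_after


for n in (3, 4, 8, 16):
    text = make_probe(n)
    offset = text.index("TEST.PK")        # characters preceding the token
    found = PATTERN.search(text) is not None
    print(f"sentences before={n} offset={offset} found={found}")
```

Comparing the offset at which the tab first reports "not found" across different patterns would show whether the limit depends on the text length, the expression, or both.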
Posted By: sgeorge
Date Posted: 23 February 2007 at 6:49pm
Just a mini-update...
I tried doing a full uninstall & reinstall of v 3.1.3.615. Oddly, it did not fix the problem.
Stephen
Posted By: ImInAfrica
Date Posted: 25 February 2007 at 4:20pm
I tested this on 650 and can confirm the same issue. It looks like the regex fails once more than a certain number of characters precede the regex hit.
Amir
Posted By: sgeorge
Date Posted: 26 February 2007 at 5:11pm
Hey, thanks for testing it man.
Posted By: mikek
Date Posted: 08 March 2007 at 10:54am
I can confirm this. I was always wondering why so many spams with inline images came through, even though I had the correct "src=cid:..." keywords set.
Just tested my keyword with a mail that came through. If I paste the whole email, the regex test outputs "not found". If I just paste a few lines around the src=cid, it will output "found", like it should...
This is a serious issue that has to be looked into!
Cheers,
Mike
Posted By: LogSat
Date Posted: 08 March 2007 at 11:05am
Mike,
Can you please forward us the whole email (headers and email body included)?
------------- Roberto Franceschetti
http://www.logsat.com - LogSat Software
http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP
Posted By: mikek
Date Posted: 08 March 2007 at 11:06am
Just did some more tests and it looks like it has something to do with the regex that is used...
For me, the error shows with this regex: src="cid:(.)*\$(.)*@(.)*"
E-Mail is on its way...
Posted By: LogSat
Date Posted: 09 March 2007 at 11:34pm
Everyone,
It seems that some of your RegEx are complex enough to cause a stack overflow. SpamFilter recovers from the error, but it then misses the keyword match in that particular string.
We're currently looking at the "greedy" option in RegEx, which is enabled by default in SpamFilter. In the sample mikek provided, we modified his RegEx to include the modifier (?-g) at the beginning of the expression. This disables the "greedy" mode in RegEx and successfully detects the string.
Mike, if you change your string from:
((?i)(src="cid:(.)*\$(.)*@(.)*"))
to
((?-gi)(src="cid:(.)*\$(.)*@(.)*")) or ((?-g)(?i)(src="cid:(.)*\$(.)*@(.)*"))
your expression will work.
Unfortunately this means you may have to add the (?-g) modifier in all your RegEx. We're looking into what side-effects we'd have if we were to disable greedy mode by default in SpamFilter...
------------- Roberto Franceschetti
http://www.logsat.com - LogSat Software
http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP
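For readers unfamiliar with the greedy/non-greedy distinction, here is a rough illustration in Python. Python's `re` has no `(?-g)` modifier, so the lazy quantifier `*?` is used as the closest per-quantifier analogue; the `html` sample is invented, and this sketch does not reproduce the stack overflow itself, only the difference in how much text each form consumes.

```python
import re

# Made-up message fragment with two inline images.
html = (
    '<img src="cid:part1$abc@example.com" alt="x"> text '
    '<img src="cid:part2$def@example.com">'
)

# Greedy: each .* grabs as much as it can, then backtracks.
greedy = re.search(r'src="cid:.*\$.*@.*"', html)

# Lazy: each .*? grabs as little as it can (analogue of turning greedy off).
lazy = re.search(r'src="cid:.*?\$.*?@.*?"', html)

# Alternative: negated character classes that cannot run past the quote.
bounded = re.search(r'src="cid:[^"$]*\$[^"@]*@[^"]*"', html)

print(len(greedy.group(0)), greedy.group(0))
print(len(lazy.group(0)), lazy.group(0))
print(len(bounded.group(0)), bounded.group(0))
```

The greedy form runs to the last quote in the input, which on a full email means scanning and backtracking over far more text than the lazy or bounded forms need.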
Posted By: mikek
Date Posted: 12 March 2007 at 6:52am
Hi Roberto
turning off "greedy" mode worked!
personally, i would not change the default behaviour, but maybe update the documentation to state that greedy mode is on by default (as it is with most regex implementations) and mention the -g parameter.
it would also be nice if an exception caused by a regex would be logged...
Cheers,
Mike
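The logging suggestion could look something like the sketch below, written in Python purely as an illustration. It is not how SpamFilter is implemented; `safe_search` and the logger name are hypothetical, and the set of exceptions caught is an assumption about what a regex engine might raise.

```python
import logging
import re
from typing import Optional

log = logging.getLogger("keyword_filter")  # hypothetical logger name


def safe_search(pattern: str, text: str) -> Optional[re.Match]:
    """Run a keyword regex; log and return None if the engine fails."""
    try:
        return re.search(pattern, text)
    except (re.error, RecursionError, MemoryError) as exc:
        # Record which keyword failed so a missed match is visible in the logs.
        log.warning("regex %r failed on %d-char input: %s",
                    pattern, len(text), exc)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(safe_search(r"(?i)test\.pk", "buy TEST.PK now"))
    print(safe_search("(", "unbalanced pattern is logged, not fatal"))
```

With a log line like this, a keyword that silently fails would show up next to the message it was tested against.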
Posted By: sgeorge
Date Posted: 12 March 2007 at 10:46am
...Nice detective work. Thanks you two!
Stephen