
Request: Max Message Size for Keyword Scan

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5399
Printed Date: 23 February 2025 at 1:00am


Topic: Request: Max Message Size for Keyword Scan
Posted By: sgeorge
Subject: Request: Max Message Size for Keyword Scan
Date Posted: 01 December 2005 at 9:19am
Hi, is it possible to have keywords still search message subjects when the message is too large for a keyword scan?

I receive messages with large attachments, and would block them with this blacklist keyword, but the message is too large to initiate the search:

(subject:You.visit.illegal.websites)





Replies:
Posted By: Desperado
Date Posted: 01 December 2005 at 10:54am

Hmm ... Perhaps I misunderstood the INI setting.  I *thought* it meant how deep into a message to scan .... *not* "do not scan if message is too large".

Minor Correction:

;Any emails whose text portion exceeds this number of KB will not be scanned for keywords and Bayes
;Higher values *may* catch more spam but will cause higher load on processor
MaxMsgSizeForKeywordScan=64

This should mean that an attachment will not affect the scanning ... only the size of the text portion counts against the limit.  das

Roberto ... Comments?



-------------
The Desperado
Dan Seligmann.
Work: http://www.mags.net
Personal: http://www.desperado.com



Posted By: LogSat
Date Posted: 01 December 2005 at 6:43pm
Dan is correct. SpamFilter should truncate the message text portion *after* the maximum msg size is reached, and only scan for keyword text in the first nnn bytes of the email (in the first 64KB by default).
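
The truncation described above can be sketched roughly as follows. This is only an illustration in Python, not SpamFilter's actual code: the function names are hypothetical, and it assumes that 1 KB means 1024 bytes and that the limit applies to the text portion only.

```python
# Hypothetical sketch of "scan only the first N KB of the text portion".
# The default mirrors the INI setting MaxMsgSizeForKeywordScan=64.
MAX_MSG_SIZE_FOR_KEYWORD_SCAN_KB = 64


def text_to_scan(text_portion: bytes,
                 limit_kb: int = MAX_MSG_SIZE_FOR_KEYWORD_SCAN_KB) -> bytes:
    """Return the leading part of the text portion that would be scanned."""
    # Assumption: 1 KB = 1024 bytes.
    return text_portion[: limit_kb * 1024]


def keyword_found(text_portion: bytes, keyword: bytes,
                  limit_kb: int = MAX_MSG_SIZE_FOR_KEYWORD_SCAN_KB) -> bool:
    """Plain substring search restricted to the truncated text."""
    return keyword in text_to_scan(text_portion, limit_kb)


# A 70 KB text portion: one keyword near the start, one past the 64 KB mark.
body = b"early-keyword " + b"x" * (70 * 1024) + b" late-keyword"

found_early = keyword_found(body, b"early-keyword")
found_late = keyword_found(body, b"late-keyword")
print(found_early, found_late)
```

Under these assumptions, a keyword in the first 64 KB is still found in an oversized message, while a keyword located past the limit is not.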

-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: sgeorge
Date Posted: 02 December 2005 at 10:00am
My apologies, I was certainly wrong about the reason this email was slipping through.  I was assuming that it was skipping the scan of the text portion of the email.

Instead, the problem was that I wasn't familiar enough with SpamFilter's interpretation of RegEx.  I'm using version 2.6.3.473, unregistered.  Could I list some test cases here and ask someone to verify that this is normal behavior for SpamFilter?

Consider email 1 has the subject: You visit illegal websites

And email 2 has the subject: you visit illegal websites
(The only difference being that "y" is lower-case)

Case 1: emails 1 and 2 are received and scanned with the following black list keyword:
(subject:You.visit.illegal.websites)
Result: Neither email 1 nor email 2 is filtered

Case 2: emails 1 and 2 are received and scanned with the following black list keyword:
(subject:you.visit.illegal.websites)
(The only difference being that "y" is lower-case)
Result: Both email 1 and email 2 are filtered

Is this normal?  I've always used the (?i) control command to identify case-insensitive expressions, but I'm now finding that a RegEx rule with only lower-case letters and no (?i) command still matches any case.  Any help would be appreciated.  Thanks!


Posted By: LogSat
Date Posted: 02 December 2005 at 5:24pm

sgeorge,

You are correct in finding a problem; we need to make the documentation clearer. SpamFilter, for performance reasons, converts all the email's text to lower case when performing text searches. This does not matter for non-regex searches, but as you discovered, it does make a difference for RegEx. If you enter a lower-case RegEx expression, there will not be any need for the (?i) switch.
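
The effect can be reproduced with a small sketch. It uses Python's `re` module as a stand-in, which is not SpamFilter's RegEx engine, and the helper name is hypothetical; the only behavior modeled is "lower-case the email text, leave the pattern as written".

```python
import re


def spamfilter_style_search(pattern: str, text: str) -> bool:
    """Search `text` after lower-casing it; the pattern is used unchanged."""
    # Assumption: only the email text is lower-cased, never the pattern.
    return re.search(pattern, text.lower()) is not None


subjects = ["You visit illegal websites", "you visit illegal websites"]

upper_rule = r"You.visit.illegal.websites"  # Case 1 from the thread
lower_rule = r"you.visit.illegal.websites"  # Case 2 from the thread

results_upper = [spamfilter_style_search(upper_rule, s) for s in subjects]
results_lower = [spamfilter_style_search(lower_rule, s) for s in subjects]

print(results_upper)  # [False, False]
print(results_lower)  # [True, True]
```

Because the text being searched never contains an upper-case letter, any pattern with a literal upper-case character cannot match, which is consistent with Case 1 and Case 2 above.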



-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: sgeorge
Date Posted: 04 December 2005 at 9:29am
Thanks, it helps to know what's going on.  In addition to there being no need for the (?i) switch for lower-case regular expressions, have you observed this as well:

a keyword of:
(subject:You)

...fails to match subjects with "You" or "you" in them.


Posted By: LogSat
Date Posted: 04 December 2005 at 5:00pm
In theory it could happen if in the email's source the word "you" is encoded with a different charset. If you can post the original email's source code we can take a look. Please note that some email clients like MS Outlook will completely change an email's source *and* headers so an analysis using those clients cannot be made.
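
As an illustration of the charset point, here is one way a subject can hide the literal word in the raw source: an RFC 2047 encoded-word. The sketch uses Python's standard `email.header` module and assumes, purely for the example, a filter that searches the raw, lower-cased header line; whether SpamFilter decodes headers before scanning is not stated in this thread.

```python
import base64
import re
from email.header import decode_header, make_header

plain_subject = "you visit illegal websites"

# Build an RFC 2047 encoded-word (Base64, UTF-8) for the same subject.
encoded_payload = base64.b64encode(plain_subject.encode("utf-8")).decode("ascii")
raw_subject = "=?utf-8?B?" + encoded_payload + "?="

# What a mail client would display after decoding the header.
decoded_subject = str(make_header(decode_header(raw_subject)))

# Assumption: the filter lower-cases the text it searches.
raw_matches = re.search("you", raw_subject.lower()) is not None
decoded_matches = re.search("you", decoded_subject.lower()) is not None

print(raw_subject)
print(decoded_subject)
print(raw_matches, decoded_matches)
```

The raw header line does not contain the characters "you" at all, so a search of the undecoded source misses it even though the displayed subject contains the word.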

-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: sgeorge
Date Posted: 06 December 2005 at 11:07am

Actually, I'm even finding this oddity when I go to the "RegEx Test" tab.

For the RegEx Search String, I enter You (no quotes, no parens).  And in the large text box I enter the following two lines:

You
you

The search results in a "Not Found!" message.  Has this issue been encountered before?  I'm using version 2.3.6.473, unregistered.

Thanks for your help, Roberto.  I honestly appreciate so much assistance with software that I haven't even registered yet.



Posted By: LogSat
Date Posted: 06 December 2005 at 5:12pm
No problem for the help, we try to help everyone!

The same criterion applies to the RegEx Test tab, as it duplicates the handling of "real life" incoming emails. All text in the "large box" is treated as if it were an email's text, and is thus converted to lower case for the search. That is why the "You" with a capital Y does not return a result.

And I apologize for the answer to the earlier question:

============
have you observed this as well:

a keyword of:
(subject:You)

...fails to match subjects with "You" or "you" in them
=============

as again the same answer also applies. The email is converted to lower case, so the upper case Y will not cause a match.


-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: sgeorge
Date Posted: 06 December 2005 at 5:48pm
Well, at least it makes sense the way that you explain it.  I can definitely live with that.  Thanks again.   


