Spam Filter ISP Support Forum


The keyword filter now also searches in the "Received:" headers

Alan (Guest Group)
Posted: 28 April 2004 at 6:09pm

I noticed in the newer pre-release versions:

// New to VersionNumber = '2.0.1.333';
{TODO -cNew : The keyword filter now also searches in the Received: headers}

I think this will be a big plus for the Bayesian filtering, since spam that is routed through the same open relays and/or comes from the same sources will be filtered.

Roberto, can you clarify the limitations of this new feature and how you see it being best used?

LogSat (Admin Group)
Joined: 25 January 2005
Location: United States
Posted: 28 April 2004 at 11:48pm

Alan,

How to use the ability to search the headers is something we'll leave to the user's inventiveness. SpamFilter retrieves all the "Received:" header values and adds them to the body of the email so that the keyword filter scans through them as well.
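As a rough illustration of that idea (a minimal sketch, not SpamFilter's actual code), the following Python uses the standard `email` module to collect the "Received:" values, add them to the body text, and run a plain substring keyword check over the result. The function names, the sample message, and the keyword list are all hypothetical.

```python
from email import message_from_string


def text_for_keyword_scan(raw_message):
    """Return the body text with all Received: header values added to it."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received", [])  # every Received: header, in order
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart messages are ignored in this sketch
        body = ""
    return body + "\n" + "\n".join(received)


def matching_keywords(raw_message, keywords):
    """Return the keywords found in the body or in the Received: headers."""
    haystack = text_for_keyword_scan(raw_message).lower()
    return [k for k in keywords if k.lower() in haystack]


# Hypothetical message: the relay name appears only in the Received: header.
SAMPLE = (
    "Received: from relay.example.net ([192.0.2.10]) by mx.example.org\n"
    "From: someone@example.com\n"
    "Subject: hello\n"
    "\n"
    "Just a normal looking body.\n"
)

hits = matching_keywords(SAMPLE, ["relay.example.net", "mortgage"])
```

With this approach a keyword such as a known open relay's host name or IP address can match even though it never appears in the visible message text.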

We have not included them in the Bayes analysis yet, because during our initial testing (which included all the other headers as well) we were losing some performance. This is something we may revisit in the near future, however.

Roberto F.
LogSat Software

Alan (Guest Group)
Posted: 29 April 2004 at 12:15pm

Roberto, can you make it so the header info (or just parts of the header info, such as "Received:") can be included in Bayesian filtering as an option? Maybe using a check box? I think it would be a really powerful new tool for catching spam that passed through some of the known open relays that some spammers find and continue to reuse. That way, those who have powerful hardware can take advantage of the feature, and those who don't need or want to take a performance hit can leave it turned off.

LogSat (Admin Group)
Posted: 30 April 2004 at 1:04am

Alan,

We're testing build 2.0.1.345, which, as you requested, includes all the Received: headers in the Bayesian filtering. More testing will be needed to see the effect this has on performance and on the average size increase of the corpus database.

This build is available for download in the registered user area on our website. If you'd like to test it, we'd like to hear back from you on how it performs.

Roberto F.
LogSat Software

LogSat (Admin Group)
Posted: 30 April 2004 at 1:08am

As a follow-up to the previous answer: if you use the new 345 build, you will probably need to start with a fresh corpus for accurate results, so that the Received headers carry the proper weight in the corpus database.
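To make the reason concrete, here is a minimal sketch of how Received headers might be turned into Bayesian tokens. It is an assumption for illustration only: the tokenizer, the `received:` prefix, the `include_received` switch, and the sample message are hypothetical and do not describe how build 345 actually stores its corpus.

```python
import re
from collections import Counter
from email import message_from_string

# Words, host names and IP addresses; a deliberately simple token pattern.
TOKEN_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.\-]*")


def split_tokens(text):
    """Lower-case tokens with trailing dots and dashes trimmed off."""
    return [t.rstrip(".-").lower() for t in TOKEN_RE.findall(text)]


def tokenize(raw_message, include_received=True):
    """Body tokens, plus prefixed Received: tokens when the option is on."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    tokens = split_tokens(body if isinstance(body, str) else "")
    if include_received:
        for value in msg.get_all("Received", []):
            tokens += ["received:" + t for t in split_tokens(value)]
    return tokens


def train(corpus, raw_message, include_received=True):
    """Count each distinct token once per message."""
    corpus.update(set(tokenize(raw_message, include_received)))


# Hypothetical spam message used only for this demonstration.
SAMPLE = (
    "Received: from relay.example.net ([192.0.2.10]) by mx.example.org\n"
    "Subject: offer\n"
    "\n"
    "Cheap pills today.\n"
)

old_corpus = Counter()  # trained before headers were tokenized
train(old_corpus, SAMPLE, include_received=False)

fresh_corpus = Counter()  # trained with the Received headers included
train(fresh_corpus, SAMPLE)
```

In this sketch the old corpus has body-token counts from every message it ever saw but no `received:` counts at all, so header tokens learned afterwards would be under-represented relative to the rest. The `include_received` flag also shows one way an on/off option could look.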

Roberto F.
LogSat Software

Alan (Guest Group)
Posted: 30 April 2004 at 12:13pm

I am giving the 345 release a try. It may take a few days to build up tokens. It looks like you haven't implemented any way to turn the feature on or off?

One possible problem did come to mind. If you use backup spooling servers in your MX record, some spammers target them as a secondary entryway. If you get a lot of spam sent using this method, I suspect the spooling servers could eventually be detected as spam by the Bayesian filtering?

Roberto, does this sound correct?

LogSat (Admin Group)
Posted: 30 April 2004 at 11:02pm

I would let statistics do their work... If spammers send mail to your backup MX server, most of the email you receive from it will be spam. The Received headers will contain your backup's IP, and that will then be taken into consideration. When SpamFilter receives email from your backup, it will see the IP/server name in the Received headers, which will slightly increase the probability that the message is spam, but this is correct, since most email from your backup is spam. If the message is good to begin with, statistically the number of "good" tokens will likely make up for the "bad" score caused by the IP.
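A small worked example of that argument, under stated assumptions: the thread does not say which formula SpamFilter uses, so this sketch borrows the combining rule popularized by Paul Graham's "A Plan for Spam", and every per-token probability below is an invented illustrative number.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities naive-Bayes style."""
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Assumed values: the backup MX's IP has become a "spammy" token.
backup_mx_token = 0.90
good_tokens = [0.10, 0.15, 0.20, 0.10]  # tokens typical of legitimate mail
spam_tokens = [0.95, 0.90, 0.85]        # tokens typical of spam

# Legitimate message, with and without the backup MX token.
legit_direct = combined_spam_probability(good_tokens)
legit_via_backup = combined_spam_probability([backup_mx_token] + good_tokens)

# Spam message relayed through the backup MX.
spam_via_backup = combined_spam_probability([backup_mx_token] + spam_tokens)
```

With these numbers the backup's token raises the score of the legitimate message, but the result stays far below a 0.5 cutoff, while the relayed spam stays far above it.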

This is all theory, however; actual use will prove its validity.

Roberto F.
LogSat Software
