Spam Filter ISP Support Forum

Bayesian filter - How can I tell if it's working?

CyberBob
Groupie

Joined: 26 January 2005
Points: 43
Topic: Bayesian filter - How can I tell if it's working?
    Posted: 27 July 2004 at 4:01pm

I know this may sound crazy, but shouldn't we see emails in the quarantine that are marked "Bayesian Filtered" or something like that? I stopped the filter on Monday morning, deleted the corpus files, and restarted so we could "retrain" the filter. It's been a couple of days now, and the activity log shows Bayesian filtering at 0%, but I cannot find a single quarantined message. The slider is set near the middle, showing 98.75% filtering.

Am I overlooking the obvious here?

Thanks in advance,

Bob

Desperado
Senior Member

Joined: 27 January 2005
Location: United States
Points: 1143
Posted: 27 July 2004 at 8:06pm

Bob,

First, the obvious: is "Learning" on? If so, look in the Corpus folder and open the corpus.ini file. It should look something like:

[Messages]
Spam=244275
Good=68470

The Bayesian filter will not start quarantining messages until both values are above 5000. This threshold is set in the SpamFilter.ini file as:

MinEmailsForBayesKickIn=5000

You can lower this number, but doing so increases the probability of false positives.
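To check the condition Dan describes without eyeballing the file, a short script can read corpus.ini and compare both counts against the kick-in value. This is a minimal sketch, assuming the [Messages] layout shown above and treating "above 5000" as strictly greater than. The function name bayes_ready is hypothetical, and the threshold is passed in rather than read from SpamFilter.ini because the thread doesn't say which section of that file holds MinEmailsForBayesKickIn.

```python
import configparser

# Hypothetical helper: the name and the "strictly above" comparison are
# assumptions based on the post, not SpamFilter's own code.
DEFAULT_KICK_IN = 5000  # MinEmailsForBayesKickIn value quoted in the post


def bayes_ready(corpus_ini_text: str, min_emails: int = DEFAULT_KICK_IN) -> bool:
    """Return True when both Spam and Good counts exceed the kick-in threshold."""
    parser = configparser.ConfigParser()
    parser.read_string(corpus_ini_text)
    # A missing section or key is treated as a count of 0.
    spam = parser.getint("Messages", "Spam", fallback=0)
    good = parser.getint("Messages", "Good", fallback=0)
    # Per the post, quarantining starts only once BOTH values are above it.
    return spam > min_emails and good > min_emails


example = """
[Messages]
Spam=244275
Good=68470
"""
print(bayes_ready(example))  # True
print(bayes_ready("[Messages]\nSpam=12000\nGood=1520\n"))  # False: Good is below 5000
```

In practice you would pass the contents of the real corpus.ini (for example via open(...).read()) and the value from your own SpamFilter.ini.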

Let me know if this helps or if you are still having issues.

Regards,

Dan S. (User)

CyberBob
Groupie

Joined: 26 January 2005
Points: 43
Posted: 28 July 2004 at 10:02am

Dan,

Thanks for the quick reply.

Yes, learning is on, and the .ini file shows over 10,000 for both Spam and Good.

So, next question: if the Bayesian filter quarantines a message, is that stated in the log file? I've searched our logs and only found messages blocked by IP, keyword, attachments, etc. I'm looking for something that says "blocked by Bayesian filter" or something like that. Am I dreaming?

In the corpus directory there are a couple of .tmp files, a .dat and a .prg, all over 3 MB in size, so something is working. I'm just not sure I'm looking for the right results in the log file.

 

CyberBob
Groupie

Joined: 26 January 2005
Points: 43
Posted: 28 July 2004 at 10:12am

I finally found a few messages filtered by the Bayesian filter. They appear to have an ID of 14 in the reject code, but there are very few of them among over 250,000 quarantined messages. The Bayesian filter threshold seems to adjust itself. Could that be correct? It's currently set at 92.xx%. Is that too high? What is the recommended level?

I think at this point the filter is working; it's just set at too high a level to detect much?

 

Thanks,

Bob

Desperado
Senior Member

Joined: 27 January 2005
Location: United States
Points: 1143
Posted: 28 July 2004 at 10:57am

Hey,

The filter will get better and better as time goes on, depending on how good your standard filters are, because those are what the filter bases its statistics on. Also, I have my filter set at 99.something % in all cases.

Dan

CyberBob
Groupie

Joined: 26 January 2005
Points: 43
Posted: 28 July 2004 at 11:39am

Sorry for the continued questions, but are you saying the higher you set the %, the better the filtering?

Also, the more we update our keyword/attachment filters etc., the better the Bayesian filter becomes?

I thought the filter was there to lessen the number of manual filters we have to create, but you say it builds its list from the filters we have manually put in place?

I created a query, and since Monday, when I deleted the corpus DB files and restarted, the Bayesian filter has only blocked 10 messages. We receive tens of thousands per day, so something doesn't seem to be set up correctly yet?

Desperado
Senior Member

Joined: 27 January 2005
Location: United States
Points: 1143
Posted: 28 July 2004 at 12:02pm

First, no. The percentage is the level at which the filter blocks, and the recommended value is in the 99% range so that the filter doesn't get overly aggressive.

Second, the Bayesian filter builds its information from what the filters teach it about the content of the messages that are blocked, and as time goes on, it learns what spam looks like. In theory, once a good database is built, you could remove filters, but I would not do that, because I think that after a few days it would start to reduce its ability to detect garbage.
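The thread doesn't describe SpamFilter's internals, but the general idea Dan outlines (the standard filters label messages as they are blocked, the Bayesian side learns word statistics from them, and the percentage is the cutoff for blocking) can be illustrated with a simplified Paul Graham-style ("A Plan for Spam") scorer. This is a minimal sketch, not the product's algorithm: the class name BayesSketch, the tokenizer, the 0.01/0.99 clamp, the 0.4 value for unseen words and the 15-token limit are all assumptions.

```python
import math
import re
from collections import Counter

# Simplified Graham-style Bayesian scorer. NOT SpamFilter's actual formula:
# tokenizer, clamp limits, unseen-token value and 15-token cutoff are assumptions.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokens(text):
    return set(TOKEN_RE.findall(text.lower()))


class BayesSketch:
    def __init__(self):
        self.spam_counts = Counter()  # messages containing each token, spam side
        self.good_counts = Counter()  # messages containing each token, good side
        self.n_spam = 0
        self.n_good = 0

    def learn(self, text, blocked_by_rules):
        """'Learning': messages blocked by the standard filters train the spam side."""
        if blocked_by_rules:
            self.spam_counts.update(tokens(text))
            self.n_spam += 1
        else:
            self.good_counts.update(tokens(text))
            self.n_good += 1

    def token_prob(self, tok):
        # Share of spam vs good messages containing the token, clamped to [0.01, 0.99].
        s = self.spam_counts[tok] / max(self.n_spam, 1)
        g = self.good_counts[tok] / max(self.n_good, 1)
        if s + g == 0:
            return 0.4  # never-seen token: slight bias toward "good"
        return min(0.99, max(0.01, s / (s + g)))

    def score(self, text):
        # Keep the 15 most "interesting" tokens (furthest from 0.5).
        probs = sorted((self.token_prob(t) for t in tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:15]
        if not probs:
            return 0.5  # no evidence either way
        # P = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_spam = sum(math.log(p) for p in probs)
        log_good = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_good - log_spam))

    def is_spam(self, text, threshold=0.99):
        """Block only when the combined probability reaches the threshold."""
        return self.score(text) >= threshold


f = BayesSketch()
f.learn("cheap pills buy now limited offer", blocked_by_rules=True)
f.learn("buy cheap pills online now", blocked_by_rules=True)
f.learn("meeting notes for monday project review", blocked_by_rules=False)
f.learn("lunch on monday with the project team", blocked_by_rules=False)
print(f.is_spam("cheap pills now"))       # True
print(f.is_spam("monday project notes"))  # False
```

Lowering `threshold` (for example to 0.92) makes `is_spam` flag more borderline messages, which is the aggressiveness trade-off Dan is describing. The sketch also shows why the Bayesian side can only be as good as the rule-based filters that label its training data.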

Dan

Benny
Guest Group

Posted: 02 August 2004 at 4:27pm

How do I know whether learning is on? So far my blocked-email count is over 20,000, but corpus.ini only shows 1520.