Bayesian filter - How can I tell if it's working?
CyberBob, Groupie. Joined: 26 January 2005. Status: Offline. Points: 43
Posted: 27 July 2004 at 4:01pm
I know this may sound crazy, but should we see emails in the quarantine that are marked "Bayesian Filtered" or something like that? I stopped the filter on Monday morning, deleted the corpus files, and restarted so we could "retrain" the filter. It's been a couple of days now, and the activity log shows Bayesian filtering at 0%, and I cannot find one such quarantined message. The slide bar is set near the middle, showing 98.75% filtering. Am I overlooking the obvious here?
Thanks in advance,
Bob

Desperado, Senior Member. Joined: 27 January 2005. Location: United States. Status: Offline. Points: 1143
Bob,
First the obvious: is "Learning" on? If so, look in the Corpus folder and open the corpus.ini file. It should look something like:
[Messages]
The Bayesian filter will not start quarantining messages until both message counts in that section are above 5000. This threshold is set in the SpamFilter.ini file as:
MinEmailsForBayesKickIn=5000
You can lower this number, but doing so increases the probability of false positives. Let me know if this helps or if you are still having issues.
Regards,
Dan S. (User)
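The check Dan describes can be sketched in a few lines of Python. This is only an illustration, not part of the product: the key names `Spam` and `Good` in the sample text are assumptions (the thread does not show the actual contents of corpus.ini), and `bayes_ready` is a hypothetical helper. It simply reads every integer value in the `[Messages]` section and tests whether all of them are above the `MinEmailsForBayesKickIn` value.

```python
import configparser


def bayes_ready(corpus_ini_text, min_emails=5000, section="Messages"):
    """Return (ready, counts) for the integer counters found in `section`.

    `ready` is True only if at least one counter exists and every counter
    is strictly above `min_emails` (the MinEmailsForBayesKickIn value).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(corpus_ini_text)
    if not parser.has_section(section):
        return False, {}
    counts = {}
    for key, value in parser.items(section):
        try:
            counts[key] = int(value)
        except ValueError:
            continue  # skip anything that is not a plain integer counter
    ready = bool(counts) and all(n > min_emails for n in counts.values())
    return ready, counts


# Hypothetical file contents; the real key names may differ.
sample = """[Messages]
Spam=10234
Good=1520
"""

ready, counts = bayes_ready(sample)
print(ready, counts)
```

With the made-up sample, one counter is still below 5000, so the sketch reports that the filter would not yet be quarantining. Note that `configparser` lowercases key names by default.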

CyberBob
Dan,
Thanks for the quick reply. Yes, learning is on, and the .ini file shows over 10,000 for both Spam and Good. So, next question: if the Bayesian filter quarantines a message, is that stated in the log file? I've searched our logs and only found messages blocked by IP, keyword, attachments, etc. I'm looking for something that says "blocked by Bayesian filter" or something like that. Am I dreaming? In the corpus directory there are a couple of .tmp files, a .dat, and a .prg that are all over 3 MB in size, so something is working. I'm just not sure I'm looking for the right results in the log file.

CyberBob
I finally found a few messages filtered by the Bayesian filter. They appear to have an id of 14 in the reject code, but there are very few of them among over 250,000 quarantined messages. The Bayesian Filter Threshold seems to adjust itself. Could that be correct? It's currently set at 92.xx%. Is that too high? What is the recommended level to set it at? I think at this point the filter is working; it's just set at too high a level to detect much.
Thanks,
Bob
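For anyone wanting to put a number on "very few", here is a rough Python sketch that counts how many quarantined messages carry a given reject code. It assumes the reject codes have already been exported as a plain list of integers (the thread does not describe the quarantine storage format), it takes Bob's observation that Bayesian blocks appear with code 14 at face value, and the other code values in the sample are made up.

```python
from collections import Counter

BAYESIAN_REJECT_CODE = 14  # as observed in the post above; treat as an assumption


def reject_code_share(reject_codes, code=BAYESIAN_REJECT_CODE):
    """Return (hits, share) for `code` among all quarantined messages."""
    counts = Counter(reject_codes)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    return counts[code], counts[code] / total


# Made-up export of reject codes, one entry per quarantined message.
codes = [3, 14, 7, 3, 3, 14, 9, 3, 7, 3]

hits, share = reject_code_share(codes)
print(f"{hits} of {len(codes)} quarantined messages ({share:.1%}) have reject code {BAYESIAN_REJECT_CODE}")
```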

Desperado
Hey,
The filter will get better and better as time goes on, depending on how good your standard filters are, because those are what the filter bases its statistics on. Also, I have my filter set at 99.something % in all cases.
Dan

CyberBob
Sorry for the continued questions, but are you saying that the higher you set the %, the better the filtering? And that the more we update our keyword filters, attachment filters, etc., the better the Bayesian filter becomes? I thought the filter was there to reduce the number of manual filters we have to create, but you say it builds its list from the filters we have manually put in place? I created a query, and since Monday, when I deleted the corpus DB files and restarted, the Bayesian filter has blocked only 10 messages. We get tens of thousands per day, so something doesn't seem to be set up correctly yet.

Desperado
First, no. The percentage is the level that the filter blocks at, and the recommended value is in the 99% range so that the filter doesn't get overly aggressive. Second, the Bayesian filter builds its information from what the other filters teach it about the content of the messages that are blocked, and as time goes on, it learns what the spam looks like. In theory, once a good database is built, you could remove filters, but I would not do that because I think that after a few days the Bayesian filter's ability to detect garbage would start to decline.
Dan
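To illustrate why a higher percentage means a less aggressive filter, here is a small Python sketch. It assumes the filter computes a spam probability for each message and blocks the message when that probability is at or above the configured threshold; whether the product compares with "at or above" or "strictly above" is not stated in the thread. The function name and the scores are invented for the example.

```python
def blocked_by_bayes(spam_probabilities, threshold_percent):
    """Return the scores that would be blocked at the given threshold.

    Assumption: a message is blocked when its spam probability is at or
    above threshold_percent / 100.
    """
    cutoff = threshold_percent / 100.0
    return [p for p in spam_probabilities if p >= cutoff]


# Made-up spam probabilities for six messages.
scores = [0.10, 0.55, 0.93, 0.95, 0.991, 0.999]

at_92 = blocked_by_bayes(scores, 92.0)
at_99 = blocked_by_bayes(scores, 99.0)
print(len(at_92), "blocked at 92%;", len(at_99), "blocked at 99%")
```

Raising the threshold from 92% to 99% blocks fewer of the sample messages, which matches Dan's point: a setting in the 99% range catches only the messages the filter is most sure about, trading some missed spam for fewer false positives.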

Benny, Guest Group
How do I know learning is on? So far my blocked email count is over 20,000, but corpus.ini only shows 1520.