Bayesian doesn't seem to work
peet
Newbie, Joined: 01 August 2007, Location: United States, Points: 21
Posted: 19 October 2009 at 10:55pm
I'm not sure why, but the Bayesian filter doesn't seem to filter out any emails.
I've lowered the threshold a little at a time over the past few weeks, and it's now at 13.929%, but still nothing seems to get caught. I took an email's raw content, pasted it into the Bayesian Probability screen, and clicked Show Bayes Prob. On the Corpus Database tab I got:

    10/19/09 19:50:56:203 -- **** R E S U L T S *********
    10/19/09 19:50:56:203 -- passes Bayesian filter - 0% spam
    10/19/09 19:51:23:968 -- **** R E S U L T S *********
    10/19/09 19:51:23:968 -- passes Bayesian filter - 0% spam
    10/19/09 19:51:49:781 -- **** R E S U L T S *********
    10/19/09 19:51:49:781 -- passes Bayesian filter - 0% spam

I have "Learn new incoming" enabled. The folder \SpamFilter\corpus contains:

    db.dat       77 MB
    db.dat.prb   61 MB

The Corpus.ini file says the following (but I'm not sure what it means):

    [Messages]
    Spam=503515
    Good=1452

Any thoughts?
LogSat
Admin Group, Joined: 25 January 2005, Location: United States, Points: 4104
peet,
The Bayesian filter will start blocking emails only after 5000 valid emails and 5000 spam emails have been received. The filter needs enough initial data about your incoming traffic to make accurate predictions about future emails. In your case, only 1452 good emails have been received, so the Bayesian filter is still in "learning" mode: it analyzes the incoming traffic without stopping any of it.

Also, please note that once primed, this filter is very selective. Most emails will score either about 0.001% (clean) or 99.99% (spam), so you will see very clear-cut probabilities that an email is either spam or not.

In addition, since this filter is applied last, after all the other filters, most of the spam will already have been caught by the other filters, leaving very little for this filter to stop. The Bayesian filter often blocks only 0.1% to 1% of the spam, compared with the other filters, because most spam will already have been blocked.
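To make the kick-in rule and the "clear-cut" scores more concrete, here is a minimal Python sketch. It is not SpamFilter's actual code: the helper names `bayes_is_active` and `combine` are hypothetical, the gate assumes the filter simply compares both counts in Corpus.ini against the 5000 threshold, and the scoring uses Paul Graham's well-known naive Bayes combining formula as a stand-in for whatever SpamFilter does internally. It shows why a corpus with 1452 good emails is still in learning mode, and why multiplying many token probabilities together pushes the final score toward 0% or 100%.

```python
import configparser

# Corpus.ini contents exactly as quoted in the question.
CORPUS_INI = """
[Messages]
Spam=503515
Good=1452
"""


def bayes_is_active(corpus_text: str, kick_in: int = 5000) -> bool:
    """Hypothetical gate: the filter may block only once BOTH counts reach kick_in."""
    parser = configparser.ConfigParser()
    parser.read_string(corpus_text)
    spam = parser.getint("Messages", "Spam")
    good = parser.getint("Messages", "Good")
    return spam >= kick_in and good >= kick_in


def combine(token_probs: list[float]) -> float:
    """Graham-style naive Bayes combination of per-token spam probabilities.

    Illustrative stand-in only; SpamFilter's own formula is not documented here.
    """
    p_spam = 1.0
    p_good = 1.0
    for p in token_probs:
        p = min(max(p, 0.01), 0.99)  # clamp so no single token can force 0 or 1
        p_spam *= p
        p_good *= 1.0 - p
    return p_spam / (p_spam + p_good)


print(bayes_is_active(CORPUS_INI))  # False: 1452 good emails is below 5000
print(combine([0.9] * 15))          # very close to 1.0 (clear-cut spam)
print(combine([0.2] * 15))          # very close to 0.0 (clear-cut clean)
```

With 15 tokens that each look only 90% spammy, the combined score is already within a tiny fraction of 100%; this saturation is the kind of effect that produces the very clear-cut probabilities described above.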
peet
Newbie, Joined: 01 August 2007, Location: United States, Points: 21
Roberto,
Thanks! Can this be expedited, for example by lowering the 5000 threshold to 2000? Also, will the Bayesian filter add the spam probability (%) to the email headers? In the Web quarantine review, I'd like to use some header data to show whether an email has a low, medium, or high probability of being spam.
LogSat
Admin Group, Joined: 25 January 2005, Location: United States, Points: 4104
Yes, this can be changed, but we don't recommend doing so, as it may cause inaccurate results. If you still wish to proceed, look for the setting MinEmailsForBayesKickIn=5000 in the SpamFilter.ini file. There's no need to restart SpamFilter after the change.

The Bayesian filter will not log its value in the headers. As I mentioned earlier, it's only used for less than 1% of incoming emails: more than 99% of email will already have been blocked before the Bayesian filter has a chance to look at it, so any statistics it computed for the remaining 1% would not be very useful.
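If you do change the threshold, a small script can confirm which value SpamFilter.ini currently holds. This is a hedged sketch: `read_kick_in` is a hypothetical helper, it assumes SpamFilter.ini uses standard INI syntax, and since the thread does not say which [section] holds MinEmailsForBayesKickIn (or whether the key sits above the first section), it scans every section and falls back to the default of 5000 mentioned above. The "[Settings]" section name in the example is made up for illustration.

```python
import configparser

KEY = "MinEmailsForBayesKickIn"  # setting name from the reply above


def read_kick_in(ini_text: str, default: int = 5000) -> int:
    """Hypothetical helper: return MinEmailsForBayesKickIn from SpamFilter.ini text.

    Assumes standard INI syntax. The section holding the key is not known,
    so every section is scanned; a dummy header is prepended in case the
    key appears before the first [section] line.
    """
    parser = configparser.ConfigParser(
        strict=False,         # tolerate duplicate sections/keys
        interpolation=None,   # treat '%' characters literally
        allow_no_value=True,  # tolerate bare keys without '='
    )
    parser.read_string("[__top__]\n" + ini_text)
    for section in parser.sections():
        if parser.has_option(section, KEY):  # key lookup is case-insensitive
            return parser.getint(section, KEY)
    return default


# Hypothetical file contents; the real section name may differ.
sample = "[Settings]\nMinEmailsForBayesKickIn=2000\n"
print(read_kick_in(sample))  # 2000
```

The sketch only reads the file; it leaves editing SpamFilter.ini to you, in line with the advice above to keep the default unless you accept less accurate results.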