
Bayesian doesn't seem to work

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=6763
Printed Date: 04 January 2025 at 7:20pm


Topic: Bayesian doesn't seem to work
Posted By: peet
Subject: Bayesian doesn't seem to work
Date Posted: 19 October 2009 at 10:55pm
I'm not sure why, but the Bayesian filter doesn't seem to filter out any e-mails.
I've lowered the threshold a little at a time over the past weeks; it's now at 13.929% and still nothing seems to get caught.

I took an e-mail's raw content, pasted it into the Bayesian Probability screen, and clicked Show Bayes Prob. On the Corpus Database tab I got:


10/19/09 19:50:56:203 -- **** R E S U L T S *********
10/19/09 19:50:56:203 -- passes Bayesian filter - 0% spam
10/19/09 19:51:23:968 -- **** R E S U L T S *********
10/19/09 19:51:23:968 -- passes Bayesian filter - 0% spam
10/19/09 19:51:49:781 -- **** R E S U L T S *********
10/19/09 19:51:49:781 -- passes Bayesian filter - 0% spam

I have the "learn new incoming" option enabled.
The folder \SpamFilter\corpus contains:
db.dat at 77 MB
db.dat.prb at 61 MB

The Corpus.ini file says (though I'm not sure what it means):
[Messages]
Spam=503515
Good=1452

Any thoughts?



Replies:
Posted By: LogSat
Date Posted: 20 October 2009 at 8:45pm
peet,

The Bayesian filter starts blocking emails only after 5000 valid emails and 5000 spam emails have been received, because it needs enough initial data on the incoming traffic to make accurate predictions about future messages. In your case only 1452 good emails have been received, so the Bayesian filter is still in "learning" mode, analyzing the incoming traffic without stopping any of it.
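As a rough illustration of the rule above, the counts in Corpus.ini can be checked against the threshold. This is a minimal sketch, not SpamFilter's own code: the function name `bayes_is_active` and the `min_emails` parameter are made up for the example, and it assumes the threshold applies to both counts with a simple greater-or-equal comparison.

```python
import configparser


def bayes_is_active(corpus_ini_text: str, min_emails: int = 5000) -> bool:
    """Return True when both counts reach the threshold (assumed rule)."""
    parser = configparser.ConfigParser()
    parser.read_string(corpus_ini_text)
    spam = parser.getint("Messages", "Spam")
    good = parser.getint("Messages", "Good")
    # Assumption: the filter stays in learning mode until BOTH counts are reached.
    return spam >= min_emails and good >= min_emails


# Counts taken from the Corpus.ini quoted in the first post.
sample = "[Messages]\nSpam=503515\nGood=1452\n"
print(bayes_is_active(sample))  # prints False: only 1452 good emails so far
```

With the numbers from the first post, the spam count is far above 5000 but the good count is not, which matches the "learning" mode described above.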

Also please note that, once primed, this filter is very selective. Most emails will score either 0.001% (clean) or 99.99% (spam), so you will see very clear-cut probabilities that an email is or is not spam. In addition, this filter is applied last, after all the other filters, so most of the spam will already have been caught and there will be very little left for it to stop. Compared to the other filters, the Bayesian filter often blocks less than 0.1% to 1% of the spam, since most of it has already been blocked.


-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: peet
Date Posted: 20 October 2009 at 9:04pm
Roberto,
Thanks!

Can this be expedited, for example by reducing the 5000 to 2000?

Also, will the Bayesian filter add the spam probability (%) to the e-mail headers?
In the Web quarantine review, I'd like to show, based on some header data, whether an e-mail has a low, medium, or high probability of being spam.


Posted By: LogSat
Date Posted: 20 October 2009 at 9:13pm
Yes, this can be changed, but we don't recommend it, as it may cause inaccurate results. If you still wish to proceed, look for the setting:

MinEmailsForBayesKickIn=5000

in the SpamFilter.ini file. There's no need to restart SpamFilter after the change.
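For anyone who prefers to script the change, here is a minimal sketch that rewrites the value in the text of SpamFilter.ini. It is only an illustration: the helper name `set_kick_in` is made up, the thread does not say which section of the file holds the setting (so the sketch searches line by line instead of assuming one), and reading and writing the actual file is left out.

```python
import re

KEY = "MinEmailsForBayesKickIn"


def set_kick_in(ini_text: str, new_value: int) -> str:
    """Return ini_text with the numeric value of KEY replaced by new_value."""
    # Match the key at the start of a line, whichever section it lives in.
    pattern = re.compile(rf"^({re.escape(KEY)}[ \t]*=[ \t]*)\d+", re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + str(new_value), ini_text)
    if count != 1:
        raise ValueError(f"expected exactly one {KEY} line, found {count}")
    return updated


# The sample text contains only the line quoted in this thread.
print(set_kick_in("MinEmailsForBayesKickIn=5000\n", 2000))
```

Keep a backup copy of the file before writing the result back, since lowering the value may cause inaccurate results, as noted above.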

The Bayesian filter does not log its value in the headers. As I mentioned earlier, it is only used on less than 1% of the incoming emails: more than 99% of the email will already have been blocked before the Bayesian filter has a chance to look at it, so any stats it would compute for the remaining 1% would not be very useful.


-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


