Bayesian Filter questions and problems
robj
Newbie Joined: 06 February 2006 Location: United States Status: Offline Points: 3 |
Posted: 06 February 2006 at 2:06pm |
How long does this take to "learn"? I understand I can disable it by setting the value to 0. I don't want to do that, but I want to make sure I'm doing everything correctly. I have also created a white-to.txt with a list of only the valid email addresses in our system to try to cut out a lot of unwanted junk. Is this advisable? Thanks for the help, Rob
|
Jason
Guest Group |
I would like some more info on this as well. I recently purchased the software and am currently testing it on 1 domain that receives quite a bit of spam. So far the system has taken 10,000 emails and only 1,800 of them have been passed.
I have noticed that the Bayesian Filter still shows that everything is at 0% and passes email that is clearly spam. Any help would be appreciated, or maybe I just have something set up incorrectly.
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Jason, Robj,
The Bayesian filter kicks in after SpamFilter has received and processed 5,000 good emails and 5,000 spam emails. Before those limits are reached, SpamFilter will only build its internal statistical database. Please note that the statistical analysis only occurs after all other filters have failed to catch spam, so even once the Bayesian filter becomes active, the number of emails it blocks will be very small compared to the others. As a comparison, the following shows the current number of emails blocked by the various filters on our own SpamFilter installation. You'll see that the Bayesian filter has a very low count, but that is normal: it simply means that all the other filters combined allowed 107 emails to slip through...

64937  IP found in MAPS search
16230  IP address is from a blacklisted country
15726  SPF Sender Policy Framework match
12604  Exceeded maximum number of RCPT TO
 9693  Invalid sender domain MX record
 6434  URL in email found in SURBL search
 3399  Keywords found in content
  588  Mail From and Mail To domains are equal
  486  IP blocked by honeypot entry
  395  Virus Found in email
  107  Statistical filter match
    9  Mail From and Mail To are equal
    1  Domain is in local blacklist file
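
For readers who want to check their own numbers against the rule described in this post, here is a minimal Python sketch. It is not SpamFilter's code. The class and function names are made up for illustration, and the only facts taken from the post are the 5,000/5,000 training thresholds and the block counts listed above.

```python
from dataclasses import dataclass
from typing import Dict

# Thresholds quoted in the post; the rest of this sketch is hypothetical.
MIN_GOOD = 5_000
MIN_SPAM = 5_000

# Block counts copied from the post (reason -> number of emails blocked).
BLOCK_COUNTS: Dict[str, int] = {
    "IP found in MAPS search": 64937,
    "IP address is from a blacklisted country": 16230,
    "SPF Sender Policy Framework match": 15726,
    "Exceeded maximum number of RCPT TO": 12604,
    "Invalid sender domain MX record": 9693,
    "URL in email found in SURBL search": 6434,
    "Keywords found in content": 3399,
    "Mail From and Mail To domains are equal": 588,
    "IP blocked by honeypot entry": 486,
    "Virus Found in email": 395,
    "Statistical filter match": 107,
    "Mail From and Mail To are equal": 9,
    "Domain is in local blacklist file": 1,
}


@dataclass
class TrainingCounters:
    """Hypothetical tally of emails seen while the corpus is being built."""
    good: int = 0
    spam: int = 0

    def record(self, is_spam: bool) -> None:
        # Count one processed email in the matching bucket.
        if is_spam:
            self.spam += 1
        else:
            self.good += 1

    def bayesian_active(self) -> bool:
        # Both thresholds must be reached, not just one of them.
        return self.good >= MIN_GOOD and self.spam >= MIN_SPAM


def share_of_blocks(counts: Dict[str, int], reason: str) -> float:
    """Fraction of all blocked emails attributed to one reason."""
    return counts[reason] / sum(counts.values())


if __name__ == "__main__":
    counters = TrainingCounters(good=9300, spam=4999)
    print("Bayesian active:", counters.bayesian_active())
    share = share_of_blocks(BLOCK_COUNTS, "Statistical filter match")
    print(f"Statistical filter share of blocks: {share:.3%}")
```

With the counts above, the statistical filter accounts for 107 of 130,609 blocked emails, or roughly 0.08%, which is why a low Bayesian count on its own is not a sign of a problem.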
|
robj
Guest Group |
But my problem is that everything is coming in at 100%; tonight I'm still pulling out good emails. I can get some counts tomorrow (I'm at home now), but my quarantine DB is 175 MB (starts at 3800, ends at about 18000). When I sort on the reject, over half are the statistical 100% spam, so that's several thousand. My counts are: connections 82000, forwarded 9300, blocked 44700, attempts 40500. This was an upgrade from a 2+ year old V 1.0. What am I missing? I guess I'd rather have it working less than more because of all the calls I'm getting. Thanks, Rob
|
RobJ
Guest Group |
Oh, good emails are 9300.
|
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Robj,
It may be better to start with a fresh/clean statistical corpus database in case the one you have has become corrupted. To do so, please stop SpamFilter, delete or rename the SpamFilter\corpus directory, and then restart SpamFilter. Please note that the Bayesian filter will again need to process the initial 5,000 good and 5,000 spam emails.
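
If you would rather rename than delete, so the old corpus can still be inspected later, the rename step can be sketched in Python as below. This is only an illustration. The function name and the backup naming scheme are made up, the install directory is whatever yours happens to be, and the sketch assumes SpamFilter has already been stopped by hand (it does not try to stop or start the service).

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def set_aside_corpus(install_dir: Path) -> Optional[Path]:
    """Rename <install_dir>/corpus to corpus.bak-<timestamp>.

    Returns the backup path, or None if no corpus directory exists.
    Assumption: SpamFilter is stopped before this is called.
    """
    corpus = install_dir / "corpus"
    if not corpus.is_dir():
        return None
    # Timestamped name so repeated resets do not overwrite older backups.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = install_dir / f"corpus.bak-{stamp}"
    corpus.rename(backup)
    return backup
```

Call it with the path to your own SpamFilter folder, then restart SpamFilter. Since the corpus starts out empty after a reset, the 5,000 good and 5,000 spam training emails have to be processed again before the filter acts.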
|
robj
Newbie Joined: 06 February 2006 Location: United States Status: Offline Points: 3 |
OK, stopped and reset this morning. The usual junk is flowing again. My Bayes number on the pie chart was 6600. I'm not sure how to reset this, but I did reset the main counters. Stuff showing a 0% spam match is getting through. Any interest in checking out the corpus DB? It was full; lots of the tokens had the same number, many were different. Rob
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Sure, go ahead and zip the whole corpus subdirectory and send it to us at support at logsat dot com.
|
|
Lee
Groupie Joined: 04 February 2005 Location: United States Status: Offline Points: 50 |
Roberto, I noticed that all of a sudden my Bayesian filter is stopping emails from friends that I could receive only a few days ago. It is possible that there is a problem with my corpus database, but is there any way to repair it without losing all of the tokens it has collected over such a long period of time? Lee
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Lee,
You can't directly modify the corpus database, but you can cheat... If you have the original, unmodified source of the emails that you received from them, you could forward them so that SpamFilter processes them again, making sure the senders are whitelisted when you re-send them. This way the Bayesian filter will "learn" that they are good emails and will adapt. You should also make sure you force delivery of the good quarantined emails, as this will cause SF to "undo" the entries it added to the Bayesian database, and will additionally update it to "heavily mark" those tokens as good for the future.
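
To picture what "undo" and "heavily mark" could mean at the token level, here is a toy Python sketch. It is not SpamFilter's implementation and says nothing about its corpus format. The class, the `boost` weight of 2, and the simple per-token spam ratio are all assumptions chosen to keep the example short.

```python
from collections import Counter
from typing import Iterable


class ToyCorpus:
    """Illustrative token tallies; not the real corpus database."""

    def __init__(self) -> None:
        self.good: Counter = Counter()
        self.spam: Counter = Counter()

    def train(self, tokens: Iterable[str], is_spam: bool, weight: int = 1) -> None:
        # Count each distinct token once per email, scaled by weight.
        target = self.spam if is_spam else self.good
        for tok in set(tokens):
            target[tok] += weight

    def untrain_spam(self, tokens: Iterable[str]) -> None:
        # Reverse an earlier "spam" training pass, never going below zero.
        for tok in set(tokens):
            if self.spam[tok] > 0:
                self.spam[tok] -= 1

    def force_deliver(self, tokens: Iterable[str], boost: int = 2) -> None:
        # Undo the spam entries, then mark the tokens as good with extra
        # weight (the boost value is a made-up illustration).
        self.untrain_spam(tokens)
        self.train(tokens, is_spam=False, weight=boost)

    def spam_ratio(self, token: str) -> float:
        # Share of sightings that were spam; 0.5 for unseen tokens.
        g, s = self.good[token], self.spam[token]
        return s / (g + s) if (g + s) else 0.5
```

In this toy model, an email wrongly learned as spam pushes its tokens toward a ratio of 1.0, and force-delivering the same email pulls them back toward 0.0, which is the kind of correction described in the post.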
|
Robj
Guest Group |
Well, since I corrected the cache blacklist thing I've been working fine. But my Bayes filter hasn't caught a single email.
Rob |
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Have you upgraded to the latest version of SpamFilter? One of the most visible improvements in the new version is the greater effectiveness of the Bayesian filter. Its spam catch rate has, in some cases, increased 100-fold.
This is the release note that applies:
// New to VersionNumber = '2.7.1.526';
{TODO -cNew : Added DoNotStartWithoutAV option in SpamFilter.ini file to prevent SpamFilter from running unless the antivirus is working}
{TODO -cFix : Greatly improved Bayesian filter accuracy}
|
Clator
Guest Group |
Per Dan's suggestion, I've created a new corpus directory. Meanwhile, some of the domains that are failing MX checks are elon.edu, aapa.org, and gci.net. Sorry I don't have the full headers at the moment. I went ahead and forced the messages through.
|
|
Clator
Guest Group |
Post Options
Thanks(0)
|
Some more potential false positives ... Received: from 205.188.139.137 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 08:01:11 -0500 |
|
Clator
Guest Group |
Post Options
Thanks(0)
|
And another ... personal details removed again to keep the bots from picking it up. Received: from 204.127.192.82 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Mon, 6 Mar 2006 19:34:08 -0500
|
Clator
Guest Group |
Post Options
Thanks(0)
|
More: several legit mails from Amazon.com are getting caught. Here's an example. Received: from 207.171.160.42 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:33:57 -0500
|
Clator
Guest Group |
Post Options
Thanks(0)
|
a legit one from turner.com ... Received: from 64.236.240.147 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:52:01 -0500 |
|
Clator
Guest Group |
Post Options
Thanks(0)
|
and a legit one from ebay Received: from 66.135.209.211 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 09:32:18 -0500 |
|
Clator
Guest Group |
Post Options
Thanks(0)
|
And lastly, three of these were caught. All of the above postings were just in the past six hours and were only to me (not being an actual ISP, it's just me and the missus using the clator.com domain). Hopefully these will point to some issues. In case it hasn't been said, thanks for any help you can provide. Received: from 68.230.240.34 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:56:40 -0500
|
Clator
Guest Group |
P.S. Humble apologies. I posted these into the wrong thread. My intent was for them to go into the MX filter thread. I'll try to move them. |