Bayesian Filter questions and problems
Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5489
Printed Date: 05 February 2025 at 9:51am
Topic: Bayesian Filter questions and problems
Posted By: robj
Subject: Bayesian Filter questions and problems
Date Posted: 06 February 2006 at 2:06pm
How long does this take to "learn" our emails? It kicked in on Friday, and since then I've been releasing what seems like every email from the quarantine. Once I release one, it does seem to learn, mostly. But a first-time email to a user is always held. It's running me ragged. Any help?
I understand I can disable it by setting the value to 0. I don't want to do that, but I want to make sure I'm doing everything correctly.
I have also created a white-to.txt with a list of only the valid email addresses on our system, to try to cut out a lot of unwanted junk. Is this advisable?
Thanks for the help,
Rob
|
Replies:
Posted By: Guests
Date Posted: 06 February 2006 at 8:51pm
I would like some more info on this as well. I recently purchased the software and am currently testing it on one domain that receives quite a bit of spam. So far the system has taken in 10,000 emails, and only 1,800 of them have been passed.
I have noticed that the Bayesian filter still shows everything at 0% and passes email that is clearly spam.
Any help? Or maybe I just have something set up incorrectly.
|
Posted By: LogSat
Date Posted: 06 February 2006 at 9:07pm
Jason, Robj,
The Bayesian filter kicks in after SpamFilter has received and processed 5,000 good emails and 5,000 spam emails. Before those limits are reached, SpamFilter will only build its internal statistical database.
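A minimal Python sketch of that warm-up rule, assuming only the two 5,000-message thresholds stated above; the class and method names are hypothetical and do not describe SpamFilter's internals. It illustrates why a domain that receives mostly spam can stay in learning mode for a long time: the good-mail count gates activation just as much as the spam count. If, say, only 1,800 of 10,000 messages were passed and those were all good mail, the good-mail threshold alone would not have been reached yet.

```python
from dataclasses import dataclass

# Thresholds taken from the figures quoted above; everything else is illustrative.
MIN_GOOD = 5_000
MIN_SPAM = 5_000


@dataclass
class CorpusStats:
    good_seen: int = 0
    spam_seen: int = 0

    def record(self, is_spam: bool) -> None:
        """Count one processed message toward the training totals."""
        if is_spam:
            self.spam_seen += 1
        else:
            self.good_seen += 1

    def filter_active(self) -> bool:
        """The statistical filter only votes once BOTH thresholds are met."""
        return self.good_seen >= MIN_GOOD and self.spam_seen >= MIN_SPAM


stats = CorpusStats()
for _ in range(9_000):
    stats.record(is_spam=True)
for _ in range(4_999):
    stats.record(is_spam=False)
print(stats.filter_active())  # False: still one good message short
stats.record(is_spam=False)
print(stats.filter_active())  # True
```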
Please note that the statistical analysis only occurs after all other filters have failed to catch spam, so even once the Bayesian filter becomes active, the number of emails it blocks will be very small compared to the other filters.
As a comparison, the following shows the current number of emails blocked by the various filters on our own SpamFilter installation. You'll see that the Bayesian filter has a very low count, but that is normal: it simply means that all the other filters combined let only 107 spam emails slip through for it to catch.
Count   Reason
64937   IP found in MAPS search
16230   IP address is from a blacklisted country
15726   SPF Sender Policy Framework match
12604   Exceeded maximum number of RCPT TO
 9693   Invalid sender domain MX record
 6434   URL in email found in SURBL search
 3399   Keywords found in content
  588   Mail From and Mail To domains are equal
  486   IP blocked by honeypot entry
  395   Virus Found in email
  107   Statistical filter match
    9   Mail From and Mail To are equal
    1   Domain is in local blacklist file
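To illustrate why the Bayesian stage sees so little traffic, here is a hypothetical Python sketch of a filter chain in which the statistical check runs last and only on messages every earlier check let through. The check functions, the message fields, and the 0.9 score cutoff are all made up for the example; only the rejection-reason strings are borrowed from SpamFilter's report.

```python
from typing import Callable, Optional

# Each check returns a rejection reason, or None if it has no objection.
Check = Callable[[dict], Optional[str]]


def run_filters(message: dict, checks: list[Check], bayes: Check) -> str:
    """Run the earlier checks first; only survivors reach the Bayesian stage."""
    for check in checks:
        reason = check(message)
        if reason is not None:
            return reason  # caught early: the statistical filter never sees it
    reason = bayes(message)
    return reason if reason is not None else "delivered"


# Hypothetical stand-ins for two of the real filters.
def blacklisted_ip(msg: dict) -> Optional[str]:
    return "IP found in MAPS search" if msg.get("ip_listed") else None


def surbl_url(msg: dict) -> Optional[str]:
    return "URL in email found in SURBL search" if msg.get("bad_url") else None


def bayes(msg: dict) -> Optional[str]:
    # Hypothetical cutoff: treat a spam probability of 0.9 or more as a match.
    return "Statistical filter match" if msg.get("spam_score", 0.0) >= 0.9 else None


checks = [blacklisted_ip, surbl_url]
print(run_filters({"ip_listed": True, "spam_score": 1.0}, checks, bayes))  # IP found in MAPS search
print(run_filters({"spam_score": 1.0}, checks, bayes))  # Statistical filter match
print(run_filters({"spam_score": 0.1}, checks, bayes))  # delivered
```

Even a message the Bayesian stage would score at 100% is credited to the blacklist check when that check fires first, which is why the statistical counter stays small.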
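Summing the table makes the point concrete. A short Python check of the arithmetic, using only the counts quoted above (a snapshot of one installation), shows the statistical filter accounting for well under a tenth of a percent of all blocked messages:

```python
# Blocked-message counts copied from the table above, keyed by reason.
blocked = {
    "IP found in MAPS search": 64937,
    "IP address is from a blacklisted country": 16230,
    "SPF Sender Policy Framework match": 15726,
    "Exceeded maximum number of RCPT TO": 12604,
    "Invalid sender domain MX record": 9693,
    "URL in email found in SURBL search": 6434,
    "Keywords found in content": 3399,
    "Mail From and Mail To domains are equal": 588,
    "IP blocked by honeypot entry": 486,
    "Virus Found in email": 395,
    "Statistical filter match": 107,
    "Mail From and Mail To are equal": 9,
    "Domain is in local blacklist file": 1,
}

total = sum(blocked.values())
bayes_share = blocked["Statistical filter match"] / total

print(total)                 # 130609
print(f"{bayes_share:.3%}")  # 0.082%
```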
------------- Roberto Franceschetti
LogSat Software - http://www.logsat.com
Spam Filter ISP - http://www.logsat.com/sfi-spam-filter.asp
|
Posted By: Guests
Date Posted: 06 February 2006 at 9:21pm
But my problem is that everything is coming in at 100%; tonight I'm still pulling out good emails. I can get some counts tomorrow (I'm at home now), but my quarantine DB is 175 MB (it starts at 3800 and ends at about 18000), and when I sort on the reject reason, over half are "statistical 100% spam". So that's several thousand. My counts are:
connections 82,000; forwarded 9,300; blocked 44,700; attempts 40,500.
This was an upgrade from a 2+ year old V1.0. What am I missing? I guess I'd rather have it catching less than more, because of all the calls I'm getting.
Thanks, Rob
|
Posted By: Guests
Date Posted: 06 February 2006 at 9:23pm
Posted By: LogSat
Date Posted: 06 February 2006 at 10:33pm
Robj,
It may be better to start with a fresh/clean statistical corpus database in case the one you have became corrupted.
To do so, please stop SpamFilter, delete or rename the SpamFilter\corpus directory, and then restart SpamFilter.
Please note that the Bayesian filter will again need to process the initial 5,000 good and 5,000 spam emails.
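The reset above is a manual procedure. As an illustration only, the rename step could be scripted in Python as below. The function name and the backup naming scheme are hypothetical; the script assumes SpamFilter has already been stopped and does not stop or restart the service itself. Renaming rather than deleting keeps the old corpus available, for example to zip up and send to support.

```python
from datetime import datetime
from pathlib import Path


def archive_corpus(spamfilter_dir: Path) -> Path:
    """Rename <spamfilter_dir>/corpus to corpus.bak-YYYYmmdd-HHMMSS and return the new path.

    Assumes SpamFilter is already stopped. Restarting SpamFilter afterwards
    lets it build a fresh corpus from scratch.
    """
    corpus = spamfilter_dir / "corpus"
    if not corpus.is_dir():
        raise FileNotFoundError(f"no corpus directory at {corpus}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = corpus.with_name(f"corpus.bak-{stamp}")
    corpus.rename(backup)  # old token data is kept, just moved aside
    return backup
```

Usage would be along the lines of `archive_corpus(Path(r"C:\Program Files\SpamFilter"))`, with that install path (a guess here) adjusted to wherever SpamFilter lives on your server.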
|
Posted By: robj
Date Posted: 07 February 2006 at 9:22am
OK, I stopped and reset it this morning. The usual junk is flowing again. My Bayes number on the pie chart was 6600; I'm not sure how to reset that, but I did reset the main counters. Emails showing a 0% spam match are getting through.
Any interest in checking out the corpus DB? It was full; lots of the tokens had the same number, and many were different.
Rob
|
Posted By: LogSat
Date Posted: 07 February 2006 at 4:17pm
Sure, go ahead and zip up the whole corpus subdirectory and send it to us at support at logsat dot com.
|
Posted By: Lee
Date Posted: 01 March 2006 at 11:39pm
Roberto, I noticed that my Bayesian filter has all of a sudden started stopping emails from friends that I could receive only a few days ago.
It's possible that there is a problem with my corpus database, but is there any way to repair it without losing all of the tokens it has collected over such a long period of time?
Lee
|
Posted By: LogSat
Date Posted: 02 March 2006 at 8:35pm
Lee,
You can't directly modify the corpus database, but you can cheat... If you have the original, unmodified source of the emails you received from them, you could forward them so that SpamFilter processes them again, making sure that when you re-send them you are whitelisting them. This way the Bayesian filter will "learn" that they are good emails and will adapt. You should also make sure you force delivery of the good quarantined emails, as this causes SF to "undo" the entries it added to the Bayesian database, and additionally updates it to "heavily mark" those tokens as good for the future.
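A toy Python model of the "undo, then heavily mark as good" behaviour described above. It is a generic token-count store, not SpamFilter's corpus format; the class, its methods, and the `good_weight=2` default standing in for "heavily mark" are all hypothetical, since the actual weighting is not stated.

```python
from collections import Counter


class TokenCorpus:
    """Toy token store: how often each token was seen in good vs. spam mail."""

    def __init__(self) -> None:
        self.good: Counter = Counter()
        self.spam: Counter = Counter()

    def learn(self, tokens: set, is_spam: bool, weight: int = 1) -> None:
        target = self.spam if is_spam else self.good
        for tok in tokens:
            target[tok] += weight

    def unlearn(self, tokens: set, is_spam: bool) -> None:
        """Reverse one earlier learn() of these tokens."""
        target = self.spam if is_spam else self.good
        for tok in tokens:
            target[tok] -= 1
            if target[tok] <= 0:
                del target[tok]

    def force_deliver(self, tokens: set, good_weight: int = 2) -> None:
        """Undo the spam training for a released message, then credit it as good."""
        self.unlearn(tokens, is_spam=True)
        self.learn(tokens, is_spam=False, weight=good_weight)

    def spam_ratio(self, token: str) -> float:
        """Fraction of sightings that were spam; 0.5 for a token never seen."""
        s, g = self.spam[token], self.good[token]
        return s / (s + g) if s + g else 0.5


corpus = TokenCorpus()
msg = {"club", "mom", "info"}
corpus.learn(msg, is_spam=True)  # message was wrongly scored as spam
corpus.force_deliver(msg)        # admin releases it from quarantine
print(corpus.spam_ratio("club"))  # 0.0
```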
|
Posted By: Guests
Date Posted: 03 March 2006 at 9:16am
Well, since I corrected the cache blacklist issue, everything has been working fine. But my Bayes filter hasn't caught a single email.
Rob
|
Posted By: LogSat
Date Posted: 03 March 2006 at 4:12pm
Have you upgraded to the latest version of SpamFilter? One of the most visible improvements in the new version is the greater effectiveness of the Bayesian filter. Its spam catch rate has, in some cases, increased 100-fold.
This is the release note that applies:
// New to VersionNumber = '2.7.1.526';
{TODO -cNew : Added DoNotStartWithoutAV option in SpamFilter.ini file to prevent SpamFilter from running unless the antivirus is working}
{TODO -cFix : Greatly improved Bayesian filter accuracy}
|
Posted By: Guests
Date Posted: 06 March 2006 at 11:30am
Per Dan's suggestion, I've created a new corpus directory. Meanwhile, some of the domains that are failing MX checks are elon.edu, aapa.org, and gci.net. Sorry, I don't have the full headers at the moment. I went ahead and forced the messages through.
|
Posted By: Guests
Date Posted: 07 March 2006 at 8:13am
Some more potential false positives ...
Received: from 205.188.139.137 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 08:01:11 -0500
Received: from [...] by imo-d23.mx.aol.com (mail_out_v38_r7.3.) id 3.214.1439293c (3657) for <...>; Tue, 7 Mar 2006 08:01:07 -0500 (EST)
From: [...]
Message-ID: <...@aol.com>
Date: Tue, 7 Mar 2006 08:01:07 EST
Subject: Club Mom info from [...]
To: [...]
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="part1_214.1439293c.313ede13_boundary"
X-Mailer: 9.0 SE for Windows sub 5021
X-Spam-Flag: NO
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...>
X-SF-HELO-Domain: imo-d23.mx.aol.com
|
Posted By: Guests
Date Posted: 07 March 2006 at 8:18am
And another... personal details removed again to keep the bots from picking them up.
Received: from 204.127.192.82 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Mon, 6 Mar 2006 19:34:08 -0500
Received: from mack ([node].hsd1.va.comcast.net[69.143.209.237]) by comcast.net (rwcrmhc12) with SMTP id <20060307003406m12001ichve>; Tue, 7 Mar 2006 00:34:07 +0000
Message-ID: <000a01c6417e$9c3c9aa0$6401a8c0@mack>
Reply-To: "..." <...@comcast.net>
From: "..." <...@comcast.net>
To: <...>
Subject: Simpsons video
Date: Mon, 6 Mar 2006 19:32:26 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0007_01C64154.B2EC5990"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2741.2600
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2742.200
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...@comcast.net>
X-SF-HELO-Domain: rwcrmhc12.comcast.net
|
Posted By: Guests
Date Posted: 07 March 2006 at 2:06pm
More: several legit emails from Amazon.com are getting caught. Here's an example.
Received: from 207.171.160.42 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:33:57 -0500
Received: from na-rte-app-5102.iad5.amazon.com ([10.216.250.37]) by mm-notify-out-2103.amazon.com with ESMTP; 07 Mar 2006 10:32:55 -0800
Received: by na-rte-app-5102.iad5.amazon.com id AAA-notification-29959,8591; 7 Mar 2006 10:32:37 -0800
Date: 7 Mar 2006 10:32:37 -0800
Message-ID: <...@na-rte-app-5102.iad5.amazon.com>
X-AMAZON-TRACK: notification
To: ...@clator.com
From: "Amazon.com Payments" <gameowner@msn.com>
Subject: Your Amazon Marketplace Purchase
Cc: payments-mail@amazon.com
Bounces-to: ...@bounces.amazon.com
Content-Type: text/plain
MIME-Version: 1.0
X-AMAZON-MAIL-RELAY-TYPE: notification
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...@bounces.amazon.com>
X-SF-HELO-Domain: mm-notify-out-2103.amazon.com
|
Posted By: Guests
Date Posted: 07 March 2006 at 2:08pm
a legit one from turner.com ...
Received: from 64.236.240.147 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:52:01 -0500
Received: from CNNCIMSS05.turner.com (cnncimss05.turner.com [10.188.171.204]) by smtpgw2.turner.com (8.12.10/8.12.11) with ESMTP id k27Ipw4c020541 for <...@clator.com>; Tue, 7 Mar 2006 13:51:59 -0500 (EST)
Received: from ATLBH01.turner.com ([10.188.157.231]) by CNNCIMSS05.turner.com with InterScan Messaging Security Suite; Tue, 07 Mar 2006 13:51:58 -0500
Received: from ATLPF02.turner.com ([10.188.156.206]) by ATLBH01.turner.com with Microsoft SMTPSVC(6.0.3790.211); Tue, 7 Mar 2006 13:51:58 -0500
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C64218.363A93AC"
X-MimeOLE: Produced By Microsoft Exchange V6.5
Subject: Top Stories
Date: Tue, 7 Mar 2006 13:51:58 -0500
Message-ID: <BA010952EAB04749A22AE36AB1A1037C0265C3FF@ATLPF02.turner.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: General Comments
Thread-Index: AcZCGDY1/nirXhlcS1SlO6wZI18H7gAAAAAK
From: "News in General" <Topstories4@turner.com>
To: <...@clator.com>
X-OriginalArrivalTime: 07 Mar 2006 18:51:58.0531 (UTC) FILETIME=[365C6130:01C64218]
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <Topstories4@turner.com>
X-SF-HELO-Domain: smtpgw2.turner.com
|
Posted By: Guests
Date Posted: 07 March 2006 at 2:09pm
and a legit one from ebay
Received: from 66.135.209.211 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 09:32:18 -0500
Received: from sjcbat03.sjc.ebay.com (sjcbat03.sjc.ebay.com [10.6.37.42]) by mx33.smf.ebay.com (8.13.5/8.13.5) with ESMTP id k27EWGwX002986 for <...@clator.com>; Tue, 7 Mar 2006 06:32:17 -0800
DomainKey-Signature: a=rsa-sha1; s=dk; d=ebay.com; c=nofws; q=dns; h=x-ebay-mailtracker:to:from:mime-version:content-type :subject:date:message-id; b=kjHoXTu0OJkocdhD7jpQb8TeK6un9jhK4UxfXbeZjRj+AAUds5rm Q29+lSiLB0UDc KnHdvzITNMFJjcx/TXXMQIO3w3Xhzz4xaquXIxz4hfiGPb+hku9yyY oH5LH0L+nCW7L wXWkFETSXy0yraUwJz996/G35Bg3ywRHDUq1bCo=
X-eBay-MailTracker: 10102.425.0.0
To: ...@clator.com
From: favoritesellers@ebay.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=3635219.1141741916156.JavaMail.ebba.sjcbat03
Subject: Your eBay Favorite Seller's New Items!
Date: Tue, 7 Mar 2006 06:31:56 PST
Message-ID: <2904989.1141741916198.JavaMail.ebba@sjcbat03>
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <favoritesellers@ebay.com>
X-SF-HELO-Domain: mx33.smf.ebay.com
|
Posted By: Guests
Date Posted: 07 March 2006 at 2:13pm
And lastly, three of these were caught. All of the above arrived in just the past six hours and were addressed only to me (not being an actual ISP, it's just me and the missus using the clator.com domain).
Hopefully these will point to some issues. In case it hasn't been said, thanks for any help you can provide.
Received: from 68.230.240.34 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:56:40 -0500
Received: from willowoffice ([70.187.202.109]) by eastrmmtao05.cox.net (InterMail vM.6.01.05.02 201-2131-123-102-20050715) with ESMTP id <...eastrmmtao05.cox.net@willowoffice> for <...@clator.com>; Tue, 7 Mar 2006 13:54:21 -0500
From: "..." <...@willowtreemedia.com>
To: <...@clator.com>
Subject: FW: BIOS and Photos for Fertility C.A.R.E
Date: Tue, 7 Mar 2006 13:50:19 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0033_01C641EE.12D79CE0"
X-Mailer: Microsoft Office Outlook, Build 11.0.6353
Thread-Index: AcYQqJIy9GiFULfASdW33ufOnhb1nQxboDEg
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1506
Message-Id: <....eastrmmtao05.cox.net@willowoffice>
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...@willowtreemedia.com>
X-SF-HELO-Domain: eastrmmtao05.cox.net
|
Posted By: Guests
Date Posted: 07 March 2006 at 2:15pm
P.S. Humble apologies. I posted these in the wrong thread; I meant them to go into the MX filter thread. I'll try to move them.
|