
Bayesian Filter questions and problems

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5489
Printed Date: 05 February 2025 at 9:51am


Topic: Bayesian Filter questions and problems
Posted By: robj
Subject: Bayesian Filter questions and problems
Date Posted: 06 February 2006 at 2:06pm

How long does this take to "learn" our emails? It kicked in on Friday, and it seems I've been releasing every email from the quarantine. Once I release one, it does seem to learn, mostly. But a first-time email to a user is always held. It's running me ragged. Any help?

I understand I can disable it by setting the value to 0. I don't want to do that, but I want to make sure I'm doing everything correctly.

I have also created a white-to.txt file listing only the valid email addresses in our system, to try to cut out a lot of unwanted junk. Is this advisable?

Thanks for the help,

Rob




Replies:
Posted By: Guests
Date Posted: 06 February 2006 at 8:51pm
I would like some more info on this as well. I recently purchased the software and am currently testing it on one domain that receives quite a bit of spam. So far the system has taken in 10,000 emails and passed only 1,800 of them.

I have noticed that the Bayesian filter still shows everything at 0% and passes email that is clearly spam.

Any help? Or maybe I just have something set up incorrectly.


Posted By: LogSat
Date Posted: 06 February 2006 at 9:07pm
Jason, Robj,

The Bayesian filter kicks in after SpamFilter has received and processed 5,000 good emails and 5,000 spam emails. Before those limits are reached, SpamFilter will only build its internal statistical database.
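As a rough illustration of the warm-up rule described above, here is a minimal Python sketch. The class and constant names are hypothetical, not SpamFilter's internals; the only facts taken from the post are the two 5,000-message thresholds and that both must be reached. The example plugs in Jason's figures (1,800 passed out of 10,000), assuming for illustration that every blocked email counts toward the spam total.

```python
from dataclasses import dataclass

# Thresholds quoted in the post; the constant names are hypothetical.
GOOD_THRESHOLD = 5_000
SPAM_THRESHOLD = 5_000


@dataclass
class CorpusStats:
    """Hypothetical running totals of emails fed into the statistical database."""
    good_count: int = 0
    spam_count: int = 0

    def is_active(self) -> bool:
        # The filter only starts scoring once BOTH thresholds are met;
        # before that it just records tokens.
        return self.good_count >= GOOD_THRESHOLD and self.spam_count >= SPAM_THRESHOLD


# Jason's numbers: 1,800 passed, 8,200 blocked (assumed to count as spam).
stats = CorpusStats(good_count=1_800, spam_count=8_200)
print(stats.is_active())  # False: still short of 5,000 good emails
```

With these figures it is the good-email total, not the spam total, that keeps the filter in learning mode.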

Please note that the statistical analysis only occurs after all other filters have failed to catch spam, so even when the Bayesian filter becomes active, the number of emails it blocks will be very small compared to the other filters.
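The ordering described above behaves like a short-circuiting chain: the first filter that objects decides the reject reason, and the statistical check only ever sees what is left. The sketch below is hypothetical throughout (the message fields, the keyword rule and the 0.9 cutoff are invented for illustration); only the reason strings are borrowed from SpamFilter's reject reasons quoted in this thread.

```python
from typing import Callable, Optional

# A check returns True when it wants to block the message.
Check = Callable[[dict], bool]


def run_filters(message: dict, filters: list[tuple[str, Check]], bayes: Check) -> Optional[str]:
    """Return the reject reason of the first filter that blocks, or None to deliver."""
    for reason, check in filters:
        if check(message):
            return reason  # blocked early: the Bayesian filter never sees it
    # Only mail that survived every other filter reaches the statistical check.
    if bayes(message):
        return "Statistical filter match"
    return None


def maps_listed(msg: dict) -> bool:
    return msg.get("ip_listed", False)


def has_keyword(msg: dict) -> bool:
    return "cheap meds" in msg.get("body", "").lower()


def bayes_check(msg: dict) -> bool:
    return msg.get("spam_probability", 0.0) >= 0.9  # hypothetical cutoff


filters = [
    ("IP found in MAPS search", maps_listed),
    ("Keywords found in content", has_keyword),
]

print(run_filters({"ip_listed": True, "spam_probability": 1.0}, filters, bayes_check))
# IP found in MAPS search
print(run_filters({"body": "Hello", "spam_probability": 0.95}, filters, bayes_check))
# Statistical filter match
```

Because a message blocked by MAPS is never scored, the Bayesian count can only grow from mail the earlier filters let through.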

As a comparison, the following shows the current number of emails blocked by the various filters on our own SpamFilter installation. You'll see that the Bayesian filter has a very low count, but that is normal: it simply means that all the other filters combined allowed 107 emails to slip through...

64937  IP found in MAPS search
16230  IP address is from a blacklisted country
15726  SPF Sender Policy Framework match
12604  Exceeded maximum number of RCPT TO
 9693  Invalid sender domain MX record
 6434  URL in email found in SURBL search
 3399  Keywords found in content
  588  Mail From and Mail To domains are equal
  486  IP blocked by honeypot entry
  395  Virus Found in email
  107  Statistical filter match
    9  Mail From and Mail To are equal
    1  Domain is in local blacklist file
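To put the 107 in perspective, a quick calculation over the counts above shows how small a share of the blocked mail reaches (and is caught by) the statistical filter on that installation. The numbers are copied straight from the list; nothing else is assumed.

```python
# Blocked-email counts copied from the list above.
counts = {
    "IP found in MAPS search": 64937,
    "IP address is from a blacklisted country": 16230,
    "SPF Sender Policy Framework match": 15726,
    "Exceeded maximum number of RCPT TO": 12604,
    "Invalid sender domain MX record": 9693,
    "URL in email found in SURBL search": 6434,
    "Keywords found in content": 3399,
    "Mail From and Mail To domains are equal": 588,
    "IP blocked by honeypot entry": 486,
    "Virus Found in email": 395,
    "Statistical filter match": 107,
    "Mail From and Mail To are equal": 9,
    "Domain is in local blacklist file": 1,
}

total = sum(counts.values())
bayes_share = counts["Statistical filter match"] / total

print(total)                 # 130609
print(f"{bayes_share:.3%}")  # 0.082%
```

In other words, well under 0.1% of all blocked mail was left for the Bayesian filter to catch.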


-------------
Roberto Franceschetti

LogSat Software: http://www.logsat.com

Spam Filter ISP: http://www.logsat.com/sfi-spam-filter.asp


Posted By: Guests
Date Posted: 06 February 2006 at 9:21pm

But my problem is that everything is coming in at 100%; tonight I'm still pulling good emails out of quarantine. I can get some counts tomorrow (I'm at home now), but my quarantine DB is 175 MB (it starts at 3,800 and ends at about 18,000). When I sort by reject reason, over half are statistical filter matches at 100% spam, so that's several thousand. My counts are:

connections 82,000, forwarded 9,300, blocked 44,700, attempts 40,500.

This was an upgrade from a V1.0 that was over two years old. What am I missing? I'd rather have it block too little than too much, because of all the calls I'm getting.

Thanks, Rob



Posted By: Guests
Date Posted: 06 February 2006 at 9:23pm
Oh, and the good emails are the 9,300 forwarded.


Posted By: LogSat
Date Posted: 06 February 2006 at 10:33pm
Robj,

It may be better to start with a fresh, clean statistical corpus database in case yours has become corrupted.

To do so, can you please stop SpamFilter, then delete or rename the SpamFilter\corpus directory, and then restart SpamFilter.
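The reset above can be scripted. The following Python helper is a sketch, not a LogSat tool: it assumes SpamFilter has already been stopped and that the corpus lives in a `corpus` subdirectory of the install folder; the function name and the backup naming scheme are hypothetical. Renaming rather than deleting keeps the old corpus around, for example to send to LogSat.

```python
import datetime
from pathlib import Path


def back_up_corpus(spamfilter_dir: str) -> Path:
    """Rename <spamfilter_dir>/corpus to corpus.bak-YYYYmmdd-HHMMSS and return the new path.

    Hypothetical helper: run it only while SpamFilter is stopped, so the next
    start builds a fresh corpus directory.
    """
    corpus = Path(spamfilter_dir) / "corpus"
    if not corpus.is_dir():
        raise FileNotFoundError(f"No corpus directory at {corpus}")
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = corpus.with_name(f"corpus.bak-{stamp}")
    # On Windows this will typically fail if SpamFilter still holds files open.
    corpus.rename(backup)
    return backup


# Usage (install path is an assumption; adjust to your setup):
# back_up_corpus(r"C:\Program Files\SpamFilter")
```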

Please note that the Bayesian filter will again need to process the initial 5,000 good and 5,000 spam emails.


-------------
Roberto Franceschetti



Posted By: robj
Date Posted: 07 February 2006 at 9:22am

OK, I stopped SpamFilter and reset the corpus this morning. The usual junk is flowing again. My Bayes count on the pie chart was 6,600; I'm not sure how to reset that, but I did reset the main counters. Messages showing a 0% spam match are getting through.

Any interest in checking out the corpus DB? It was full; lots of the tokens had the same number, but many were different.

Rob



Posted By: LogSat
Date Posted: 07 February 2006 at 4:17pm
Sure, go ahead and zip us the whole corpus subdirectory at support at logsat dot com.
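One way to produce that zip with Python's standard library is sketched below. The function name and paths are hypothetical; it only assumes the corpus is the `corpus` subdirectory mentioned earlier in the thread. Stopping SpamFilter first is probably safest so files are not changing while they are archived, though the post does not require it.

```python
import shutil
from pathlib import Path


def zip_corpus(spamfilter_dir: str, out_base: str) -> str:
    """Zip <spamfilter_dir>/corpus into <out_base>.zip and return the archive path."""
    if not (Path(spamfilter_dir) / "corpus").is_dir():
        raise FileNotFoundError(f"No corpus directory under {spamfilter_dir}")
    # root_dir/base_dir keep archive paths relative, so the zip unpacks to a
    # single top-level "corpus" folder; make_archive appends ".zip" itself.
    return shutil.make_archive(out_base, "zip", root_dir=spamfilter_dir, base_dir="corpus")


# Usage (both paths are assumptions):
# zip_corpus(r"C:\Program Files\SpamFilter", r"C:\Temp\corpus-for-support")
```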

-------------
Roberto Franceschetti



Posted By: Lee
Date Posted: 01 March 2006 at 11:39pm

Roberto, I noticed that my Bayesian filter has suddenly started stopping emails from friends that I could receive just a few days ago.

It's possible that there is a problem with my corpus database, but is there any way to repair it without losing all of the tokens it has collected over such a long period of time?

Lee



Posted By: LogSat
Date Posted: 02 March 2006 at 8:35pm
Lee,

You can't directly modify the corpus database, but you can cheat... If you have the original, unmodified source of the emails you received from them, you could forward them so that SpamFilter processes them again, making sure you whitelist them when you re-send them. This way the Bayesian filter will "learn" that they are good emails and will adapt. You should also make sure you force delivery of the good quarantined emails, as this causes SF to "undo" the entries it added to the Bayesian database and additionally updates it to "heavily mark" those tokens as good for the future.
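To make the "undo, then heavily mark as good" behaviour concrete, here is a toy Python model of a token database. It is a hypothetical sketch, not SpamFilter's implementation: the tokenizer, the simple spam ratio (not necessarily the formula SpamFilter uses) and the `RELEASE_GOOD_WEIGHT` multiplier are all assumptions, since the post does not say how heavily released tokens are weighted. Re-sending original good mail corresponds to a plain `learn(..., is_spam=False)`; force delivery corresponds to `force_deliver`.

```python
import re
from collections import Counter

# Hypothetical weight: the post says released mail "heavily" marks tokens as
# good, but the actual multiplier SpamFilter uses is not stated.
RELEASE_GOOD_WEIGHT = 3


def tokenize(text: str) -> set[str]:
    """Very simple tokenizer: lowercase words, each counted once per message."""
    return set(re.findall(r"[a-z0-9'$-]+", text.lower()))


class Corpus:
    """Toy good/spam token counts standing in for the statistical database."""

    def __init__(self) -> None:
        self.good = Counter()
        self.spam = Counter()

    def learn(self, text: str, is_spam: bool, weight: int = 1) -> None:
        target = self.spam if is_spam else self.good
        for token in tokenize(text):
            target[token] += weight

    def force_deliver(self, text: str) -> None:
        """Undo the earlier spam training, then mark the tokens heavily as good."""
        self.spam.subtract(tokenize(text))
        self.spam = +self.spam  # unary + drops counts that fell to zero or below
        self.learn(text, is_spam=False, weight=RELEASE_GOOD_WEIGHT)

    def spam_ratio(self, token: str) -> float:
        """Fraction of this token's sightings that were in spam (0.5 if unseen)."""
        good, spam = self.good[token], self.spam[token]
        return spam / (good + spam) if good + spam else 0.5


corpus = Corpus()
msg = "Club Mom info from a friend"
corpus.learn(msg, is_spam=True)   # message was wrongly quarantined
print(corpus.spam_ratio("club"))  # 1.0
corpus.force_deliver(msg)         # administrator releases it
print(corpus.spam_ratio("club"))  # 0.0
```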


-------------
Roberto Franceschetti



Posted By: Guests
Date Posted: 03 March 2006 at 9:16am

Well, since I corrected the cache blacklist issue, everything has been working fine. But my Bayes filter hasn't caught a single email.

Rob



Posted By: LogSat
Date Posted: 03 March 2006 at 4:12pm
Have you upgraded to the latest version of SpamFilter? One of the most visible improvements in the new version is the greater effectiveness of the Bayesian filter: in some cases, its spam catch rate has increased 100-fold.

This is the release note that applies:

 // New to VersionNumber = '2.7.1.526';
{TODO -cNew : Added DoNotStartWithoutAV option in SpamFilter.ini file to prevent SpamFilter from running unless the antivirus is working}
{TODO -cFix : Greatly improved Bayesian filter accuracy}
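If you are checking several servers, a small Python sketch can tell whether an installed build already includes this fix. It assumes "New to VersionNumber" means the changes first shipped in 2.7.1.526, and that you can obtain the installed version as a dotted four-part string; the function names are hypothetical.

```python
FIX_VERSION = "2.7.1.526"  # release noting "Greatly improved Bayesian filter accuracy"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.7.1.526' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def has_bayes_fix(installed: str) -> bool:
    # Compare numerically: as plain strings, "2.7.1.99" would sort after "2.7.1.526".
    return parse_version(installed) >= parse_version(FIX_VERSION)


print(has_bayes_fix("2.7.1.99"))   # False
print(has_bayes_fix("2.7.1.526"))  # True
```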



-------------
Roberto Franceschetti



Posted By: Guests
Date Posted: 06 March 2006 at 11:30am
Per Dan's suggestion, I've created a new corpus directory. Meanwhile, some of the domains that are failing MX checks are elon.edu, aapa.org, and gci.net. Sorry, I don't have the full headers at the moment. I went ahead and forced the messages through.


Posted By: Guests
Date Posted: 07 March 2006 at 8:13am

Some more potential false positives ...

Received: from 205.188.139.137 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 08:01:11 -0500
Received: from [...] by imo-d23.mx.aol.com (mail_out_v38_r7.3.) id 3.214.1439293c (3657)
  for <...>; Tue, 7 Mar 2006 08:01:07 -0500 (EST)
From: [...]
Message-ID: <...@aol.com>
Date: Tue, 7 Mar 2006 08:01:07 EST
Subject: Club Mom info from [...]
To: [...]
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="part1_214.1439293c.313ede13_boundary"
X-Mailer: 9.0 SE for Windows sub 5021
X-Spam-Flag: NO
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...>
X-SF-HELO-Domain: imo-d23.mx.aol.com



Posted By: Guests
Date Posted: 07 March 2006 at 8:18am

And another one. Personal details removed again to keep the bots from picking them up.

Received: from 204.127.192.82 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Mon, 6 Mar 2006 19:34:08 -0500
Received: from mack ([node].hsd1.va.comcast.net[69.143.209.237])
          by comcast.net (rwcrmhc12) with SMTP
          id <20060307003406m12001ichve>; Tue, 7 Mar 2006 00:34:07 +0000
Message-ID: <000a01c6417e$9c3c9aa0$6401a8c0@mack>
Reply-To: "..." <...@comcast.net>
From: "..." <...@comcast.net>
To: <...>
Subject: Simpsons video
Date: Mon, 6 Mar 2006 19:32:26 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="----=_NextPart_000_0007_01C64154.B2EC5990"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2741.2600
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2742.200
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...@comcast.net>
X-SF-HELO-Domain: rwcrmhc12.comcast.net



Posted By: Guests
Date Posted: 07 March 2006 at 2:06pm

More: several legitimate emails from Amazon.com are getting caught. Here's an example.

Received: from 207.171.160.42 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:33:57 -0500
Received: from na-rte-app-5102.iad5.amazon.com ([10.216.250.37])
  by mm-notify-out-2103.amazon.com with ESMTP; 07 Mar 2006 10:32:55 -0800
Received: by na-rte-app-5102.iad5.amazon.com
 id AAA-notification-29959,8591; 7 Mar 2006 10:32:37 -0800
Date: 7 Mar 2006 10:32:37 -0800
Message-ID: <...@na-rte-app-5102.iad5.amazon.com>
X-AMAZON-TRACK: notification
To: ...@clator.com
From: "Amazon.com Payments" <gameowner@msn.com>
Subject: Your Amazon Marketplace Purchase
Cc: payments-mail@amazon.com
Bounces-to: ...@bounces.amazon.com
Content-Type: text/plain
MIME-Version: 1.0
X-AMAZON-MAIL-RELAY-TYPE: notification
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...@bounces.amazon.com>
X-SF-HELO-Domain: mm-notify-out-2103.amazon.com



Posted By: Guests
Date Posted: 07 March 2006 at 2:08pm

A legitimate one from turner.com:

Received: from 64.236.240.147 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:52:01 -0500
Received: from CNNCIMSS05.turner.com (cnncimss05.turner.com [10.188.171.204])
 by smtpgw2.turner.com (8.12.10/8.12.11) with ESMTP id k27Ipw4c020541
 for <...@clator.com>; Tue, 7 Mar 2006 13:51:59 -0500 (EST)
Received: from ATLBH01.turner.com ([10.188.157.231]) by CNNCIMSS05.turner.com with InterScan Messaging Security Suite; Tue, 07 Mar 2006 13:51:58 -0500
Received: from ATLPF02.turner.com ([10.188.156.206]) by ATLBH01.turner.com with Microsoft SMTPSVC(6.0.3790.211);
  Tue, 7 Mar 2006 13:51:58 -0500
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="----_=_NextPart_001_01C64218.363A93AC"
X-MimeOLE: Produced By Microsoft Exchange V6.5
Subject: Top Stories
Date: Tue, 7 Mar 2006 13:51:58 -0500
Message-ID: <BA010952EAB04749A22AE36AB1A1037C0265C3FF@ATLPF02.turner.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: General Comments
Thread-Index: AcZCGDY1/nirXhlcS1SlO6wZI18H7gAAAAAK
From: "News in General" <Topstories4@turner.com>
To: <...@clator.com>
X-OriginalArrivalTime: 07 Mar 2006 18:51:58.0531 (UTC) FILETIME=[365C6130:01C64218]
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <Topstories4@turner.com>
X-SF-HELO-Domain: smtpgw2.turner.com



Posted By: Guests
Date Posted: 07 March 2006 at 2:09pm

And a legitimate one from eBay:

Received: from 66.135.209.211 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 09:32:18 -0500
Received: from sjcbat03.sjc.ebay.com (sjcbat03.sjc.ebay.com [10.6.37.42])
 by mx33.smf.ebay.com (8.13.5/8.13.5) with ESMTP id k27EWGwX002986
 for <...@clator.com>; Tue, 7 Mar 2006 06:32:17 -0800
DomainKey-Signature: a=rsa-sha1; s=dk; d=ebay.com; c=nofws; q=dns;
 h=x-ebay-mailtracker:to:from:mime-version:content-type:subject:date:message-id;
 b=kjHoXTu0OJkocdhD7jpQb8TeK6un9jhK4UxfXbeZjRj+AAUds5rmQ29+lSiLB0UDc
 KnHdvzITNMFJjcx/TXXMQIO3w3Xhzz4xaquXIxz4hfiGPb+hku9yyYoH5LH0L+nCW7L
 wXWkFETSXy0yraUwJz996/G35Bg3ywRHDUq1bCo=
X-eBay-MailTracker: 10102.425.0.0
To: ...@clator.com
From: favoritesellers@ebay.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=3635219.1141741916156.JavaMail.ebba.sjcbat03
Subject: Your eBay Favorite Seller's New Items!
Date: Tue, 7 Mar 2006 06:31:56 PST
Message-ID: <2904989.1141741916198.JavaMail.ebba@sjcbat03>
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <favoritesellers@ebay.com>
X-SF-HELO-Domain: mx33.smf.ebay.com



Posted By: Guests
Date Posted: 07 March 2006 at 2:13pm

And lastly, three of these were caught. All of the messages posted above arrived in just the past six hours and were addressed only to me (not being an actual ISP, it's just me and the missus using the clator.com domain).

Hopefully these will point to some issues.  In case it hasn't been said, thanks for any help you can provide.

Received: from 68.230.240.34 by clator.com (LogSat Software SMTP Server - Unlicensed Evaluation Copy) Tue, 7 Mar 2006 13:56:40 -0500
Received: from willowoffice ([70.187.202.109]) by eastrmmtao05.cox.net
          (InterMail vM.6.01.05.02 201-2131-123-102-20050715) with ESMTP
          id <...eastrmmtao05.cox.net@willowoffice>
          for <...@clator.com>; Tue, 7 Mar 2006 13:54:21 -0500
From: "..." <...@willowtreemedia.com>
To: <...@clator.com>
Subject: FW: BIOS and Photos for Fertility C.A.R.E
Date: Tue, 7 Mar 2006 13:50:19 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----=_NextPart_000_0033_01C641EE.12D79CE0"
X-Mailer: Microsoft Office Outlook, Build 11.0.6353
Thread-Index: AcYQqJIy9GiFULfASdW33ufOnhb1nQxboDEg
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1506
Message-Id: <....eastrmmtao05.cox.net@willowoffice>
X-Server: LogSat Software SMTP Server - Unlicensed Evaluation Copy
X-SF-RX-Return-Path: <...@willowtreemedia.com>
X-SF-HELO-Domain: eastrmmtao05.cox.net



Posted By: Guests
Date Posted: 07 March 2006 at 2:15pm

P.S. Humble apologies. I posted these in the wrong thread; I meant them to go into the MX filter thread. I'll try to move them.



