
Corpus is 600MB !!!!!

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=6374
Printed Date: 14 March 2025 at 7:38pm


Topic: Corpus is 600MB !!!!!
Posted By: StevenJohns
Date Posted: 30 January 2008 at 9:39am
Roberto,
 
In the Spamfilter\Corpus folder I can see 286 files. Nearly all of them are temp files and are several days old. Do I need them? They total over 600MB.
 
If I don't need them, how come SF isn't deleting them when it's finished with them?
 
Cheers
 



Replies:
Posted By: LogSat
Date Posted: 30 January 2008 at 8:21pm
SpamFilter will normally delete the temp files in the \SpamFilter\corpus directory. However, if there are ever any problems updating the corpus, we leave the temp files from that operation in place for troubleshooting purposes. You can safely delete any leftover ones.
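
Before deleting anything, it can help to see which leftover temp files exist and how much space they take. The following is only a minimal sketch, not part of SpamFilter: the function name `find_stale_temp_files`, the `*.tmp` pattern, and the two-day age threshold are all assumptions, so adjust them to match what is actually in the corpus folder.

```python
import time
from pathlib import Path


def find_stale_temp_files(corpus_dir, min_age_days=2, pattern="*.tmp"):
    """Return (files, total_bytes) for temp files older than min_age_days.

    The "*.tmp" pattern is an assumption about how the leftover files
    are named; nothing is deleted by this function.
    """
    cutoff = time.time() - min_age_days * 86400  # seconds per day
    stale = []
    total_bytes = 0
    for path in Path(corpus_dir).glob(pattern):
        if not path.is_file():
            continue
        info = path.stat()
        # Only count files last modified before the cutoff.
        if info.st_mtime < cutoff:
            stale.append(path)
            total_bytes += info.st_size
    return sorted(stale), total_bytes
```

Calling `find_stale_temp_files(r"C:\SpamFilter\corpus")` (a hypothetical install path) returns the list and the combined size in bytes. After reviewing the list, each entry can be removed with `path.unlink()`.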

In regards to the large database size, I'd recommend resetting the Bayesian corpus database to start with a fresh one. To do so, please stop SpamFilter, delete or rename the \SpamFilter\corpus directory, and restart SpamFilter.


-------------
Roberto Franceschetti

http://www.logsat.com - LogSat Software

http://www.logsat.com/sfi-spam-filter.asp - Spam Filter ISP


Posted By: StevenJohns
Date Posted: 31 January 2008 at 4:47am
Roberto,
 
>>>  In regards to the large database size, I'd recommend resetting the Bayesian corpus database to start with a fresh one. To do so, please stop SpamFilter, delete or rename the \SpamFilter\corpus directory, and restart SpamFilter.
 
I think you misunderstood. The corpus is not 600MB; that is the total size of all the tmp files that SF left behind.
 
The corpus is 3MB. I do not want to delete the corpus as it has several months' worth of data in it. The last time this happened you told me to delete the corpus. I would rather find out why SF is having issues with the corpus and fix the problem than use a "workaround".
 
Do you have any suggestions as to where I could start looking to find the cause of this?
 
 
Cheers


Posted By: LogSat
Date Posted: 31 January 2008 at 3:42pm
3MB is not an issue then, sorry I misunderstood :-)
If the rest of the temp files were created over an interval of several days, it should not be a huge concern, even though their number is at first sight a bit excessive. If the interval from first to last is 2-4 months, I would not worry. If it is on the order of 2-3 weeks, then we may want to look at SpamFilter's logs to ensure there are no issues. If you have a case where there are multiple temp files with a datestamp for the same day, we'd start there.
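
The checks described above (how many days separate the first and last temp file, and whether any single day has several) could be sketched roughly as follows. This is an illustration under assumptions, not a SpamFilter tool: the function name `summarize_temp_file_dates` and the `*.tmp` pattern are made up for the example, and the file modification time is used as the "datestamp".

```python
from collections import Counter
from datetime import date
from pathlib import Path


def summarize_temp_file_dates(corpus_dir, pattern="*.tmp"):
    """Summarize temp files by the local date of their last modification.

    The "*.tmp" pattern is an assumption about the leftover file names.
    """
    # Count how many temp files were last modified on each calendar day.
    per_day = Counter(
        date.fromtimestamp(p.stat().st_mtime)
        for p in Path(corpus_dir).glob(pattern)
        if p.is_file()
    )
    if not per_day:
        return {"count": 0, "span_days": 0, "busy_days": {}}
    # Days between the oldest and the newest temp file.
    span_days = (max(per_day) - min(per_day)).days
    # Days with more than one temp file: the place to start in the logs.
    busy_days = {day: n for day, n in per_day.items() if n > 1}
    return {
        "count": sum(per_day.values()),
        "span_days": span_days,
        "busy_days": busy_days,
    }
```

A short `span_days` together with a large `count`, or any entries in `busy_days`, would point at the dates worth checking in the SpamFilter logs.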




