.TOKEN files
Ric Marques
Guest Group |
Posted: 20 November 2003 at 8:13pm |
Roberto,

Is it normal for the files in the /corpus/queue directory to be as large as 273k? I did the upgrade to .263, deleted the files in the /corpus and /corpus/queue directories and restarted. The same issue is happening. Right now there are 65 files in that directory, 7 of which are 273k each. The others range in size from 1k-5k and there are some 0k .tmp files there as well.

-Ric
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Ric,

No, actually it's not normal. There's an upper limit on the message size scanned for tokens, and for performance reasons it defaults to 64 KB. Furthermore, binary attachments are not scanned; only text/html content in the emails is. Can you check the spamfilter.ini file for the line MaxMsgSizeForKeywordScan and ensure the value is 64 or less?

Roberto F.
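For anyone wanting to script the check Roberto describes, here is a minimal sketch. It assumes spamfilter.ini uses plain `key=value` lines and that the value is expressed in KB, which the reply implies but does not state outright. The helper names (`read_ini_value`, `scan_limit_ok`) and the file location are hypothetical, not part of SpamFilter.

```python
from pathlib import Path


def read_ini_value(text, key):
    """Return the value of the first `key=value` line in text, or None."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and section headers (assumed INI conventions).
        if not line or line.startswith((";", "#", "[")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None


def scan_limit_ok(text, key="MaxMsgSizeForKeywordScan", limit_kb=64):
    """True if the key is present and its integer value is <= limit_kb."""
    value = read_ini_value(text, key)
    if value is None:
        return False
    try:
        return int(value) <= limit_kb
    except ValueError:
        return False


if __name__ == "__main__":
    ini_path = Path("spamfilter.ini")  # assumed location; adjust to your install
    if ini_path.exists():
        print(scan_limit_ok(ini_path.read_text(errors="replace")))
```

A missing key or a non-numeric value is reported as a failed check here, which is a design choice for the sketch rather than documented SpamFilter behavior.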
Ric Marques
Guest Group |
Roberto -

The key is there. I was only seeing the large files created from messages that were released from quarantine. The .TOKEN file was flagged as '.falsepositive' in the first line, and the entire file content appeared as MIME-encoded gobbledygook. Here are the first few lines:

<snip>

It looks like messages that are released from quarantine are possibly being parsed for tokens in the binary area. I re-installed .263 this morning, and I haven't seen anything like this appear yet, but there haven't been any messages bouncing back with attachments like yesterday.

Unfortunately, it also looks like there's another problem: the corpus file isn't saving. I ran for a couple of hours this AM and there were no changes to the corpus.ini or db.dat files. I stopped the service and restarted; the two files updated at that time, but only showed 1 good/1 spam in the .ini, even though SpamFilterISP had processed over a thousand messages. The logfile shows the corpus file being saved at startup, but not again:

At startup:

subsequent log entries:

I hope this is helpful... I'll keep a close eye on what's happening...

-Ric
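The symptoms Ric describes (queue files well over 64 KB, with '.falsepositive' on the first line) can be spotted with a short script. This is a rough sketch only: the directory path, the 64 KB threshold, and the function names are assumptions for illustration, and the check simply looks for the '.falsepositive' marker in the first line, as reported above.

```python
from pathlib import Path


def oversized_files(queue_dir, limit_bytes=64 * 1024):
    """Return (name, size) pairs for regular files larger than limit_bytes."""
    result = []
    for path in sorted(Path(queue_dir).iterdir()):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                result.append((path.name, size))
    return result


def is_false_positive(path):
    """True if the first line of the file contains the '.falsepositive' marker."""
    with open(path, "r", errors="replace") as fh:
        return ".falsepositive" in fh.readline()


if __name__ == "__main__":
    queue = Path("corpus/queue")  # assumed path relative to the install directory
    if queue.is_dir():
        for name, size in oversized_files(queue):
            flagged = is_false_positive(queue / name)
            print(f"{name}: {size} bytes, falsepositive={flagged}")
```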
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Ric,

Thanks for the reports. The messages released from the quarantine are scanned and tokenized, since this allows the filter to learn that messages like them are not spam. Subsequent emails with similar content will be much less likely to be stopped. We're now checking whether there is a bug in how binary attachments in them are scanned, as they too should only be scanned for text/html.

The corpus database is saved much less frequently now: every 2 hours if traffic is not high, and again when SpamFilter shuts down. This is normal.

Roberto F.
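To illustrate the intended behavior (text parts are tokenized, binary attachments are skipped, and the scanned text is capped), here is a sketch using Python's standard `email` package. It is not SpamFilter's implementation. The function name, the choice to accept any `text/*` part, and the character cap standing in for the 64 KB limit are all assumptions.

```python
from email import message_from_bytes


def scannable_text(raw_message, max_chars=64 * 1024):
    """Collect decoded text/* parts of a MIME message, skipping attachments."""
    msg = message_from_bytes(raw_message)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_maintype() != "text":
            continue  # skip images, application/* and other binary parts
        if part.get_content_disposition() == "attachment":
            continue  # skip text files sent as attachments
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 with replacement.
            chunks.append(payload.decode("utf-8", errors="replace"))
    # Cap the amount of text handed to the tokenizer (assumed limit).
    return "\n".join(chunks)[:max_chars]
```

With a filter like this in front of the tokenizer, a released message carrying a large base64 attachment would contribute only its text body, so a 273k token file would point to the attachment bytes slipping past the content-type check.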