
.TOKEN files

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=2454
Printed Date: 26 December 2024 at 7:03pm


Topic: .TOKEN files
Posted By: Guests
Subject: .TOKEN files
Date Posted: 20 November 2003 at 8:13pm

Roberto,

Is it normal for the files in the /corpus/queue directory to be as large as 273k?

I did the upgrade to .263, deleted the files in the /corpus and /corpus/queue directories and restarted.  The same issue is happening.  Right now there are 65 files in that directory, 7 of which are 273k each. The others range in size from 1k-5k and there are some 0k .tmp files there as well.

-Ric




Replies:
Posted By: LogSat
Date Posted: 21 November 2003 at 3:31pm

Ric,

No, actually it's not normal. There's an upper limit on the msg size scanned for tokens, which for performance reasons defaults to 64KB. Furthermore, binary attachments are not scanned; only text/html content in the emails is.

Can you check the spamfilter.ini file for the line:

MaxMsgSizeForKeywordScan

and ensure the value is 64 or less?
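
For anyone who would rather script this check, here is a minimal sketch. It assumes spamfilter.ini holds plain `key=value` lines and that the value is a whole number of KB; the function names and the file path you pass in are illustrative, not part of SpamFilter.

```python
def read_ini_value(path, key):
    """Return the raw string value of the first `key=value` line, or None if absent."""
    with open(path, "r", encoding="latin-1") as handle:
        for line in handle:
            stripped = line.strip()
            # Skip blank lines, comments and [section] headers.
            if not stripped or stripped[0] in ";#[":
                continue
            name, sep, value = stripped.partition("=")
            if sep and name.strip().lower() == key.lower():
                return value.strip()
    return None


def keyword_scan_limit_ok(path, limit_kb=64):
    """True only if MaxMsgSizeForKeywordScan is present and <= limit_kb.

    Assumption: the value is an integer number of KB.
    """
    raw = read_ini_value(path, "MaxMsgSizeForKeywordScan")
    if raw is None:
        return False  # key missing from the file
    try:
        return int(raw) <= limit_kb
    except ValueError:
        return False  # value is not a plain integer
```

Calling `keyword_scan_limit_ok(r"C:\path\to\spamfilter.ini")` (substitute your own install path) returns False if the key is missing, unparsable, or larger than 64.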

Roberto F.
LogSat Software



Posted By: Guests
Date Posted: 21 November 2003 at 3:52pm

Roberto -

The key is there.  I was only seeing the large files created from messages that were released from quarantine. The .TOKEN file was flagged as '.falsepositive' in the first line and the entire file content appeared as MIME-encoded gobbledygook - here are the first few lines:

<snip>
.falsepositive
&
_
_nextpart_002_01c3afab
0
00
000c05a6
0080iahdo4qy45zpugu6g75fw1er4ykbuwicv2iasvxoocr5dpk9aiofdcq1l7madpg
00bmwd7joomh9rc6jypazit1hswms2klfhmmhniycoragwwp1
00jg59vgc
00tk5nbj0burvixfhhp3zmou7v1zznpt8masgj3gmbmvx
00wkzw
02bpslnq7
02l1oiyxe44a60
033y8goesdoiijp6u
0344ns5h7eyoxij58yvgswjczzc5pxfbesn018tufpxjjo3ncwpikl3bwryzwohcnig
039k
03bpfj
03y8mqyavhj0wjkklbjaa5rhebyb8
04arwuinm8xreji6ggbdnvuplkksr20l0a6vighpwicvlipgouau
04mobtwnzyn6x5ig8vwwjb9
04tj0mg30twjpe9oisgqbia
04v5h8xae
05
05239q5tw82f4dnrklc3mc5zjcpapqv2j2qctlek8
05ewsrs7mhxdn
05naa6fetku5
05oipxkt5lmgmuklkha3pynaabgmc7f8ul1st6vql8nmw4o1uohkgafxfkyub7b5j4smsjl40hxi
</snip>

It looks like messages that are released from quarantine are possibly being parsed for tokens in the binary area?

I re-installed .263 this morning, and I haven't seen anything like this appear yet - but there haven't been any messages bouncing back with attachments like yesterday.

Unfortunately, it also looks like there's another problem - the corpus file isn't saving.  I ran for a couple of hours this AM and there were no changes to the corpus.ini or db.dat files.  I stopped the service and restarted - the two files updated at that time, but only showed 1 good/1 spam in the .ini.... and SpamFilterISP had processed over a thousand messages.

The logfile shows the corpus file being saved at startup, but not again:

At startup:
<snip>
11/21/03 11:59:07:320 -- Listening on xx.xx.xx.xx:25,
11/21/03 11:59:08:271 -- (3060) Connection from: 200.63.157.190  -  Originating country : Argentina
11/21/03 11:59:09:173 -- Starting to process queue directory...
11/21/03 11:59:10:685 -- ***Memory info*******
11/21/03 11:59:10:685 -- TotalAddrSpace = 1,048,576
11/21/03 11:59:10:685 -- TotalUncommitted = 196,608
11/21/03 11:59:10:685 -- TotalCommitted = 851,968
11/21/03 11:59:10:685 -- TotalAllocated = 806,988
11/21/03 11:59:10:685 -- TotalFree = 14,852
11/21/03 11:59:10:685 -- FreeSmall = 14,852
11/21/03 11:59:10:685 -- FreeBig =
11/21/03 11:59:10:685 -- Unused =
11/21/03 11:59:10:685 -- Overhead = 30,128
11/21/03 11:59:10:685 -- HeapErrorCode =
11/21/03 11:59:10:685 -- AllocMemCount = 7,559
11/21/03 11:59:10:685 -- AllocMemSize = 808,500
11/21/03 11:59:10:685 -- **********
11/21/03 11:59:10:685 -- Begin Cleanup of Corpus.db
11/21/03 11:59:10:695 -- End Cleanup of Corpus.db
11/21/03 11:59:10:695 -- Begin Sync Corpus.db
11/21/03 11:59:10:695 -- Sync Corpus.db - 1 - 0
11/21/03 11:59:10:695 -- Sync Corpus.db pass 1 (0)
11/21/03 11:59:10:695 -- Sync Corpus.db pass 2 (0)
11/21/03 11:59:10:695 -- Sync Corpus.db pass 3 (0)
11/21/03 11:59:10:695 -- Sync Corpus.db pass 4 (0)
11/21/03 11:59:10:695 -- Begin Saving Corpus.db
11/21/03 11:59:10:895 -- End Saving Corpus.db (200)
11/21/03 11:59:10:895 -- End Sync Corpus.db (200)
11/21/03 11:59:11:906 -- (3060) Resolving 200.63.157.190 - Not found
11/21/03 11:59:11:916 -- (3060) - Reverse DNS not found -
</snip>

Subsequent log entries:
<snip>
11/21/03 12:44:50:464 -- Begin Sync Corpus.db
11/21/03 12:44:50:464 -- Sync Corpus.db - 9864 - 594
11/21/03 12:44:50:595 -- Sync Corpus.db pass 1 (130)
11/21/03 12:44:50:595 -- Sync Corpus.db pass 2 (130)
11/21/03 12:44:50:595 -- Sync Corpus.db pass 3 (130)
11/21/03 12:44:50:605 -- Sync Corpus.db pass 4 (141)
11/21/03 12:44:50:605 -- End Sync Corpus.db (141)
</snip>
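
To compare how often the corpus is synced versus actually saved, the log can be scanned for the two "End ..." markers shown in the excerpts above. This is only a rough sketch: it assumes every log line follows the `MM/DD/YY HH:MM:SS:mmm -- message` layout seen here, and the function name is made up for illustration.

```python
import re

# Matches lines such as "11/21/03 11:59:10:895 -- End Saving Corpus.db (200)".
LOG_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3}) -- (.*)$")


def corpus_events(lines):
    """Return (sync_timestamps, save_timestamps) found in the given log lines."""
    syncs, saves = [], []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not a recognizable log line
        stamp, message = match.groups()
        if message.startswith("End Sync Corpus.db"):
            syncs.append(stamp)
        elif message.startswith("End Saving Corpus.db"):
            saves.append(stamp)
    return syncs, saves
```

Feeding it the open log file (for example `corpus_events(open(logfile_path))`, with your own path) gives two lists; a sync timestamp with no matching save timestamp is a sync pass that did not write the corpus to disk.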

I hope this is helpful... I'll keep a close eye on what's happening...

-Ric



Posted By: LogSat
Date Posted: 21 November 2003 at 3:59pm

Ric,

Thanks for the reports. The msgs released from the quarantine are scanned and tokenized, since this allows the filter to learn that messages like them are not spam. Subsequent emails with similar content will be much less likely to be stopped. We're now looking into whether there is a bug with scanning the binary attachments in them, as they also should only be scanned for text/html.
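
As a rough illustration of what "only scanned for text/html" means, the sketch below walks a MIME message with Python's standard email package and collects tokens from text parts only, skipping binary attachments. It is not SpamFilter's actual tokenizer: the token pattern, the lowercasing, and the reading of "text/html" as text/plain plus text/html are all assumptions made for the example.

```python
import re
from email import message_from_string

# Assumption: a token is a run of ASCII letters and digits.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def text_tokens(raw_message):
    """Return the sorted, unique, lowercased tokens found in text parts only."""
    msg = message_from_string(raw_message)
    tokens = set()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_type() not in ("text/plain", "text/html"):
            continue  # skip binary attachments entirely
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            text = payload.decode("latin-1")  # unknown charset name
        tokens.update(token.lower() for token in TOKEN.findall(text))
    return sorted(tokens)
```

With a filter like this, a base64 attachment contributes no tokens, so the long random-looking strings seen in the .TOKEN excerpt earlier in the thread would not appear.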

The corpus database is saved much less frequently now: every 2 hours if traffic is not high, and then when SpamFilter shuts down. This is normal.

Roberto F.
LogSat Software



