Spam Filter ISP Support Forum


Memory leak or a feature of Windows 2008?

Neolisk
Newbie
Joined: 13 July 2009
Location: Toronto, ON
Posted: 17 July 2009 at 10:54am
After about 2 days of runtime, SpamFilterSvc.exe is using 728MB of memory. Isn't that too much for such a small program? Will it grow even bigger?

Edited by Neolisk - 17 July 2009 at 10:54am

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 17 July 2009 at 3:57pm
Neolisk,

All the blacklists/whitelists and the Bayesian database are kept in RAM to optimize lookups. Usually the Bayesian database is the one that grows the largest. Can you please let us know the file sizes of the corpus.data and the corpus.dat.prb files in the \SpamFilter\corpus directory?
Roberto Franceschetti

LogSat Software

Spam Filter ISP

Neolisk
Posted: 17 July 2009 at 4:15pm
db.dat ~ 127 MB
db.dat.prb ~ 103 MB

Current memory consumption = 829 MB. If it goes up that fast, we'll have to restart the service quite often.

P.S. I wanted to post a screenshot of the whole folder, but the forum engine would not allow that.


Edited by Neolisk - 17 July 2009 at 4:17pm

LogSat
Posted: 17 July 2009 at 5:54pm
Actually the Bayesian database should be loaded within about a minute or two after SpamFilter is started, so you should see the RAM usage go up within a few minutes.

Please note that the Bayesian filter is the last one to be used by SpamFilter, and thus will catch a very small percentage of spam compared to the other filters. At our own ISP, for example, the Bayesian filter used to catch only about 0.1% of spam, compared to 99.9% for the other filters (we disabled this filter about a year ago on our own live server). In addition, Bayesian filters were “the thing” 5 years ago, and for a while this was the “star” filter in our SpamFilter. However, spammers have since learned how to easily bypass them, making the Bayesian filter even less effective.

As the Bayesian filter is the one that uses the most CPU and the most RAM, if that is affecting your server you may want to consider disabling it as well.
If you wish to reset the Bayesian database instead, you can stop SpamFilter, delete (or rename) the SpamFilter/corpus directory, and then restart SpamFilter. Please note, however, that the database will grow again. You can have SpamFilter clean up the database more frequently, and thus keep it smaller, by lowering this parameter in the SpamFilter.ini file:

;Remove any stale token in the corpus db.dat file that did not appear in incoming emails for the past n days
CleanUpCorpusIntervalDays=7
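
To illustrate what this kind of setting controls, here is a minimal Python sketch of interval-based token expiry. It is not SpamFilter's actual code: the `last_seen` dictionary and the `prune_stale_tokens` function are hypothetical, and only the parameter name and its value of 7 come from the SpamFilter.ini excerpt above.

```python
from datetime import datetime, timedelta

# Value taken from the SpamFilter.ini excerpt; everything else is illustrative.
CLEAN_UP_CORPUS_INTERVAL_DAYS = 7


def prune_stale_tokens(last_seen, now, interval_days=CLEAN_UP_CORPUS_INTERVAL_DAYS):
    """Return a copy of last_seen without tokens older than interval_days.

    last_seen is a hypothetical mapping of token -> datetime at which the
    token last appeared in an incoming email.
    """
    cutoff = now - timedelta(days=interval_days)
    return {token: seen for token, seen in last_seen.items() if seen >= cutoff}


now = datetime(2009, 7, 17)
last_seen = {
    "pills": datetime(2009, 7, 16),    # seen one day ago, kept
    "lottery": datetime(2009, 7, 1),   # not seen for 16 days, dropped
}
kept = prune_stale_tokens(last_seen, now)
print(sorted(kept))
```

Under this assumed model, a smaller interval drops more tokens at each cleanup, which is why lowering the value should shrink the database.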

Neolisk
Posted: 20 July 2009 at 9:43am
CPU hovers around 20% while running on 1 core. Spam gate is the only role of that server. So it's acceptable.

It looks like we had a power outage at the weekend (or the server randomly rebooted or the program did something to its memory consumption), so now it's just 67MB.

Anyway, thanks for the suggestion. We'll watch its behavior this week and decide if we really need to do anything. But even if one service restart per week is necessary, I think it's not a big problem. We're not an ISP and don't need Exchange running 24/7.

Neolisk
Posted: 23 July 2009 at 10:22am
Now it eats 1.18 GB! Even after I restarted the service. Corpus files are smaller than 400MB, if taken together. What's wrong with it?

LogSat
Posted: 23 July 2009 at 3:52pm
As we mentioned in the previous post, SpamFilter keeps the Bayesian database in memory (we actually store two copies of the database to optimize performance, as both read and write access are required at any time). If the files total 400MB, that adds up to 800MB of RAM, plus a small percentage of overhead for memory swaps. In this case, 1GB of RAM usage is thus legitimate.
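
The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the two in-memory copies come from the explanation above, while the 10% overhead figure and the `estimate_ram_mb` function name are placeholders chosen for illustration (the exact overhead is not specified).

```python
def estimate_ram_mb(corpus_files_mb, copies=2, overhead_fraction=0.10):
    """Rough RAM estimate for an in-memory database held in several copies.

    copies=2 follows the description above; overhead_fraction is an
    assumed placeholder, not a documented SpamFilter figure.
    """
    return corpus_files_mb * copies * (1 + overhead_fraction)


# Corpus files totalling 400MB, as in the example above.
estimate = estimate_ram_mb(400)
print(f"Estimated RAM usage: about {round(estimate)} MB")
```

With these assumptions the estimate lands between 800MB and 1GB, in line with the figures quoted above.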

The issue is thus not a memory leak, but rather whether this database size is justified. The more emails per day are received, and the longer the statistical tokens (words) in those emails are kept, the larger the database grows.

Can you please let us know roughly how many emails per day you receive, and what value the CleanUpCorpusIntervalDays setting mentioned above has in your SpamFilter.ini file?

In addition, if you can please zip and email us one of your latest SpamFilter activity logfiles covering an entire day, we'll go through it to ensure there are no errors occurring while cleaning up the Bayesian database. If the zip is over 8MB in size, I'll send you a PM with the login info for our FTP site.

Neolisk
Posted: 23 July 2009 at 4:11pm
Now it's even bigger: 1287MB! And it wasn't like this yesterday: only ~70MB was occupied although the corpus DB was about the same size. That's weird!

We can allocate any reasonable amount of memory. The question is: How much do we need to forget about this problem?

CleanUpCorpusIntervalDays=7

I uploaded today's logs to your FTP site.

Another question: since we don't manage spam emails and simply delete everything that is considered spam at any level of protection, is there any point in storing the Bayesian database? How does it work, after all?

LogSat
Posted: 23 July 2009 at 7:06pm
The Bayesian database holds statistical information about the various "tokens" (words/symbols) that your incoming emails contain. As email arrives and is categorized by the other filters as "spam" or "clean", the Bayesian filter "learns" the various token patterns and is then, statistically, able to recognize similar patterns in the future to help identify emails as spam. As spam emails are dynamic (the type of spam you receive today will usually be different from what you'll receive next week), these statistical "tokens" are set to expire after a few days if new emails no longer contain them during that time.
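
As a rough illustration of the "learning" described above, here is a toy token-based classifier in Python. It is a generic naive Bayes sketch with add-one smoothing, not SpamFilter's implementation; the `TinyBayes` class, its method names, and the sample messages are all hypothetical.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes classifier over whitespace-separated tokens."""

    def __init__(self):
        self.counts = {"spam": Counter(), "clean": Counter()}
        self.messages = {"spam": 0, "clean": 0}

    def learn(self, text, label):
        # Record the tokens of a message already labeled "spam" or "clean".
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        # Sum per-token log-odds, using add-one smoothing for unseen tokens.
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["clean"])) or 1
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["clean"] + 1))
        for token in text.lower().split():
            p_spam = (self.counts["spam"][token] + 1) / (totals["spam"] + vocab_size)
            p_clean = (self.counts["clean"][token] + 1) / (totals["clean"] + vocab_size)
            log_odds += math.log(p_spam / p_clean)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills buy now", "spam")
bayes.learn("meeting agenda for monday", "clean")
print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("monday meeting"))
```

Every distinct token adds an entry to the counters, which gives a feel for why such a database grows with mail volume unless stale tokens are expired.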

....however

The answer to your question of whether there is any point in storing the Bayesian database was, in a way, already given earlier in this thread:

Originally posted by LogSat:

the Bayesian filters were "the thing" 5 years ago, and for a while this was the "star" filter in our SpamFilter. However the spammers have since learned how to easily bypass them, making the Bayesian filter even less effective.
As the Bayesian filter is the one that uses the most CPU and the most RAM, if that is affecting your server you may want to consider disabling it as well.


We've received your log and will be looking over it shortly.



Edited by LogSat - 23 July 2009 at 7:07pm

Neolisk
Posted: 24 July 2009 at 9:45am
Well, as long as it was using the most CPU and RAM but the values were generally low, I didn't mind it being enabled. Additional protection never hurts, but now... again, it depends on how big it will grow. Currently 75% of memory is occupied. I also noticed that when it reaches 90%, huge lags happen, so the antispam gateway is in a kind of critical state.

Anyway, thanks for looking!