100% CPU usage
Fred Dickey
Guest Group
Posted: 27 May 2005 at 9:34am
I'm trying to figure out an issue I've been having with SpamFilter for the past couple of weeks. Our "current inbound connections" count has grown sharply over that time. We tried clustering two boxes together to handle the additional volume, but one of the boxes was also our primary mail server, and we experienced performance issues with that.

So now we have set up a new server box, reinstalled SpamFilter, and moved our white/blacklists and quarantine database over to it. This time SpamFilter installed with a default of 100 simultaneous connections. Over a 12 to 24 hour period, the number of simultaneous connections gradually builds up to the maximum of 100. Until a few weeks ago, this number never went over 10, and our maximum was 20 concurrent sessions. During all this, our CPU usage is maxed out at 100%; however, SpamFilter still seems able to accept and pass through new email sessions fairly quickly, despite Task Manager showing the CPU as maxed out.

Currently SpamFilter runs on a 1.8 GHz Pentium 4 system with 512 MB of RAM running Windows Server 2003 Standard Edition. One domain we host seems to be a spam and virus magnet and generates most of the traffic, averaging about 15,000 legitimate emails per month. As of 05/27/2005 at 2:39 AM, our new install of SpamFilter has processed 3,428 inbound connections, blocked 3,348 emails, and forwarded 612 emails. I don't believe this is a very high volume of email, so I'm a bit puzzled by the high CPU usage I'm seeing, and also by why this has become an issue only in the past few weeks. Any insight from the SpamFilter/mail server gurus out there would be greatly appreciated. Thanks.
Desperado
Senior Member | Joined: 27 January 2005 | Location: United States | Points: 1143
Fred,
About how many messages per day do you handle? Also, can you set the following in your ini file?
IdleDisconnectMinutesTimeout=15
This will force connections that are not actually doing anything to close.
Regards,
The Desperado
Dan Seligmann. Work: http://www.mags.net Personal: http://www.desperado.com
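The thread doesn't describe how SpamFilter implements IdleDisconnectMinutesTimeout internally. As a rough illustration only, here is a minimal Python sketch of the general idea behind an idle-disconnect rule, assuming each session records the time of its last activity. The `Session` class and `sessions_to_drop` function are hypothetical names, not part of SpamFilter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical default mirroring the ini setting IdleDisconnectMinutesTimeout=15
IDLE_DISCONNECT_MINUTES = 15


@dataclass
class Session:
    peer: str               # remote address of the sending server
    last_activity: datetime  # time the last command or data arrived


def sessions_to_drop(sessions, now, timeout_minutes=IDLE_DISCONNECT_MINUTES):
    """Return the sessions that have been silent for longer than the timeout."""
    limit = timedelta(minutes=timeout_minutes)
    return [s for s in sessions if now - s.last_activity > limit]


if __name__ == "__main__":
    now = datetime(2005, 5, 27, 9, 34)
    sessions = [
        Session("192.0.2.10", now - timedelta(minutes=2)),
        Session("192.0.2.11", now - timedelta(minutes=20)),
    ]
    for s in sessions_to_drop(sessions, now):
        print("would disconnect", s.peer)
```

A rule like this only helps when a connection is genuinely silent; by itself it would not interrupt a session that is busy inside a filter.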
Fred Dickey
Guest Group
I would estimate about 6,000 to 7,000 per day, based on SpamFilter's current stats. I checked the ini file, and IdleDisconnectMinutesTimeout is already set to 15 minutes.
Desperado
Senior Member | Joined: 27 January 2005 | Location: United States | Points: 1143
Fred,
I will not be responding for a bit, as I am heading down to Indiana for the weekend... BUT, I am doing over 150,000 messages per day and do not see the same issue. What version/build are you running?
Regards,
LogSat
Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
Fred,
Could you email your SpamFilter.ini file and a couple of SpamFilter's activity log files to us at support@logsat.com so we can try to see what's going on? Please also let us know which version of SpamFilter you're using. In the meantime, what database platform are you using? Could you check whether the database is able to handle the incoming load of emails without problems? We've seen *one* case where MySQL was freezing when quarantining large emails, and the MySQL ODBC driver that SpamFilter used was, in turn, freezing the connections as well.
Fred Dickey
Guest Group
I'm running the latest build, v2.5.2.457. I had upgraded to v2.5.1.441 when we first began experiencing issues with concurrent sessions maxing out and bogging down the CPU. At first I thought maybe we were being SMTP flooded or something, so I boosted our max sessions and configured two server boxes with SpamFilter into a load-balancing cluster, hoping this would resolve the issue. Then both machines began maxing out their CPUs and their max sessions. So I went back to a single, faster server box with v2.5.2.457, assuming that perhaps it was a CPU capacity issue.

We are currently using the default Access database, and I'm wondering if that could be the issue now. I'm also thinking about flushing the database to see if that resolves this issue; the current size of the quarantine database is 449 MB. I'll email my .ini file and some of the log files to support tomorrow or Monday when I get a chance.
LogSat
Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
Fred,
You *definitely* do not want to use MS Access for that kind of traffic. Access was designed for single-user applications. It can handle multiple users, but only to a certain extent, and performance is very poor; it is not designed for high-usage applications. We include support for it in SpamFilter so users can easily test their implementations, and possibly use it in low-volume installations, but we do not recommend its use in live environments with multiple concurrent incoming connections, as in your case.
Fred Dickey
Guest Group
OK, that may explain the 100% CPU usage. I've always noticed CPU spikes whenever traffic was coming in. Watching my activity, though, it seems that the same emails keep coming back and building up the concurrent sessions to the max, as if the sending mail server(s) never realize that the email was actually sent. This happens intermittently with some of the emails we receive and is causing some of our end users to receive multiple copies of the same email. I'm sending you some of my log files and my ini file now to see if something besides the Access database is involved. Thanks for all of your feedback.
vrspock
Newbie | Joined: 31 May 2005 | Location: United States | Points: 16
Today I wiped the corpus database and the quarantine database. I've confirmed with some of my end users that senders are getting NDRs from their mail servers even though their email is being delivered to the recipient's mailbox. This may explain why users are getting multiple copies of the same emails.

I activated the connections grid, and the sessions that appear "hung" are sitting at the RCPT TO status. When doing an SMTP debug of one of these sessions, the session seems to go through the first few SMTP commands followed by the message body, and then it just sits there indefinitely until the SpamFilter service is restarted. As I said before, this suddenly became an issue just a couple of weeks ago. I'm at a loss as to what could be going on. I'm going to try rebooting our firewall just to see if something weird is going on with it.
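When many sessions pile up in the same state, it can help to check whether a handful of senders account for most of them, which is what vrspock later noticed. Here is a minimal Python sketch, assuming the connections grid can be exported as rows with a sender, a state, and a session start time. The `GridRow` fields, the `hung_senders` function, and the 30-minute threshold are all hypothetical, not SpamFilter's actual export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class GridRow:
    # Hypothetical fields for one row of an exported connections list
    mail_from: str     # envelope sender (MAIL FROM)
    state: str         # current SMTP stage, e.g. "RCPT TO"
    started: datetime  # when the session began


def hung_senders(rows, now, state="RCPT TO", min_age=timedelta(minutes=30)):
    """Count sessions per sender that are in `state` and started more than min_age ago.

    Returns (sender, count) pairs, most frequent first.
    """
    counts = Counter(
        row.mail_from.lower()
        for row in rows
        if row.state == state and now - row.started > min_age
    )
    return counts.most_common()
```

Senders that top this list are the natural candidates for the kind of closer inspection (or temporary whitelisting) discussed further down the thread.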
JimMeredith
Newbie | Joined: 27 January 2005 | Location: United States | Points: 28
About two weeks ago... did you add the "German Spam" subject lines to your keywords filter around that time? Or do you make regular additions to your keywords list? Either way, this thread might apply: http://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=5093
vrspock
Newbie | Joined: 31 May 2005 | Location: United States | Points: 16
Thanks for the post. I'll take a very careful look at our keyword list. Most of our keywords are URLs of known spam, so SURBL may be the answer to significantly reducing the size of our keyword list and thus making our spam filter more efficient. We will have to experiment with it to find the right balance.

I think I fixed the hanging sessions issue... at least with a workaround for now. It seemed to be the same From addresses that were hanging with multiple sessions all the time, so I added them to the exclfrom whitelist to see what that would do, and it seems to have worked. No more hung sessions from those senders, or at least no more 100+ sessions all sitting at the RCPT TO status; just 2 to 8 simultaneous sessions at any given time. Whew, back to normal... I think. I would still like to track down the cause of this issue, as I'm sure I'll run into more sessions that I will have to apply the same workaround to.
vrspock
Newbie | Joined: 31 May 2005 | Location: United States | Points: 16
I haven't found anything obvious in the keywords filter that may be causing this. Sessions seem to hang at about the time SpamFilter checks the corpus database; however, wiping that database didn't fix the issue.

I'm still running into sessions hanging intermittently, and the only temporary fix seems to be to whitelist the From address and restart the SpamFilter service, so that the next time the sender's mail server attempts to resend the email it passes without incident. Some of our users are still sporadically getting duplicate emails due to this issue.
vrspock
Newbie | Joined: 31 May 2005 | Location: United States | Points: 16
I found something in the whitelists tonight that didn't look too kosher. I'm not sure whether it has anything to do with my issue, but it may have shown up about the time this issue started. In my autowhitelistforcedelivery.txt file was an entry for "System Administrator <>|nospam@v-sources.com". This is obviously not a valid address, so I'm wondering if it may be what has been causing some SMTP sessions to hang like they have been. I'll continue to monitor my server and post an update if that fixed it.
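To look for other entries like that one, a small script can scan a whitelist file for fields whose address is empty or malformed. This Python sketch assumes, based only on the entry quoted above, that each line holds `|`-separated fields in either `Name <address>` or bare `address` form; the real file format may differ. The address check is a deliberately loose pattern, not full RFC 5322 validation, and `suspicious_entries` is a hypothetical helper name.

```python
import re

# Loose check: something@something.something, with no spaces or angle brackets
ADDRESS_RE = re.compile(r"^[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+$")
# Captures whatever sits between < and >, including nothing at all
ANGLE_RE = re.compile(r"<([^>]*)>")


def extract_address(field):
    """Return the bare address from 'Name <addr>' or 'addr'."""
    match = ANGLE_RE.search(field)
    return (match.group(1) if match else field).strip()


def suspicious_entries(lines):
    """Yield (line_number, line) for entries containing a malformed address field.

    Assumes each entry is one or more '|'-separated address fields.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        for field in line.split("|"):
            if not ADDRESS_RE.match(extract_address(field)):
                yield number, line
                break  # report each line once
```

For example, iterating over `suspicious_entries(f)` for an open whitelist file `f` and printing each pair would flag the `System Administrator <>` entry because its angle brackets are empty.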
Marcus
Guest Group
Is there any official way to fix this? We have four different servers that have been running v2.5.1.441 since it was released, with no problems since install. Then yesterday, at about the same time, all four started freezing connections in the "RCPT TO" state and maxing the CPU as described above.
- Removed all filters (keyword, blocked IPs, etc.) one by one: same result.
- Removed the use of the quarantine database: same result.
- If I enter the addresses in "Excluded from emails" (as vrspock describes above), it does allow those to pass correctly.
This is not happening on all email, just on some from aol.com and some list servers in particular. Ideas?
Thanks,
Marcus
LogSat
Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
Marcus,
To date we have not been able to reproduce the problem, so any info you have can be useful. In your case we could probably use:
- Copies of the emails that made it through after you added them to the whitelist, if you have them (we'd need the original email, headers and body)
- One of SpamFilter's activity logs for a day when this occurred
- The email addresses you think are causing the problem
- Your SpamFilter.ini file and all your local blacklist/whitelist and keyword files
Could you zip everything and email it to us at support_at_logsat.com?
Marcus
Guest Group
I will gather as much data as I can and email it to you.
Marcus
Dan B
Senior Member | Joined: 09 February 2005 | Location: United States | Points: 105
A few weeks ago we had the same thing, but only on 2 of our 4 servers. The two that were having the issue handle our primary domain, which receives a couple hundred thousand messages per day on each server. While the CPU was pegged at 100%, memory use was over 200 MB. The only way I was able to get things back to normal was to set:
IdleDisconnectMinutesTimeout=5
Within minutes it went back to normal. I have left the setting at 5 minutes, and we haven't had any issues since. We are using MySQL with 4 SpamFilter servers.
Thanks,
Labsy
Guest Group
Same problem here with version 2.5.1.441. I get only 1-5 concurrent connections and around 10,000 mails per day. I'm running SpamFilter on Windows 2003 on a dual Xeon 3.06 GHz with 2 GB RAM. SpamFilter is configured to use an MSSQL database, and it was using 50-90% CPU!!! ...until I figured out what was consuming so much CPU: SpamFilter's AntiVirus. I removed AntiVirus, and CPU consumption dropped back to a normal 1-9%.
Marcus
Guest Group
Dan, I have tried values as low as IdleDisconnectMinutesTimeout=3, and it doesn't seem to have any effect.
LogSat, I sent you some data via email on 7/13 at 10:30 pm CST and 7/14 at 12:15 pm CST, from Site-1M and Site-1C respectively. After a complete redownload, reinstall, and reconfiguration of the .ini at Site-1M, the problem has not recurred; the same goes for Site-1B and Site-1A. The quarantine database (Qdb) is not currently active at these sites. Site-1T has shown no instances of the problem and receives the most email and spam. Site-1J has shown no instances of the problem either. Site-1C is a completely different story: I have done a complete redownload, reinstall, and .ini reconfiguration both without and with Qdb, and the problem keeps recurring. There's lots of info in the email.
Marcus
vrspock
Newbie | Joined: 31 May 2005 | Location: United States | Points: 16
I haven't had any more issues with this since I removed that keyword entry that appeared to be causing the issue.
LogSat
Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
With Marcus's help we found a specific keyword RegEx that was causing 100% CPU with a specific email's body. We're trying to determine the cause of the issue and will hopefully have a fix soon, now that the problem is reproducible.
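The approach described here, isolating which keyword pattern misbehaves on one particular message body, can be approximated outside SpamFilter. This Python sketch times each pattern against a single body using the standard `re` module. It is only an analogue: Python's regex engine is not SpamFilter's, so it will not necessarily reproduce a bug specific to SpamFilter's RegEx processor. How the patterns are loaded (for example, one per line from a keyword file) is an assumption left to the caller, and `time_keywords` is a hypothetical helper name.

```python
import re
import time


def time_keywords(patterns, body, flags=re.IGNORECASE):
    """Time re.search for each keyword pattern against one message body.

    Returns (seconds, pattern) pairs, slowest first. Compilation is done
    before the timer starts so only the search itself is measured.
    """
    results = []
    for pattern in patterns:
        compiled = re.compile(pattern, flags)
        start = time.perf_counter()
        compiled.search(body)
        results.append((time.perf_counter() - start, pattern))
    results.sort(reverse=True)  # slowest pattern first
    return results
```

A pattern that never completes will still hang this loop, so for a truly runaway case each search would need to run in a separate process with a time limit.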
LogSat
Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
As mentioned before, thanks to Marcus' email samples and help, we were able to find a bug in the RegEx processor that caused the CPU to peak at 100%. We've posted a fix for it in build 2.6.3.473, now available in the registered user area. While the build appears very stable, it has not received all the testing we would have liked to perform before releasing it. However, due to the seriousness of the problem, we decided to make it available now for users who wish to deploy it immediately.
Marcus
Guest Group
Please let me correct one of my statements above: evidently I did not (due to old-age disease) remove my keyword file. I found the keyword problem at about the same time Roberto did. Thanks to Roberto and the team at LogSat. Excellent support for your software.
Marcus