Spam Filter ISP Support Forum

MySQL commands for cleanup

leeH
Newbie
Joined: 22 October 2007
Posted: 17 June 2010 at 1:54pm

I would like to schedule my own database cleanup after hours, and I was wondering what the command is to do it. I know it starts out like "delete tblsmsgs from tblsmsgs ....."

Any help would be appreciated,
 
Lee


Edited by leeH - 17 June 2010 at 1:55pm

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 17 June 2010 at 10:55pm
To remove old emails from the database we use the following: 
  
UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= ADate 
  
Where ADate is a parameter and the syntax depends on the DB platform used. 
  
This marks the old records to be deleted. 

Then we issue the actual delete query: 
  
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0 
  
That deletes most of the rows from the tblQuarantine (and due to the database constraints, the related records in the tblMsgs), but may leave behind some "orphaned" rows in the tblMsgs. So we then issue the following as a backup to ensure all orphans are deleted as well:

DELETE tblMsgs FROM tblMsgs LEFT JOIN tblQuarantine 
ON tblMsgs.MsgID = tblQuarantine.MsgID WHERE (tblQuarantine.MsgID IS NULL)
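
For the "schedule it after hours" part of the question, one option on MySQL is the built-in event scheduler. The following is only a sketch under several assumptions: MySQL 5.1 or later (events do not exist in 5.0), the SpamFilter database selected as the default database, and SpamFilter's own cleanup disabled. The event name sf_nightly_cleanup, the 02:00 start time, and the 14-day retention period are illustrative choices, not SpamFilter settings.

```sql
-- Sketch only. Assumptions: MySQL 5.1+, the SpamFilter database is the
-- current default database, and SpamFilter's built-in cleanup is disabled.
-- sf_nightly_cleanup, 02:00 and 14 days are hypothetical example values.

-- Turn the scheduler on (needs the SUPER privilege).
SET GLOBAL event_scheduler = ON;

-- DELIMITER is a directive of the mysql command-line client, not SQL.
DELIMITER //

CREATE EVENT sf_nightly_cleanup
ON SCHEDULE EVERY 1 DAY
STARTS (TIMESTAMP(CURRENT_DATE) + INTERVAL 1 DAY + INTERVAL 2 HOUR)
DO
BEGIN
  -- Step 1: mark quarantine rows older than the retention period.
  UPDATE tblQuarantine
  SET Expire = 1
  WHERE MsgDate <= DATE_SUB(NOW(), INTERVAL 14 DAY);

  -- Step 2: delete the marked rows.
  DELETE FROM tblQuarantine
  WHERE tblQuarantine.Expire <> 0;

  -- Step 3: delete message bodies that no quarantine row references.
  DELETE tblMsgs
  FROM tblMsgs
  LEFT JOIN tblQuarantine ON tblMsgs.MsgID = tblQuarantine.MsgID
  WHERE tblQuarantine.MsgID IS NULL;
END//

DELIMITER ;
```

On servers without the event scheduler, the same three statements can be saved in a file and run by the operating system's task scheduler through the mysql client instead.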

Roberto Franceschetti

LogSat Software

Spam Filter ISP

morten44
Groupie
Joined: 07 March 2008
Posted: 03 July 2010 at 7:20am
Hi
This sounds interesting.
I would like to do this as well, but as I am not very familiar with MySQL I am not sure of the exact statements.
 
I am using MySQL 5 with my SpamFilter.
In the SpamFilter application, I set it to keep spam for 14 days.
In the field for how often to clean up, I set 0 to deactivate it.
 
Now I would run a script:
UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= ADate 
and then
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0
 
Would that work? Where in this code do I specify that it should only affect spam more than 14 days old?
 
 
I hope someone can advise.
 
Regards
Morten
 
 
 

LogSat
Posted: 04 July 2010 at 12:45pm
Morten,

This should work for MySQL 5:

UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= DATE_SUB(NOW(), INTERVAL 14 DAY); 

DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0
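
Before running the UPDATE for the first time, it may be worth previewing what the 14-day cutoff would select. This is a read-only sketch that uses only the table and column named above; the column aliases are made up for readability.

```sql
-- Read-only preview: nothing is modified by these statements.

-- The cutoff that DATE_SUB(NOW(), INTERVAL 14 DAY) evaluates to right now.
SELECT DATE_SUB(NOW(), INTERVAL 14 DAY) AS cutoff;

-- How many quarantine rows the UPDATE would mark with Expire = 1.
SELECT COUNT(*) AS rows_to_expire
FROM tblQuarantine
WHERE MsgDate <= DATE_SUB(NOW(), INTERVAL 14 DAY);
```

If rows_to_expire is 0 even though older mail is visible in the quarantine, the date condition is worth checking before looking anywhere else.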

morten44
Posted: 06 July 2010 at 6:34am
Hi Roberto
 
Thanks again for your support.
I know the question is a little outside the scope of SpamFilter support, so thanks again for taking the time to answer.
 
 
Kind Regards
Morten

morten44
Posted: 24 July 2010 at 6:35pm
Hi
Hope you still see this old post.
Today I tried the script to delete all spam mail in quarantine older than 7 days.
 
It does not seem to work.
The database files are 6GB before and after I run the script, and there is currently spam mail going back 18 days. As I set it to delete everything older than 7 days, I expected the database files to get much smaller, so I don't think it works.
 
You can see a screenshot of the script I ran and how it looked when it completed.
Can you see what the problem is?
 
 
Kind Regards
Morten

leeH
Posted: 24 July 2010 at 6:46pm
I too was not sure what was going on, but it turned out that you also need to optimize the database in order to shrink its physical size.
 
Lee
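
In MySQL, the optimize step mentioned above is usually done with OPTIMIZE TABLE. A minimal sketch, assuming the two table names used earlier in this thread and that the SpamFilter database is the selected default database:

```sql
-- Rebuilds both tables so that space left behind by deleted rows can be
-- reclaimed. The tables are locked while this runs, so it is best done
-- after hours, and it can take a long time on multi-gigabyte tables.
OPTIMIZE TABLE tblQuarantine, tblMsgs;
```

Whether the files on disk actually shrink depends on the storage engine. MyISAM data files shrink after the rebuild. InnoDB tables stored in the shared ibdata1 tablespace do not give space back to the operating system, so the file size stays the same even though the space is reusable inside MySQL.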

morten44
Posted: 26 July 2010 at 12:57pm
Hi
Thanks for your reply.
On a closer look I can see that after optimizing the database, tblquarantine is about 120MB, which is acceptable. I can't remember what it was before because my focus was on the tblmsgs table. That one is about 5GB and does not seem to get smaller.
 
Do you know what the table is used for?
Is there a way to empty it or make it smaller?
 
I can't even open it, as it freezes, probably because it's too big.

At the moment I have managed to get SpamFilter up and running again by disabling the database, but that is not optimal.
 
I seem to have this issue every time I set up this system. I have tried to set up Spam Filter ISP 3 times on 2 servers, and it runs fine for 2-3 weeks; then it starts to behave strangely and customers cannot connect to send or receive. At the same time the server becomes very slow, to the point of freezing.
If I browse inside the SpamFilter home directory using Windows Explorer and click on quarantine, Explorer freezes.
 
I hope there are others with similar issues who have a solution to the freezing or know how to make tblmsgs smaller.
 
 
Regards
Morten

yapadu
Senior Member
Joined: 12 May 2005
Posted: 27 July 2010 at 7:22am
tblmsgs stores the actual email that has been placed in quarantine.  You can only make this table smaller by storing the messages for fewer days.  If you currently store for 14 days, reducing that to 7 days would cut the table size by about half.

I see from this thread that you are doing your own cleanup and not relying on SpamFilter to do it, so you might want to make sure it is working.

How many messages do you process a day that you have a 5GB tblmsgs table?
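
One way to check whether a home-grown cleanup is working is to look at the date range actually held in quarantine and at the size of the two tables. This is a read-only sketch, assuming the SpamFilter database is the selected default database; the column aliases are illustrative.

```sql
-- Date range currently held in quarantine. If oldest_message is well
-- beyond the retention period, the cleanup is not removing old rows.
SELECT COUNT(*)     AS quarantine_rows,
       MIN(MsgDate) AS oldest_message,
       MAX(MsgDate) AS newest_message
FROM tblQuarantine;

-- Approximate row counts and on-disk size of both tables.
-- TABLE_ROWS is only an estimate for InnoDB tables.
SELECT TABLE_NAME,
       TABLE_ROWS,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND LOWER(TABLE_NAME) IN ('tblquarantine', 'tblmsgs');
```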
--------------------------------------------------------------
I am a user of SF, not an employee. Use any advice offered at your own risk.

morten44
Posted: 27 July 2010 at 5:58pm
Hi
Thanks for the reply.
We get about 120,000-150,000 incoming mails a day.
About 90% is spam.
I think our problem has to do with the size of the MySQL database: when it gets big, SpamFilter freezes while MySQL is running. When I stop MySQL, SpamFilter starts to work OK again.
SpamFilter works well as long as we don't use the quarantine database. It works well for 3 weeks after a new install, and then the problems start.
 
 

yapadu
Posted: 27 July 2010 at 6:13pm
So if you keep the email for two weeks, you end up with maybe a little over 2 million messages in that table.

It should not be a big issue for MySQL to hold that many records in a database.  What type of hardware is the database running on?

yapadu
Posted: 26 December 2010 at 6:56am
Originally posted by LogSat:

Then we issue the actual delete query: 
  
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0 
  
That deletes most of the rows from the tblQuarantine (and due to the database constraints, the related records in the tblMsgs), but may leave behind some "orphaned" rows in the tblMsgs. So we then issue the following as a backup to ensure all orphans are deleted as well:

DELETE tblMsgs FROM tblMsgs LEFT JOIN tblQuarantine 
ON tblMsgs.MsgID = tblQuarantine.MsgID WHERE (tblQuarantine.MsgID IS NULL)



I was just searching this forum to see what cleanup functions are being run as I found a bunch of old messages in the tblquarantine.

You mentioned a foreign key constraint, but neither my tblquarantine nor my tblmsgs has any.  I did some searching of the database scripts and found this one:

ALTER TABLE `tblQuarantine` ADD
    CONSTRAINT `FK_tblQuarantine_tblMsgs` FOREIGN KEY `FK_tblQuarantine_tblMsgs`
    (`MsgID`) REFERENCES `tblMsgs` (`MsgID`) ON DELETE CASCADE ;


This creates a constraint from tblQuarantine -> tblMsgs, so I assume that spamfilter inserts the message into tblMsgs first to generate the msgID needed for tblQuarantine.

A message record could not exist in tblQuarantine if there was no matching record in tblMsgs. But you delete messages via tblQuarantine, as tblMsgs has no date information.  So that constraint doesn't really do anything; it would actually need to be the other way around, no?
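
To see which foreign keys a given installation really has, the information_schema can be queried directly. This is a read-only sketch, assuming the SpamFilter database is the selected default database. One possible explanation for missing constraints, not confirmed anywhere in this thread, is the storage engine: MyISAM accepts FOREIGN KEY clauses in a script but neither stores nor enforces them, so the first query checks the engine as well.

```sql
-- Which storage engine do the two tables use?
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND LOWER(TABLE_NAME) IN ('tblquarantine', 'tblmsgs');

-- Foreign keys that actually exist in this database, with their direction
-- (TABLE_NAME is the child table, REFERENCED_TABLE_NAME the parent).
SELECT CONSTRAINT_NAME,
       TABLE_NAME,
       COLUMN_NAME,
       REFERENCED_TABLE_NAME,
       REFERENCED_COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
  AND REFERENCED_TABLE_NAME IS NOT NULL;
```

An empty result from the second query means no foreign key is in place, whatever the setup script contains.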
   


LogSat
Posted: 26 December 2010 at 11:26pm
Good observations :)

The foreign key constraint is, however, not enforced on inserts, just on deletes. This means that if a record in the tblMsgs table is deleted, then all records in the tblQuarantine table that point to it are automatically deleted by the database as well. Do not forget that if a spam email is sent to multiple users, we only store one record with the actual email contents in the tblMsgs, while we save multiple "headers" for the individual recipients in the tblQuarantine. This allows us to save lots of disk space as we only save the actual email once.

In reality, however, to optimize the routine cleanup process, we do not rely on the database's foreign key constraint, as that actually causes slowdowns: for each delete on the tblMsgs the database has to look up and individually delete all records from the tblQuarantine. We find it more efficient to first delete in bulk all old records from the tblQuarantine. There are no cascading deletes here, so the process is very fast. Once this is done, we delete all orphaned records in the tblMsgs table. As there are now no more "linked" records (we just deleted them all before), this process is very fast as well.

We left the foreign key cascade delete constraint as it's always better to have a good cleanup in place in case the entries in the tblMsgs table are deleted by an external process and our routine scheduled cleanup has been disabled...

yapadu
Posted: 27 December 2010 at 1:33am
How long do you estimate that query (DELETE tblMsgs FROM tblMsgs) would take to execute on say 750k rows?  And how often is spamfilter running that query?

From the looks of it, out of the box the tblMsgs has an index on msgid as it is the primary key for the table.

The msgid field in tblquarantine does not have an index (or is mine missing it?).
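
Both questions (is there an index, and does the cleanup join use it) can be checked from the MySQL client. A read-only sketch, assuming the SpamFilter database is the selected default database. Older MySQL versions only accept EXPLAIN in front of a SELECT, so the orphan DELETE is rewritten here as the equivalent SELECT.

```sql
-- Lists every index on the table; look for a row whose Column_name is MsgID.
SHOW INDEX FROM tblQuarantine;

-- Shows how MySQL would execute the orphan lookup. In the row for
-- tblQuarantine, the "key" column names the index chosen for the join;
-- NULL there means no index is used and the table is scanned instead.
EXPLAIN
SELECT tblMsgs.MsgID
FROM tblMsgs
LEFT JOIN tblQuarantine ON tblMsgs.MsgID = tblQuarantine.MsgID
WHERE tblQuarantine.MsgID IS NULL;
```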

I made a copy of my spamfilter database and tried to run the command and it took so long I just gave up at about 45 minutes.

What is the INI entry to disable the housekeeping?

I'm thinking of putting an index on tblquarantine.msgid.  The resulting join is faster but still quite slow on my server (about 3 minutes).  Something like the following is much faster, but does it create some other issue I might be overlooking?

delete from tblmsgs
where msgid < ( select min(msgid) from tblquarantine )
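
One thing this shortcut does by construction: it only removes tblmsgs rows whose msgid is lower than the lowest msgid still referenced in tblquarantine. Orphans with a higher msgid are left in place. If tblquarantine is empty, MIN(msgid) is NULL and nothing is deleted at all. The read-only sketch below counts the orphans the shortcut would miss. It assumes that msgid values increase over time (which the shortcut itself relies on) and it runs the same slow join as the full cleanup, so it is best tried on a copy of the database.

```sql
-- Orphaned message bodies that the MIN(msgid) shortcut would NOT delete:
-- rows with no quarantine entry whose msgid is at or above the lowest
-- msgid still present in tblquarantine.
SELECT COUNT(*) AS orphans_missed
FROM tblMsgs
LEFT JOIN tblQuarantine ON tblMsgs.MsgID = tblQuarantine.MsgID
WHERE tblQuarantine.MsgID IS NULL
  AND tblMsgs.MsgID >= (SELECT MIN(MsgID) FROM tblQuarantine);
```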

LogSat
Posted: 27 December 2010 at 11:19pm
The time would depend on the database server's speed and the type of database (SQL Server 2005 and higher, for example, are much faster than MySQL). Guesstimating, I'd say anywhere between 10 and 60 minutes. SpamFilter runs that query by default every 60 minutes, but that can of course be customized and/or disabled (from the "Database Setup" tab).

Are you 100% certain that the MsgID field in the tblQuarantine does not have an index? There should indeed be one, along with several others for other fields as well.

Are you running MySQL or Microsoft SQL Server? If you're running Microsoft SQL Server, you could disable SpamFilter's cleanup procedure (see above) and follow the thread at:
http://www.logsat.com/SpamFilter/Forums/forum_posts.asp?TID=6745&PID=13167
to schedule the cleanup with an optimized stored procedure.

If you're running MySQL, we sometimes suggest creating two extra indexes to improve performance during the cleanup. Both are made up of multiple fields:

Index for: emailto, msgid, msgdate, deliver, expire 
and one for: MsgID, Deliver, Expire, ServerID 

This may help increase performance while deleting records.

You can create the indexes easily using the MySQL Administrator, or if you wish, you can execute the SQL statements below: 

ALTER TABLE `SpamFilter`.`tblquarantine` ADD INDEX Optimize_1(`emailto`, `msgid`, `msgdate`, `deliver`, `expire`);
ALTER TABLE `SpamFilter`.`tblquarantine` ADD INDEX Optimize_2(`msgid`, `deliver`, `expire`, `serverid`);


In regards to your suggested query, I'd recommend against it. Deleting entries from the tblMsgs causes the database's built-in triggers we added as a safety-net to kick in to perform cascade deletes from the related records in the tblQuarantine, which should make the query run slowly. In addition, there isn't a parameter in it to specify how old the messages have to be before being deleted.

yapadu
Posted: 28 December 2010 at 7:13am
Thanks for the detailed feedback Roberto, we do have an index on that field.  I am just surprised how long it takes even with an index.

On my server with 512,000 records in tblquarantine and 360,000 records in tblmsgs the process takes between 2 - 4 minutes with an index.  With no index, you can forget about it.  I tried it, but aborted after 24 hours.

While this housekeeping is running, nothing further can be placed in quarantine, which causes an immediate backlog on any server that tries to quarantine items during that time.

If you have multiple spamfilter servers running, I don't see any need for both (you have a primary and secondary right?) of the servers to be executing the cleanup every hour.

We have made some changes to our system and can run two different routes to do our housekeeping.  Our 'regular' housekeeping now takes less than 4 seconds.  A few times a day we can do a 'deep' housekeeping and it takes about 30 - 40 seconds.

Will try it like this for a while and see how it goes. Our SpamFilter did not actually have a problem; we were just trying to make it faster, as we did notice the servers getting backlogged from time to time.  The backlog lasted a couple of minutes each time, so I suspect it was the housekeeping that was locking the tblmsgs table.
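
The post does not say how the 'regular' and 'deep' routines are built, so the following is not a reconstruction of them. It is only a generic sketch of one common way to keep table locks short in MySQL: delete in small batches and repeat until nothing is left. The batch size of 10000 is an arbitrary example value.

```sql
-- Sketch only: batch size and scheduling are assumptions, not SpamFilter
-- settings. Run each pair repeatedly (for example from a scheduled script)
-- until rows_deleted comes back as 0.

-- Batched version of the quarantine delete.
DELETE FROM tblQuarantine
WHERE Expire <> 0
LIMIT 10000;
SELECT ROW_COUNT() AS rows_deleted;

-- Batched version of the orphan delete. LIMIT is not allowed in the
-- multi-table DELETE ... JOIN form, so this uses a single-table DELETE
-- with NOT EXISTS instead.
DELETE FROM tblMsgs
WHERE NOT EXISTS (
  SELECT 1
  FROM tblQuarantine
  WHERE tblQuarantine.MsgID = tblMsgs.MsgID
)
LIMIT 10000;
SELECT ROW_COUNT() AS rows_deleted;
```

Each batch holds its locks for a shorter time than one large delete, so servers waiting to quarantine mail get a chance to write between batches. The total work is the same or slightly higher.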