MySQL commands for cleanup
leeH
Newbie, joined 22 October 2007
Posted: 17 June 2010 at 1:54pm
I would like to schedule my own database cleanup after hours, and I was wondering what command does it. I know it starts out like "DELETE tblMsgs FROM tblMsgs ...". Any help would be appreciated.
Lee
(Edited by leeH, 17 June 2010 at 1:55pm)

LogSat
Admin Group, joined 25 January 2005, location: United States
To remove old emails from the database we use the following:
UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= ADate;
where ADate is a parameter whose syntax depends on the database platform used. This marks the old records to be deleted. Then we issue the actual delete query:
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0;
That deletes most of the rows from tblQuarantine (and, due to the database constraints, the related records in tblMsgs), but may leave behind some "orphaned" rows in tblMsgs. So we then issue the following as a backup to ensure all orphans are deleted as well:
DELETE tblMsgs FROM tblMsgs LEFT JOIN tblQuarantine ON tblMsgs.MsgID = tblQuarantine.MsgID WHERE tblQuarantine.MsgID IS NULL;
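You can try the three statements above end to end on a toy database. This is a minimal sketch, not SpamFilter's actual code, and it makes several assumptions:
- It uses Python's standard-library sqlite3 module as a stand-in for MySQL.
- It reduces both tables to a few columns. `QID`, `EmailTo` and `Body` are hypothetical.
- It binds the cutoff date as a parameter in the role of ADate.
- It writes the orphan sweep as a `NOT EXISTS` subquery, because SQLite does not accept MySQL's multi-table `DELETE tblMsgs FROM tblMsgs LEFT JOIN ...` form.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tblMsgs (MsgID INTEGER PRIMARY KEY, Body TEXT);
CREATE TABLE tblQuarantine (
    QID INTEGER PRIMARY KEY,          -- hypothetical surrogate key
    MsgID INTEGER, EmailTo TEXT,
    MsgDate TEXT,                     -- ISO dates compare correctly as text
    Expire INTEGER NOT NULL DEFAULT 0
);
"""

def cleanup(conn, cutoff):
    """Mark old headers, bulk-delete them, then sweep orphaned bodies."""
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Flag quarantine headers dated on or before the cutoff (the ADate role).
        conn.execute("UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= ?", (cutoff,))
        # 2. Bulk-delete every flagged header.
        conn.execute("DELETE FROM tblQuarantine WHERE Expire <> 0")
        # 3. Delete bodies no header points to any more (the LEFT JOIN ... IS NULL step).
        conn.execute(
            "DELETE FROM tblMsgs WHERE NOT EXISTS "
            "(SELECT 1 FROM tblQuarantine q WHERE q.MsgID = tblMsgs.MsgID)"
        )

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tblMsgs VALUES (?, ?)",
                     [(1, "old spam"), (2, "new spam"), (3, "already orphaned")])
    conn.executemany(
        "INSERT INTO tblQuarantine (MsgID, EmailTo, MsgDate) VALUES (?, ?, ?)",
        [(1, "a@example.com", "2010-06-01"), (2, "b@example.com", "2010-06-16")])
    conn.commit()
    cleanup(conn, "2010-06-10")
    msgs = [r[0] for r in conn.execute("SELECT MsgID FROM tblMsgs ORDER BY MsgID")]
    headers = [r[0] for r in conn.execute("SELECT MsgID FROM tblQuarantine")]
    return msgs, headers

print(demo())
```

Here is what happens to the demo data:
- Message 1 is older than the cutoff, so its header is flagged and deleted, and its body is swept.
- Message 3 never had a header, so it is swept as an orphan.
- Only message 2 remains in both tables.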

morten44
Groupie, joined 07 March 2008
Hi
This sounds interesting.
I would like to do this as well, but as I am not very familiar with MySQL, I am not sure of the precise code.
I am using MySQL 5 with my SpamFilter.
In the SpamFilter application, I set spam to be kept for 14 days.
In the field for how often to clean out, I set 0 to deactivate it.
Now I would run a script:
UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= ADate;
and then
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0;
Would that work? Where in this code do I specify that only spam more than 14 days old is affected?
Hope someone can advise.
Regards
Morten

LogSat
Admin Group, joined 25 January 2005, location: United States
Morten,
This should work for MySQL 5:
UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= DATE_SUB(NOW(), INTERVAL 14 DAY);
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0;
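The 14 in `INTERVAL 14 DAY` is the retention period, so that is where the age limit goes. If the script is driven from a program rather than typed by hand, you can compute the cutoff on the client and bind it as a parameter.

The sketch below is hypothetical:
- `retention_cutoff` and `cleanup_statements` are illustrative names.
- The `%s` placeholders assume a MySQL driver with that parameter style, such as MySQL Connector/Python or PyMySQL. Check your own driver.
- Computing the time client-side uses the client's clock, whereas `NOW()` uses the database server's clock.

```python
from datetime import datetime, timedelta

def retention_cutoff(retention_days, now=None):
    """Client-side analogue of DATE_SUB(NOW(), INTERVAL n DAY)."""
    now = now if now is not None else datetime.now()
    return now - timedelta(days=retention_days)

def cleanup_statements(retention_days, now=None):
    """Return (sql, params) pairs to pass to a driver's cursor.execute()."""
    cutoff = retention_cutoff(retention_days, now)
    return [
        # Only headers dated on or before the cutoff are flagged...
        ("UPDATE tblQuarantine SET Expire = 1 WHERE MsgDate <= %s", (cutoff,)),
        # ...and only flagged headers are deleted.
        ("DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0", ()),
    ]

fixed_now = datetime(2010, 6, 17, 13, 54)
for sql, params in cleanup_statements(14, fixed_now):
    print(sql, params)
```

To keep spam for 7 days instead of 14, only the `retention_days` argument changes.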

morten44
Groupie, joined 07 March 2008
Hi Roberto
Thanks again for your support.
I know the question is a little outside the scope of SpamFilter support, so thanks again for taking the time to answer.
Kind Regards
Morten

morten44
Groupie, joined 07 March 2008
Hi
Hope you still see this old post.
Today I tried the script to delete all spam mail in quarantine older than 7 days.
It does not seem to work.
The database files are 6 GB both before and after I run the script, and there is currently 18 days of spam mail stored. As I set it to delete everything older than 7 days, I expected the database files to get much smaller, so I don't think it works.
You can see a screenshot of the script I ran and what it looked like when it completed.
Can you see what the problem is?
Kind Regards
Morten

leeH
Newbie, joined 22 October 2007
I wasn't sure what was going on either, but it turned out that you also need to optimize the database tables in order to shrink their physical size.
Lee
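You can reproduce the effect Lee describes on a small scale. Deleting rows frees space inside the database file but does not hand it back to the operating system; only a rebuild shrinks the file.

The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in. SQLite's rebuild command is `VACUUM`. The corresponding MySQL statement is `OPTIMIZE TABLE tblQuarantine, tblMsgs;`.

On MySQL, whether the file on disk actually gets smaller also depends on the storage engine. For InnoDB it also depends on whether `innodb_file_per_table` is enabled: tables kept in the shared system tablespace never give space back to the filesystem.

```python
import os
import sqlite3
import tempfile

def file_sizes(path, keep=200, total=2000):
    """Return the file size after inserting, after DELETE, and after VACUUM."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA auto_vacuum = NONE")  # ensure DELETE alone never truncates
    conn.execute("CREATE TABLE tblMsgs (MsgID INTEGER PRIMARY KEY, Body BLOB)")
    conn.executemany(
        "INSERT INTO tblMsgs (Body) VALUES (?)",
        ((b"x" * 4096,) for _ in range(total)),  # ~4 KB per stored message
    )
    conn.commit()
    full = os.path.getsize(path)

    conn.execute("DELETE FROM tblMsgs WHERE MsgID > ?", (keep,))
    conn.commit()
    after_delete = os.path.getsize(path)  # freed pages stay inside the file

    conn.execute("VACUUM")  # rebuilds the file; must run outside a transaction
    after_vacuum = os.path.getsize(path)
    conn.close()
    return full, after_delete, after_vacuum

with tempfile.TemporaryDirectory() as tmp:
    sizes = file_sizes(os.path.join(tmp, "quarantine.db"))
print(sizes)
```

After the DELETE the file is no smaller, even though 90% of the rows are gone. After the rebuild it shrinks to a fraction of its former size.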

morten44
Groupie, joined 07 March 2008
Hi
Thanks for your reply.
Looking more closely, I can see that after optimizing the database, tblQuarantine is about 120 MB, and that is acceptable. I can't remember what it was before, because my focus was on the tblMsgs table. That one is about 5 GB and does not seem to get smaller.
Do you know what that table is used for?
Is there a way to empty it or make it smaller?
I can't even open it, as it freezes, probably because it's too big.
At the moment I have managed to get SpamFilter up and running again by disabling the database, but that's not optimal. I seem to have this issue every time I set up this system. I have set up ISP SpamFilter 3 times on 2 servers, and it runs fine for 2-3 weeks; then it starts to behave strangely and customers cannot connect and send/receive. At the same time the server becomes very slow, to the point of freezing.
If I use Windows Explorer to browse the SpamFilter home directory and click on the quarantine folder, Explorer freezes.
I hope someone with similar issues has a solution to the freezing, or knows how to make tblMsgs smaller.
Regards
Morten

yapadu
Senior Member, joined 12 May 2005
tblMsgs stores the actual email that has been placed in quarantine. You can only make this table smaller by storing the messages for fewer days: if you currently store them for 14 days and reduce that to 7, the table size should drop by about half.
I see from this thread that you are doing your own cleanup rather than relying on SpamFilter to do it, so you might want to make sure it is working. How many messages do you process a day to end up with a 5 GB tblMsgs table?
--------------------------------------------------------------
I am a user of SF, not an employee. Use any advice offered at your own risk.

morten44
Groupie, joined 07 March 2008
Hi
Thanks for your reply.
We get about 120,000-150,000 incoming mails a day.
About 90% is spam.
I think our problem has to do with the size of the MySQL database: when it gets big, SpamFilter freezes while MySQL is running. When I stop MySQL, SpamFilter starts to work OK again.
Our SpamFilter program works well as long as we don't use the quarantine database. It works well for 3 weeks after a new install, and then the problems start.

yapadu
Senior Member, joined 12 May 2005
So if you keep the email for two weeks, you end up with maybe a little over 2 million messages in that table.
It should not be a big issue for MySQL to hold that many records in a database. What type of hardware is the database running on?
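The estimate above is simply daily volume multiplied by the number of days retained. The same linear relationship is behind the earlier rule of thumb that halving retention roughly halves tblMsgs.

Here is a trivial sketch. It assumes a steady daily volume and, like the estimate above, counts all incoming mail rather than only the share that ends up in quarantine.

```python
def rows_held(messages_per_day, retention_days):
    """Steady-state row count if every day's mail is kept for retention_days."""
    return messages_per_day * retention_days

for daily in (120_000, 150_000):  # Morten's reported daily volume range
    print(f"{daily:,}/day: 14 days -> {rows_held(daily, 14):,}, "
          f"7 days -> {rows_held(daily, 7):,}")
```

At 150,000 messages a day, 14 days gives 2,100,000 rows and 7 days gives 1,050,000.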

yapadu
Senior Member, joined 12 May 2005
I was just searching this forum to see what cleanup functions are being run, as I found a bunch of old messages in tblQuarantine. You mentioned a foreign key constraint, but neither my tblQuarantine nor my tblMsgs has one. I did some searching of the database scripts and found this one:
ALTER TABLE `tblQuarantine` ADD CONSTRAINT `FK_tblQuarantine_tblMsgs` FOREIGN KEY `FK_tblQuarantine_tblMsgs` (`MsgID`) REFERENCES `tblMsgs` (`MsgID`) ON DELETE CASCADE;
This creates a constraint from tblQuarantine -> tblMsgs, so I assume that SpamFilter inserts the message into tblMsgs first to generate the MsgID needed for tblQuarantine. A record could not exist in tblQuarantine without a matching record in tblMsgs, but you delete messages via tblQuarantine, since tblMsgs has no date information. So that constraint doesn't really do anything; wouldn't it actually need to be the other way around?
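You can check the direction of the constraint with a small sketch. The setup is as follows:
- SQLite (through Python's sqlite3) stands in for MySQL. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.
- The constraint is shaped like `FK_tblQuarantine_tblMsgs`.
- `QID`, `EmailTo` and `Body` are hypothetical columns.

Deleting a tblMsgs row cascades to its tblQuarantine headers. Deleting headers, however, leaves the tblMsgs body behind as an orphan.

```python
import sqlite3

def fk_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE tblMsgs (MsgID INTEGER PRIMARY KEY, Body TEXT);
        CREATE TABLE tblQuarantine (
            QID INTEGER PRIMARY KEY,
            MsgID INTEGER REFERENCES tblMsgs (MsgID) ON DELETE CASCADE,
            EmailTo TEXT);
        INSERT INTO tblMsgs VALUES (1, 'spam A'), (2, 'spam B');
        INSERT INTO tblQuarantine (MsgID, EmailTo) VALUES
            (1, 'a@example.com'), (1, 'b@example.com'), (2, 'c@example.com');
    """)
    # Parent -> child: deleting body 1 removes both of its headers.
    conn.execute("DELETE FROM tblMsgs WHERE MsgID = 1")
    headers_left = conn.execute("SELECT COUNT(*) FROM tblQuarantine").fetchone()[0]
    # Child -> parent: deleting the header for body 2 does NOT remove body 2.
    conn.execute("DELETE FROM tblQuarantine WHERE MsgID = 2")
    bodies_left = conn.execute("SELECT COUNT(*) FROM tblMsgs").fetchone()[0]
    return headers_left, bodies_left

print(fk_demo())
```

The second count is exactly the orphan case that the extra `LEFT JOIN ... IS NULL` delete in the cleanup is there to catch.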

LogSat
Admin Group, joined 25 January 2005, location: United States
Good observations.
The foreign key constraint is, however, not enforced on inserts, just on deletes. This means that if a record in the tblMsgs table is deleted, all records in the tblQuarantine table that point to it are automatically deleted by the database as well. Do not forget that if a spam email is sent to multiple users, we only store one record with the actual email contents in tblMsgs, while we save multiple "headers" for the individual recipients in tblQuarantine. This allows us to save lots of disk space, as we only save the actual email once.
In reality, however, to optimize the routine cleanup process we do not rely on the database's foreign key constraint, as that is actually a cause of slowdowns: for each delete on tblMsgs, the database has to look up and individually delete all records from tblQuarantine. We find it more efficient to first delete in bulk all old records from tblQuarantine. There are no cascading deletes here, so the process is very fast. Once this is done, we delete all orphaned records in the tblMsgs table. As there are now no more "linked" records (we just deleted them all before), this process is very fast as well.
We left the foreign key cascade delete constraint in place because it's always better to have a good cleanup in place in case the entries in the tblMsgs table are deleted by an external process and our routine scheduled cleanup has been disabled.
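The one-body, many-headers layout explains why the orphan sweep is safe to run after the bulk header delete: a tblMsgs row is only removed once no tblQuarantine row refers to it.

This is a minimal sketch. SQLite (through Python's sqlite3) stands in for MySQL, and the columns other than MsgID are hypothetical. The sweep is written as `NOT EXISTS`, the SQLite-friendly equivalent of the `LEFT JOIN ... IS NULL` delete quoted earlier in the thread.

```python
import sqlite3

ORPHAN_SWEEP = (
    "DELETE FROM tblMsgs WHERE NOT EXISTS "
    "(SELECT 1 FROM tblQuarantine q WHERE q.MsgID = tblMsgs.MsgID)"
)

def shared_body_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblMsgs (MsgID INTEGER PRIMARY KEY, Body TEXT);
        CREATE TABLE tblQuarantine (QID INTEGER PRIMARY KEY, MsgID INTEGER, EmailTo TEXT);
        -- one stored body, three per-recipient headers
        INSERT INTO tblMsgs VALUES (1, 'same spam to three users');
        INSERT INTO tblQuarantine (MsgID, EmailTo) VALUES
            (1, 'a@example.com'), (1, 'b@example.com'), (1, 'c@example.com');
    """)
    bodies = []
    for user in ("a@example.com", "b@example.com", "c@example.com"):
        conn.execute("DELETE FROM tblQuarantine WHERE EmailTo = ?", (user,))
        conn.execute(ORPHAN_SWEEP)  # removes the body only when no header is left
        bodies.append(conn.execute("SELECT COUNT(*) FROM tblMsgs").fetchone()[0])
    return bodies

print(shared_body_demo())
```

The body survives the first two header deletions. It disappears only after the last recipient's header is gone.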

yapadu
Senior Member, joined 12 May 2005
How long do you estimate that query (DELETE tblMsgs FROM tblMsgs) would take to execute on, say, 750k rows? And how often does SpamFilter run that query?
From the looks of it, out of the box tblMsgs has an index on MsgID, as it is the primary key for the table. The MsgID field in tblQuarantine does not have an index (or is mine missing it?). I made a copy of my SpamFilter database and tried to run the command, and it took so long that I gave up after about 45 minutes. What is the INI entry to disable the housekeeping?
I'm thinking of putting an index on tblQuarantine.MsgID. The resulting join is faster, but still quite slow on my server (about 3 minutes). Something like the following is much faster, but does it create some other issue I might be overlooking?
DELETE FROM tblMsgs WHERE MsgID < (SELECT MIN(MsgID) FROM tblQuarantine);
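A toy table shows one gap in the `MIN(MsgID)` shortcut. This minimal sketch uses SQLite through Python's sqlite3 as a stand-in for MySQL, with invented data, and assumes that MsgIDs grow over time.

The shortcut only removes bodies older than the oldest surviving header. An orphan whose MsgID is above that minimum is therefore left behind, for instance when all of its headers were removed ahead of the age-based cleanup (a hypothetical case). The anti-join removes it.

There is a further edge case: if tblQuarantine is empty, `MIN()` returns NULL, the comparison is never true, and the shortcut deletes nothing.

```python
import sqlite3

# Shortcut from the post: treats every body older than the oldest header as garbage
MIN_SHORTCUT = "DELETE FROM tblMsgs WHERE MsgID < (SELECT MIN(MsgID) FROM tblQuarantine)"
# Full anti-join: removes every body with no header, wherever its MsgID falls
ANTI_JOIN = ("DELETE FROM tblMsgs WHERE NOT EXISTS "
             "(SELECT 1 FROM tblQuarantine q WHERE q.MsgID = tblMsgs.MsgID)")

def remaining_after(statement):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblMsgs (MsgID INTEGER PRIMARY KEY, Body TEXT);
        CREATE TABLE tblQuarantine (QID INTEGER PRIMARY KEY, MsgID INTEGER);
        INSERT INTO tblMsgs VALUES (1, 'old'), (2, 'kept'), (3, 'orphan'), (4, 'kept');
        -- hypothetical state: the headers for bodies 1 and 3 are already gone
        INSERT INTO tblQuarantine (MsgID) VALUES (2), (4);
    """)
    conn.execute(statement)
    return [row[0] for row in conn.execute("SELECT MsgID FROM tblMsgs ORDER BY MsgID")]

print(remaining_after(MIN_SHORTCUT))  # body 3 survives
print(remaining_after(ANTI_JOIN))
```

One option is to use the shortcut as a cheap frequent pass and run the full anti-join less often. That is a trade-off to test, not a recommendation from SpamFilter.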

LogSat
Admin Group, joined 25 January 2005, location: United States
The time would depend on the database server's speed and the type of database (SQL Server 2005 and higher, for example, are much faster than MySQL). Guesstimating, I'd say anywhere between 10 and 60 minutes. SpamFilter runs that query every 60 minutes by default, but that can of course be customized and/or disabled (from the "Database Setup" tab).
Are you 100% certain that the MsgID field in tblQuarantine does not have an index? There should indeed be one, along with several others on other fields. Are you running MySQL or Microsoft SQL Server?
If you're running Microsoft SQL Server, you could disable SpamFilter's cleanup procedure (see above) and follow the thread at http://www.logsat.com/SpamFilter/Forums/forum_posts.asp?TID=6745&PID=13167 to schedule the cleanup with an optimized stored procedure.
If you're running MySQL, we sometimes suggest creating extra indexes to improve performance during the cleanup. Both are made up of multiple fields: one on (emailto, msgid, msgdate, deliver, expire) and one on (MsgID, Deliver, Expire, ServerID). This may help increase performance while deleting records. You can create the indexes easily using MySQL Administrator or, if you wish, execute the SQL statements below:
ALTER TABLE `SpamFilter`.`tblquarantine` ADD INDEX Optimize_1(`emailto`, `msgid`, `msgdate`, `deliver`, `expire`);
ALTER TABLE `SpamFilter`.`tblquarantine` ADD INDEX Optimize_2(`msgid`, `deliver`, `expire`, `serverid`);
In regard to your suggested query, I'd recommend against it. Deleting entries from tblMsgs causes the database's built-in triggers, which we added as a safety net, to kick in and perform cascade deletes of the related records in tblQuarantine, which would make the query run slowly. In addition, there isn't a parameter in it to specify how old the messages have to be before being deleted.
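Whether an index can serve the orphan lookup depends on its leading column. A composite B-tree index is searched efficiently through a prefix of its columns. `Optimize_2` starts with msgid, so it can answer "is there a header with this MsgID?". `Optimize_1` starts with emailto, so in general it cannot.

The sketch below checks this with SQLite's `EXPLAIN QUERY PLAN`. SQLite (through Python's sqlite3) stands in for MySQL, and the column types are assumptions.

On MySQL, `EXPLAIN` gives similar information. It accepts DELETE statements only from version 5.6 on, so on MySQL 5.0 to 5.5 you would explain an equivalent SELECT instead.

```python
import sqlite3

ORPHAN_SWEEP = ("DELETE FROM tblMsgs WHERE NOT EXISTS "
                "(SELECT 1 FROM tblQuarantine q WHERE q.MsgID = tblMsgs.MsgID)")

def plan_details():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblMsgs (MsgID INTEGER PRIMARY KEY, Body TEXT);
        CREATE TABLE tblQuarantine (
            QID INTEGER PRIMARY KEY, EmailTo TEXT, MsgID INTEGER,
            MsgDate TEXT, Deliver INTEGER, Expire INTEGER, ServerID INTEGER);
        -- the two composite indexes suggested above, in SQLite syntax
        CREATE INDEX Optimize_1 ON tblQuarantine (EmailTo, MsgID, MsgDate, Deliver, Expire);
        CREATE INDEX Optimize_2 ON tblQuarantine (MsgID, Deliver, Expire, ServerID);
    """)
    rows = conn.execute("EXPLAIN QUERY PLAN " + ORPHAN_SWEEP).fetchall()
    return [row[-1] for row in rows]  # last column holds the readable plan step

for step in plan_details():
    print(step)
```

The plan should show the per-message lookup searching `Optimize_2`. Without a usable index, each tblMsgs row would force a scan of tblQuarantine, which fits the very long run times reported without the index.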

yapadu
Senior Member, joined 12 May 2005
Thanks for the detailed feedback, Roberto. We do have an index on that field; I am just surprised how long it takes even with an index.
On my server, with 512,000 records in tblQuarantine and 360,000 records in tblMsgs, the process takes between 2 and 4 minutes with an index. Without the index, you can forget about it: I tried it, but aborted after 24 hours. While the housekeeping is running, nothing further can be placed in quarantine, which causes an immediate backlog on servers trying to quarantine items. If you have multiple SpamFilter servers running (you have a primary and a secondary, right?), I don't see any need for both of them to execute the cleanup every hour.
We have made some changes to our system and can now run our housekeeping in two different ways. Our 'regular' housekeeping now takes less than 4 seconds. A few times a day we run a 'deep' housekeeping, which takes about 30-40 seconds. We will try it like this for a while and see how it goes. Our SpamFilter actually did not have a problem; we were just trying to make it faster, as we did notice the servers getting backlogged from time to time. The backlog lasted a couple of minutes each time, so I suspect it was the housekeeping locking the tblMsgs table.