Spam Filter ISP Support Forum

RegEx filter

Erik Reed
Guest Group
Posted: 27 June 2003 at 12:01pm

I tried using the "magic" block pattern suggested previously in this forum:

(<[!--]*[a-zA-Z0-9]{11,})

Immediately upon applying this, EVERY email received was blocked... What did I do wrong here? And no, the messages were perfectly valid emails. I removed this line immediately. On inspecting the messages, I noticed a lot of stuff like !--fjkdfdfkjdjf in the headers, but none of it was preceded by a <. Most of the messages were from AOL accounts, which appear to add a TON of junk to the headers...

Any help would be appreciated, as I want to eliminate the junk hidden in invisible (invalid) HTML tags...

Thanks

Desperado
Senior Member
Joined: 27 January 2005
Location: United States
Points: 1143
Posted: 27 June 2003 at 12:33pm

Erik,

If you actually are using the following:

(<[!--]*[a-zA-Z0-9]{11,})

It should work. However, it does produce some "false positives", though not on ALL messages. An earlier posting (I cannot remember by whom) made the following very subtle change:

(<[!--]+[a-zA-Z0-9]{11,})

This is the one I am using, and I have not seen or received comments about ANY "false positives".

Double check your actual entered expression (and change the "*" to "+") and see if it works then.

Dan S.
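
A minimal sketch, in Python's re module, of why the "*" version caught everything and the "+" version did not; that SpamFilter's own RegEx engine parses the pattern the same way is an assumption. Note that [!--] is a character class, not the literal text "!--": inside brackets, "!--" is the range from "!" to "-" (! " # $ % & ' ( ) * + , -). With "*" that class may match zero characters, so any "<" followed by 11 or more letters or digits matches, as in a typical Message-ID header if headers are scanned. With "+", at least one character from the class must follow the "<". The sample strings are made up for illustration.

```python
import re

# The class is written "[!-\-]" here: the same range as the forum's "[!--]"
# ("!" 0x21 through "-" 0x2D), with the second hyphen escaped so Python does
# not emit its "Possible set difference" FutureWarning.
STAR = re.compile(r"(<[!-\-]*[a-zA-Z0-9]{11,})")  # original "*" pattern
PLUS = re.compile(r"(<[!-\-]+[a-zA-Z0-9]{11,})")  # the "+" variant

# Hypothetical sample lines, not taken from real messages.
samples = {
    "message_id": "Message-ID: <200306271201abcdef@example.com>",
    "hidden_comment": "Buy now<!--fjkdfdfkjdjf-->!",
    "normal_html": "<html><body>Hello</body></html>",
}

for name, text in samples.items():
    # "*" lets the class match nothing, so "<" plus 11+ alphanumerics is enough.
    print(f"{name}: star={bool(STAR.search(text))} plus={bool(PLUS.search(text))}")
# message_id: star=True plus=False
# hidden_comment: star=True plus=True
# normal_html: star=False plus=False
```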

 

Erik Reed
Guest Group
Posted: 27 June 2003 at 2:02pm

Thanks.

I just applied the "subtle" change, and it seems to be working...

Dan B
Senior Member
Joined: 09 February 2005
Location: United States
Points: 105
Posted: 27 June 2003 at 9:26pm

We had that in our keyword black list, but I had to remove it. It was catching messages from anyone using IncrediMail, which inserts the comment <!--IncrediMail--> into the message body. That triggers the filter because "IncrediMail" is 11 characters long.

It's too bad that we can't use logic such as if/then/else statements within the black list, e.g. "if the message contains <!--IncrediMail-->, bypass the regex filter."

If anyone has a fix for this, please send it my way.

Thanks,
Dan
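
If SpamFilter's RegEx engine supports negative lookahead (an assumption; this thread doesn't say), something close to the requested "if IncrediMail, then bypass" logic can live inside the pattern itself. A minimal Python sketch: the (?!IncrediMail-->) lookahead rejects the match only when the alphanumeric run is the IncrediMail marker, so other hidden junk comments in the same message are still caught. The pattern name PLUS_SKIP_INCREDIMAIL and the sample strings are hypothetical.

```python
import re

# "[!-\-]" is the same "!".."-" range as the forum's "[!--]" (hyphen escaped).
PLUS = re.compile(r"(<[!-\-]+[a-zA-Z0-9]{11,})")

# Hypothetical variant: the negative lookahead refuses to start the 11+
# character run at "IncrediMail-->", so that one comment no longer matches.
PLUS_SKIP_INCREDIMAIL = re.compile(r"(<[!-\-]+(?!IncrediMail-->)[a-zA-Z0-9]{11,})")

incredimail = "Hi Bob<!--IncrediMail-->, see you Friday."
junk = "Cheap meds<!--fjkdfdfkjdjf-->now"
both = incredimail + " " + junk  # an IncrediMail message that also hides junk

for label, text in (("incredimail", incredimail), ("junk", junk), ("both", both)):
    print(label, bool(PLUS.search(text)), bool(PLUS_SKIP_INCREDIMAIL.search(text)))
# incredimail True False
# junk True True
# both True True
```

Unlike a whole-message bypass, this only ignores the IncrediMail marker itself, so a spammer cannot slip junk tags past the filter just by adding <!--IncrediMail--> to the body.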

 

Desperado
Senior Member
Posted: 27 June 2003 at 11:12pm

Two things: I guess you could change the 11 to a 12, and I do not think I am getting false positives on the expression. Do you have a good way to search a 2GB DB for IncrediMail? One that won't leave me growing old while I wait?

Dan S.
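
The "change the 11 to a 12" suggestion works because "IncrediMail" is exactly 11 characters. A minimal Python sketch of the trade-off (sample strings hypothetical; the last one is Erik's example, which happens to be 12 characters): {12,} stops matching the IncrediMail comment, but it also lets through any junk run of exactly 11 characters.

```python
import re

# "[!-\-]" is the same "!".."-" range as the forum's "[!--]" (hyphen escaped).
PLUS_11 = re.compile(r"(<[!-\-]+[a-zA-Z0-9]{11,})")
PLUS_12 = re.compile(r"(<[!-\-]+[a-zA-Z0-9]{12,})")

print(len("IncrediMail"))  # 11

tests = (
    "<!--IncrediMail-->",   # 11-character marker
    "<!--fjkdfdfkjdj-->",   # 11-character junk run
    "<!--fjkdfdfkjdjf-->",  # 12-character junk run (Erik's example)
)
for text in tests:
    print(text, bool(PLUS_11.search(text)), bool(PLUS_12.search(text)))
# <!--IncrediMail--> True False
# <!--fjkdfdfkjdj--> True False
# <!--fjkdfdfkjdjf--> True True
```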

 

Dan B
Senior Member
Posted: 28 June 2003 at 12:58am

Dan,

A few questions for you:
How many SF servers do you have online?
How is the performance of your SF server?
What database are you using?

We are having problems with memory consumption.  Here are the specs of our 4 servers.

All 4 servers are identical:
AMD 2200XP single processor
512MB DDR ram
20GB HD
Windows 2000
SP4

Database server
AMD 2500XP single processor
1GB DDR ram
4x40GB striped HD
Windows 2000
SP4
MS SQL 7 w/SP4

Every time SF does a quarantine auto-refresh, the CPU on that server spikes to 100%. The CPU spike lasts for about 4 minutes. I monitor the system resources, and while the CPU is at 100%, the memory usage history starts to climb. After the CPU comes back to normal, the memory does not get released. I do not know what the auto-refresh interval is, but each time it runs, the CPU again hits 100% and memory climbs further. This continues until there is no more memory available, at which point SF either stops responding or throws an application error on the screen.

The database has 2 days of quarantine data in it. About 366,000 records in the tblquarantine and 296,000 records in the tblmsgs.

I feel that if there was an option to disable that auto refresh it may solve the issue.
Roberto, can you add a feature to disable the automatic quarantine refresh? We do not use the GUI to delete quarantine messages; we have a scheduled task that deletes the tblquarantine & tblmsgs records, and it’s much faster than having SF do it.
I’m still waiting for the SF feature to disable the expired-record delete interval. I do have it set to zero, and the error “Exception occurred during TimerMinuteTimer: Division by zero” is displayed. The reason I want it disabled: whenever that delete was running, we were getting higher memory consumption, and more often.

We were using MySQL to handle the load until tblquarantine reached about 600,000 records. Once that happened, we started getting more table read & write locks. When that occurred, all the SpamFilter servers simply stopped responding, since they couldn’t insert or select records. We actually had to stop and restart the MySQL service to get the SpamFilters back to normal. I prefer to use MySQL instead of MS SQL, and not because of price: I feel that it’s faster on select & write statements.

Do you or anyone out there have any solutions on this matter?
I guess I’m tired of babysitting these servers every minute of the day.

Thanks,
Dan B
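
The "TimerMinuteTimer: Division by zero" message suggests a once-a-minute timer that divides by the interval setting. SpamFilter's code isn't shown here, so this is only a hypothetical Python sketch (the function name delete_is_due and its logic are assumptions) of a timer check that treats 0 as "disabled", the behavior later announced for build 1.2.0.171, instead of using it as a divisor.

```python
def delete_is_due(minutes_elapsed: int, interval_minutes: int) -> bool:
    """Hypothetical per-minute check: should the expired-quarantine delete run now?

    A naive `minutes_elapsed % interval_minutes == 0` raises ZeroDivisionError
    when the interval is 0, so 0 (or less) is treated as "disabled" first.
    """
    if interval_minutes <= 0:
        return False  # disabled: never run the scheduled delete
    return minutes_elapsed % interval_minutes == 0


print(delete_is_due(120, 60))  # True: 120 minutes is a multiple of 60
print(delete_is_due(90, 60))   # False
print(delete_is_due(120, 0))   # False: disabled, no division performed

try:
    _ = 120 % 0  # the unguarded version
except ZeroDivisionError as exc:
    print("unguarded:", exc)
```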

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Points: 4106
Posted: 28 June 2003 at 1:39am

Dan B, Dan S.,

The large memory consumption is mainly caused by the quarantine grid display. We're testing a new build internally with some enhancements, one of which is a way to clear the grid display, which releases all the memory it uses. I was not aware you had such large databases; this should greatly decrease RAM usage, possibly down to just a few dozen MB.

We've also added the "0" option to disable the quarantine delete as requested.

The release notes for this build are the following:

// New to VersionNumber = '1.2.0.171';
{TODO -cNew : Added button to clear quarantine grid to conserve memory}
{TODO -cNew : Setting to 0 the delete expired quarantine interval disables such interval}


// New to VersionNumber = '1.2.0.169';
{TODO -cFix : RegEx searches in black/white lists incorrect}
{TODO -cFix : Possibly solved issues with SpamFilter settings being cleared}
{TODO -cNew : IP Blacklist now allows .0.0 and .0.0.0 matches for class B & class A address matches}

I'll be sending you both a private email with a link to the updated EXEs, if you wish to test them before we release the build officially.

Roberto Franceschetti
LogSat Software

Desperado
Senior Member
Posted: 28 June 2003 at 4:36am

Dan,

I have been up for a VERY long time. I will go into more detail after I get some rest, but I will address some of your questions now. All of this is PRIOR to the "Private Build".


>>>How many SF servers do you have online?
I am running 2 SpamFilters. One is my primary and takes about 85% of the traffic; the other is my secondary.


>>>How is the performance of your SF server?
I don't seem to have as many issues as you, but as I told Roberto, I am just happy that it is offloading my in-line antivirus server. That was my real issue prior to installing SpamFilter-ISP.

>>>What database are you using?
I am running MS SQL 2000

>>>We are having problems with memory consumption.
Memory was fairly high (300MB) but did not seem to cause me an issue.  With the new build it seems MUCH lower so far but this is a somewhat low traffic period for us.

My primary server sits on the same Ethernet segment as my DB server.

Primary Specs:
Dell PE 1400
Dual PIII 866 Procs
2GB RAM
PERC RAID 5 Array = 3 18GB  (For 36 GB Total w/ RAID-5)
Win 2K SP3 (With full patches)

Secondary Server is 20 Miles away in my home on a T1
HP LC3 Net Server  (No comments!)
Dual PIII 500 procs
1GB RAM
PERC RAID 5 Array = 3 9GB  (For 18 GB Total w/ RAID-5)
40GB non redundant Aux HD
Same OS

Database server
Dell PE 1400
Dual PIII 866 Procs
2GB RAM
PERC RAID 5 Array = 3 18GB  (For 36 GB Total w/ RAID-5)
Win 2K SP3 (With full patches)
MS SQL 2000 SP3 (plus MS's "oops patch")
MDAC 2.7 SP1

>>>Every time the SF does a quarantine auto refresh the CPU on that server spikes 100%....
I see higher CPU, but not 100%. I also have 11 other DBs running under the same SQL instance, doing real-time web server log statistics for about 75 websites; that is where most of my SQL server load comes from. SpamFilter has crashed once in the last 5 days, and my service recovery setting restarted it normally. I did not even get an alarm, because it wasn't down long enough. I believe my higher memory capacity is the major difference here.

>>>>The database has 2 days of quarantine data in it.
I have a 14-day expiration. My customers also tend to "prune" their quarantines very often; we have built some highly customized ASP pages to make their cleanup tasks painless. I clear over 3000 a day just from my personal accounts.

>>>>About 366,000 records in the tblquarantine and 296,000 records in the tblmsgs.
I hover around the same values, so my traffic must be lower than yours: about 50,000 total inbound messages a day. My database is around 2GB.


>>>I feel that if there was an option to disable that auto refresh it may solve the issue.
I agree, and it looks like the custom build won't refresh unless you ask it to. I like that, because I rarely use the GUI and don't want the extra load when I do use it.

>>>We have a scheduled task to delete the tblquarantine & tblmsgs records it’s much faster deleting them then SF is doing it.
I am not sure what your issue is here.  I let my users and my expire timeout do the deleting and don't see that as a problem.  Am I missing something with what you are saying?


>>>I’m still waiting on the feature in SF to disable the interval to delete expired records and I do have it set to zero and the error “Exception occurred during TimerMinuteTimer: Division by zero” does display.  Because what that was running we were getting more memory consumption more often.

Again,  I am not seeing this ... perhaps the higher RAM and perhaps the dual procs is helping.  I have my delete interval set to (the default?) 60 minutes.

>>>We were using MySQL .....
I have just enough experience with MySQL to get into trouble. I actually like MS SQL, but agree on the pricing. We have found MS SQL 2000 to be MUCH better than v7.


>>>I guess I’m tired of babysitting these servers every min of the day.
I have been fortunate: I have not needed to babysit the system any more than I do any other system. I have a very good set of monitors that will page me if something horrible happens, and it hasn't yet.

As a side note, I am almost worried about how little memory this test build is using: still only 15MB, and about 10MB on my secondary.

SOME KEY NOTES: I have MS SQL set to use 1GB of RAM, static and NOT dynamic. Also, and this is really critical, the DB transaction logging (recovery model) is set to "SIMPLE". I also do not allow my system page file to resize... if I run out of memory, well, I guess I crash, but I have sized it so that does not happen. The data partition for the databases ABSOLUTELY CANNOT USE NTFS COMPRESSION, unless you like corrupt data and want SQL to try to constantly repair it. My SQL "Maintenance Plan" DOES NOT re-index or reorganize the data.

I have to get home now. More later.  I asked Roberto to give you my e-mail address if you want to shout at me directly.

Regards,

Dan S.

Desperado
Senior Member
Posted: 28 June 2003 at 4:41am

Dan,

 

Here are my current (as of "now") DB statistics:

Current Database Contents

Current Msg. total: 333144
Marked for deletion: 232
No Reverse DNS: 120827
IP in MAPS B/L: 158633
Email From B/L: 39
Exc'd max RCPT TO: 562
Banned Keywords: 51897
Email TO B/L: 0
Domain B/L: 954
IP B/L: 0
Blocked by Bluto: 285771
Blocked by Raptor: 47373

DNSBL Distribution

relays.osirusoft.com: 51597
blackholes.easynet.nl: 53778
list.dsbl.org: 5399
dnsbl.njabl.org: 48020

Dan S

Desperado
Senior Member
Posted: 28 June 2003 at 4:46am

Roberto,

So far, the memory usage is VERY low, around 13MB or so. I will let you know what happens when traffic picks up. The lack of quarantine information is "just what the doctor ordered"; I like it better. Question: what effect will lowering or disabling the delete interval have? I am not aware of it causing any problem...

Regards,

Dan S.

Desperado
Senior Member
Posted: 28 June 2003 at 5:56am

Roberto,

Some stats:

SpamFilter Mem after 2 hours running  ~ 12MB

SpamFilter Mem after refreshing DB ~ 255MB

SpamFilter Mem after clearing Data Grid ~ 12MB

The more I think about it, the more I feel that 99% of the problems Dan is having are directly related to high memory usage. I didn't see them because I have 4 times as much RAM.

Lookin' good.

Dan S.

 

LogSat
Admin Group
Posted: 28 June 2003 at 11:31am

Yes, 10MB-20MB usage is much more like it! Again I was not aware of the DB size, and didn't realize the memory consumption it could cause.

Our experience with the Quarantine Delete Interval is the following.

The larger the interval, the more emails (records) will be deleted from the database when it kicks off. The more records, the slower and more memory intensive this operation is, both for SpamFilter and for the database. In SpamFilter most "things" are performed in separate threads, so it should not affect normal operations, but the CPU does go high when this is done.

Lowering the interval makes this process happen more often, but also makes each run less resource-intensive.

We found that 60 minutes is a good balance for us.

Roberto F.
LogSat Software
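
A rough worked example of this trade-off, assuming expiring records arrive at a steady rate (real mail traffic is burstier) and using Dan S.'s figure of about 50,000 inbound messages a day as an upper bound on new quarantine records: each delete run removes roughly daily volume × interval / 1440 records.

```python
MESSAGES_PER_DAY = 50_000  # Dan S.'s approximate inbound volume; an upper bound
MINUTES_PER_DAY = 24 * 60  # 1440


def records_per_run(interval_minutes: int, per_day: int = MESSAGES_PER_DAY) -> float:
    """Average records expiring between two delete runs, assuming a steady rate."""
    return per_day * interval_minutes / MINUTES_PER_DAY


for interval in (15, 60, 240, 1440):
    print(f"{interval:>5} min -> ~{records_per_run(interval):,.0f} records per run")
#    15 min -> ~521 records per run
#    60 min -> ~2,083 records per run
#   240 min -> ~8,333 records per run
#  1440 min -> ~50,000 records per run
```

The total number of records deleted per day is the same either way; a shorter interval just spreads that work over more, smaller batches.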

George
Guest Group
Posted: 28 June 2003 at 12:55pm

Dan,
Did you create a script to get all those nice stats, or did you just create them on the fly?

Desperado
Senior Member
Posted: 28 June 2003 at 3:23pm

George,

It is a bunch of SQL Queries running from ASP.  I just came in from mowing to get H2O ... I can provide the information when I come in for real ... 5 Acres to mow and I need to "Hog" another 2.  Takes a while so be patient.  Much to my dismay, I lost the cooler part of the day servicing my side bar.  However, it IS good to get away from computers and networks for a bit!

Dan

Alan
Guest Group
Posted: 30 June 2003 at 1:13pm

Yes please do share your coding with the rest, Dan.

 

Desperado
Senior Member
Posted: 30 June 2003 at 1:56pm

All,

A new thread should be started for this... it has nothing to do with RegEx; we kinda drifted a little! The ASP code I have is specific to my own menu system, but the SQL queries are "general". Do you want just the SQL queries, or a copy of the 2 ASP pages I have that use them? I have 2 pages because one just gets the general stats and is very fast. The second one adds the "spread" of the DNSBLs that I use, and with around 350K messages in my database it takes just under a minute to get 4 of them, so I only run it when I am trying to see whether my order of BLs is optimal (whatever optimal is).

Dan S
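
The actual queries aren't posted in this thread. As a hedged illustration only, here is a Python/sqlite3 sketch of getting all per-category counts in one GROUP BY pass rather than running one COUNT query per category. The "reason" and "dnsbl" columns and the sample rows are hypothetical stand-ins, since SpamFilter's real tblmsgs layout isn't shown, and whether this speeds up the DNSBL "spread" depends on how that information is actually stored.

```python
import sqlite3

# Hypothetical schema: SpamFilter's real tblmsgs columns are not shown in this
# thread, so "reason" and "dnsbl" are made-up stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblmsgs (id INTEGER PRIMARY KEY, reason TEXT, dnsbl TEXT)")
conn.executemany(
    "INSERT INTO tblmsgs (reason, dnsbl) VALUES (?, ?)",
    [  # hypothetical sample rows
        ("IP in MAPS B/L", "relays.osirusoft.com"),
        ("IP in MAPS B/L", "blackholes.easynet.nl"),
        ("IP in MAPS B/L", "relays.osirusoft.com"),
        ("Banned Keywords", None),
        ("No Reverse DNS", None),
    ],
)


def counts_by(column: str) -> dict:
    """One GROUP BY pass instead of one COUNT(*) query per category."""
    if column not in ("reason", "dnsbl"):  # column names can't be bound parameters
        raise ValueError(f"unexpected column: {column}")
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM tblmsgs "
        f"WHERE {column} IS NOT NULL GROUP BY {column} ORDER BY COUNT(*) DESC"
    )
    return dict(rows.fetchall())


print(counts_by("reason"))
print(counts_by("dnsbl"))
```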

Alan
Guest Group
Posted: 30 June 2003 at 3:05pm

Why not supply both so others can simply get what they need?