Spam Filter ISP Support Forum


RegEx Versus KeyWords

Desperado (Senior Member)
Posted: 04 July 2003 at 2:43am

All, (esp George)

My understanding is that a few of you have very extensive keyword lists.  I am using regular expressions ONLY and at this point have only 6 entries.  I was wondering if someone who is using a very large keyword list could give me a comparison of the percentage blocked by keyword (compared with my stats below).  Also, I do not seem to be getting ANY false positives in my own mail, and none of my customers are complaining about false positives.  How does that compare with a long keyword list?  Any information will be very useful to me.  Please see my current quarantine stats "snapshot" below.

Current Database Contents

Current Msg. total: 367654
Marked for deletion: 0
No Reverse DNS: 119583
IP in MAPS B/L: 150683
Email From B/L: 167
Exc'd max RCPT TO: 927
Banned Keywords: 96123
Email TO B/L: 4
Domain B/L: 167
IP B/L: 0
Blocked by Bluto: 320028
Blocked by Raptor: 47626

Primary SMTP Statistics (From May 25 2003)

Total Inbound Conn.: 1391086
EMail Attempts: 1393792
EMails Forwarded: 436543
EMails Blocked: 1291733

Dan S.
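
For anyone wanting to compare against these numbers, the "percent blocked by keyword" figure can be worked out from the snapshot in the post above. The following is only a sketch: the counts are copied from that post, while the dictionary layout and the function name share_by_reason are made up for illustration. With these figures the keyword (regex) rules account for roughly 26% of the quarantined messages, and the per-reason counts add up exactly to the message total.

```python
# Counts copied from the "Current Database Contents" snapshot in the post above.
quarantine = {
    "No Reverse DNS": 119583,
    "IP in MAPS B/L": 150683,
    "Email From B/L": 167,
    "Exc'd max RCPT TO": 927,
    "Banned Keywords": 96123,
    "Email TO B/L": 4,
    "Domain B/L": 167,
    "IP B/L": 0,
}
total = 367654  # "Current Msg. total" from the same snapshot


def share_by_reason(counts, total):
    """Return each block reason's share of the quarantine, in percent."""
    return {reason: 100.0 * n / total for reason, n in counts.items()}


shares = share_by_reason(quarantine, total)

# Print the reasons from largest share to smallest.
for reason, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{reason:<20}{pct:6.2f}%")
```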

RBarrow (Guest Group)
Posted: 04 July 2003 at 9:30am

I see you have no-reverse-dns turned on... we get hundreds of false positives from that feature and had to disable it... Are you not seeing the same kind of problem ???

 

Roy

Desperado (Senior Member)
Posted: 04 July 2003 at 9:45am

We do not consider No EDNS to be a "False Positive". We consider that to be a configuration problem on the sender's side.  They either fix it, or we do not receive their mail. We never accepted connections with no RDNS, even before SpamFilter ISP, so our customers are used to having the sender contact their carrier to get it fixed.  We have a very clear policy on that, and soon ALL US ISPs will be doing the same (we hope).

There are always cases where the sender's carrier is being a jerk or is lazy, but we usually shame them into fixing their DNS.

Dan S.

Desperado (Senior Member)
Posted: 04 July 2003 at 10:03am

SORRY  NO "RDNS" ... I am very tired!

Dan

dcook (Senior Member)
Posted: 05 July 2003 at 9:17am

Hi Dan,

Could you share your regex list?  I am still gathering blocking tips.

Desperado (Senior Member)
Posted: 05 July 2003 at 12:01pm

Dwight,

 

I answered this in detail and then got a "script Timeout" when posting.  (ARRRRRGGGG!) I will try to generate the text locally and paste it in.

Desperado (Senior Member)
Posted: 05 July 2003 at 12:39pm

Dwight,
 
Now I am lazy ... here is a "not so detailed" version of what I wrote the first time.
 
I am still working on my VERY short list and am also working on the order of things in my other lists.  When you look at my "Stats", you will see that relays.osirusoft.com has very few "hits".  That is because I killed off all the messages that had been blocked by that list so I could see whether it is even worth having now that I have changed the order.  relays.osirusoft.com is a very good list, but it is slow to respond at times and other lists carry the same blocks.
 
Back on the topic:
 
(<[!--]+[a-zA-Z0-9]{11,})     >>>>> Roberto's (modified) "magic" html comment blocker

(http://\w{0,6}%[\d])     >>>>> Blocks hrefs to partially obscured URLs.  WARNING!  Until I work something out with my buddy at PayPal (he is one of their developers), I have had to add *@paypal.com to my allowed FROM list.

(href="http://+[\d])     >>>>> Blocks any href to a "dotted IP"

(content-type: text/html\r\ncontent-transfer-encoding: base64)   >>>>>>  This one is supposed to block Base 64 encoded HTML that isn't tagged as an inline image.  It seems to work sometimes and not others.  I have not found the difference between the ones blocked and the ones not blocked yet.
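
To see what the four expressions above actually catch, here is a rough sketch that runs them through Python's re module. Two things are assumptions: that the filter matches case-insensitively, and that Python's regex engine behaves like the one in SpamFilter ISP (it may not). The sample strings and the function name matching_rules are made up for illustration. Two observations hold in Python at least: `[!--]` is a character class, so it parses as the range `!` through `-` rather than the literal text `!--`, and the base64 pattern stops matching as soon as a parameter such as `; charset=...` follows `text/html`. That second point might be one reason the rule fires on some messages and not others, but that is a guess and is not confirmed anywhere in this thread.

```python
import re

# Patterns copied verbatim from the post above.
PATTERNS = {
    "html_comment": r"(<[!--]+[a-zA-Z0-9]{11,})",
    "obscured_url": r"(http://\w{0,6}%[\d])",
    "dotted_ip_href": r'(href="http://+[\d])',
    "base64_html": r"(content-type: text/html\r\ncontent-transfer-encoding: base64)",
}

# Assumption: case-insensitive matching, as a mail filter would likely need.
COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in PATTERNS.items()}


def matching_rules(text):
    """Return the names of every pattern that matches somewhere in text."""
    return [name for name, rx in COMPILED.items() if rx.search(text)]


# Hypothetical samples, invented for this sketch.
samples = [
    "<!--q8f3k2l9x7w1z--> buy now",            # random-looking HTML comment
    "<+++abcdefghijkl",                        # also matches: [!--] is a range
    '<a href="http://192.0.2.10/offer">',      # link to a dotted IP
    '<a href="http://example.com/">',          # ordinary link, no match
    "http://www%32example.com",                # percent-obscured host name
    "Content-Type: text/html\r\nContent-Transfer-Encoding: base64",
    "Content-Type: text/html; charset=us-ascii\r\nContent-Transfer-Encoding: base64",
]

for sample in samples:
    print(repr(sample), "->", matching_rules(sample))
```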
 
The "Clear Text" keywords below are only in until I find the "Correct" pattern to use in a RegEx.

Mobutu Sese Seko
banhi.com
text-decoration: blink
 
I am working on an SQL query that will run once every 24 hours to give me a breakdown of how many messages each expression blocked in that 24-hour period.  This will help me determine whether a keyword is even worth keeping.
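
The per-expression breakdown described above could look something like the sketch below. It is a Python stand-in, not the SQL query itself, because the quarantine database schema is not shown in this thread. The (received, body) message records, the sample data, and the function name hits_per_rule are all hypothetical; only the three rules come from the post above.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Three of the rules from the post above; any rule list would work here.
RULES = [
    r"(http://\w{0,6}%[\d])",
    r'(href="http://+[\d])',
    r"text-decoration: blink",
]


def hits_per_rule(messages, now, window=timedelta(hours=24)):
    """Count, per rule, the messages received within `window` of `now` that it matches."""
    compiled = [(rule, re.compile(rule, re.IGNORECASE)) for rule in RULES]
    counts = Counter({rule: 0 for rule in RULES})  # keep zero-hit rules visible
    for received, body in messages:
        if now - received > window:
            continue  # older than the reporting window
        for rule, rx in compiled:
            if rx.search(body):
                counts[rule] += 1
    return counts


# Hypothetical quarantine records: (time received, message body).
now = datetime(2003, 7, 5, 12, 0)
messages = [
    (datetime(2003, 7, 5, 9, 0), '<a href="http://192.0.2.10/">click</a>'),
    (datetime(2003, 7, 5, 1, 30), '<span style="text-decoration: blink">SALE</span>'),
    (datetime(2003, 7, 3, 8, 0), '<a href="http://192.0.2.99/">old</a>'),
]

report = hits_per_rule(messages, now)
for rule, n in report.most_common():
    print(n, rule)
```

A rule that stays at zero over several runs is a candidate for removal, which is the purpose described in the post.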
 
Here are my latest stats
 

Current Database Contents

Current Msg. total: 362076
Marked for deletion: 0
No Reverse DNS: 117088
IP in MAPS B/L: 139252
Email From B/L: 254
Exc'd max RCPT TO: 1916
Banned Keywords: 103535
Email TO B/L: 25
Domain B/L: 7
IP B/L: 0
Blocked by Bluto: 315470
Blocked by Raptor: 46607

Primary SMTP Statistics (Last Cleared May 25 2003)

Total Inbound Conn.: 1421750
EMail Attempts: 1423184
EMails Forwarded: 443379
EMails Blocked: 1327058

Secondary SMTP Statistics   (Last Clear

dcook (Senior Member)
Posted: 06 July 2003 at 3:48pm

Thanks Dan!

You have some good ideas.  I have put some into practice. Are you getting many false positives for dotted ip addresses?

I also wanted to ask if you could provide a link to your adminlogin.asp page for the stats.  That page is missing from your stats download post.

Dwight

 

 

Dan Seligmann (Guest Group)
Posted: 06 July 2003 at 8:02pm

Dwight,

Second question first ... The stats were a copy from our administrator page so I can't really give access to the actual web site.  Only our admins have that.  What stats were you looking for?

First question:  I guess we have to define "False Positive" for that one.  As an ISP, we have taken the stance that there is no excuse for "dotted IP's" in a URL.  If you can't put a DNS host name in, and validate some traceable ownership of that IP, we view that as obfuscation. By our definition, that's Spam.  So, by our definition, we have no false positives.

We have had cases where perfectly valid email had dotted IP's in a url and our customer informed their sending party and they fixed it.

I am not sure that was what you wanted to hear, but it is our policy.  Again, what stats are you looking for?  I can try to give you a "snapshot" of our stats, but I can't give public access to the site itself.

Dan S.

Keizersozay (Guest Group)
Posted: 09 July 2003 at 12:17pm

Is that a snapshot of the ASP web pages provided by LogSat? I haven't attempted to set it up yet, but that looks very cool.

Desperado (Senior Member)
Posted: 09 July 2003 at 12:30pm

No.  I have written a complete application in ASP for my customers, my "advanced" customers, and our internal administrators.  The snapshot came from my advanced manager.  I see that the snapshot was "cut off" at the bottom.  I may put up a link to some sample screenshots to give users some ideas about what can easily be done to get some of this information out of the system.

Dan S.

john1 (Guest Group)
Posted: 16 July 2003 at 2:53pm

Dan,

Reading your stats, what is "Bluto"? Also, I presume Raptor is the firewall?

John

Desperado (Senior Member)
Posted: 16 July 2003 at 4:56pm

John,

Actually, we have a primary and a secondary mail server.  The primary is "Bluto" and the secondary is "Raptor".  They are located in 2 separate locations but "report" to the same MS SQL server.

Dan S.

 

Dean (Guest Group)
Posted: 30 August 2003 at 2:15am

Hello,

Could you please post a list of the regular expressions that you use so that I may implement them here? Any help would be greatly appreciated.

Dean

Desperado (Senior Member)
Posted: 30 August 2003 at 2:34am

Dean,

The list I CURRENTLY use is below.  HOWEVER, my list blocks a bunch of mailing lists that use a lot of extra garbage in their messages, so I have also included a list of "Excluded From Addresses".

KEYWORDS:

((http|3dhttp)://.{0,26}(((%.+%))|@|:)[(\d|\w)])
(http://+[\d]{1,3}\.{1}[\d]{1,3}\.{1}[\d]{1,3}\.{1}[\d]{1,3})
((<[!--]+[\x20]{0,1}[a-zA-Z0-9]{10,}[\x20]{0,1}[!--](.+)){2,})
(<[!--]+[a-zA-Z0-9]{2}(-->))
((http://http:/\w)|(<(\w){3,10}(\x20/>)|(\*http://w)))
(<(!-- )+[a-zA-Z0-9=]{28,}( -->))
(content\-type:\x20text/(html|plain)(;{0,1})\r\ncontent-transfer\-encoding:\x20base64\r\n)
((limited time (special|offer)))
(((arge your p)|(3 - 5 inches\!)|(herbalpillsonline)|(herbaltrials\.com)|(pillsavings)|(gsc\-100)))
((text\-decoration: blink)|(click here to start))
((your privacy is extremely important to us)|(this is not spam))
(http://www.(\w){1,20}(4u).(biz|com|net))
((application\.zip|details\.zip|document_9446\.zip|document_all\.zip|movie0045\.zip|thank_you\.zip|your_details\.zip|your_document\.zip|wicked_scr\.zip))
((re: )+(wicked screensaver|details|approved|thank you!|that movie|your application|re: my details))
(See the attached file for details)
(this email address will be expiring)
selectgroupmedia.com
getit4less.biz
generic viagra
Unsubscribe By Postal
debtelimination
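
Before adopting the list above, it may help to check which rule fires on a given message. The sketch below does that for four of the expressions, copied verbatim. It assumes case-insensitive matching and uses Python's re module in place of the filter's own engine, which may behave differently; the sample strings and the function name first_hit are invented. In Python, the unescaped dots in the "4u" rule match any character, and the "re: " rule also fires on an ordinary reply whose subject starts with "Re: Details".

```python
import re

# Four expressions copied verbatim from the KEYWORDS list above.
RULES = [
    r"(http://+[\d]{1,3}\.{1}[\d]{1,3}\.{1}[\d]{1,3}\.{1}[\d]{1,3})",
    r"((limited time (special|offer)))",
    r"(http://www.(\w){1,20}(4u).(biz|com|net))",
    r"((re: )+(wicked screensaver|details|approved|thank you!|that movie|your application|re: my details))",
]


def first_hit(text):
    """Return (rule, matched_text) for the first rule that fires, else None."""
    for rule in RULES:
        m = re.search(rule, text, re.IGNORECASE)  # assumption: case-insensitive
        if m:
            return rule, m.group(0)
    return None


# Hypothetical samples, invented for this sketch.
for sample in [
    "see http://10.1.2.3/x",                 # dotted-IP URL
    "http://www.deals4u.com/",               # the intended "4u" target
    "http://wwwxdeals4uxcom",                # also matches: '.' is a wildcard
    "Re: Details for Friday's meeting",      # ordinary reply subject
    "hello world",                           # no rule fires
]:
    print(repr(sample), "->", first_hit(sample))
```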


Excluded From Addresses:

*@paypal.com
*@listproc.pcworld.com
*@industryweek.com
*@gpsadvantage.com
*@gwbakeries.com
*@peoples.com
*@*.lga2.nytimes.com
*@*.*.nytimes.com
*@softshare.com
*@regulusgroup.com
*.*@dell.com
*@e-news.fsonline.com
*@lists.n-email.net
*@lists.techtarget.com
*@lyris.stockupticks.com
*@multexinvestornetwork.com
*@newsletter.online.com
*@insightmedia.info
*@nhfairfield.com
*@newhorizons.com
*@rootsweb.com
*@*.rootsweb.com
*@returns.groups.yahoo.com
*@cygnuspub.com
*@*.classmates.com
*@listserv.usairways.com

Regards,

Dan S.
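
The "Excluded From Addresses" entries above are wildcard patterns. How SpamFilter ISP interprets them is not stated in this thread, so the sketch below simply assumes shell-style wildcards and uses Python's fnmatch module as a stand-in. The function name is_excluded and the sample addresses are invented; the four patterns are taken from the list above. Note that with fnmatch a `*` also matches `@` and `.`, which may be looser than the filter's own matching.

```python
from fnmatch import fnmatchcase

# A few entries copied from the "Excluded From Addresses" list above.
ALLOWED_FROM = [
    "*@paypal.com",
    "*.*@dell.com",
    "*@rootsweb.com",
    "*@*.rootsweb.com",
]


def is_excluded(sender):
    """Return True if the sender address matches any allow-list pattern.

    Assumption: patterns are shell-style wildcards compared case-insensitively.
    """
    address = sender.strip().lower()
    return any(fnmatchcase(address, pattern) for pattern in ALLOWED_FROM)


# Hypothetical sender addresses, invented for this sketch.
for sender in [
    "service@paypal.com",
    "Jane.Doe@dell.com",
    "jane@dell.com",                 # no dot in the local part, so no match
    "list@lists.rootsweb.com",
    "promo@paypal.com.example.net",  # look-alike domain, not excluded
]:
    print(sender, "->", is_excluded(sender))
```

In a filter pipeline, a check like this would run before the keyword expressions so that mail from excluded senders skips them.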

DigitalMan (Guest Group)
Posted: 02 September 2003 at 2:11pm

Dan,

Getting some false positives with this latest post.  I previously was using your regex from about a month ago, which I think was producing fewer false positives, yet letting more spam in.  Any ideas?

--Clator
