Spam Filter ISP Support Forum


"RegEx for fun & profit"

Author: Desperado
    Posted: 11 July 2003 at 8:45am
 
All,
 
I thought I would throw out a few ideas to convince some of the SpamFilter ISP users of the potential power of regular expressions (RegEx).  Perhaps this will help when you are trying to nail some of the more ingenious spam techniques without "throwing out the baby with the bath water".
 
Let me preface this with a disclaimer.  I am no expert with regular expressions, but having a fair amount of experience with Perl, I have been forced to learn and use them over the years.  Each software package has its own engine to interpret the expressions, so you always have to play with them to get them right.  I make no claims whatsoever about the accuracy of the information below.  DO NOT USE the expressions I have here as-is; use them only as a starting point.  I should also state that I am in no way affiliated with LogSat and, as such, LogSat cannot take any responsibility for any of my stupid mistakes!
 
I feel that anything that knocks out a few spams here and a few there eventually adds up.  It is important, though, to make sure that every filter is actually doing something useful, because the longer your blacklists are, the harder the software has to work.  I do a log parse run each day to see whether my filters are effective, and I take out anything that is not helping.
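The daily check described above can be sketched roughly as follows. This is only an illustration in Python, not SpamFilter ISP's own tooling: it assumes you have already pulled the sender addresses out of your log by whatever means (the log format is not covered here), and the names count_hits and unused are made up for the example.

```python
import re
from collections import Counter

def count_hits(patterns, senders):
    """Count how many sender addresses each pattern matches (search semantics)."""
    compiled = {p: re.compile(p, re.IGNORECASE) for p in patterns}
    hits = Counter({p: 0 for p in patterns})
    for sender in senders:
        for p, rx in compiled.items():
            if rx.search(sender):
                hits[p] += 1
    return hits

def unused(patterns, senders):
    """Return the patterns that matched nothing: candidates for removal."""
    hits = count_hits(patterns, senders)
    return [p for p in patterns if hits[p] == 0]
```

Anything that unused() returns day after day is probably just making the software work harder for no benefit.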
 
OK ... One thing I did was come up with a "standard" expression that describes a generic email address construct:
 
(([\-a-zA-Z0-9_\.\+])+@([\-a-zA-Z0-9_\.\+]+\.)+[a-z]{2,6})
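A quick way to "play" with the generic expression before trusting it is to run it over a few sample strings. The sketch below uses Python's re module rather than the SpamFilter ISP engine, so results may differ there; the sample addresses and the names EMAIL_RE and find_addresses are made up for illustration.

```python
import re

# The generic address expression from the post, unchanged.
EMAIL_RE = re.compile(r"(([\-a-zA-Z0-9_\.\+])+@([\-a-zA-Z0-9_\.\+]+\.)+[a-z]{2,6})")

def find_addresses(text):
    """Return every substring of text that the expression accepts."""
    return [m.group(1) for m in EMAIL_RE.finditer(text)]

if __name__ == "__main__":
    for sample in ["From: dan@example.com",
                   "a.b@mail.example.org, c_d+tag@x.co",
                   "dan@EXAMPLE.COM",
                   "no at sign here"]:
        print(repr(sample), "->", find_addresses(sample))
```

One thing this shows: the final [a-z]{2,6} only accepts a lower-case top-level domain, so an address with an upper-case domain is not matched unless the engine is told to ignore case.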
 
Once you have this, you should be able to use the format to kill off "bad" addresses.  As an example, Hotmail has announced that any address starting with a digit is not valid.  Therefore, I can construct an expression such as:
 
(\b\d+([\-a-zA-Z0-9_\.\+])+@hotmail\.com)
 
to detect and block it.  WARNING:  I believe that if there is one bad address in the "To" field, the entire message gets blocked, so this should only be used in the "From" field.
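Here is a rough Python sketch of the same Hotmail rule, applied to a bare "From" address. It is a variant, not the exact expression above: it is matched against the whole address so the digit has to be the very first character. That is my assumption about the intent. A \b word boundary also matches right after a dot or hyphen, so an unanchored search can fire on something like john.1smith@hotmail.com. The function name is made up.

```python
import re

# Variant of the Hotmail rule: the whole address must match, and the
# localpart must begin with a digit. Case is ignored (an assumption).
HOTMAIL_DIGIT_START = re.compile(r"\d[\-a-zA-Z0-9_\.\+]*@hotmail\.com",
                                 re.IGNORECASE)

def is_bad_hotmail_sender(address):
    """True if a bare sender address is a Hotmail address starting with a digit."""
    return HOTMAIL_DIGIT_START.fullmatch(address.strip()) is not None
```

For example, is_bad_hotmail_sender("123abc@hotmail.com") is True, while "abc123@hotmail.com" and "john.1smith@hotmail.com" both pass.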
 
Here is a list I have come up with that describes some known "bad" email constructs:
 
  • numeric-only localparts at aol.com, msn.com, bellsouth.net, brandeis.edu
  • localparts starting with a digit at juno.com and hotmail.com
  • localparts longer than 16 characters at aol.com, hotbot.com or canada.com
  • localparts with _, longer than 16 characters and with at least 1 digit @(hotbot|juno|rocketmail|excite|hotmail|mail).com
  • test*@test.com
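The list above could be turned into one small pattern per rule, roughly like this. It is a sketch in Python, meant for a bare sender address, and the rule labels and function name are made up. I am assuming the bare domain names in the list mean the .com domains, and the fourth pattern uses lookaheads, which not every regex engine supports.

```python
import re

# One (label, pattern) pair per bullet. Each pattern must match the
# whole lower-cased address (re.fullmatch), not just a part of it.
RULES = [
    ("numeric-only localpart",
     re.compile(r"[0-9]+@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)")),
    ("localpart starts with a digit",
     re.compile(r"[0-9][^@]*@(juno|hotmail)\.com")),
    ("localpart longer than 16 characters",
     re.compile(r"[^@]{17,}@(aol|hotbot|canada)\.com")),
    ("long localpart with _ and a digit",
     re.compile(r"(?=[^@]*_)(?=[^@]*[0-9])[^@]{17,}"
                r"@(hotbot|juno|rocketmail|excite|hotmail|mail)\.com")),
    ("test address",
     re.compile(r"test[^@]*@test\.com")),
]

def matching_rules(address):
    """Return the labels of every rule that the sender address trips."""
    addr = address.strip().lower()
    return [label for label, pattern in RULES if pattern.fullmatch(addr)]
```

Keeping the rules separate like this makes it easy to see which one caught a given address, and to drop a rule that is not earning its keep.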
For a good laugh, this is the regular expression that I used on my Sendmail server to attempt to slow the flood down.  I AM NOT RECOMMENDING THIS!  This EXACT RegEx does, in fact, work with "ActiveState" Perl!
 
 ^(mailer\-daemon[0-9]+.*<@.*|.*([0-9].*prsesly|discounts|software[0-9])<@yahoo\.com|.*(saveonink|printsupplies|inkjet|toner_).*<@.*|subscriber_services[0-9]+<@.*|test.*<@test.*\.com|[0-9]+<@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)|[0-9][^<]*<@(hotmail|juno)\.com|.{16}[^<]+<@(canada|aol|hotbot)\.com|.{10}.*_.{2}.*[0-9].{2}.*<@(hotmail|juno|rocketmail|hotbot|excite|yahoo|msn|mail)\.com|.*free4you<@.*|.*_...._._._.<@.*brandeis\.edu|INVESTMENT_ALERT-.*|xtrafreeporn.*|Nasdaq_Newsdesk.*|ListsOnSale.*|InvestorInsights__.*|subscriptionssavings_.*|MarketingLists.*[0-9].*<@.*)\.?>
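If you ever do keep a monster like this around, one way to make it less painful is to hold the alternatives in a list and join them with | when compiling. The sketch below is in Python and uses only four of the alternatives from the expression above, wrapped in the same ^( ... )\.?> shell. The user<@host.> shape looks like the form Sendmail rewrites addresses into inside its rulesets; that is my reading, and the sample inputs and names are made up.

```python
import re

# A few alternatives copied from the expression above, one per line.
ALTERNATIVES = [
    r"[0-9]+<@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)",
    r"[0-9][^<]*<@(hotmail|juno)\.com",
    r"test.*<@test.*\.com",
    r".*free4you<@.*",
]

# Same wrapper as the original: anchored at the start, then an optional
# trailing dot and the closing ">".
COMBINED = re.compile(r"^(" + "|".join(ALTERNATIVES) + r")\.?>")

def is_blocked(rewritten_address):
    """True if the rewritten address matches any of the alternatives."""
    return COMBINED.match(rewritten_address) is not None
```

With the pieces on separate lines, removing an alternative that no longer catches anything is a one-line change.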
 
Wasn't that fun?
 
Dan S.
 