"RegEx for fun & profit" |
Desperado
Senior Member. Joined: 27 January 2005. Location: United States.
Posted: 11 July 2003 at 8:45am
All,
I thought I would throw out a few ideas to convince some of the SpamFilter ISP users of the potential power of regular expressions (RegEx). Perhaps this will help when you are trying to nail some of the more ingenious spam techniques without "throwing out the baby with the bath water".
Let me preface this with a disclaimer. I am no expert with regular expressions, but having a fair amount of experience with Perl, I have been forced to learn and use them over the years. Each software package has its own engine to interpret the expressions, so you always have to play with them to get them right. I make no claims whatsoever about the accuracy of the information below. DO NOT USE the expressions I have here as they are ... use them only as a starting point. I should also state that I am in no way affiliated with LogSat, and as such, LogSat cannot take any responsibility for any of my stupid mistakes!
I feel that anything that knocks out a few spams here and a few spams there eventually adds up. It is important, though, to make sure that every filter is actually doing something useful, because the longer your blacklists are, the harder the software has to work. I do a log parse run each day to see if my filters are effective, and I take out anything that is not helping.
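The daily check described above can be approximated with a few lines of Python. This is only a sketch under assumptions: the function names `count_filter_hits` and `unused_filters` are made up for illustration, the log is assumed to be plain text with one event per line, and Python's `re` engine may not behave exactly like the one in SpamFilter ISP.

```python
import re

def count_filter_hits(patterns, lines):
    """Return a dict mapping each pattern to the number of lines it matches."""
    compiled = [(p, re.compile(p)) for p in patterns]
    hits = {p: 0 for p in patterns}
    for line in lines:
        for source, rx in compiled:
            # search() finds the pattern anywhere in the line
            if rx.search(line):
                hits[source] += 1
    return hits

def unused_filters(patterns, lines):
    """List the patterns that never matched, i.e. candidates for removal."""
    hits = count_filter_hits(patterns, lines)
    return [p for p in patterns if hits[p] == 0]

# Hypothetical log lines; a real log will look different.
log_lines = [
    "from=xtrafreeporn99@example.com",
    "from=alice@example.org",
]
filters = [r"xtrafreeporn", r"free4you@"]
print(unused_filters(filters, log_lines))
```

A filter that shows up in the unused list over several days is one you can probably drop.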
OK ... One thing I did was come up with a "standard" expression that will describe a generic email address construct as:
(([\-a-zA-Z0-9_\.\+])+@([\-a-zA-Z0-9_\.\+]+\.)+[a-z]{2,6})
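Before relying on an expression like this, it helps to try it against a handful of sample strings. The sketch below does that in Python, which is an assumption of convenience: SpamFilter ISP uses its own engine, so results there may differ. The helper name `find_addresses` and the sample addresses are made up for the example.

```python
import re

# The generic address pattern from the post, copied unchanged.
EMAIL = re.compile(r"(([\-a-zA-Z0-9_\.\+])+@([\-a-zA-Z0-9_\.\+]+\.)+[a-z]{2,6})")

def find_addresses(text):
    """Return every substring of text that the pattern accepts as an address."""
    return [m.group(1) for m in EMAIL.finditer(text)]

sample = "From: dan.s@example.com, bob+lists@mail.example.org, not-an-address@localhost"
print(find_addresses(sample))
# → ['dan.s@example.com', 'bob+lists@mail.example.org']

# The final [a-z]{2,6} is lowercase only, so in a case-sensitive engine
# an all-uppercase address is not matched.
print(find_addresses("USER@EXAMPLE.COM"))
# → []
```

Note that `localhost` is skipped because the pattern requires at least one dot after the `@`, and that the uppercase case only matters if your engine matches case-sensitively.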
Once you have this, you should be able to use the format to kill off "Bad" addresses. As an example, Hotmail has announced that any address starting with a digit is not valid. Therefore, I can construct an expression to detect and block it:
(\b\d+([\-a-zA-Z0-9_\.\+])+@hotmail\.com)
WARNING: I believe that if there is one bad address in the "To" field, the entire message gets blocked, so this should only be used in the "From" field.
Here is a list I have come up with that describes some known "Bad" email constructs:
For a good laugh, this is the regular expression that I used in my Sendmail server to attempt to slow the flood down. I AM NOT RECOMMENDING THIS! This EXACT RegEx does, in fact, work with "ActiveState" Perl!
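A rule like this is easy to check in isolation before it goes anywhere near a live filter. The Python sketch below uses a hypothetical variant of the idea, not the exact expression from the post: it assumes the filter is handed a bare sender address, so the pattern is anchored at both ends and the digit must be the very first character. The function name `is_suspect_sender` is invented for the example.

```python
import re

# Hypothetical anchored variant: a digit as the first character of the
# local part, followed by any allowed characters, at hotmail.com.
DIGIT_HOTMAIL = re.compile(r"^\d[\-a-zA-Z0-9_.+]*@hotmail\.com$", re.IGNORECASE)

def is_suspect_sender(address):
    """True if the address starts with a digit and is at hotmail.com."""
    return DIGIT_HOTMAIL.match(address.strip()) is not None

for addr in ["123abc@hotmail.com", "abc123@hotmail.com", "1abc@example.com"]:
    print(addr, is_suspect_sender(addr))
```

Anchoring matters here: a word boundary such as `\b` also occurs after a dot or a hyphen, so an unanchored pattern can fire on a digit in the middle of a local part like `a.1b@hotmail.com`.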
^(mailer\-daemon[0-9]+.*<@.*|.*([0-9].*prsesly|discounts|software[0-9])<@yahoo\.com|.*(saveonink|printsupplies|inkjet|toner_).*<@.*|subscriber_services[0-9]+<@.*|test.*<@test.*\.com|[0-9]+<@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)|[0-9][^<]*<@(hotmail|juno)\.com|.{16}[^<]+<@(canada|aol|hotbot)\.com|.{10}.*_.{2}.*[0-9].{2}.*<@(hotmail|juno|rocketmail|hotbot|excite|yahoo|msn|mail)\.com|.*free4you<@.*|.*_...._._._.<@.*brandeis\.edu|INVESTMENT_ALERT-.*|xtrafreeporn.*|Nasdaq_Newsdesk.*|ListsOnSale.*|InvestorInsights__.*|subscriptionssavings_.*|MarketingLists.*[0-9].*<@.*)\.?>
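One way to keep an expression of this size readable is to hold each alternative as its own string and join them with `|` at load time. The Python sketch below does this with four fragments copied from the expression above. It rests on assumptions: the `user<@host.>` shape of the test strings is taken to be the form in which Sendmail rulesets see an address, the variable names are invented, and Python's engine stands in for Perl's.

```python
import re

# A few alternatives copied from the long expression, one per line.
FRAGMENTS = [
    r"mailer\-daemon[0-9]+.*<@.*",
    r".*free4you<@.*",
    r"xtrafreeporn.*",
    r"[0-9]+<@(aol\.com|msn\.com|bellsouth\.net|brandeis\.edu)",
]

# Same outer shape as the original: ^( alt | alt | ... )\.?>
COMBINED = re.compile(r"^(" + "|".join(FRAGMENTS) + r")\.?>")

def is_blocked(focused_address):
    """True if the address (in the assumed user<@host.> form) matches."""
    return COMBINED.match(focused_address) is not None

for addr in ["abcfree4you<@example.com.>", "12345<@aol.com.>", "alice<@aol.com.>"]:
    print(addr, is_blocked(addr))
```

With the fragments in a list, each one can be tested, counted, and removed on its own instead of being edited inside a single long line.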
Wasn't that fun?
Dan S.