
RegEx, Line breaks, and Case Sensitive Keywords

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1530
Printed Date: 13 March 2025 at 3:58pm


Topic: RegEx, Line breaks, and Case Sensitive Keywords
Posted By: Guests
Subject: RegEx, Line breaks, and Case Sensitive Keywords
Date Posted: 29 July 2003 at 7:09pm

I've been reading over the Regular_Expressions.htm file that installs with SpamFilter and have been trying to figure out how to make my keywords do two things, but I keep failing miserably (due to being a novice and not great with programming).  Any help would be grand.  I think I just don't know how to construct regular expressions at all.

1) I'd like to make a keyword string be case insensitive.  Currently, mixed case is getting through.  For example, if I have "human growth hormone" as a keyword string and "Human Growth Hormone" is a string in the email, it goes through because the cases don't match.

2) Similarly, if a string has a line break in it, it too is getting through.  For example:

human
growth
hormone

is getting through because the message has hard-coded line breaks.
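The line-break problem can be illustrated outside SpamFilter with a minimal Python sketch. It assumes the filter sees the whole message body as one string and that the engine supports the `\s` whitespace class; neither is confirmed for SpamFilter's own engine, so check Regular_Expressions.htm before relying on it. The helper name `phrase_pattern` is made up for this illustration.

```python
import re

def phrase_pattern(phrase):
    """Build a pattern that matches the words of `phrase` separated by
    any run of whitespace (spaces, tabs, CR, LF)."""
    # Escape each word so punctuation in a keyword is treated literally.
    words = [re.escape(word) for word in phrase.split()]
    # \s+ between words lets a hard line break stand in for a space.
    return re.compile(r"\s+".join(words))

pattern = phrase_pattern("human growth hormone")

# Hypothetical message body with hard-coded CRLF line breaks.
body = "buy human\r\ngrowth\r\nhormone today"
print(bool(pattern.search(body)))
```

The same idea written by hand would be `human\s+growth\s+hormone`, if the engine in use accepts `\s`.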

Before you all flame me, I did read the Regular Expressions file several times and spent a couple hours trying to do these otherwise simple operations.  Apparently I just don't get it so I beg your collective forgiveness and earnestly request whatever help you can give.

Thanks,
--DM




Replies:
Posted By: Desperado
Date Posted: 29 July 2003 at 7:55pm

DM,

First, my view is just that ... my view.  You may want to read through all the recent posts on RegEx for some ideas, but if the RegEx has a "literal" word in all lower case, it will detect ANY case in the scanned message.  My experience says that this behavior is a function of the specific RegEx "engine" ... in this case, and I do not know this for a fact, it is acting so close to Delphi's engine that it must be modeled after that compiler's interpreter.

Now that I have possibly made a fool of myself ... My "View" is that you do not want to look for specific words but rather the techniques that the spammers use to obscure the text itself.  If you look at some of my more recent posts, you will see that that is what I am trying to do ... and for the most part, it does a good job.

If you take a look at the actual source of a message, NOT the "rendered" version that you see in your mail client, you will see that most spam is riddled with strange html comments, and %'s and all sorts of crap.  That's what you want to build a filter to find.

Again ... my opinion only.  Now everyone can focus on shooting me down ... rather than you!

Dan S.

 



Posted By: Guests
Date Posted: 30 July 2003 at 1:33pm

Dan et al.,

Thanks for your reply.  I'm definitely going to start implementing some of the more advanced techniques as seen here on the site.  I've put one filter in place that works with the eleven-character comment tags.  However, some crap keeps coming in.

I'd still like to know how to make keywords case-insensitive though, as a lot of spam that reaches my inbox has certain keywords in the subject line that, if filtered case-insensitive, may reduce spam further.

Thanks again,
--DM



Posted By: Desperado
Date Posted: 30 July 2003 at 8:50pm

DM,

 

Actually, I did answer it ... If you use all lower case in your RegEx (except for special chars that require caps), then the match will work for BOTH upper and lower case.  Example:

(<html>)  will match <html> or <Html> or <HtMl>  etc.

Dan S.
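
For comparison, the behavior described above depends on the engine. In an engine that is case-sensitive by default, the same effect has to be requested explicitly. A minimal sketch using Python's `re` module (not SpamFilter's engine; the sample strings are taken from the example above):

```python
import re

# Case-insensitive version of the <html> example: the flag, not the
# lower-case spelling, is what makes mixed case match in this engine.
keyword = re.compile(r"<html>", re.IGNORECASE)

samples = ["<html>", "<Html>", "<HtMl>"]
results = [bool(keyword.search(sample)) for sample in samples]
print(results)

# Without the flag, Python's engine only matches the exact case.
case_sensitive = re.compile(r"<html>")
print(bool(case_sensitive.search("<HtMl>")))
```

If mixed-case keywords are still getting through in SpamFilter, testing one lower-case pattern against a known mixed-case message is a quick way to confirm how its engine actually behaves.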



