RegEx, Line breaks, and Case Sensitive Keywords |
DigitalMan | Guest Group | Posted: 29 July 2003 at 7:09pm
I've been reading over the Regular_Expressions.htm file that installs with SpamFilter and have been trying to figure out how to make my keywords do two things, but I keep failing miserably (I'm a novice and not great with programming). Any help would be grand. I think I just don't know how to construct regular expressions at all.

1) I'd like to make a keyword string case-insensitive. Currently, mixed case is getting through. For example, if I have "human growth hormone" as a keyword string and "Human Growth Hormone" is a string in the email, it goes through because the cases don't match.

2) Similarly, if a string has a line break in it, it too is getting through. For example: human is getting through because the message has hard-coded line breaks.

Before you all flame me, I did read the Regular Expressions file several times and spent a couple of hours trying to do these otherwise simple operations. Apparently I just don't get it, so I beg your collective forgiveness and earnestly request whatever help you can give.

Thanks,
|
|
Desperado | Senior Member | Joined: 27 January 2005 | Location: United States | Status: Offline | Points: 1143
DM,

First, my view is just that ... my view. You may want to read through all the recent posts on RegEx for some ideas but, if the RegEx has a "literal" word in all lower case, it will detect ANY case in the scanned message. My experience says that this behavior is a function of the specific RegEx "engine" ... in this case, and I do not know this for a fact, it is acting so close to Delphi's engine that it must be modeled after that compiler's interpreter.

Now that I have possibly made a fool of myself ... My "view" is that you do not want to look for specific words but rather for the techniques that the spammers use to obscure the text itself. If you look at some of my more recent posts, you will see that this is what I am trying to do ... and for the most part, it does a good job. If you take a look at the actual source of a message, NOT the "rendered" version that you see in your mail client, you will see that most spam is riddled with strange HTML comments, %'s, and all sorts of crap. That's what you want to build a filter to find.

Again ... my opinion only. Now everyone can focus on shooting me down ... rather than you!

Dan S.
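To make the "look for the obfuscation, not the words" idea concrete, here is a minimal sketch in Python. It is only an illustration: Python's `re` module is not the engine SpamFilter uses, and the pattern and the function names (`looks_obfuscated`, `strip_comments`) are hypothetical, not taken from any of the posts mentioned above. It flags an HTML comment wedged between two word characters in the raw message source.

```python
import re

# Hypothetical pattern: an HTML comment sitting directly between two word
# characters, e.g. "hor<!-- x7f3k2m9q1z -->mone". DOTALL lets the comment
# body span line breaks.
COMMENT_IN_WORD = re.compile(r"\w<!--.*?-->\w", re.DOTALL)


def looks_obfuscated(raw_source: str) -> bool:
    """Return True if the raw source hides an HTML comment inside a word."""
    return COMMENT_IN_WORD.search(raw_source) is not None


def strip_comments(raw_source: str) -> str:
    """Remove every HTML comment so the hidden word can be read again."""
    return re.sub(r"<!--.*?-->", "", raw_source, flags=re.DOTALL)
```

A comment that stands between tags or is surrounded by spaces does not trigger the check, so ordinary HTML mail with normal comments is less likely to be caught. A real filter would need tuning against actual message sources.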
|
|
DigitalMan | Guest Group
Dan et al.,

Thanks for your reply. I'm definitely going to start implementing some of the more advanced techniques seen here on the site. I've put one filter in place that works with the eleven-character comment tags. However, some crap keeps coming in.

I'd still like to know how to make keywords case-insensitive, though, as a lot of the spam that reaches my inbox has certain keywords in the subject line, and filtering on them case-insensitively may reduce spam further.

Thanks again,
|
|
Desperado | Senior Member
DM,

Actually, I did answer it ... If you use all lower case in your RegEx (except for special chars that require caps), then the match will work for BOTH upper and lower case. Example: (<html>) will match <html> or <Html> or <HtMl> etc.

Dan S.
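For anyone who wants to try both behaviors outside SpamFilter, here is a minimal sketch in Python. It rests on assumptions: Python's `re` module is a different engine from the one described above, and in Python case-insensitivity has to be requested explicitly with `re.IGNORECASE` rather than coming from a lower-case pattern. The helper name `keyword_pattern` is made up for this example. Joining the words with `\s+` also lets the phrase match when it is split by spaces or hard line breaks, which is the second problem from the original question.

```python
import re


def keyword_pattern(phrase: str) -> re.Pattern:
    """Build a case-insensitive pattern that tolerates any whitespace,
    including line breaks, between the words of the phrase."""
    # Escape each word so characters such as "." or "+" are matched literally.
    words = [re.escape(word) for word in phrase.split()]
    # "\s+" matches one or more spaces, tabs, or line breaks between words.
    return re.compile(r"\s+".join(words), re.IGNORECASE)


hgh = keyword_pattern("human growth hormone")
print(bool(hgh.search("Buy Human Growth Hormone today")))
print(bool(hgh.search("human\r\ngrowth\nhormone")))
```

Whether SpamFilter's own engine accepts `\s` in a keyword is something to check against the Regular_Expressions.htm file; the sketch only shows the general technique.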
|