Help with multiple RegEx search strings and using |
CyberBob | Groupie | Joined: 26 January 2005 | Points: 43
Posted: 27 December 2004 at 5:15pm
OK, let me set up a simple example to ask my question. If I want to block "bob", I put that in my keyword list. Now if I wanted to block "bob" and "b o b", I put this line in my keywords:

(b\s*o\s*b)

The \s looks for a space and the * looks for zero or more spaces between letters. I've tested this and it works well, but you have to be very careful using it.

Now I wanted to take it one more step using the "|" to stop things like b-o-b, so I tried:

(b(\s|-*)o(\s|-*)b)

and it worked great, BUT it's not only going to block "b o b", it will block "bob" also. So I tried to use a comma to add a keyword to the front or back of this and I cannot get it to work, e.g.:

(b(\s|-*)o(\s|-*)b),test

So it SHOULD block "bob test" or "b o b test", but it's not working. Am I using the proper syntax or the correct number of brackets, or what? I lost my mind this holiday season, so any help would be appreciated.

Thanks,
Bob
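A quick way to see what these two patterns accept is to run them against a few sample strings. The sketch below uses Python's `re` module, which is an assumption made for illustration: SpamFilter's own RegEx engine may differ in details. The names `spaces_only`, `space_or_hyphens`, and `matches` are hypothetical.

```python
import re

# \s* allows zero or more whitespace characters between the letters.
spaces_only = re.compile(r"b\s*o\s*b")

# (\s|-*) allows either ONE whitespace character or zero or more hyphens.
space_or_hyphens = re.compile(r"b(\s|-*)o(\s|-*)b")

samples = ["bob", "b o b", "b-o-b", "b--o-b", "b  o b"]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for text in samples:
    print(f"{text!r:10} spaces_only={matches(spaces_only, text)} "
          f"space_or_hyphens={matches(space_or_hyphens, text)}")
```

Both patterns still match the plain word, because `*` permits zero separators. Note also that `(\s|-*)` accepts only a single whitespace character, so a double space between letters gets past the second pattern.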
Desperado | Senior Member | Joined: 27 January 2005 | Location: United States | Points: 1143
Hmmm ... I am slightly confused. EXACTLY what do you want blocked and *not* want blocked? As an example:

((?i)((b.o.b)))

will block B O B and b-o b etc., but not Bob. Is this the sort of thing you are looking for? WARNING ... this will also block baoab (not that that is a word!)

Dan S.
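The behaviour Dan describes can be checked with a small sketch. It assumes Python's `re` module rather than SpamFilter's engine, and it drops the redundant outer brackets and moves `(?i)` to the front, since Python only accepts global inline flags at the start of a pattern. The name `is_blocked` is hypothetical.

```python
import re

# (?i) makes the whole pattern case-insensitive.
# Each "." must consume exactly one character (anything except a newline),
# so at least one character is required between the letters.
pattern = re.compile(r"(?i)b.o.b")


def is_blocked(text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for text in ["B O B", "b-o b", "baoab", "Bob"]:
    print(f"{text!r}: {is_blocked(text)}")
```

Because `.` is not limited to spaces or punctuation, letters count as separators too, which is why "baoab" is caught.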
CyberBob | Groupie | Joined: 26 January 2005 | Points: 43
Sorry for my rambling. I think I'm getting close to an answer, but here's another way to say it. Say we are going to block "online drugs". We add:

online,drugs

to the keyword list, but we also need the variations using "cheap drugs" and "online pharmacy", so it now would read:

cheap|online,drugs|pharmacy

or we could use brackets:

(cheap|online),(drugs|pharmacy)

Now if I wanted to look for upper/lower case:

((?i)(cheap|online),(drugs|pharmacy))

At this point everything is OK, but if we take it to the next level and block anyone putting a space or hyphen between letters, then I start getting string errors. I'll just use one word as an example so this doesn't get too long:

((?i)(c(\s|-*)h(\s|-*)e(\s|-*)a(\s|-*)p|online),(drugs|pharmacy))

When I do this I start getting string errors in my log, and I'm pretty sure I'm not using the proper number of brackets in the correct places. Once I start adding (\s|-*) between the letters of "cheap", do I need another set of brackets to enclose the word and then another set to enclose "cheap|online"?

So back to my original post: I have found (\s|-*) to work well between letters, as it will look for zero or more of both space and hyphen, but you have to be careful with it. If you put this in your keywords by itself:

c(\s|-*)h(\s|-*)e(\s|-*)a(\s|-*)p

it will not only block any variation of "c h e A p" or "c-h-e-a-p", it will also block the plain word "cheap", and now you have people upset with you. So I tried to use this rule with "|" and "," to put more keywords together, and that's where my troubles started.

Sorry for the long post, but I hope this makes more sense.

Thanks,
Bob
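On the bracket question, one way to keep the nesting manageable is to build the letter-by-letter pattern programmatically and group each alternation exactly once. The sketch below is written for Python's `re` module, not for SpamFilter, and makes several assumptions: it swaps Bob's `(\s|-*)` for the character class `[\s-]*`, it uses non-capturing groups `(?:...)` where plain brackets would also work, and it replaces the comma with `\s+`. The names `SEPARATOR`, `spaced`, and `rule` are hypothetical.

```python
import re

SEPARATOR = r"[\s-]*"  # zero or more whitespace characters or hyphens


def spaced(word):
    """Build a pattern that tolerates separators between the letters of word."""
    return SEPARATOR.join(re.escape(ch) for ch in word)


# Concatenation binds tighter than "|", so the spaced word needs no brackets
# of its own; only the alternation as a whole has to be grouped.
first = f"(?:{spaced('cheap')}|online)"
second = "(?:drugs|pharmacy)"

# A run of whitespace stands in for the comma, which a regex engine
# would otherwise treat as a literal character.
rule = re.compile(first + r"\s+" + second, re.IGNORECASE)

for text in ["C-H-E-A-P drugs", "online pharmacy", "cheap", "drugs online"]:
    print(f"{text!r}: {rule.search(text) is not None}")
```

With this shape the plain word "cheap" on its own is not matched, because the second group is still required after it.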
Desperado | Senior Member | Joined: 27 January 2005 | Location: United States | Points: 1143
First, what's with the commas? I do not think you want any commas in your RegExes. Let's start there and I will look at your expressions early this afternoon.

Dan
CyberBob | Groupie | Joined: 26 January 2005 | Points: 43
Now we are on the same wavelength, and I wonder if this is the problem. You use commas to separate keywords you want the filter to find anywhere in the email body. This is a function of SpamFilter coding, not RegEx. So the million-dollar question :-) that I'm trying to get to: can you use two different RegEx rules on the same line AND separate them by commas so the rule will find two keywords in an email? Does this help?
Desperado | Senior Member | Joined: 27 January 2005 | Location: United States | Points: 1143
I have been wrong before and I hope Roberto will correct me if I misstate this, but ... *do not* use commas. They will be treated as literal commas and will not have the effect you think. Also, the SpamFilter implementation of RegEx cannot "look ahead", so the "AND" function is not valid. What this means is that you can find 2 words in a message using a variety of ways to separate them, but *only* in the order they are presented in the expression.

An additional issue is that if the expression is written to look for 2 words separated by too many characters, the expression may fail with a "Loop Stack Exceeded" error. Search your logs for "String Match" and you will locate any RegExes that are failing. Often, however, a RegEx works in some cases but causes a "Loop Stack Exceeded" error in others. In fact, I just spent many hours "adjusting" 2 expressions that were causing that error way too often. I feel that it is probably a good idea to limit the "Search Scope" of an expression to avoid those errors, as with the expression example below:

((?i)Subject:(([\s]|[\!-\xB4]){0,10}[\|]){2})

I would prefer to *not* use the "{0,10}" clause, but if I don't, I get many failures. BTW, if anyone has a less complex way of doing the above RELIABLY, please chime in. This expression very simply catches any instance of 2 or more "pipe" characters anywhere in the subject, even if the pipes are separated by any of the first 115 ASCII characters.

Dan S.
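To see how the `{0,10}` limit shapes what Dan's expression catches, it can be tried outside SpamFilter. This is a sketch under the assumption that Python's `re` module behaves closely enough for illustration; the inline `(?i)` is moved to the front because Python only accepts global flags at the start of a pattern, and the name `has_two_pipes` is hypothetical.

```python
import re

# Dan's expression with (?i) moved to the front. Each of the two repetitions
# allows at most 10 characters (whitespace, or code points 0x21 through 0xB4)
# before a literal pipe.
pipes_in_subject = re.compile(r"(?i)Subject:(([\s]|[\!-\xB4]){0,10}[\|]){2}")


def has_two_pipes(header):
    """Return True if the header line matches the two-pipe expression."""
    return pipes_in_subject.search(header) is not None


headers = [
    "Subject: Buy | now | cheap",
    "Subject: ||",
    "Subject: one | pipe only",
    "Subject: a | " + "x" * 20 + " |",
]
for header in headers:
    print(f"{header!r}: {has_two_pipes(header)}")
```

The last header contains two pipes but is not matched, because the second pipe sits more than 10 characters after the first. That is the trade-off of bounding the search scope: fewer runaway searches, at the cost of missing widely spaced pipes.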
CyberBob | Groupie | Joined: 26 January 2005 | Points: 43
Roberto,

Can you give us some input here? I think Dan is saying the same thing as me, but to clarify: is it OK to use commas when separating keywords EXCEPT when you use RegEx on both sides of the comma? This is one of my top rules and obviously it's working well:

((?i)cheap|online,(d(\^|\s|-|\.*)r(\^|\s|-|\.*)u(\^|\s|-|\.*)g(\^|\s|-|\.*)s))

This rule first finds "cheap" or "online", THEN looks for any variation of the word "drugs": d r u g s, d.r.u.g.s, d-r-u-g-s, or any mix such as d.r u-g s. But as soon as I try to add any RegEx to the "cheap or online" part, the rule stops working.

I'm still doing some testing and I think this will work, but I'm not getting my escape brackets in the correct spot, or I have too few or too many of them. I'll post more as I find more, but Roberto may have some better insight for us.
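One possible reading of why this rule ranks so high is alternation precedence: with a literal comma, the top-level `|` splits the expression into "cheap" on its own, or "online," followed directly by the drugs pattern. The sketch below tests that reading with Python's `re` module, which is an assumption; SpamFilter may pre-process the keyword line differently. The inline `(?i)` is replaced by `re.IGNORECASE`, and the name `hit` is hypothetical.

```python
import re

# Bob's rule, unchanged apart from how case-insensitivity is requested.
rule = re.compile(
    r"(cheap|online,(d(\^|\s|-|\.*)r(\^|\s|-|\.*)u(\^|\s|-|\.*)g(\^|\s|-|\.*)s))",
    re.IGNORECASE,
)


def hit(text):
    """Return the matched text, or None when the rule does not fire."""
    m = rule.search(text)
    return m.group(0) if m else None


for text in ["cheap flights to Rome", "online drugs", "online,d.r.u.g.s"]:
    print(f"{text!r}: {hit(text)!r}")
```

Under this reading, any message containing "cheap" fires the rule regardless of whether drugs are mentioned, while "online drugs" with a space never does.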
LogSat | Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
Dan,

That was a good call: the comma in a RegEx *will* be interpreted as a literal; it cannot be used to "and" two expressions. RegEx is as powerful as it is complicated, so we did that by design, as with RegEx it should usually be possible to construct a single expression that achieves what "and-ing" two or more expressions would.

Roberto F.
LogSat Software
LogSat | Admin Group | Joined: 25 January 2005 | Location: United States | Points: 4104
Bob,

Dan is correct. Please see http://www.logsat.com/spamfilter/forums/showmessage.asp?messageID=4921 for a confirmation.

Roberto F.
LogSat Software
CyberBob | Groupie | Joined: 26 January 2005 | Points: 43
OK guys, thanks for the feedback, but through this process I've figured out how to make it work. I have multiple RegEx rules separated by commas, including some plain words, and it appears to be working and is becoming one of my top rules in my "Top Keywords" report. Let me monitor this for the next week or so and I'll report back, as it appears to be a very powerful way to use RegEx multiple times in complicated rules.

Bob
CyberBob | Groupie | Joined: 26 January 2005 | Points: 43
Dan or Roberto,

What RegEx would you use to replace a comma in the fashion I'm trying to use it, to "AND" two "expressions" or keywords together in an email? I'd love to use (\s*) between each letter of mortgage, but using the * will also block the actual word "mortgage". So I need to combine a couple of keywords like:

approved,((m(\s*)o(\s*)r(\s*)t(\s*)g(\s*)a(\s*)g(\s*)e))

What would you use to replace the comma?

Thanks in advance,
Bob
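Going by Dan's earlier points (no lookahead, matches only in the order written, and a limited search scope to avoid "Loop Stack Exceeded"), one possible replacement for the comma is a bounded "any character" gap, with both word orders spelled out as alternatives. This is a sketch in Python's `re`, not a tested SpamFilter rule: the 40-character limit is an arbitrary assumption, and the names `MORTGAGE`, `GAP`, and `both_words` are hypothetical.

```python
import re

MORTGAGE = r"m\s*o\s*r\s*t\s*g\s*a\s*g\s*e"
GAP = r".{0,40}"  # assumed limit; tune it to keep the search scope small

# The comma is replaced by a bounded gap, and both word orders are listed
# explicitly because no lookahead is used.
both_words = re.compile(
    rf"approved{GAP}{MORTGAGE}|{MORTGAGE}{GAP}approved",
    re.IGNORECASE,
)

samples = [
    "You are approved for a m o r t g a g e",
    "mortgage rates approved",
    "mortgage",
    "approved " + "x" * 50 + " mortgage",
]
for text in samples:
    print(f"{text[:40]!r}: {both_words.search(text) is not None}")
```

The plain word "mortgage" on its own is left alone, since "approved" must appear within the gap. In Python `.` does not cross line breaks by default, so the two words have to sit on the same line for this sketch to match.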