Spam Filter ISP Support Forum

Problems with keyword filter

sirrar
Groupie
Joined: 26 January 2005
Location: Denmark
Posted: 18 December 2004 at 5:52am

Hi

Is there a way to define in the keyword list that it should only match the word if it stands alone?

Example:

I have "sex" in my keyword list, but if somebody writes msexchange in an e-mail, the message is quarantined as a hit on "sex".
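The false positive comes from plain substring matching. Here is a minimal Python sketch, assuming (the thread does not confirm this) that a keyword entry without any regex syntax behaves like a case-insensitive substring search. The function name plain_keyword_hit is hypothetical:

```python
def plain_keyword_hit(keyword: str, text: str) -> bool:
    # Hypothetical stand-in for a plain keyword entry: a case-insensitive
    # substring test, so the keyword also matches inside longer words.
    return keyword.lower() in text.lower()


print(plain_keyword_hit("sex", "Our msexchange server is down"))  # True (false positive)
print(plain_keyword_hit("sex", "Our mail server is down"))        # False
```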

 

Best regards

Torsten Christiansen

Lee
Groupie
Joined: 04 February 2005
Location: United States
Posted: 18 December 2004 at 5:43pm
I had the same problem with Cialis and the word specialist. :)

Matt R
Guest Group
Posted: 18 December 2004 at 9:19pm

I believe that if you want to include spaces or other non-standard characters in the string, you can write the keyword entry like this:

( cialis )

Then the match must include the spaces, which should resolve your issue.
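A sketch of the spaced entry using Python's re module, assuming SpamFilter evaluates keyword entries as regular expressions (so the parentheses form a group and the spaces inside it are literal). SpamFilter's own regex engine may differ in details:

```python
import re

# Hypothetical keyword entry: the group has a literal space on each side.
spaced = re.compile(r"( cialis )")

print(bool(spaced.search("Please see a specialist today")))  # False: no spaces around "cialis"
print(bool(spaced.search("You can buy cialis here")))        # True: spaces on both sides
```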

Maybe Robert can confirm or deny this.

- Matt R.

sirrar
Posted: 19 December 2004 at 2:36am

I'm sure that's part of the solution.

There's a little problem with the

( keyword )

entry, for example:

If the keyword is the only word on a line, with no spaces before or after it, the message is let through.

Best regards

Torsten Christiansen

sirrar
Posted: 19 December 2004 at 2:51am

We could then, of course, make three entries:

(keyword ) - no space before

( keyword) - no space after

( keyword ) - space before and after

That way the keyword would be caught at the start of a line, at the end of a line, or in the middle of a line (with spaces around it). But that still doesn't solve the case where the keyword is the only word on a line, with no spaces before or after it.
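A Python sketch of the three entries, again assuming they are evaluated as regular expressions and that a message is caught if any single entry matches. The helper any_entry_hits is hypothetical:

```python
import re

# The three entries from the post above, compiled as regular expressions.
ENTRIES = [re.compile(p) for p in (r"(sex )", r"( sex)", r"( sex )")]


def any_entry_hits(text: str) -> bool:
    # A message is caught if at least one entry matches somewhere in it.
    return any(entry.search(text) for entry in ENTRIES)


samples = [
    "sex is a bad thing",         # start of line: caught by "(sex )"
    "do you like sex",            # end of line: caught by "( sex)"
    "do you like sex my friend",  # middle of line: caught by all three
    "our msexchange server",      # inside a word: not caught
    "hello\nsex\nbye",            # alone on a line: still not caught
]
for text in samples:
    print(repr(text), any_entry_hits(text))
```

Note that "(sex )" and "( sex)" each require a space on only one side, so they can still fire inside longer words: "( sex)" matches the start of " sexy" and "(sex )" matches the end of "unisex ".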

I used to run a Norton spam filter, where the keyword was put in quotes, for example:

"keyword"

This told the spam filter to catch the keyword in all four cases without matching it when it was part of another word. For example, with

"sex"

msexchange would not be caught.

Best regards

Torsten Christiansen

sirrar
Posted: 19 December 2004 at 3:58am

To summarize.

Originally I got a lot of false positives with the word sex in my keyword files.

Words like msexchange would hit on that word.

Then another user suggested using ( sex ) instead, and I have put a bit of work into that.

I ended up making three entries for it (actually six, since I also have them with Subject: in front):

( sex ) - catches sex with a space before and after (hits: do you like sex my friend)

(sex ) - catches sex with a space after (hits: sex is a bad thing)

( sex) - catches sex with a space before (hits: do you like sex)

Now I don't get any false positives anymore, but I have opened the door to potential spam again: when the word sex stands by itself on a line, in either the subject or the body, the message gets through my spam filter.

Is there a way to get this fixed?

Maybe LogSat could change SpamFilter so there's a way to specify in the keyword list that a keyword should match only when it stands by itself, not as part of another word.

I hope you get my point; otherwise, please feel free to ask!

Best regards...

Torsten Christiansen

Matt R
Posted: 19 December 2004 at 8:51am

I understand your dilemma and can see the need. The problem is that if all you have to work with is "sex" on its own line and no other offending words, you really should not expect to block it with keyword filters. Keyword filters are likely the most processor-intensive blocking method in SpamFilter. If you can't use keyword combinations, i.e. (sex, hardcore) without the parentheses, you should stick to words that are not really words or that clearly identify spam, and then focus on the other, less processor-intensive blocking mechanisms in SpamFilter.

I do not advocate continually adding features and programming, because of the processing time it already takes to handle e-mails. SpamFilter is very fast, depending on how you set it up, but continually adding more processing logic will naturally force us all to keep upgrading our hardware.

Alexey
Guest Group
Posted: 20 December 2004 at 2:25am

Try

(^sex$)
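A Python sketch of what this anchored entry matches. The thread does not say whether SpamFilter applies ^ and $ to the whole message or to each line; in Python's re module that difference is the re.MULTILINE flag, so both readings are shown:

```python
import re

body = "Hello\nsex\nBye"
pattern = r"(^sex$)"

# Without MULTILINE, ^ and $ anchor to the start and end of the whole text.
print(bool(re.search(pattern, body)))                             # False
# With MULTILINE, ^ and $ also match at every line break.
print(bool(re.search(pattern, body, re.MULTILINE)))               # True
# Either way the keyword must be the entire line, so this is missed.
print(bool(re.search(pattern, "do you like sex", re.MULTILINE)))  # False
```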

LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 20 December 2004 at 10:25pm
Torsten,

The functionality you are requesting should be easily obtainable with the following RegEx:

(\bsex\b)

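The \b RegEx metacharacter is the "word boundary" anchor, used to isolate words in strings: it matches the position between a word character and a non-word character (or the start or end of the text) without consuming any characters. You can find a better explanation than I'll ever manage to provide at: http://www.regular-expressions.info/wordboundaries.html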
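A Python sketch of the word-boundary entry against the cases discussed in this thread, assuming SpamFilter's regex engine handles \b the same way Python's re module does:

```python
import re

word = re.compile(r"(\bsex\b)")

samples = [
    "our msexchange server",      # inside another word
    "sex is a bad thing",         # start of line
    "do you like sex",            # end of line
    "do you like sex my friend",  # middle of line
    "hello\nsex\nbye",            # alone on a line
]
for text in samples:
    # \b needs a word character on one side and a non-word character
    # (or the start/end of the text) on the other, so "msexchange" fails.
    print(repr(text), bool(word.search(text)))
```

Only the first sample prints False. Two caveats from Python's behavior: digits and underscores count as word characters, so "sex2" or "sex_" would not match, and the match is case-sensitive unless re.IGNORECASE is passed. Whether SpamFilter ignores case is not covered in this thread.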

Roberto F. LogSat Software

LogSat
Posted: 20 December 2004 at 10:27pm
Alexey,

Thanks for the suggestion, though I believe that won't work in all cases. I've responded with our suggested RegEx at http://www.logsat.com/spamfilter/forums/showmessage.asp?messageID=4817

Roberto F. LogSat Software

sirrar
Posted: 21 December 2004 at 8:32am

Thank you very much.

That single entry in my keyword list solves my problem.

Instead of three entries that still left an opening for a single word on its own line, the single entry (\bsex\b) catches every case, but not the keyword inside other words such as msexchange.

Thank you again!!!

A merry Christmas and a happy New Year to you all (without spam <:-) )

Best regards...

Torsten Christiansen

dcook
Senior Member
Joined: 31 January 2005
Location: United States
Posted: 21 December 2004 at 12:28pm
Thanks, good idea, this will help.  Merry Christmas to all!