
Problems with keyword filter

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=4794
Printed Date: 13 March 2025 at 5:40pm


Topic: Problems with keyword filter
Posted By: sirrar
Subject: Problems with keyword filter
Date Posted: 18 December 2004 at 5:52am

Hi

Is there a way to define in the keyword list that it should only match the word if it stands alone?

Example:

I have "sex" in my keyword list. But if somebody writes msexchange in the e-mail, the mail will be quarantined with a hit on sex.

 

Best regards

Torsten Christiansen




Replies:
Posted By: Lee
Date Posted: 18 December 2004 at 5:43pm
I had the same problem with Cialis and the word specialist. :)
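
The false positives described above are what a plain substring test produces. A minimal Python sketch, assuming the keyword list is matched as a case-insensitive substring (the thread suggests this but does not confirm how SpamFilter matches internally; `naive_keyword_hit` is a hypothetical helper, not part of SpamFilter):

```python
def naive_keyword_hit(keyword: str, text: str) -> bool:
    """Assumed model of a plain keyword entry: case-insensitive substring test."""
    return keyword.lower() in text.lower()


# Both keywords are found inside longer, harmless words.
print(naive_keyword_hit("sex", "Please restart msexchange tonight"))
print(naive_keyword_hit("cialis", "Ask a specialist"))
```

Both calls report a hit, because "msexchange" contains "sex" and "specialist" contains "cialis".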


Posted By: Guests
Date Posted: 18 December 2004 at 9:19pm

I believe that if you want to include spaces or other non-standard characters in the string, you can list the keyword entry like this:

( cialis )

Then the match must include the spaces.  This would resolve your issue.

Maybe Robert can confirm or deny on this.  

- Matt R.

 



Posted By: sirrar
Date Posted: 19 December 2004 at 2:36am

I'm sure that's part of the solution.

There's a little problem with the

( keyword )

ex:

If keyword is the only word on a line, with no spaces before or after the keyword, it will be let through.

 

Best regards

Torsten Christiansen



Posted By: sirrar
Date Posted: 19 December 2004 at 2:51am

We could then of course make 3 entries:

(keyword ) -no space before

( keyword) -no space after

( keyword ) - space before and after

So if the keyword is at the start of a line, at the end of a line, or in the middle of a line (with a space before and after), it would be caught. But that still doesn't solve the case where the keyword is the only word on a line, without spaces before or after.

I used to run a Norton spam filter; with that, we put the keyword in quotation marks, e.g.:

"keyword", which told the spam filter to catch the keyword in all 4 cases, without matching it when it was inside another word, e.g.:

"sex"

Then msexchange would not be caught.

 

Best regards

Torsten Christiansen

 



Posted By: sirrar
Date Posted: 19 December 2004 at 3:58am

To summarize.

Originally I got a lot of false positives when having the word sex in my keyword files.

Words like msexchange would hit on that word.

Another user said to use ( sex ) instead, and I have put a bit of work into that.

I found that I need 3 entries for that (actually 6, since I also have them with Subject: in front):

( sex ) - catches sex if there is a space before and after (hit on: do you like sex my friend)

(sex ) - catches sex if there is a space after (hit on: sex is a bad thing)

( sex) - catches sex if there is a space before (hit on: do you like sex)

Then I don't get any false positives anymore. But now I have opened the door to potential spam again, because when the word sex stands by itself on a line, in either the subject or the body, the message gets through my spam filter.
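
The gap can be reproduced with a small Python sketch. It assumes that the text inside the parentheses, spaces included, is matched as a literal case-insensitive substring; `PADDED_ENTRIES` and `padded_hit` are hypothetical names used only for illustration:

```python
# The three space-padded entries from the list above, without the parentheses.
PADDED_ENTRIES = [" sex ", "sex ", " sex"]


def padded_hit(text: str) -> bool:
    """Return True if any space-padded entry occurs in the text (assumed model)."""
    lowered = text.lower()
    return any(entry in lowered for entry in PADDED_ENTRIES)


for sample in [
    "do you like sex my friend",  # space before and after
    "sex is a bad thing",         # space after only
    "do you like sex",            # space before only
    "msexchange",                 # inside another word
    "hello\nsex\nbye",            # alone on its own line, no spaces
]:
    print(repr(sample), padded_hit(sample))
```

Under this model the first three samples are caught, msexchange is ignored, and the keyword alone on a line is missed because it is surrounded by line breaks rather than spaces.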

Is there a way to get this fixed?

Maybe LogSat could make a change in the spam filter, so that there's a way to define in the keyword list that an entry should match only when the keyword stands by itself and not as part of a word.

Hope You get my point, or please feel free to ask!

 

Best regards...

Torsten Christiansen



Posted By: Guests
Date Posted: 19 December 2004 at 8:51am

I understand your dilemma and can see the need. The problem is that if all you have to work with is "sex" on its own line and no other offending words, you really should not be expecting to use keyword filters to block it.  Keyword filters are likely the most processor-intensive blocking method used in SpamFilter. If you can't use keyword combinations, i.e. (sex, hardcore) without the parens, you should stick to words that are not really words or that can easily identify spam.  Then focus on other blocking mechanisms in SpamFilter that are less processor-intensive.

I do not advocate the continual addition of features and programming because of the additional processing time it already takes to process emails.  SpamFilter is very fast, depending on how you set it up, but to keep adding more processing logic is naturally going to force us all to keep upgrading our hardware, etc.



Posted By: Guests
Date Posted: 20 December 2004 at 2:25am

Try

(^sex$)



Posted By: LogSat
Date Posted: 20 December 2004 at 10:25pm
Torsten,

The functionality you are requesting should be easily obtainable by using the following RegEx expression:

(\bsex\b)

The \b RegEx metacharacter is the "word boundary" anchor, used to isolate words in strings. You can find a better explanation than I'll ever manage to provide at: http://www.regular-expressions.info/wordboundaries.html

Roberto F. LogSat Software
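
As a rough illustration of the word-boundary behavior, here is a sketch using Python's `re` module. SpamFilter uses its own regex engine, so treating the two as equivalent for `\b` is an assumption, as is the case-insensitive flag; `word_hit` is a hypothetical helper:

```python
import re

# \b matches at a position between a word character and a non-word character
# (or the start/end of the text), so "sex" must stand as a whole word.
WORD = re.compile(r"\bsex\b", re.IGNORECASE)


def word_hit(text: str) -> bool:
    """Return True if the keyword appears as a standalone word."""
    return WORD.search(text) is not None


for sample in [
    "do you like sex my friend",
    "sex is a bad thing",
    "do you like sex",
    "hello\nsex\nbye",
    "msexchange",
]:
    print(repr(sample), word_hit(sample))
```

In Python, the first four samples match and msexchange does not, which covers the lone-word-on-a-line case with a single entry.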


Posted By: LogSat
Date Posted: 20 December 2004 at 10:27pm
Alexey,

Thanks for the suggestion. I believe that won't work in all cases, though. I've responded with our suggested RegEx expression at http://www.logsat.com/spamfilter/forums/showmessage.asp?messageID=4817

Roberto F. LogSat Software
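
One way to see the difference between (^sex$) and (\bsex\b) is the Python sketch below. The thread does not say how SpamFilter's engine treats ^ and $, so the multi-line flag here is an assumption chosen to give the anchored pattern its most generous reading; the `ANCHORED` and `BOUNDED` names are illustrative only:

```python
import re

# ^ and $ anchor to line starts/ends here (re.MULTILINE is an assumption).
ANCHORED = re.compile(r"^sex$", re.IGNORECASE | re.MULTILINE)
# \b only requires word boundaries on both sides of the keyword.
BOUNDED = re.compile(r"\bsex\b", re.IGNORECASE)

for sample in [
    "hello\nsex\nbye",            # keyword alone on a line
    "do you like sex my friend",  # keyword in the middle of a line
    "msexchange",                 # keyword inside another word
]:
    anchored = ANCHORED.search(sample) is not None
    bounded = BOUNDED.search(sample) is not None
    print(repr(sample), "anchored:", anchored, "bounded:", bounded)
```

Even with line-wise anchors, the anchored pattern only catches the keyword when it is the entire line, while the word-boundary pattern also catches it in the middle of a sentence. Neither matches msexchange.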


Posted By: sirrar
Date Posted: 21 December 2004 at 8:32am

Thank you very much.

That single entry in my keyword list solves my problem.

Instead of 3 entries that leave an opening for a single word on its own line, the entry (\bsex\b) catches every case, but not when the word is inside other words, e.g. msexchange.

 

Again, thank you!!!

A happy Christmas and new year to you all (without spam<:-) )

 

Best regards...

Torsten Christiansen



Posted By: dcook
Date Posted: 21 December 2004 at 12:28pm
Thanks, good idea, this will help.  Merry Christmas to all!


