<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="http://syndication.webwiz.co.uk/rss_namespace/">
 <channel>
  <title>Spam Filter ISP Forums : Keyword filter</title>
  <link>https://www.logsat.com/spamfilter/forums/</link>
  <description><![CDATA[This is an XML content feed of: Spam Filter ISP Forums : Spam Filter ISP Support : Keyword filter]]></description>
  <pubDate>Wed, 13 May 2026 18:37:32 +0000</pubDate>
  <lastBuildDate>Wed, 17 Sep 2003 23:54:00 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 11.04</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>https://www.logsat.com/spamfilter/forums/RSS_post_feed.asp?TID=1922</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Spam Filter ISP Forums]]></title>
   <url>https://www.logsat.com/spamfilter/forums/forum_images/web_wiz_forums.png</url>
   <link>https://www.logsat.com/spamfilter/forums/</link>
  </image>
  <item>
   <title><![CDATA[Keyword filter : Richard, Removing the HTML tags...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2018&amp;title=keyword-filter#2018</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=8">LogSat</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 17 September 2003 at 11:54pm<br /><br /><P>Richard,</P><P>Removing the HTML tags requires little code, but it does add more overhead than we feel comfortable with right now.</P><P>As for DNA fingerprinting, it is self-adjusting. If spammers decide to spell v1agra or v*i*a*g*r*a, then after a few such emails arrive, the statistical engine will begin to recognize the new patterns and readjust itself. At least, it does in our preliminary tests...</P><P>Roberto F.<BR>LogSat Software</P>]]>
   </description>
   <pubDate>Wed, 17 Sep 2003 23:54:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2018&amp;title=keyword-filter#2018</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : I don't believe that removing...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2017&amp;title=keyword-filter#2017</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 17 September 2003 at 7:03pm<br /><br /><P>I don't believe that removing HTML tags prior to keyword searching would amount to a lot of code or overhead.</P><P>"DNA fingerprinting" has failed elsewhere because spammers are on to it and throw random garbage into messages to change the "fingerprint". I hope your algorithm is smarter than the others.</P><P>Another spammer trick alluded to earlier in this thread is the creative use of "alphabet substitutions", such as 1, ! or | for I or l, 0 for O, @ for a, etc., along with peculiar spacing, with and without separating characters (-_+=~. are common).</P><P>If viagra is the keyword, then my wishlist is for the software to, in addition to stripping out HTML tags, match V!AGRA, Viagr@, ###VIAGRA### (embedded in other text), V I A G R A, V-I-A-G-R-A and so on.</P><P>It is interesting to note that (at least I think this is the case) HTML does not allow the values of SRC= or HREF= to be broken up with comments. So even if VIAGRA is obscured, the website it links to is not. In the HTML gobbledygook example a few messages back, this is buried in the otherwise unreadable HTML: <STRONG>&lt;A href="http://www.pure-herbal.biz/sknoc/vp/"</STRONG></P><P>If nothing else, one should be able to filter out mail containing "pure-herbal.biz"!</P>]]>
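    <![CDATA[<P>The wishlist above can be sketched in a few lines of Python. This is a minimal illustration, not SpamFilter's behavior. The look-alike table covers only the substitutions listed in this post (1, ! and | for I or l; 0 for O; @ for a). The separator set is the -_+=~. list plus whitespace. The helper names normalize, contains_keyword and linked_hosts are hypothetical.</P><PRE>
```python
import re

# Hypothetical look-alike table built only from the substitutions in the post.
# I and l are folded into one letter because spammers swap them freely,
# so "vallum" and "valium" normalize to the same string.
LOOKALIKES = str.maketrans({
    "1": "i", "!": "i", "|": "i", "l": "i",
    "0": "o",
    "@": "a",
})

# Whitespace plus the separator characters the post calls common: - _ + = ~ .
SEPARATORS = re.compile(r"[\s\-_+=~.]+")

# Host part of any HREF= or SRC= URL; the post observes these are not obscured.
LINK_HOST = re.compile(r"""(?:href|src)\s*=\s*["']?https?://([^/"'\s>]+)""",
                       re.IGNORECASE)


def normalize(text: str) -> str:
    """Lower-case, undo look-alike substitutions, then drop separators."""
    return SEPARATORS.sub("", text.lower().translate(LOOKALIKES))


def contains_keyword(text: str, keyword: str) -> bool:
    """Substring test after normalization, so ###VIAGRA### also matches."""
    return normalize(keyword) in normalize(text)


def linked_hosts(html: str) -> list[str]:
    """Lower-cased host names referenced by HREF= or SRC= attributes."""
    return [host.lower() for host in LINK_HOST.findall(html)]


for sample in ["V!AGRA", "Viagr@", "###VIAGRA###", "V I A G R A", "V-I-A-G-R-A"]:
    print(sample, contains_keyword(sample, "viagra"))
```
</PRE><P>Run against the snippet quoted above, linked_hosts would return www.pure-herbal.biz, which a plain domain keyword could then catch. Folding I and l together widens the net, so very short keywords may start matching innocent words. That trade-off would need testing on real mail.</P>]]>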
   </description>
   <pubDate>Wed, 17 Sep 2003 19:03:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2017&amp;title=keyword-filter#2017</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : Having read this thread, I think...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2010&amp;title=keyword-filter#2010</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 16 September 2003 at 8:32pm<br /><br /><P>Having read this thread, I think the feature request that is emerging is a function that would render HTML embedded in a subject line back as text, and then accept or reject the message based on the rendered subject line.&nbsp; So "VIAGRA" and "VIA&lt;crap&gt;GRA" would both be caught if "VIAGRA" is in the keyword filter.</P><P>The question to LogSat then becomes: is this feature request feasible, and if so, is it sensible?</P>]]>
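    <![CDATA[<P>Here is a minimal Python sketch of the requested function. It assumes the subject has already been decoded to a plain string. The standard library's html.parser keeps only the text (tags and comments are dropped, character references decoded), and the keyword check runs on the result. TextRenderer, render_text and subject_matches are hypothetical names, and the plain substring test is an assumption.</P><PRE>
```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Keeps only the text of an HTML fragment.

    Tags and comments are dropped (their handlers are not overridden), and
    character references are decoded because convert_charrefs=True.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def render_text(fragment: str) -> str:
    """Return roughly the text a mail client would display for the fragment."""
    parser = TextRenderer()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.parts)


def subject_matches(subject: str, keywords: list[str]) -> bool:
    """Hypothetical check: case-insensitive keyword search on rendered text."""
    rendered = render_text(subject).casefold()
    return any(word.casefold() in rendered for word in keywords)
```
</PRE><P>With "VIAGRA" in the keyword list, a subject of VIA&lt;crap&gt;GRA renders to VIAGRA and is caught. The cost is one HTML parse per message, which is the overhead LogSat is weighing in this thread.</P>]]>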
   </description>
   <pubDate>Tue, 16 Sep 2003 20:32:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2010&amp;title=keyword-filter#2010</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : Allan, We had given thought to...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2009&amp;title=keyword-filter#2009</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=8">LogSat</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 16 September 2003 at 8:30pm<br /><br /><P>Allan,</P><P>We had already considered the same points you raise. There is a (big) performance penalty in having to parse the HTML of incoming emails, so we decided to keep it simple and just work with the email source.</P><P>We're working hard to prepare a new version that does DNA fingerprinting on incoming emails, which will greatly diminish the need to specify multiple keywords. Once this is complete, should it not have the desired accuracy, we may go back and implement HTML parsing as well.</P><P>Roberto F.<BR>LogSat Software</P><P>==========</P><P><FONT face=Verdana size=2>A new message by Allan Poulsen was posted in Support Forum </FONT></P><P class=ASPForums><FONT class=ASPForums><STRONG class=ASPForums>Re: Keyword filter</STRONG></FONT></P><BLOCKQUOTE class=ASPForums><P>I do not quite agree with you. I have added&nbsp;some words to the keyword filter; regardless of their meaning, I want to filter those mails.</P><P>If Outlook (and other mail programs) is capable of filtering out worthless HTML tags when displaying the message, I still can't see why the program can't do the same...</P><P>Greetings</P><P>Allan</P></BLOCKQUOTE>]]>
   </description>
   <pubDate>Tue, 16 Sep 2003 20:30:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2009&amp;title=keyword-filter#2009</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : Valid points, all... Dan, your...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2004&amp;title=keyword-filter#2004</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 16 September 2003 at 12:43pm<br /><br /><P><FONT face="Helvetica, Arial, sans serif">Valid points, all...</FONT></P><P><FONT face=Helvetica>Dan, your RegEx list is always welcome for&nbsp;reading and comparison.&nbsp; You've said in a previous post that you're not a "RegEx expert", but it seems more and more that you're becoming one as a result of this project!&nbsp; Filtering on a COMMON single word might not be such a good idea, but there are plenty of single-word spam traps that are very effective with zero false positives.&nbsp; You'd be surprised how much spam I'm catching by filtering on the slang&nbsp;word&nbsp;"milf".&nbsp; Both "vallum" and "vaiium"&nbsp;(consider the&nbsp;spammer's creative use of capitalization on the i's&nbsp;and l's) are effective at trapping a few dozen spam messages each week.</FONT></P><P><FONT face=Helvetica>But back to my original concern: trapping the embed&lt;crap&gt;ded tags.&nbsp; Unfortunately, I don't think that even your most recent RegEx list&nbsp;addresses this scenario.&nbsp; Thus, Allan's post about pre-filtering the HTML tags and then running the keyword check does have&nbsp;merit.&nbsp; A sentence that contains the phrase "pe&lt;blurb&gt;nis enl&lt;!--okay--&gt;argem&lt;extract&gt;ent" then becomes "penis enlargement", and it becomes&nbsp;a policy decision&nbsp;for the mail system's administrator whether or not to filter on those&nbsp;two words.</FONT></P><P><FONT face=Helvetica>Still, this doesn't rate an enhancement&nbsp;request, not right now.&nbsp; The statistical scoring enhancement that Roberto has mentioned in previous posts could have a more far-reaching impact on spam control than all of these blacklist/whitelist tweaks combined.&nbsp; So, I'll just sit tight and wait for THAT beta to arrive.</FONT></P>]]>
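    <![CDATA[<P>The single-word traps described above rely on whole-word, case-insensitive matching. Below is a minimal Python sketch of that, assuming plain-word keywords. build_word_trap is a hypothetical helper, and SpamFilter's own keyword matching may work differently.</P><PRE>
```python
import re


def build_word_trap(words: list[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-word pattern from a keyword list.

    re.escape keeps punctuation in a keyword literal, and the word-boundary
    anchors stop "milf" from matching inside a longer word such as "Milford".
    """
    alternation = "|".join(re.escape(word) for word in words)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


# The example traps mentioned in this post.
trap = build_word_trap(["milf", "vallum", "vaiium"])

for subject in ["Hot MILF pics", "Cheap VaIIum here", "Meet me in Milford"]:
    print(subject, bool(trap.search(subject)))
```
</PRE><P>Because matching is case-insensitive, "vaiium" also catches the capital-I spelling VaIIum, which reads as Valium in many fonts. The word boundaries keep a short trap like "milf" from firing on "Milford".</P>]]>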
   </description>
   <pubDate>Tue, 16 Sep 2003 12:43:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=2004&amp;title=keyword-filter#2004</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : There is one trick you can use...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1995&amp;title=keyword-filter#1995</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 15 September 2003 at 7:40pm<br /><br /><P>There is one trick you can use to view the source in Outlook.&nbsp; I use it before opening most mail that has gotten through SpamFilter but still looks suspect.</P><OL><LI>Disable any preview panes.&nbsp; You should do this anyway, because previewing an HTML email can cause an image to be downloaded, in which case your IP (and possibly a unique code) will be logged, confirming your address as valid and causing more spam to come your way.</LI><LI><P>Highlight the message but do not open it.</P></LI><LI><P>Under "File", choose "Save As..."&nbsp; If the message is a plain-text email, it will be given the .txt extension (text emails are safe to open in Outlook).&nbsp; HTML emails will be given the .htm extension.&nbsp; I usually just save these to my desktop and then view the source in an HTML editor such as HomeSite; Notepad will suffice.</P></LI></OL>]]>
   </description>
   <pubDate>Mon, 15 Sep 2003 19:40:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1995&amp;title=keyword-filter#1995</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : I am new to regex and was wanting...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1989&amp;title=keyword-filter#1989</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 15 September 2003 at 11:02am<br /><br /><P>I am new to regex and would like to know whether you could give me a breakdown of what a few of your lines are doing, namely lines 1, 2, 5 and 13.</P><P>Thanks in advance.</P>]]>
   </description>
   <pubDate>Mon, 15 Sep 2003 11:02:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1989&amp;title=keyword-filter#1989</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : I do not quite agree with you....]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1984&amp;title=keyword-filter#1984</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 15 September 2003 at 2:51am<br /><br /><P>I do not quite agree with you. I have added&nbsp;some words to the keyword filter; regardless of their meaning, I want to filter those mails.</P><P>If Outlook (and other mail programs) is capable of filtering out worthless HTML tags when displaying the message, I still can't see why the program can't do the same...</P><P>Greetings</P><P>Allan</P>]]>
   </description>
   <pubDate>Mon, 15 Sep 2003 02:51:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1984&amp;title=keyword-filter#1984</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : Allan, Outlook is not "Filtering"...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1967&amp;title=keyword-filter#1967</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=22">Desperado</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 12 September 2003 at 6:30pm<br /><br /><DIV><FONT face=Arial size=2><P>Allan,</P><P>Outlook is not "filtering" the HTML.&nbsp; Spammers are obscuring the message by adding a wide variety of HTML tags, comments and whatever else they come up with.&nbsp; The goal is to locate the METHOD of obfuscation, not the actual message; if the message has been obscured, then it is most likely Spam.&nbsp; That is what the regular expressions are doing. Having SpamFilter render the code and then go searching for a word or phrase would, in my opinion, lead to horrendous amounts of false positives.&nbsp; Just because a message has the word penis in it does not make it Spam. The same is true for viagra.&nbsp; HOWEVER, if those words are being "hidden" using comments, tags or whatever, then it must be Spam.&nbsp; What other possible reason would there be to mask the actual content?</P><P>Dan S.</P></FONT></DIV>]]>
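    <![CDATA[<P>Here is a minimal Python sketch of that idea, assuming the check runs on a message's HTML body. Rather than looking for any particular word, it counts tags or comments wedged between two letters, as in pe&lt;blurb&gt;nis. It uses the standard library's html.parser. The class and function names, and any threshold an administrator might apply to the count, are hypothetical.</P><PRE>
```python
from html.parser import HTMLParser


class SplitWordCounter(HTMLParser):
    """Counts tags or comments wedged between two letters of visible text.

    The aim is to flag the method of obfuscation, not any particular word.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.count = 0
        self._after_letter = False  # the last text chunk ended in a letter
        self._pending = False       # markup has appeared since that letter

    def _markup(self) -> None:
        if self._after_letter:
            self._pending = True

    def handle_starttag(self, tag, attrs) -> None:
        self._markup()

    def handle_endtag(self, tag) -> None:
        self._markup()

    def handle_comment(self, data) -> None:
        self._markup()

    def handle_data(self, data: str) -> None:
        if not data:
            return
        if self._pending and data[0].isalpha():
            self.count += 1  # letter, markup, letter: a word split by markup
        self._pending = False
        self._after_letter = data[-1].isalpha()


def split_word_count(html: str) -> int:
    """Number of places where markup splits a run of letters."""
    parser = SplitWordCounter()
    parser.feed(html)
    parser.close()  # flush text still buffered at the end of input
    return parser.count
```
</PRE><P>Ordinary formatting, such as a bold first letter, would also register. In practice, the count would likely be compared against a threshold rather than treated as proof of spam.</P>]]>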
   </description>
   <pubDate>Fri, 12 Sep 2003 18:30:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1967&amp;title=keyword-filter#1967</guid>
  </item> 
  <item>
   <title><![CDATA[Keyword filter : A suggestion: If Outlook can...]]></title>
   <link>https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1961&amp;title=keyword-filter#1961</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="https://www.logsat.com/spamfilter/forums/member_profile.asp?PF=2">Guests</a><br /><strong>Subject:</strong> 1922<br /><strong>Posted:</strong> 12 September 2003 at 2:33am<br /><br /><P>A <FONT face=Helvetica>suggestion:</FONT></P><P><FONT face=Helvetica>If Outlook can "filter" the unwanted HTML syntax and display the text, why can't SpamFilter do the same when it's checking for keywords?</FONT></P>]]>
   </description>
   <pubDate>Fri, 12 Sep 2003 02:33:00 +0000</pubDate>
   <guid isPermaLink="true">https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=1922&amp;PID=1961&amp;title=keyword-filter#1961</guid>
  </item> 
 </channel>
</rss>