
Bayesian Question...

Printed From: LogSat Software
Category: Spam Filter ISP
Forum Name: Spam Filter ISP Support
Forum Description: General support for Spam Filter ISP
URL: https://www.logsat.com/spamfilter/forums/forum_posts.asp?TID=3034
Printed Date: 22 January 2025 at 7:55am


Topic: Bayesian Question...
Posted By: Guests
Subject: Bayesian Question...
Date Posted: 01 March 2004 at 5:19pm

Is there any way to adjust the score of words that are in the Bayesian database?

Only emails that are already being caught by my keyword filters are getting parsed and marked bad.  However, those annoying v!ag^ra (ever-changing spelling) emails still get through, and the new "scheme" thinks all these words are good, defeating its own purpose.

I just clicked the "dump" button in the Bayesian dialog box and would love to be able to put a VERY HIGH score on the obvious spam words...

Thanks!

 




Replies:
Posted By: Desperado
Date Posted: 01 March 2004 at 9:08pm
Erik,
 
You know, I am getting really irritated and frustrated with the "evolution" of spelling, to the point that when I find yet another mutation, I end up getting very strange looks from my office manager as I spew out a long string of really horrible expletives.  I have, however, had very good hit percentages with some "creative" spelling of my own in regular expressions.  The problem is that just when I think I've just about got it covered, yet another odd string pops up.  What I wish is that I could come up with an "illiteracy" filter ... or a "sheer lunacy" filter but, thus far, no dice.
 
Having said all that, believe it or not, I gave my kids the project of coming up with as many ways of spelling the 5 or 6 major drugs as they can.  Once I have this list, I am going to attempt to construct a RegEx that catches them all.  If I succeed, I will update the group.
 
Regards,
 
Dan S
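
One way to approach the "catch them all" RegEx described above is to generate the pattern from the plain word instead of listing every mutation by hand. The following is a minimal sketch in Python, not a SpamFilter ISP feature: the `LOOKALIKES` table and the `obfuscation_pattern` helper are hypothetical names, and the substitution lists are illustrative guesses that would need tuning against real mail.

```python
import re

# Hypothetical substitution table: characters spammers commonly swap in
# for each letter. Illustrative only, not an exhaustive list.
LOOKALIKES = {
    "a": "a@4",
    "e": "e3",
    "i": "i1!|",
    "l": "l1|",
    "o": "o0",
    "s": "s$5",
}

def obfuscation_pattern(word):
    """Build a regex that matches `word` despite look-alike characters
    and up to two filler symbols inserted between letters."""
    classes = []
    for ch in word.lower():
        chars = LOOKALIKES.get(ch, ch)
        classes.append("[" + re.escape(chars) + "]")
    filler = r"[^a-z0-9]{0,2}"  # punctuation or spaces wedged between letters
    body = filler.join(classes)
    # The lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<![a-z0-9])" + body + r"(?![a-z0-9])", re.IGNORECASE)

viagra = obfuscation_pattern("viagra")
for sample in ["cheap v!ag^ra now", "V-I-A-G-R-A", "Niagara Falls"]:
    print(sample, "->", bool(viagra.search(sample)))
```

A generated pattern like this still misses mutations that drop or double letters, so it narrows the problem rather than solving it.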


Posted By: LogSat
Date Posted: 01 March 2004 at 11:49pm

Erik,

We've given much thought to how to give emails that are not caught by the various filters a bad Bayesian score. The problem is that after an email has been marked as clean, it's forwarded to your SMTP server, which then forwards it to the end user. At that point, there is no (simple) way we could let the end user submit it to SpamFilter to let it know it's bad.

The main options were (1) a web interface to allow users to post email contents to SpamFilter and (2) an Outlook plugin.

We had to discard (1) because many corporate end users are using MS Outlook, which completely alters the content of the email's source. The Bayesian filter MUST work on the original email content to be effective, HTML tags and rubbish included. Adding modified text to the statistical engine was rendering it inaccurate. We also (for now) discarded (2) because of the complexity, both on our end to develop client software and on the admins' end, since they would have to deploy additional software to their clients.

If anyone has better ideas, they're welcome!

Regarding the option to modify the corpus, we are going to release a tool that allows changing the token scores soon (we actually need one ourselves as well...)

Roberto F.
LogSat Software
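
To illustrate why changing a single token score matters, here is a minimal Python sketch of a generic naive Bayesian combination with manual overrides. It is not SpamFilter ISP's actual scoring code or corpus format: the function names, the neutral default of 0.5 for unknown tokens, and the sample probabilities are all assumptions made for the example.

```python
def combine(probabilities):
    """Combine per-token spam probabilities into one message score
    using the usual naive Bayes product rule."""
    spam_product = 1.0
    ham_product = 1.0
    for p in probabilities:
        spam_product *= p
        ham_product *= 1.0 - p
    return spam_product / (spam_product + ham_product)

def score_message(tokens, learned, overrides=None):
    """Score a message; a manual override wins over the learned value.
    Unknown tokens get a neutral 0.5 (an assumption of this sketch)."""
    overrides = overrides or {}
    probabilities = []
    for token in tokens:
        p = overrides.get(token, learned.get(token, 0.5))
        # Clamp so a single token can never force the score to exactly 0 or 1.
        probabilities.append(min(0.99, max(0.01, p)))
    return combine(probabilities)

# Hypothetical corpus in which a spam word was wrongly learned as "good".
learned = {"viagra": 0.2, "meeting": 0.1, "click": 0.6}
tokens = ["viagra", "meeting", "click"]

before = score_message(tokens, learned)
after = score_message(tokens, learned, overrides={"viagra": 0.99})
print(f"before override: {before:.2f}, after override: {after:.2f}")
```

In this toy corpus, raising one badly learned token is enough to move the message from the clean side of 0.5 to the spam side, which is the effect the original question asks for.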



Posted By: Guests
Date Posted: 02 March 2004 at 8:43am

Roberto,

>Regarding the option to modify the corpus, we are going to release a tool that allows to change the token scores soon (we actually need one ourselves as well...)

That is all we should need.  :)   It is almost the same as adding them to the keyword filter, but I assume the Bayesian filter will work much faster than a huge blacklist of words...

Thanks.

 



Posted By: Guests
Date Posted: 08 March 2004 at 12:52pm

Hello

Would it be possible to copy 24 hours (or a user-settable amount of time) of all incoming e-mail to a (user-settable) location?

The end user can then forward the spam e-mail to stopspam@spamfilterserver.whatever

(I would make it a user-definable address)

The server receiving this e-mail knows to compare the body of text

From: blah@blah.com - Helpdesk@mydomain.com
Cc:

Subject: important message 4 U

to the copied cache of e-mails (the cache could be indexed by subject) and

add the original e-mail as 100% spam

There are a few issues with this:

Disk space (disk space is cheap)

Speed (the processing could be done at a slow time)

but in the end it should not forward the same spam message signature again

So only new types of spam may get through, until your users forward it.

OK, you can tear it apart now

 Danny
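
The cache-and-match idea above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not an existing SpamFilter ISP mechanism: `RecentMailCache` and its methods are hypothetical, matching is done on the subject line only, and the step that trains the Bayesian filter on the recovered original is left to the caller.

```python
import time

class RecentMailCache:
    """Keeps raw copies of delivered mail for a limited time, indexed by subject."""

    def __init__(self, retention_seconds=24 * 3600):
        self.retention_seconds = retention_seconds
        self._by_subject = {}  # normalized subject -> list of (timestamp, raw message)

    @staticmethod
    def normalize(subject):
        """Lower-case the subject and strip reply/forward prefixes."""
        text = subject.strip().lower()
        while text.startswith(("fw:", "fwd:", "re:")):
            text = text.split(":", 1)[1].strip()
        return text

    def store(self, subject, raw_message, now=None):
        """Record an unmodified copy of a message as it is delivered."""
        now = time.time() if now is None else now
        key = self.normalize(subject)
        self._by_subject.setdefault(key, []).append((now, raw_message))

    def purge(self, now=None):
        """Drop copies older than the retention window (run at a quiet time)."""
        now = time.time() if now is None else now
        for key in list(self._by_subject):
            fresh = [(ts, raw) for ts, raw in self._by_subject[key]
                     if now - ts <= self.retention_seconds]
            if fresh:
                self._by_subject[key] = fresh
            else:
                del self._by_subject[key]

    def find_originals(self, reported_subject, now=None):
        """Return unexpired original copies matching a user's forwarded report."""
        now = time.time() if now is None else now
        key = self.normalize(reported_subject)
        return [raw for ts, raw in self._by_subject.get(key, [])
                if now - ts <= self.retention_seconds]

cache = RecentMailCache()
cache.store("important message 4 U", "<original raw source>", now=0)
print(cache.find_originals("FW: important message 4 U", now=60))
```

Because the lookup returns the stored raw source rather than the forwarded copy, the filter would be trained on the unaltered message. Matching on subject alone can return several candidates, so a real implementation would probably also compare body text, as the post suggests.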



Posted By: Guests
Date Posted: 09 March 2004 at 7:06am

How about having another "quarantine"-like db where all good email gets copied (one copy to the SMTP server and one to the good-email db)? Users could then go in through a web interface, so that when they get a spam email in their Outlook they can look for it in the good-email db and submit it to SpamFilter's Bayesian filter.  It would be the opposite of the spam quarantine: instead of forwarding false positives to themselves, they'd be sending spam emails to the Bayesian filter.

Hard disk space is not a problem any more :-)



