Spam Filter ISP Support Forum


Bayesian Question...

Erik Reed
Guest Group
Posted: 01 March 2004 at 5:19pm

Is there any way to adjust the score of words that are in the Bayesian database?

Only emails that are already being caught by my keyword filters are getting parsed and marked bad.  However, those annoying v!ag^ra (ever-changing spelling) emails still get through, and the new "scheme" thinks all these words are good, defeating its own purpose.

I just clicked the "dump" button in the Bayesian dialog box and would love to be able to put a VERY HIGH score on the obvious spam words...

Thanks!

 

Desperado
Senior Member
Joined: 27 January 2005
Location: United States
Posted: 01 March 2004 at 9:08pm

Erik,
 
You know, I am getting really irritated and frustrated with the "evolution" of spelling, to the point that when I find yet another mutation, I end up getting very strange looks from my office manager as I spew out a long string of really horrible expletives.  I have, however, had very good hit percentages with some "creative" spelling of my own in regular expressions.  The problem is that just when I think I've just about got it covered, yet another odd string pops up.  What I wish is that I could come up with an "illiteracy" filter ... or a "sheer lunacy" filter but, thus far, no dice.
 
Having said all that, believe it or not, I gave my kids the project of coming up with as many forms of spelling the 5 or 6 major drugs as they can.  Once I have this list, I am going to attempt to construct a RegEx that catches them all.  If I succeed, I will update the group.
 
Regards,
 
Dan S
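
As a rough illustration of the kind of RegEx Dan describes, the Python sketch below builds a pattern that tolerates look-alike characters and filler symbols between letters. The substitution table, the obfuscated_pattern helper, and the choice of Python are assumptions made for illustration; this is neither SpamFilter's own syntax nor Dan's actual expression.

```python
import re

# Hypothetical look-alike table: characters that may stand in for each letter.
LOOKALIKES = {
    "a": "a@4",
    "e": "e3",
    "i": "i1!|l",
    "o": "o0",
    "s": "s$5",
}

# Any run of punctuation, whitespace, or underscores allowed between letters.
FILLER = r"[\W_]*"


def obfuscated_pattern(word: str) -> "re.Pattern[str]":
    """Build a case-insensitive regex matching creative spellings of `word`."""
    parts = []
    for ch in word.lower():
        candidates = LOOKALIKES.get(ch, ch)
        # re.escape keeps characters such as "|" and "$" literal inside the class.
        parts.append("[" + re.escape(candidates) + "]")
    return re.compile(FILLER.join(parts), re.IGNORECASE)


pattern = obfuscated_pattern("viagra")
for sample in ("cheap v!ag^ra here", "V1AGRA", "V.I.A.G.R.A"):
    print(sample, "->", bool(pattern.search(sample)))
```

The pattern has no word boundaries and accepts any filler between letters, so it can also match innocent text; a list like this would need testing against known-good mail before being used as a filter rule.
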
LogSat
Admin Group
Joined: 25 January 2005
Location: United States
Posted: 01 March 2004 at 11:49pm

Erik,

We've given much thought to how to give emails that are not caught by the various filters a bad Bayesian score. The problem is that after an email has been marked as clean, it's forwarded to your SMTP server, which then forwards it to the end user. At that point, there is no (simple) way we could let the end user submit it to SpamFilter to let it know it's bad.

The main options were (1) a web interface to allow users to post email contents to SpamFilter and (2) an Outlook plugin.

We had to discard (1) because many corporate end users are using MS Outlook, which completely alters the content of the email's source. The Bayesian filter MUST work on the original email content to be effective, HTML tags and rubbish included. Adding modified text to the statistical engine was rendering it inaccurate. We also (for now) discarded (2) because of the complexity, both on our end to develop client software and on the admins' end, so they don't have to deploy additional software to their clients.

If anyone has better ideas, they're welcome!

Regarding the option to modify the corpus, we are going to release a tool that allows changing the token scores soon (we actually need one ourselves as well...)

Roberto F.
LogSat Software
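
To make the effect of editing token scores concrete, here is a minimal sketch, assuming a simplified per-token spam probability (spam frequency divided by spam plus ham frequency) combined in naive-Bayes fashion. The thread does not describe SpamFilter's corpus format or scoring formula, so TokenStats, token_score, message_score, and the override field are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class TokenStats:
    spam_count: int = 0
    ham_count: int = 0
    override: Optional[float] = None  # hypothetical manually pinned score


def clamp(p: float) -> float:
    """Keep probabilities away from 0 and 1 so no single token decides alone."""
    return min(0.99, max(0.01, p))


def token_score(stats: TokenStats, total_spam: int, total_ham: int) -> float:
    """Spam probability of one token; a manual override wins over the counts."""
    if stats.override is not None:
        return clamp(stats.override)
    spam_freq = stats.spam_count / max(total_spam, 1)
    ham_freq = stats.ham_count / max(total_ham, 1)
    if spam_freq + ham_freq == 0:
        return 0.5  # never-seen token: neutral
    return clamp(spam_freq / (spam_freq + ham_freq))


def message_score(tokens: Iterable[str], corpus: Dict[str, TokenStats],
                  total_spam: int, total_ham: int) -> float:
    """Combine the per-token scores of the distinct tokens in a message."""
    spam_product = 1.0
    ham_product = 1.0
    for token in set(tokens):
        p = token_score(corpus.get(token, TokenStats()), total_spam, total_ham)
        spam_product *= p
        ham_product *= 1.0 - p
    return spam_product / (spam_product + ham_product)


# A spam word that was only ever seen in mail marked clean scores as "good"...
corpus = {"meds": TokenStats(spam_count=0, ham_count=5)}
before = message_score(["meds"], corpus, total_spam=10, total_ham=10)
# ...until an administrator pins a very high score on it.
corpus["meds"].override = 0.99
after = message_score(["meds"], corpus, total_spam=10, total_ham=10)
print(before < 0.5, after > 0.9)
```

This mirrors the situation Erik describes: when spam slips through and is counted as clean, its words drift toward "good", and a manual override is one way to pull them back.
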

Erik Reed
Guest Group
Posted: 02 March 2004 at 8:43am

Roberto,

>Regarding the option to modify the corpus, we are going to release a tool that allows to change the token scores soon (we actually need one ourselves as well...)

That is all we should need.  :)   It is almost the same as adding them to the keyword filter, but I assume the Bayesian filter will work much faster than a huge blacklist of words...

Thanks.

 

Dannyh
Guest Group
Posted: 08 March 2004 at 12:52pm

Hello

Would it be possible to copy 24 hours (or a user-settable amount of time) of all incoming e-mail to a (user-settable) location?

The end user could then forward the spam e-mail to stopspam@spamfilterserver.whatever

(I would make it a user-definable address)

The server receiving this e-mail would know to compare the forwarded header text, for example

From: blah@blah.com
Sent: Monday, March 08, 2004 10:03 AM
To: Helpdesk@mydomain.com
Cc:

Subject: important message 4 U

to the copied cache of e-mails (which could be indexed by subject) and

add the matching original e-mail as 100% spam

There are a few issues with this:

Disk space (disk space is cheap)

Speed (the processing could be done at a slow time)

but in the end it should not forward the same spam message signature again

So only new types of spam may get through, until your users forward it.

 OK, you can tear it apart now.

 Danny
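
Danny's proposal can be sketched roughly as follows, assuming the forwarded report still carries the original From: and Subject: lines and that the cache is keyed on that pair. IncomingCache, parse_forwarded_headers, and the in-memory dictionary are hypothetical names and choices, not part of SpamFilter; a real cache would live on disk and need a more robust match than sender plus subject.

```python
import re
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedMessage:
    raw_source: str      # untouched original source, as received
    received_at: float   # seconds since the epoch


class IncomingCache:
    """Keeps copies of incoming mail for a configurable retention window."""

    def __init__(self, retention_seconds: float = 24 * 3600):
        self.retention_seconds = retention_seconds
        self._messages: Dict[Tuple[str, str], CachedMessage] = {}

    @staticmethod
    def _key(sender: str, subject: str) -> Tuple[str, str]:
        return (sender.strip().lower(), subject.strip().lower())

    def store(self, sender: str, subject: str, raw_source: str,
              now: Optional[float] = None) -> None:
        received_at = time.time() if now is None else now
        self._messages[self._key(sender, subject)] = CachedMessage(raw_source, received_at)

    def find_original(self, sender: str, subject: str,
                      now: Optional[float] = None) -> Optional[str]:
        """Drop expired copies, then return the original source if still cached."""
        current = time.time() if now is None else now
        cutoff = current - self.retention_seconds
        self._messages = {key: msg for key, msg in self._messages.items()
                          if msg.received_at >= cutoff}
        found = self._messages.get(self._key(sender, subject))
        return found.raw_source if found is not None else None


def parse_forwarded_headers(body: str) -> Dict[str, str]:
    """Pull the first From: and Subject: lines out of a forwarded report."""
    headers: Dict[str, str] = {}
    for line in body.splitlines():
        match = re.match(r"^(From|Subject):\s*(.*)$", line.strip(), re.IGNORECASE)
        if match and match.group(1).lower() not in headers:
            headers[match.group(1).lower()] = match.group(2).strip()
    return headers


cache = IncomingCache()
cache.store("blah@blah.com", "important message 4 U", "<original source>")
report = "From: blah@blah.com\nTo: Helpdesk@mydomain.com\nSubject: important message 4 U\n"
headers = parse_forwarded_headers(report)
original = cache.find_original(headers["from"], headers["subject"])
print(original is not None)
```

The point of looking up the cached copy is that the Bayesian filter would be trained on the original source rather than on the text as altered by the user's mail client.
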

AJ
Guest Group
Posted: 09 March 2004 at 7:06am

How about having another "quarantine"-like db where all good email gets copied (one copy to the SMTP server and one to the good email db)? When users get a spam email in their Outlook, they could then use a web interface to look for it in the good email db and submit it to SpamFilter's Bayesian filter.  It would be the opposite of the spam quarantine: instead of forwarding false positives to themselves, they'd be sending spam emails to the Bayesian filter.

Hard disk space is not a problem any more :-)
