Bayesian Question...
Erik Reed
Guest Group
Posted: 01 March 2004 at 5:19pm
Is there any way to adjust the scores of words that are in the Bayesian database? Only emails that are already being caught by my keyword filters are getting parsed and marked bad. However, those annoying v!ag^ra (ever-changing spelling) emails still get through, and the new "scheme" thinks all these words are good, defeating its own purpose. I just clicked the "dump" button in the Bayesian dialog box and would love to be able to put a VERY HIGH score on the obvious spam words... Thanks!
Desperado
Senior Member Joined: 27 January 2005 Location: United States Status: Offline Points: 1143
Erik,
You know, I am getting really irritated and frustrated with the "evolution" of spelling, to the point that when I find yet another mutation, I end up getting very strange looks from my office manager as I spew out a long string of really horrible expletives. I have, however, had very good hit percentages with some "creative" spelling of my own in regular expressions. The problem is that just when I think I've just about got it covered, yet another odd string pops up. What I wish is that I could come up with an "illiteracy" filter ... or a "sheer lunacy" filter but, thus far, no dice.
Having said all that, believe it or not, I gave my kids the project of coming up with as many forms of spelling the 5 or 6 major drugs as they can. Once I have this list, I am going to attempt to construct a RegEx that catches them all. If I succeed, I will update the group.
Regards,
Dan S
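One way the "catch them all" regular expression described above might be generated rather than hand-written is sketched below. This is only an illustration, not anything SpamFilter ships: the `LOOKALIKES` table, the `FILLER` pattern and the `obfuscation_pattern` helper are all hypothetical names, and the substitution table is a guess at common swaps rather than a complete list.

```python
import re

# Hypothetical table of look-alike characters spammers swap in for a letter.
LOOKALIKES = {
    "a": "a@4^",
    "i": "i1!|l",
    "o": "o0",
    "e": "e3",
    "s": "s$5",
}

# Punctuation or underscores that may be wedged between letters,
# as in "v.i.a.g.r.a" or "v!ag^ra".
FILLER = r"[\W_]*"


def obfuscation_pattern(word):
    """Build a case-insensitive regex matching common respellings of word."""
    parts = []
    for ch in word.lower():
        chars = LOOKALIKES.get(ch, ch)
        # One character class per letter, holding the letter and its look-alikes.
        parts.append("[" + re.escape(chars) + "]")
    # Allow optional filler between every pair of letters.
    return re.compile(FILLER.join(parts), re.IGNORECASE)


pattern = obfuscation_pattern("viagra")
for sample in ["v!ag^ra", "V1AGRA", "v.i.a.g.r.a", "meeting agenda"]:
    print(sample, bool(pattern.search(sample)))
```

A pattern built this way has no word boundaries, so it can also fire inside longer strings, and it does not cover dropped or doubled letters; a new mutation still means extending the table.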
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104
Erik,
We've given much thought to how to give emails that are not caught by the various filters a bad Bayesian score. The problem is that after an email has been marked as clean, it's forwarded to your SMTP server, which then forwards it to the end user. At that point, there is no (simple) way we could let the end user submit it to SpamFilter to let it know it's bad.
The main options were (1) a web interface to allow users to post email contents to SpamFilter and (2) an Outlook plugin. We had to discard (1) because many corporate end users are using MS Outlook, which completely alters the content of the email's source. The Bayesian filter MUST work on the original email content to be effective, HTML tags and rubbish included. Adding modified text to the statistical engine was rendering it inaccurate. We also (for now) discarded (2) because of the complexity, both on our end to develop client software, and on the admin's end, so they don't have to deploy additional software to their clients. If anyone has better ideas, they're welcome!
Regarding the option to modify the corpus, we are going to release a tool that lets you change the token scores soon (we actually need one ourselves as well...)
Roberto F.
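To show why editing a single token score can matter so much, here is a minimal sketch of a Bayesian-style scorer. It assumes the widely used naive Bayes combining rule popularized by Paul Graham; the thread does not say how SpamFilter's engine actually combines token scores, and the `corpus` dictionary, its values, and the function names are hypothetical.

```python
from math import prod


def combined_spam_probability(token_scores):
    """Combine per-token spam probabilities with the naive Bayes rule."""
    # Clamp scores so one token can never force a division by zero.
    clamped = [min(max(p, 0.01), 0.99) for p in token_scores]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


def score_message(tokens, corpus, unknown=0.4):
    """Score a tokenized message; tokens not in the corpus get a neutral-ish default."""
    return combined_spam_probability([corpus.get(t, unknown) for t in tokens])


# Hypothetical corpus: token -> probability that a mail containing it is spam.
corpus = {"meeting": 0.10, "pills": 0.60, "v!ag^ra": 0.40}
message = ["pills", "v!ag^ra"]

before = score_message(message, corpus)

# Manually overriding one token score, as the planned tool would allow.
corpus["v!ag^ra"] = 0.99
after = score_message(message, corpus)

print(f"before override: {before:.3f}, after override: {after:.3f}")
```

In this sketch the overridden token dominates the result, which is the effect Erik is asking for; the catch is that each new spelling is a different token and needs its own entry.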
Erik Reed
Guest Group
Roberto,
>Regarding the option to modify the corpus, we are going to release a tool that lets you
>change the token scores soon (we actually need one ourselves as well...)
That is all we should need. :) It is almost the same as adding them to the keyword filter, but I assume the Bayesian filter will work much faster than a huge blacklist of words... Thanks.
Dannyh
Guest Group
Hello,
Would it be possible to copy 24 hours (or a user-settable amount of time) of all incoming e-mail to a (user-settable) location? The end user can then forward the spam e-mail to stopspam@spamfilterserver.whatever (I would make it a user-definable address).
The server receiving this e-mail knows to compare the body of text
From: blah@blah.com
Subject: important message 4 U
to the copied cache of e-mails (the subject could be indexed) and add the original e-mail as 100% spam.
There are a few issues with this:
Disk space (disk space is cheap)
Speed (the processing could be done at a slow time)
But in the end it should not forward the same spam message signature again, so only new types of spam may get through, until your users forward them.
OK, you can tear it apart now.
Danny
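The cache-and-match idea above could be sketched roughly as follows. Everything here is an assumption made for illustration: `MailCache`, `handle_spam_report` and the `train_as_spam` callback are hypothetical names, matching is done on the From and Subject headers only, and the real SpamFilter product exposes no such API as far as this thread shows.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MailCache:
    """Keeps the original source of delivered mail for a limited time."""
    max_age_seconds: float = 24 * 3600  # the user-settable retention window
    _store: dict = field(default_factory=dict)

    @staticmethod
    def _key(sender, subject):
        # Normalize so minor case/whitespace changes from forwarding still match.
        return (sender.strip().lower(), subject.strip().lower())

    def remember(self, sender, subject, raw_source, now=None):
        now = time.time() if now is None else now
        self._store[self._key(sender, subject)] = (now, raw_source)

    def purge(self, now=None):
        """Drop entries older than the retention window."""
        now = time.time() if now is None else now
        cutoff = now - self.max_age_seconds
        self._store = {k: v for k, v in self._store.items() if v[0] >= cutoff}

    def lookup(self, sender, subject, now=None):
        self.purge(now)
        hit = self._store.get(self._key(sender, subject))
        return hit[1] if hit else None


def handle_spam_report(cache, sender, subject, train_as_spam, now=None):
    """Find the untouched original of a forwarded report and train on it."""
    original = cache.lookup(sender, subject, now)
    if original is None:
        return False  # expired or never seen; nothing to train on
    train_as_spam(original)  # train on the cached source, not the forwarded copy
    return True
```

Training on the cached copy rather than on the forwarded message sidesteps the problem Roberto raised about mail clients altering the source. In this sketch two messages with the same sender and subject overwrite each other, so a real version would need a better key.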
AJ
Guest Group
How about having another "quarantine"-like db where all good email gets copied (one copy to the SMTP server and one to the good-email db)? Users could then go in using a web interface, so that when they get a spam email in their Outlook they can look for it in the good-email db and submit it to SpamFilter's Bayesian filter. It would be the opposite of the spam quarantine: instead of forwarding false positives to themselves, they'd be sending spam emails to the Bayesian filter. Hard disk space is not a problem any more :-)