Spam Filter ISP Support Forum

Editing Bayesian Corpus Database?

JeffHildebrand (Newbie)
Posted: 29 December 2004 at 4:40pm

Is there any way to update or delete tokens from the Bayesian database?  We had one email that bounced between two servers about 300 times because of a forwarding loop.  Unfortunately, it appears to have flagged a lot of legitimate tokens as spam, and the filter started blocking well over half of our legitimate email.  The only safe way I could see to resolve the blocking was to install a fresh copy of the corpus database and start from scratch.

Below are some of the tokens that were generated. As you can see from the very high spam scores, the filter started reporting 100% spam for many legitimate emails.

*Token,Good,Spam,ProbSpam,ModDate
*6*1,0,306,0.99989998341,12/29/04
*Eudora,0,306,0.99989998341,12/29/04
*FULL,0,306,0.99989998341,12/28/04
*Follows,0,306,0.99989998341,12/27/04
*MAILBOX,0,306,0.99989998341,12/27/04
*Mime,0,306,0.99989998341,12/29/04
*Precedence,0,306,0.99989998341,12/27/04
*QUALCOMM,0,306,0.99989998341,12/29/04
*RCPT,0,306,0.99989998341,12/27/04
*Received*AOL,0,306,0.99989998341,12/27/04
*Received*omr,0,306,0.99989998341,12/27/04
*Received*rly,0,306,0.99989998341,12/27/04
*Received*v98*19,0,306,0.99989998341,12/27/04
*SCOLL,0,306,0.99989998341,12/27/04
*SCORE,0,306,0.99989998341,12/29/04
*Subject*unavailable,0,306,0.99989998341,12/27/04
*URL_COUNT,0,306,0.99989998341,12/27/04
*labeled,0,306,0.99989998341,12/29/04
*unavailable,0,306,0.99989998341,12/27/04
*undeliverable,0,306,0.99989998341,12/29/04
*v3*5,0,306,0.99989998341,12/29/04
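
A dump like the one above can be screened for this kind of damage before deciding to wipe the corpus. The following is only a minimal sketch, assuming the dump is available as plain text in the Token,Good,Spam,ProbSpam,ModDate layout shown here. The function names `parse_dump` and `looped_message_clusters` and the `min_tokens` cutoff are hypothetical and not part of the product. It groups tokens that have never been seen in good mail by their spam count; a large group sharing one count (306 above) suggests they all came from a single repeated message.

```python
import csv
from collections import defaultdict


def parse_dump(text):
    """Parse dump lines in the Token,Good,Spam,ProbSpam,ModDate layout."""
    rows = []
    for rec in csv.reader(text.strip().splitlines()):
        rec = [field.strip() for field in rec]
        # Skip the header line and anything that does not have five fields.
        if len(rec) != 5 or rec[0].lstrip("*") == "Token":
            continue
        token, good, spam, prob, mod_date = rec
        rows.append({"token": token, "good": int(good), "spam": int(spam),
                     "prob": float(prob), "date": mod_date})
    return rows


def looped_message_clusters(rows, min_tokens=10):
    """Group never-good tokens by spam count and keep only the large groups.

    min_tokens is an arbitrary illustrative cutoff, not a product setting.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["good"] == 0:
            groups[row["spam"]].append(row["token"])
    return {count: toks for count, toks in groups.items()
            if len(toks) >= min_tokens}
```

Run against the dump above, every listed token would fall into the single group for a spam count of 306, which matches the roughly 300 bounces of the looped message.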

Thanks,

-Jeff

LogSat (Admin Group)
Posted: 29 December 2004 at 10:52pm
Sorry Jeff, that is currently not possible.

Roberto F. LogSat Software
JeffHildebrand (Newbie)
Posted: 10 January 2005 at 12:08pm

Could you make this a feature request, then?  It could be as simple as an "import keywords" option that imports from a .txt file in the same format as the corpus dump.  It could then overwrite existing entries or add keywords of your choice.  At this point the Bayesian filter has become unusable for us: after resetting it less than two weeks ago, it is blocking a very high number of legitimate emails even at a 99.9950% setting, mainly due to keywords like these:

*Token  , Good ,  Spam ,  ProbSpam ,  ModDate
*Pagnet , 0 , 13 , 0.999899983 , 01/08/05
*org , 0 , 13 , 0.999899983 , 01/10/05
*pagnet , 0 , 13 , 0.999899983 , 01/10/05
*From*org , 0 , 17 , 0.999899983 , 01/10/05
*From*pagnet , 0 , 16 , 0.999899983 , 01/10/05
*http , 0 , 15 , 0.999899983 , 01/10/05
*href , 0 , 14 , 0.999899983 , 01/10/05
*attached , 0 , 13 , 0.999899983 , 01/10/05
*file , 0 , 13 , 0.999899983 , 01/10/05
*Back , 0 , 15 , 0.999899983 , 01/10/05
*Green , 0 , 15 , 0.999899983 , 01/10/05
*before , 0 , 15 , 0.999899983 , 01/10/05
*dollars , 0 , 15 , 0.999899983 , 01/10/05
*original , 0 , 15 , 0.999899983 , 01/10/05
*second , 0 , 15 , 0.999899983 , 01/10/05
*since , 0 , 15 , 0.999899983 , 01/10/05
*difference , 0 , 14 , 0.999899983 , 01/10/05
*highly , 0 , 14 , 0.999899983 , 01/10/05
*GIF , 0 , 13 , 0.999899983 , 01/07/05
*The , 0 , 13 , 0.999899983 , 01/10/05
*big , 0 , 13 , 0.999899983 , 01/10/05
*ebay , 0 , 13 , 0.999899983 , 01/08/05
*tag , 0 , 13 , 0.999899983 , 01/08/05
*details , 0 , 12 , 0.999899983 , 01/10/05
*mail , 0 , 12 , 0.999899983 , 01/10/05
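
To illustrate why tokens like these can push a message past even a 99.9950% threshold, here is a rough sketch that assumes the filter combines per-token probabilities with the usual naive Bayes product rule. The thread does not say which formula the product actually uses, so the function `combined_spam_probability` is a hypothetical stand-in.

```python
import math


def combined_spam_probability(token_probs):
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(token_probs)
    ham = math.prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


THRESHOLD = 0.99995    # the 99.9950% setting mentioned in the post
P_TOKEN = 0.999899983  # ProbSpam value taken from the dump above

one_token = combined_spam_probability([P_TOKEN])
two_tokens = combined_spam_probability([P_TOKEN, P_TOKEN])

print(one_token > THRESHOLD)   # False: a single token stays just below
print(two_tokens > THRESHOLD)  # True: two such tokens already exceed it
```

Under that assumption, any message containing two common words from the list (for example "The" and "mail") would be scored above the threshold unless other tokens with low spam probability pulled it back down.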

It is probably just a learning curve for the Bayesian filter, but some way to speed up and fine-tune that learning process before legitimate emails are blocked would be a tremendous help.
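
The import behaviour requested in this post (overwrite existing entries, add new ones) could look roughly like the sketch below. It assumes the corpus can be represented as a mapping from token to its (Good, Spam, ProbSpam, ModDate) values and that tokens never contain commas. The names `parse_line` and `import_keywords` are hypothetical; the product has no such feature according to the reply above.

```python
def parse_line(line):
    """Split one 'Token , Good , Spam , ProbSpam , ModDate' line.

    Returns None for the header line and for malformed or blank lines.
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5 or fields[0].lstrip("*") == "Token":
        return None
    token, good, spam, prob, mod_date = fields
    return token, (int(good), int(spam), float(prob), mod_date)


def import_keywords(corpus, import_text):
    """Return a copy of corpus with imported entries overwriting or adding tokens."""
    merged = dict(corpus)  # leave the original corpus untouched
    for line in import_text.splitlines():
        parsed = parse_line(line)
        if parsed:
            token, entry = parsed
            merged[token] = entry
    return merged
```

Returning a new mapping rather than editing in place keeps the original corpus available as a backup in case an imported file turns out to be wrong.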

Regards,
Jeff
