Beta version of new SpamFilter v2.0 is available |
Author | |
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
Posted: 08 October 2003 at 11:37pm |
We have released the public beta of the new SpamFilter ISP v2.0. This release features statistical DNA fingerprinting of emails, which should allow greater accuracy in fighting spam. The beta can be obtained from SpamFilter's download page at http://www.logsat.com/sfi-download.asp. Please read the beta notes carefully. The first few hundred to few thousand emails received are critical to building an accurate statistical database. While building your first database, it is important to keep the number of false positives (good emails classified as spam) to a minimum. For this reason it may be a good idea to start running the new version during the day, when the volume of legitimate email is usually higher. Roberto F. |
|
kspare
Senior Member Joined: 26 January 2005 Location: Canada Status: Offline Points: 334 |
|
Hi Roberto, got the beta version running. I have a question though. Is it possible to have the corpus data populate in MySQL instead of on each computer? Companies that have a primary and a backup SMTP server will otherwise be collecting and building a separate corpus file on each computer instead of using shared info. Just curious. Kevin |
|
eric
Guest Group |
|
It's a beta... :-) Is it possible to tweak it to use \\servername\sharename\corpusfilename.name and have two servers share that? |
|
Ric Marques
Guest Group |
|
Roberto - I'm just thinking out loud here... but as the DNA technology develops, would there be a way to share the statistical corpus between different SpamFilterISP users? Build in a Morpheus or Kazaa type peer-to-peer network (that is optional) between users to share each other's SPAM fingerprints so that every user benefits from the SPAM that we all receive? Again - just thinking out loud... -Ric |
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
Performing real-time statistical analysis of emails is very processor-intensive. We fought hard to achieve acceptable performance, and to do so we had to stay away from storing the corpus in a database. In a way, it's not a bad thing to have different corpora for different servers. Unless they are load-balanced, the backup SMTP servers usually take a lower load, and only certain spammers will send email to them directly, bypassing the primary SMTP server. This causes the email arriving at the secondary to be statistically different from the email going to the primary. In this case it is better to keep separate statistical data, as this will improve accuracy. Roberto F. |
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
Not really... To optimize for speed, each copy of SpamFilter maintains an in-memory copy of the corpus database, which is saved to disk at intervals. The disk file is only read when SpamFilter starts, not thereafter. So each server would have the same corpus only at startup, and as time goes on the in-memory copies will diverge between the various servers. But the beauty is that, being all based on statistics, once the corpus grows to a MB or so, the differences are irrelevant... Roberto F. |
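The load-once, save-at-intervals behaviour described above can be sketched roughly as follows. This is only an illustration, not SpamFilter's actual code: the class name `InMemoryCorpus`, the JSON file format, and the `save_interval` parameter are all assumptions made for the example.

```python
import json
import os
import tempfile
import time


class InMemoryCorpus:
    """Hypothetical corpus: token counts live in memory, the file is read once."""

    def __init__(self, path, save_interval=300.0):
        self.path = path
        self.save_interval = save_interval  # seconds between snapshots (assumed value)
        self.counts = {"good": {}, "spam": {}}
        self._last_save = time.monotonic()
        # The disk file is consulted only here, at startup.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.counts = json.load(fh)

    def learn(self, tokens, label):
        """Update the in-memory counts; label is "good" or "spam"."""
        bucket = self.counts[label]
        for tok in tokens:
            bucket[tok] = bucket.get(tok, 0) + 1
        if time.monotonic() - self._last_save >= self.save_interval:
            self.save()

    def save(self):
        """Snapshot to disk: write a temp file, then swap it into place."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.counts, fh)
        os.replace(tmp, self.path)
        self._last_save = time.monotonic()
```

Two instances pointed at the same file would start from identical counts and then drift apart, since neither re-reads the file after startup and each snapshot simply overwrites the previous one.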
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
Ric, We had thought of that at the beginning, but then discovered that each company/provider receives different emails. We may have 3-4 users who subscribe to a dating service. Their emails will cause the statistical database to "sway" in a certain way to accommodate their needs. This will cause very different results if the same database is used by another company, which could instead be a local government... A very, very effective statistical corpus can be obtained from scratch just by having SpamFilter running for 24 hours. Once it's built, the corpus is tailored to that company and reflects pretty accurately the kind of email traffic that is expected of it. There should be no need to import other users' results. Roberto F. |
|
kspare
Senior Member Joined: 26 January 2005 Location: Canada Status: Offline Points: 334 |
|
The DNA filtering doesn't seem to be working for me. It shows that it is scanning a message (19ms, for example) and that it is adding it to the Bayes corpus file, but every message reads 0% spam, and this includes spam messages. I'm sitting at 863 forwarded messages and 1133 blocked. Any ideas? |
|
Trinidad
Guest Group |
|
From what I understand, this new version sounds as if everything will be blocked until the users start forwarding the legit emails in from their quarantine area. Am I reading this correctly? |
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
Kevin, The statistical filter kicks in once you have received 500 good + 500 spam emails. Its accuracy will be very low at the beginning, but will improve dramatically as more emails arrive. This is a beta, so tests are still ongoing, but we see that when the corpus reaches 2-4MB in size, with a few thousand emails in each group, it catches spam at its full, steady-state rate. Roberto F. |
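To illustrate why every message can read 0% until the thresholds are met, here is a minimal sketch of a warm-up gate in front of a simple naive Bayes score. It is not SpamFilter's algorithm, which the thread does not describe. The function name, the reading of "500 good + 500 spam" as 500 messages in each group, and the smoothing scheme are all assumptions.

```python
import math

MIN_GOOD = 500  # assumed: 500 messages needed in EACH group
MIN_SPAM = 500


def spam_probability(tokens, good_counts, spam_counts, n_good, n_spam):
    """Return a spam probability in [0, 1], or None while still warming up.

    good_counts / spam_counts map a token to the number of good / spam
    messages that contained it; n_good / n_spam are the message totals.
    """
    if n_good < MIN_GOOD or n_spam < MIN_SPAM:
        return None  # not enough training data yet, so no score is produced
    # Start from the prior odds, then add each token's smoothed log-likelihood ratio.
    log_odds = math.log(n_spam / n_good)
    for tok in set(tokens):
        p_spam = (spam_counts.get(tok, 0) + 1) / (n_spam + 2)
        p_good = (good_counts.get(tok, 0) + 1) / (n_good + 2)
        log_odds += math.log(p_spam / p_good)
    # Clamp so math.exp cannot overflow on very long messages.
    log_odds = max(-700.0, min(700.0, log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

With only a few hundred messages in one group the function returns `None`, which a user interface might reasonably display as 0%.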
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
Not exactly. At first, the statistical engine will determine what is good email and what is bad email by looking at the verdicts of the other filters SpamFilter uses. As more and more emails are received, SpamFilter will adapt to the kind of email traffic and recognize more and more spam as it comes in. But during the initial training period, more or less the first 24 hours, it is important that the number of false positives be kept to a minimum so the learning process is accurate. When an email is taken out of the quarantine, SpamFilter will know and will learn that similar emails are probably going to be legitimate. Roberto F. |
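The two training signals described above (verdicts from the other filters, and releases from quarantine) could be modelled along these lines. The `Trainer` class and its method names are hypothetical, and moving a released message from the spam counts to the good counts is an assumption; the post only says that SpamFilter learns that similar emails are probably legitimate.

```python
from collections import Counter


class Trainer:
    """Hypothetical trainer fed by other filters and by quarantine releases."""

    def __init__(self):
        self.counts = {"good": Counter(), "spam": Counter()}
        self.totals = {"good": 0, "spam": 0}

    def _apply(self, tokens, label, delta):
        # Count each distinct token once per message.
        for tok in set(tokens):
            self.counts[label][tok] += delta
        self.totals[label] += delta

    def observe(self, tokens, blocked_by_other_filter):
        """Label a message using the verdict of the existing (non-statistical) filters."""
        label = "spam" if blocked_by_other_filter else "good"
        self._apply(tokens, label, +1)
        return label

    def release_from_quarantine(self, tokens):
        """A released message was a false positive: relabel it as good (assumed)."""
        self._apply(tokens, "spam", -1)
        self._apply(tokens, "good", +1)
```

This also shows why early false positives matter: every message that another filter blocks by mistake is counted as spam until someone releases it.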
|
Trinidad
Guest Group |
|
I have another question. Will the corpus file override any other settings that I'm using? For example, I have plenty of regex and keyword settings. If one of them catches a legit email and the user forwards it in, will it then not get caught by my settings the second time around, because the statistical engine has learned that it was a legitimate email?
|
|
Ric Marques
Guest Group |
|
Roberto - Will you be developing a way to report false negatives? My users are CONSTANTLY wanting to send me the SPAM that currently gets through the filter. -Ric |
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
We've given this plenty of thought in the past. It would be very easy if the users could simply forward their spam to a special email address that SpamFilter knows about. Unfortunately the problem with that is the Outlook client. It reformats the emails so much that at times they are completely different from the original message. The header information is also stripped out. Our plan is to create an Outlook plugin, installed on the clients, that will let them report spam directly to SpamFilter, but it will be a few months before we can have something ready for that. Roberto F. |
|
LogSat
Admin Group Joined: 25 January 2005 Location: United States Status: Offline Points: 4104 |
|
Brian, If an email is quarantined, and a user (or the admin through the GUI) chooses to force-deliver it, it will bypass all checks and be delivered no matter what. Please note that the email will be temporarily saved to the queue directory before being delivered. If the email contains a virus and you have anti-virus software running, it may delete the file before SpamFilter can send it. SpamFilter caches emails on disk partly for this reason, to allow anti-virus software to catch/clean/delete any infected files. SpamFilter is designed not to "break" should an expected email file "disappear". Roberto F. |
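One way a delivery loop can tolerate a queue file that anti-virus software has removed is sketched below. This is an assumption about the general technique, not SpamFilter's code; `deliver_queued` and the `send` callback are names invented for the example.

```python
import contextlib
import os


def deliver_queued(path, send):
    """Try to deliver one queued message file; return True if it was sent.

    `send` is any callable that accepts the raw message bytes.
    """
    try:
        with open(path, "rb") as fh:
            data = fh.read()
    except FileNotFoundError:
        # The file vanished (for example, deleted by an anti-virus scanner):
        # skip it quietly instead of failing the whole queue run.
        return False
    send(data)
    # The file may also disappear between the read and the cleanup.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
    return True
```

A queue runner would call this for each file name it found when it listed the directory, treating a `False` result as "nothing to do" rather than as an error.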
|
Ric
Guest Group |
|
What about a web-based interface? Users could copy and paste the entire message into a page that would then dump the message into a table that SpamFilterISP would check periodically. It could easily be built into the same web interface that is in use now to check on quarantined messages... just submit the form - and voilà! There are a LOT of clients that (similar to Outlook) reformat the messages and strip out the original header information - 95% of my staff would have the same problem - and the plug-in wouldn't help us... (we decided against using ANY MS client years ago - and we have enjoyed NOT having most of the email virus headaches that many others have suffered through...) Just a thought... -Ric |
|