Difference between revisions of "Zarafa Bayesian Learning"

From SME Server

Latest revision as of 17:17, 26 November 2013

Zarafa Bayesian learning

This howto enables SpamAssassin Bayesian learning for Zarafa.

The DMZS script (LGPL) works over IMAP. It reads the mail from two folders (LearnAsSpam and LearnAsHam) and feeds it to SpamAssassin's sa-learn. Here the script is adapted to use public folders in Zarafa.

Installation

Bayes

yum install perl-Mail-IMAPClient --enablerepo=smecontribs

Create a new script-file:

nano -w /usr/bin/DMZS-sa-learn.pl

Paste the code below into this file and change 'SpamAdminPassword' to a strong password:

#!/usr/bin/perl
#
# Process mail from imap server shared folder 'Public folders/LearnAsSpam' & 'Public folders/LearnAsHam' through spamassassin sa-learn
# dmz@dmzs.com - March 19, 2004
# http://www.dmzs.com/tools/files/spam.phtml
# http://www.dmzs.com/tools/files/spam/DMZS-sa-learn.pl [modified for SMEServer]
# LGPL

use Mail::IMAPClient;

my $debug=0;
my $salearn;

# # # # # # # # # # EDIT USER AND PASSWORD  # # # # # # # # # #

my $imap = Mail::IMAPClient->new( Server=> '127.0.0.1:8143',
                                  User => 'SpamAdmin',
                                  Password => 'SpamAdminPassword',
                                  Debug => $debug);

if (!defined($imap)) { die "IMAP Login Failed"; }

# If debugging, print out the total counts for each mailbox
if ($debug) {
 my $spamcount = $imap->message_count('Public folders/LearnAsSpam');
 print $spamcount, " Spam to process\n";

 my $nonspamcount = $imap->message_count('Public folders/LearnAsHam');
 print $nonspamcount, " Notspam to process\n" if $debug;
}

# Process the spam mailbox
$imap->select('Public folders/LearnAsSpam');
my @msgs = $imap->search("ALL");
for (my $i=0;$i <= $#msgs; $i++)
{
 # I put it into a file for processing, doing it into a perl var & piping through sa-learn just didn't seem to work
 $imap->message_to_file("/tmp/salearn",$msgs[$i]);

 # execute sa-learn w/data
 if ($debug) { $salearn = `/usr/bin/sa-learn -D --no-sync  --spam /tmp/salearn`; } 
 else { $salearn = `/usr/bin/sa-learn --no-sync  --spam /tmp/salearn`; }
 print "-------\nSpam: ",$salearn,"\n-------\n" if $debug;

 # delete processed message
 $imap->delete_message($msgs[$i]);
 unlink("/tmp/salearn");
}
$imap->expunge();
$imap->close();

# Process the not-spam mailbox
$imap->select('Public folders/LearnAsHam');
@msgs = $imap->search("ALL");
for (my $i=0;$i <= $#msgs; $i++)
{
 $imap->message_to_file("/tmp/salearn",$msgs[$i]);
 # execute sa-learn w/data
 if ($debug) { $salearn = `/usr/bin/sa-learn -D --no-sync  --ham /tmp/salearn`; }
 else { $salearn = `/usr/bin/sa-learn --no-sync  --ham /tmp/salearn`; }
 print "-------\nNotSpam: ",$salearn,"\n-------\n" if $debug; 

 # delete processed message
 $imap->delete_message($msgs[$i]);
 unlink("/tmp/salearn");
}
$imap->expunge();
$imap->close();

$imap->logout();

# integrate learned stuff
my $sarebuild = `/usr/bin/sa-learn --sync`;
print "-------\nRebuild: ",$sarebuild,"\n-------\n" if $debug;

Set proper permissions on the script:

 chmod 555 /usr/bin/DMZS-sa-learn.pl

Zarafa

Create a user-account in Zarafa for reading the public spam-folders.

With the db method, create the account as follows, replacing <MyPassword> with a strong password (the same one you set in the script):

 zarafa-admin -c 'SpamAdmin' -p '<MyPassword>' -f 'Spam Administration Account' -e root@localhost
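
A strong password can be generated on the command line. The sketch below is one possible way to do it; gen_password is a hypothetical helper name used only for illustration and is not part of Zarafa or SME Server.

```shell
# Hypothetical helper: print a random alphanumeric password.
# Argument: desired length (defaults to 24 characters).
gen_password() {
    len="${1:-24}"
    # LC_ALL=C keeps tr byte-oriented; head stops after $len characters.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
    echo
}
```

Run gen_password 24 and use the result both for <MyPassword> and for 'SpamAdminPassword' in the script.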

If you have configured Zarafa to use the unix method and enable Zarafa usage on a per-user basis:

 db accounts setprop SpamAdmin zarafa enabled
 /etc/e-smith/events/actions/qmail-update-user

Log in to Zarafa with an account that has admin rights and create two new folders, LearnAsSpam and LearnAsHam, under Public folder > Public folders. Set the permissions (right-click the folder > Properties > Permission tab) on both new folders to:

 Spam administration account
 * Folder visible
 * Read items
 * Edit items: all
 * Delete items: all
 
 Everyone (and/or any other users/groups you have added) needs at least:
 * Folder visible
 * Create items
 * Edit items: none
 * Delete items: none
Note:
Dropping mail in the public 'LearnAsHam' folder may pose a privacy problem if permissions are set less restrictively than shown above!


Cron

Create a new crontab fragment:

 nano -w /etc/e-smith/templates/etc/crontab/91_SpamAssasinLearn

Add the following to the template (change the execution times to your liking; see Wikipedia on Cron):

 # Run the SpamAssassin Bayesian learning script every hour from 8:00 to 22:00 on weekdays
 0 8-22 * * 1-5 root /usr/bin/DMZS-sa-learn.pl

Make the new fragment active by expanding the template:

 expand-template /etc/crontab
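
The schedule "0 8-22 * * 1-5" means minute 0, hours 8 through 22, any day of the month, any month, Monday through Friday. As a rough illustration of how the fields combine, the sketch below checks whether a given time matches that schedule; cron_would_run is a hypothetical helper, not a cron feature.

```shell
# Hypothetical helper: succeed if "0 8-22 * * 1-5" would fire at the given time.
# Arguments: minute (0-59), hour (0-23), day of week (0-6, where 0 is Sunday).
cron_would_run() {
    minute="$1"; hour="$2"; dow="$3"
    [ "$minute" -eq 0 ] &&
    [ "$hour" -ge 8 ] && [ "$hour" -le 22 ] &&
    [ "$dow" -ge 1 ] && [ "$dow" -le 5 ]
}
```

For example, cron_would_run 0 9 3 succeeds (Wednesday 9:00), while cron_would_run 0 9 6 fails (Saturday).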

Configuration

SpamAssassin has to be enabled in the Email panel.

Bayesian learning has to be enabled and configured in SME with:

config setprop spamassassin UseBayes 1 
config setprop spamassassin BayesAutoLearnThresholdSpam 6.00 
config setprop spamassassin BayesAutoLearnThresholdNonspam 0.10 
expand-template /etc/mail/spamassassin/local.cf 
sa-learn --sync --dbpath /var/spool/spamd/.spamassassin -u spamd 
chown spamd.spamd /var/spool/spamd/.spamassassin/bayes_* 
chown spamd.spamd /var/spool/spamd/.spamassassin/bayes.mutex 
chmod 640 /var/spool/spamd/.spamassassin/bayes_* 
signal-event email-update

These commands will:

  • enable the Bayesian filter
  • 'autolearn' as SPAM any email with a score above 6.00
    Note: SpamAssassin requires at least 3 points from the header and 3 points from the body to auto-learn as spam. The minimum working value for this option is therefore 6, to be changed in increments of 3; 12 is considered a good working value.
  • 'autolearn' as HAM any email with a score below 0.10
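
To make the two thresholds concrete, the sketch below classifies a score against them. It is a simplification under stated assumptions: autolearn_verdict is a hypothetical helper, it only compares the score with the two configured values, and it ignores the header/body point requirement and the other checks SpamAssassin applies before auto-learning.

```shell
# Hypothetical helper: classify a score against the auto-learn thresholds.
# Arguments: score, spam threshold (default 6.00), ham threshold (default 0.10).
autolearn_verdict() {
    awk -v score="$1" -v spam="${2:-6.00}" -v ham="${3:-0.10}" 'BEGIN {
        # "+ 0" forces a numeric comparison instead of a string comparison.
        if (score + 0 > spam + 0)      print "spam"
        else if (score + 0 < ham + 0)  print "ham"
        else                           print "none"
    }'
}
```

With the values configured above, autolearn_verdict 12.5 prints spam, autolearn_verdict -1.2 prints ham, and autolearn_verdict 3.0 prints none (the message is not auto-learned either way).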

Usage

Warning:
All mail dropped in the LearnAsSpam and LearnAsHam folders will be automatically deleted!


  • Move/copy spam messages that are delivered to your Inbox to the public LearnAsSpam folder.
  • COPY regular messages that end up in your Junk E-mail folder to the public LearnAsHam folder.

After the messages have been processed they will be deleted to save your valuable space.
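
To check that learning is having an effect, the message counters in the Bayes database can be inspected with sa-learn --dump magic. The sketch below extracts the spam and ham counters from that output; bayes_counts is a hypothetical helper, and the database path is the one used in the Configuration section (adjust it if your database lives elsewhere). By default SpamAssassin only starts using Bayes results once it has learned at least 200 spam and 200 ham messages.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin
# and print the number of learned spam and ham messages.
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham = $3 }
         END { printf "nspam=%d nham=%d\n", spam, ham }'
}

# On the server (assumed path, taken from the Configuration section):
#   sa-learn --dump magic --dbpath /var/spool/spamd/.spamassassin | bayes_counts
```

If the counters do not increase after the cron job has run, check that messages were actually dropped in the public folders and that the SpamAdmin account can read them.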