Difference between revisions of "Zarafa Bayesian Learning"
Revision as of 11:35, 26 February 2009
Zarafa Bayesian learning
This howto enables SpamAssassin Bayesian learning for Zarafa.
The DMZS script (LGPL) works over IMAP. It reads the mail from two folders (LearnAsSpam and LearnAsHam) and feeds it to SpamAssassin's sa-learn. The script is set up here so that it uses public folders in Zarafa.
Installation
Bayes
yum install perl-Mail-IMAPClient --enablerepo=extras
nano -w /usr/bin/DMZS-sa-learn.pl

#!/usr/bin/perl
#
# Process mail from imap server shared folder 'Public folders/LearnAsSpam' & 'Public folders/LearnAsHam' through spamassassin sa-learn
# dmz@dmzs.com - March 19, 2004
# http://www.dmzs.com/tools/files/spam.phtml
# http://www.dmzs.com/tools/files/spam/DMZS-sa-learn.pl [modified for SMEServer]
# LGPL

use strict;
use warnings;
use Mail::IMAPClient;

my $debug = 0;
my $salearn;

# EDIT USER AND PASSWORD (the user must match the Zarafa account created for reading the public folders)
my $imap = Mail::IMAPClient->new( Server   => '127.0.0.1:8143',
                                  User     => 'SpamAdmin',
                                  Password => 'SpamAdminPassword',
                                  Debug    => $debug);

# new() returns undef on failure and leaves the reason in $@
if (!defined($imap)) { die "IMAP Login Failed: $@"; }

# If debugging, print out the total counts for each mailbox
if ($debug) {
    my $spamcount = $imap->message_count('Public folders/LearnAsSpam');
    print $spamcount, " Spam to process\n";

    my $nonspamcount = $imap->message_count('Public folders/LearnAsHam');
    print $nonspamcount, " Notspam to process\n";
}

# Process the spam mailbox
$imap->select('Public folders/LearnAsSpam');
my @msgs = $imap->search("ALL");
foreach my $msg (@msgs)
{
    # I put it into a file for processing, doing it into a perl var & piping through sa-learn just didn't seem to work
    $imap->message_to_file("/tmp/salearn", $msg);

    # execute sa-learn w/data
    if ($debug) { $salearn = `/usr/bin/sa-learn -D --no-sync --spam /tmp/salearn`; }
    else        { $salearn = `/usr/bin/sa-learn --no-sync --spam /tmp/salearn`; }
    print "-------\nSpam: ",$salearn,"\n-------\n" if $debug;

    # delete processed message
    $imap->delete_message($msg);
    unlink("/tmp/salearn");
}
$imap->expunge();
$imap->close();

# Process the not-spam mailbox (reuses @msgs instead of declaring it a second time)
$imap->select('Public folders/LearnAsHam');
@msgs = $imap->search("ALL");
foreach my $msg (@msgs)
{
    $imap->message_to_file("/tmp/salearn", $msg);

    # execute sa-learn w/data
    if ($debug) { $salearn = `/usr/bin/sa-learn -D --no-sync --ham /tmp/salearn`; }
    else        { $salearn = `/usr/bin/sa-learn --no-sync --ham /tmp/salearn`; }
    print "-------\nNotSpam: ",$salearn,"\n-------\n" if $debug;

    # delete processed message
    $imap->delete_message($msg);
    unlink("/tmp/salearn");
}
$imap->expunge();
$imap->close();

$imap->logout();

# integrate learned stuff
my $sarebuild = `/usr/bin/sa-learn --sync`;
print "-------\nRebuild: ",$sarebuild,"\n-------\n" if $debug;
Zarafa
Create a user account in Zarafa for reading the public spam folders.
db method (replace <MyPassword> with a proper strong password):
zarafa-admin -c 'SpamAdmin' -p '<MyPassword>' -f 'Spam Administration Account' -e root@localhost
unix method, if per user:
db accounts setprop SpamAdmin zarafa enabled
/etc/e-smith/events/actions/qmail-update-user
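Set proper permissions on the script:
chmod 555 /usr/bin/DMZS-sa-learn.pl
Note that mode 555 leaves the script, including the password stored in it, readable by every local user. Assuming the file is owned by root, mode 500 also works, because cron runs the script as root.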
Log in to Zarafa with an account that has admin rights and create two new folders, LearnAsSpam and LearnAsHam, under: Public folder > Public folders. Set the permissions (right-click folder > Properties > Permission-tab) on both new folders to:
Spam administration account
* Folder visible
* Read items
* Edit items: all
* Delete items: all
Everyone (and/or other users/groups you've added; they need at least:)
* Folder visible
* Create items
* Edit items: none
* Delete items: none
Note: Dropping mail in the public 'LearnAsHam' folder may pose a privacy problem if permissions are set less restrictively than shown above!
Cron
Create a new crontab fragment:
pico /etc/e-smith/templates/etc/crontab/91_SpamAssasinLearn
Add the following to the template (change the execution times to your liking; see Wikipedia on Cron):
# Run the SpamAssassin Bayesian spam learning script every hour from 8:00 to 22:00 on weekdays
0 8-22 * * 1-5 root /usr/bin/DMZS-sa-learn.pl
Make the new fragment active by expanding the template:
expand-template /etc/crontab
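To make the field layout of the crontab entry explicit, here is a small shell sketch that splits a system crontab line into its parts. The function name describe_cron_line is hypothetical and not part of SME Server or cron. It only relies on the fact that entries in /etc/crontab carry a user field between the five time fields and the command.

```shell
# Hypothetical helper: label the fields of one /etc/crontab entry.
# Layout: minute hour day-of-month month day-of-week user command
describe_cron_line() {
    # read splits on whitespace; the last variable receives the rest of the line
    read -r minute hour dom month dow user command <<EOF
$1
EOF
    printf 'minute=%s hour=%s dom=%s month=%s dow=%s user=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow" "$user"
    printf 'command=%s\n' "$command"
}

describe_cron_line '0 8-22 * * 1-5 root /usr/bin/DMZS-sa-learn.pl'
```

For the entry used in this howto, the call prints minute=0 hour=8-22 dom=* month=* dow=1-5 user=root on the first line and command=/usr/bin/DMZS-sa-learn.pl on the second: minute 0 of every hour from 8 to 22, Monday to Friday, run as root.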
Configuration
SpamAssassin has to be enabled in the Email Panel.
Bayesian learning has to be enabled and configured in SME with:
config setprop spamassassin UseBayes 1
config setprop spamassassin BayesAutoLearnThresholdSpam 4.00
config setprop spamassassin BayesAutoLearnThresholdNonspam 0.10
expand-template /etc/mail/spamassassin/local.cf
sa-learn --sync --dbpath /var/spool/spamd/.spamassassin -u spamd
chown spamd.spamd /var/spool/spamd/.spamassassin/bayes_*
chown spamd.spamd /var/spool/spamd/.spamassassin/bayes.mutex
chmod 640 /var/spool/spamd/.spamassassin/bayes_*
signal-event email-update
These commands will:
- enable the Bayesian filter
- 'autolearn' as SPAM any email with a score above 4.00
- 'autolearn' as HAM any email with a score below 0.10
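The effect of the two thresholds can be illustrated with a short shell sketch. The function autolearn_class is a hypothetical name made up for this example, and the logic is a simplification: it only compares a score against the two values set above, whereas SpamAssassin applies further conditions before it actually autolearns a message.

```shell
# Hypothetical illustration of the two autolearn thresholds configured above.
# Usage: autolearn_class SCORE [SPAM_THRESHOLD] [HAM_THRESHOLD]
autolearn_class() {
    # awk does the floating-point comparison; adding 0 forces numeric context
    awk -v s="$1" -v spam="${2:-4.00}" -v ham="${3:-0.10}" 'BEGIN {
        if (s + 0 > spam + 0)     print "spam"   # score above the spam threshold
        else if (s + 0 < ham + 0) print "ham"    # score below the ham threshold
        else                      print "none"   # in between: not autolearned
    }'
}

autolearn_class 5.20
autolearn_class 0.05
autolearn_class 2.00
```

Messages that score between the two thresholds are the ones that benefit most from manual training through the LearnAsSpam and LearnAsHam folders.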
Usage
- Move/copy spam messages that are delivered to your Inbox to the public LearnAsSpam folder.
- COPY regular messages that end up in your Junk E-mail folder to the public LearnAsHam folder.
After the messages have been processed, they are deleted from the public folders to save space.
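The learn-then-delete behaviour can also be sketched in shell for messages that already sit as files in a local directory. This is only an illustrative sketch, not a replacement for the DMZS script: learn_dir and the LEARNER variable are hypothetical names introduced here, and unlike the script this version removes a message only when the learner command exits successfully.

```shell
# Hypothetical helper: feed every file in a directory to sa-learn and
# delete each file only after it has been learned successfully.
# Usage: learn_dir spam|ham DIRECTORY
# LEARNER (an assumption of this sketch) overrides the command, e.g. for a dry run.
learn_dir() {
    mode=$1
    dir=$2
    learner=${LEARNER:-/usr/bin/sa-learn}
    count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and an empty glob
        if "$learner" --no-sync "--$mode" "$f" >/dev/null; then
            rm -f "$f"                     # learned: the message can go
            count=$((count + 1))
        fi
    done
    echo "$count"                          # number of messages learned and removed
}
```

Called as learn_dir spam followed by a directory path, it prints how many messages were learned. Because --no-sync is used, a final sa-learn --sync is still needed afterwards, just as at the end of the script.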