DSPAM

From SME Server
Revision as of 15:37, 12 March 2013 by Knuddi

Maintainer

This contrib has been developed by Jesper Knudsen from SME Optimizer

Description

For a long time I have used SME's built-in SpamAssassin, with a few custom additions, to get rid of most of my spam. Recently I noticed that the DSPAM project was active again, and I have since heard from many sources that it did a great job for them. I did not want to get rid of SpamAssassin but wanted to combine the strengths of the two spam engines. One of the "weaknesses" of DSPAM is that it requires a significant amount of training before it provides reliable results; I use SpamAssassin scores to provide this training.

I have therefore made this DSPAM plug-in which works in co-operation with SpamAssassin to get rid of even more spam.

This contrib consists mainly of two components:

  • A qpsmtpd plugin, which trains the DSPAM engine based on SpamAssassin results and which, once training is complete, ensures that emails are classified by DSPAM for later scoring.
  • A SpamAssassin plugin, which uses the DSPAM classification results to provide additional SpamAssassin scoring.

Installation

The contrib needs a working DSPAM installation. The commands below download and install the sme-dspam contrib together with the DSPAM packages it depends on.

wget \
http://sme.swerts-knudsen.dk/downloads/DSPAM/sme-dspam-1.0.2-5.noarch.rpm \
http://sme.swerts-knudsen.dk/downloads/DSPAM/dspam-3.9.0-sme7.i386.rpm \
http://sme.swerts-knudsen.dk/downloads/DSPAM/libdspam-3.9.0-sme7.i386.rpm \
http://sme.swerts-knudsen.dk/downloads/DSPAM/libdspam-mysql-3.9.0-sme7.i386.rpm
yum localinstall \
sme-dspam-1.0.2-5.noarch.rpm \
dspam-3.9.0-sme7.i386.rpm \
libdspam-3.9.0-sme7.i386.rpm \
libdspam-mysql-3.9.0-sme7.i386.rpm

Uninstall

You can simply remove the package again with the usual yum command.

yum remove sme-dspam

Configuration

The contrib starts in DSPAM training mode and stays in it until DSPAM reports that training is complete. It monitors the output of "dspam_stats -H" to detect when training has completed and then switches to scoring/tagging mode. When training is complete the admin receives an email notification. Until the contrib has switched to this mode you will not see any benefit from DSPAM.
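The completion check can be pictured with a small shell sketch. This is a hypothetical illustration, not the plugin's actual logic: it assumes that "training complete" means the TL (Training Left) counter in "dspam_stats -H" has reached 0, and the function names training_left and training_complete are made up for this example.

```shell
# Hypothetical helpers (not part of sme-dspam).
# training_left: read "dspam_stats -H" output on stdin and print the value
# of the first "Training Left:" line (the sample output below has a single
# user, qpsmtpd). Exits non-zero if no such line is found.
training_left() {
    awk '/Training Left:/ { print $NF; found = 1; exit }
         END { exit !found }'
}

# training_complete: succeeds (exit 0) when Training Left is 0.
# Assumption: TL = 0 is what "training complete" means; the real plugin
# may use a different test. Returns 2 if the stats contain no TL line.
training_complete() {
    tl=$(training_left) || return 2
    [ "$tl" -eq 0 ]
}

# Example usage on a live system:
#   dspam_stats -H | training_complete && echo "DSPAM training complete"
```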

DSPAM is trained based on SpamAssassin scores. By default an email is trained as spam if SpamAssassin rejects it and its score is above 9, and as ham ("innocent" in DSPAM terminology) if SpamAssassin scores it below 5.

These two thresholds can be changed with the config command:

config setprop dspam hamlevel xx (default: 5)
config setprop dspam spamlevel xx (default: 9)

and then apply the change with:

signal-event email-update
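The training rule described above can be sketched as a small shell function. This is a hypothetical illustration, not the plugin's own code: the name train_decision is made up, the "rejected by SpamAssassin" condition is passed in as yes/no, and mail scoring between hamlevel and spamlevel is assumed not to be trained at all.

```shell
# train_decision SCORE REJECTED [SPAMLEVEL] [HAMLEVEL]
# Hypothetical sketch: prints "spam", "innocent" or "none".
# SPAMLEVEL and HAMLEVEL default to the documented defaults 9 and 5.
train_decision() {
    score=$1 rejected=$2 spamlevel=${3:-9} hamlevel=${4:-5}
    awk -v s="$score" -v r="$rejected" -v sl="$spamlevel" -v hl="$hamlevel" 'BEGIN {
        s += 0; sl += 0; hl += 0          # force numeric comparison
        if (r == "yes" && s > sl) print "spam"
        else if (s < hl)          print "innocent"
        else                      print "none"   # assumed: no training
    }'
}

train_decision 32.3 yes   # -> spam
train_decision -2.3 no    # -> innocent
train_decision 7.0 no     # -> none
```

Passing different thresholds mirrors what "config setprop dspam spamlevel/hamlevel" changes, for example "train_decision 12 yes 15 3" prints none.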

Statistics

DSPAM Specific Statistics

You can follow how DSPAM is doing with the dspam_stats command. Below is an example where I started the tagging process before training was complete. Here you can see that 4 emails were reported as False Negatives, meaning DSPAM classified them as ham while SpamAssassin scored them as spam (above spamlevel).

[root@mx]# dspam_stats -H

qpsmtpd:
               TP True Positives:                    71
               TN True Negatives:                    66
               FP False Positives:                    0
               FN False Negatives:                    4
               SC Spam Corpusfed:                  5890
               NC Nonspam Corpusfed:                872
               TL Training Left:                   1562
               SHR Spam Hit Rate                 94.67%
               HSR Ham Strike Rate:               0.00%
               PPV Positive predictive value:   100.00%
               OCA Overall Accuracy:             97.16%

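The derived rates in this output follow from the four raw counters. The formulas in the sketch below are inferred from the sample (they reproduce its numbers exactly) rather than taken from the DSPAM source, and the function name dspam_rates is hypothetical.

```shell
# dspam_rates TP TN FP FN
# Recomputes SHR, HSR, PPV and OCA from the raw dspam_stats counters.
# Formulas are inferred from the sample output, not from DSPAM itself.
dspam_rates() {
    awk -v tp="$1" -v tn="$2" -v fp="$3" -v fn="$4" 'BEGIN {
        tp += 0; tn += 0; fp += 0; fn += 0
        shr = (tp + fn) ? 100 * tp / (tp + fn) : 0   # Spam Hit Rate
        hsr = (fp + tn) ? 100 * fp / (fp + tn) : 0   # Ham Strike Rate
        ppv = (tp + fp) ? 100 * tp / (tp + fp) : 0   # Positive predictive value
        all = tp + tn + fp + fn
        oca = all ? 100 * (tp + tn) / all : 0        # Overall Accuracy
        printf "SHR %.2f%%\nHSR %.2f%%\nPPV %.2f%%\nOCA %.2f%%\n", shr, hsr, ppv, oca
    }'
}

dspam_rates 71 66 0 4
# -> SHR 94.67%
#    HSR 0.00%
#    PPV 100.00%
#    OCA 97.16%
```

In other words, the 4 false negatives are what pull the Spam Hit Rate down to 71/75, while the absence of false positives keeps HSR at 0% and PPV at 100%.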

When the contrib is in training mode you should see events like the following in your qpsmtpd log when you run this command:

tail -f /var/log/qpsmtpd/current | tai64nlocal | grep dspam
2010-01-04 16:05:43.495837500 24369 dspam plugin: Training email as spam (32.3 > 9)
2010-01-04 16:06:12.922243500 24460 dspam plugin: Training email as spam (26.2 > 9)
2010-01-04 16:08:30.707928500 24571 dspam plugin: Training email as spam (40.2 > 9)
2010-01-04 16:15:09.209315500 25154 dspam plugin: Training email as spam (28.7 > 9)
2010-01-04 16:15:12.657721500 25093 dspam plugin: Training email as innocent (-2.3 < 5)
2010-01-04 16:15:31.505187500 25230 dspam plugin: Training email as innocent (1.0 < 5)
2010-01-04 16:15:56.084894500 25261 dspam plugin: Training email as spam (33.2 > 9)
2010-01-04 16:16:35.734852500 25302 dspam plugin: Training email as innocent (0.1 < 5)
2010-01-04 16:16:37.373583500 25297 dspam plugin: Training email as spam (39.5 > 9)
2010-01-04 16:17:50.398104500 25284 dspam plugin: Training email as spam (30.2 > 9)
2010-01-04 16:18:13.514300500 25412 dspam plugin: Training email as spam (23.2 > 9)
2010-01-04 16:18:41.653611500 25396 dspam plugin: Training email as spam (35.2 > 9)
2010-01-04 16:20:05.432484500 25486 dspam plugin: Training email as spam (24.6 > 9)
2010-01-04 16:20:07.036783500 25528 dspam plugin: Training email as innocent (1.7 < 5)
2010-01-04 16:21:04.378237500 25766 dspam plugin: Training email as innocent (1.0 < 5)
2010-01-04 16:21:21.849091500 25797 dspam plugin: Training email as innocent (-2.6 < 5)
2010-01-04 16:22:32.693008500 25860 dspam plugin: Training email as spam (30.3 > 9)
2010-01-04 16:28:22.610804500 26245 dspam plugin: Training email as spam (24.3 > 9)
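To get a quick tally of training decisions instead of watching the log scroll by, the message text shown above can be counted. This is a hypothetical helper (count_training is not part of the contrib) that relies only on the "Training email as spam/innocent" wording in these log lines.

```shell
# count_training: read qpsmtpd log lines on stdin and count DSPAM training
# decisions. "Retraining ..." lines are deliberately not matched, because
# the pattern requires "plugin: Training".
count_training() {
    awk '/dspam plugin: Training email as spam/     { spam++ }
         /dspam plugin: Training email as innocent/ { ham++ }
         END { printf "spam=%d innocent=%d\n", spam, ham }'
}

# Example on an SME server (tai64nlocal only converts timestamps, so it is
# not needed for counting):
#   count_training < /var/log/qpsmtpd/current
```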

When the contrib is in tagging mode, the same command shows output like this:

tail -f /var/log/qpsmtpd/current | tai64nlocal | grep dspam
2010-01-04 16:14:27.830989500 21955 dspam plugin: dspam result: Spam with Confidence of 0.99 and Probability of 1.0000 (4b4205d3219672044083174)
2010-01-04 16:15:57.446155500 22065 dspam plugin: dspam result: Spam with Confidence of 0.99 and Probability of 1.0000 (4b42062d220731786917372)
2010-01-04 16:20:55.422770500 22430 dspam plugin: dspam result: Spam with Confidence of 0.99 and Probability of 1.0000 (4b420757224401732614111)
2010-01-04 16:21:05.836167500 22453 dspam plugin: dspam result: Innocent with Confidence of 0.99 and Probability of 0.0000 (4b420761224588618216848)
2010-01-04 16:21:20.033604500 22330 dspam plugin: dspam result: Spam with Confidence of 0.80 and Probability of 1.0000 (4b420770224877713217748)
2010-01-04 16:24:41.615738500 22636 dspam plugin: dspam result: Innocent with Confidence of 0.76 and Probability of 0.0000 (4b420839226414726512081)
2010-01-04 16:24:43.453742500 22636 dspam plugin: Retraining email as spam classification (14.9 > 9)
2010-01-04 16:25:34.647693500 22729 dspam plugin: dspam result: Spam with Confidence of 0.99 and Probability of 1.0000 (4b42086e227377747245261)
2010-01-04 16:25:38.648186500 22743 dspam plugin: dspam result: Spam with Confidence of 0.99 and Probability of 1.0000 (4b420872227551892345671)
2010-01-04 16:26:04.702731500 22773 dspam plugin: dspam result: Innocent with Confidence of 1.00 and Probability of 0.0000 (4b42088c227818922614116)
2010-01-04 16:26:06.441017500 22770 dspam plugin: dspam result: Spam with Confidence of 0.99 and Probability of 1.0000 (4b42088e227882615116573)

Note the retraining of DSPAM that took place after DSPAM classified an email as Innocent even though its total SpamAssassin score was 14.9 (above the spamlevel of 9).
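The retraining case in the log can be sketched as follows. This is a hypothetical illustration of the single case shown above (DSPAM says Innocent, SpamAssassin score above spamlevel), not the plugin's code; needs_retrain is a made-up name, and the opposite case (DSPAM says Spam while SpamAssassin scores low) does not appear in the log, so it is deliberately not modelled.

```shell
# needs_retrain DSPAM_RESULT SA_SCORE [SPAMLEVEL]
# Hypothetical sketch: a DSPAM "Innocent" verdict on mail that SpamAssassin
# scores above SPAMLEVEL (default 9) is a false negative and is retrained
# as spam. Other combinations are not retrained here.
needs_retrain() {
    awk -v c="$1" -v s="$2" -v sl="${3:-9}" 'BEGIN {
        if (c == "Innocent" && s + 0 > sl + 0) print "retrain as spam"
        else                                   print "no retrain"
    }'
}

needs_retrain Innocent 14.9   # -> retrain as spam
needs_retrain Spam 14.9       # -> no retrain
```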

SpamAssassin General Statistics

You can monitor which rules SpamAssassin fires for both spam and ham with this small script, which parses the /var/log/spamd/current log file.

cd /usr/bin/
wget http://sme.swerts-knudsen.dk/downloads/DSPAM/sa-stats
chmod +x sa-stats
./sa-stats

The output will look something like this.

Email:     2895  Autolearn:  2591  AvgScore:  22.54  AvgScanTime:  3.74 sec
Spam:      2165  Autolearn:  2075  AvgScore:  33.86  AvgScanTime:  3.44 sec
Ham:        730  Autolearn:   516  AvgScore: -11.05  AvgScanTime:  4.64 sec
Time Spent Running SA:         3.01 hours
Time Spent Processing Spam:    2.07 hours
Time Spent Processing Ham:     0.94 hours
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
  1    RCVD_IN_APEWSL2                  1809    67.05   83.56   18.08
  2    RCVD_IN_BRBL                     1789    62.04   82.63    0.96
  3    RAZOR2_CHECK                     1786    61.93   82.49    0.96
  4    BAYES_99                         1780    61.49   82.22    0.00
  5    RAZOR2_CF_RANGE_51_100           1759    61.00   81.25    0.96
  6    DIGEST_MULTIPLE                  1656    57.37   76.49    0.68
  7    DCC_CHECK                        1567    56.93   72.38   11.10
  8    URIBL_BLACK                      1528    53.26   70.58    1.92
  9    RCVD_IN_XBL                      1494    51.64   69.01    0.14
 10    RAZOR2_CF_RANGE_E8_51_100        1485    51.47   68.59    0.68
 11    RCVD_IN_JMF_BL                   1484    51.68   68.55    1.64
 12    PYZOR_CHECK                      1445    50.36   66.74    1.78
 13    RCVD_IN_PBL                      1413    48.95   65.27    0.55
 14    URIBL_JP_SURBL                   1347    46.53   62.22    0.00
 15    URIBL_SBL                        1320    45.60   60.97    0.00
 16    URIBL_WS_SURBL                   1294    44.70   59.77    0.00
 17    DSPAM_SPAM_99                    1147    39.62   52.98    0.00
 18    SEM_URIRED                       1135    39.79   52.42    2.33
 19    SEM_URI                          1002    34.78   46.28    0.68
 20    HTML_MESSAGE                      981    52.92   45.31   75.48
----------------------------------------------------------------------
TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
  1    BAYES_00                          715    25.98    1.71   97.95
  2    DSPAM_HAM_99                      696    25.01    1.29   95.34
  3    HTML_MESSAGE                      551    52.92   45.31   75.48
  4    SPF_PASS                          329    13.68    3.09   45.07
  5    RCVD_IN_JMF_W                     145     5.11    0.14   19.86
  6    RCVD_IN_APEWSL2                   132    67.05   83.56   18.08
  7    MIME_HTML_ONLY                    131    14.82   13.76   17.95
  8    SPF_HELO_PASS                      96     3.52    0.28   13.15
  9    DCC_CHECK                          81    56.93   72.38   11.10
 10    RCVD_IN_DNSWL_MED                  63     2.18    0.00    8.63
 11    RCVD_IN_DNSWL_LOW                  62     2.14    0.00    8.49
 12    SARE_SUB_ENC_UTF8                  59     3.56    2.03    8.08
 13    MPART_ALT_DIFF                     55     2.63    0.97    7.53
 14    USER_IN_WHITELIST                  48     1.66    0.00    6.58
 15    MIME_HTML_MOSTLY                   43     2.00    0.69    5.89
 16    MIME_QP_LONG_LINE                  31     2.56    1.99    4.25
 17    EXTRA_MPART_TYPE                   31     1.52    0.60    4.25
 18    MIME_BASE64_BLANKS                 31     1.07    0.00    4.25
 19    HTML_IMAGE_RATIO_06                29     1.04    0.05    3.97
 20    MISSING_MID                        28     1.52    0.74    3.84
----------------------------------------------------------------------
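The percentage columns can be reproduced from the totals at the top of the report. Judging from the sample numbers, COUNT in the spam table is the number of spam hits, COUNT in the ham table is the number of ham hits, and %OFMAIL counts hits on both. The sketch below is an inference from this output, not from the sa-stats source, and rule_percentages is a hypothetical name.

```shell
# rule_percentages SPAM_HITS HAM_HITS TOTAL_SPAM TOTAL_HAM
# Recomputes %OFMAIL, %OFSPAM and %OFHAM for one rule, using the column
# relationships inferred from the sample report.
rule_percentages() {
    awk -v sh="$1" -v hh="$2" -v ts="$3" -v th="$4" 'BEGIN {
        sh += 0; hh += 0; ts += 0; th += 0
        mail = (ts + th) ? 100 * (sh + hh) / (ts + th) : 0
        spam = ts ? 100 * sh / ts : 0
        ham  = th ? 100 * hh / th : 0
        printf "%%OFMAIL %.2f  %%OFSPAM %.2f  %%OFHAM %.2f\n", mail, spam, ham
    }'
}

# RCVD_IN_APEWSL2: 1809 spam hits, 132 ham hits, 2165 spam, 730 ham
rule_percentages 1809 132 2165 730
# -> %OFMAIL 67.05  %OFSPAM 83.56  %OFHAM 18.08
```

This also explains why a rule such as RCVD_IN_APEWSL2 appears in both tables with the same percentages but different COUNT values.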

FAQ

Can I force it to start scoring even though training hasn't completed?

Yes, you can do this by setting the dspam action property to tag:

config setprop dspam action tag
signal-event email-update

Can I alter the score given to DSPAM classified emails?

Yes, but you have to edit the /etc/mail/spamassassin/dspam.cf file manually. Note that a later upgrade of sme-dspam will overwrite your modifications. When you have made your changes, run:

signal-event email-update

How do I report a problem or a suggestion?

This contrib does not yet have an entry in the bug tracker, so just send an email to contribs@swerts-knudsen.dk.