How to enable bayes Autolearning
Sunday, 12 April 2009 01:50

Updated 2/9/10: Fixed the links so they work correctly. Removed the references to maildrop and procmail spam filtering, as they are now obsolete.

Once you have trained bayes with 200 hams and spams, you then need to 

There are a few things you need to train SpamAssassin to do before bayes can start learning how to tell the difference between spam and non-spam. The more you train bayes, the better the learning algorithm gets.

Before continuing, I want to let you know about one thing. If you are running the freebsdrocks spamd service, you do not have to change spamd to a non-root user, because the service is already configured to run as user qscand. In that case, please skip down to the section on checking your spam and ham totals with sa-learn --dump magic.

Before starting with Bayes, one thing I would suggest is running SpamAssassin as a non-root account. You can do this by adding an option to the spamd.sh startup script. Edit your SpamAssassin startup script and look for one of the following lines. I give 2 different options, depending on which version of SpamAssassin you're running.

Option 1: spamd_flags=${spamd_flags:-"-d -x -r ${spamd_pidfile} "}
Option 2: : ${spamd_flags="-c  "}

Modify it by adding -u qscand so that SpamAssassin runs as user qscand:

Option 1: spamd_flags=${spamd_flags:-"-u qscand -d -x -r ${spamd_pidfile} "}
Option 2: : ${spamd_flags="-c -u qscand "}

The path or flags file may vary from system to system. When you are done, save and exit and restart spamd. All your spamd processes should now run as qscand.
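
To double-check that last point, here is a minimal sketch of a helper that reads a process listing and prints any spamd process that is not owned by the expected user. The function name spamd_wrong_owner is made up for this example, and it assumes the listing has the user in the first column followed by the command.

```sh
# Sketch: list spamd processes that are NOT owned by the expected user.
# Reads "USER COMMAND..." lines on stdin; exits non-zero if any are found.
# $1 is the expected user (defaults to qscand).
spamd_wrong_owner() {
    # The pattern [s]pamd matches "spamd" but does not match this awk
    # command's own entry if it shows up in the process listing.
    awk -v want="${1:-qscand}" '
        /[s]pamd/ && $1 != want { print; bad = 1 }
        END { exit bad + 0 }'
}
```

On a FreeBSD-style ps you could feed it with something like ps -axo user,command | spamd_wrong_owner qscand (the ps flags are an assumption and may differ on your system). No output means every spamd process it saw belongs to qscand.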

First, to make sure bayes can be turned on, bayes needs to be trained on at least 200 hams and 200 spams. Run the following command to see your current totals:

# sa-learn --dump magic

0.000 0 5752 0 non-token data: nspam
0.000 0 1702 0 non-token data: nham

As you can see from the above example, I have 5752 spams and 1702 hams.
The spam and ham totals must be at least 200 each.

The nspam total is the total number of spams Bayes has learned.
The nham total is the total number of hams Bayes has learned.
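
If you would rather not read the numbers by eye, the check can be sketched as a small shell function. The name bayes_ready is made up for this example. It assumes the count is the third column and the label is the last word on the line, as in the output shown above.

```sh
# Sketch: decide whether Bayes has enough training data.
# Reads `sa-learn --dump magic` output on stdin.
# $1 is the minimum needed for both totals (defaults to 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # third column holds the count
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # exit status 0 only when both totals reach the minimum
            exit !(nspam >= min && nham >= min)
        }'
}
```

Run it as sa-learn --dump magic | bayes_ready. It prints both totals and returns success only when each one is at least 200.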

Here is how to train SpamAssassin hams and spams.

There are a few ways to feed sa-learn spams and hams. The easiest way is
by running the command right from the console. Let's say that you have a
spam folder in ~vpopmail/domains/domain.ext/user/Maildir/.Spam/new. Run the sa-learn
command like so, replacing domain.ext with your domain and user with the actual user on your system:

# sa-learn --spam ~vpopmail/domains/domain.ext/user/Maildir/.Spam/new

To learn hams in ~vpopmail/domains/domain.ext/user/Maildir/new, run

# sa-learn --ham ~vpopmail/domains/domain.ext/user/Maildir/new

You'll get output similar to the following in either case. Actual message counts may vary.

Learned from 30 message(s) (30 message(s) examined).

This tells you that out of 30 messages in the new folder, 30 were learned. If you learned them as spam and then run sa-learn --dump magic, your nspam total will have gone up by 30.
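
When you train several folders in a row, it can help to pull the two numbers out of that summary line. Below is a rough sketch that does so. The function name learn_counts is made up for this example, and it assumes the summary line looks like the one above (the pattern allows for extra words between "Learned" and "from").

```sh
# Sketch: extract the learned and examined counts from sa-learn's summary line.
# Reads sa-learn output on stdin and prints one line per summary found.
learn_counts() {
    sed -n 's/^Learned .*from \([0-9][0-9]*\) message(s) (\([0-9][0-9]*\) message(s) examined).*/\1 \2/p' |
        awk '{ printf "learned=%d examined=%d not_learned=%d\n", $1, $2, $2 - $1 }'
}
```

For example: sa-learn --spam ~vpopmail/domains/domain.ext/user/Maildir/.Spam/new | learn_counts. A not_learned value above zero means some of the examined messages did not add anything new to the database.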

You need at least 200 hams and 200 spams before you can enable bayes autolearning. Once you have them, add the following lines to your local.cf:

# The line below needs to point to the bayes_path of the user that spamassassin runs as. In this case, the qscand home folder is /tmp
bayes_path /tmp/.spamassassin/bayes
use_bayes 1
bayes_auto_learn 1
bayes_file_mode 0770

The first line sets bayes_path, which tells bayes where to store its database. The next line enables bayes. The line after that enables autolearning, and the last line sets the permissions on the bayes database files to 0770 for security reasons.

Restart spamd and within a day or so you will see autolearn appear in your headers. I am not sure why it takes so long for it to come into the header part of the emails. It just does for some reason.
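
To see whether autolearn has started showing up, you can count the autolearn values in the headers of delivered mail. This is only a sketch. The function name autolearn_summary is made up, and it assumes the value appears as autolearn=... somewhere in the header section of each message (usually inside X-Spam-Status).

```sh
# Sketch: count autolearn results in the headers of messages in one folder.
# $1 is a directory of message files, e.g. .../Maildir/new
autolearn_summary() {
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue
        # Stop at the first blank line so the message body is never scanned,
        # and keep only the first autolearn= value found in the headers.
        sed -n -e '/^$/q' -e 's/.*autolearn=\([a-z_]*\).*/\1/p' "$msg" | head -n 1
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example: autolearn_summary ~vpopmail/domains/domain.ext/user/Maildir/new. It prints one line per value with its count. Messages that have no autolearn entry at all are simply not counted.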

Last Updated on Friday, 23 April 2010 19:19