Feed the spam to sa-learn and watch it evaporate after a while.
Of course, I have a long mantra about the RIGHT way to use SpamAssassin
that the authors do not agree with. So you might need to blow away
(erase) the Bayes databases and <sob> start over.
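For the "blow away and start over" case, here is a minimal Python sketch that only builds the sa-learn command lines. --clear wipes the Bayes database, then --spam and --ham retrain from saved folders. The folder paths are hypothetical. It assumes each folder is a directory holding one message per file; an mbox file would also need --mbox. Run it as the user whose Bayes database you are rebuilding.

```python
import subprocess
from typing import List, Sequence


def retrain_commands(spam_dirs: Sequence[str], ham_dirs: Sequence[str]) -> List[List[str]]:
    """Build sa-learn calls that wipe Bayes, then retrain from saved folders."""
    commands = [["sa-learn", "--clear"]]  # erase the current Bayes database
    for folder in spam_dirs:
        commands.append(["sa-learn", "--spam", folder])  # learn these as spam
    for folder in ham_dirs:
        commands.append(["sa-learn", "--ham", folder])  # learn these as ham
    return commands


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical folder paths; print the plan instead of running it.
    plan = retrain_commands(["/home/user/mail/oldspam"], ["/home/user/mail/oldham"])
    for cmd in plan:
        print(" ".join(cmd))
```

Printing the plan first, and only then handing it to run_all, gives you a chance to check the paths before anything gets erased.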
1) Use per-user Bayes. Your posting makes it sound like your install
   is for a single person or a family. Take advantage of this. Per-user
   Bayes has some serious advantages when one user's ham is another
   user's spam. Delete the auto-whitelist database, too.
2) Turn off auto-this-and-that. This means autolearn, autowhitelist,
and all the other auto crap. SpamAssassin misfires on its ham and
spam selection sometimes. Sometimes it misfires dramatically. The
auto-crap codifies this into your database.
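A rough Python sketch for auditing local.cf (or a user_prefs file) for the auto settings above. It assumes the SpamAssassin 3.x option names bayes_auto_learn and use_auto_whitelist, both of which default to on. So a missing line means the feature is still enabled. It ignores include files and escaped "#" characters.

```python
from typing import Dict

# Assumption: SpamAssassin 3.x option names and defaults (both on by default).
DEFAULTS = {"bayes_auto_learn": "1", "use_auto_whitelist": "1"}


def auto_features(config_text: str) -> Dict[str, bool]:
    """Report which auto-* features a config file leaves switched on."""
    values = dict(DEFAULTS)
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: any '#' starts a comment
        parts = line.split()
        if len(parts) >= 2 and parts[0] in values:
            values[parts[0]] = parts[1]  # a later line overrides an earlier one
    return {name: value != "0" for name, value in values.items()}


if __name__ == "__main__":
    sample = "required_score 5\nbayes_auto_learn 0\n# use_auto_whitelist 0\n"
    for name, enabled in auto_features(sample).items():
        print(f"{name}: {'ON, consider turning it off' if enabled else 'off'}")
```

In the sample, the commented-out use_auto_whitelist line does not count, so the auto-whitelist is still reported as on.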
3) Decide whether you want to use per-user rules or not. There is a
   theoretical security risk and a practical SpamAssassin/Perl bug to
   deal with. I suspect the theoretical risk stays just that, theoretical,
   as long as the folks with access to your SA machine are not kiddies
   intent on breaking it. I find per-user rules handy with this Earthlink
   account.
   A lot of spam comes to improperly formatted addresses such as
   Earthlink.com and mailgate.earthlink.[com|net]. I have a rule for
   that. I also have a rule that looks for one of my email aliases
   explicitly in the To: list along with several other Earthlink.net
   addresses. That's a good spam sign for me.
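Here are the two Earthlink checks above, sketched as Python predicates over To:/Cc: header values. In SpamAssassin itself these would be header rules in your user_prefs, and this only illustrates the logic. The alias my-alias@earthlink.net and the threshold of three other addresses are hypothetical placeholders.

```python
import re
from email.utils import getaddresses
from typing import List, Sequence

# Hypothetical placeholders: use your own alias and threshold.
MY_ALIAS = "my-alias@earthlink.net"
MIN_OTHER_EARTHLINK = 3

# Domains the spam uses instead of the real earthlink.net mailbox domain.
BAD_DOMAIN = re.compile(r"@(earthlink\.com|mailgate\.earthlink\.(com|net))$")


def recipients(header_values: Sequence[str]) -> List[str]:
    """Lower-cased bare addresses from one or more To:/Cc: header values."""
    return [addr.lower() for _name, addr in getaddresses(list(header_values)) if addr]


def misformatted_recipient(header_values: Sequence[str]) -> bool:
    """True if any recipient uses one of the improperly formatted domains."""
    return any(BAD_DOMAIN.search(addr) for addr in recipients(header_values))


def alias_in_crowd(header_values: Sequence[str]) -> bool:
    """True when my alias appears alongside several other earthlink.net addresses."""
    addrs = recipients(header_values)
    if MY_ALIAS not in addrs:
        return False
    others = {a for a in addrs if a != MY_ALIAS and a.endswith("@earthlink.net")}
    return len(others) >= MIN_OTHER_EARTHLINK
```

Lower-casing every address first means "Earthlink.com" and "earthlink.com" hit the same pattern.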
4) Visit the SpamAssassin Rules Emporium, SARE, at its home address
http://www.rulesemporium.com/. Look over the rules selection. I go
overboard here. I service two people with 4 or 5 accounts each. So
I run about 40 sets of rules. I have a fast enough and big enough
machine that this works nicely in the background. Pick the rules for
what you need. RulesDuJour is the handy tool for grabbing updates
to the rules. But they have not needed much updating recently.
5) Run the SURBL tests. SURBL is VERY conservative. Jeff is of the
"No collateral damage" school of thought. His various facilities
are almost perfect at not mis-tagging ham as spam.
6) Be sure to set up a handy manual training facility for Bayes. Bayes
   is one of your best friends. I run Dovecot for that here. (I rather
   prefer the old IMAPD "stuff" but....) I deliver regular mail over POP3
   to the user's main Outlook Express "account". I use IMAP for the
   user's spam Outlook Express "account". I have four folders: ham, spam,
   oldham, and oldspam. Since Bayes needs both ham and spam samples for
   training, Loren and I feed it batches of ham from time to time to keep
   the ham and spam counts about even. Once I reached about a thousand of
   each, Bayes was working very
well indeed. So I only feed Bayes now when some obvious spam with
some content (not just a URL) sneaks through with a low Bayes score.
   Outside the LKML, where some spam trickles through and where patches
   and bug reports look like "Chicken Pox" spam (visit SARE), I am seeing
   maybe one in 1000 spams sneak through and one in several thousand hams
   get a low spam score. After you've been training a while you may want
   to empty "spam" into "oldspam" and "ham" into "oldham". Keep these
   messages around in case you need to blow away Bayes and retrain from
   scratch. It makes life easier. {^_-}
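A minimal Python sketch of the folder housekeeping above. It moves "spam" into "oldspam" (or "ham" into "oldham") and checks whether the ham and spam counts are roughly balanced. This assumes the Dovecot folders are stored as Maildir directories; if yours are mbox files, mailbox.mbox would be the place to start instead. The "thousand of each" target comes from the text, and the function names are made up for illustration.

```python
import mailbox
import os
import tempfile


def archive(src_path: str, dst_path: str) -> int:
    """Move every message from one Maildir folder to another; return how many moved."""
    src = mailbox.Maildir(src_path, create=False)
    dst = mailbox.Maildir(dst_path, create=True)
    moved = 0
    for key in list(src.keys()):  # snapshot keys before removing anything
        dst.add(src[key])  # copy first...
        src.remove(key)  # ...then delete, so a crash leaves a duplicate, not a loss
        moved += 1
    return moved


def ham_needed(ham: int, spam: int) -> int:
    """How many more ham samples would bring ham up to the spam count."""
    return max(0, spam - ham)


def enough_samples(ham: int, spam: int, target: int = 1000) -> bool:
    """True once both corpora reach the rough 'thousand of each' mark."""
    return min(ham, spam) >= target


if __name__ == "__main__":
    # Demo on throwaway folders so no real mail is touched.
    base = tempfile.mkdtemp()
    spam = mailbox.Maildir(os.path.join(base, "spam"), create=True)
    spam.add("Subject: cheap pills\n\nBuy now.\n")
    print(archive(os.path.join(base, "spam"), os.path.join(base, "oldspam")))  # prints 1
```

The copy-then-delete order means an interruption can leave a message in both folders. It will never leave a message in neither.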
7) "Low spam score" is a piece of magic. Change your subject line markup
   to look like this (in /etc/mail/spamassassin/local.cf):
rewrite_header Subject *****SPAM***** _SCORE(00)_ **
   This gives you a spam score padded with leading zeros to three digits
   before the decimal point, which sorts VERY easily, even in Outlook
   Express, when you sort by subject. Low scores, which "might" be ham,
   float to the top. Check it a couple of times a day and Bob's your uncle.
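Here is a small Python illustration of why the zero padding matters. A subject sort is a plain string sort, so "12.3" would land before "2.4" without padding. The sketch assumes _SCORE(00)_ pads the way the SpamAssassin documentation describes: three integer digits and one decimal place, so 2.4 becomes 002.4. pad_score and tag_subject are made-up helpers that only imitate the rewrite_header line above.

```python
from typing import List

TAG = "*****SPAM*****"


def pad_score(score: float) -> str:
    """Assumed _SCORE(00)_ output: three integer digits, one decimal place."""
    return f"{score:05.1f}"  # width 5 = 3 digits + '.' + 1 decimal


def tag_subject(subject: str, score: float) -> str:
    """Prefix a subject roughly the way the rewrite_header line would."""
    return f"{TAG} {pad_score(score)} ** {subject}"


def by_subject(subjects: List[str]) -> List[str]:
    """Plain string sort, roughly what a mail client's subject sort does."""
    return sorted(subjects)


if __name__ == "__main__":
    for line in by_subject([tag_subject("offer", s) for s in (12.3, 5.1, 101.7, 7.9)]):
        print(line)
```

Run as-is, it lists the 005.1 subject first and the 101.7 subject last, which is the numeric order.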
8) I mentioned a per user rules bug. If you set up for per user rules,
"allow_user_rules 1" in local.cf, then you want to protect against
the bug allowing random emails through totally unmarked. I use
procmail here. This is ONLY needed with per user rules when you have
a personal rule that scans "full" messages. So in my .procmailrc I
include this mantra for running SpamAssassin:
===8<---
# First I rename forged (or prior) SpamAssassin markups.
:0
* ^X-Spam-Status:
{
:0 fw
| formail -R "X-Spam-Status:" "X-False-Spam-Status:"
:0 fw
| formail -A "X-Nasty: Aren't we?"
}
:0
* ^X-Spam-Level
{
:0 fw
| formail -R "X-Spam-Level:" "X-False-Spam-Level:"
}
:0
* ^X-Spam-Checker-Version:
{
:0 fw
| formail -R "X-Spam-Checker-Version:" "X-False-Spam-Checker-Version:"
}
# "The Tag"
:0 fw
| formail -A "X-Jdow: user $LOGNAME"
# Now the meat. I don't want SA to scan the SA mailing lists.
:0 fw
* ^List-Id: .*(dev\@spamassassin\.apache\.org|dev.spamassassin\.apache\.org)
| formail -A "$PROCMAILMATCH SpamAssassin Dev list" -i "Reply-to: dev@xxxxxxxxxxxxxxxxxxxxxxx"
# Now I presume I should have marked with the X-Jdow tag. But I look for
# it anyway.
:0
* ^X-Jdow: user
{
# So we did see the special tag. Did we ALSO see SpamAssassin markup?
# And the renamed prior markup will not trip this rule.
:0
* !^X-Spam-Checker-Version:
* < 250000
* !^List-Id: .*(spamassassin\.apache\.org)
{
# We did not, this happens on about one in thirty emails at random.
:0 fw
| nice -n 1 /usr/bin/spamassassin
# Let me know about it for monitoring purposes.
:0 fw
| formail -A "X-JdowMissed: SpamAssassin checks bombed first time."
# REALLY let me know about it if I want to get a visibly clear
# indication.
# :0 fw
# | sed -e 's/Subject:/Subject: [ZZ Missed]/'
# Copy the failed mail to a special folder for diagnostics
# :0c: clone1.lock
# $HOME/mail/sa_failed
}
}
===8<---
   I am sure some procmail expert out there can clean this up quite a
   bit. But it is basically what I am running with right now, so I
   figured I would pass it along.
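For anyone who reads Python more easily than procmail, here is a rough translation of the same safety net. It is a sketch under assumptions, not a drop-in replacement. The scan argument stands in for the "nice -n 1 /usr/bin/spamassassin" pipe, and spamassassin_scan shows one way to call it. The header names and the 250000-byte cutoff simply mirror the recipes above.

```python
import email
import subprocess
from email.message import Message
from typing import Callable

PRIOR_MARKUP = ("X-Spam-Status", "X-Spam-Level", "X-Spam-Checker-Version")
MAX_SIZE = 250_000  # mirrors the "< 250000" size condition


def rename_prior_markup(msg: Message) -> None:
    """Rename existing (possibly forged) SpamAssassin headers to X-False-*."""
    for name in PRIOR_MARKUP:
        values = msg.get_all(name)
        if not values:
            continue
        del msg[name]  # removes every occurrence of the header
        for value in values:
            msg["X-False-" + name[len("X-"):]] = value
        if name == "X-Spam-Status":
            msg["X-Nasty"] = "Aren't we?"


def ensure_scanned(raw: str, scan: Callable[[str], str]) -> str:
    """Scan again and flag the message if the first pass left no markup."""
    msg = email.message_from_string(raw)
    needs_rescan = (
        msg.get("X-Jdow", "").startswith("user")
        and "X-Spam-Checker-Version" not in msg
        and len(raw.encode("utf-8")) < MAX_SIZE
        and "spamassassin.apache.org" not in msg.get("List-Id", "").lower()
    )
    if not needs_rescan:
        return raw
    msg = email.message_from_string(scan(raw))
    msg["X-JdowMissed"] = "SpamAssassin checks bombed first time."
    return msg.as_string()


def spamassassin_scan(raw: str) -> str:
    """Pipe the message through the spamassassin script, as the recipe does."""
    result = subprocess.run(["nice", "-n", "1", "/usr/bin/spamassassin"],
                            input=raw, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the scanner in as a function keeps the checking logic testable without a SpamAssassin install.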
9) If you REALLY want to get into SpamAssassin visit the users list at
http://wiki.apache.org/spamassassin/MailingLists. The users list is
probably the most useful. And as noted above it is best not to filter
that list. This is one place on the Internet where spam is food rather
than trash. It's used to make new rules so we never see it again. {^_-}
10) There is no tenth commandment^H^H^H^H^H^H^H^H^H^H^Hrule at this time.
This is subject to change.
{^_^} Joanne "My Bayes ALWAYS works" Dow, who has been reading the SA
users list long enough to have noticed auto-stuff does not work
as well as it might. (If you must use it change the trigger levels
to make it a LOT more conservative.)