Fedora Users — Re: Spamassasin bribed?

Feed the spam to sa-learn and watch it evaporate after a while.

Of course, I have a long mantra about the RIGHT way to use SpamAssassin
that the authors do not agree with. So you might need to blow away
(erase) the Bayes databases and <sob> start over.
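
For reference, manual training and a full reset with sa-learn might look
like the sketch below. The function names, the SA_LEARN variable, and the
assumption that the mail folders are mbox files are illustrative and not
from the original post; --spam, --ham, --mbox, and --clear are standard
sa-learn options.

```shell
#!/bin/sh
# Sketch only: thin wrappers around sa-learn for hand training.
# SA_LEARN is a hypothetical override; set SA_LEARN="echo sa-learn"
# to preview the commands without touching the Bayes database.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Teach Bayes that every message in the given mbox file is spam.
learn_spam() {
    $SA_LEARN --spam --mbox "$1"
}

# Teach Bayes that every message in the given mbox file is ham.
learn_ham() {
    $SA_LEARN --ham --mbox "$1"
}

# Wipe the Bayes database so training can start from scratch.
reset_bayes() {
    $SA_LEARN --clear
}
```

Run as the mail user so the per-user database is the one that gets
trained, for example: learn_spam "$HOME/mail/spam"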

1) Use per-user Bayes. Your posting makes it sound like your install
  is for a single person or a family. Take advantage of this. Per-user
  Bayes has some serious advantages when one user's ham is another
  user's spam. Delete the auto-whitelist database, too.

2) Turn off auto-this-and-that. This means autolearn, autowhitelist,
  and all the other auto crap. SpamAssassin misfires on its ham and
  spam selection sometimes. Sometimes it misfires dramatically. The
  auto-crap codifies this into your database.
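
  A minimal sketch of what that can look like in local.cf, assuming
  SpamAssassin 3.x option names (check the Mail::SpamAssassin::Conf
  documentation for your version). The commented-out thresholds are
  illustrative values for anyone who keeps autolearn on but wants it
  stricter than the stock defaults; they are not from the original post.

```
# /etc/mail/spamassassin/local.cf
# Stop Bayes from training itself on SpamAssassin's own verdicts.
bayes_auto_learn 0
# Stop the auto-whitelist from adjusting scores by sender history.
use_auto_whitelist 0

# Only if autolearn must stay on: tighten the triggers instead.
# Illustrative values; the stock defaults are 0.1 and 12.0.
# bayes_auto_learn 1
# bayes_auto_learn_threshold_nonspam -1.0
# bayes_auto_learn_threshold_spam    15.0
```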

3) Make a decision whether you want to use per-user rules or not.
  There is a theoretical security risk and a practical SpamAssassin/Perl
  bug to deal with. I suspect the theoretical risk is just that,
  theoretical, as long as the folks with access to your SA machine are
  not kiddies intent on breaking it. I find per-user rules handy with
  this Earthlink account.
  A lot of spam comes to improperly formatted addresses, Earthlink.com
  and mailgate.earthlink.[com|net]. I have a rule for that. I also have
  a rule that looks for one of my email aliases explicitly in the To:
  list along with several other Earthlink.net addresses. That's a good
  spam sign for me.
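
  A per-user rule of the kind described above might be sketched like
  this in ~/.spamassassin/user_prefs. The rule name, the exact regular
  expression, and the score are hypothetical; the author's actual rules
  are not shown in the post.

```
# ~/.spamassassin/user_prefs
# Hypothetical rule: mail addressed to earthlink.com or to a
# mailgate.earthlink.com/.net host instead of a real earthlink.net address.
header   LOCAL_TO_BAD_ELNK  To =~ /\@(?:earthlink\.com|mailgate\.earthlink\.(?:com|net))\b/i
describe LOCAL_TO_BAD_ELNK  Sent to a malformed Earthlink address
score    LOCAL_TO_BAD_ELNK  2.0
```

  Start with a modest score and raise it only after watching what the
  rule actually hits.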

4) Visit the SpamAssassin Rules Emporium, SARE, at its home address
  http://www.rulesemporium.com/. Look over the rules selection. I go
  overboard here. I service two people with 4 or 5 accounts each. So
  I run about 40 sets of rules. I have a fast enough and big enough
  machine that this works nicely in the background. Pick the rules for
  what you need. RulesDuJour is the handy tool for grabbing updates
  to the rules. But they have not needed much updating recently.

5) Run the SURBL tests. SURBL is VERY conservative. Jeff is of the
  "No collateral damage" school of thought. His various facilities
  are almost perfect at not mis-tagging ham as spam.

6) Be sure to set up a handy manual training facility for Bayes. Bayes
  is one of your best friends. I run Dovecot for that here. (I rather
  prefer the old IMAPD "stuff" but....) I deliver POP3 for regular
  mail for the user's main Outlook Express "account". I use IMAP for
  the user's spam Outlook Express "account". I have four folders,
  ham, spam, oldham, and oldspam. Since Bayes needs both ham and spam
  samples for training, Loren and I feed it some batches of ham from
  time to time to keep ham and spam levels about even. Once I reached
  about a thousand of each, Bayes was working very
  well indeed. So I only feed Bayes now when some obvious spam with
  some content (not just a URL) sneaks through with a low Bayes score.
  Outside the LKML, which has some spam trickle through and has patches
  and bug reports that look like "Chicken Pox" spam (visit SARE), I am
  seeing maybe one in 1000 spams sneaking through and one in several
  thousand hams getting a low spam score. After you've been training
  a while you may want to empty "spam" into "oldspam" and "ham" into
  "oldham". Keep these messages around in case you need to blow away
  Bayes and retrain from scratch. It makes life easier. {^_-}
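
  To see whether ham and spam training levels are roughly even, the
  counts can be pulled out of "sa-learn --dump magic". The sketch below
  assumes the usual dump layout, where the nspam and nham lines carry
  the message count in the third column; bayes_counts is a hypothetical
  helper name.

```shell
#!/bin/sh
# Sketch only: summarize learned message counts from
# "sa-learn --dump magic" output supplied on stdin.
bayes_counts() {
    awk '$NF == "nspam" { spam = $3 }
         $NF == "nham"  { ham  = $3 }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Typical use (needs an existing Bayes database, so left commented out):
# sa-learn --dump magic | bayes_counts
```

  If one count runs far ahead of the other, feed a batch from the
  lagging folder.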

7) "Low spam score" is a piece of magic. Change your markup for the
  subject line to look like this (in /etc/mail/spamassassin/local.cf):
  rewrite_header Subject     *****SPAM***** _SCORE(00)_ **
  This gives you a three digit spam score with leading zeros that can
  be sorted VERY easily, even in Outlook Express, when you sort by
  subject. Low scores, which "might" be ham, float to the top. Check
  it a couple times a day and Bob's your uncle.

8) I mentioned a per user rules bug. If you set up for per user rules,
  "allow_user_rules 1" in local.cf, then you want to protect against
  the bug allowing random emails through totally unmarked. I use
  procmail here. This is ONLY needed with per user rules when you have
  a personal rule that scans "full" messages. So in my .procmailrc I
  include this mantra for running SpamAssassin:
===8<---
# First I rename forged (or prior) SpamAssassin markups.
:0
* ^X-Spam-Status:
{
   :0 fw
   | formail -R "X-Spam-Status:" "X-False-Spam-Status:"

   :0 fw
   | formail -A "X-Nasty: Aren't we?"
}

:0
* ^X-Spam-Level
{
   :0 fw
   | formail -R "X-Spam-Level" "X-False-Spam-Level"
}

:0
* ^X-Spam-Checker-Version:
{
   :0 fw
   | formail -R "X-Spam-Checker-Version:" "X-False-Spam-Checker-Version:"
}

# "The Tag"
:0 fw
| formail -A "X-Jdow: user $LOGNAME"

# Now the meat. I don't want SA to scan the SA mailing lists.
:0 fw
* ^List-Id: .*(dev\@spamassassin\.apache\.org|dev\.spamassassin\.apache\.org)
| formail -A "$PROCMAILMATCH SpamAssassin Dev list" -i "Reply-to:dev@xxxxxxxxxxxxxxxxxxxxxxx"


# Now I presume I should have marked with the X-Jdow tag. But I look for
# it anyway.
# (No filter flags here: this recipe only opens a nesting block.)
:0
* ^X-Jdow: user
{
  # So we did see the special tag. Did we ALSO see SpamAssassin markup?
  # And the renamed prior markup will not trip this rule.
  :0
  * !^X-Spam-Checker-Version:
  * < 250000
  * !^List-Id: .*(spamassassin\.apache\.org)
  {
     # We did not, this happens on about one in thirty emails at random.
     :0 fw
     | nice -n 1 /usr/bin/spamassassin

     # Let me know about it for monitoring purposes.
     :0 fw
     | formail -A "X-JdowMissed: SpamAssassin checks bombed first time."

     # REALLY let me know about it if I want to get a visibly clear
     # indication.
#      :0 fw
#      | sed -e 's/Subject:/Subject: [ZZ Missed]/'

     # Copy the failed mail to a special folder for diagnostics
#      :0c: clone1.lock
#      $HOME/mail/sa_failed
  }
}
===8<---
  I am sure some procmail expert out there can clean this up quite a
  bit. But it is basically what I am running with right now so I figured
  I'd pass it along.

9) If you REALLY want to get into SpamAssassin visit the mailing lists
  page at http://wiki.apache.org/spamassassin/MailingLists. The users list is
  probably the most useful. And as noted above it is best not to filter
  that list. This is one place on the Internet where spam is food rather
  than trash. It's used to make new rules so we never see it again. {^_-}

10) There is no tenth commandment^H^H^H^H^H^H^H^H^H^H^Hrule at this time.
   This is subject to change.

{^_^}   Joanne "My Bayes ALWAYS works" Dow, who has been reading the SA
       users list long enough to have noticed auto-stuff does not work
       as well as it might. (If you must use it change the trigger levels
       to make it a LOT more conservative.)