Fedora Users — Re: spamassassin doesn't seem to be using bayes

From: "D. D. Brierton" <darren@xxxxxxxxxxx>

I'm using FC4 with spamassassin-3.0.4-1.fc4. fetchmail delivers mail to
a locally running postfix. spamd is running as a service, and spamc is
called by procmail on my mail. My setup is almost identical to that
described here:

http://wiki.apache.org/spamassassin/UsedViaProcmail

However, despite the fact that I have trained spamassassin on a vast
amount of both ham and spam using sa-learn, I suspect that Bayesian
testing is not being applied. I became suspicious that this might be the
case after receiving over a dozen almost identical messages and despite
training spamassassin on them they are still not being identified as
spam. So I started looking at the headers that spamassassin adds to each
message more closely. Here is the header it added to a recent message
from this list:

X-Spam-Status: No, score=0.0 required=5.0 tests=RCVD_BY_IP
autolearn=failed version=3.0.4

And here is an example of an incorrectly identified spam message:

X-Spam-Status: No, score=2.8 required=5.0 tests=HELO_DYNAMIC_IPADDR,
RCVD_BY_IP autolearn=no version=3.0.4

So we come back to this message and note that, with spamc invoked the
way he has it, the Bayes tests are indeed not being applied: no BAYES_
rule shows up in either tests= list.
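
A quick way to run the same check on any saved message is to look for a
BAYES_ rule in its X-Spam-Status header. The following is only a rough
sketch: the function name has_bayes_test is made up for illustration, it
assumes a single message on stdin, and it assumes (as far as I know this
holds for a working Bayes setup) that every scanned message hits one of
the BAYES_nn rules.

```shell
# has_bayes_test: hypothetical helper, reads ONE mail message on stdin.
# Exit status 0 if the X-Spam-Status header (including its folded
# continuation lines) names a BAYES_nn rule, 1 otherwise.
has_bayes_test() {
    awk '
        /^$/      { exit }            # blank line: end of headers, go to END
        /^[^ \t]/ { in_status = 0 }   # a new header field starts here
        tolower($0) ~ /^x-spam-status:/ { in_status = 1 }
        in_status && /BAYES_[0-9]+/     { found = 1 }
        END       { exit !found }
    '
}
```

Something like "has_bayes_test < saved-message && echo bayes ran" would
then report on one message at a time; saved-message is a placeholder for
wherever the message was saved.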

So what we need to know is which spamd options are in use and how he is
calling spamc.

Spamd in the setup I have gets called this way:
SPAMDOPTIONS="-d -c -m5 -Hi -A 192.168.0.,127. --max-conn-per-child=15"

The procmail recipe I have is rather complex. I use a lot of "full" and
"all" based rules in my user_prefs file. So I have installed a work-
around for a bug, presumably in perl itself, which these rules trigger
on a seemingly random basis. The spamassassin part of the .procmailrc
file looks like this:
===8<---
# Remove some spurious markups that some spams seem to include
:0
* ^X-Spam-Status:
{
   :0 fw
   | formail -R "X-Spam-Status:" "X-False-Spam-Status:"

   :0 fw
   | formail -A "X-Nasty: Aren't we?"
}

:0
* ^X-Spam-Level
{
   :0 fw
   | formail -R "X-Spam-Level" "X-False-Spam-Level"
}

# This one is important to remove. It is used for the PerMsgStatus.pm
# bug work around.
:0
* ^X-Spam-Checker-Version:
{
   :0 fw
   | formail -R "X-Spam-Checker-Version:" "X-False-Spam-Checker-Version:"
}

##############################################################################
# run spamassassin on things not from the spamassassin list
##############################################################################
:0
* < 250000
* !^List-Id: .*(spamassassin\.apache\.org)
{
  :0 fw: spamassassin.lock
  | /usr/bin/spamc -t 150 -u jdow
}

# Did we get a PerMsgStatus.pm bug hit? If so, spamc scanned the message
# but left no SA markups on it.
# So: is the message still missing the SA markup, is it smaller than
# 250k BYTES, and is it NOT from one of the spamassassin lists?
:0 fw
* !^X-Spam-Checker-Version:
* < 250000
* !^List-Id: .*(spamassassin\.apache\.org)
{
  # Rescan it with raw spamassassin slightly niced.
  :0 fw
  | nice -n 1 /usr/bin/spamassassin

  # For debugging mark the message clearly for easy sorting.
  # :0 fw
  # | formail -A "X-JdowMissed: SpamAssassin checks bombed first time."

  # Alternative subject marking for debugging.
  # :0 fw
  # | sed -e 's/^Subject:/Subject: [ZZ Missed]/'

  # Place a COPY of the message in sa_failed folder. Be nice to the
  # poor thing and review the folder from time to time. {^_-}
  :0c: clone1.lock
  $HOME/mail/sa_failed
}
# Bingo - we're done.
===>8---
Note that spamc is invoked with a LONG timeout, which is not usually
needed, and that I explicitly tell it which user to run as with "-u".
This is from MY home directory's .procmailrc file, of course. It could
probably be generalized for a system-wide /etc/procmailrc file quite
easily via "$USER". (I think I'll remove the "-t 150" timeout option. It
was needed on the old machine, a 66MHz Pentium. It's not needed on the
newer machine, a 1GHz Athlon. {^_-}) I suspect the "-u $USER" part is
what is missing from your procmailrc. Without it spamd may well be
looking at some other user's Bayes database, not the one you trained
with sa-learn.

{^_^}