Fedora Users — Re: mail server send mail to yahoo bulk folder

Tim wrote (about SpamAssassin flagging .biz domains):
> Yes, in that message, though in practice the entire message was parsed.
> Any mention of one gets detected.
> 
> My main beef with it has been not that it uses it as a "slightly higher
> rating", but that it flags it *as* spam, outright.  On the basis that an
> e-mail came from that domain, or mentions a URI inside it with it, it'd
> be flagged as BEING spam.  So, completely NON spam e-mail was getting
> dumped, carte blanche.  Often recipients were completely unaware of
> this.

I'm not sure which version of SpamAssassin you're talking about, or
which rulesets are enabled. But that doesn't sound right -- normally, a
message has to trigger a number of rules before SpamAssassin will
mark it as spam.

In the current 3.1.7:

score BIZ_TLD 1.719 1.169 2.035 2.013
uri BIZ_TLD                     /\.biz(?::\d+)?(?:\/|$)/i
describe BIZ_TLD                Contains an URL in the BIZ top-level domain

score INFO_TLD 1.373 0.813 1.457 1.273
uri INFO_TLD                    /\.info(?::\d+)?(?:\/|$)/i
describe INFO_TLD               Contains an URL in the INFO top-level domain

Which of the four scores applies depends on whether you've got "network"
tests (i.e. blacklists) and Bayesian testing turned on: neither, network
only, Bayes only, or both.

By default, a message needs a total score of 5 before it is marked as spam.
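
To put numbers on that, here is a rough Python sketch (my own illustration,
not SpamAssassin code). The scores and patterns are copied from the rules
quoted above; the helper names score_set and tld_score are made up for the
example, and the column order follows the score-set numbering as I
understand it from the configuration documentation.

```python
import re

# Scores copied from the 3.1.7 rules quoted above. The four columns are the
# score sets: 0 = no Bayes, no network; 1 = network only; 2 = Bayes only;
# 3 = Bayes and network.
SCORES = {
    "BIZ_TLD": (1.719, 1.169, 2.035, 2.013),
    "INFO_TLD": (1.373, 0.813, 1.457, 1.273),
}

# The quoted uri patterns, translated to Python regex syntax.
URI_RULES = {
    "BIZ_TLD": re.compile(r"\.biz(?::\d+)?(?:/|$)", re.IGNORECASE),
    "INFO_TLD": re.compile(r"\.info(?::\d+)?(?:/|$)", re.IGNORECASE),
}

REQUIRED_SCORE = 5.0  # the default threshold


def score_set(use_bayes, use_network):
    """Return the score column (0-3) for the given configuration."""
    return (2 if use_bayes else 0) + (1 if use_network else 0)


def tld_score(uris, use_bayes=True, use_network=True):
    """Return (rules hit, summed score); each rule counts once per message."""
    column = score_set(use_bayes, use_network)
    hits = [name for name, rx in URI_RULES.items()
            if any(rx.search(u) for u in uris)]
    return hits, sum(SCORES[name][column] for name in hits)
```

With Bayes and network tests both on, tld_score(["http://example.biz/offer"])
hits only BIZ_TLD for 2.013 points, well short of REQUIRED_SCORE. Even a
message with both a .biz and a .info link stays under 5 on these two rules
alone.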

SpamAssassin *doesn't* currently score on mail (allegedly) *from* a .biz
domain (I've got plenty of it, thanks, and there's no sign of
SpamAssassin scoring on it).

> Yet, real spam, loaded with a plethora of readily identifiable
> indicators gets a quite low probability rating, and still gets through to
> the inbox.

This is quite possible.

My experience with SpamAssassin is that the extra, non-Bayesian rules
are helpful as an adjunct to the Bayesian rules, but that the default
rules don't catch a good proportion of the spam. Worse, SpamAssassin
won't even turn Bayesian filtering on until it has learnt 200 good and
200 spam e-mails.

But once it's been trained (and once the Bayesian filter has been given
boosted scores), it is an effective way of sorting out spam.

http://lwn.net/Articles/172491/ is a good review of various filters.

> The detection priorities are all out of whack.  Spamassassin's moronic
> defaults (e.g. the above mentioned behaviour), plus its black & white
> listing databasing on "to" and "from" addresses, instead of the content
> of the message (e.g. white list a mailing list, and all spam gets
> approved that forges those addresses),

Um.

For one thing, SpamAssassin tries to "auto-whitelist" and
"auto-blacklist" on IP address and sender
(http://wiki.apache.org/spamassassin/AutoWhitelist). This would
effectively stop spam from forging an existing correspondent's identity,
*if* SpamAssassin can identify the "right" IP to log. (If an e-mail is
forwarded from one address to an ISP, through fetchmail and through
Postfix, each step adding a "Received" header, then SpamAssassin really
does need to be told which MTAs are trustworthy. The trusted_networks
configuration option does this.)
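
A much-simplified Python sketch of why that matters (my own illustration;
the real thing parses the Received headers and has its own inference rules).
The function name first_untrusted_relay and the addresses are hypothetical:
I'm assuming the relay IPs have already been extracted, newest hop first.

```python
import ipaddress


def first_untrusted_relay(relay_ips, trusted_networks):
    """Return the first relay address outside every trusted network.

    relay_ips: addresses taken from the Received headers, newest hop first.
    trusted_networks: CIDR strings for the MTAs we trust.
    Returns None if every hop is trusted.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip  # trust ends here; this is the address to log
    return None
```

Take the hops 127.0.0.1 (local Postfix), 192.0.2.25 (the ISP's mail server),
then 203.0.113.7 (the actual sender). If only 127.0.0.0/8 is trusted, the
ISP's own server at 192.0.2.25 gets logged for every message. Add
192.0.2.0/24 to the trusted list and the sender's 203.0.113.7 is logged
instead.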

For another, SpamAssassin shouldn't be auto-whitelisting mailing lists.
At least on the Fedora list, it "auto-whitelists" individual senders,
not the list. Even then, all it does is note an average score per
(alleged) sender. If a mail forges that sender's address and scores
highly, SpamAssassin will reduce the score somewhat, but if there are
enough "spam" signs, the score will still be enough to get the mail
marked as spam.
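
As a sketch of the arithmetic (Python, my own illustration): as I understand
the auto-whitelist documentation, the final score is pulled part of the way
towards the sender's long-term mean, with a factor that defaults to 0.5. The
function name and the example numbers are made up.

```python
def awl_adjusted_score(current_score, sender_mean, factor=0.5):
    """Move the message's score part of the way towards the sender's mean.

    factor=0.0 leaves the score alone; factor=1.0 replaces it with the mean.
    The 0.5 default is an assumption taken from the documented
    auto_whitelist_factor setting.
    """
    return current_score + (sender_mean - current_score) * factor
```

So if a correspondent normally averages 0.5 and a forgery in their name
scores 15.0, awl_adjusted_score(15.0, 0.5) gives 7.75, still over the default
threshold of 5. A forgery that only scored 8.0 would come down to 4.25 and
get through, which is the "somewhat" in the paragraph above.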

The detection priorities are periodically reassigned based on the
SpamAssassin databases of good and spam e-mails -- there is a method
behind it.

If you are finding that e-mail is being flagged as spam when it
shouldn't be, then http://wiki.apache.org/spamassassin/DoYouWantMyHam
will allow you to send the developers a copy, so the appropriate rules
can be tightened or their scores lowered.

Hope this helps,

James.
-- 
E-mail:     james@ | You can accept the existence of rain without denying the
aprilcottage.co.uk | existence of umbrellas.
                   |     -- http://ozyandmillie.org/2006/om20060615.html