Tim wrote (about SpamAssassin flagging .biz domains):

> Yes, in that message, though in practice the entire message was parsed.
> Any mention of one gets detected.
>
> My main beef with it has been not that it uses it as a "slightly higher
> rating", but that it flags it *as* spam, outright.  On the basis that an
> e-mail came from that domain, or mentions a URI inside it with it, it'd
> be flagged as BEING spam.  So, completely NON spam e-mail was getting
> dumped, carte blanche.  Often recipients were completely unaware of
> this.

I'm not sure which version of SpamAssassin you're talking about, nor yet
which rulesets are enabled.  But that doesn't sound right -- normally, a
message has to trigger a number of rules before SpamAssassin will mark
it as spam.  In the current 3.1.7:

    score    BIZ_TLD   1.719 1.169 2.035 2.013
    uri      BIZ_TLD   /\.biz(?::\d+)?(?:\/|$)/i
    describe BIZ_TLD   Contains an URL in the BIZ top-level domain

    score    INFO_TLD  1.373 0.813 1.457 1.273
    uri      INFO_TLD  /\.info(?::\d+)?(?:\/|$)/i
    describe INFO_TLD  Contains an URL in the INFO top-level domain

Which of the four scores applies depends on whether you've got "network"
tests (i.e. blacklists) and Bayesian testing turned on.  By default, a
message needs a total score of 5 before it is marked as spam, so a .biz
URI on its own is nowhere near enough.

SpamAssassin *doesn't* currently score on mail (allegedly) *from* a .biz
domain (I've got plenty, thanks, and there's no mention of SpamAssassin
scoring on it).

> Yet, real spam, loaded with a plethora of readily identifiable
> indicators gets a quit low probability rating, and still gets through to
> the inbox.

This is quite possible.  My experience with SpamAssassin is that the
extra, non-Bayesian rules are helpful as an adjunct to the Bayesian
rules, but that the default rules on their own miss a good proportion of
the spam.  Worse, SpamAssassin won't even turn Bayesian filtering on
until it has learnt 200 good and 200 spam e-mails.
But once it's been trained (and once the Bayesian filter has been given
boosted scores), it is an effective way of sorting out spam.
http://lwn.net/Articles/172491/ is a good review of various filters.

> The detection priorities are all out of whack.  Spamassassin's moronic
> defaults (e.g. the above mentioned behaviour), plus it's black & white
> listing databasing on "to" and "from" addresses, instead of the content
> of the message (e.g. white list a mailing list, and all spam gets
> approved that forges those addresses),

Um.  For one thing, SpamAssassin tries to "auto-whitelist" and
"auto-blacklist" on IP address and sender
(http://wiki.apache.org/spamassassin/AutoWhitelist).  This would
effectively stop spam from forging an existing correspondent's identity,
*if* SpamAssassin can identify the "right" IP to log.  (If an e-mail is
forwarded from one address to an ISP, through fetchmail and through
Postfix, each step adding a "Received" header, then SpamAssassin really
does need to be told which MTAs are trustworthy.  The trusted_networks
configuration option does this.)

For another, SpamAssassin shouldn't be auto-whitelisting mailing lists.
At least on the Fedora list, it "auto-whitelists" individual senders,
not the list.  Even then, all it does is note an average score per
(alleged) sender.  If a mail forges that sender's address and scores
highly, SpamAssassin will reduce the score somewhat, but if there are
enough "spam" signs, the score will still be enough to get the mail
marked as spam.

The detection priorities are periodically reassigned based on the
SpamAssassin databases of good and spam e-mails -- there is a method
behind it.  If you are finding that e-mail is being flagged as spam when
it shouldn't be, then http://wiki.apache.org/spamassassin/DoYouWantMyHam
will allow you to send the developers a copy, so the appropriate rules
can be tightened or have their scores lowered.

Hope this helps,

James.
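
P.S. In case the arithmetic helps, here is a rough sketch in Python of
how the quoted 3.1.7 rules add up against the default threshold of 5,
and of how an auto-whitelist average pulls a score towards a sender's
history.  It is an illustration only, not SpamAssassin's own code: the
function names are made up, the "at or above the threshold" comparison
and the choice of the fourth score (Bayes and network tests both on) are
my assumptions, and so is the factor of 0.5 used for the auto-whitelist
adjustment.

```python
import re

# Patterns and scores copied from the 3.1.7 rules quoted in this message.
# The four scores are assumed to be ordered: no Bayes/no network,
# no Bayes/network, Bayes/no network, Bayes/network.
RULES = {
    "BIZ_TLD": (re.compile(r"\.biz(?::\d+)?(?:\/|$)", re.I),
                (1.719, 1.169, 2.035, 2.013)),
    "INFO_TLD": (re.compile(r"\.info(?::\d+)?(?:\/|$)", re.I),
                 (1.373, 0.813, 1.457, 1.273)),
}

REQUIRED_SCORE = 5.0  # default threshold mentioned in this message


def score_uris(uris, score_set=3):
    """Hypothetical helper: total the scores of rules hit by any URI."""
    total = 0.0
    hits = []
    for name, (pattern, scores) in RULES.items():
        # A rule counts once per message, however many URIs match it.
        if any(pattern.search(uri) for uri in uris):
            hits.append(name)
            total += scores[score_set]
    return total, hits


def is_spam(total, required=REQUIRED_SCORE):
    """Assumed comparison: spam when the total reaches the threshold."""
    return total >= required


def awl_adjust(score, sender_mean, factor=0.5):
    """Hypothetical auto-whitelist step: move the score part of the way
    towards the sender's historical mean (factor 0.5 is an assumption)."""
    return score + (sender_mean - score) * factor


if __name__ == "__main__":
    total, hits = score_uris(["http://example.biz/offer"])
    print(hits, total, is_spam(total))
    # A forged mail scoring 12.0 from a sender whose mean is 1.0.
    print(awl_adjust(12.0, 1.0), is_spam(awl_adjust(12.0, 1.0)))
```

With these numbers a lone .biz URI contributes 2.013, and even a .biz
and a .info URI together stay under 5.  Likewise, the forged mail in the
last line is pulled down from 12.0 to 6.5 but is still over the
threshold, which is the behaviour described above.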
-- 
E-mail:     james@      | You can accept the existence of rain without denying the
aprilcottage.co.uk      | existence of umbrellas.
                        |     -- http://ozyandmillie.org/2006/om20060615.html