Fedora Users — Re: For sale Brand New Juicy Couture Sidekick II for $120

From: "Nigel Henry" <cave.dnb@xxxxxxxxxx>

I can't say I'm too clued up on the finer points of spam filtering, but am willing to learn. Ideally spam should be stopped at source, but I don't suppose there's much chance of that happening.

I can give a data dump. But I am not at the level of expertise of the
BogoFilter authors or the SpamAssassin authors. There are some really
fine writeups of the filtering techniques in SpamAssassin that have
been done over the years. It has a state of the art Bayes filter and
it supports rules for things that indicate spam.

I think I mentioned this before. But there are some MUAs that will
accept a text/plain base64 encoding that is really a gif image and
display it. So an obvious "rule" comes to mind. GIF files have a
common lead-in, otherwise they cannot be decoded. So the first 6 or 8
BYTEs of the BASE64 are "obvious". If you can detect them anywhere in
a message as the start of a BASE64 coding section, not exceptionally
hard to do with SA, you can create a __RULE type rule that defines
this detection. Then you can set up another __RULE type rule that
detects the MIME type and fires if it is image/gif.
(The __RULE type rules will figure in META
rules but not the final score.) So we have __IS_GIF and __MIME_IS_GIF
for a pair of rules. We then create a META rule and give it a perhaps
hefty score: meta JD_MISPLACED_GIF (__IS_GIF && !__MIME_IS_GIF).
Bang, it's captured. There's no way a BAYES engine alone can do this
sort of trick. This is why I ended up dismissing Bayes-only filters
several years ago when I surveyed the situation.
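
For anyone who wants to poke at the idea outside of SA, here is a
rough Python sketch of the same check. It is not SpamAssassin rule
syntax and not how SA does it internally; the function name
misplaced_gif is made up for illustration, and it only looks at the
very start of each base64 part. The two prefixes are simply the
base64 encodings of the GIF87a and GIF89a signatures.

```python
import base64
from email import message_from_string

# Base64 encodings of the 6-byte GIF signatures ("GIF87a", "GIF89a").
# Six bytes encode to exactly eight base64 characters, no padding.
GIF_B64_PREFIXES = tuple(
    base64.b64encode(sig).decode("ascii") for sig in (b"GIF87a", b"GIF89a")
)


def misplaced_gif(raw_message: str) -> bool:
    """Hypothetical helper: True if a base64 part starts like a GIF
    but its declared MIME type is not image/gif."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue
        encoding = (part.get("Content-Transfer-Encoding") or "").strip().lower()
        if encoding != "base64":
            continue
        body = part.get_payload()  # still base64 text, not decoded
        is_gif = body.lstrip().startswith(GIF_B64_PREFIXES)      # like __IS_GIF
        mime_is_gif = part.get_content_type() == "image/gif"     # like __MIME_IS_GIF
        if is_gif and not mime_is_gif:
            return True
    return False
```

The last test mirrors the meta rule's (__IS_GIF && !__MIME_IS_GIF)
expression; a real rule set would also attach a score to the hit.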

Rules alone can be quite good. This is especially true since SA
allows plugin code modules and it allows block list testing WITH
SCORES. That latter is critically important. "SORBS" is quick, it
is also very dirty. So when I use its general list I give that only
a modest score well below the spam threshold. The SURBL lists are
slower reacting and much better with respect to false positives. I
use them with a higher score. A filter that uses only Bayes misses
out on rule based flexibility and these block lists. To be fair, if
you are using your own SMTP server to receive email this is somewhat
mooted if you can use greylisting with scores on the block lists.
That is often used to take load off the SA filters.
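
To make the "WITH SCORES" point concrete, here is a small Python
sketch of additive scoring. The rule names and score values are
invented for illustration and are not SA's shipped defaults; the 5.0
threshold matches SA's usual default required score, but treat that
as an assumption for your own setup.

```python
# Hypothetical rule names and scores, chosen only to illustrate the idea.
THRESHOLD = 5.0
SCORES = {
    "SORBS_GENERAL": 1.0,   # quick but dirty list: modest score
    "SURBL_HIT": 3.0,       # fewer false positives: higher score
    "MISPLACED_GIF": 4.0,   # a hefty score for a meta rule
}


def total_score(hits):
    """Sum the scores of every rule that fired; unknown rules count as 0."""
    return sum(SCORES.get(name, 0.0) for name in hits)


def is_spam(hits):
    """A message is flagged only when the summed score reaches the threshold."""
    return total_score(hits) >= THRESHOLD
```

With these numbers a block list hit on its own never condemns a
message; it only tips the balance when other rules also fire.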

One modest ISP, which features one of the SARE ninjas on its staff,
works ONLY based on rules. Others use global Bayes to make the
picture a little tighter, although this runs into the "one man's
'poison' is another man's gourmet soup" problem. (Think cilantro. For
some men it's intolerably bitter.) Very large ISPs may have to trim down
more than medium sized or smaller ISPs in this regard. The boutique,
small office, or home "ISP" has the easy luxury of per user Bayes.
If the users are "smart enough" to train the Bayes the whole system
can become quite well tuned and effective. If it is small enough and
there is time to toss in special whitelist or blacklist rules per
user you can really do well. (And I have a devastatingly accurate
rule that is based on a 'peculiarity' about Earthlink that chops out
one whole set of spam messages based on the TLD. But it requires a
custom rule per user.)
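
A toy Python sketch of why per user Bayes helps with the cilantro
problem. This is a plain naive Bayes with add-one smoothing, not
SA's actual Bayes implementation, and the class name UserBayes and
the sample messages are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class UserBayes:
    """Toy per-user naive Bayes over whitespace-separated tokens."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.tokens[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"])) or 1
        n_spam = sum(self.tokens["spam"].values())
        n_ham = sum(self.tokens["ham"].values())
        # Prior log-odds from message counts, add-one smoothed.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in text.lower().split():
            p_spam = (self.tokens["spam"][tok] + 1) / (n_spam + vocab)
            p_ham = (self.tokens["ham"][tok] + 1) / (n_ham + vocab)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


# One independent filter per user, created on first use.
filters = defaultdict(UserBayes)

# Alice likes the cilantro newsletter; Bob considers it spam.
filters["alice"].train("cilantro recipes newsletter", "ham")
filters["alice"].train("cheap pills offer", "spam")
filters["bob"].train("cilantro recipes newsletter", "spam")
filters["bob"].train("cheap pills offer", "ham")
```

The same message then scores differently for each user, which a
single global Bayes database cannot do.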

That's the small data dump. Wandering around the SpamAssassin wiki
might be interesting. Hm, I need to tweak some of the people who
have done nice writeups to place them in an easy to find place in
the wiki.

This is Dr. Curtis Kret's presentation from BH 04 (Slow to load):
http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-kret.pdf
It is an excellent introduction to the concepts of rule based
anti-spam. And with the SpamAssassin Rules Emporium ninjas on the
job you could consider it almost a Bayesian filter for complex
spam features.

These are his presentation slides from Toorcon 2004:
http://spamassassin.apache.org/presentations/2004-09-Toorcon/html
And this page on the wiki site has some excellent writeups about
SA's Bayes filter: http://wiki.apache.org/spamassassin/BayesAccuracy
The SpamAssassin users mailing list is also a world of help.

That's probably more dump than I should have done. But somebody asked.
It's almost an interesting game trying to win against these turkeys.
I just wish the ultimate anti-spam could be legislated - open hunting
season for convicted spammers.... Yeahhhhh!

{^_-}   Joanne