On Jun 26, 2010, at 10:36 AM, Tim wrote:

> On Fri, 2010-06-25 at 14:54 -0700, JD wrote:
>> I wonder how Google does it. only .01% of my google email is spam.
>> The spam folder contains tons of spam, and it is automatically purged
>> by google.
>
> When you're a large mail host you have one big advantage in spam
> killing: You will receive tons of identical messages, many addressed to
> bogus users, or honeypot addresses (addresses that you leak out,
> somehow, that aren't for real mail use). When you receive large numbers
> of identical messages, especially to non-real addresses, you know that
> they're spam, and you can mark every single one of them as being spam
> with 100% confidence. You don't need to check for false positives, as
> no real mail will be sent to such addresses. Whereas it is possible for
> lots of users to receive identical mail, if you have lots of people
> subscribed to some popular lists.
>
> I've done that (honeypotting) in the past, and it's a reliable
> technique. Unlike many other anti-spam techniques which falsely
> identify so many real messages as being spam that they make using them
> a waste of time (if you're having to keep on checking your spam box,
> manually, there's no point in running anti-spam software). Not to
> mention the problems caused when users have no idea that they must
> check for false detections, and simply never see some of their mail.
>
> This is harder to do on an individual level, because most of your spam
> messages are different from each other.

At the individual level, I keep telling myself I'm going to set up a
honeytrap, or maybe it should be called flypaper. Deliberately leak
trap addresses in places I tend to use my real addresses, auto-blacklist
anything that hits the trap addresses. Haven't got around to it yet.

> So you're left with trying to look for *similarities* to prior spam.
> Though it is possible to make use of other people's honeypotting data
> (the various anti-spam lists that you've seen discussed in other
> messages in this thread).
>
> --
> [tim@localhost ~]$ uname -r
> 2.6.27.25-78.2.56.fc9.i686
>
> Don't send private replies to my address, the mailbox is ignored. I
> read messages from the public lists.
>
> --
> users mailing list
> users@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe or change subscription options:
> https://admin.fedoraproject.org/mailman/listinfo/users
> Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
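
For what it's worth, the flypaper idea could be sketched roughly as
below. This is only an illustration under assumptions, not anything from
a working setup: the trap addresses, the Flypaper class and the
fingerprint() helper are all made-up names, and a real version would
have to hook into whatever delivers the mail. Anything sent to a trap
address is treated as spam, and both its sender and a hash of its body
are remembered so that later copies sent to real addresses get caught.

```python
import hashlib

# Hypothetical trap addresses; real ones would be leaked on purpose.
TRAP_ADDRESSES = {"flypaper@example.org", "honeytrap@example.org"}


def fingerprint(body: str) -> str:
    """Hash the body with case and whitespace normalised, so identical
    messages compare equal even if the line wrapping differs."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class Flypaper:
    """Assumed design: learn from trap hits, then match later mail."""

    def __init__(self, traps):
        self.traps = {t.lower() for t in traps}
        self.bad_senders = set()
        self.bad_fingerprints = set()

    def check(self, sender: str, recipient: str, body: str) -> bool:
        """Return True if the message should be treated as spam."""
        sender = sender.lower()
        fp = fingerprint(body)
        if recipient.lower() in self.traps:
            # No real mail goes to a trap address, so learn from it.
            self.bad_senders.add(sender)
            self.bad_fingerprints.add(fp)
            return True
        return sender in self.bad_senders or fp in self.bad_fingerprints


if __name__ == "__main__":
    paper = Flypaper(TRAP_ADDRESSES)
    # A hit on the trap address teaches the filter.
    print(paper.check("spammer@example.net", "flypaper@example.org",
                      "Buy   cheap pills NOW"))
    # Same body to a real address, different sender.
    print(paper.check("other@example.net", "me@example.org",
                      "buy cheap pills now"))
    # Unrelated mail from an unknown sender.
    print(paper.check("friend@example.com", "me@example.org",
                      "Lunch on Friday?"))
```

Sender addresses are easy to forge, so the sender blacklist is probably
the weaker half of this; matching on the body fingerprint is closer to
the "identical messages" trick described for large mail hosts.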