On Fri, 2010-06-25 at 14:54 -0700, JD wrote:
> I wonder how Google does it. only .01% of my google email is spam.
> The spam folder contains tons of spam, and it is automatically purged
> by google.

When you're a large mail host you have one big advantage in spam
killing: You will receive tons of identical messages, many addressed to
bogus users, or to honeypot addresses (addresses that you leak out,
somehow, that aren't for real mail use).

When you receive large numbers of identical messages addressed to
non-real addresses, you know that they're spam, and you can mark every
single copy as spam with 100% confidence. You don't need to check for
false positives, as no real mail will be sent to such addresses.
Identical copies alone don't prove anything, though: lots of users can
legitimately receive identical mail if you have lots of people
subscribed to some popular lists.

I've done that (honeypotting) in the past, and it's a reliable
technique, unlike many other anti-spam techniques, which falsely
identify so many real messages as spam that using them is a waste of
time (if you have to keep checking your spam box manually, there's no
point in running anti-spam software). Not to mention the problems
caused when users have no idea that they must check for false
detections, and simply never see some of their mail.

This is harder to do on an individual level, because most of your spam
messages are different from each other. So you're left with trying to
look for *similarities* to prior spam. It is possible, though, to make
use of other people's honeypotting data (the various anti-spam lists
that you've seen discussed in other messages in this thread).

-- 
[tim@localhost ~]$ uname -r
2.6.27.25-78.2.56.fc9.i686

Don't send private replies to my address, the mailbox is ignored. I
read messages from the public lists.
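
For illustration, the honeypot idea described above can be sketched in
Python. This is only a minimal sketch under stated assumptions, not how
Google or any particular mail host does it: the addresses, the
HoneypotFilter class, and the whitespace/case normalisation used to
decide that two messages are "identical" are all hypothetical choices
made for the example.

```python
import hashlib

# Hypothetical honeypot addresses: leaked on purpose, never used for real mail.
HONEYPOTS = {"trap1@example.com", "trap2@example.com"}


def fingerprint(body: str) -> str:
    """Hash the body after collapsing whitespace and case, so trivially
    reformatted copies of one message share a fingerprint (an assumption
    about what counts as "identical")."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class HoneypotFilter:
    """Remembers fingerprints of messages that were sent to honeypots."""

    def __init__(self, honeypots):
        self.honeypots = {address.lower() for address in honeypots}
        self.spam_fingerprints = set()

    def observe(self, recipient: str, body: str) -> None:
        # No real mail is ever sent to a honeypot, so anything that
        # arrives there can be recorded as spam without a false positive.
        if recipient.lower() in self.honeypots:
            self.spam_fingerprints.add(fingerprint(body))

    def is_spam(self, body: str) -> bool:
        # A message to a real user is flagged only if an identical copy
        # has already been seen at a honeypot address.
        return fingerprint(body) in self.spam_fingerprints


mail_filter = HoneypotFilter(HONEYPOTS)
mail_filter.observe("trap1@example.com", "Buy  cheap pills NOW")
mail_filter.observe("alice@example.com", "Minutes from Tuesday's meeting")

caught = mail_filter.is_spam("buy cheap pills now")
passed = mail_filter.is_spam("Minutes from Tuesday's meeting")
```

Note that the popular-list case is handled by construction: identical
mail sent only to real subscribers never reaches a honeypot, so it is
never fingerprinted as spam. Copies delivered before the honeypot copy
arrives are not flagged retroactively in this sketch.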
-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines