The Fight Against Spam, Part 2
Does it Work with Other Languages?
Mail is often criticized because the system it uses "only reads English." Nothing could be further from the truth. Mail accurately flags messages in other languages, too. The corpus on which it is pre-trained includes mail in different languages, and it can be trained on German or Japanese text just as well as on English -- thanks to a few other cool Apple technologies for tokenization that go beyond the scope of this article.
This Sounds Complex, Should I Disable it on my iBook?
Don't worry. Even though Junk Mail relies on very complex technologies, it's very efficient and easy on the computer, even on slower G3 laptops.
This is a good example of expanding capability without sacrificing performance, by writing good code.
An Introduction to Using "Junk Mail"
As soon as you launch Mail, the Junk Mail filter is turned on in "training mode." As long as training mode is on, Mail will display all the messages you receive in your inbox, including the junk. However, potential spams will be marked with cute, paper-bag icons and will appear in a disgustingly distinctive brown color, making spotting the unwanted messages easy.
If you notice a message that is incorrectly flagged as junk, simply open it and click on the "Not junk" button located at the top of the message in the brown banner. If you notice a message that should be marked as spam but isn't, select it and use the "Message" menu to "Mark it as junk mail." Alternatively, you can place a "Junk" button in your toolbar; simply use the "View" menu to customize it.
As soon as you mark a mail as Junk or Not Junk, the junk mail filter will fine-tune its analysis, learning what you consider to be junk and what it should let go through to your inbox. This simple-looking learning capability is actually what makes Mail amazing and very different from its competitors.
For most people, Viagra ads are spam and gardening-related messages are updates from their grandparents. But what if your grandparents like to talk about Viagra and you are being spammed by a gardening service? While most other programs won't be able to adapt to your situation, Mail will, and effortlessly.
Once you're satisfied with the accuracy of its analysis, you can switch it to "automatic" mode.

Figure 2. Mail's junk preferences.
As soon as automatic mode is turned on, any mail flagged as junk mail will be moved to a special Junk mailbox. Of course, you are still responsible for what happens to this mail. Should it be deleted? Kept for archiving? We'll see in a minute how to fine-tune this behavior.
Turning automatic mode on is a big step since it may prevent you from reading legitimate mail, especially if you don't check the Junk mailbox or you choose to delete your junk mail immediately. Although the number of false positives is extremely low (or, in most cases, zero), you may want to add a signature to your mail or a note to your web site, stating that you use anti-spam filtering technologies. You can also ask that your potential correspondents resend emails if they do not receive answers in a certain timeframe.
Fine Tuning and Automating "Junk Mail"
To customize the filtering, use the "Mail" menu to open the Mail preferences and click on the "Junk Mail" button. Switching between "training" and "automatic" mode is as simple as selecting the corresponding radio button. As soon as you enter "automatic" mode, you will see that Mail creates a new Junk mailbox with the same paper-bag icon. The remaining preferences are largely self-explanatory. However, here are a few notes about what they can do:
- Preventing messages that come from senders in your Address Book from being flagged as junk is probably a good choice. However, in some cases, you may not want to leave this feature on. Let's imagine that your aunt has your address but stores it on a virus-infected PC that sends your address to spammers. In that case, applying filtering rules to the emails she seems to send to you may be a good idea.
- The same applies to the Previous recipients. While this feature can usually be safely turned on, business users or users who deal with dozens of emails per day will probably want to have it off, to ensure maximum protection.
- The fact that a message is addressed using your full name is in no way a guarantee that it is legitimate. In fact, in my case, it is almost always a guarantee that it isn't. Everyone I know calls me "F.J.", and only spammers who got my name off a list use my real name.
The "Trust Junk Mail headers set by your Internet Service Provider" feature is great, but only as long as your provider uses standard junk-mail filtering options. Some ISPs use proprietary solutions that Mail doesn't know. If this is the case, you can create a special rule that scans the header your provider uses to rate junk messages and decides whether a message should be marked as junk or not -- a simple task that does not require any programming on your part.
Figure 3. Typical mail headers.
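The logic of such a header-based rule can be sketched outside Mail. What follows is a minimal illustration in Python, not Mail's own mechanism. It assumes that your provider marks junk with the header `X-Spam-Flag` set to `YES`, as SpamAssassin commonly does; other providers may use a different header, which you can find by viewing a message's full headers. The function name `provider_says_junk` is hypothetical.

```python
from email import message_from_string


def provider_says_junk(raw_message, header="X-Spam-Flag", junk_value="YES"):
    """Return True if the provider's header marks the message as junk.

    The default header name and value are assumptions; check the headers
    your own provider actually adds before relying on them.
    """
    msg = message_from_string(raw_message)
    value = msg.get(header, "")  # empty string when the header is absent
    return value.strip().upper() == junk_value.upper()


# A made-up message used only to exercise the function.
sample = (
    "From: someone@example.com\n"
    "To: you@example.com\n"
    "Subject: Hello\n"
    "X-Spam-Flag: YES\n"
    "\n"
    "Body text.\n"
)
print(provider_says_junk(sample))
```

In Mail itself, the equivalent is a rule whose condition tests that header and whose action moves or colors the message.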
However, when turning this feature on, you will want to take into account how reliable your mail provider's filters are. Some providers are known for setting up paranoid filters that block even legitimate mail, while others let everything through. Some of them now allow users to customize filters, a great step forward. In most cases, server-side junk-mail filtering features can be accessed through the provider's webmail interface, so it's worth having a look if you haven't checked in for a while. You may actually find other nice features there. For example, the .Mac webmail allows you to set up a custom mail icon visible to all Mail.app users.
The "Advanced" button is extremely interesting. Do you remember the old days when Junk Mail was listed in the "Rules" category? Well, this button allows you to see junk-mail settings as a rule. For example, you could set Mail up to run an AppleScript when you receive junk mail. What about getting the headers of the message so that you can send them to your IT department? Or your email provider?
On a less ambitious scale, you can use this rule to mark junk mail as read automatically -- to avoid seeing the "unread messages" notifications while sorting through your legitimate mail. You can also play a specific sound as a reminder to look through your Junk mailbox from time to time or, let's be crazy, switch the mail color from brown to purple.
What Should I Do with Spam Once It's Flagged?
We've seen that Mail.app will put flagged mail into a special mailbox called "Junk." However, your messages will stay there unless you specifically tell Mail what to do with them.
In order to do so, check the "Special Mailboxes" tab of your various account preferences. It contains a popup menu that allows you to specify what should be done to this mailbox.
Usually, storing junk messages on the server is a bad idea since it will increase the chances that your server will be cluttered and that your mailbox will reach full capacity, effectively bouncing legitimate messages back.
Deleting Junk messages when "Quitting Mail" would be my setting of choice since you probably don't want to keep them on your hard drive for too long. However, you should remember to check this mailbox for false positives before quitting Mail. Otherwise, they go unnoticed and may be deleted without you ever seeing them.
It sounds silly, but I suggest you use this opportunity to make sure Trash (in Mail.app) is set up well and that deleting messages from your various accounts does not simply move them to another folder on the server.
Since the trash setting is applied evenly to all of your accounts, you can set up separate rules to manage them individually if need be. For example, you may want to delete junk mail from your Home account automatically -- since your friends probably won't be too mad if you miss one of their healthy cooking tips. But you should purge your business account every week, so that you have a chance to scan it and avoid missing a potential customer.
Next Time
I'll wrap up this series on Friday with a closer look at techniques for applying rules, address masking, and some general tips to confound spammers. See you then!
FJ de Kermadec is an author, stylist and entrepreneur in Paris, France.
Showing messages 1 through 32 of 32.
-
Junk rules not being followed
2007-04-05 08:12:27 Bookie1 [Reply | View]
I transferred my files automatically from my old Mac to my new one. The rules for junk are the same, but the program keeps sending email from addresses in my address book to the junk folder. I have tried "adding the sender to my address book" as well as clicking on "not junk" and moving the message into my inbox for a period of time, but something is still amiss. The program still doesn't learn. I might add that it worked perfectly on my old computer. I am ready to erase my address book and start over, but I dread the thought of that! Any suggestions?
-
Apple's Mail Filter not savvy enough
2006-08-29 10:32:07 ixley [Reply | View]
I have had continued disappointments with Apple Mail's Junk Filter. Sometimes it works fine, but many times it does not. As much as I try to train it by marking junk mail, it does not seem to get the idea. The worst part is that it is not really that customizable. There is no threshold to set for junk status, and it won't even recognize my server-designated SpamAssassin junk mail when it comes in. I had to set up a rule to move all mail marked [SPAM] to the junk folder, but there isn't even a way to actually mark it as junk in the rule! Therefore, it still gets acknowledged as legitimate mail until I manually go in and mark off the bad guys. For those unfamiliar, you really do want to mark spam as junk mail instead of just deleting it, because otherwise Mail adds the senders to your OK list and is more likely to accept mail from these senders in the future! Bad form... Checking out the new features for the new version of Mail, I was pretty disappointed that Apple chooses to glitz up crappy features like user templates, without adding any real improvement to its functionality. It's like they think that user-friendly software means fewer options and more eye-candy features.
-
The problem...
2005-12-02 09:54:59 freelancer [Reply | View]
I've found Mail.app's spam detection to be quite good. The big problem with it, though, the show-stopper, is that it runs inside the mail client. I get well over a thousand spams per day, often approaching two thousand, and nothing inside the mail client is ever going to be practical. Aside from not being practical, it also ties you to one mail client.
I use SpamAssassin, plus some fancy mail server configuration to reject lots of stuff at SMTP time, and it catches almost everything. But I'd love to have Apple's stuff on the server side. I'd pay for it.
-
The problem...
2005-12-06 02:18:46 FJ de Kermadec [Reply | View]
Hi!
First of all, thanks for taking the time to share your ideas with us!
While I do not know whether Apple is in the market for developing standalone SPAM management solutions, it certainly is worth sending your suggestion to them through one of the forms on the website.
Truly yours,
FJ
-
this might help
2004-07-20 07:38:37 nat0 [Reply | View]
At first I was unwilling to give any control to a filter that I couldn't control myself, so instead I simply made rules to block domains that I knew were only used for spam. After a month or so this was blocking 95% of my spam.
But then the spammers started using domains only once, so I tried training the spam filter manually. This was tedious until I set "mark as junk" as one of the things that happened to msgs that got caught in my filter of rules (consisting of about 150 blocked domains that were still spamming me).
It took about 10 days of my specialized filter training Apple's spam filter for the accuracy to reach about 98%.
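The domain-blocking approach in the comment above can be sketched in Python. This is only an illustration of the idea, not how Mail's rules are implemented: the domains in `BLOCKED_DOMAINS` are placeholders, and the function names `sender_domain` and `should_mark_as_junk` are hypothetical.

```python
from email.utils import parseaddr

# Placeholder domains; a real list would hold the domains you have seen spam from.
BLOCKED_DOMAINS = {"spam.example", "offers.example"}


def sender_domain(from_header):
    """Extract the lowercase domain from a From: header value ('' if none)."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def should_mark_as_junk(from_header, blocked=BLOCKED_DOMAINS):
    """Return True when the sender's domain is on the blocklist."""
    return sender_domain(from_header) in blocked


print(should_mark_as_junk('"Great Deals" <win@spam.example>'))
print(should_mark_as_junk("friend@example.org"))
```

As the commenter notes, a list like this stops helping once spammers rotate domains, which is why pairing it with a "mark as junk" action to train the statistical filter matters.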
-
Unrelated text appended to spam
2004-05-28 20:24:54 verket [Reply | View]
I've noticed that some of my spam now has a paragraph or two of unrelated text at the end. It strikes me as an attempt to get past statistical systems by lowering the signal-to-noise ratio.
Should I mark these as junk as well? I don't want to start getting false positives because of the unrelated text.
Thanks! -
Unrelated text appended to spam
2004-06-17 09:10:08 FJ de Kermadec [Reply | View]
Hi!
Chances are that the additional text is not as carefully or as well written as a true letter. For example, 20 or so lines of text that read like this:
Pay your bills, mortgage payments for bill payments.
will probably not interfere with the normal filtering process.
In my opinion, marking these messages as junk would be worth trying.
Let me know if this helps!
F.J.
-
Apple Data Kit
2004-05-22 16:45:32 DMouse [Reply | View]
I have searched developer.apple.com looking for information on "Apple Data Kit", but all I get are references to IO layer doco and the like. Can someone point me in the direction of the apple doco on Apple Data Kit? It looks like a lot of fun... -
Apple Data Kit
2004-05-23 01:10:35 FJ de Kermadec [Reply | View]
Hi !
The "Apple Data Kit" is in fact a larger structure : this name points to the various kits available to developers to search, categorize and handle data.
You should however be able to find technical documentation related to the various elements that are available under this name. The newly created Reference Library would probably be the best place to start :
http://developer.apple.com/referencelibrary/
F.J. -
Help! No mention in Reference Library
2004-06-17 00:28:06 golem [Reply | View]
The Reference Library doesn't mention Latent Semantic Analysis / Indexing, Singular Value Decomposition, or cluster analysis.
Are these tools available as part of the publicly available Developers Kit?
I've been looking at applying cluster analysis for some time now, so I'd love to find any information about where to find the Apple tools. The GNU Scientific Library is close, but I'm especially interested in Apple's use of the Altivec processor for these vector calculations.
-
Applescript Bug?
2004-05-21 08:39:03 alex281 [Reply | View]
I love Mail.app's spam filter! I have set up an AppleScript that advances a web counter one "click" each time it runs, and I display the counter on my website http://www.alex281.com/#junk
My problem is that Mail.app only runs the AppleScript once each time it checks my mail, regardless of how many messages it sees as junk. Is this the expected behavior of AppleScripts in rules?
I can't take credit for the idea of a web spam counter, there was a hint on macosxhints.com that described how to set it up.
-
IP issues; could we see this in Mozilla/Thunderbird?
2004-05-19 14:13:41 Chirael [Reply | View]
This all sounds extremely cool. However, I have to wonder if we will--or ever could--see it in Mozilla or Thunderbird. What is the status of the intellectual property protection on this technology?
AFAIK, Mozilla/Thunderbird uses Bayesian filtering, which works OK but sounds like it might be able to be enhanced/replaced by this sort of thing. However, if Apple's got a patent on it, then I guess the open source community is out of luck (or will have to come up with something better :) -
IP issues; could we see this in Mozilla/Thunderbird?
2004-05-20 02:13:36 FJ de Kermadec [Reply | View]
Hi !
I am afraid that I cannot answer such a question... I could venture a guess, but not much more.
If you are part of an open source project, I would recommend that you contact Apple and ask about the status of this technology. They are probably the ones to ask directly when it comes to legal matters.
I am sorry I cannot be of more help...
F.J. -
IP issues; could we see this in Mozilla/Thunderbird?
2004-05-20 12:36:22 DFoesch [Reply | View]
I worked for a professor who is one of the leads in developing LSA.
A good site to check out is at CU Boulder: http://lsa.colorado.edu/
New Mexico State University (where I worked) cooperates with them in developing LSA.
Any available information on its usage should be available from there.
-
Marking Messages as Junk in Rules
2004-05-19 10:15:59 rekreisler [Reply | View]
I have been getting multiple spams with the same subject (and same or similar body), which are not getting flagged as junk, even after I have marked all the previous ones as junk. I've been filtering them directly into the Trash using rules, but I think I'm messing up my Junk mail filter by not marking them as junk before they're thrown away. I'm guessing that this is the case because the Junk filter seems to operate at around 75% effectiveness, which is much less than the generic Spam Assassin filter I have server side (I filter the Spam Assassin flagged mail into my junk folder and manually mark it as junk if mail.app didn't think it was junk). I've used the Junk feature for over a year, it is not in training, and I haven't touched the defaults, such as turning on accept messages with my full name. Anyone know if filtering into the trash messes up your junk filter? -
Marking Messages as Junk in Rules
2004-05-19 10:41:02 FJ de Kermadec [Reply | View]
Hi!
As a general rule, it is a good idea to train Mail's junk mail filter so that it can better recognize spam.
A solution may be to move your messages automatically to the Junk mailbox and to mark them yourself from time to time...
However, if you think that Mail's filter has stopped learning, this may be due to a corrupted file. You can try deleting the two files it relies on: this will reset it entirely and may take care of the issue you are experiencing.
These files are in ~/Library/Mail and are called:
- LSMMap2
- MessageSorting.plist
Note that your rules will be reset too.
Let me know if this helps !
F.J.
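The reset described in the reply above can also be done by moving the two files aside rather than deleting them, so that they can be put back if the reset does not help. The following is a minimal sketch in Python, assuming the file names and location given in the reply and that Mail has been quit first. The function name `reset_junk_filter` and the backup folder name in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path

# File names as given in the reply above.
FILTER_FILES = ("LSMMap2", "MessageSorting.plist")


def reset_junk_filter(mail_dir, backup_dir):
    """Move the junk filter's files from mail_dir into backup_dir.

    Returns the names of the files that were actually moved.
    Quit Mail before calling this.
    """
    mail_dir = Path(mail_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in FILTER_FILES:
        source = mail_dir / name
        if source.exists():
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return moved


# Example call (not run here, since it would touch your real Mail folder):
# reset_junk_filter(Path.home() / "Library" / "Mail",
#                   Path.home() / "JunkFilterBackup")
```

The same copy of the files could in principle be carried to another Mac, which is the "DIY" transfer suggested elsewhere in this thread, though nothing here confirms that Mail will accept them.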
-
Is mail relearning?
2004-05-19 06:28:36 Palmtop-Pro [Reply | View]
When settings are in "automatic" mode and a received mail is manually flagged as spam, does the program recognize that individual mail as such and learn?
Or is the spam simply moved to the trashbag?
That's a vital point / question - if the mail is only moved, that means one has to go through the learning process from the beginning, which is a pain.
-
Is mail relearning?
2004-05-19 06:32:58 FJ de Kermadec [Reply | View]
Hi !
Whenever you manually flag a message as "Junk" or "Not Junk", the Mail filters will learn from your command and better identify what you consider as SPAM.
There is no need to switch modes to take advantage of Mail's learning capabilities.
Let me know if this helps !
F.J. -
Is mail relearning?
2004-05-26 06:24:03 Palmtop-Pro [Reply | View]
Hi again!
I tried to mark repeating, similar incoming (sex) junk messages as such in automatic mode, and it seems that the filter does not learn and catch this trash. Are you really sure the filter is still learning and sharpening in automatic mode?
Thanks for enlightening me.
-
Is mail relearning?
2004-05-26 17:17:28 FJ de Kermadec [Reply | View]
Hi again !
You're very welcome ! :-)
Mail's Junk Mail filter should still learn in automatic mode. If you think that this is not the case on your Mac, switch back temporarily to training mode and see if you can train it more easily.
If that works, a corrupted cache file may have prevented it from doing its work properly. In that case, the switch should take care of it.
If it still does not work, you can create a rule for these messages. However, before doing so, it is worth checking what may be raising an issue with Mail -- for example, is their syntax extremely elaborate, or do they use a mix of languages?
Let me know if it helps !
F.J. -
Is mail relearning?
2004-05-19 06:42:21 Palmtop-Pro [Reply | View]
Thanks for updating us / me.
That means that in automatic mode, if mails are marked manually as junk, the filter gets sharper and sharper. I understand now.
Another one:
I have a pretty good filter on my PowerBook and want to transfer that very filter, i.e. its settings (or, as we have learned, its matrix table), to my other machines. Is this possible?
Even Mac press support Europe could not answer that one, though.
Boris
www.palmtop-pro.com
publishing editor -
Is mail relearning?
2004-05-19 06:54:55 FJ de Kermadec [Reply | View]
Hi again !
You're really welcome ! It's always a pleasure to talk with my fellow Mac users ! :-)
You may want to try to transfer these two files. This is a bit of DIY computing but it should help.
~/Library/Mail/MessageSorting.plist
~/Library/Mail/LSMMap2
Let me know if this helps !
F.J. -
visualisation of results
2004-05-19 06:22:36 drc11 [Reply | View]
"Slightly" off topic: the document searching facility is really useful, but having a simple list returned is perhaps not the most informative way of looking at the results. Does anyone know of a way of displaying the results as clusters, with the distance between points representing the similarity between documents?
This could be an interesting way of exploring related documents.
-
Usually works well
2004-05-18 23:57:07 ducasi [Reply | View]
I've found that the Junk filter works very well, often beating our Spam Assassin setup in identifying spam.
However, on my Mac OS 10.3 machine it has (or had) a problem. It never, ever flagged as spam messages which had an old email address of mine in the headers. I made sure this email address didn't appear anywhere in my address book, account settings, or "previous recipients". I tried resetting the system, rebuilding the whole of my Mail system. I tried everything. Nothing worked.
Eventually I killed the old email address, but I'd be curious to know what the problem was, and whether it has been fixed since Mac OS 10.3.2.
Cheers!
-
Usually works well
2004-05-19 06:37:29 FJ de Kermadec [Reply | View]
Hi !
While it is difficult to know what precisely happened on your installation without having a closer look at it, it is possible that one of the filter's cache files was corrupted, preventing it from working normally.
Usually, deleting the file allows Mail to work again perfectly. The "MessageSorting.plist" file is a good candidate, for example.
Let me know if this helps !
F.J.
-
98% isn't that good
2004-05-18 22:36:16 alderete [Reply | View]
My experience with Apple Mail's junk mail features hasn't been that good. The author suggests that it's better than Bayesian filtering, but my stats from SpamSieve indicate that it's 99.5% accurate on my real live mail, while the author says that Apple's Mail can theoretically get up to 98%.
Sounds like a small difference, but do the math. If you receive 100 messages a week (low volume), that's 5200 messages a year. Mail will make 100+ mistakes a year; SpamSieve will make 25 or so.
If, like me, you get several hundred messages a day, then the difference is even more noticeable.
And this assumes that Mail reaches 98%, which it's nowhere near for me. More like 90%. So now we're talking 500+ mistakes a year, on a low-volume e-mail account.
Your mileage may vary, but if you get a lot of spam, you should definitely try SpamSieve, which works with Mail, Eudora, Mailsmith, and most other e-mail clients for Mac OS X. -
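The arithmetic in the comment above can be checked with a few lines of Python. This sketch assumes, as the comment does, 100 messages a week over a 52-week year; the function name `mistakes_per_year` is made up for illustration.

```python
def mistakes_per_year(messages_per_week, accuracy):
    """Expected number of misclassified messages in a 52-week year."""
    messages_per_year = messages_per_week * 52
    return messages_per_year * (1 - accuracy)


# The three accuracy figures quoted in the comment.
for accuracy in (0.995, 0.98, 0.90):
    mistakes = mistakes_per_year(100, accuracy)
    print(f"{accuracy:.1%} accurate: about {mistakes:.0f} mistakes a year")
```

With these inputs the three accuracy levels work out to about 26, 104, and 520 mistakes a year, in line with the commenter's rough figures of "25 or so", "100+", and "500+".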
98% isn't that good.. not at all
2004-05-25 19:43:26 mkarver [Reply | View]
I used to just rely on desktop-based spam filters, but it was too much work to update all the computers on the network when new versions came out. So I signed up for a server-based filter with Sentinare Spam Filtering. Greater than 99.6% is awesome and all server-side, so nothing to download, install, and maintain. I was gonna use Postini but I heard they are even worse than iMail at 85% effectiveness and they have a $250 monthly minimum, so Sentinare was a way better deal at less than $3 a mailbox. I have had zero false positives thus far and am really happy with the speed, service & support. -
98% isn't that good
2004-05-19 11:20:55 hondo77 [Reply | View]
In my experience, ditto.
Mail.app started off okay for me because some filtering was better than none and, as the article mentions, it's really easy to set up. However, good enough soon isn't.
I got SpamSieve a few months ago and after a single day of training it was catching much more spam than Mail.app. On my work account, which gets far fewer emails than my home account, SpamSieve not only caught all the spam but it didn't generate any of the false positives that Mail.app was always doing (something about email from Slashdot really rubbed Mail.app the wrong way no matter how many times I marked them as not junk). -
98% isn't that good
2004-05-19 06:40:36 FJ de Kermadec [Reply | View]
Hi !
I am sorry to hear that you have been experiencing issues with Mail's Junk Mail feature.
Of course, every SPAM-fighting solution has its strengths and weaknesses. However, you should get far better results from Mail.app than what you are describing...
Have you tried resetting Mail by deleting its message sorting cache and preferences files ? Sometimes, a corrupted file prevents Mail from working properly and reaching full accuracy.
F.J.





