Fedora Users — Re: deleteing e-Mail, quota and dovecot

On Mon, Nov 07, 2005 at 10:50:00AM -0600, Les Mikesell wrote:
> On Mon, 2005-11-07 at 09:05, Derek Martin wrote:
> > I'm not sure how dovecot works internally, but I know UW-IMAP makes
> > temporary copies of the whole mailbox when the user is deleting
> > stuff... 
[snip]
> With mbox format there isn't much choice but to copy the whole
> file to make any change.  

Now why would you say a thing like that?  ;-)

On Mon, Nov 07, 2005 at 12:17:25PM -0500, Tony Nelson wrote:
> Mbox is good for using less disk space and often faster searching.

Indeed.  Though some users argue that it is more convenient to search
maildir, because you can use tools like grep and such.

> However, an mbox is just a large file, and deleting anything from the
> middle (though not the end) /requires/ making a copy of the file without
> the deleted messages.  

OK, people keep chiming in with this misconception, so I gotta speak
up.  I'm sorry to say it, but this just isn't true...  It usually is
implemented this way for the sake of simplicity and a little added
security (in the sense of data assurance), but it is *not* required by
any means.

Mbox mailboxes can be re-written in place, which eliminates the need
to make copies of the entire mailbox.  Though complex to code, message
deletes can be done in place, which can save a great deal of time.

I can think of 2 ways to implement expunging deleted messages without
making a copy of the mailbox, and without changing the order of the
messages.  One method uses memory-mapped I/O (MMIO) and the other uses
stream I/O, but the basic algorithm is the same for both.  I have
implemented the MMIO version in the past...  I may even still have the
code around, if you're curious.

The basic gist is to overwrite the deleted message with data from the
next undeleted message, moving all the subsequent data down as you go.
The advantage to doing it this way is that it is faster than making a
copy of the entire mailbox, in the common case.  Most people have a
number of messages that they save in their mailbox(es), some of which
have large attachments, and most of the deleting happens at the end of
the mailbox.  Deleting messages in place means you don't need to
re-copy all that data that's going to hang around.  You only need to
move some data near the end of the file down over the deleted
messages, and then truncate the file.  This can be done with seek and write
operations, or it can be done with MMIO, by simply copying the memory.
MMIO should be faster, unless the host OS's implementation of MMIO is
broken.
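
Here is a minimal sketch of that idea in Python.  It is not the
implementation mentioned above, just an illustration under some
assumptions: the caller has already parsed the mbox and knows the byte
range of every deleted message, and the names expunge_stream,
expunge_mmap and _moves are made up for this example.  Locking and
crash recovery are left out entirely.

```python
import mmap
import os


def _moves(deleted, size):
    """Yield (src, dst, count) for each run of surviving bytes to slide down.

    `deleted` is a sorted list of non-overlapping (start, end) byte offsets,
    end exclusive, one entry per deleted message (an assumed representation).
    """
    dst = deleted[0][0]  # everything before the first deleted message stays put
    next_starts = [start for start, _ in deleted[1:]] + [size]
    for (_, src), stop in zip(deleted, next_starts):
        yield src, dst, stop - src
        dst += stop - src


def expunge_stream(path, deleted, chunk_size=64 * 1024):
    """Stream I/O variant: seek, read a chunk, seek back, write it lower down."""
    if not deleted:
        return
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        end = deleted[0][0]
        for src, dst, count in _moves(deleted, size):
            while count > 0:
                f.seek(src)
                chunk = f.read(min(chunk_size, count))
                if not chunk:
                    raise EOFError("mailbox is shorter than the offsets claim")
                f.seek(dst)
                f.write(chunk)
                src += len(chunk)
                dst += len(chunk)
                count -= len(chunk)
            end = dst
        f.truncate(end)  # drop the now-duplicated tail


def expunge_mmap(path, deleted):
    """Memory-mapped variant: the same moves, done as in-memory copies."""
    if not deleted:
        return
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        end = deleted[0][0]
        with mmap.mmap(f.fileno(), 0) as mm:
            for src, dst, count in _moves(deleted, size):
                if count:
                    mm.move(dst, src, count)  # overlap-safe, like memmove
                end = dst + count
            mm.flush()
        f.truncate(end)
```

Note that nothing before the first deleted message is ever read or
written, which is where the savings come from when the deletions are
near the end of the mailbox.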

The downside is that if the system crashes while you're doing this,
your mailbox is toast.  But that's par for the course with mbox
anyway.  And of course it's complex to code, so the programmer has to
be careful, or your mailboxes are toast.  :-D  Still, it's too bad
it's not implemented this way more often, cuz it is better, and makes
the "maildir deletes are faster than mbox" argument a lot less
compelling...

I can think of a third way to do it, if you don't care about
maintaining message order of the mailbox.  You can pull messages off
the end of the mailbox, writing them to a temporary mailbox, and
deleting them as you go.  This isn't as good as writing them in place
because you're still re-writing the whole mailbox, but at least you
never use more space than the amount of mail you want to keep.
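
A rough Python sketch of this third approach, again only an
illustration: the function name expunge_from_end and the
(start, end, keep) message list are assumptions made for the example,
and each kept message is read into memory whole for brevity.  The kept
messages end up in reverse order, which is why this only works if you
don't care about ordering.

```python
import os


def expunge_from_end(path, tmp_path, messages):
    """Peel messages off the end of `path`, saving the keepers to `tmp_path`.

    `messages` is a sorted list of (start, end, keep) tuples covering the
    whole file, end exclusive (an assumed representation).
    """
    with open(path, "r+b") as src, open(tmp_path, "wb") as dst:
        for start, end, keep in reversed(messages):
            if keep:
                src.seek(start)
                dst.write(src.read(end - start))
                dst.flush()
            # Shrink the original as we go, so the two files together
            # never hold more than the original mailbox did.
            src.truncate(start)
    os.replace(tmp_path, path)  # the temporary mailbox becomes the mailbox
```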

I like mbox.  For most things that I do, it's just plain faster than
maildir, and for the rest it mostly doesn't matter.  When I do use
maildir, it's usually because locking is broken in the given
environment, or some other stupid thing like that.  Certain back-up
software also breaks Mutt's new mail detection, so I occasionally use
maildir to work around that problem.

-- 
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D