view TODO @ 568:f2aa58c2afd0 HEAD

SEARCH CHARSET support. Currently we do it through iconv() and only ASCII characters are compared case-insensitively.
author Timo Sirainen <tss@iki.fi>
date Sun, 03 Nov 2002 10:39:43 +0200
parents fe8a014a479e
children e5ff7ddeb895

 - bugs
    - fix update_by_replace (.data file updating is broken now)
    - RENAME: If the name has inferior hierarchical names, then the inferior
	      hierarchical names MUST also be renamed (ie. foo -> bar renames
	      also foo/bar -> bar/bar). (and RENAME INBOX!)
    - passwd-file doesn't notice changes in the file
    - tree has some locking issues while opening it
    - maildir: if mail file isn't found, it may be because it was renamed
      (flag changed). we must then sync the directory and see again if the mail
      is found
    - mail-lockdir.c isn't 100% safe.. stale locks are detected by checking
      that hard link count is 1, then it's unlink()ed. but what if another
      process did the same unlink() + creat() in the middle of our
      stat()..unlink()?
    - SEARCH FROM/TO/CC/BCC now generates the field from ENVELOPE which it
      uses for matching. This however gives different results than when
      matching from headers.
    - mbox: what if 1 msg is deleted, is X-IMAPbase rewritten?
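
The RENAME item above can be sketched roughly as follows. This is only an
illustration over a plain list of mailbox names: rename_hierarchy() and the
"/" separator default are assumptions made for the example, not existing
Dovecot code, and the RENAME INBOX special case is not handled.

```python
def rename_hierarchy(mailboxes, old, new, sep="/"):
    """Return the mailbox list after renaming `old` and everything below it."""
    prefix = old + sep
    renamed = []
    for name in mailboxes:
        if name == old:
            renamed.append(new)
        elif name.startswith(prefix):
            # Inferior name: keep the part below `old`, swap only the parent.
            renamed.append(new + sep + name[len(prefix):])
        else:
            # Unrelated mailbox, including ones that merely share a prefix
            # string such as "foobar".
            renamed.append(name)
    return renamed


print(rename_hierarchy(["foo", "foo/bar", "foobar"], "foo", "bar"))
```

Matching on old + separator rather than on the bare name is what keeps
"foobar" from being renamed along with "foo".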

 - reliability fixes:
    - if we deleted mail but didn't write modify log, other dovecots don't
      handle it properly. they either assert at index-sync.c:42 or if new
      mails have also been added since, they don't notice it at all.
      actually, that breaks reads as well since we get expunges only from
      the old file.. also check that deleting the file gives an
      "inconsistency error"
    - if imap process notices that both modify logs are getting full because
      its client isn't syncing, the client should be disconnected
    - if opening indexes fails because we timeout while trying to lock it, we
      recreate the indexes. that's not a very good idea .. also does it do that
      even if .customflags can't be locked? it's not really related to
      indexes..

 - checks:
   - if we have entries in modifylog with UID 10..11, 9..12, 8..13 etc.
     do they work correctly?
   - check that search message-id always works properly
   - check that search's OR and () work properly
   - Should SEARCH SENT* apply timezone?
   - make sure SELECT rebuilds index properly when next_uid is near 32bit value
   - make sure connection limits work

 - enhancements:
    - "* NO Mailbox is locked, will override in xx seconds"
    - UW-IMAP doesn't send its fields to client: X-IMAPbase, Status, X-Status,
      X-Keywords, X-UID.. should we? probably just makes things more difficult
    - search: support having longer keywords than buffer block
    - option: use mmap() vs. read() to access mails. read() seems to be a bit
      faster with linux/x86, and better through NFS since it doesn't read
      data uselessly.
    - when fetching body/envelope/etc we could try to cache it immediately if
      we can get lock with try_lock.
    - optionally use only in-memory indexes
    - maildir could support also the dirty-flag in messages. files would be
      renamed "whenever there's time" (that'd require the indexer program, or
      forking and doing it in background)
    - optionally keep the message file name as its UID. Then we don't have to
      save the filename anywhere.
    - send EXISTS immediately after new mail arrives.
        - linux: we can use dnotify for maildir (but not mbox I think)
	- *bsd: kqueue() can notify changes in mbox and maildir
	- rest: stat() mbox file and maildir directory once in a while.
	  maybe configurable how often if at all, not nice with NFS.
	- we could sync flag changes at least if there's no expunges
	- don't send it more often than once in 30 secs or so
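
A minimal sketch of the "not more often than once in 30 secs" rule for
EXISTS notifications. ExistsThrottle and its method names are made up for
the example; the clock is passed in explicitly so the logic stays
independent of how the change was detected (dnotify, kqueue() or stat()).

```python
class ExistsThrottle:
    """Decide when a pending new-mail notification may be sent."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval  # seconds between notifications
        self.last_sent = None             # time of the last notification
        self.pending = False              # change seen but not yet reported

    def mailbox_changed(self):
        # Called by whatever detected the change.
        self.pending = True

    def should_send(self, now):
        """Return True if EXISTS should be sent at time `now`."""
        if not self.pending:
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False  # too soon, keep the change pending
        self.last_sent = now
        self.pending = False
        return True
```

A change that arrives too early is not dropped, it stays pending and is
reported by the first should_send() call after the interval has passed.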

 - allow index files to be in completely separate location than mail data.
   mails could be read through slow NFS access but indexes from fast local
   disk. with this thinking it makes more sense to create larger index files
   to save for example mail headers. also index rebuilding should be very
   light operation, the indexes would be filled while the data is being
   accessed by the imap client. of course all this should be optional so
   we don't slow down when mails and indexes are stored in same disk.

 - we need permanent storage for UIDs. with mbox use X-UID like UW-IMAP,
   with maildir a) file:2,flags,Uuid b) file,U=uid:2,flags. uid validity
   would be in .uidvalidity file. the b-case would require that to be done
   by the client moving it from new/ to cur/
     - other possible maildir flags to use in filename: S=size (file size,
       for maildir++ quota), W=size (rfc822.size by some uw-imap patch)
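
To make the b) layout (file,U=uid:2,flags) more concrete, here is a rough
parser sketch. parse_maildir_name() is a hypothetical helper and the sample
file name in the usage line is invented; it assumes the unique base name
contains no commas and that all K=value fields are decimal numbers.

```python
def parse_maildir_name(filename):
    """Split 'base,K=v,...:2,FLAGS' into (base, fields, flags)."""
    # Everything after ":2," is the flag letters (empty if there is no info part).
    name, _, info = filename.partition(":2,")
    base, *pairs = name.split(",")
    fields = {}
    for pair in pairs:
        key, eq, value = pair.partition("=")
        if eq and value.isdigit():
            fields[key] = int(value)  # e.g. U=uid, S=file size, W=rfc822.size
    return base, fields, set(info)


base, fields, flags = parse_maildir_name("1036312783.1234.host,U=42,S=2048:2,FS")
print(base, fields, sorted(flags))
```

Unknown or malformed fields are silently skipped here, a real
implementation would have to decide whether to preserve them on rename.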

index:
 - mbox:
     - if someone tries to open a file that isn't a valid mbox, say it in
       one line in error log, not 6..
     - empty lines at beginning of file still aren't ignored
     - UW-IMAPd writes empty spaces after X-Keywords which it uses so that
       it doesn't have to rewrite the whole file if status flags changed
       in the beginning of it. We could do that too.
     - we need either From-line escaping or writing Content-Length when saving
       mails.
 - read-only support for mailboxes where we don't have write-access
 - we should try to avoid completely rebuilding indexes unless they're
   corrupted. especially if we later want to support some read-only boxes
   and keep the mail flags only in index file. fsck() could verify that
   records are ok, and that if data file isn't ok the record is deleted.
 - if .customflags is removed and Maildir files have custom flags, add
   "unknown1" "unknown2" etc. flags to .customflags file for each found flag
 - debug: index could be read-only mmaped when it's not locked. 
 - when index is being rebuilt, it always complains about tree/modifylog
   having wrong indexid..
 - we sometimes leave some space in the index files (memory alignment,
   extra_space). we should keep those bytes zeroed to make sure nothing
   sensitive is left there.
 - log transferred amount of bytes. just a bit problematic who logs it, since
   imap-login does SSL transfers but not unencrypted.. could also log SSL
   settings (especially compression).
 - if we wanted to support huge mailboxes with small memory usage, it'd now
   be possible if we just instead of mmap()ing the whole index files would
   have maybe 3-4 256k mmap()ed areas which we move based on the need.
     - should work fine with .imap.index and .imap.index.data
     - log files aren't affected by mailbox size
     - if the tree file also kept constantly moving the nodes so that
       tree's root was at the beginning of the file, we could use this mmap
       caching with it too
     - but, is it worth the trouble really? the OS can do all this itself,
       only thing we're doing is keeping the process's virtual memory usage
       small.
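
The moving-window idea above, sketched without any real mmap() calls.
WindowCache only tracks which 256k-aligned windows would be mapped and
evicts the least recently used one; the class, its names and the LRU policy
are assumptions for illustration.

```python
WINDOW_SIZE = 256 * 1024  # 256k, a multiple of common page sizes


def window_start(offset, window_size=WINDOW_SIZE):
    """Return the aligned start of the window that contains `offset`."""
    return offset - (offset % window_size)


class WindowCache:
    """Track a few mapped windows, least recently used first."""

    def __init__(self, max_windows=4, window_size=WINDOW_SIZE):
        self.max_windows = max_windows
        self.window_size = window_size
        self.starts = []  # window start offsets, most recently used last

    def touch(self, offset):
        """Return (window start, hit) for an access at `offset`."""
        start = window_start(offset, self.window_size)
        hit = start in self.starts
        if hit:
            self.starts.remove(start)
        elif len(self.starts) == self.max_windows:
            self.starts.pop(0)  # a real version would munmap() this window
        self.starts.append(start)  # a real version would mmap() on a miss
        return start, hit
```

Whether this beats simply mapping the whole file is exactly the open
question in the last item above.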

lib-storage:
 - support multiple mailbox formats and locations for one user. that would
   require support for multiple MailStorages, and since we're chroot()ed,
   usually the only way to communicate with others would be to create
   RemoteMailStorage which would use TCP/UNIX sockets to connect to another 
   imap session.
 - SEARCH:
     - CHARSET support, iconv()? also means we need to parse the charset stuff
       in headers.
     - could optionally support scanning inside file attachments and use
       plugins to extract text out of them (word, excel, pdf, etc. etc.)
     - use a trie index for fast text searching, like cyrus squat?
 - DELETE/RENAME: when someone else had the mailbox open, we should
   disconnect it (when stat() fails with ENOENT while syncing)
 - RENAME INBOX isn't atomic with Maildir. And in general, RENAME can't
   move mails between different storages. Maybe also support doing it
   with COPY + delete once COPY is atomic?
 - maildir: atomic COPY could be done by having transaction directories.
   Make a "tra" directory at the same level as cur/new/tmp, and make it
   have subdirectories in the same way as tmp has temp files. Directory
   begins with a "." as long as transaction isn't finished, rename()ing
   it away finishes it. All mails under finished dirs must be moved into
   new/ directory and the directory removed by any process who notices them.
 - we should probably do some light checking that appended mails actually
   look like valid rfc822 mails..
 - maybe limit the length of custom flags? we don't really have a problem
   with them, but with mbox a long X-IMAPbase could break something.. Maybe
   configurable, default to 50 chars?
 - we could send flag changes after all commands by making expunge/flags sync
   counters separate for modify log. flags would need to update the seq
   though, too slow?
 - things calling message_send() could verify that it wrote enough data.
   if not, fill the rest with spaces and return failure. -1 = error,
   0 = filled, 1 = ok.
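
The -1 / 0 / 1 convention for message_send() callers could look something
like this. send_padded() is a hypothetical stand-in working on Python file
objects, not the real function, and it assumes a single read() returns
everything that is available (true for buffered files and BytesIO).

```python
import io


def send_padded(source, out, expected_size):
    """Copy expected_size bytes; return -1 = error, 0 = filled, 1 = ok."""
    try:
        data = source.read(expected_size)
        out.write(data)
        missing = expected_size - len(data)
        if missing > 0:
            # Message was shorter than promised: pad with spaces so the
            # amount already announced to the client still matches.
            out.write(b" " * missing)
            return 0
        return 1
    except OSError:
        return -1


out = io.BytesIO()
print(send_padded(io.BytesIO(b"abc"), out, 5), out.getvalue())
```

Padding keeps the protocol stream in sync after the size has been sent,
while the 0 return value still lets the caller log or report the failure.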

general:
 - sieve (rfc3028)
 - rfc2231 continuation support

 - ulimit / setrlimit() should be set somewhere for imap process. and maybe
   also separate limits for data stack and mem pools
 - create indexer binary
 - SIGHUPing master should reload the configuration .. killing imap-auth and
   imap-login processes? or just signal imap-login to stop accepting new
   connections and let it kill itself
 - setting for choosing mbox locking methods
 - imap-login leaks I/O descriptors when killed, that's because the SSL
   fds are destroyed lazily.. should we bother fixing..?
 - logins are always sent now using syslog(), we'd need to have i_info()
   or something so they could also be written to log files.. also make it
   possible to log into different log than errors.
 - should we bother checking if there's invalid 8bit headers in
   BODY/BODYSTRUCTURE output and converting them to quoted printable? well,
   several of them are now but not all..
 - update docs/index.txt
 - support Maildir++ quota
 - maybe give more untagged NO/ALERT replies? like when mailbox is in
   inconsistent state. and when UIDs are reordered because they're too large.
 - *_strdup_printf() functions could use C99 compatible vsnprintf() instead of
   printf_string_upper_bound().
 - imap/ and lib-imap/ should allow infinite number of custom flags, it's
   storage's problem if it can't handle too many of them.

auth / login:
 - kchuid, SRP, anonymous SASL
 - PAM: support some options so /etc/passwd-lookup isn't needed. uid=x, gid=y,
   mailroot=/var/mail. maildirs should be then created when needed
 - Digest-MD5: support integrity protection, and maybe crypting. Do it
   through imap-login like SSL is done?
 - imap-auth should limit how fast authentication requests are allowed from
   login processes. especially if there's one login/connection the speed
   should be something like once/sec. also limit how fast to accept new
   connections.
 - Diffie Hellman parameters should be regenerated once in a while
 - HIGH: support executing each login in its own process, so if an exploit
   is ever found from it, the attacker can't see other users' passwords.
     - master should limit number of login processes to max_logging_users,
       killing old processes when limit is reached
     - master should try to keep login_processes_count extra processes all
       the time
     - login should notify master after it accept()s, and it must close the
       listening socket immediately
     - Diffie Hellman parameters for SSL need to be somehow transferred
       between login processes. It's too slow if they're generated every time,
       and I'd rather not link SSL libs to imap-master.
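
For the imap-auth rate limiting item earlier in this section, a minimal
sketch of "once/sec per login process". AuthRateLimiter and the login_id
key are invented for the example, and the time is passed in by the caller.

```python
class AuthRateLimiter:
    """Allow at most one auth request per min_interval for each login process."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self.last_request = {}  # login process id -> time of last allowed request

    def allow(self, login_id, now):
        last = self.last_request.get(login_id)
        if last is not None and now - last < self.min_interval:
            return False  # too fast, caller should delay or reject
        self.last_request[login_id] = now
        return True
```

Rejected requests do not reset the timer, so a login process that keeps
hammering still gets one request through per interval.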

cleanups / checks:
 - grep for FIXME
 - check if t_push()/t_pop() should be added somewhere
 - allocating readwrite pools now just uses system_pool .. so pool_unref()
   can't free memory used by it .. what to do about it? at least count the
   malloc/free calls and complain if at the exit they don't match
 - ..wonder what it would look like if I did s/FooBarBaz/struct foo_bar_baz/..
 - HIGH: Make sure messages of size INT_MAX..UINT_MAX (and more) work
   correctly with 32bit file offsets. virtual_size can also overflow making
   it less than physical_size.
 - create env_put() and env_clean()
 - nearest_power() could be problematic with things that want it for ints,
   not size_t..
 - when sending lots of data with io_buffer_send(), it does a lot of
   io_add() and io_remove()s. and io_remove() just marks it destroyed, so
   it may create lots of IOs before the next ioloop run.. Though now it
   doesn't matter much since we're corked and we don't create the IOs,
   but ioloop should probably be fixed anyway.
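
The virtual_size overflow from the HIGH item above, simulated with 32bit
wraparound. This assumes virtual_size is physical_size plus one extra byte
for every bare LF that gets sent as CR+LF; the function names are
hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF


def virtual_size_32(physical_size, bare_lf_count):
    """What a 32bit unsigned addition would produce (wraps silently)."""
    return (physical_size + bare_lf_count) & UINT32_MAX


def virtual_size_checked(physical_size, bare_lf_count):
    """Same sum, but refuse to return a value that doesn't fit in 32 bits."""
    total = physical_size + bare_lf_count
    if total > UINT32_MAX:
        raise OverflowError("virtual size does not fit in 32 bits")
    return total


physical = 0xFFFFFFF0
wrapped = virtual_size_32(physical, 0x20)
print(wrapped < physical)  # the wrapped virtual size ends up below physical
```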

optional optimizations:
 - provide some helper binary to save new mail into mailboxes with CR+LF
   line breaks?
 - disk I/O is the biggest problem, so split the mail into multiple computers
   based on user and have a proxy in the front redirecting the connection.
   cyrus had something like this except a lot more complicated - it tried
   to fix the problem of having shared mailboxes. we have the same problem
   with local shared mailboxes as we don't use same UID for everyone's mail
   and we may be chrooted, so locally we could communicate with UNIX sockets,
   remotely that could be done with TCP sockets.

capabilities:
 - preferably all should be possible to #ifdef away by a configure
   option (--without-capabilities=acl,namespace,...)
 - possibility to disable them from config file
 - acl (rfc2086, draft-ietf-imapext-acl), namespace (rfc2342)
     - probably do it like cyrus. "user.<username>" to access other
       users, with "" defaulting to "user.<myself>". these should be
       configurable however.
     - shared namespaces? maybe configurable in config file
     - easiest way to do ACL would be to use unix modes, but is that
       useful at all? Well, ACL2 has a bit better support for that, so
       maybe we could support it.
     - otherwise gets a bit tricky, we could keep all mail in "imapmail"
       group and 0600/0700 mode by default, but when mail is shared to others,
       the group read/write access bits would be set. or alternatively we
       could launch another imap process to handle it, which we should support
       anyway. ACLs could be stored into ".acl" ascii file in each folder.
     - support for private and shared flags, configurable by mailbox admin.
       this isn't in any draft yet, but ACL2 author was going to create one.
       [SHAREDFLAGS (...)] would specify which ones are shared, don't know yet
       how they would be configured.
 - quota (rfc2087, draft-cridland-imap-quota)
     - give filesystem values only to admins
     - support for Maildir++, probably no need to support more.
       quota capability supports complex quota configuration, but if
       no mailer supports them we probably shouldn't bother either
 - id (rfc2971)
     - must be configurable what gets sent, default to only name=Dovecot
     - separate pre/post-login settings
     - optionally log configured parts of the client information, but only
       once, probably at the same time as logging "Logged in",
       "Disconnected", etc.
     - remember to force truncating values longer than 30 chars,
       especially before logging
 - mailbox-referrals (rfc2193)
     - this is useful whenever we would otherwise need to make the
       connection ourself. for example load balancing and shared mailboxes
       requiring another UID to run.
     - this rfc defines no exact way for server to detect if client
       supports referrals or not. I don't think there's much point in
       supporting only referrals, as most clients don't support them.
       Instead we should return referrals when we know that client
       supports them, otherwise do the connecting ourself. If client
       issues RLIST or RLSUB command, it's safe to assume it supports
       referrals.
     - for load balancing this works just fine, but what about shared
       mailboxes which require different UID? If we login with our own
       username, we end up with our own UID instead of what we wanted.
       IMAP URLs don't support separated authorization id which would
       have made this very easy.. We could give the "userid@group" as
       userid, but clients probably treat it as different userid and
       ask the password again.
     - problems, problems, .. maybe not worth the trouble.
 - literal+ (rfc2088)
     - simple. in case of invalid data, just disconnect client.
 - idle (rfc2177)
     - just call the syncing every few seconds (configurable)
     - with Linux we can use fcntl() and F_SETSIG to provide fast checks.
       just make sure sync() still won't be called more than once in a
       few seconds
 - uidplus (rfc2359)
     - uid expunge: no problem
     - append, copy: oh no. these would slow down things and make
       handling them much more difficult. currently we just store the
       mails to destination mailbox without touching the indexes. since
       we'd need to know their final UID, we'd have to lock the indexes
        and then: mbox) fsck() first and append() next to find out the uid,
       maildir) move the mail directly into cur/ and index it.
 - unselect (no draft or anything AFAIK)
     - like CLOSE, but doesn't expunge mails. easy.
 - drafts:
     - http://www.imc.org/ids.html
     - multiappend (draft-crispin-imap-multiappend)
	 - shouldn't have any problems
     - listext (draft-ietf-imapext-list-extensions)
	 - well, it expired January 2002.. I like it though.
     - children (draft-gahrns-imap-child-mailbox)
	 - I like listext more.. They have the same functionality though,
	   so pretty easy to support both if needed
     - annotate (draft-ietf-imapext-annotate)
	 - per-message annotations. this will be major change. especially
	   because currently there's no suitable storage for them, and
	   they'll probably change all the time.. maybe if we moved into
	   berkeley db to store the .data file and these annotations.
     - annotatemore (draft-daboo-imap-annotatemore)
	 - server and per-mailbox annotations. much easier than
	   per-message annotations, but they'd be easier to place into
	   db as well.
     - binary (draft-nerenberg-imap-binary)
	 - perhaps not too useful. I'd like to make Dovecot fully
	   binary-safe though.
     - sort (draft-ietf-imapext-sort)
	 - basically sorted SEARCH, requiring CHARSET support for
	   UTF-8 and ASCII
	 - we could create alternative binary tree file(s) for different sort
	   conditions, ".tree-sort" or something. or if we decide to just
	   keep it in memory, btree could still be best choice.
	 - required by squirrelmail (webmail)
     - thread (draft-ietf-imapext-thread)
         - basically SORT but reply with thread lists
	 - possibly use a binary tree too .. or maybe it's enough to use the
	   sort-tree and then just pick up the references separately? have to
	   check more carefully later.
     - view (draft-ietf-imapext-view)
         - slow, complex, luckily draft expired almost two years ago.
	   i hope i don't have to implement this :)
	 - can be done client-side just fine (evolution's virtual folders)
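
Going back to the id (rfc2971) item in this list, the "force truncating
values longer than 30 chars" step could be as small as this.
truncate_id_values() is a hypothetical helper operating on a plain dict of
client-supplied fields.

```python
def truncate_id_values(params, max_len=30):
    """Return a copy of the ID fields with every value cut to max_len chars."""
    return {key: value[:max_len] for key, value in params.items()}


client_id = {"name": "ExampleClient", "version": "x" * 40}
print(truncate_id_values(client_id))
```

Doing this before anything is logged keeps a client from stuffing
arbitrarily long strings into the log files.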