view TODO @ 300:a101127403a7 HEAD

keep exclusive lock while rewriting.
author Timo Sirainen <tss@iki.fi>
date Mon, 23 Sep 2002 14:48:29 +0300
parents 696139d3b8f6
children fd304e62e88a

test:
 - make sure mmap()s work properly with NFS
 - make sure locking is done properly when opening/switching modifylog 
 - make sure SELECT rebuilds index properly when next_uid is near the 32bit limit
 - make sure rfc822_parse_date() works properly
 - make sure imap_match functions work properly
 - make sure connection limits work
 - make sure it's noticed by other processes if a) data file is compressed,
   b) hash is rebuilt
 - make sure the index's ftruncate stuff works
 - make sure modify log works properly, especially switching the files

index:
 - optimization:
     - could hash function be better..? like uid*uid? what about changing
       probe strategy from linear to something else?
     - support shrinking hash file when it becomes 99% empty or so
     - if first_hole_records == MAIL_INDEX_RECORD_COUNT() -
       header->messages_count, we know we can just skip over the hole and do
       another direct lookup there
     - we could use tree structure to keep track of seqnumbers.. each node
       would store how many subnodes it has. deleting nodes (mails) would just
       update those counts. this increases the cost of lookups/inserts/deletions
       but is faster when more than one hole appears in file.. is it worth it?
       maybe #ifdefed away. except we could get rid of the hash file with this
       as well, since it could be used to look for both sequences and uids. it
       also speeds up UID range lookups when the first UIDs don't exist. use
       right-threaded redblack/avl trees (we need to know all child node counts,
       does that affect redblack's performance?)
 - mbox:
     - if we try to open a file that isn't a valid mbox, say it in one line
       in the error log, not 6..
     - locking: if we hold a shared lock on it while we're accessing it, we
       could make it pretty reliable. this means that the mbox fd needs to be
       locked before sync() and kept locked after that until we're done with
       it. problems are:
         - we don't have a single open mbox fd, we open it multiple times
         - switching to exclusive lock may deadlock
         - because mbox-rewrite rename()s the file, the old file gets lost.
           if mailer only checks the fd lock, the new mails disappear..
           i guess the only way to fix this is to set dotlock before opening
           the mbox file.
     - maybe support Content-Length for figuring out size of text? at least
       mutt doesn't prefix "From " in outbox.. If we verify that both
       Content-Length and Lines match correctly, there's very little chance
       that it could be broken by sending invalid values (doesn't local MTA
       update them anyway?). Though, this may be a bit difficult to implement,
       and now that we verify the From-line better, is this even needed?
     - rewriting could try to preserve the locations of fields it changes
       instead of writing them all to end..
     - mbox-rewrite rename()s the file, which breaks if the original was a
       symlink. but how do we fix this? we may not have write-access to the
       directory where it points to, so we'd need to manually copy it..
 - read-only support for mailboxes where we don't have write-access? Maybe,
   but don't try to use their indexes since that's way too problematic, and
   probably even impossible since we can't lock it.
 - we should try to avoid completely rebuilding indexes unless they're
   corrupted. especially if we later want to support some read-only boxes
   and keep the mail flags only in index file. fsck() could verify that
   records are ok, and that if data file isn't ok the record is deleted.
 - if .customflags is removed and Maildir files have custom flags, add
   "unknown1" "unknown2" etc. flags to .customflags file for each found flag
 - debug: index could be read-only mmaped when it's not locked. 
 - if message text is modified (or indexes are corrupted), this may happen:
    Panic: file imap-bodystructure.c: line 179 (part_parse_headers):
           assertion failed: (part->physical_pos >= inbuf->offset)
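
The counting-tree idea from the optimization list above could look roughly like this. It is only a sketch: it uses a fixed-size, array-backed tree instead of the right-threaded redblack/avl tree mentioned there, and SLOTS, struct seq_tree and the seq_tree_*() functions are made-up names, not existing index code. Each node stores how many live records are below it, so a sequence number can be mapped to a record slot in O(log n) no matter how many holes the file has.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity counting tree stored as an implicit binary
   tree: count[1] is the root, node i has children 2*i and 2*i+1, and the
   leaf for record slot s is count[SLOTS + s]. */
#define SLOTS 8 /* must be a power of two */

struct seq_tree {
	uint32_t count[2 * SLOTS];
};

/* Mark the first used_slots slots as holding a live message. */
void seq_tree_init(struct seq_tree *t, size_t used_slots)
{
	size_t i;

	for (i = 0; i < SLOTS; i++)
		t->count[SLOTS + i] = i < used_slots ? 1 : 0;
	for (i = SLOTS - 1; i >= 1; i--)
		t->count[i] = t->count[2 * i] + t->count[2 * i + 1];
}

/* Expunge the message in a slot: walk up to the root fixing the counts. */
void seq_tree_delete(struct seq_tree *t, size_t slot)
{
	size_t i = SLOTS + slot;

	if (t->count[i] == 0)
		return; /* already a hole */
	for (; i >= 1; i /= 2)
		t->count[i]--;
}

/* Map a 1-based sequence number to its slot, or return -1 if the
   sequence number is out of range. */
long seq_tree_lookup(const struct seq_tree *t, uint32_t seq)
{
	size_t i = 1;

	if (seq == 0 || seq > t->count[1])
		return -1;
	while (i < SLOTS) {
		i *= 2; /* try the left child first */
		if (seq > t->count[i]) {
			seq -= t->count[i]; /* skip the whole left subtree */
			i++; /* continue in the right child */
		}
	}
	return (long)(i - SLOTS);
}
```

A real version would have to grow with the index file and live in mmap()ed memory, which this sketch ignores. The same counts would not help UID lookups unless the nodes also stored UIDs.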

lib-storage:
 - support multiple mailbox formats and locations for one user. that would
   require support for multiple MailStorages, and since we're chroot()ed,
   usually the only way to communicate with others would be to create
   RemoteMailStorage which would use TCP/UNIX sockets to connect to another 
   imap session.
 - DELETE/RENAME: when someone else had the mailbox open, we should
   disconnect it (when stat() fails with ENOENT while syncing)
 - optimize SEARCH [UN]SEEN, [UN]DELETED and [UN]RECENT. They're able to
   skip lots of messages based on the index header data.
 - use a trie index for fast text searching, like cyrus squat?
 - BUG: hardlink-COPY doesn't work right:
     - it should generate new filename for destination folder, so copying
       same message twice won't break it
     - custom flags aren't copied
 - maildir: atomic COPY could be done by setting a "temporary" flag into the
   file's name. once copying is done, set an ignore-temporary field into
   index's header. at next sync the temporary flag will be removed.
 - we should probably do some light checking that appended mails actually
   look like valid rfc822 mails..
 - SEARCH CHARSET support, iconv()? also means we need to parse the charset
   stuff in headers.
 - SEARCH could optionally support scanning inside file attachments and use
   plugins to extract text out of them (word, excel, pdf, etc. etc.)
 - RENAME INBOX isn't atomic with Maildir. And in general, RENAME can't
   move mails between different storages. Maybe also support doing it with
   COPY + delete once COPY is atomic?
 - "UID FETCH|SEARCH|STORE *" doesn't work if latest message was deleted.
 - maybe limit the length of custom flags? we don't really have a problem
   with them, but with mbox a long X-IMAPbase could break something.. Maybe
   configurable, default to 50 chars?
 - "APPEND invalid data {5}" - says "+ OK" and after that says it's invalid.
   that "+ OK" shouldn't be sent by imap-parser if LITERAL_SIZE is used..
 - SEARCH should use imap-msgcache, especially for size checking
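
For the SEARCH [UN]SEEN item above, the header-based shortcut might be sketched like this. The struct and the seen_messages_count field are assumptions made for illustration (only messages_count is named elsewhere in this file), and the same pattern would presumably apply to the deleted and recent counts.

```c
#include <stdint.h>

/* Hypothetical subset of the index header. */
struct index_header_counts {
	uint32_t messages_count;
	uint32_t seen_messages_count; /* assumed field name */
};

enum search_shortcut {
	SEARCH_MATCH_NONE, /* no message can match, skip the scan */
	SEARCH_MATCH_ALL,  /* every message matches, no flag checks needed */
	SEARCH_MUST_SCAN   /* mixed, the records have to be checked */
};

/* want_seen != 0 means SEARCH SEEN, 0 means SEARCH UNSEEN. */
enum search_shortcut
search_seen_shortcut(const struct index_header_counts *hdr, int want_seen)
{
	uint32_t matching;

	/* Don't trust inconsistent header counts, fall back to scanning. */
	if (hdr->seen_messages_count > hdr->messages_count)
		return SEARCH_MUST_SCAN;

	matching = want_seen ? hdr->seen_messages_count :
		hdr->messages_count - hdr->seen_messages_count;
	if (matching == 0)
		return SEARCH_MATCH_NONE;
	if (matching == hdr->messages_count)
		return SEARCH_MATCH_ALL;
	return SEARCH_MUST_SCAN;
}
```

This only covers the all-or-nothing cases. Skipping ranges inside a mixed mailbox would need more header data, such as the position of the first unseen message.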

general:
 - capabilities:
     - acl (rfc2086)
     - quota (rfc2087)
     - namespace (rfc2342), id (rfc2971), mailbox-referrals (rfc2193),
       literal+ (rfc2088), idle (rfc2177), uidplus (rfc2359)
     - drafts: listext, children, unselect, multiappend, annotatemore
         - sort, thread: are these really useful for clients? do any actually
           use them? i'd think most clients want to know all the messages
           anyway and can do the sorting/threading themselves.
         - http://www.imc.org/ids.html
 - sieve? (rfc-3028)
 - rfc-2231 continuation support

 - go through .temp files and delete them
 - Content-Language isn't parsed correctly
 - ulimit / setrlimit() should be set somewhere for imap process
 - create indexer binary
 - SIGHUPing master should reload the configuration .. killing imap-auth and
   imap-login processes? or just signal imap-login to stop accepting new
   connections and let it kill itself
 - users should always be able to delete mail from mailbox, even if their
   quota is completely full. this would require us to create the indexes
   elsewhere .. in-memory should work fine?
 - if index was rebuilt (because corruption was noticed), the user should be
   disconnected because everything might have changed (unless it's noticed
   while just opening the indexes).
 - settings for specifying what sort of data to cache by default
   (index->cache_fields)
 - setting for choosing mbox locking methods
 - maybe a bit more verbose warnings for some errors, like "invalid date:
   <date that was tried>". easier than sniffing the traffic.
 - imap-login writes UTC timestamps to log file .. why is that?
 - imap-login leaks I/O descriptors when killed (ssl_input + plain_input)
 - logins are always sent now using syslog(), we'd need to have i_info()
   or something so they could also be written to log files.. also make it
   possible to log to a different log than errors.
 - should we bother checking if there are invalid 8bit headers in
   BODY/BODYSTRUCTURE output and converting them to quoted printable?
 - virtual mail which shows up every time we're out of disk space. but how?..
 - update docs/index.txt
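
For the ulimit / setrlimit() item above, a minimal sketch using the standard getrlimit()/setrlimit() calls. restrict_rlimit() is a made-up helper name, and which resources and values the imap process should actually use is left open.

```c
#include <sys/resource.h>

/* Set the soft limit of one resource to `wanted`, capped at the current
   hard limit. Returns 0 on success, -1 on error (errno is set). */
int restrict_rlimit(int resource, rlim_t wanted)
{
	struct rlimit rl;

	if (getrlimit(resource, &rl) < 0)
		return -1;
	if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
		wanted = rl.rlim_max;
	rl.rlim_cur = wanted; /* the hard limit is left untouched */
	return setrlimit(resource, &rl);
}
```

It could be called once after fork() and before handling any client input, for example restrict_rlimit(RLIMIT_CORE, 0) or a similar call for RLIMIT_NOFILE. Since only the soft limit is changed, the process could raise it again, so lowering rlim_max as well may be the better choice.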

auth / login:
 - SRP authentication support?
 - PAM: support some options so /etc/passwd-lookup isn't needed. uid=x, gid=y,
   mailroot=/var/mail. maildirs should be then created when needed
 - vpopmail support
 - Digest-MD5: support integrity protection, and maybe encryption. Do it
   through imap-login like SSL is done?
 - imap-auth should limit how fast authentication requests are allowed from
   login processes. especially if there's one login/connection the speed
   should be something like once/sec. also limit how fast to accept new
   connections.
 - HIGH: support executing each login in its own process, so if an exploit
   is ever found in it, the attacker can't see other users' passwords.
    - master should limit number of login processes to max_logging_users,
      killing old processes when limit is reached
    - master should try to keep login_processes_count extra processes all
      the time
    - login should notify master after it accept()s, and it must close the
      listening socket immediately
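
The once/sec limit on authentication requests mentioned above could be as simple as remembering when the previous request from each login process was accepted. This is only a sketch: struct auth_rate_limit and auth_request_allowed() are hypothetical names, and the caller passes the current time in so the check is easy to test.

```c
#include <time.h>

/* Hypothetical per-login-process state kept by imap-auth. */
struct auth_rate_limit {
	time_t last_request; /* when the previous request was accepted */
	int seen_any;        /* 0 until the first request arrives */
};

/* Returns 1 if a request arriving at `now` may be handled, 0 if it came
   less than min_interval seconds after the previous accepted one. */
int auth_request_allowed(struct auth_rate_limit *rl, time_t now,
			 time_t min_interval)
{
	if (rl->seen_any && now - rl->last_request < min_interval)
		return 0;
	rl->last_request = now;
	rl->seen_any = 1;
	return 1;
}
```

Rejected requests don't update the timestamp, so a flooding client still gets one request through per interval. If the clock jumps backwards, requests are refused until it catches up again.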

cleanups / checks:
 - grep for FIXME
 - check if t_push()/t_pop() should be added somewhere
 - IOBuffer should probably be split into IBuffer and OBuffer, and maybe
   its internals should be hidden .. or at least only partly visible.
 - io_buffer_fd_ref() .. unref() and destroy() would close if refcount = 0?
   annoying those close(inbuf->fd)s with open_mail()..
 - allocating readwrite pools now just uses system_pool .. so pool_unref()
   can't free memory used by it .. what to do about it? at least count the
   malloc/free calls and complain at exit if they don't match
 - ..wonder what it would look like if I did s/FooBarBaz/struct foo_bar_baz/..
 - HIGH: Make sure messages of size INT_MAX..UINT_MAX (and more) work
   correctly. virtual_size can also overflow making it less than physical_size
 - verify memory alignment is valid when reading from index files
 - create env_put() and env_clean()
 - nearest_power() could be problematic with things that want it for ints,
   not size_t..
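
The virtual_size overflow noted above can be caught with a check before the addition. This sketch assumes virtual size is the physical size plus one extra byte for every line break stored as a bare LF, and that both sizes are kept as 32bit values. virtual_size_add() is a hypothetical helper, not an existing function.

```c
#include <stdint.h>

/* Returns 0 and stores physical_size + bare_lf_count in *virtual_size_r,
   or returns -1 without touching it if the sum doesn't fit in 32 bits. */
int virtual_size_add(uint32_t physical_size, uint32_t bare_lf_count,
		     uint32_t *virtual_size_r)
{
	if (bare_lf_count > UINT32_MAX - physical_size)
		return -1; /* would wrap and end up below physical_size */
	*virtual_size_r = physical_size + bare_lf_count;
	return 0;
}
```

Switching the size fields to a 64bit type would avoid the problem instead of just detecting it, at the cost of larger index records.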

optional optimizations:
 - provide some helper binary to save new mail into mailboxes with CR+LF
   line breaks?
 - disk I/O is the biggest problem, so split the mail across multiple computers
   based on user and have a proxy in the front redirecting the connection.
   cyrus had something like this except a lot more complicated - it tried
   to fix the problem of having shared mailboxes. we have the same problem
   with local shared mailboxes as we don't use same UID for everyone's mail
   and we may be chrooted, so locally we could communicate with UNIX sockets,
   remotely that could be done with TCP sockets.
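
One way the front proxy could pick a backend for a user is to hash the user name, sketched below under the assumption that backends are simply numbered 0..backend_count-1. user_hash() and backend_for_user() are hypothetical names; the hash is 32bit FNV-1a.

```c
#include <stddef.h>
#include <stdint.h>

/* 32bit FNV-1a hash of a NUL-terminated user name. */
uint32_t user_hash(const char *user)
{
	uint32_t h = 2166136261u;

	for (; *user != '\0'; user++) {
		h ^= (unsigned char)*user;
		h *= 16777619u;
	}
	return h;
}

/* Pick the backend for a user. backend_count must be greater than 0. */
size_t backend_for_user(const char *user, size_t backend_count)
{
	return user_hash(user) % backend_count;
}
```

Plain modulo remaps most users whenever the number of backends changes, so a real setup would more likely keep an explicit user -> server table. This also does nothing for the shared mailbox problem described above.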