comparison doc/thread-refs.txt @ 7800:524b4acad38b HEAD

Added documentation about how thread indexes work.
author Timo Sirainen <tss@iki.fi>
date Mon, 09 Jun 2008 05:55:51 +0300
parents
children 04a3be30e5a6
comparison
equal deleted inserted replaced
7799:4aa830b4f8bc 7800:524b4acad38b
1 Optimistic incremental THREAD=REFERENCES
2
3 Step (1) is the slowest stage for building a THREAD=REFERENCES tree. If its
4 result tree is permanently saved, the following thread builds can be based
5 on it by updating the tree incrementally.
6
7 Adding new messages to the tree is simple: simply follow the normal rules
8 as when building a new tree from scratch. Expunging messages gets more
9 problematic though.
10
11 Each node in the tree keeps a "link reference count" which specifies how
12 many messages contain a "this message" -> "parent message" reference. The
13 first reference is usually added by the message's own References: or
14 In-Reply-To: header and the latter references are added by References:
15 headers. This link refcount must be updated when adding and expunging
16 messages. When the link refcount drops to zero, the message becomes a root.
17 The link refcount doesn't tell much about the number of children the node
18 has, because References: headers may reference any number of its ancestors.
19
20 The optimistic approach assumes that usually there are no problematic
21 links. In such case expunging a message simply updates the link refcounts
22 and marks the message's node as expunged. Expunged messages may still act
23 as dummy nodes. The node may be removed only after there are no nodes which
24 point to it.
25
26 Problematic links are handled by marking nodes affected by them. If such a
27 node is expunged or its link is unreferenced, the thread tree must be
28 rebuilt. This is assumed to be a rarely occurring event. The problematic
29 cases are:
30
31 1) Duplicate Message-ID: headers. If the first message having it is
32 expunged, the thread tree must be rebuilt.
33
34 2) Message-ID loops. If a message referencing the looping path gets
35 expunged, the loop may break and the thread tree must be rebuilt. Because
36 it can't be determined easily which loops get broken by which expunges,
37 this case can be handled in a bit easier way: When detecting a loop between
38 parent and child, rebuild the tree if any link between the parent and child
39 gets unreferenced.
40
41 3) A message changes its parent because an earlier message's References:
42 header contained a different link. If the message gets expunged, the thread
43 tree must be rebuilt to get the original parent back.
44
45 4) A link in a message's References: header is ignored, because the
46 referenced child already specified a different parent to itself. If the
47 message gets expunged, the thread tree must be rebuilt to determine its new
48 parent.
49
50 5) A link in a message's References: header is ignored, because an earlier
51 message's References: header already specified a different link. If the
52 earlier message gets expunged, the parent may change. The earlier message
53 could be found out quickly by keeping some extra state (or with a slow
54 scan), but since this is assumed to be a rare problem, there's an easier
55 (but less specific) way to handle this: Rebuild the tree if a link to the
56 child node is unreferenced (or alternatively if a link to the original
57 parent node is unreferenced, but it probably happens more often).
58
59 Pseudocode:
60
61 node {
62 parent: Pointer to parent node. Children pointers aren't required.
63 uid: Message's UID (0 = expunged or never even existed)
64
65 link_refcount: Number of messages referencing "this node" -> "parent node"
66 (e.g. "References: a b" would increase this message's reference as well as
67 b's reference, regardless of how the links are really added to the thread
68 tree).
69 expunge_rebuilds: If this message gets expunged, rebuild the thread tree.
70 parent_unref_rebuilds: If a link between this node and its child gets
71 unreferenced, rebuild the thread tree.
72 }
73
74 link_reference(parent_node, child_node)
75 parent_node.link_refcount++
76 if is_ancestor(child_node, parent_node)
77 // child_node is an ancestor of parent_node. Adding child_node ->
78 // parent_node would introduce a loop. If any messages referencing the
79 // path between parent_node's parent and child_node get expunged, we
80 // have to rebuild the tree because the loop might break. For example:
81 // #1: a -> b (a.ref=1, b.ref=1)
82 // #2: b -> a (a.ref=2, b.ref=2)
83 // #3: c -> a -> b (a.ref=3, b.ref=3, c.ref=1)
84 // Expunging #3 wouldn't break the loop, but expunging #1 would.
85 for node in nodes[parent_node.parent .. child_node]
86 node.parent_unref_rebuilds = true
87 else if child_node.parent == parent_node
88 // The same link already exists
89 else
90 // Set parent_node as child_node's parent
91 if child_node.parent == NIL
92 child_node.parent = parent_node
93 else
94 // Conflicting parent already exists, keep the original.
95 // We get here only when handling References: header.
96 if child_node.uid != 0
97 // We've already seen this message. It specifies its own parent and
98 // it doesn't matter what any other reference says. The only way its
99 // parent can change is if the message itself gets expunged.
100 child_node.expunge_rebuilds = true
101 else
102 // Message doesn't exist, so it was one of the node's children
103 // that created the original reference. If that reference gets
104 // dropped, the parent is changed. We could catch this in one of
105 // several ways:
106 // a) Link to original parent node gets unreferenced
107 // b) Link to this node gets unreferenced
108 // c) Any of the child nodes gets expunged
109 // b) is probably the least likely to happen, so use it
110 child_node.parent_unref_rebuilds = true
111
112 thread_add_msg(uid)
113 // get the Message-IDs as specified by the thread spec
114 (msgid, parent_msgid, references) = message_get_thread_headers(uid)
115
116 if msgid != NIL
117 if nodes[msgid].uid == 0
118 nodes[msgid].uid = uid
119 else
120 // duplicate Message-ID. if the original ever gets expunged,
121 // rebuild the thread tree
122 nodes[msgid].expunge_rebuilds = true
123 msgid = NIL
124
125 if msgid == NIL
126 msgid = get_unique_msg_id()
127
128 node = nodes[msgid]
129 if node.parent != NIL and
130 (parent_msgid == NIL or node.parent.msgid != parent_msgid)
131 // Conflicting parent, remove it. If this message gets expunged, we have
132 // to revert back to the original parent.
133 node.parent = NIL
134 node.expunge_rebuilds = true
135
136 if parent_msgid != NIL
137 link_reference(nodes[parent_msgid], node)
138
139 // go through References (skipping the last one)
140 for (parent_msgid, child_msgid) in references
141 link_reference(nodes[parent_msgid], nodes[child_msgid])
142
143 unref_link(parent, child)
144 if parent.parent_unref_rebuilds
145 return false
146
147 child.link_refcount--
148 if child.link_refcount == 0
149 child.parent = NIL
150 return true
151
152 // returns false if thread tree needs to be rebuilt
153 thread_expunge_msg(uid)
154 // get the Message-IDs as specified by the thread spec
155 (msgid, in_reply_to_msgid, references) = message_get_thread_headers(uid)
156
157 node = nodes[msgid]
158 if node.uid != uid
159 // Removing a duplicate Message-ID
160 return false
161
162 if node.expunge_rebuilds
163 return false
164
165 if parent_msgid != NIL and
166 not unref_link(nodes[parent_msgid], nodes[child_msgid])
167 return false
168
169 if references != NIL
170 // go through References
171 for (parent_msgid, child_msgid) in references
172 if not unref_link(nodes[parent_msgid], nodes[child_msgid])
173 return false
174 last_parent_msgid = references.last
175 else if in_reply_to_msgid != NIL
176 last_parent_msgid = in_reply_to_msgid
177
178 if last_parent_msgid != NIL and
179 not unref_link(nodes[last_parent_msgid], node)
180 return false
181
182 // mark this node as expunged
183 node.uid = 0
184 return true
185
186 thread_iterate()
187 root_nodes = []
188 referenced = []
189 children = [][]
190 // Steps (2) and (3)
191 for node in nodes
192 if node.parent != NIL
193 root_nodes[] = node
194 else
195 referenced[node.parent] = true
196 if node.uid != 0
197 // Find the node's first non-dummy parent and add the node as its child.
198 // If there are no non-dummy parents, add it as the highest dummy's
199 // child.
200 nondummy_parent = node.parent
201 while nondummy_parent.uid == 0 and nondummy_parent.parent != NIL
202 nondummy_parent = nondummy_parent.parent
203 children[nondummy_parent][] = node
204
205 for node in root_nodes
206 if node.uid == 0
207 if children[node] == NIL
208 // remove dummy roots that have no children.
209 delete(node)
210 else if count(children[node]) == 1
211 // dummy root has a single child, replace the root with its child
212 node = children[node]
213
214 for node in nodes
215 if node.uid == 0 and !referenced[node]
216 free(node)
217
218 // root_nodes and children now contain a tree equivalent to a tree built by
219 // THREAD=REFERENCES specification steps (1)-(3). The rest of the steps
220 // can be performed using them. Note that node.parent should not (and need
221 // not) be used because it points its parent before steps (2) and (3).