annotate notes.txt @ 2080:1cbb14c048cb

Reduce index memory usage by storing the bare string instead of tuples Storing the tuple returned by struct.unpack significantly increases the memory required to store the entire index in ram. This patch uses struct.unpack on demand instead.
author mason@suse.com
date Tue, 04 Apr 2006 19:00:40 -0400
parents 2073e5a71008
children 4f072bb06e89
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
1 Some notes about Mercurial's design
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
2
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
3 Revlogs:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
4
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
5 The fundamental storage type in Mercurial is a "revlog". A revlog is
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
6 the set of all revisions to a file. Each revision is either stored
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
7 compressed in its entirety or as a compressed binary delta against the
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
8 previous version. The decision of when to store a full version is made
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
9 based on how much data would be needed to reconstruct the file. This
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
10 lets us ensure that we never need to read huge amounts of data to
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
11 reconstruct a file, regardless of how many revisions of it we store.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
12
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
13 In fact, we should always be able to do it with a single read,
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
14 provided we know when and where to read. This is where the index comes
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
15 in. Each revlog has an index containing a special hash (nodeid) of the
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
16 text, hashes for its parents, and where and how much of the revlog
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
17 data we need to read to reconstruct it. Thus, with one read of the
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
18 index and one read of the data, we can reconstruct any version in time
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
19 proportional to the file size.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
20
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
21 Similarly, revlogs and their indices are append-only. This means that
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
22 adding a new version is also O(1) seeks.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
23
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
24 Generally revlogs are used to represent revisions of files, but they
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
25 also are used to represent manifests and changesets.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
26
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
27 Manifests:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
28
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
29 A manifest is simply a list of all files in a given revision of a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
30 project along with the nodeids of the corresponding file revisions. So
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
31 grabbing a given version of the project means simply looking up its
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
32 manifest and reconstruction all the file revisions pointed to by it.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
33
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
34 Changesets:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
35
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
36 A changeset is a list of all files changed in a check-in along with a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
37 change description and some metadata like user and date. It also
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
38 contains a nodeid to the relevent revision of the manifest. Changesets
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
39 and manifests are one-to-one, but contain different data for
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
40 convenience.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
41
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
42 Nodeids:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
43
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
44 Nodeids are unique ids that are used to represent the contents of a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
45 file AND its position in the project history. That is, if you change a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
46 file and then change it back, the result will have a different nodeid
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
47 because it has different history. This is accomplished by including
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
48 the parents of a given revision's nodeids with the revision's text
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
49 when calculating the hash.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
50
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
51 Graph merging:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
52
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
53 Nodeids are implemented as they are to simplify merging. Merging a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
54 pair of directed acyclic graphs (aka "the family tree" of the file
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
55 history) requires some method of determining if nodes in different
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
56 graphs correspond. Simply comparing the contents of the node (by
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
57 comparing text of given revisions or their hashes) can get confused by
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
58 identical revisions in the tree.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
59
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
60 The nodeid approach makes it trivial - the hash uniquely describes a
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
61 revision's contents and its graph position relative to the root, so
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
62 merge is simply checking whether each nodeid in graph A is in the hash
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
63 table of graph B. If not, we pull them in, adding them sequentially to
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
64 the revlog.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
65
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
66 Branching and merging:
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
67
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
68 Everything in Mercurial is potentially a branch and every user
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
69 effectively works in their own branch. When you do a checkout,
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
70 Mercurial remembers what the parent changeset was and uses it for the
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
71 next check in.
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
72
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
73 To do a merge of branches in Mercurial, you check out the heads of the
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
74 two branches into the same working directory which causes a merge to
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
75 be performed, and then check in the result once you're happy with it.
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
76 The resulting checkin will have two parents.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
77
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
78 It decides when a merge is necessary by first determining if there are
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
79 any uncommitted changes in the working directory. This effectively
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
80 makes the working directory a branch off the checked in version it's
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
81 based on. Then it also determines if the working directory is a direct
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
82 ancestor or descendent of the second version we're attempting to
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
83 checkout. If neither is true, we simply replace the working directory
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
84 version with the new version. Otherwise we perform a merge between the
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
85 two versions.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
86
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
87 Merging files and manifests:
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
88
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
89 We begin by comparing two versions manifests and deciding which files
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
90 need to be added, deleted, and merged.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
91
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
92 Then for each file, we perform a graph merge and resolve as above.
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
93 It's important to merge files using per-file DAGs rather than just
1308
2073e5a71008 Cleanup of tabs and trailing spaces.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 260
diff changeset
94 changeset level DAGs as this diagram illustrates:
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
95
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
96 M M1 M2
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
97
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
98 AB
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
99 |`-------v M2 clones M
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
100 aB AB file A is change in mainline
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
101 |`---v AB' file B is changed in M2
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
102 | aB / | M1 clones M
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
103 | ab/ | M1 changes B
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
104 | ab' | M1 merges from M2, changes to B conflict
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
105 | | A'B' M2 changes A
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
106 `---+--.|
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
107 | a'B' M2 merges from mainline, changes to A conflict
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
108 `--.|
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
109 ??? depending on which ancestor we choose, we will have
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
110 to redo A hand-merge, B hand-merge, or both
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
111 but if we look at the files independently, everything
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
112 is fine
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
113
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
114 The result is a merged version in the working directory, waiting for
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
115 check-in.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
116
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
117 Rollback:
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
118
47
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
119 When performing a commit or a merge, we order things so that the
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
120 changeset entry gets added last. We keep a transaction log of the name
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
121 of each file touched and its length prior to the transaction. On
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
122 abort, we simply truncate each file to its prior length. This is one
45cc818f2445 Rollback is implemented
mpm@selenic.com
parents: 0
diff changeset
123 of the nice properties of the append-only structure of the revlogs.
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
124 We can also reuse this journal for "undo".
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
125
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
126 Merging between repositories:
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
127
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
128 One of the key features of Mercurial is the ability to merge between
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
129 independent repositories in a decentralized fashion. Each repository
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
130 can act as a read-only server or a client. Clients operating by
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
131 pulling all branches that it hasn't seen from the server and adding
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
132 them into its graph. This is done in two steps: searching for new
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
133 "roots" and pulling a "changegroup"
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
134
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
135 Searching for new "roots" begins by finding all new heads and
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
136 searching backwards from those heads to the first unknown nodes in
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
137 their respective branches. These nodes are the 'roots' that are used
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
138 to calculate the 'changegroup': the set of all changesets starting at
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
139 those roots. Mercurial takes pains to make this search efficient in
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
140 both bandwidth and round-trips.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
141
260
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
142 Once the roots are found, the changegroup can be transferred as a
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
143 single streaming transfer. This is organized as an ordered set of
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
144 deltas for changesets, manifests, and files. Large chunks of deltas
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
145 can be directly added to the repository without unpacking so it's
d7ce76d82876 Some tweaking of notes.txt
mpm@selenic.com
parents: 59
diff changeset
146 fairly fast.