Version Diagrams
Gray means version is read only and blue means version is read-write.
Figure 1. Full Persistence
Full persistence
In fully persistent model, both updates and queries are allowed on any version of the data structure.The construction for partial persistence can be expanded to implement full persistence. This result is also due to [5]. We again assume a pointer machine with p < O(1) incoming pointers per node.
We need to take care of a two new problems:
- Now we need to represent versions in a way that lets us efficiently check which precedes which. Versions form a tree, but traversing it is O(# of versions).
- Now we have to support writes to any version. This affects how we handle writes: we no longer have the concept of ‘current’ vs. ‘old’ nodes: every node needs to support writes.
Version representation
The version representation problem is solved by keeping a tree
representing the version structure, as well as an efficient
representation of this tree in a linearized way.
The linearized representation of the tree in the following figure
is ba, bb, bc, ec, eb, bd, ed, ea. You can read ba as ‘begin node
a’ and ea as ‘end node a’. This representation losslessly encodes
the tree, and we can directly answer queries about the tree using that
encoded representation. For example we know c is nested within b,
since bb < bc and ec < eb.
Figure 2. in-order tree traversal
This linearized representation can be implemented using an ‘order maintenance’ data structure. For now, it suffices to say an order maintenance data structure supports the following two operations, both in O(1) time.
-
insert an item before or after a specified element.
-
check if item s precedes item t.
For example, a linked list supports insertions in O(1), but tests for
precedence take O(n). Similarly, a balanced BST supports both
operations but in O(log n) time. Deitz and Sleator show an O(1)
implementation for both operations in @dietz, which will be covered in
lecture 8.
To implement version tree queries such as ‘is version v an ancestor of
version w’ we can use two comparison queries bv < bw and ew < ev
in O(1). To implement updates like ‘add version v as a child of
version w’ we can insert the two elements bv and ev after bw and
before ew respectively, also in O(1).
Construction and algorithm:
The nodes in our data structure will keep the same kinds of additional
data per node as they did in the partially persistent case.
For each node we store d data entries and p back pointers, but now
allow up to 2(d+p+1) modifications. The amount of data d is also a
bound on out-pointers per node. Additionally we now also
version back-pointers.
-
read(n.field, version): By using the order-maintenance data structure we can pick the most recent ancestor of version from among the entries in the mod log and return that value.
-
write(n.field, value, version): If there is space in node, just add mod entry. else:
- m = new Node(). Split the contents of node n’s mod log into two parts following the diagram figure 3. Partitioning into subtrees rather than arbitrarily is required.
- Now node m has some of the mods of the internal tree in Figure 3, and node n retains the ‘older’ half of the updates.
- from the ‘old’ mod entries in node n, compute the latest values of each field and write them into the data and back pointer section of node m.
- recursively update all (up to) d + p + (d + p + 1) forward and backward pointers of neighbors.
- insert new version to our version tree representation.
Figure 3. Splitting a tree-shaped version genealogy into two subtrees
Analysis:
-
Space – 1 if we do not split or d + p + 2(d + p + 1) = 3d + 3p + 2 when we split a node, both O(1)
-
Time – read(var, version) is implemented in a similar way to the partial case. We use our auxiliary version tree data structure to find the largest ancestor of version from among a list of O(1) elements in O(1).
Like with the partial case, writes are cheap when a node has space in its mods log and more expensive when nodes are full.
Consider ϕ =-c(# empty slots), then when we split Δϕ=-2c(d+p+1) and when we do not Δϕ=c. Hence, for the worst possible choice of x from the neighbors. When we unfold the recursion once, we find the constants cancel out: c - 2c(d+p+1) + (2p + 2p + 1)c = 0.
OPEN: De-amortization of full persistence.
OPEN: Is there a matching lower bound for both full and
partial persistence?
References
[1] Gerth Stolting Brodal: Partially Persistent Data Structures of Bounded
Degree with Constant Update Time. Nord. J. Comput. 3(3): 238-255 (1996)
[2] Erik D. Demaine, John Iacono, Stefan Langerman: Retroactive data
structures. SODA 2004: 281-290
[3] Erik D. Demaine, Stefan Langerman, and Eric Price: Confluently
Persistent Tries for Efficient Version Control. Algorithmica (2008).
[4] Paul F. Dietz, Daniel Dominic Sleator: Two Algorithms for Maintaining
Order in a List STOC 1987: 365-372
[5] Paul F. Dietz: Fully Persistent Arrays (Extended Array). WADS 1989:
67-74
[6] James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre
Tarjan: Making Data Structures Persistent. J. Comput. Syst. Sci.
38(1): 86-124 (1989)
[7] Yoav Giora, Haim Kaplan: Optimal dynamic vertical ray shooting in
rectilinear planar subdivisions ACM Transactions on Algorithms 5(3)
(2009)
[8] Haim Kaplan, Chris Okasaki, Robert Endre Tarjan: Simple Confluently
Persistent Catenable Lists. SIAM J. Comput. 30(3): 965-977 (2000)
[9] Amos Fiat, Haim Kaplan: Making data structures confluently persistent.
J. Algorithms 48(1): 16-58 (2003)
[10] Chris Okasaki: Purely Functional Data Structures. New York: Cambridge
University Press, 2003.
[11] Sebastien Collette, John Iacono, and Stefan Langerman. Confluent
Persistence Revisited. In Symposium on Discrete Algorithms (SODA),
pages 593-601, 2012.
[12] T. H. Cormen and C. E. Leiserson and R. L. Rivest and C. Stein,
Introduction to Algorithms. 3rd. Edition. The MIT Press, 2009.
[13] Nicholas Pippenger. Pure Versus Impure Lisp. ACM Transactions on
Programming Languages and Systems, Vol. 19, No. 2. (March 1997), pp.
223-238.
[14] Gerth Stolting Brodal, Christos Makris, Kostas Tsichlas: Purely
Functional Worst Case Constant Time Catenable Sorted Lists. ESA 2006:
172-183