|
demos are
explained here;
a menu at top column right indexes actual topic demos.
Here we demo arc.
problem
As described in the graphs subsection of the node demo, Wil sees most computing system representions as graphs in both code and data. Nodes are typically somewhat over-represented; structs and objects for state are typically favored in code description to the exclusion of arcs: the way you refer to something else. But careful arc design is a good way to solve problems too. Tuning graphs can target either arcs or nodes — sometimes both. This page uses the term arc to mean the same thing as arrow or dart or directed edge in an abstract pointing role in a graph. You should think "arrow" but focus on graph edges. Concrete arcs can be pointers, absolute offsets, relative offsets, map keys, and/or combinations of these in series. A chain of arrows acts like one arrow with extra internal detail exposed. Wil rarely bothers discussing this perspective, because arc design folds seamlessly into effective coding. However, this also means an arc perspective is usually buried, and Wil rarely tells you the other options that would have worked too. So when Wil shows a very specific way to code blobs or btrees, you might have trouble seeing the other ways to code them using different arc formats to point between nodes. To help you with this, Zé offers material below on arc ideas, and later nags Wil to present arc agnostic designs in future blob and tree demos, along with Wil's concrete code. chunking « "What's the point of all this attention to arcs?" Wil asked. "How will it help anyone?" "Chunking," Zé replied. "Complex designs often have many details. They're easier to understand when you group parts into the right chunks. And some chunks are arcs." "Pretty abstract," Wil observed. "I think I'll skip this one. Maybe Stu wants to play straight man today. Let me know when you think it affects code I write, okay?" "Chicken," Zé called to Wil's departing back. "You're afraid you'll bore readers who just want to see final code. What about you, Stu? Game for some graph oriented ideas?" "Not really, no," Stu admitted while sliding into a chair. "But Wil said you might have some pictures. I'm willing to listen if I see figures explaining context for Wil's blob and tree demos. He says he's only going use text figures." "Pictures," Zé drummed his fingers and sighed. "You just want pretty pictures? You're supposed to imagine those like Wil does. Do I really have to draw them?" "Yeah, or I'm gone," Stu shrugged. "Sorry, dude." "You're on my list," Zé warned. "Photoshop time."
figures
"Let's start with diagrams illustrating Wil's btrees in the blob demo," Zé said. "My goal here is to get you to see how pictures relate to text representations Wil uses." "Cool," Stu perked up. "Even Wil's rare ascii diagrams are hard to grasp. Can you do better?" "Sure," Zé said. "How about this: I'll re-render one ascii diagram as a nice Photoshop image, then explain." +---+---+---+---+
|tag|r=0|r=0|r=0|
11=|h=1| 6 | 5 | 7 |
|u=3|#01|#02|#03|
+---+---+---+---+
/ | \
/ | \
/ | \
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
01=|H o w T o - -| 02=| C o o k - - -| 03=| H u m a n s -|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
"Okay, yeah," Stu agreed. "I'd like that one explained. Say, is that a line from an old The Simpsons episode?" "Yes, How To Cook Humans was the title of a book," Zé confirmed. "Lisa found it on this alien spacecraft and accused them of planning to eat the family." Stu laughed. "I remember," he said. "They kept blowing dust off the cover so more words appear, changing the meaning. First For, then Forty, then For Forty." "That's right," Zé smiled. "Part of this demo shows the end result as an edit of the original tree shown above. Before I explain, here's a better image."
"That's a lot better," Stu approved. "I take it the top center node indexes bytes in three leaves?" "Yes, and How To Cook Humans is the eighteen byte string described by this btree," Zé confirmed. "It's broken into six, five, and seven byte pieces because this example assumes eight bytes is the maximum leaf size." "Why only eight bytes per leaf?" Stu asked. "Isn't that arbitrary? Is it to make tiny examples?" "Yes, of course," Zé nodded. "Real btrees might have much larger leaf nodes: say 512 or 4K bytes in size. But Wil said he wants eight byte capacity leaves in examples." "Why?" Stu wondered. "Four byte leaves would make even smaller sample tree figures." "Yes, but Wil said his btree algorithms for blobs need at least six elements in leaf nodes to show all cases," Zé explained. "That's so a half full leaf node can still have content members in spots corresponding to left, middle, and right in some kind of topological analysis." "Then why not six byte leaves?" Stu asked. "That would work, right?" "Yes, but a power of two leaf size is more realistic," Zé shrugged. "Don't ask me. Complain to Wil. I'm just the illustrator. See any arcs in there?" "I wondered when we'd get to arcs," Stu confessed. "I see three arrows. Are those arcs?" "Yes," Zé awarded a glowing smile. "Except each arrow is a figurative rendering of the three entries in the inner node used as the tree's root — each arc is a column." "I see four columns in the root node," Stu. "I guess the first is some header? Are the last three arcs? Are you saying each arc is actually a triple of three numbers?" "In this case, yes," Zé asserted. "The number of attributes in any given arc depends on what you need. Here we have three attributes: a readonly bit, a size in bytes, and a pointer to a child node. You can said name or ID instead of pointer here, since it might not be an address." "I'm trying to digest this," Stu admitted. "Um, what are examples of other attributes an arc might have? Is three the most you'd ever need for an arc?" "Oh lord no," Zé laughed. "You could have lots more attributes in one arc. For example, you can copy a generation number from a referenced node — you can see an example of this in the node demo, where each handle copies the node's gen. A handle is an arc to a node." pointer encoding « "Sounds like it would help catch graph corruption," Stu approved. "What else? Let me guess: an attribute that encodes what the pointer means?" "Good guess," Zé beamed. "Maybe you support more than one kind of pointer to a child node, so you'd need to know how a child location was specified. Say, pointer vs offset." "I was thinking of pointer swizzling at wikipedia," Stu admitted. "It's an old school technique to replace position independent refs with direct pointers." "That's a great example," Zé enthused. "As you move a disk-based btree in and of memory, you might want to swizzle the child node pointer. If done lazily, you'd need to know what format the pointer was in at any given time." "Did we just wander off in the weeds?" Stu asked. "Sounds like we're in Wil's coding territory now. What format is used by the child pointer in the diagram?" "Actually, you hit a main point of this demo: basic diagrams ignore child pointer encoding," Zé corrected. "There's no one correct way to do it." shared structure « "Please explain the encoding of child node pointers in the figure," Stu requested politely. "Is it pseudo Lisp?" "Kinda," Zé shrugged. "I can write Lisp syntax for the last figure if I use shared structure notation to name a value with an integer as shorthand. See writer for more on Lisp shared structure." "Can you show the figure one more time?" Stu requested. "So I can compare image to text notation? Thanks."
"Okay," Zé continued, "Let's use ordinary quoted string notation for the three leaf nodes." (#01="How To--" #02=" Cook---" #03=" Humans-")
"Uh," Stu winced. "That looks disturbing. I see a list of three strings. You labeled each one with an integer? So a Lisp reader replaces #01# later with "How To--"?" "That's exactly right," Zé agreed. "But I'm taking liberties with the scope of context for which the integers remain assigned to the physical addresses of objects." "Before we go on," Stu interrupted. "What's up with those "-" minus characters at the end of each leaf?" "Those pad each leaf string to eight bytes," Zé explained. "Dummy characters. You wouldn't need to do that in Lisp, but I'm simulating the figure exactly." "The figure shows diamands in, uh, unused leaf slots?" Stu guessed. "To show we don't care what's in those slots?" "Right," Zé nodded. "And in Lisp I'm using a minus — same thing. It doesn't matter what byte is used in extra slots when a parent node ignores bytes after any counted." graph as lisp « "Okay, now how would you write the entire graph in Lisp?" Stu asked. "Assume leaves as just defined." "Sure, here's one way," Zé said. "It's very short." #11=[tag 1 3 (0 6 #01#) (0 5 #02#) (0 7 #03#)]
"Wow, that is short," Stu marveled. "I see you use brackets as alternatives for parentheses. If Wil claimed that was the same thing as that image, I would balk." "But me you trust for some reason," Zé mused. "Note I had to pick some physical representation of the root node. I chose a list with metainfo, followed by triples for each arc." "But anything else would have worked?" Stu asked. "I could replace any part with arrays and it's just the same? How do I know that's really a btree for a scatter/gather string?" "Yeah, pick any consistent format you like," Zé agreed. "You know it's a blob btree because you store this value somewhere you expect this format." "What's an example?" Stu asked. "I realize you're now inventing context for the image. But how about a hint?" "You would define a blob object," Zé replied. "Or have yet another list that happens to reference that root." #91=(blob (refs 0) (len 18) (root #11#))
"I'm not getting this," Stu warned. "That looks like a list that says 'I'm a blob' followed by three attributes expressed as lists: a refcount, a length, and the root pointer." "No, you understood perfectly," Zé countered. "That's exactly what I had in mind. Okay, the real answer to your question is this: store the root pointer wherever you need to put the value of a blob. But you must know the format of your root before you go use it." "What are the refcount and total content length for?" asked Stu. "Are those just plausible details? And why didn't you write all the information like this." (define my-blob
'(blob (refs 0) (len 18)
(root
[tag 1 3
(0 6 "How To--") (0 5 " Cook---") (0 7 " Humans-")]
)
)
)
"That's an equivalent way to define a blob with the same root," Zé observed. "But when you don't label leaf strings with integer names, you can't share with another blob. And you'd also lose explicit arc name visibility for the figure." "Oooh," Stu considered. "You want the same leaf strings to appear in other blobs? So graphs overlap?" "Yes, which I'll show in the right column later," nodded Zé. "And you're right: the refcount and total content size are just plausible metadata you might want to keep in blobs." "What's a refcount for?" asked Stu. "And what if total content size doesn't match sum of arc sizes in a root?" "The refcount can be used to support friendly immutable semantics when blobs share structure," Zé explained. "If a new blob versions an old one, it makes sense for an original to become immutable by convention when refs from other blobs are nonzero. Shared content should not change." "Okay," Stu held up a hand. "A bookkeeping gimmick. I get it. And what do you do when content size is wrong?" "No one forces you to store a redundant copy of content size," Zé observed. "But you might want to have a copy in hand when the root is on disk. Also, if the size was ever observably wrong, you'd have feedback on correctness." "Oh," Stu considered. "You're saying if I don't store a copy of size, it's hard to notice if a root is garbled." "That's part of it," Zé nodded. "But let's get this demo back on track. What does this have to do with arcs? You're jumping ahead to details that belong in the blob demo." "Can we start again at the top of the right column," Stu pleaded. "Can you show me C or C++ formats for the figure's image? Then I can see the arcs." "We can do C++ if you like," Zé shrugged. "But then you'll be tempted to assume arcs can only look exactly like the first one I show you. I also want you to remember arcs look like anything you want, as long as you can reach referenced child nodes with any information you need to use them." (continued top of column right)
layers
«
"Hey, cool," Stu chimed. "This looks like a perspective view of the last figure at bottom column right."
"Yes, it is," Zé confirmed. "I separated the old immutable tree and new mutable tree into two layers: blue and red. The higher level red layer has pointers into the blue layer where they share structure: nodes #01# and #03#. I didn't show pointers from red layer to blue layer because it cluttered the image." "I wish that figure also included the two blobs pointing at their respective roots," Stu whined. "Hey, I just realized you have a problem with your original blob representation." "What's that?" Zé asked. "Arcs between blobs?" "Yes!" Stu almost shouted. "For a blob refcount to later drop when another blob stops using it, the derived blob should have a pointer, or arc, to the base blob. Am I right?" "Oops," Zé grinned. "I guess the example was shy some detail. Okay, let me take another shot at both blobs." #91=(blob (base '()) (height 1) (1 18 #11#))
#92=(blob (base #91#) (height 2) (0 28 #21#))
"Hmm, yeah," Stu appreciated. "I see you added the height of the root. Say, wouldn't that let you put a leaf node in place as the root? If one leaf was enough for the entire thing?" "Yes," Zé agreed. "Now that would work. Once height indicates format of a blob's root, the root can be a leaf." "And now that blob #91# is shown as base for #92#, when #92# goes away we can decrement #91#'s refcount," Stu guessed. "Which ... now appears at the first number in the last triple?" "That's what I had in mind," Zé confirmed. "I can see wheels turning in your mind. Getting some ideas?" "Just worrying about memory management," Stu said. "That very simple refcounting assumes you can blithely keep all of a base btree alive, even if no one else is sharing much of it." Zé scratched an ear. "It's not an arc topic," he said. "Not directly. But I see what you mean about granularity issues in keeping all of base trees alive when someone refers to just one part of it." "What's the answer?" Stu asked. "Pretend working with arcs correctly also means you must avoid gross errors in fragmentation caused by sharing. Any advice?" "It's a longevity and planning problem," Zé shrugged. "First, perhaps versioned objects simply don't live very long, so any waste matters little because it's a transient blip." "I see," Stu mused. "If I handle a server request in passing, which temporarily edits a large object, my main concern is not copying the entire thing just to effect a small change. Since old and new versions are both released right away, it doesn't matter if the new edited version keeps the old one alive slightly longer, in its entirety." "Exactly," Zé nodded. "Think of cost of waste as a volume multiplied by time problem. Smaller times forgive larger volumes. But if you save drafts of docs in a database a long time, you might care whether a tree's base is hardly shared by new versions." Stu nodded slowly. "Okay," he said. "If I'm designing a long term database, I should work out the cost consequences of design choices myself. I get it: my needs drive my blob designs."
license
«
All this code is available only under the BriarPig mu-babel license described fully on the rights page. You do not have permission to reprint this page in any way. Neither feeds nor repackaging is allowed. You can link this page if you want folks to read it. |
A submenu for demos appears below, letting you
go to the page on a topic written as a demo (as the
demos page defines it).
menu
thorn: todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc « Þ, blob, tree, slice, rand, time, stat, hash, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype (mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile) Some demos are stubs: todo is a demo guide. See toy for mu updates on language pages; names introduces naming schemes.
tuple
«
"Each arc is actually a tuple — a collection of values — and not just one value," Zé claimed. "If you think an arc is just a pointer, you were likely thinking of just one attribute." "Excuse me," Stu interrupted. "We already discussed that in the middle of the left column. Let's segue into an example of arcs in C or C++, where an arc has multiple attributes." "You're kinda pushy today," Zé noted. "What's up?" "The vagueness of this demo makes me uncomfortable," Stu admitted. "The example Lisp syntax gave me the jitters. I need C to calm me down. I won't get dependent, I swear." Zé shot his cuffs and lifted a warning eyebrow. "Anytime you want to take over writing this demo, just let me know."
structs
«
"Okay, so next let's look at some plausible looking C++ objects for the figure shown column left," Zé introduced. struct leaf_node8 { // leaf holding at most 8 bytes «
enum Ncapacity { e_nmax = 8 }; // eight bytes max
u8 n_u8[e_nmax]; // content in leaf
};
"Alright, if we need a leaf with at most eight byte capacity, I see leaf_node8 works fine," Stu studied. "I guess the name means: leaf node of eight bytes? Wait, I see you don't care." "Obsess about names much?" Zé asked. "Let's do an arc format next. This one is about as small as you can get for an in-memory btree arc. If you use actual pointers, that is." struct inner_arc { «
u32 a_readonly : 1; // whether child is immutable
u32 a_size : 31; // count bytes reachable via child
void* a_child; // either inner_node3 or leaf_node8
};
"Say, you never told me what bit a_readonly means," Stu complained. "Is it for copy-on-write support?" "Very good," Zé applauded. "It's for copy-on-write. When you make new blob B that versions an original blob A, the root of new blob B makes a copy of A's root, setting the readonly bit to one for each arc, so none of A's original children can be modified." "Hmm," Stu rubbed his chin and stared upward, thinking. "Then each time you want to update a readonly node of any sort, you just make a new copy. Newly copied inner nodes then get each readonly bit set to one. How am I doing?" "Perfect," Zé said. "That's how you do it. The only catch is subtle: any node is readonly if you passed through an arc with the readonly bit set on the way down the tree." "Why is that bit part of the arc?" Stu asked. "Thank you for asking an arc question," Zé praised. "Because that bit explains how the arc is used. It says: pass through this arc and everything you see is immutable." size « "Cool," Stu admired. "And what does a_size mean exactly? Does it also modify an arc's meaning?" "Field a_size sums all used bytes in leaf nodes reached by that arc," Zé defined. "So it tells you how many bytes of content are found under an arc. When pointing to a leaf, a_size says how many leaf bytes are used." "Why is a_size only 31 bits?" Stu asked. "Only two gigs?" Zé rolled his eyes. "I have no problem with you changing the arc format so a_size has more bits. Go as big as you like. But bigger means fewer can be put in a node of given size." "I guess a_child points to a leaf or inner node," Stu said. "But how do you know which one? Shouldn't that be another attribute of the arc? Saying to what a_child points?" height « "That's a really good question," Zé said. "The answer appears in the format of the inner node below." struct inner_node3 { «
u32 n_tag; // designating inner_node3 type
u8 n_height; // height of this node above leaves
u8 n_gen; // pseudo rand gen (arc must agree)
u16 n_used; // count of n_arcs[] slots now used
enum Ncapacity { e_nmax = 3 }; // max of 3 arcs
inner_arc n_arcs[e_nmax]; // name and describe children
};
"Okay, the answer to your question is n_height," Zé said. "It's the number of levels this node is above leaf height; leaves are defined to have zero height." "If n_height explains what each arc means, why is n_height not an attribute in the arcs we're using?" Stu asked. "What does that look in your eye mean? You have a trick up your sleeve." "I never said every attribute of an arc was stored in one place along with all other attributes of an arc," Zé hinted. "Oh," Stu rolled his eyes. "You mean n_height is actually an attribute shared by all the arcs in that node?" "Yeah, basically," Zé nodded. "You could store a separate height value in each arc, but since it would be the same in every arc in one node, that would be inefficient." Stu grimaced. "Saying n_height is an attribute of every arc in the node sounds academic and arbitrary," he complained. "Why not just say it's a node attribute and be done?" "Sure, you could," Zé shrugged. "But it really does change the interpretation of each arc. And it's not completely academic, because sometimes you have good reason to physically separate attributes you consider part of one arc. This is an example." gen « "You added a n_gen number to the inner node format," Stu observed. "For it to be useful, should we add a generation number to each arc? So arcs copy the n_gen of a child node?" "Yes, that would be a good idea," Zé agreed. "But I didn't add such a gen to each arc, illustrating another idea: you needn't code every notion that occurs to you the first time. I had a free byte in the node format, so I assigned it for future use." tag « "What does field n_tag mean?" Stu asked. "Is it a magic value saying 'I'm an inner node' for dynamic typing?" "Yes, and I didn't specify the magic constant because I don't care," Zé confirmed. "In some systems you want nodes to be self-identifying, in case you iterate over all nodes in a physical medium to study — say for damage repair." "Hmm," Stu considered. "But leaf nodes don't have a tag, so they're not self identifying?" "Depends," Zé replied. "Here leaf nodes have no tag, but an in-memory graph might actually have a tag associated with each memory box. Disk-based graphs are more problematic: it might not be feasible for content nodes to have tags. You might use btrees to index content in whatever original format something used." used « "You're giving me a headache," Stu pretended. "What about the n_used field — is that the number of used arcs in the n_arcs vector? And why do inner nodes have exactly three arcs?" "That's right, its the number of used arcs," Zé nodded. "This example uses three arcs per inner node for simpler examples: more arcs per node would make it harder to construct trees needing nodes at greater height, like one shown next section." "Is three the minimum number of arcs," Stu asked. "Um, yes," Zé confirmed after a pause to think. "If you aim to keep each node at least half full, you want each partially full node to have at least two children. Tree roots are the only exception: they can have any number of children, but only one would be degenerate since a one child root could be popped." "Whoa," Stu held up a hand. "I'll read about btree rules in the tree demo or blob demo. What would happen if a non-root node could have only one child?" "Then you'd be able to construct a degenerate linked list of nodes," Zé warned. "With linear instead of log time performance. You need at least two children per node to get a binary tree. A higher minimum number of children is better, though." capacity « "You mean," Stu suggested, "if I use something like 4K pages for inner nodes, I can pack them full of many arcs?" "Yeah," Zé affirmed. "A lot of arcs per node is good. For example, what if a half full node contained at least 512 arcs. How many leaves would be in a tree of height three?" "Let's see," Stu calculated, looking upward. "The root would have at least 512 children at height two. Each of those would have at least 512 children at height one. Then at least 512 leaves for each of those at height zero. So that's 512, cubed, minimum. That's 2^27 leaves because 512 is 2^9. If each leaf contained 4K bytes, say, the tree would cover 2^39 bytes at least. Half a terabyte?" "Yes, at minimum node occupancy," Zé agreed. "Typically occupancy would be more, of course. So in practice, with a btree of large enough fanout, you'd typically never represent a stream using more than three layers of inner nodes." "In this case," Stu guessed, "log time actually sounds like constant time, when a tree is so broad and flat. I guess the root would normally always be in memory. So you'd only navigate a few steps to reach a leaf. Because of geometric arcs." Zé nodded absent mindedly. "Did you notice what percentage of total space is consumed by inner nodes?" he asked. "Well, there's approximately 512 leaf nodes minimum per every inner node," Stu said. "I guess inner nodes are much less than 1% of space when leaf and inner nodes are those sizes." "In which case," Zé remarked, "you'd never have a reason to optimize for size because it wouldn't matter: half full inner nodes is fine. Why keep inner nodes two thirds full?" "But it would be worthwhile to keep leaf nodes at least two thirds full," Stu said. "Otherwise space might be wasted by low leaf node occupancy. Are we still talking about arcs?"
copy-on-write
«
"Sure, we're still talking about arcs," Zé replied. "But let's look at copy-on-write now. Did you notice my inner node format avoided sideways pointers to sibling nodes?" "Uh, yeah," Stu squinted in thought. "But so what? Would they interfere with copy-on-write? And why use sibling pointers anyway, with no reason to load balance siblings?" "You say that now," Zé chuckled. "You might have wanted to load balance inner node siblings to make them two thirds full, until you noticed inner nodes use negligible total space." "Well, yeah," Stu shuffled his feet. "Except in your example graph with a fanout of only three arcs per inner node." "Yes," Zé dismissed with a wave. "As you suggested, yes, inner node sibling pointers would indeed interfere with copy-on-write, because then a graph would be too strongly connected. If you tried to replace one immutable inner node at some height in a tree, what would happen? If siblings at a height form a linked list?" "Oh!" Stu slapped his forehead. "You'd replace every inner node at one height if they all pointed to one another. Yeah, I see why you don't want siblings to know about each other." "Exactly," Zé said. "The general rule is simple: only parent nodes know about child nodes. For copy-on-write you want the complete absence of arc path cycles. Any immutable parent node you want to modify, you simply replace with a new copy." "But replacing any parent node implies the grandparent node must also be modified," Stu noted. "Otherwise the grandparent would not point to a new replacement." "That's right," Zé agreed. "As a consequence, any change in a fully immutable tree replaces every node on the path from the root to the lowest modified node. In analogy to 'const propagation' in C++ you might call this 'mutable arc' propagation in trees." "I feel an example coming," Stu said in a hushed whisper. "Yes, let's go back to the first tree and make a new copy," Zé suggested. "With a mutable root, written like this:" #11=[tag 1 3 (0 6 #01#) (0 5 #02#) (0 7 #03#)]
#12=[tag 1 3 (1 6 #01#) (1 5 #02#) (1 7 #03#)]
"The new #12# root looks identical to old #11#, except the readonly bits are now one," Stu noticed. "Yes," Zé said. "And then let's insert " For" at the end of " Cook" in immutable node #02#. First we copy old immutable #02# to new mutable #04#, then perform the insertion." #12=[tag 1 3 (1 6 #01#) (0 5 #04#) (1 7 #03#)]
"It's too big to fit inside one eight byte node," Stu objected. "Does that mean #04# splits in two? With overflow in new sibling #05#? Why not load balance with left sibling #01#?" "Oh, you're right," Zé considered. "So to match the example I want to show, we might need to insert all the final " For Forty" ten bytes, which can't fit the overflow into #01#." "Okay," Stu calculated. "Then say we get " Cook Fo" in #04# and "r Forty-" in #05#. But root #12# is already full — how do we insert new #05#? Does root #12# overflow?" "Yes, inserting (0 7 #05#) inside #12# makes it split in two, load sharing with new #13#. Which means #12# can no longer be the root — so we add a new root #21# with arcs to #12# and #13#, as shown in the following ascii figure." +---+---+---+---+
|tag|r=0|r=0| - |
21=|h=2| 14| 14| - |
|u=2|#12|#13| - |
+---+---+---+---+
/ \
/ \
+---+---+---+---+ +---+---+---+---+
|tag|r=1|r=0| - | |tag|r=0|r=1| - |
12=|h=1| 6 | 8 | - | 13=|h=1| 7 | 7 | - |
|u=2|#01|#04| - | |u=2|#05|#03| - |
+---+---+---+---+ +---+---+---+---+
/ | / |
/ | / |
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ |
01=|H o w T o - -| | 05=|r F o r t y -| |
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
04=| C o o k F o| 03=| H u m a n s -|
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
"But you should show a nice rendering of that too," Stu suggested. "Be a good sport. It's in a good cause." "Okay, here it is," Zé gestured. "Note the use of red tint in new mutable nodes. The old immutable nodes shared with the original tree have a blue tint. Two immutable leaves remain."
"If you look closely, you can see this illustrates an interesting property of btrees," Zé commented. "A new root, when the root changes, always coincides to a new height for a tree." "Hey, that's right," Stu said. "Old root #12# remained at height one, while new root #21# entered at height two. Btree nodes never change height? That would interfere with copy-on-write, wouldn't it?" "Correct," Zé affirmed. "Btree nodes never change height. New trees can version old ones, and old node heights remain perfect as is. In this sense, arcs are context insensitive."
alternatives
«
"I think I'm starting to get the hang of this arc business," Stu said. "Once I start thinking in terms of arcs, I get new ideas. For example, now I think of each blob as actually holding an arc to the root." "That's the idea," Zé encouraged. "But instead of one bit for a_readonly, I think you want a count of refs from other btrees. Think of it as swapping a one bit boolean for a zero or nonzero boolean." "Uh," Stu grunted. "I guess a different arc format is needed in different places. In a blob, we'd want to fold in height as an arc attribute, and swap bit a_readonly for a count a_refs attribute." "Yes," Zé agreed. "This sort of thing leads to an issue in categories: is my arc still the 'same' after I change the attributes? How much can I change arcs and still say they're still the same arcs, just in new formats?" "Please tell me you'd don't have an example," Stu pleaded. "Sorry, if I don't finish prepping you for the blob demo, I won't be doing my job," Zé apologized. "I already showed you earlier how height could be factored out of each arc. You can also refactor arcs further in the name of efficiency. Then arcs might not make sense in isolation." "Well, that's always true," Stu observed. "No matter how you encode an arc's pointer, decorating with other attributes saying what it means, you must still interpret correctly in practice." "Sure, but now let me give you an example that might make you feel queasy — it made me queasy," Zé warned. "It's another way to write the a_size field in each arc, counting leaf bytes reachable by arc. I can make the size cumulative. Here, let's rewrite this node:" #11=[tag 1 3 (0 6 #01#) (0 5 #02#) (0 7 #03#)]
"Above you see the first version of a root node," Zé pointed. "But below it's redone so each a_size includes the size of each earlier arc in addition to bytes for each arc alone. See any pros and cons?" #11=[tag 1 3 (0 6 #01#) (0 11 #02#) (0 18 #03#)]
"Uh, except the first, none of the arcs says the actual size of its child," Stu noted. "So each child size can only be determined by subtracting the a_size field of the previous arc. The downside is no arc can be understood out of context — you need a predecessor arc to see the diff." "Yes, that's the bad part," Zé agreed. "But there are benefits. For example, a_size of the last arc in a root is the total length of the entire blob. Can you think of another benefit? What if a node had one thousand arcs: to skip past a megabyte of content, you would have to add the size of every arc when using the original format with absolute child sizes." "Oh," Stu blurted. "Since the new format reflects size of every earlier child, I can use binary search when skipping a megabyte. The new relative arc size format supports log time seeks? But the original absolute arc size format is linear? Ick, must we accept complexity in the name of speed?" Zé sighed. "In this case, yes: the second relative size format is faster for reading. When you write, inserting or removing arcs in the middle of a node would require adding or subtracting from size in every arc downstream in that node. Reading gets faster, writing gets slower." "Can you talk Wil into using the simpler absolute size format in the blob demo?" Stu begged. "Relative size stuff makes me queasy." "Told you so," Zé smirked. "I'll see what I can do." |
demos « Þ
+ todo + names + fd + iovec + assert + log + run + hex + crc + buf + in + out + quote + escape + compare + file + deck + cow + arc « Þ + blob + tree + slice + rand + time + stat + hash + heap + node + primes + page + book + pile + stack + atomic + lock + mutex + thread + map + meter + list + iter + ctype |