Þ   briarpig  » mu  » box


problem

     box — an object allocated in garbage collected gc space is called a box, as first mentioned on the stack demo. A box is the equivalent of struct in C and C++, and in fact a struct is used to represent each box format. Living in gc space is what makes a struct a box: it's subject to garbage collection, aka gc. This page shows early drafts of box formats still under revision.

     (These boxes are used in toy programming language lathe: a lisp dialect plus smalltalk class system, using þ library using the mu-babel license, under mu which presents broad context including a main focus on usefulness to the author.)

     Dialogs below use a new character named Zé, whose role is explained on the design page. This is done in part to help put chaoticly incomplete aspects into better focus. More boxes are shown than needed or currently used, and ones currently used are subject to change.

cast «

     "I've always been interested in gc box formats," Stu said. "So I'd like to hang out while you show parts to Zé."

     "No problemo," Wil waved. Then as Zé entered holding sodas Wil asked, "Did you get me a coke?"

     "No," Zé smiled, "just one for Stu. Bet you never offered guests a drink before, right? Shame on you."

     Wil turned to Stu and stage-whispered behind a hand, "I'm actually the nurturing one. This is just an act."

     "Your Vaudeville show must have been called 'Ripping You a New One' then," Zé suggested. "Staying in practice?"

     "Can we throttle jokes a little?" Wil asked nicely.

     Zé shrugged: all the same to him. "Are these gc boxes like the ones you showed me — when was it? 1995?" he asked.

     "Similar, but completely different," Wil semi-joked. "A little like the ML inspired boxes, but not at all like the XLisp style nodes in my Ygg interpreter."

     "That the one like Scheme with Smalltalk classes added?" Stu asked. "Gonna show that one too?"

     "That's the one," Wil agreed. "But it's too warty. However, you can consider this a rewrite of Ygg, but about three generations after. Maybe four. They pile up. It was called Mithril when I was at Netscape."

     Zé mocked Wil's excess chatter with a one-handed yak-yak-yak gesture. "What happened to Mithril?" he asked.

     "Got distracted by too much Smalltalk and too much C++ efficiency," Wil looked up, trying to remember.

     "Did you use a Y code prefix in Ygg?" Stu guessed.

     "That's right!" Zé realized. Then to Wil, "So thorn is a little Yggdrasil? And you kept a small y?"

     "Well," Wil looked sheepish. "Let's look at boxes."

meat «

     "Of course, you need to read peg and tag along with this stuff," Wil explained. "Virtually at the same time, unfortunately. All three depend on each other. And imm showing immediate values encoded using the yp peg api."

     "A yt tag precedes each box in memory?" Stu recalled.

     "Yes," Wil confirmed. "And each tag is four bytes — a 32-bit strong dynamic type tag describing primitive class, size, format, and metainfo bits."

     "Are boxes four byte aligned?" Zé guessed. "I guess eight byte alignment is another good choice: giving you three primary tag bits in pointers instead of only two."

     "Yes, just four byte aligned," Wil nodded. "More alignment would make things bigger, wasting space. I'll save eight byte alignment for a later 64 bit version."

     "Why are boxes aligned?" Stu wondered.

     "Several reasons," Wil ticked off his fingers. "Might as well since most C structs are multiples of four anyway; necessary to tell immediate values from box pointers; and aligned data is faster to execute most of the time."

     "Code used to crash when accessing data at non-aligned addresses," Zé noted. "But now Intel accesses data of any size at any address — it's just slower if not aligned."

     "How do boxes get aligned?" Stu asked. "How can you make sure? Allocate them using the book api?"

     "Yes, exactly," Wil smiled. "Books and pages have much better than four byte alignment. So it's only necessary to allocate space in multiples of four bytes."

     "And every box has a yt tag before it?" Stu asked. "Which are also four bytes in size? How do you ensure a tag precedes each box? You just do it?"

     "Yes, every box has a tag in the four bytes immediately preceding its starting address," Wil confirmed. "It's the contract — an invariant that must be true, no exceptions."

     "What did they call tags in ML source code," Zé asked.

     "Descriptors," Wil replied. "Descriptor means tag in more letters and syllables. In early 90's ML, every descriptor had 0x10 in the low two bits, so you could see the start of a box in some places. But I don't do that."

     "What's that ybox class I see there?" Zé pointed.

ybox «

     "This ybox class represents what is known about all boxes," Wil explained. "Mainly two things: a box must be four byte aligned and is preceded by a yt tag."

     "Sounds familiar," Zé joked. "Where did I hear that before?"

     Wil ignored him. "Note there's no state: no members are declared. Just a static method testing alignment, and uh, a method getting the tag. Which kinda looks like state, because it is. But using an unconventional negative index to reach back before the box."

struct ybox { // opaque type NOMINALLY base of box structs « typedef ptrdiff_t p_t; // ptr type: suitable int for masks «

// good: 4 byte aligned (nil is permitted for ybh::hgood()); // b is a box ptr; don't check for nil: peg boxes never nil; // return true if low two bits off p are zero (u32 aligned): static bool goodb(const void* b) { // 4-byte aligned? « return 0 == (3 & (p_t) b); // zero in low two bits } static void badb(const void* b, const char* where);

// goodn: length is 4-byte aligned? static bool goodn(n32 len) { return 0 == (3 & len); } « static void badn(n32 len, const char* where); // not goodn // enum ebase { e_box = 0 };

// b must be a 4-byte-aligned box addr in gc format memory; // return four bytes before any 4-byte-aligned box as tag static const yt& btag(const void* b) { « return ((const yt*) b)[-1]; } }; // struct ybox

     "Wil, that btag() method is disturbing," Stu squirmed.

     "Tough, it works," Wil shrugged. "If it stops working — won't happen — I'll change it to something else."

     "What if pointer b is a nil pointer?" Zé asked.

     "That's a good question," Wil beamed. "Obviously, don't use nil as a box address. Pegs stored in gc space never contain a nil value. Pegs are never zero: not a valid peg value."

     "What if it happens anyway?" Stu wondered.

     "Treated like memory corruption: an assert or an exception, or something," Wil explained. "Or maybe a crash. The idea is: I don't want Lathe to bother checking for nil."

     "How is language level nil represented?" Stu asked.

     "That's an immediate value, yp::i_nil, shown now in the peg api and later in imm," Wil replied.

     "I see it's equal to constant 2," Zé noticed. "Immediates have at least one nonzero bit in the low two bits?"

ybox.cpp «

     "That's right," Wil nodded. "But format of immediates won't figure in the rest of the box material. Let's look a two error logging methods for ybox."

void ybox::badb(const void* b, const char* where) { « // on false goodb ylog(1, "badb(b=%#lx, at=%s) box is NOT 4-byte ALIGNED\n", (long) b, where); }

void ybox::badn(n32 len, const char* where) { « // len NOT multiple of 4 ylog(1, "badn(len=%#lx, at=%s) len is NOT 4-byte ALIGNED\n", (long) len, where); }

     "These will never get called unless someone screws up," Wil explained. "So they're not really interesting."

     "Do all boxes subclass ybox?" Zé asked. "Some of them? None? There's not much api to inherit."

     "None of the box types inherit from ybox," Wil said. "Just about all of them define the btag() method again themselves. I like that better than inheriting btag()."

     "Why not?" Zé agreed. "Inlines are cheap."

     "Exactly," Wil chimed. "Let's look at a box now. Let's start with one used to track space for box allocation."

ytocb «

     "Let me guess: TOC stands for table of contents," Zé anticipated. "Reminds me of Bento a little."

     "Bingo," Wil pointed. "Box ytocb is the first box allocated in every book of a gc address space."

     "To say how much book space is used?" Stu asked.

     "Correct," Wil nodded. "The run named b_v describes space at book's end not yet allocated. To allocate N bytes, where N must be a multiple of four, you just call b_v.vtake(N)."

     "Oh my god," Stu blurted. "You just showed me how boxes are allocated. It's trivial pointer bumping."

     Wil scratched his ear. "Yes, except when vtake() returns nil, it means b_v has insufficient space, and a new book must be allocated — happening seldom when boxes are small."

     "Could that cause gc?" Zé hazarded.

     "I don't want to get into garbage collection early," Wil pleaded. "But I think the answer is no. I'd rather gc happens at well defined times, not at any random box alloc."

// ytocb is expected as first box put into box allocated space struct ytocb { // table of contents box « yv b_v; // yv describes unused space (in book) «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_toc, e_bt = (sizeof(yv)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; }

ytocb(yv const& v) : b_v(v) { } « ytocb() : b_v() { } };

     "Hey, Wil," Zé interrupted. "What's with the b suffix on the end of the class name? Does it mean box?"

     Wil confirmed quickly, "Yes, every box ends with b for box, and methods and members typically also begin with b for box. This should help clarify when gc memory is used in C++."

     "Didn't you also use b for buf?" Stu remembered, rubbing his chin. "Won't that cause some confusion?"

     "Yes, but yb buffers and out stream ybo don't appear often," Wil rationalized. "And it's hard to confuse them with boxes: one's too short and the other doesn't end in b."

     "I think you just have a fetish for b and p because they together they make þ for briarpig," Zé teased.

     Wil shook his head in exasperation. "Whatever."

     "It looks like constant e_bt is the value used by every ytocb box as it's tag," Zé changed subjects. "What's with the 16 bit shift shown? And why sizeof(yv)?"

     "The high 16 bits of every tag is a length field," Wil explained. "For a vector box of N parts, the length is the value of N. Constant sized boxes like ytocb put their size in bytes in the length field instead. Stu, you want to know why?"

     "Why put constant sizes in the length field, Wil?" Stu asked gamely at Wil's prompt. "Isn't it redundant?"

     "Yes! In constant sized boxes, size in the length field is redundant," Wil gestured excitedly. "It can be wrong: I can assert the expected size is present to detect corruption."

     "Detecting corruption sounds so exciting," Zé mocked.

     "You know what I mean," Wil smiled. "Notice variations from expected values as soon as possible to see when the runtime goes foul. Kinda like a soft SEGV."

     "Why expect a runtime to go foul?" Stu wondered.

     Wil shrugged. "It usually does. I just want to catch it faster so Darwinian selection filters more bugs sooner."

     "You're a pessimist," Stu accused.

     Wil made a wry expression. "Do ya think??"

     Zé raised a finger. "You should use a longer value than just size if that's your goal," he warned. "Maximizing grey distance might catch more corruption."

     Wil raised his hands. "I know there are other options," he said. "This one fits in a pick-one-and-go-on category."

     Zé dry-washed his hands like a mad scientist. "We can make a list of all options and carefully measure what works best."

     "Moving right along," Wil said to Stu, "The following struct defines both the tag and ytocb box together."

tytocb «

     "Wil Munny: license to overkill," Zé channeled Dex. "The name's only one letter different? You prefixed t for tag on the front, turning ytocb into tytocb."

     "Yeah, it seemed fitting," Wil said, "Since state concatenates a tag and a ytocb box, the name does too."

struct tytocb { // format for ytocb + preceding tag metainfo « yt t_toc; // ytocb::e_bt « ytocb b_toc; // box for toc «

tytocb(yv const& v) : t_toc(ytocb::e_bt), b_toc(v) { } « tytocb() : t_toc(ytocb::e_bt) { }

yv* bv() { return &b_toc.b_v; } // ptr to yv in toc box « }; // struct tytocb: used as header for every book of boxes

     "That's not really necessary, is it?" Stu wondered. "Oh, both the members have nearly the same name, but one has t for tag and the other b for box?"

     "Yeah, that's how the names go," Wil agreed. "No, it's not necessary. But it makes it somewhat easier to envision bytes allocated at the start of each book. This way is explicit: sizeof(tytocb) bytes are allocated."

     "Something tells me this is just a warmup for a more complex case," Zé guessed. "Do you have a hairy example later?"

     "Yes," Wil rubbed one eye sheepishly. "I do the same thing with symbol boxes, but each is preceded by a hash box. So one struct joins all four parts like this: t+b+_+t+b."

     "That must be a bitchin' name," Zé teased. "I thought you didn't like complexity? What's up?"

     "I wanted symbols to look physically identical to strings," Wil explained. "But I also wanted each symbol to cache a string hash for efficiency. Two boxes were simpler than writing complex code making two different formats look like one type."

     "You guys keep jumping ahead," Stu whined. "It's disturbing. Can we look at the symbol box next?"

     "Let's start with something simpler," Wil suggested. "Since Lathe is a Lisp, it should have pairs for cons cells, containing two pegs. But first, here's a box with one peg only."

yp1b «

     "The name yp1b means one-peg box?" Zé guessed. "A box with one peg in it? A pointer to something else."

     "Or an immediate value," Wil reminded. "Member b_p can hold any valid peg value, including immediate literals."

     "But what good is it?" Stu asked. "Anyone who wanted to use a box like this could instead point directly to the value desired. Right? What's the point?"

struct yp1b { « yp b_p; // one peg (refers to an imm or a box) «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_p1, e_bt = (sizeof(yp)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } }; // struct yp1b refers to exactly one imm or one boxed value

     "Indirection," Zé suggested. "Two parties can share a value whose content can be updated in one place. It's like a handle if a peg is like a pointer."

     "Isn't that the same thing as a vector of length one?" Stu asked. "Does this box have an advantage? Is it smaller?"

     "You're right," Wil conceded. "It's virtually the same as an array of length one. And it's the same size; there's no advantage. I don't actually care what it's used for. I just wanted to define one of every plausible primitive box format."

typ1b «

     "Looks like typ1b is another one of your combined tag and box deals," Zé noticed. "Whoa, check the comments about sentinel use in non-gc space. What does that mean?"

     "I don't remember," Wil chewed his lip.

     "You don't remember?" Zé asked incredulously.

     "Happens now and then," Wil shrugged. "Apparently I planned to use boxes in non-gc space — maybe as roots — and this would be used to mark some edge or another."

// typ1b first used as dummy box in non-gc land, as sentinel struct typ1b { // format: alloc yp1b + preceding tag metainfo « yt t_p1; // yp1b::e_bt « yp1b b_p1; // box for p1 «

typ1b() : t_p1(yp1b::e_bt) { } «

yp1b* bbox() { return &b_p1; } // ptr to box: box of 1 ptr « }; // struct typ1b: single boxed ptr (as non-gc sentinel box)

     "Don't all boxes have to live in gc-space?" Stu asked. "I thought all boxes had to be allocated in a book, and be garbage collected. What's the exception?"

     "Any exception you like," Wil said. "Boxes outside gc space don't move, so they don't increase cost of gc — but you must treat them as roots in any gc space they reference."

     "That's ... really interesting," Zé considered. "Any and every gc space can refer to boxes that aren't garbage collected. You can put standard data structures in a common place where they get shared. That's cool. It makes gc cheaper?"

     "Yes," Wil smiled. "It seems an especially good idea for immutable data like symbol intern tables. Global symbols need never move, and don't intersect collected gc spaces, so all those spaces collect faster. Other types are similar."

     Zé stared into space, wheels turning in his head. Wil decided to go on before the next idea popped out.

     "Hey," Stu noticed. "What's the default value of the peg in both yp1b and typ1b?"

     "Nil," Wil replied. "The yp peg empty constructor sets the value to immediate yp::i_nil. The same is true of any box that declares a yp peg member, like ypairb below."

ypairb «

     "Box ypairb has two slots for the car and the cdr," Wil introduced. "If you only know C++, that's like first and second in std::pair. Lists are a primary use case."

     "Are you going to explain the origin of the names in terms of contents of the address and decrement registers in IBM assembler way back when?" Stu asked.

     "Absolutedly not," Wil said. "Besides, you just did. I'm not going to spend any time talking about Lisp names, now or later. I just use them."

     "In lists," Zé offered without prompting, "a list element is held in car while cdr refers to the next pair in the list, so cdr is just a next pair pointer."

     "Thanks for that very small lecture," Wil said. "Will you be giving a seminar on Lisp semantics later?"

     "Sorry," Zé smiled. "I love Lisp. When will your Scheme be done? Wait, I'm kidding! Don't get upset."

struct ypairb { // pair box: a Lisp cons cell (synonym: yp2b) « yp b_car; // same as yp2b::b_p[0]; « yp b_cdr; // same as yp2b::b_p[1]; «

const yt& btag() const { return ((const yt*) this)[-1]; } «

bool bescaped() const { // getter for 'escaped' bit « return ((const yt*) this)[-1].isescaped(); }

void p2escape(bool esc) { ((yt*) this)[-1].tescape(esc); } «

     "What's an 'escaped' pair?" Stu asked.

     "Currently I use that bit to mean a pair should use brackets instead of parentheses when printing," Wil replied.

     "That's pretty arbitrary," Zé observed. "But I suppose that's not going to be visible at the language level?"

     "Don't know yet," Wil shrugged. "But it seems a good idea to expose as little as possible in, say, a Scheme view of Lathe — at least when it's arbitrary hacks like that."

enum { e_bk = yt::k_p2, e_bt = ((sizeof(yp)*2)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].vkind() == (u32) e_bk; }

bool bt() const { return e_bt == ((const u32*) this)[-1]; }

     "Api above is standard box boilerplate," Wil pointed. "Constructors below let you init the slots."

ypairb() { } // both car and cdr are yp::i_nil « ypairb(yp car, yp cdr) : b_car(car), b_cdr(cdr) { }

static ypairb* notb(yp const& p); // log err, MUST ret zero static ypairb* mustb(yp const& p) { « ypairb* b = (ypairb*) p.u.p_box; return ((3 & (yp::p_t) b) == 0 && b->bkind())? b : notb(p); } static ypairb* mayb(yp const& p) { « ypairb* b = (ypairb*) p.u.p_box; return ((3 & (yp::p_t) b) == 0 && b->bkind())? b : 0; }

     "Wil," Zé began in exagerrated patience. "What do those methods mean? They look like dynamic casts."

     "That's exactly what they are," Wil nodded. "You call mayb() when it's okay for a peg not to have a pair as its value. But mustb() logs an error by calling notb() when the referenced box is not a pair. Zero means not a pair both ways though."

     "Why bother with a notb() method?" Stu asked.

     "To save inlined code size," Wil said. "The speed of the unexpected error case isn't of any concern. This way mustb() is able to log an error, but does so cheaply."

// list length (safe from cycles by virtue of max length x); // x is max length targeted; if x<0, x is (100*1024*1024); // if x is zero, blen() always returns 1 (positive length); // return length if x is more than length; else x or x+1 // (length of list or x, whichever is less) n32 blen(int x) const;

     "I see blen() returns list length," Zé began. "But I can't believe the way you handle cycles is just stopping after a zillion steps. Don't you know how to detect list cycles?"

     "Okay," Wil placated. "I knew you'd complain about that. I was in a hurry, and knew I wasn't going to call it on a list with cycles. Are you going to make me do the right thing?"

     "Absolutely, you slacker," Zé chided. "When we get to it; there's no emergency right this second."

// true if length is exactly equal to n, otherwise false: bool beqlen(n32 n) const { « return this->blen((int) (n+1)) == n; }

     "How does this beqlen() efficiently check exact list length equal to n," Stu asked. "Won't blen() iterate over all the pairs anyway? Is the argument to blen() involved?"

     "Yeah," Wil confirmed. "blen() doesn't look at more than n+1 pairs, so it won't spend linear time scanning the list beyond the point of interest."

// use blen(n+1) to see if list len is lt, eq, or gt n; // neg: blen(n+1)<n; zero: blen(n+1)==n; pos: blen(n+1)>n int bcmplen(n32 n) const; // cmp blen(n+1) to n: blen(n+1)-n

     "I get it," Stu said. "And if N is actual list length, then bcmplen() returns the correct sign for N-n, also without looking at too many pairs. No don't stop; let's go on."

// true iff b_car == car & blen(n+1) == n (ie bcmplen(n)==0) bool bmatch(yp const& car, n32 n) const;

// this bmatch() is easier to use when car is yyb* bool bmatch(ybox* car, n32 n) const;

     "Method bmatch() returns true if a list starts with a specific value and is exactly a length of interest," Wil said.

     "Oh, so like," Zé hypothesized, "you can ask, 'Is this a list of length three whose first element is symbol foo?' Like that? For simple pattern recognition?"

     "That's right," Wil confirmed. "And that yyb in the comment is the symbol box type. Sorry. It's short, though."

     "You abbreviated symbol to just a single y?" Zé asked, chewed his lip, then smiled. "I like it. Weird though."

// access pointer to the nth b_car, starting from zero; // returns a zero (nil) ptr when n >= length (ie blen(n+1)); // zero-based car element or nil; semantically (*this)[n] yp* bnth(n32 n) const; // &b_car of nth pair, zero-based yp* operator[](n32 n) const { return this->bnth(n); } «

     "Method bnth() skips n pairs and then returns the address of the b_car of the next pair," Wil explained. "So if n is zero, no pairs are skipped and b_car of the current pair is returned."

     "I suppose a null pointer is returned when the list is not long enough?" Zé guessed. "If n exceeds length?"

     "Exactly," Wil said. "And operator[] is the same."

// non-nil ptr to &b_car for cadr ONLY IF length == two yp* blen2cadr() const; // cadr ONLY IF length exactly two }; // struct ypairb can be safely cast back and forth to yp2b

     "Let's go over blen2cadr() when we see the code below," Wil suggested. "The comment is correct though."

     "I can imagine a lot more pair methods," Zé noted.

     "I know!" Wil sang. "I'll add them at need later."

ypairb.cpp «

     "These ypairb methods are non-inline code," Wil introduced. "Method notb() logs when a peg is not a pair."

ypairb* ypairb::notb(yp const& p) { « // log error & ALWAYS return zero=nil ypairb* b = (ypairb*) p.u.p_box; if ((3 & (yp::p_t) b) == 0) // box? yellf(__LINE__,__FILE__,"ypairb::notb(p=0x%lx) is %s", (long) b, b->btag().tname()); else // immed yellf(__LINE__,__FILE__,"ypairb::notb(p=0x%lx) is immed", (long) b); return 0; // nil pointer }

// true iff b_car == car and blen(n+1) == n (ie bcmplen(n)==0) bool ypairb::bmatch(yp const& car, n32 n) const { « return b_car == car && this->blen((int) (n+1)) == n; }

bool ypairb::bmatch(ybox* car, n32 n) const { « // easier to use when car is yyb* return b_car == car && this->blen((int) (n+1)) == n; }

     "As mentioned earlier," Wil pointed, "bmatch() is true if a list is exactly length n and starts with the value given."

// use blen(n+1) to see if length of list is lt, eq, or gt n // neg: blen(n+1)<n; zero: blen(n+1)==n; pos: blen(n+1)>n int ypairb::bcmplen(n32 n) const { « // compare blen(n+1) & n: return blen(n+1)-n return (int) this->blen((int) (n+1)) - (int) n; }

     "Code for bcmplen() is self explanatory once you know what blen() returns: the list length capped to the value passed or one more. So bcmplen() returns list length ordering."

// access pointer to the nth b_car, starting from zero; // returns a zero (nil) ptr when n >= length (ie blen(n+1)); // zero-based car element or nil; semantically (*this)[n] yp* ypairb::bnth(n32 n) const { « // return &b_car of nth pair starting from zero const ypairb* p = this; // next pair to visit (zeroeth elem) do { if (0 == n) // passed n elems already? zero more to skip? return (yp*) &p->b_car; // cast away const --n; // skip another elem p = ypairb::mayb(p->b_cdr); // non-nil if it's a pair } while(p); // cdr another pair? (nb: ignore dotted lists) return (yp*) 0; // list was not long enough: return nil }

     "Method blen2cadr() returns the second list element, but only if list length is exactly two," Wil summarized.

// non-nil ptr to &b_car for cadr ONLY IF length == two yp* ypairb::blen2cadr() const { « // return cadr ONLY IF length is exactly two const ypairb* p = this; // next pair to visit (first elem) yp* cadr = 0; n32 n = 0; do { if (++n == 2) // found second elem containing the cadr? cadr = (yp*) &p->b_car; // cast away const else if (n > 2) // more than two? return 0; // fail with nil when length not exactly two p = ypairb::mayb(p->b_cdr); // non-nil if it's a pair } while(p); // cdr another pair? (nb: ignore dotted lists) return cadr; }

     "I see we're coming up on that lame blen() that doesn't detect cycles," Zé noticed. "Let's look at the old version first. Then I'll rewrite it for you."

     "What's so big about detecting list cycles?" Stu asked.

     "If you didn't stop scanning, you'd have an infinite loop," Zé explained. "Wil avoids that by stopping eventually, but only after burning more cycles than necessary."

     "Note blen() returns at least one since length can't be zero when a pair passed as this is a first elem," Wil said.

// list length (safe from cycles by virtue of max length x); // x is max length targeted; if x<0, x is (100*1024*1024); // if x is zero, blen() always returns 1 (positive length); // return length if x is more than length; else x or x+1 // (length of list or x, whichever is less) n32 ypairb::blen(int x) const { « // return len of list or x, whichever is less n32 max = (x < 0)? (100 * 1024 * 1024) : (n32) x; n32 n = 0; // length must reach one: this instance is first const ypairb* p = this; // next pair in list to count do { // always perform loop at least once if (++n >= max) // length after this pair reaches the max? return n; p = ypairb::mayb(p->b_cdr); // non-nil if it's a pair } while(p); // cdr another pair? (nb: ignore dotted lists) return n; // reached end of list before max; this is length }

cycle detection «

     "How do you improve blen() as shown above with cycle detection?" Stu asked.

     "That's a classic interview question," Zé noted. "All you need is two cursors advancing through a list at different speeds. If the fast one sees the slow one, you have a cycle."

     "That sounds simple and complex, both," Stu whined.

     "If you're done beating me up for being lazy, let me take a crack at a second draft," Wil said. "If I see a cycle, I'll immediately return max+1 since actual length is 'infinite'."

n32 ypairb::blen(int x) const { // new code detects cycles « // return len of list or x, whichever is less n32 max = (x < 0)? (1024 * 1024 * 1024) : (n32) x; n32 n = 0; // length must reach one: this instance is first const ypairb* p = this; // next pair in list to count const ypairb* slow = p; // slower, half-speed cursor do { // always perform loop at least once if (++n >= max) // length after this pair reaches the max? return n; p = ypairb::mayb(p->b_cdr); // non-nil if it's a pair if (yunlikely(p && p == slow)) // cycle detected? return max + 1; // cycled list length is 'infinite' if (0 == (n & 1)) // length is even this time? slow = ypairb::mayb(slow->b_cdr); // advance half speed } while(p); // cdr another pair? (note: ignore dotted lists) return n; // reached end of list before max; this is length }

     "That looks better," Zé gloated. "Wait, you didn't test it? It's not done until it's tested."

     "You just saw me write it!" Wil objected. "No, I didn't test it. I promise to check it out later."

     "Won't it go slower?" Stu asked. "Is that good?"

     "Slightly slower," Wil agreed. "But it detects pathological states faster, saving cycles on upper bound paths. If you can't afford linear time code, you shouldn't use a list."

     "What about nutballs who run micro-benchmarks?" Zé reminded. "I bet it's the first thing Dex will do."

     "Why don't you knock some sense into him for me?" Wil asked. "Expand his perspective a little."

scan latency «

     "What's your plan to reduce list scan latency?" Zé asked.

     Stu's eyes boggled. "What are you talking about? Scanning for end of a list takes a long as it takes. Doesn't it?"

     But Wil knew what Zé meant. "That's not a format problem," he excused. "We're just doing box formats now."

     Zé drummed his fingers. "You were just going to let other green threads wait while blen() steps through ten million list pairs?" he demanded. "I know you better than that."

     "Green threads?" Stu was lost. "What green threads?"

     Wil turned to Stu and explained, "It's not fair to other flows of control to make them wait arbitrary amounts of time for more cycles. Their latency would suck."

     "Oh," Stu realized. "C++ uses native threads for that problem. There's no provision for pre-emption otherwise. Are we talking about pre-emptive scheduling?"

     "No," Wil corrected. "Zé means blen() should scan a list incrementally, no more than N pairs at once so time before it returns is under some reasonable upper bound."

     "Cooperative scheduling," Zé told Stu. "Long running primitive methods can return with a code saying, 'I'm not done so please call me again so I can make more progress.'"

     "When I do it with trampolines," Wil explained, "it will make C or C++ methods look really weird when continuation passing is simulated. Initial toy code should skip it."

     "I think it's fun," Zé enthused. "Do it first."

     "No," Wil glared. "It'll be too easy to write bugs. I'm thinking toy lathe code can be used to emit C or C++ trampoline code; I can rev that several times more cheaply."

     "Ohhhh," Zé smiled. "That sounds even more fun. Your mindful shtick is useful sometimes."

     "Are we done with boxes?" Stu wondered.

     "No, I just got distracted," Wil said. "Some gc support next."

gc linkage «

yforeb «

     "Okay, this yforeb box is a behind the scenes type for use in gc," Wil introduced. "It's basically the same thing as a forwarding pointer in early 90's ML."

     "Wait, wait, I get it," Zé ethused. "Every time you copy an old box during gc, you overwrite the old box with an instance of yforeb pointing to the new copy."

     "Ding, ding, ding," Wil congratulated. "After a collected box b is moved, the yforeb left behind tells you what new pointer replaces each future reference to b seen."

// yforeb is used to forward refs during garbage collection struct yforeb { « static const int kind = yt::k_fore; « yp b_forward; // where the box was moved «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_fore, e_bt = ((sizeof(yp)*1)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yforeb() { } // default yp sets value to yp::i_nil « };

     "Doesn't that imply a box can't be smaller than a yp pointer?" Stu asked. "Since a yforeb must fit inside any box?"

     "Yes it does," Wil agreed. "So any objects like an empty vector or string that might be legitimately be zero sized must actually be at least four bytes in size."

     "Except those degenerate cases should be immutable shared global instances," Zé observed, "since they can't be modified — having no state — they might as well be unique."

     "Wil," Stu prodded gently, "would it have killed you to name the box yforwardb instead? What's three more letters?"

     "Yes, it would have killed me," Wil said calmly. "My doctor said longer names can put me in prophylactic shock."

     "Your sarcasm is stale," Stu criticized.

ybackb «

     "Uh, could a ybackb box work like the yforeb forwarding pointer above?" Stu asked. "But why a backward reference?"

     "Not quite," Wil said. "Early 90's ML had a backward reference to the start of a containing gc box object. This allowed you to point into the middle of a gc box."

     "Oh that's right," Zé remembered. "Code continuations were done that way for code stored inside strings. Code entry point targets for return addresses were preceded by back refs."

     "Yes, it effectively allows you to pretend the middle of a string is the start of a box," Wil said. "But during gc you need to figure this out and find the real box start exactly."

// ybackb embeds non-peg boxes in other non-peg boxes struct ybackb { « static const int kind = yt::k_back; « i32 b_offset; // offset to outer box holding this inner box «

ybackb() : b_offset(0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_back, e_bt = ((sizeof(i32)*1)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

     "Is that how you're going to use ybackb too," Stu asked. "The same way ML did? Any reason not do do so?"

     "I don't know yet," Wil shrugged. "I'll figure it out later and make any changes to ybackb that seem helpful."

objects and classes «

     Material from this column describing yobjb object boxes and yclassb class boxes has been moved to class where

a Smalltalk style object system can be treated in one place. However, so this page includes at least summary material on every box format, abbreviated hash and symbol box api appears below. However, few methods shown here — only those in common with most box formats. Please see class for the full api.

yobjb «

     The following code omits comments; see yobjb on the class page for full api and comments. An object box describes a Smalltalk style object: a record with a class dispatching methods polymorphically by duck-typed message sending.

struct yobjb { « static const int e_bk = yt::k_obj; « yp b_class; // the class for this object « yp b_slot[1]; // first of yt::t_len p slots «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_obj; } « };

yclassb «

     The following code omits comments; see yclassb on the class page for full api and comments. A class box describes a Smalltalk style class or metaclass.

// yclassb must be physical yobjb subtype in terms of format struct yclassb { « static const int kind = yt::k_class; « yp b_class; // the class for this object (yobjb::b_class) « yp b_superclass; // base class « yp b_subclasses; « yp b_methodDict; « yp b_fields; « yp b_instVars; « yp b_instVarDict; « yp b_name; « yp b_classPool; « yp b_classInstance; «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_class, e_bt = ((sizeof(yp)*10)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yclassb() { } // yp() constructor puts yp::i_nil into each « };

big integers «

     Material on big integers describing yintb bigint boxes has been

moved to bigint where they can be treated in one place, and possibly other big numbers as well. However, so this page includes at least summary material on every box format, abbreviated bigint box api appears below. But few methods shown here — only those in common with most box formats. Please see bigint for the full api.

yintb «

     yintb

struct yintb { « static const int kind = yt::k_int; // bigint « yp b_digits; // p to cord of byte digits «

bool neg() const { // is sign of this big int negative? « return ((const yt*) this)[-1].isescaped(); } void setneg(bool b) { ((yt*) this)[-1].tescape(b); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_int, e_bt = (sizeof(yp)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yintb() { } // default yp for b_digits: equal to yp::i_nil « };

hashes and symbols «

     The remainder of this column describing yhashb hash boxes and yyb symbol boxes has been

moved to symbol where symbols can be treated in one place, along with hashmaps used for intern tables. However, so this page includes at least summary material on every box format, abbreviated hash and symbol box api appears below. However, few methods shown here — only those in common with most box formats. Please see symbol for the full api.

yhashb «

     The following code omits constructors; see yhashb on the symbol page for full api and comments.

// minimal support for hashmaps of symbols kept in ysb format // yhashb holds a hash for yyb, and a link for making lists struct yhashb { « static const int kind = yt::k_hash; « yhashb* b_next; // linked list of hashes (this is a gc peg) « u32 b_h32; // aims to be zlib crc32() hash of a symbol «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_hash, e_bt = ((sizeof(yp)*2)<<16)|yt::e_c|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } void badt(const char* where) const; // on false bt() is }; // yhashb must precede yyb, but yyb need not follow yhashb

yyb «

     The following code omits many additional methods; see yyb on the symbol page for full api and comments.

// yyb looks like ysb with mandatory hash and is always const // yyb is identical in format to ysb, but preceded by yhashb struct yyb { « static const int e_bk = yt::e_c|yt::v_sym; // yt::e_c=CONST « u8 b_u8[4]; // first four of this->len() u8 slots «

u32 vblen() const { return ((const yt*) this)[-1].len(); } « u32 blen() const { return ((const yt*) this)[-1].len(); } « bool bkind() const { « return ((const yt*) this)[-1].cvkind() == (u32) e_bk; }

static u32 n2t(n32 n) { return (n<<16)|e_bk; } « // yyb differs from ysb in: // 1. always const, 2. yhashb always before

static u32 v2t(yv const& s) { « return (s.v_n<<16) | yesc(s) | e_bk; }

struct Yq { yyb const& q_y; Yq(yyb const& y): q_y(y) { } }; « Yq quote() const { return Yq(*this); } « void bprint() const; // dump to stdout via yout void bdump(yo& o) const; void bcite(yo& o) const; //void bshow(yo& o) const; n32 bcage(n32 max) const; }; // yyb (symbol box)

license «

     All this code is available only under the BriarPig mu-babel license described fully on the rights page. You do not have permission to reprint this page in any way. Neither feeds nor repackaging is allowed. You can link this page if you want folks to read it.

menu

     mu, toy, peg, imm, tag, box « Þ, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile

     (thorn, todo, names, fd, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, hash, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype)

abbreviated api «

     Zé scratched his jawline and pondered an imbalance he observed in code for box formats. "You know," he said. "It's odd pairs and symbols seem fleshed out, but little else has detail."

     "There's a good reason for that," Stu said confidently.

     "Oh yeah," Zé looked interested. "What is it?"

     "I don't know," Stu admitted snickering. "But I'm sure Wil has a good reason. Or a rationalization."

     "Comedian," Wil accused. "One day I was carefully putting meat on bones of all the boxes and I asked myself a question."

     "'Why am I working so hard?'," Stu suggested, then wilted under Wil's glare. "I shouldn't say your lines for you?"

     Wil held up a hand. "I asked myself," he continued, "why am I writing code breadth-first that I don't need to call yet?"

     "Ahhh," Stu pretended to follow, then, "I still don't get it."

     "When you write code you're not calling," Wil explained, "you get disconnected from evidence and feedback."

     "Next thing you know," Zé contributed, "you wander off in the weeds and get lost chasing butterflies."

     "Strange metaphor," Wil frowned. "I'd say you build traps to capture animals not now in evidence."

     "Is that hunter thing working for you?" Zé sniped.

     "In any case," Wil continued. "You start to run out of energy when you're sucking vacuum."

     "Okay, that part I understand," Stu nodded.

     "Once I saw the pattern was going to work if I continued," Wil explained, "I switched my attention to reading and writing."

     "Then you can come back to any one box later," Zé guessed, "as soon as you want to do something with it in live code."

     "Exactly," Wil grinned. "They I'll be able to execute something right after I write it. Fast turnaround is bliss."

     "I'm pretty sure gold or ignorance is bliss," Stu said.

tuples «

     "Does tuples mean the same thing it does in other languages?" Stu asked. "In Python tuple means a very specific thing: a kind of immutable array."

     "That's a great question," Wil smiled. "The word tuple applied to a toy language box only means ordered sequence of values, including different types, mutable or not."

     "So it's a very generic mathematical definition of tuple," Zé said. "You could even say a list was a tuple."

     "I guess I use tuple to mean fixed format even if mutable, and lists are too variable," Wil mused. "So let's avoid saying tuple when using lists, unless you're talking about semantics."

     "Well then, tuple describes practically any struct," Stu groused. "What's the point?"

     "Tuple is very generic in math — much like thing, or maybe object in oo languages," Zé suggested. "Merely grouping a set of items with an order gives you a tuple."

     "So pairs are tuples, vectors are tuples, and class instances are tuples?" Stu asked. "What good is that?"

     "It expresses an idea you focus on the set of grouped values," Zé explained, "without weighing physical form much."

     Wil nodded. "That sounds good," he said. "And tuple members are accessed by either number or symbols."

     "So tuple member names are like map keys?" Stu asked. "Except when keys are integers, you can use a vector?"

     "Yeah," Zé agreed. "Symbol names as keys just means someone has to lookup the symbol in a map and find an associated index or offset for that name."

     "Isn't that what compilers do with struct member names?" asked Stu. "How are Smalltalk style objects different?"

     "Very perceptive," Wil said. "A compiler maps names to offsets statically at compile time, and dynamic languages might delay the map lookup until runtime."

     "And usually do," Zé noted. "Lookaside caches optimize away repeat map lookups for speed. Standard stuff."

     "So a Smalltalk style object might physically just be an array of N slots," Wil explained. "And the class maps member symbol names to a slot index. Thus any object is a tuple."

     "In Smalltalk," Zé recalled, "members are all private: only visible in class methods. So compiling a method maps names to indexes at compile time, not at runtime."

     Stu stopped to think, so Wil pointed Zé at the next box. Before they could proceed, Stu blurted out another insight: "So single values are like a 1-tuple?"

     "If you insist," Wil shrugged. "You can think of all single value boxes as 1-tuples of slightly different primitive format. In fact, all boxes are just basic tuple architecture."

     Stu looked excited, so Zé nudged him and said, "But you knew this already, right?"

yp2b «

     "Box yp2b is basically identical to ypairb, but with a different name," Wil introduced. "The reason you might bother using it? To access by integer index."

     "It's name means two peg box?" Stu asked.

     "Yes, I suppose the name might have been y2pb instead," Wil considered. "But I'm used to suffixing numbers."

     "Just read the name as peg pair box," Zé suggested.

struct yp2b { // pair « yp b_p[2]; // (synonymous with ypairb) «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_p2, e_bt = ((sizeof(yp)*2)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } }; // struct yp2b can be safely cast back and forth to ypairb

yp3b «

     "Aha, then yp3b means peg triple box?" Stu concluded.

     "Exactly, and there's not much else to say about it," Wil said. "The answer to question 'what is it for?' is: I don't care. It's the same answer for all tuples: whatever you want."

struct yp3b { // triple « yp b_p[3]; // peg triple «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_p3, e_bt = ((sizeof(yp)*3)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yp& operator[](unsigned i) { return b_p[i]; } « yp3b() { } // default yp inits each member of b_p[i] to nil « };

yp4b «

     "Okay, so yp4b means a four peg box right," Stu said without enthusiam. "What's it for? Along with yp1b, yp2b, and yp3b, it's redundant with the vector box that's coming."

struct yp4b { « yp b_p[4]; «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_p4, e_bt = ((sizeof(yp)*4)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yp4b() { } // default yp inits each member of b_p[i] to nil « };

     "You're right," Wil agreed. "I guess my intent was a little fuzzy. But it basically involves one idea: using more box types in a complex graph can give you something to check."

     "You mean a kind of dynamic type checking?" Zé asked. "Say, unless a box is exactly the type expected, then something must be wrong? Isn't that just a bit subtle?"

     "Yes," Wil conceded. "But I can use the 7-bit alt field in a tuple's tag as a kind of extra type signature, if I build non-user-visible graphs using primitive boxes. Hypothetically."

     Zé winced. "I think you should just use Smalltalk style objects for complex graphs you build," he suggested. "Otherwise you'll be writing a lot of odd one-off code."

numbers and vectors «

     As described on types (under þ) many signed and unsigned integers types, as well as floating point types, are declared with specific bit widths. Boxes below declare both single instance versions and vector versions of all integer and floating point types.

     "Note the circumstances under which these might be used under Lathe at the language level will not be described," Wil warned. "It suffices here to show they exist."

i128 «

     "The following i128 type declares a pseudo-128 bit integer," Wil cautioned, "bigger than any native native C integer type."

     "So you would have to simulate 128 bit arithmetic," Zé asked. "Almost like a big int with only 128 bits?"

     "Something like that," Wil agreed. "Or you could just use it for something 128 bits in size, like an MD5 hash."

struct i128 { // 16 bytes of int (ie md5 digest 16-byte hash) « union { // to support a variety of access styles u64 i_u64[2]; « u32 i_u32[4]; « u8 i_u8[16]; « } u; « void clear() { u.i_u64[0] = u.i_u64[1] = 0; } « }; // struct i128

     "No arithmetic on these is defined without more code, right?" Stu asked. "I guess asking you whether you'll do that soon is out of order? Maybe the first milestone toy draft?"

     "I don't have a schedule for using it," Wil sighed. "But I do have a check already for providing space for it as a box."

yi128b «

     "Box yi128b simply delares a field of type i128," Wil said.

struct yi128b { // box for 'integer' with 128 bits « static const int kind = yt::k_i128; « i128 b_i128; // twice the size of i64 «

yi128b() { b_i128.clear(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_i128, e_bt = (sizeof(i128 )<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

yi128vb «

     "The tag for box yi128vb has the yt::e_v vector bit set," Wil said. "So length vblen() is a count of i128 instances."

struct yi128vb { « static const yt::Tvec kind = yt::v_i128; « i128 b_i128v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_i128; } « };

yi16vb «

     "The tag for box yi16vb has the yt::e_v vector bit set," Wil said. "So length vblen() is a count of i16 instances."

     "Why is there no box for a single i16?" Stu asked. "Is it too small to bother with?"

     "Yes," Wil nodded. "An immediate integer requiring no box at all is 30 bits in size, which dominates 16 bits in i16."

/*struct yi16b { « static const int kind = yt::k_i16; i16 b_i16; // unused because imm int subsumes this size };*/

struct yi16vb { « static const yt::Tvec kind = yt::v_i16; « i16 b_i16v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_i16; } « };

yu16vb «

     "I guess the same thing applies to yu16vb?" Stu asked. "Since immediate ints have 30 bits, there's no point in having a non-vector yu16b box?"

     "Yes," Wil confirmed. "And the tag for box yu16vb has the yt::e_v vector bit set, so length vblen() is a count of u16 instances."

/*struct yu16b { « static const int kind = yt::k_u16; u16 b_u16; // unused because imm int subsumes this size };*/

struct yu16vb { « static const yt::Tvec kind = yt::v_u16; « u16 b_u16v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_u16; } « };

yi32b «

     "Box yi32b is only useful for 32-bit integers that don't fit into 30-bit immediate ints," Wil remarked. "It seems a waste to alloc a box when only two more bits are needed."

struct yi32b { « static const int kind = yt::k_i32; « i32 b_i32; // signed int (normally outside imm int range) «

yi32b() : b_i32(0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_i32, e_bt = (sizeof(i32)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

yi32vb «

     "Box yi32vb is the vector version of yi32b with a single 32-bit integer," Wil said. "And vblen() is the vector length."

struct yi32vb { « static const yt::Tvec kind = yt::v_i32; « i32 b_i32v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_i32; } « };

yu32b «

     "Looks like box yu32b is identical to yi32b but unsigned," Zé noted. "Seems redundant. Just being thorough?"

     "Sure," Wil said. "If someone has a large number of unsigned integers to store, why make them use a format with less than perfect results? A UI might be awkward."

struct yu32b { « static const int kind = yt::k_u32; « u32 b_u32; // unsigned int (normally outside imm int range) «

yu32b() : b_u32(0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_u32, e_bt = (sizeof(u32)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

yu32vb «

     "Box yu32vb is the vector version of yu32b with a single 32-bit integer," Wil said. "And vblen() is the vector length."

     "Criminy," Zé whined. "Are you going to repeat yourself for every vector type? I'm gonna fall asleep."

struct yu32vb { « static const yt::Tvec kind = yt::v_u32; « u32 b_u32v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_u32; } « };

yi64b and yi64vb «

     "Alright, let's speed it up a little," Wil mollified. "Box yi64b and box yi64vb are single and vector versions of boxes for 64-bit signed integers. Is this any better?"

struct yi64b { // single i64 box « static const int kind = yt::k_i64; « i64 b_i64; // signed int 64-bits «

yi64b() : b_i64(0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_i64, e_bt = (sizeof(i64)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

     "As usual," Zé kibitzed, "the vector length is vblen()."

struct yi64vb { // vector of i64 « static const yt::Tvec kind = yt::v_i64; « i64 b_i64v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_i64; } « };

yu64b and yu64vb «

     "For unsigned 64-bit integer boxes, yu64b holds one instance, and yu64vb is the vector version," Wil summarized.

struct yu64b { // single u64 box « static const int kind = yt::k_u64; « u64 b_u64; // unsigned int 64-bits «

yu64b() : b_u64(0) { }«

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_u64, e_bt = (sizeof(u64)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

     "My turn," Stu pounced. "yu64vb vector length is vblen()."

struct yu64vb { // vector of u64 « static const yt::Tvec kind = yt::v_u64; « u64 b_u64v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_u64; } « };

yr32b «

     "Box yr32b holds a single 32-bit native C float," Wil explained. "See types for scalar declarations including r32."

struct yr32b { // box for 32-bit float « static const int kind = yt::k_r32; « r32 b_r32; // floating real 32-bits «

yr32b() : b_r32(0.0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_r32, e_bt = (sizeof(r32)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

yr32vb «

     "Box yr32vb is a vector of 32-bit native C float instances," Wil continued. "And the length is vblen()."

struct yr32vb { // vector of 32-bit floats « static const yt::Tvec kind = yt::v_r32; « r32 b_r32v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_r32; } « };

yr64b «

     "Box yr64b holds a single 64-bit native C double," Wil explained. "See types for scalar declarations including r64."

struct yr64b { // single 64-bit double box « static const int kind = yt::k_r64; « r64 b_r64; // floating real 64-bits «

yr64b() : b_r64(0.0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_r64, e_bt = (sizeof(r64)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

yr64vb «

     "Box yr64vb is a vector of 64-bit native C double instances," Wil continued. "And the length is vblen()."

struct yr64vb { // vector of 64-bit doubles « static const yt::Tvec kind = yt::v_r64; « r64 b_r64v[1]; // vector of length tag().len() «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].isTvec(kind); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_r64; } « };

sequences «

ypvb «

     "Box ypvb is a peg vector box representing an ordered sequence of value references," Wil said. "It's the universal vector type used in both Scheme and Smaltalk."

     "And later you'll build discontiguous arrays on top of these primitive vectors?" Zé guessed.

     "Yes," Wil confirmed. "A single contiguous vector must fit in one book, and that's why it's okay the tag length field is limited to 16 bits. Longer arrays can simply scatter gather vectors."

     "You can even do scatter/gather witih btrees," Zé noted.

     "That's one of the ways," Wil agreed. "And it should definitely be done when other btrees are added, for comparison."

     "What difference between vectors and arrays would appear visible to end users?" Stu asked.

     "Maybe none," Wil replied. "Once scatter/gather arrays are done, Lathe might use them transparently when efficiency or content size makes that choice better."

struct ypvb { « static const int e_bk = yt::k_vec; « yp b_pv[1]; // first of yt::len() slots in peg array «

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_vec; } « };

ysb «

     The following ysb string box strongly resembles the yyb symbol box shown on symbol, and that's where the following comments are duplicated. You might want to read symbol details on symbol boxes for comparison. They both have the same format shown below for octet sequences.

     Lathe's primitive string box, ysb, uses pointer plus length format with string address pointing to the first octet in a string, and with length held in the tag preceding the string. A totally gratuitous redundant null byte follows content of a string, to be friendly with C runtimes, and to act as a sentinel detecting overwrites. In effect, a primitive Lathe string looks like a C string, but with a dynamic type tag in 16 bits of the immediately preceding four bytes. Any null byte found inside is not null termination; if you use null as a length marker, you'll have bugs: it's wrong. Strings contain octets, not characters — neither charset nor codepoint size are specified, and both are imposed from without at a higher, non-standard level.

// ysb: contiguous octet sequence, always followed by nul byte struct ysb { « static const int kind = yt::k_cord; « // 1st 4 bytes reserve space for end nul plus up // to 3 non-aligned bytes u8 b_u8[4]; // first 4 of yt::len() u8's (room for end nul) «

ysb() { *b_u8 = 0; } // just final null for empty vector « // caller must ensure >= s.v_n+1 bytes alloc'd for this ysb: ysb(yv const& s) { b_u8[s.v_n] = 0; if(s) ::memcpy(b_u8, s.v_p, s.v_n); }

yv asv() const { « yv v(this, ((const yt*) this)[-1].len()); return v; } operator yv() const { « yv v(this, ((const yt*) this)[-1].len()); return v; }

u32 vblen() const { return ((const yt*) this)[-1].len(); } «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { return ((const yt*) this)[-1].istext(); } «

static u32 n2t(n32 n) { return (n<<16)|yt::v_cord; } «

// bmore: count of 4 u8-aligned static n32 bmore(n32 n) { return n & ~((n32)3); } « static n32 bmore(yv const& s) { return s.v_n & ~((n32)3); } };

     "I suppose strings need not be ascii either, any more than symbol boxes," Zé said. "The charset is not specified? And null bytes can be embedded, just as symbols do?"

     "Yes," Wil nodded. "And both convert into pointer plus length format using yv run instances."

     "Do you allocate bytes for a string in similar fashion?" Stu asked. "For a string of N octets, you need a box of sizeof(ysb) plus ysb::bmore(N) bytes?"

     "That's right," Wil nodded. "And just like ypvb peg vectors can be turned into scatter/gather arrays, later you'll be able to build scatter/gather string arrays using primitive ysb."

     "I bet you could page scatter/gather strings to disk," Zé guessed. "You'd only need btrees of string nodes with some ref swizzling. You can do gc-based virtual memory."

     "Well that's only been obvious to me for ten years now," Wil sighed. "Maybe I'll have time to do it this time around."

     "Of course," Zé considered, "you don't have to use gc strings for that — you could use pages or books outside garbage collected space. Or some mixture of the two."

     "Yes, yes," Wil granted. "We'll have a regular orgy of clever data games to play, just as soon as the toy language is done."

     "No need to get snippy," Zé defended. "I'll be patient."

misc «

     Boxes in this section are either stubs for future features, or else immediately necessary elements of behind-the-scenes work with purposes slightly hard to describe.

ytreeb «

     "Box ytreeb is a stub for future btree work," Wil introduced. "The 7-bit alt field in the tag is the node height. Inner nodes are ytreeb but leaves can be anything."

     "For example," Zé suggested "leaf nodes could be vectors or strings. The the tree would actually be a scatter/gather array? Or is something missing?"

struct ytreeb { « static const int e_bk = yt::k_tree; « yp b_node[1]; // first of yt::len() peg nodes «

const yt& btag() const { return ((const yt*) this)[-1]; } « bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

static u32 n2t(n32 n) { return (n<<16)|yt::v_tree; } « };

     "Yes, something is missing in ytreeb as shown," Wil said. "To make IronDoc style btrees, each node pointer also needs a count of leaf entries reachable in that direction."

     "That's what supports O(log(N)) time integer key indexing, right?" Zé asked. "Modeling an array with a tree?"

     "Exactly," Wil agreed. "So instead of yp as shown, the type of p_node should be a pair (yp,N) where N is a count of leaf entries that way. I derived this from Exodus btrees around 1997 or so."

yvirtual_s «

     "Is this yvirtual_s box some kind of funky foreign object interface," Zé guessed. "No wait, it's not a box is it? The name doesn't end in b. Is it a hand-rolled vtable?"

     "Yes, it's a stub for a vtable used in yfarb below," Wil said.

struct yfarb; struct yvirtual_s { // stub: future function ptrs vtable « typedef int (*bye_t)(const yfarb& inFar); « bye_t b_bye; // called when yfarb is collected « yvirtual_s() : b_bye(0) { } « };

     "But this vtable only has one function pointer," Stu objected. "The only thing you can say to a foreign object is goodbye? Why not add more things? A zero pointer can mean 'not supported'."

     "It's just a stub until I add more later at need," Wil explained. "But I know for sure I must notify foreign objects when they're no longer referenced in gc space."

     "What does the final s in the name of yvirtual_s mean?" asked Stu. "Struct? Or something more subtle?"

     "I think the s stood for static," Wil replied. "Because yfarb foreign object boxes assume there's only one global vtable instance per foreign object type — kinda like C++."

     "So you can use yvirtual_s pointer address itself as type identity," Zé concluded. "What if I generate them dynamically? How do I avoid repeating an address?"

     "Look," Wil tried to be gracious. "That's not my problem right now. How can I get anything done answering hypothetical questions about things not even used yet?"

yfarb «

     "So this yfarb foreign object box actually uses the yvirtual_s vtable shown above," Zé noted. "Reminds me of yobjb."

     "Yes, I see that," Wil considered. "Box yobjb starts with a yclassb pointer, and here yvirtual_s serves the same purpose, except yclassb is garbage collected."

     "And yvirtual_s isn't, okay," Zé acknowledged. "I guess the number of times yvirtual_s is used in yfarb boxes is not counted? So no vtable refcounting?"

struct yfarb { « static const int kind = yt::k_far; « yvirtual_s* b_virtual; // interface to this foreign object « void* b_void; // pointer to out-of-line foreign object «

yfarb() : b_virtual(0), b_void(0) { } «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_far, e_bt = ((sizeof(void*)*2)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

     "Nope, no vtable refcounting," Wil agreed. "And that's why yfarb assumes each yvirtual_s is global and permanent; vtable lifetime is assumed indefinite."

     "If you add a hello virtual method to the api," Zé realized, "I could refcount yvirtual_s vtables myself."

     "Okay," Wil held up his hands. "Let's take it up later."

     "The out-of-line b_void pointer takes the place inline slots used by yobj," Stu noticed. "Is that better?"

     "The foreign object is somewhere else so I don't get involved in its memory managment," Wil explained.

     "You don't want to move it during gc either," Zé added.

ymapb «

     "Is this ymapb box going to be a hashmap?" Stu asked. "Or maybe an STL style sorted tree?"

     "I can't live without hashmaps," Wil opined. "But I don't think I can settle on exactly one format. So variety seems likely. One of the map formats can be a btree, sure."

     "I didn't say btree," Stu squinted suspiciously.

     "Well I like btrees," Wil expanded querulously. "And they make good models for disk-based indexes, so that's what you're getting. Wait, did I just bite your head off?"

     "Drink less coffee, Wil," Zé urged.

struct ymapb { « static const int kind = yt::k_map; // tbd « yp b_stub; // tbd «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_map, e_bt = ((sizeof(yp)*1)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } };

     "Why didn't you name it map instead of dict?" Stu asked. "Dictionaries are cool. Is it just about brevity? "

     "It's not about power, it's about grace," Zé quoted from Bulletproof Monk with a smirk.

     "Shut up," Wil snapped with a smile. "The terms map and dict are synonyms, but the first is shorter and not an abbreviation. But I'm actually likely to use both terms."

     "You can use one name to imply one sort of format, and the other for a different format," Zé suggested.

     "Still won't be enough names to go around," Wil sighed.

     "The idea of using different names for different perspectives is giving me a weird feeling of déjà vu," Stu croaked.

     "It'll pass," Zé predicted.

yiterb «

     "Argh, are you planning to use iterators at the language level?" Stu cringed. "I'm assuming yiterb is an iterator box."

     "Maybe," Wil replied. "Think of yiterb (and variations specific to each collection type) as a way to think out loud about iteration. It might make some internal C++ api simpler."

     "The normal way to iterate in Smalltalk is by passing a block to execute per collection instance," Zé reminded.

struct yiterb { // stub for collection iterators « static const int kind = yt::k_iter; // tbd « yp b_set; // the collection being iterated « yp b_here; // current iteration member « int b_pos; // position in iteration «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = yt::k_iter, e_bt = ((sizeof(yp)*3)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].iskind(e_bk); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yiterb() { b_pos = 0; } // default yp sets value to yp::i_nil « };

     "That's right," Wil agreed. "And continuations and timeslicing are used to avoid making iteration monolithic. But it might be awkward to force C++ to depend on a high level language mechanism: a dependency can get me stuck."

     "Explicit iterator objects in gc space might help glue together code at different levels of abstraction," Zé suggested.

     "Whoa, that's way more than I needed to know," Stu backed off. "Can't you just hide details and show me pure style?"

yerrb «

     "I take it yerrb is an error box?" Stu asked. "It uses a funny feature in tag though: what does the term rig mean?"

     "Yes, it's a placeholder for more verbose error objects I might want to use," Wil said. "And rig is a special kind of tag that's basically an escape code — it's a jury-rigged box or a specially rigged box. Rig means harness, setup, frame, or tooling."

     "Sounds like you use rig-based box tags to avoid using Smalltalk style objects," Zé suspected. "To let you have as many special primitive level objects you want."

     "Yeah, rig boxes are part of the trusted base," Wil agreed. "Circular dependency can occur if they aren't primitive."

     "Kind of like a high level application object," Stu guessed. "But hardwired in C++ to be very specific? No wiggle?"

     "That's about the size of it," Wil granted. "Meaning of such boxes might depend on the rest of a language runtime."

struct yerrb { « static const int kind = yt::k_rig; « i32 b_err; static const yt::Trig rig = yt::r_err; « yp b_msg; // if not nil, more info (eg string or symbol) « yp b_why; // if not nil, cause of err (eg token object) «

const yt& btag() const { return ((const yt*) this)[-1]; } « enum { e_bk = (yt::a_err|yt::k_rig), e_bt = ((sizeof(yp)*3)<<16)|e_bk }; bool bkind() const { « return ((const yt*) this)[-1].isrig(yt::r_err); }

bool bt() const { return e_bt == ((const u32*) this)[-1]; } yerrb() : b_err(0) { } // b_msg, b_why default to i_nil « yerrb(int e) : b_err(e) { } // b_msg, b_why default to i_nil

enum Perr { « e_base = 0xe00, // base reserves start of yerrb err range e_max = 0x100, // assume no more than 256 err codes (?) e_high = e_base + e_max // assume <= 256 err codes (?) }; }; // yerrb

     "And this yerrb error box is an example," Stu tested.

     "Yes, because control flow for error handling must be primitive," Wil expanded. "I might use something like this in uncaught exceptions. But it's too early to codify."

     "Why didn't you include any of the other rig boxes?" Zé asked. "Frame objects used in context and environment management would be interesting. Do they belong here?"

     "Yes, they do belong here," Wil replied. "And I might add them later when I can include fully baked usage code, too."

     "Why don't you just use vectors for specialized objects?" Stu asked. "Isn't that how objects are done in SICP? The first slot can have metainformation like an object name."

     "I don't know if you noticed," Zé interjected. "But yobjb is the same thing, except the first slot is a Smalltalk style yclass class box. So Wil, why not use yobjb subclasses more?"

     "That would be a good idea if I was writing a metacircular system," Wil said. "But I'm not. Or at least the first toy language won't be metacircular. It's too changeable."

     "So you just want to hard-code everything, huh?" Zé summarized. "I think a base with more degrees of freedom would be more fun. Playing with it would be a blast."

     "It would also be harder to debug," Wil claimed, "because what could be happening at any given time might be very complex with spectacular indirection."

     "That's a no to fun then, I take it?" Zé moped.

     "You read the weight page, right?" Wil asked. "Limiting scope is a way to control all kinds of things that might push out time it takes to hit milestones I want in simple effectiveness."

     "Simple is boring," Zé protested.

     "Live with it," Wil dismissed. "If you need more fun, go mess with Dex's head or something. Just don't give him a nervous breakdown or anything."

     A twinkle showed in his eye, and Zé asked, "Do you know where Dex is? And what did he do to deserve it?"

     Wil drummed his fingers. "I'll show you," he said finally, getting out his cell phone. "I'll call him in here."

fiction «

     After several minutes of hearing Dex bitch and whine about missing the box review, Zé decided that was enough. It was the phrase "no talent losers" that got under his skin.

     "Why is it so earth-shatteringly important we do things the way you want?" Zé asked. "This is little stuff. What difference does it make? When I go to the beach, how will tiny format details matter? You're obsessed. Relax and go with the flow."

     "Shove off, you little twit," Dex sneered.

     "Okay, that's it," Zé advanced. "Pull my finger."

     "I'm gonna kick your freaky ass," Dex warned.

     "For someone who wants to kick my freaky ass," Zé said, "you do a lot of talking."

     "Don't let him touch you," Wil warned Dex perfunctorily, causing Dex to back away slowly.

     Zé caught Dex's wrist when he darted around a table.

     Now Dex and Zé stood at the side of a remote crossroads, in the middle of nowhere with dusty, dry, plowed crop fields on either side of a road receding into the distance. Both wore nice period suits. Dex's eyes bulged like he swallowed a bug.

     "That's funny," Zé stared into the distance.

     "What?" Dex choked, still trying to get his bearings.

     "That plane's dusting crops where there ain't no crops," Zé replied smoothly, still playing the script.

     "Where the hell are we?" Dex managed finally.

     "That's not your line," Zé tsked. "Indiana, 1958. And that's my bus coming now, right on time. I should get on and let you practice dodging bullets by yourself."

     "Uh," Dex searched for words as Zé waved off the antique bus that was stopping. "Is this where Wil grew up?"

     "Close," Zé replied. "Iowa does look like this. Ironically, that's actually where Cary Grant is from, too. Wil's Dad looks a bit like him. Better keep your eye on that crop duster."

     The sound of the biplane grew ominously as it zoomed straight for the two of them.

     "Ahhh!" screamed Dex, throwing himself to the ground as the plane passed overhead. Hot dust rose up and got in his mouth, tasting terrible, almost distracting him from the sound of ricocheting bullets.

     Zé picked himself up first, like this was all normal. "You know, it's funny your character is named Thornhill," Zé marveled. "Wow, I never noticed that before."

     "Whatever lesson you wanted me to learn," Dex sold with a note of near hysteria, "I think I got it now."

     "Look out, he's coming back," Zé started to run.