Þ   briarpig  » log


demos

     Lately I write demos under thorn with a todo list using C++ under a BriarPig mu-babel license. New toy language material about lathe (Lisp under thorn) appears under mu with update indexes itemized on toy.

     The run and hex demos were done on 13apr2008; crc on 14apr2008; buf on 20apr2008; in on 27apr2008; ctype on 04may2008; out on 18may2008; slice on 25may2008; quote on 31may2008; escape on 31may2008; mutex on 14jun2008; rand on 16jun2008; stat on 17jun2008; primes on 19jun2008; list on 23jun2008; heap on 29jun2008; iter on 02jul2008 and 04jul2008; atomic on 06jul2008; node on 13jul2008; page on 23jul2008; hash on 27jul2008; book on 27jul2008; pile on 03aug2008; stack on 07aug2008; mu on 12aug2008; toy on 16aug2008; weight on 23aug2008; symbol on 02sep2008; imm on 04sep2008; gc on 06sep2008; map on 14sep2008; meter on 16sep2008; thread on 22oct2008; arc on 05nov2008; this menu links all demos:


     mu: toy, peg, imm, tag, box, symbol, token, number, bigint, class, method, reader, writer, eval, env, vm, gc, world, pcode, compiler, asm, lathe, lisp, smalltalk, design, weight, jar, card, harp, debug, profile

     thorn: todo, names, iovec, assert, log, run, hex, crc, buf, in, out, quote, escape, compare, file, deck, cow, arc, blob, tree, slice, rand, time, stat, heap, node, primes, page, book, pile, stack, atomic, lock, mutex, thread, map, meter, list, iter, ctype


14nov08 thoughtful faces

type classes

     Jonathan Shapiro kindly wrote a generous response on the bitc-dev mailing list to a question I asked recently at the Lambda The Ultimate blog. I asked about terms typeclasses and polyinstantiation — which I had not heard before (because I don't really bother to keep up with programming language terminology) — because naasking had suggested v-tables in BitC might be unnecessary given these other mechanisms. It looked like a good opportunity to ask a newbie question, since I never mind looking ignorant. (Which is convenient, since I'm often ignorant.)

     Over at LtU, David Barbour gave a good short description of polyinstantiation, including a comparison to static specialization in C++ templates, which made me infer the term means something like: "polymorphism via type specialization at point of instantiation — usually static but could be dynamic." (Because you can always make anything dynamic, of course: late binding is easy, while early binding is hard since it narrows options.)

     I'm replying here because I've written long posts at LtU in the past making me self-conscious about verbosity while still not letting me say as much as I'd like. I'm happy to respond at LtU when I can keep it short. Note I don't use my last name on this site, and I'd rather not see it used by others. (I had a taste once of greater notoriety, and I don't like it much. The first name I adopted a few years and use at LtU is just fine to keep a low profile.)

     Sorry for a long setup; the rest of this is response.

     C++ does tend to tie too many things together. I'm not an apologist for C++, just someone who uses it a lot. Please don't take me for one of those provincial coders who think v-tables are the right way to do things. An array of function pointers is just one representation of a map, and is equivalent to (but more fragile than) dictionaries in other formats. Smalltalk style method dictionaries are nice and flexible, and this hashmap approach to api dispatch is used by Self and Python and other languages to similar effect, though Python uses it for everything including member variables; I gather objects in Python are tuples in the form of mapped name/value pairs. As far as I'm concerned, Objective-C's approach to method maps and dispatch is superior to that in C++, but I haven't had opportunity to use it much. My work life would have been simpler if Objective-C had bested C++ many years ago. Oh well.

     When I asked about space efficiency of typeclasses as compared to v-tables in C++, I was wondering whether it resembled (just guessing blindly of course) something like Ian Piumarta's approach in cola, which I glanced at some months ago, just long enough to get an impression it aped structural aspects of both Smalltalk class pointers in objects and C++ vtable pointers in objects — embedding a reference to metainformation about a class somewhere in the physical object representation, where it can be found at runtime by dynamic dispatch for polymorphism. Oh, and yes, yes, I know Piumarta is a god (someone I admire) and I mean no disrespect by using casual verbs like ape.

     The answer I was hoping to hear was one like, "Every single object spends a little space on overhead to access class oriented metadata in a very direct manner, instead of finding very clever indirect ways to factor that overhead out of every object." I like my indirection straight and without too much gamesmanship in optimization. I have a knee jerk negative reaction to fractal sounding solutions (where every rule has an exception for special cases described by new rules with their own exceptions, ad infinitum).

     I'm not really familiar with Java interfaces, except that when I first read about them in the 90's they sounded so close to Objective-C interfaces I assumed it was a direct copy. (Or they're cousins since essentially the same community of folks would have been involved.)

     The material on dictionary resolution you mentioned sounds like the part I find interesting. The interactions between multiple types is likely where I'd worry about problems. What I'd really like is a simple operational definition of outcomes from a casual coder's point of view: where should I look in code to resolve conflicts myself? Can I tell — just by looking at the code — how the resolution will work out in the compiler? Or is it too complex to do by hand in practice?

     This bears very closely upon an aspect of generic functions in Dylan that put me off the language: I had trouble guessing which method would be called by my own casual, mental static typing. I didn't want the runtime to "do the right thing" — I wanted to know myself what would dispatch a priori. In other words, I want to do global static analysis myself, without any maybes or probablys involved.

     One of the reasons I liked Smalltalk with single inheritance was ease of deciding where to look to see what method executes, once I made an assumption about what actual types were passed to methods. If you like, think of me as coding in object oriented assembler. I still want to imagine a deterministic assembler model despite a nice high level language.

     I'm glad you wanted to resist dynamic allocation of dictionaries. As a user I would be satisfied by a feature allowing me to turn off dynamic dictionary allocation for types I cared most to control.

     Incidentally, I like your use of capsule as an alternative for object, since the length is short enough, the syllables few enough, the relation to encapsulation close enough, and odds of confusion with random associations low enough.

     Thanks a lot for expanding on typeclasses. Mechanics of dispatch is one of my favorite topics, which is why I had trouble resisting an urge to ask for elaboration. I think of you as someone whose time is much more valuable than mine, so I'm flattered to get my interest addressed.

     I'm not sure I was confused (and I lack the modesty to claim I was), but I sure was ignorant about typeclasses and I thank you for tutorage. I hope it helps other folks familiar with C++ become more familiar with BitC in areas permitting some cross over. In some ways C++ looks like a train headed to run into a mountain without benefit of a tunnel, so it would be nice to have options.

simpler C++ rules

     Coincidentally (given the sub-topic in shap's remarks about interfaces generally being better than v-tables specifically) this afternoon I added similar remarks along these lines to a wiki page at work outlining future ground rules for a style of writing C++ embedded inside C oriented processes. My first draft of C++ usage rules concerned things not to do, in order to have no runtime impact on nearby C code linking no std C++ libs.

     The purpose of the wiki page was to inscribe a circle around a set of things to be explicitly permitted by C++ code lurking inside a C-only environment, as distinguished from another set of things explicitly denied to C++ in the same context. But a reviewer asked me to include guidelines on use of inheritance, explicitly spelling out what is considered good inheritance style. In particular, for components intended swappable with replacements, the best approach to minimize future conflict uses pure virtual abstract base class interfaces, with neither inherited state nor implementation to cause coupling in subclasses.

     Naturally I agreed and formalized rules a bit for the wiki page to encourage good inheritance style in the direction of abstract interfaces. But in practice, extremism of interface purism is one I seldom bother to pursue in my own casual projects, like ones I post on this site. In code you see around here, I make no effort to enforce pure separation of abstract interfaces from concrete subtypes implementing them. I could, but I don't. So, why not?

     When you cleanly separate abstract interfaces from concrete implementations, you tend to get at least two sets of names: the abstract ones and the concrete ones, with the obvious one-to-one mapping, of course. Proliferation of code names often upsets many people — or at least most folks I see all the time. I tend to create so many class names in the first place that I already get complaints, even before adding more by separating abstract from concrete.

     The number one request I get is always the same: please write less — less of everything: fewer classes, methods, docs, and checkin notes. Just less, please.

     When I try to make code replaceable by substitutes, I'm the only one who ever does it (currently anyway). And the abstraction making it possible is usually criticized. Because essentially, abstraction is also obfuscation. Code written abstractly is much easier to change, but is somewhat harder to grasp. The former counts for little, and the latter counts for a lot, in day jobs of late.

     Sample code on this site blurs edges between abstract and concrete, to make it more clear and to give me less to write, both. The result is hacking fodder, mostly. Which is more or less what it's for: grist for future mills.

12nov08 war bucks

tinkering

     Rather than writing web pages, I've been tinkering with new code. I was aiming for a new blob api, but I veered sideways into a new file api rev because it seemed plausible to make a blob mimic a file. Then I kept veering into some pseudo memory mapping support for files.

     So effectively I'm cooking up some content for a file demo (which is still a stub) because it will help me frame some context for blob content. And it wouldn't hurt to have demos get slightly code heavy. But I'm sure I can go too far that way, getting too many code names.

     (In code design, when you favor orientation toward data, the result is often more flexible and adaptive. But when each type needs a unique name, you get a lot of names; for a complex task, like paging iovec-based files, it's easy to use many classes with only slightly different names, because their meanings are intertwined.)

     Since it has a high amusement factor just now, I'm not throttling much. So I might just crank out code for several related demos, all oriented around schlepping around bytes in blobs, files, and wherever. I might write demos as if Wil composed code in stream-of-consciousness mode, in response to Zé asking "what if" too many times.

     Half baked results would be both good and bad: fertile suggestions and lines of inquiry, but you'd need to boil it down yourself, if you want minimalism.

tombstones

     This last week I updated code for a closed/probing hash map (using tombstones) so accumulated entropy in the form of tombstones could incrementally boil off. I was seeing too high a percentage of tombstones in map occupancy (bordering on pathological). So I thought about it; then I said oh yeah once I'd simulated it a little.

     In a closed hashing map which probes to find unused slots when the target bucket is used, you only know a search has ended when you see an empty slot. As a result, when you delete a map entry you must replace it with a tombstone if you don't want to rehash, in order to keep searches valid for buckets used downstream from that spot.

     When do tombstones go away? When a successor is an empty slot. So removing tombstones is dependent on getting them next to empty slots. High frequency of empty slots is important to maintain constant time access.

     I decided my last map implementation wasn't working hard enough to remove tombstones — I hadn't asked myself how to make them go away proactively. But once I did, an obvious approach suggested itself.

     Say you search for key K in a map, but the hash H=h(K) points at bucket B, and K is found in a slot downstream to the right of B (because the slots were used when K was added). When some older keys are removed, you might run across tombstones on the way to finding K. If you do, then swapping K with the first tombstone slot improves the map two ways: 1) K is closer to bucket B, and 2) a moved tombstone gets closer to the next empty slot. Both these improvements remove map entropy, and you can do this incrementally in every map lookup.

     So far I call this bubbling tombstones rightward because it pushes tombstones to the right on each swap, getting closer to the next empty slot each time, promoting conversion of tombstone slots to empty slots when they become adjacent.

     When the total empty slot count gets too low, the same tactic can be pursued more aggressively by considering other keys seen in passing besides K, the target of the search. For example, the next used key K' in a slot that is not equal to K might also rehash to an earlier tombstone, which would bubble another tombstone rightward.

09nov08 shanty towns

archeology

     I started trawling through old docs about blob btrees in IronDoc, thinking about how to write a demo and whether to include significant code therein, and I noticed an odd thing: I wouldn't do blobs the same way now.

     So I've been asking myself: What changed? How is it so obvious to me now my earlier approaches were too complex? Did I get dumber? Did my standards change? Or did I learn useful rules of thumb to avoid complexity quagmires?

     The part that seems too complex now inserts an arbitrarily large data fragment in an existing blob stream. My approach aimed to do this in the cheapest way possible, by calculating final end effects on a btree, at the start before making any changes. As a result, my old code touches nodes the smallest number of times to build a new final tree.

     But that wasn't necessary. The analysis for cheapest cost was clever, gnarly, and long — all in a totally inappropriate way if you wanted someone else to grasp it all with little effort. It's as if I didn't care what it took for someone else to come up to speed. I only cared whether I grasped it.

     That's one red flag. Another is absence of the simple approach in old code: I didn't write a simpler but equivalent variant to which the complex version could be compared. Now that just seems wrong, for several reasons, the most important of which is denying others an easy version to walk through. My standards have moved markedly in a direction of letting others review code cheaply. An important quality in code is ease of letting others convince themselves it works.

     Ten years ago I was too eager to optimize code, when my priority should first have been to ensure it was obvious to others. Now I no longer feel very complex code is to be trusted, ever, without a simpler version to which it can be compared for equivalence. And even then only when complex optimization is needed to meet performance goals.

     The next version I write will do only the simple thing. But I don't expect to write a disk-based version using a page cache, because that involves details I don't need at first for a blob btree working in memory only.

     At the moment I'm inclined to insert a large new fragment into an old blob btree by first building a new separate blob btree describing only the incoming fragment. Then I'd write code to insert one blob btree into another at some offset. This would incur extra cost where the two trees join edges. But it's an easy divide and conquer strategy, whose code is re-used when you actually do insert one btree into another.

08nov08 waterfalls of caches

job roles

     I had a really intense week I'm not saying much about, but I was really busy all the time. Friday at work included a meeting about what I'll be assigned to do next, which was largely crafted to suit whatever it was I wanted to be doing. So far it looks like I'll have a lot of interesting stuff to think about, with more on my plate than I'll have time to address, which I tend to like since being bored is worse than being slightly overloaded.

arc addendum

     I added slightly more material to the arc demo so it addresses a few more quirks to arise later in the blob demo. The bottom of each column is longer, with more in the right column.

copy-on-write

     Today I'm planning to write the cow (copy-on-write) demo next because that's a good way to offload more material that might otherwise creep into the blob and tree demos. They might be too long unless I manage to cut potential side issues.

     I was thinking about explaining IronDoc style structured storage transactions, and I realized it was technically another form of copy-on-write at the block API level, and the cow demo is still a stub, so I can cover that there. At the same time, I can ditch any other cow side-topics for blobs and btrees in that demo too.

     Here's a short version of IronDoc transactions. (I'll say this in more detail in the cow demo.) Files are allocated in blocks, mapped in an allocation bitmap of free blocks. A db file actually has two such allocation bitmaps: the old and new free bitmaps. A block can be (re)allocated only when it is both free now and also free previously, since otherwise modifying a block used in the last transaction would lose data you have not committed to discard.

     Any blocks used in the old worldview (not free in the old bitmap) are physically immutable, but can be logically modified because copy-on-write occurs. When modified, each old immutable block at offset X is remapped to a new offset at Y, where Y is allocated from new or previously empty space; for the rest of a transation, any attempt to read or write X actually uses the block at Y instead. Until a transaction commits, both X and Y are maintained independently to preserve both the old and new versions of X.

     Using this model, all changes to a db file occur in space that was either empty (free and containing don't-care values) or beyond the old eof from the view of the old database transaction snapshot. An abort for any reason (including process termination losing any unwritten changes) simply leaves a db file with the same content it had before — in old blocks. Changes occur only in unallocated space, from the view of the last committed global transaction.

     Some folks notice this feels morally equivalent to a log structured file system, since changes log into new space.

     This particular way of doing it was designed to suit end users for Apple MacOS environments, because a single file hosted both pre and post transaction commit data, so any abort or commit always had everything necessary in one file. This way, if a user crashed and afterwards immediately copied one document they knew about to new media, they would always get everything, because no additional special or hidden transaction log files were involved. The first priority was always the same: never lose the user's data.

05nov08 airtight compartments

arc demo

     The new arc demo is now all finished. It overlaps with content to appear in a near future blob demo, while also covering arc oriented ideas applicable other sorts of data structures. (All map data structures indexing data found elsewhere basically contain arcs as entries.)

04nov08 slap buttons

new games

     I bought my sons new games over the weekend, which they've played a lot since. Tonight they discovered a very funny feature of Little Big Planet requiring some explanation. Watching them use it was one of the funniest things I've seen them do in a long time.

     Note Little Big Planet (let's abbreviate to LBP) deserves a long review, because it's really interesting. Watching my two sons start a tutorial for creating new environments in LBP was the first time I thought I was seeing something new in software dev growing in a context of end user game creation tools. However, I'm not going to say much about what's cool in LBP tonight. I'll focus on just one funny part.

     But to set up the joke, first I'll tell you about my younger son (let's call him Fenrir, but that's not his name) when he played Fallout 3 first on Sunday. Fallout is a cool-looking, post-apocalyptic game made by the same folks who created Bioshock, which in some ways resembles an above-water version of Bioshock, in what I see so far. Part of the appeal is retro cultural elements.

     The game Fallout begins with the player's game character growing up — Fenrir was irritated when his character reached the age of ten and had to suffer the social burden of a conventional birthday party with a few post-apocalyptic twists. The game has dialog and permits Fenrir to choose one of several responses he can make.

     When adults patronized Fenrir's character, he said, "I want to slap you. Where's the slap button?"

     For the next half hour he complained about not being able to find the slap button. Slapping other characters was never one of the offered choices. Fenrir and his older brother made a running joke of the complaint.

     Okay, now let's fast forward to Little Big Planet again, still called LBP for short. The two of them played a couple games in two player mode, in which each controlled a little character in the form of a bean bag doll called a "sack person," resembling a generic, detailess beenie baby of a human being. LBP renders sack persons and their environments in exquisite three dimensional detail, mildly realistic physics motion, and good sound effects with wonderful background music.

     The game controls give each player fascinatingly detailed control over the movement and facial expressions of each sack person, so they have enormous personality when a player wants to emote through their sack person.

     So it's quite easy to project yourself onto the sack person you control. My sons spoke of their sack persons as themselves, and acted that way too, with respect to invasion of space and other forms of mutual interference.

     Just imagine their surprise when my older son made a sudden gesture, sending Fenrir's sack person flying in a full body sprawl, along with the meaty and satisfying smack sound of a righteous slap — a bit like a punch sound effect from other games, but more in the slap direction.

     Naturally they explored it more, losing themselves in an epic slapfest competition that was hilarious to watch. I found myself studying exactly what happens to the slapper and the slappee each time a slap event occurs. If you look closely, you eventually notice the sack person delivering the slap shakes his hand and arm afterward as if shaking off the pain of the impact. It's weirdly subtle.

     The slappee — the slapped sack person who falls — does a nice pratfall, typically rotating in the air about 270 degrees to land on their face. Each slap lifts the target sack person up off their feet as they spin in the air before falling in a heap. It's a tiny little physics drama/simulation of puppet scale comic violence. And something about it is really funny.

     As my sons played more game levels, they periodically stopped to slap the other sack person. And on the podium celebrating the winner of each level, they always slapped each other, sneaking up on each other for a sudden backhanded slapstick assault.

     One got up to take a rest room break and warned his brother, "Don't slap me." But he was hardly out the door before the first slap was delivered. The sack person with no controller stood up again after each slap, only to receive another slap meted out by the mischievious brother, which lead immediately to a fit of rage in the brother who left for a break, who shouted, "I told you not to slap me!"

     Anyway, it seemed sociologically interesting.

02nov08 courtly knaves

partial arc demo

     The new arc demo is about two thirds done. I spent a lot of my writing time creating images instead of writing. I started with a very slow exploratory search for techniques that might work. I eventually made fast enough drawing progress, once I found a plausible way to render btrees in a way I could contrast in perspective views. But it was a slow ramp.

btree figures

     I stayed up into the wee hours of the night last night drawing figures for the arc demo. Naturally explanations need to be written for full value. But here's a couple:

     The (near) perspective view below was constructed by hand, shooting for a common vanishing pointing point for both layers. But the rear blue layer doesn't have a vanishing point matching the red layer's close enough. But it's good enough to get across the right idea.

     I added the faint background grid specifically because I knew it would help provide context revealing an intent to render perspective. The two graphs would just look distorted without the grid.

29oct08 ids vs superegos

busy december

     I expect to be rather busy this December and January, so I likely won't do as much on this site during that period except during holidays and weekends when I turn to hobby play. Expecting that slowdown, I might finish a little material before December in new arc, blob, and tree demos I just added as stubs. The latter two will reprise some IronDoc designs in capsule form; I'll try to make the code as small as I can, leaving out details needed for database use, which you might re-add by thinking about other ways to represent an arc besides in-memory pointers favored in demos.

opening paragraphs

     There's not much to read on those pages yet beyond opening remarks. And here I quote all the opening paragraphs, so there's no need to see the new demos until I post more updates.

arc opening

     As described in the graphs subsection of the node demo, Wil sees most computing system representions as graphs in both code and data. Nodes are typically somewhat over-represented; structs and objects for state are typically favored in code description to the exclusion of arcs: the way you refer to something else. But careful arc design is a good way to solve problems too. Tuning graphs can target either arcs or nodes — sometimes both.

     This page uses the term arc to mean the same thing as arrow or dart or directed edge in an abstract pointing role in a graph. You should think "arrow" but focus on graph edges. Concrete arcs can be pointers, absolute offsets, relative offsets, map keys, and/or combinations of these in series. A chain of arrows acts like one arrow with extra internal detail exposed.

     Wil rarely bothers discussing this perspective, because arc design folds seamlessly into effective coding. However, this also means an arc perspective is usually buried, and Wil rarely tells you the other options that would have worked too. So when Wil shows a very specific way to code blobs or btrees, you might have trouble seeing the other ways to code them using different arc formats to point between nodes. To help you with this, Zé offers material below on arc ideas, and later nags Wil to present arc agnostic designs in future blob and tree demos, along with Wil's concrete code.

blob opening

     In the middle 90's Wil started using a specialized btree format to represent blobs in storage systems — where a blob is basically hard to tell apart from a file embedded in a database — in imitation of tree formats described by Exodus at the University of Wisconsin (cf Storage Management for Objects in Exodus).

     Given any block-oriented media for allocating space for use as nodes in tree structures (for example: pages in files using pseudo virtual memory) you can build a "file system" representing blob "files" as btrees in a manner called Exodus-style btrees in IronDoc literature. This page calls them blob btrees.

     Designs and code shown on this page present a way to encode scaling byte sequences as blobs, much like files, using a tree to index leaf nodes of content. Blob btrees index using a simple scheme: inner tree nodes just count bytes of leaf content. So seeking any offset can simply skip bytes to reach a desired point. Seeks, inserts, and deletes are all logarithmic (O(logN) means "log time") at all offsets in a blob file; blob edits have the same order of performance as appending to a Unix file. So for example, deleting the first kilobyte of a 100MB blob is just as fast as truncating a kilobyte at the end — just another efficient btree edit. It's unnecessary to touch bytes downstream from a blob btree edit.

tree opening

     For use in persistent maps (or dictionaries) which scale much larger than main memory, Wil prefers to use a format called a btree (wikipedia) whose name means balanced tree. Traditionally the name is written b-tree, or sometimes b*-tree (denoting a variation with non-root nodes aiming for least two thirds full, not just half). But Wil adopts a common practice: dropping hyphens in frequently used terms. So Wil writes simply: btree.

26oct08 insensitive clods

mental drafts

     During early phases of design or exploring ideas, it often goes faster to imagine everything instead of drafting a code api, because actually writing is slow and distracts with details not pertinent to issues of interest. I've been imagining a mixture of data structures, diagrams, and approaches to plain english descriptions.

     I'm half certain I'll precede a btree blob demo with another about arcs, if you use the term arc as a synonym for pointer, or reference, or some way you refer to something you want as part of a node in a graph. (If you can't stop yourself from thinking of Graham's Arc programming language, just imagine I'm saying "arrow" or "dart" instead, or your favorite synonym for arrow.)

     The arc demo would be "written" by Zé instead of Wil, since it's mostly about ideas and options, and doesn't involve code per se, which is Wil's territory by convention. Zé wants to briefly touch all the many possible options in representing an arc, in order to introduce the idea any particular way you pick to write code for graphs — btrees in particular — will tend to prevent you from representing other sorts of graphs using different arcs.

     The idea would be to introduce a little terminology, and maybe some conventions for representing schematic graphs in s-expression syntax, to enable description of btree psuedocode without committing to a specific way it would appear in C and/or C++, even though Wil's code will embody algorithms in concrete choices.

     I think I'll make one pretty Photoshop diagram of a schematic btree graph with some shared structure. Then I'll present a text representation of the same thing (using Lisp notation for labeling shared structure) to define linear text conventions intended to mean the same thing as pretty diagrams I'm not going to draw. Then all the pseudocode descriptions of btree operartions will be illustrated with diagrams of the linear text sort.

blind chess

     When I was in my 30's I had to actually write code to work out all the details. Staring at an api helped prop up my working memory so I could imagine uses of it and possible conflicts.

     Sometime in the last few years, I noticed I didn't need to do that anymore. By the time I worked at Netscape I could design code right in the middle of conversations where folks stated requirements. But I still got a better picture of the design later when looking at the code. A few years later, it stopped making a difference: looking at code didn't make me work out answers faster or even better.

     And I can actually visualize what C++ code will look like in my mind's eye when I want, so writing it is just transcribing from memory. But usually I don't work out every detail, when it's obvious, unless really interesting. The net effect of this is what you'd expect: I don't need to sit at a keyboard to get work done. I can code when I'm driving, and so I do, of course, as much as it might chagrin you to learn. But usually there's not much reason.

     Sometimes in meetings at work, when my attention isn't really needed, I'm actually writing code or planning how it will go, working out the details since the limit in speed is mainly knowing exactly what the plan really is. Maybe 18 months ago, the project director would occasionally glance at me, and I'd say yeah, I'm working out the code plan — what I'm going to write next when I get back to my keyboard. He'd say, yeah, the plan takes all the time.

     However, actually writing code is when you can try to make it shorter and clearer than it might have been. Size and complexity cost nothing in your mind's eye, but you can see both take up space in black and white text. I imagine another developer reading my code as I write, and guess whether parts will be too unclear. When you're typing is an opportunity to sharpen clarity.

     Or, when you haven't worked out the whole plan yet, you spend half your mental cycles working out the rest of the plan while filling in the place you see right now.

25oct08 big fish eat little fish

btree sideshow

     I keep thinking I'll go right back to toy language material. But it doesn't seem most appealing at the moment. I find myself thinking about btrees again instead. For some reason that seems fun right now. And I try not to question my urges in pursuit of fun hobby stuff, since I don't want to scare off the muse by being a grind when it comes to scratching itches. Being a grind about what you think you should be doing is a good way to leech all the joy out of the process. And I've been looking for the joy.

     There must be a phrase for what I've been doing. (Or maybe I'm recalling part of a science fiction novel.) If there isn't a phrase, we should coin one. Feel free to tell me which one is used most often by folks in creative pursuits. Anyway, I've been courting inspiration by going through the motions in my demos and toy language material. Maybe you thought I was having fun, but I wasn't quite — not just yet anyway. But I was asking for it. I thought if I went about my business as if I was having fun, eventually it would start to happen, and I'd finally be back in the zone: the good place where work and joy meet in a mutual, nurturing, virtuous cycle.

     I thought this would happen first in toy language stuff, but it still has some uninteresting grind phases to go through first, recapturing known past stuff I know without learning anything in the process. I could tell something was wrong because it wasn't quite clear what I was going to do when it was working. Getting in the zone works a whole lot better when everything you do enables the next thing you want to do: when this tool makes it easier to investigate the next one. The trick is partly perceiving a dependency between one hobby pursuit and another, so they amplify one another.

     For example, if I had something in particular I wanted to drive from my toy language, I'd go about it more enthusiastically. At the moment I find I want to play with btrees again, which is great: the important thing is just to want something, because that can be carefully stoked so tech wishes motivate each other. In this particular case, I was planning to code scaling strings as btrees in garbage collected memory, so I can turn a btree junket now into a study for gc runtime btrees for arrays and strings in a language. And I can feed that back into an interaction between language and persistence systems.

     So I think I'm going to write about Exodus style btrees for blobs, with the usual sort of copy-on-write support, but in memory-only form so no one will mistake this for database work right now. It will fall awkwardly short of addressing file formats and gc box formats, by heading somewhere down in the middle, but in spitting distance of both. I'll probably crib from my old IronDoc sources in the process, but I'll write everything using thorn api and class support. Probably it will work as a useful bridge between disk-based and memory-based code in the future, and it will also give me something to eagerly port to toy language builtin data structures.

     This is also a good point in time to document some of my old copy-on-write work since it's completely irrelevant to my current day job, but it's one of those things companies tend to be obnoxiously possessive about when it does have any bearing on work problems (as if it wasn't all sort of basically obvious from first principles).

     Maybe thinking about another IronDoc style page cache recently when writing the thread demo got me in the mood. All I know is I've been thinking about it off and on all day, and I started looking forward to working out some new experimental code for btrees without having it revolve around file formats. I might actually learn something, which is maybe why it has my juices going.

     It will seem more like generic library work than anything else. I'll avoid language and database slants deliberately, which will also make it simpler when it's abstract and app agnostic.

scaling and probability

     I often advertise my interest in scaling, but I suspect some folks think it means something different than what I intend. These days, most of the time folks talk about "scaling" they're talking about building out hardware infrastructure in support of scaling. But that's not what I mean. I'm not into hardware, and I don't design and spec hardware for server farms for scaling. I do local system scaling and not global hardware farm scaling. The only part focused on scaling in distributed systems is making each node cope with the fact all content involved participates in async conversation with many nodes, and coding this for correctness.

     In my day job, when I talk about scaling I often mean something in particular: making every operation constant time if possible. Folks get sick of hearing me say the phrase "order one time" over and over again. What didn't I like about your last idea? It can't be done in order one time. In systems with a big network performance goal, you can't afford to spend much time doing anything. For example, can't physically go visit millions of objects you want to invalidate all at one time, even as a non-blocking background task, if it runs CPU usage up to high levels — that alone can cause network traffic to start backing off. Near 100% CPU utilization for several seconds in a row is a bad idea. You can usually turn this sort of task into a lazy duty performed incrementally as you go about your other constant time tasks, adding a tiny bit of overhead to each one, spreading large costs evenly across a zillion small costs.

     I had really meant to make this blog piece be all about a specific view of probability, but I added scaling to the picture because that's the context in which analysis of probability is interesting: what happens to certain probabilities as you scale something up to use all available resources to do whatever it is?

     I wanted to write a little about my meta reasoning, applied to some topics in dedup, because the way I reason about the issue is generically useful and not specific to dedup in particular. However, I'm not going to tell you any of the results of my reasoning, when it comes to dedup anyway. The results have something of a trade secret flavor to them, because the results were found on the clock in pursuit of day job problems. They're not mine to share.

     But some of my results were found by the manner in which I went about thinking, then testing hypotheses empirically. How I go about reasoning is something I can share. It really amounts to little more than undergraduate discrete math coupled with basic scientific method: trying things and seeing how it fits models you form. There's hardly any trick to it.

     Let's say you write some app, or system, or service that scales really big. And part of this software runtime contains sets of different kinds of objects in relationships satisfying your needs so the app works. The basic question you have to ask yourself is a simple one: what can you conclude about the entire population of these objects, taken altogether? For example, how many of them are there? What are the relative ratios of different sets? What do actual and potential ratios tell you about probabilities of good or bad situations with respect to your app?

     If the actual probability — not the probability you were hoping would obtain — has a big potential impact on the value or quality of your product, then you should find out what it is by generating and analyzing actual populations at the scale of interest.

     Then you can think about making changes altering probabilities in directions to your advantage. And when you already have means to empirically measure factors of interest, you can directly explore the effect of changes you think might work better.

     Some of the data sets and populations are not easy to measure directly off your running system, because you don't see all the potential emergent relationships — just the ones during a current run. To measure all potential cases, you might need to write code explicitly constructing a full set of potential values, even if you'd never see all of them in practice. Otherwise, you won't know certain probabilities. For example, you only have a theory about a count of all potential cases until constructed and measured.

     Any time your code has reasoning that includes parts like "this data set contains a set of things, and I'm going to find some of them" — you should find out how many of those things are actually there to be found. Don't guess or assume.

22oct08 bobbing for apples

finished thread demo

     The thread demo is now fully done. The tail end of the right column contains a small piece of random fiction, added in lieu of writing a much longer piece I'd rather if I had time. However, there's not much reason to add fiction to the demos. In this case, adding a tiny bit lends characters a little flavor, as if they actually have existence apart from foils in dialogs.

21oct08 vacant parking lots

thread demo

     The thread demo is around two thirds to three quarters done now. I'll finish it later this week. It's missing explanations for thread subclasses and the output for sample execution runs. But all the code is marked up and the test runs will show a few interesting effects when I explain them.

Entries appear in reverse chronological order. Content here is transient. Each entry has a permalink () to a long-lived persistent copy elsewhere. Clearly, to link anything, you'd best link a permanent copy.

18oct08 wheels within wheels

pending demo

     I'm working on a thread demo, but so far I've been slicing and dicing sample code and examples in pursuit of meaningful but tiny pieces. It's hard to write a demo of threads small enough to mean anything while avoiding totally fake and arbitrary results.

     It's tempting to write another thread-safe page cache of the sort I used in IronDoc, since I end up coding one in day jobs every few years, and it doesn't change much while being useful, realistic, and informative. The kind I've been favoring for a while now amounts to a simulation of memory mapping, but with more control over when i/o occurs. (It turns out that real memory mapping tends to make terrible performance and latency choices when the OS lacks app specific usage details.)

     But in the short term I'm just going to show basic stuff, like how to manage a thread pool, how to make them stop cleanly using barriers, and how to write a pthread barrier clone when your system doesn't define barriers in pthread.h. (Barriers are present in Linux, but not on my Mac laptop, so I need to write a barrier class if I want it in the demo. I wrote it today but it's not tested.)

non-determinism

     This week was a total grind at work because we reached that stage again where I need to convince folks the dedup codec works according to plan, despite it's inherently fuzzy, heuristic, stochastic, non-deterministic algorithm. Actually, encoding is deterministic if you start with exactly the same system start state and present exactly the same sequence of things to encode in exactly the same sizes of pieces. But if you vary anything, the end result is only approximately similar. It's defined not to be exact.

     I didn't come up with the original algorithm. I just tuned it and reworked all the pieces to leave out unnecessary details, leveraging the most useful parts. I cleaned up definitions, squeezing them for efficiency by exploiting improved definitions. But it's still just as heuristic and inexact in nature as it started. I just increased probabilities of good effects and decreased those of bad ones. But the result is still probabilistic.

     So it's really strange when folks expect exact, repeatable, deterministic behavior when even such small details as size of network frames will alter encoding. Especially when someone testing doesn't guess results are highly content specific. It turns out it matters exactly what was inside network traffic, because there's (basically) an infinite number of legal ways to encode it each time. One preferred approach is pursued, but it's heavily influenced by actual content and past history, both.

     Folks ask, why did it do X here? And I reply, it's defined to be allowed to do X there if appropriate. So they follow up with, but why did it happen this time? So I say, it depends on everything — for example, what was content this time? And how big were the pieces?

     So this week I wrote code every day to reveal what was happening in detail not previously tracked or reported. And it's still not yet sufficient detail to eyeball a report and be satisfied you know why this case acted thus. But the level of detail has already grown to a scale that loses the simplification of models inherent in the algorithm. You can't see the forest because now you see not only the trees, but all the leaves, imperfections in bark, and ants crawling on trunks. It causes detail shock.

     Part of the problem is in what folks expect to be visible in text displays. Even when I write a data structure to build an overall story explaining everything that happened (and where events affected which part of an encoding) when this is printed as a single linear display, it's inadequate. The actual picture of what happens is two dimensional and layered like transparencies on an overhead projector. And I can't show this in a text view whipped together in an hour. It would still be hard to show in a fancy GUI — you'd essentially have to write a codec debugger to really understand without getting lost. (Instead of source code, a codec debugger might display patterns of similarity, search, and references in content flows already too large for comfortable human digestion.)

     Anyway, some people would only be satisfied if encoding was trivial and exact, when in fact it's gnarly and inexact. They confuse the idea of correct behavior with simple and repeatable behavior, when neither of those apply to correct solutions in adaptive, opportunistic algorithms with strong internal feedback looks in system state. We can verify it always generates correct answers. But oddly folks wish only ideal things would happen, as if no heuristic fallback code path would ever be reached.

     It shows a mixture of ideal and non-ideal results, in a mix resulting from heuristic strategies. When that's how it's defined, non-ideal results occur here and there. Whining about it changes very little. You should pursue a more block-oriented approach if you want endless permutations to look simpler.

14oct08 decisions

vacation

     Okay yeah, I've been taking days off my project here to relax. The past couple days I kept planning to start the posix thread demo (because that link still draws a lot of hits) but I let myself read escapist fiction instead: a Heinlein novel I last read over thirty years ago. There's no point in mentioning the title since I haven't anything to say about the content. But it's always interesting to note how my mind has changed since the last time I read something.

     So, I think I'm going to write a thread demo next, featuring a dialog developing a very simple C++ wrapper class to use pthreads using the posix api. But I'm not sure when I'll start. Then I suppose I'll go back to toy language stuff.

opportunity

     I'm trying to make up my mind about an opportunity that's really tempting because it has prospects of really interesting work again, though it would involve working in Java a lot, which I've hardly used. They don't mind my lack of Java experience, but I'm wondering how much I'll mind lack of control when trying to do optimization work.

     I would almost certainly net less income if I make a change, and I haven't finished digging myself out of debt yet, playing a big part in how I think about risks of diving into something new. But worrying about my finances is kind of backward looking, when I should really think about where I want to go instead of escaping money craters.

     What I do now is really narrow, and not very challenging. And the last round of team load-balancing had the effect of isolating me from i/o parts I enjoy more. However, I have a lot of time to work on projects like a toy language on this site, because my current work is not particularly demanding. If I started a new job, I'd spent most of my time coming up to speed for quite some time, and output here might fall off to light, sporadic updates for a few months.

     I've been in my current position two years and I'd like to make it three. I don't want to change jobs often any more. It's just this current possibility is really high on my list of best places to work in terms of fit with my interests. I've had my eye on them several years. So it just has all the signs of the right thing to do, except for downside risks.

     If I take a new job, I'll have to throttle time I spend here even more, to focus on just basic toy language stuff alone, with an aim to get it done and wrap it up.

     The problem with making a decision so far is that I can construct positive and negative arguments for and against making a change that don't resemble each other. The gap is very wide: best and worst case outcomes are far apart, which suggests I'm really guessing blindly, motivated by hope. It's been a while since I hoped for better things.

     Of course, if I do change jobs, I won't tell you. I'll keep my policy of saying as little as possible about day jobs here, unless a really good reason to change arises.

11oct08 places for everything

same bat channel

     I might return to programming material tomorrow. But then I must also put in a extra hours at work tomorrow afternoon, so maybe I won't write much this weekend. I could launch into a little code right now, but I decided instead to shortchange you with a little fluff best described as "blogging." It's topical in a sense of relating to financial accounting systems, which folks recently noticed (again) aren't always all they're cracked up to be. Someone is willing to game every system, so it's inevitable.

accounting

     I know more about accounting than most geeks. At one point when I was around twenty years old, I double majored in computer science and accounting for a year or a bit more. I'd developed a bit of curiosity about business systems when I was working at McDonald's between school years at the end of the 70's.

     I could tell a few funny stories about McDonald's, but actually I don't want to remember. I did a good job of misplacing most of those memories and want to keep it that way. You'd think a guy with an IQ of 160 could find better work than McDonald's, but not where I lived in Iowa in the late 70's, which was — in a word — barren. You guys don't know how lucky you are to grow up in an age where you can find work at a distance using the internet. I grew up in a small urban oasis in the middle of a corn desert. In the Dark Ages.

     But they had books. For example, I read The Sun Never Sets on IBM while working fast food, which told the rise and fall of IBM in a style mocking the rise and fall of the British Empire. I started reading about accounting because businesses seemed a little bizarre. I had once read a science fiction short story where a teenager was caught dabbling in the dark arts of accounting, clearly cast in the role of black magic. When confronted by adults, he protected himself by conjuring a daemon with double zeroes in place of eyes.

     My family had predisposed me to interest in accounting. Visiting relatives, seeing me absorbed in detailed projects around the age of eight or nine, always observed, "One day he'll be a certified public accountant with attention to detail like that." They were all working class, so this was apparently the only use they could imagine for an attention span in a child longer than any adult's. I'd keep books for other people who did something useful, like make money.

     By that time my father had started his own business — in Tennessee, where my status was damn yankee to other kids — selling "business systems" to small business owners, consisting mainly of business records storage (on microfiche for example). If you like, think of his business as selling clay tablets with attached stylus to shop keepers using cuneiform for business records. It's about as comical, and foreshadows the quick collapse of his business during rapid innovation in storage systems. Instead of declaring bankruptcy, which once had quite a stigma attached, he instead spent years paying off his debts and finally hit a net worth of zero again by the end of my high school years.

     He seemed a bitter and miserable man, but I supposed I would do better since I had a gift for arithmetic; simple mathematics, like that tested in SAT's for example, was a breeze to me. In retrospect, it's a little disturbing to find myself sucked into storage systems in professional computing work now and again.

     I changed majors to computer science a couple years into college because programming was a lot more interesting than circuits in electrical engineering, even though I'd aced all the calculus and physics classes. It was clear EE would suck as a job because you could never do anything interesting by yourself, whereas in programming you could do meaningful work all alone. (You can probably see the irony here: over the years it has become very hard to do meaningful work all by yourself in programming.)

     But for a while I also double majored in business administration because I wanted accounting classes to explain how ordinary businesses worked. I walked into local businesses and asked rude questions about inventory practices. I pissed off business owners by asking how they acquired their positions. I toured facilities at my home town's newspaper because I wanted to know how it worked. (For some reason, I was treated like a visiting VIP at the newspaper for reasons completely beyond my comprehension.)

     I read books about the acquisition of wealth. Standard advice was simple: don't spend everything you earn — invest some of it. But a couple years of learning the nuts and bolts of accounting systems exhausted my supply of curiosity, so I went back to just computer science. But not until after I'd aced three quarters of accounting.

     In the middle 80's when I lived in Berkeley, while I taught myself compiler writing before I returned to school to finish my degree, I worked for a while as a clerk in a law office in Oakland. There they set me to work running their accounting systems when I wasn't filing away paperwork for legal cases. I keyed in all time and expenses billed to clients, cleared checks in the general register, and balanced their accounts before generating final balance statements. An actual honest-to-god bookkeeper came in once a month to double check my work. Why did they have me do this sort of work? Because I tended to not make mistakes: I was freakishly precise, and I noticed any data that didn't fit.

     By the 90's I was working full time as a software developer, and in 1994 I worked for a primary vendor of software in the derivatives market segment. They were experts in software for the sort of dodgy financial instruments you have come to loathe in Wall Street's practices. I read books about derivatives and about packaging and re-selling financial instruments like mortgages. For example, the explanation of how Fannie Mae worked — and rationales behind mortgage valuations — had a kind of rank smell to it compared to any more pure form of mathematical system. By the standards of accounting I had learned in college, mortgages had a fishy smell because they were divorced from clear auditing that might provide evidence grounding you in reality. Assuming you cared.

     Sometimes I call myself a plumber in computing systems, but it's actually more accurate to describe my role (some of the time) as that of an accountant. In some tasks, my job is to optimize memory and CPU budgets, since you can't spend more than you have in the real world, unless you have a scam going. At the same time, the idea is to pinch every bit of efficiency out of resources available, like an anal bean counter. When tweaking runtime systems in software, I'm like the accountant who asks, "What do you want the answer to be?" since with a little creative accounting, I can probably manage something like what you have in mind.

     That's what you get if I'm in left brain mode: balanced books with a place for everything, and everything in it's place.

     But I'm much more right brained than left brained, and if I'm in that mode, you get something else — and it's not something they can teach in school. Most folks aren't any good at it. To call it thinking "outside the box" is a dumb way to say it. It's better to think of it as a facility for imagining boxes of all kinds — sometimes very strange boxes — and simulating what happens given sets of different kinds of boxes. Among other things, it leads directly to systems without standard terminology, and it's very easy to imagine the basis of a coordinate system incomprehensible to others (who imagine the only way things could ever be is similar to what exists now). But in practice, with short term goals, only small changes are useful when most things must stay compatible.

     Right brained thinking is most useful when you're stuck. I can imagine being unstuck several different ways. It's easy, and you get your choice of side effects for each way to get unstuck. It's also very useful when you want to know what would break the current system — lots of things will break it. Everything has weaknesses. You can even do armchair debugging this way: when evidence says an odd thing occurs, just ask, what might cause that?

     Right here I'm going to drop a couple digressions and wrap this up by heading for thematic closure. Remember my poor dad? The poor shmuck who got himself in debt up to his eyeballs? I was pretty sure I'd never let that happen to me.

     Turns out, when you get married, you're in business with your spouse, in the eyes of the law. If you marry a complete idiot with numbers, who can't be responsible even in a pinch, you end up having the same result you would if you had selected a moron as a business partner. Which you would know not to do, right? And it would be even worse if your partner was dishonest, much craftier than you imagined, and diabolical as an adversary.

     The phrase "taken to the cleaners" is not just a bit of rhetoric folks started using in the context of divorce for purposes of color. It's a fairly accurate description. But it falls short of what can actually happen. You can lose more than everything you have. In the business world, about the worst thing that can happen is going bankrupt. Going negative is not enforced in business law, but it is enforced in divorce law and child support. You can go negative by an amount that becomes daunting very quickly, especially if you struggle. (Don't do that: just give up fast.)

     Okay, now imagine owing your ex an amount of money that's a sizeable fraction of your gross annual salary. What do you think will be necessary to pay that debt back? As a clue here, try to imagine the folks designing the rules as mathematical idiots, who can't do simple arithmetic. As far as they're concerned, you've got the money.

     First off, you're going to pay almost all the tax on your income whether or not a hefty chunk goes to your ex after taxes, because you pay tax on all the part labeled "child support," and it's the big piece. After taxes, you're going to split the remainder nearly in two, and one piece goes to you ex. Whether or not your ex works, her income now exceeds yours, by a lot.

     Out of your net income, you pay your rent and all other expenses. Probably you'll be asked to pay more expenses for the kids above and beyond what's already paid in child support. And of course, you're feeding, clothing, and supporting them yourself half the time as well, and it's not cheap. (They're on track to cost me a million dollars apiece by end of high school, once I include my ex.)

     You pay your ex back the money you owe from whatever's left. In round numbers, you might need four or five dollars of gross income for each one dollar of discretionary spending after taxes, support, and living expenses. For each dollar you repay against a debt, you also give your ex more than a dollar in support from the same five dollars of gross income, and pay taxes on both.

     The overhead on paying back the debt is spectacular.

     My net worth will be back up to zero shortly, as my older son hits the end of his high school years. So to him I look like the same kind of loser my old man looked to me. It's only taken about four years after the low point to get back up to almost dead broke. All I had to do to get here was trust someone far too much.

     Do I trust you? I never trust anyone, anymore.

08oct08 battening down hatches

home stuff

     Sorry about low output here this week. I've been taking care of the real world at home. It seemed a good time to upgrade my sons' beds, which have been ad hoc and informal for a couple years now. One evening I bought new pillows, sheets, and associated folderol, then washed everything old and new. The next evening I dragged my sons to a mattress place to pick upgrades they like at reasonable cost.

financial fairy tales

     I find the economic financial meltdown a bit distracting. News of this sort tends to engage the part of your mind that projects alternative future possible realities (let's say future sight or some other simulation name) to anticipate how details might affect you personally. Anyway, evaluating risk in the context of alternative permutations of situations is very nearly the same sort of task involved in designing some parts of code, so the code creation part of my mind has been busy in evenings of late. Either that, or I'm rationalizing.

05oct08 splitting hairs

lexical syntax

     I wrote an informal description of lexical syntax on the lisp page for the toy language, after finishing my pass over all the Scheme SRFI's to be included in Aim.

team sizes

     Sometimes I'm asked odd questions I fail at first to see the motivation behind. For example, I was once asked about sizes of teams I have worked on before. I answered the question as if it had been asked, "What team sizes are good? What's normal size in the industry?"

     But afterwards I think the question really meant to ask, "How many people worked on your part of the project?" Ah, now that's a completely different sort of question. Whenever I say in writing (like on a resumé for example) that "I did XYZ," this means the team size working on XYZ was exactly one — me.

     I use the word "we" when we did something as a group. For example, when I explain the objective we were aiming to achieve as an overall product, in order to provide context. I might say we were writing a server to do ABC. But in that server, if I say I did the XYZ part, then no one helped me, and in many cases I did everything from specify the problem, write all the code, test all the cases, and debug all the problems. In such cases, other team members only had the vaguest idea about details, and only when they needed to share my api — for example, when writing a draft VM to execute code emitted by code from my compiler targeting the VM I both invented and specified.

     Of course, I understand why such questions are asked, because when I interview candidates coming into my projects, I want to know the same thing: did you do anything on your resumé? Or did you just happen to be there when it happened? It's very common for people to put stuff on their vitae which summarizes what the entire team was doing, even if they had nothing to do with any of it, in a remarkable display of (either) intellectual dishonesty or failure to grasp the idea.

     Several years ago I had a job at a startup where I did things sounding like output of four or five engineers in the course of a couple years of development. But I did all of it myself. One of the reasons I was so productive was because I personally wrote more than 90% of the APIs I had to use, which meant I never spent any time trying to figure out what something might mean, because I already knew.

     Anyway, almost all productive engineering teams have a fractal structure, with typically three to five engineers doing something similar, with close interaction. Large and complex products can have fifteen or eighteen engineers altogether, and up to that many more folks doing QA, project management, marketing, docs writers etc. A startup might have five engineers that do basically everything, with three of the senior guys doing all heavy lifting. A bigger company has bigger teams, partly because they can afford benchwarmers (for lack of a better word) and excessive depth in QA teams who sometimes seem to do nothing. But there's still usually only a handful of engineers that do heavy lifting.

     I do the heavy lifting when it has to do with data structures, memory management, thread safety, using lots of resources, efficiency and safety of exact algorithms, and reducing exotically messy problems into relatively simple parts. It varies from job to job, but usually any task that requires someone really know what's happening at a low level in C++ is handed to me, because I deliver a superior result in several says: 1) it actually works, 2) it stops having bugs, 3) I can explain what it does, 4) management can understand my docs, 5) I made it simpler, 6) you'll want to patent some clever part, 7) it goes faster, 8) it scales larger, 9) it gets refactored, 10) I keep alternatives you can pick, 11) it has stats revealing inner workings.

     But I can generate results you don't like, especially if you pile on constraints in the hope of killing one of the options by making it too complex to solve. Uh no, I'll still solve it. And then you'll be stuck with this complex thing. If having it be really simple in the end is one of the constraints, you have to say that explicitly instead of pretending all the complex requirements are really critical, expecting that will derail things and make the result moot. I'll make it work anyway, but you'll hate it. Just tell me not to do it if you don't want it finished.

     Usually teams have a few folks you can bounce ideas off, instead of having to work out every detail in a vacuum. I like to bounce ideas off folks so they can tell me not to do something that sounds too complex. But most of the ideas I pursue are ones I had myself, as opposed to those suggested by others — I don't know why other folks suggest few ideas.

     I work on whatever problems need attention. But over time I tend to get stuck with families of related problems once I've gotten deeply involved in reworking how things address earlier problems. Other folks get assigned tasks that sound like "just get something working, even if it's wrong" because I'm more likely to do it right if that takes slightly longer. I'm not very good at randomly scaffolding temporary hacks without any well-defined explanation. But I'm really good at doing parts you want to believe are correct — such as parts that would otherwise lose user data if behavior was erratic.

     I always work closely with several engineers, and we cooperatively decide who is doing what. But parts assigned to me are usually done without help. I'm not sure why that happens. No one minds when it gets done correctly. Usually I work out the answer and then run a solution by whichever engineer has the best grasp of a problem to see if my solution has favor.

03oct08 where magic goes after

lisp specs

     I started a new lisp page for the toy language, specifying early, cursory specs for a Scheme subset named Aim. I was distracted the last few days. This weekend I should be able to fill in that page somewhat more. I need to finish a review of all the SRFI's while picking candidates for inclusion in Aim. And I need to flesh out lexical syntax details.

01oct08 hill views

sample code

     Here's a short intro to code appearing in demos from the perspective of sample code even though all the C++ code here is written in a fairly odd style (as described under thorn by names and style pages).

     Most of the code you find here is actually production quality C++, except for one very unusual practice: using comically terse class names. The extremely short names started as a private experiment to see how short it was possible to make C++ code names and still work.

     Almost none of the code published here uses an acceptable work style, because names are short and cryptic, and whitespace is cramped both horizontally and vertically. However, I have used clones of this code in production servers using more conventional naming and whitespace to match project style guidelines. (Correct code style to use in work environments is always what looks just like code written by everyone else.)

     Viewed as sample code, you can't tell a whole lot from code you see here, except that I know how to do pointer arithmetic and memory management.

     (And liberal use of character dialogs with occasional profane language shows I expect very little professional exposure to this material — this site really isn't for professional purposes. The main purpose is really publishing old and new hobby code related to programming languages, without drawing too much undue attention. For example, you can't find any code in a form that can be downloaded, built, and executed, because if I did that folks would write and ask questions. Currently there's no social purpose to this site.)

     Many of the linked pages are just stubs for future writing, so the todo page should be consulted before clicking on interesting looking links. So the thread page is still a stub, while the mutex page has interesting stuff to read — if the short names don't freak you out.

     More recent code, like the writer which pretty prints Lisp code trees, shows some high level application logic specific to getting some task done. But most of the code is a library of C++ utilities, starting with the extremely primitive pointer-plus-length iovec utility code in the run, which I tend to use heavily (with better names) in most real C++ projects.

     Most of the code is what is called "utilities" in most projects, and clones of many of these classes end up in some utils source code directory shared by all other project libraries because the utilities don't depend on anything else.

     If you're looking for code related for specific topics, you might examine the following demo pages:

  • run - using only pointer and length
  • buf - iovec runs plus max capacity
  • node - refcounting and auto pointers
  • in and out - simple primitive streams
  • heap and pile - simple and exotic heaps
  • stat - variance for app statistics
  • crc - hashing with zlib's crc32()
  • list - simple circular doubly linked lists
  • map - exotic hashmap for pretty printer
  • iter - hand rolled iterators mocking STL

page re-org

     As I mentioned I would several days ago, I started moving exploratory fiction to a separate page so just-for-fun content isn't mixed with language project material.

29sep08 infinite clown-fights

aliasing names

     My next toy language page is still too messy to post even with a caveat about work in progress. Some proposed names I added last weekend need to be removed. In addition to standard names in sch namespace for Scheme, I planned new names in an aim namespace for Aim (typically abbreviations of Scheme symbols).

     Now, the bad thing about multiple names meaning the same thing should be obvious: to read code using a set of names you don't know, you'd need to learn new names, which is somewhat wasteful of your mental capacity.

     That in particular doesn't stop me, because I'd just as soon write Aim code using method names I prefer instead of those standard in Scheme. But then I started thinking about Smalltalk style names in Gab (a mini-Smalltalk clone). Method names in Gab will use Smalltalk style conventions; for example, a method of two arguments like at:put: will be named by a keyword style selector: a sequence of colon-terminated identifiers.

     I realized I was going to use Gab style method names in Aim's Lisp syntax, and these won't have much to do with Scheme abbreviations. So I was actually considering three different sets of equivalent names, which is silly. Two sets of methods names is quite enough in a spec. (I can have more, but there's no point in publishing them now.)

     This means the Aim spec should first list only Scheme methods to appear in a sch namespace. Then as I figure out Gab method names, the Aim version of those should be added to specs for Lisp syntax, say in a gab namespace.

     In other words, I looked ahead too much. I should just write about lexical conventions and post a draft.

memory contracts » (stories)

27sep08 bewitched minds

cheerful

     This last week I've been in a very good mood for some reason. I'm not naturally a cheerful person because I fixate on things that need attention to grow, so I focus in a way hard to tell apart from worry.

scheme stuff

     I've almost gotten my mind back on top-down Scheme related specs, but I haven't put more than two or three hours in it the last three days. (For example, today I took care of things I've been putting off, like changing my car's oil; when the idiot light signals, you're late already).

     What I post first on specs for user-visible features will be loose and ambiguous for some time until I sharpen detail and prune false trails and low yield trivia.

     As I said earlier, the subset of early Scheme included will be called Aim — since aim is one of the meanings of scheme — to just go ahead and acknowledge basic lack of compatibility with real Scheme implementations. One namespace will strive for future Scheme compatibility, but other namespaces will cheerfully vary a lot in names while still aiming for semantics in common with Scheme.

     The page coming next will basically say your knowledge of Scheme is assumed to come from somewhere else. Docs here at first won't do much but enumerate what is to appear, in what order and priority, with new names in different namespaces, with start of cross references with Gab's Smalltalk style api.

continuations

     One of the reasons I want a Scheme-ish system is for full continuations using spaghetti stacks. But I'm having trouble remembering the definition of call/cc, which I always had trouble mapping on to my mental image of continuations as explicit return addresses. So while I can recall how to trampoline C stacks in a VM, I can't recall the relation of the spec for call/cc to an implementation of continuations, and docs for call/cc I've read in the past always had a weird phenomenological definition in terms of how language authors wanted programmers to think in a manner related to lambda calculus, which doesn't sit well with my pragmatic imperative perspective minimizing functional views.

     Anyway, I'm not looking forward to reading about call/cc again until I see both that view and the one I understand both at the same time. If possible, Aim will introduce some other api in a non-Scheme namespace supporting continuations in some other way seeming less weird and more direct to me.

     Update: It only took about an hour of reading online to refresh my grasp of call/cc does, so I was a little too pessimistic about coming up to speed again.

sleep is good

     Sleep strongly affects both my creativity and intelligence, and I almost always try to sleep as much as I can. But this is usually eight hours at most. I tend to wake after seven hours — sometimes after six and a half. I don't think I've slept as long as nine hours in over a year, but I sure try on weekends because it pays back several times over in how cleverly my mind works. I'd pay to get one more hour. Nine hours is like getting free cable in terms of what my mind does without prodding: way better than a sitcom.

     Whether I succeed in falling back to sleep depends on my level of anxiety, and whether some train of thought takes off by itself. After that, I can't turn it off again. Once in a while I manage to drowse another hour while seeming to think for only ten minutes, in a weird form of half a sleep that becomes obvious when I really snap awake.

     Getting too little sleep — say around six hours — is useful when it's necessary to do something ungodly boring at work, since then imagination doesn't distract me as much. It's almost impossible to do a soul grinding task after a full night's sleep, but it's not hard when blunted by fatigue.

24sep08 voodoo engineering

style sheets

     Tonight I'm experimenting with style sheets for Scheme feature docs. I wish I recalled where I saw some really nice code syntax coloring styles. What I have will do. But later I'm apt to tweak it for legibility.

23sep08 lost in translation

heartbeat

     I'm still on a mini-vacation, not working evenings this week so far, though I guess I'm half drafting some ideas for specs as a background idea.

menu » (stories)

Sep 2008 This way to Sep 2008 entries.

Aug 2008 This way to Aug 2008 entries.

Jul 2008 This way to Jul 2008 entries.

Jun 2008 This way to Jun 2008 entries.

May 2008 This way to May 2008 entries.

April 2008 This way to April 2008 entries.

March 2008 This way to March 2008 entries.

February 2008 This way to Feb 2008 entries.

January 2008 This way to Jan 2008 entries.

December 2007 This way to Dec 2007 entries.

November 2007 This way to Nov 2007 entries.

October 2007 This way to Oct 2007 entries.