Þ   briarpig  » log  » jun09


June 2009 This way to June 2009 entries.

26jun09 consolation prizes

edges

     One of the things I do in my day job is closely related to edge detection: finding boundaries in content so components can be identified. When working in dedup systems, what you really want is fewer edges, because then fewer components are identified and less work is necessary.

     In particular, if content changes, you want to find the same old edges if possible. When you perturb old content by inserting or deleting new content, it would be nice to find the same old edges in content that hasn't changed. This is hard to do since fast algorithms to find edges in linear content are slightly sensitive to content offsets. What you want is offset-independent edge detection.

     Last winter I found a really good way to do this; my search for an ideal algorithm turned up a scheme of edge detection that re-synchronizes on the old stream of edges after a content perturbation. As a main effect, this reduces indexing costs.

     But I'm not going to tell you how it works. I'm also not quite going to define what I'm talking about either. Half the problem is specifying the problem very clearly, so you can write code that carefully measures whether you hit the target.

games

     I beat inFamous twice, once with good karma and once with evil karma. It's a lot of fun. I spent more time playing this game than any I've played before. (Most video games don't seem very interesting to me.) Something interesting happened to the way my eyes scan in dreams; the effect was really intense a couple weeks ago when I was ill.

     In rapid eye movement dreams, and in half awake transitions into and out of sleep, my eyes jerked back and forth horizontally much more than usual in aggressive environment scanning. The effect was mildly disorienting, like watching a Michael Bays movie with whipping camera movement. Anyway, it's odd to see my brain try out very different seeing strategies at my age.

vacation

     I think I already mentioned I'm on vacation. By that I mean I'm not working much on hobby project code this summer. And this month I've taken paid time off each Friday for three day weekends. But I've been trying to frame how I want to resume hobby code. So, what am I planning?

     As many folks have noted before, complex systems that work invariably derive from simpler versions. (Yes, yes, I cited a platitude to speed things up.) So I want to simplify what I've been doing before making it more complex again. To do this, I plan to remove some choices in infrastructure, so fewer degrees of freedom distract less.

     In part this stems from my dislike of writing systems in C. I prefer to use C++, because slightly higher level is helpful when doing hairy systems with complex memory management issues.

     After finishing some edge detection upgrades in old systems at work, I returned to the simulation of new designs still in progress. Trying to re-find my mental context was hard, and I was more than a little aware how complex my simulation's first draft had progressed to date. It was obvious I had made something more complex than necessary. But it took me a while to clarify my mistake.

     Half my mistake was using an api specified in C, with very few organizational guarantees. A C api was strongly desired by folks who want a system I'm prototyping. So I catered to wishes by first drafting a C api, but this made things harder than necessary, and now everything has too many moving parts.

     What should I have done instead? First, I should have started with a api in plain English, with a strong mathematical flavor: a very high level spec. And then I should have done a C++ api with as high level an api as I could specify while still matching the api with a math focus. Then after getting a prototype to work, a C api could be exposed, making the end result more complex—but only after I reached a simpler working version.

     While thinking about how I should have approached this task at work, I decided when I resume my programming language project—and the simulation runtime infrastructure—I should cater to as few nice end qualities as possible at first. Maybe my next update will include more detail. It's time to call it a day and wrap up this post.

19jun09 twilight visibility

math teachers

     Sometimes I puzzle over how my sons are not as good at math as I was. Now and then I suppose the problem was my ex, who wasn't very bright. (Yes, in retrospect I wish I had held out for a woman who was either bright or good-looking—or good natured for that matter; I sold myself very cheaply.) It's tempting to pat myself on the back and tell myself I was a prodigy, but I know teachers make a difference.

     I think I was lucky to be gifted with good math teachers. They were available to teach math for the same reason I had rotten options after high school: there wasn't a lot else to do in Iowa back then. Today they'd all be working in high tech. For two years in high school I had Mr. Carroll, who was a very smart man—in fact, a math geek. He knew exactly what he was about at all times, and he studied you carefully as he spoke, to see how closely you were following his current line of explanation. He was as good as me at math.

     From what my sons report, none of their math teachers are any good. The best of them are droning robots, who parrot without concern for ideas. The worst don't even explain subject matter. My younger son is in advanced math (he placed first among several schools on a test last year) and his teacher sounds like my 7th grade math teacher. What was wrong with her?

     My 7th grade teacher gave the entire class an identical math test at the start of the year, and again at the end. All of our scores fell by the time the test was repeated at year's end. She harangued us for an hour, furious that we would make her look bad. Finally she looked near head-in-hands weeping at her desk.

     But after my 7th grade year, all my math teachers were men—very smart men, typically, who today you'd never catch in a classroom. In 8th grade we actually had a team of male teachers who taught math like it was a show choreagraphed to entertain. In my freshman high school year I had Mr. Rash for algebra, who was young, dynamic, and funny like Robin Williams, down to incredibly furry forearms. (He predicted I'd become a nuclear physicist; it was the only thing he felt was a good use of math.)

     What? Did I imply men might be better at teaching math? It wouldn't be politically correct to say that, now would it. What I really mean to say is this: math should be taught by folks who rank in the top several percent of their class in math talent. (If this is men now, that's not my fault.) But I suspect it's untrue today.

     This last year I helped my younger son with math homework every week for a few months, and I often expained how things worked, when his teacher had said nothing at all. She didn't cover examples in class; how can kids learn math from nothing but a book?

grandfather

     I guess my talent in math comes from my mother's father. It's funny, but as I get older, I can look back and read adults now in hindsight who, as a child, I watched without seeing anything but an adult. My maternal grandfather was an interesting guy, and his sons where joke-cracking live wires.

     His nickname was Woody. Everyone made a point of noticing how I took after him. At the time I thought this meant I looked like him, which I did. But maybe it was more than that, since by grade school my parents expected me to solve puzzles easily.

     Woody worked in the local meat packing plant. (This was Iowa, remember?) I'm not sure of his job title, but he made things. When we visited my grand parents, Woody was usually in his garage workshop, tinkering away at something. One year he built his own vacation home trailer, from scratch. He was a tinker, and had perennial hobby project syndrome.

     His eyes were always far away, thinking about something. And when he talked to others, it was only incidentally related to what he thought about, as if he didn't expect anyone to follow what he was working on; or maybe he learned others grew bored easily. He seemed to drink a lot when he was older, trying to drown out something that hurt.

     I thought he was charming when I was a kid. But now I can see he was profoundly stir crazy in an environment where ambition had nowhere to go, and now it seems sad more than anything else.

15jun09 empty coffers

fingerprints

     Why do we have fingerprints? Before you test any pet theory, maybe you should think of more plausible hypotheses. I rubbed my thumb and index fingers together while thinking about my sense of touch, and had an idea.

     What if fingerprints provide a scale reference? A lot of perception in cognitive psych has a relative quality (we perceive things relative to other things in context, not in absolutes). When I rub thumb and index finger together, I can feel texture of my fingerprints. This means I might use that sensation as a baseline for size when touching other small objects. My fingers are not callused, but I imagine fingerprints as scale reference would help even more when fingers are thickly callused.

     Any more hypotheses? Sure, here's another which need not be independent of the last one. (I mean fingerprints could help with both, not just one or the other.) What if fingerprints help discretize tactile sensations? Let me explain that in terms of a well-known perceptual trick.

     If you touch a part of your body with low touch resolution—say the back of your arm—at two different points but at the same time, say half an inch apart, often one cannot tell this apart from a single point of contact. Ask someone to close their eyes, then see if they can tell whether you touch them with one finger or two fingers separated by a small gap. (With hairy arms like mine, you might need to touch a place free of hair—hair may increase distance resolution.) Often two points close together seem a single point of contact.

     If we had no fingerprints, contact with a surface would be more continuous, perhaps making it harder for our brains to map surface contacts. Perhaps otherwise continuous touch becomes instead a net of discrete contact points as a result of fingerprints. But how would you test that?

     Of those two notions I'd expect scale calibration to matter more. If you grab a tree limb and a sharp point gouges your hand or finger, you'd want to know how big it was before you really put your weight on it. You'd need to know in far less than a second, at least subconsciously.

     Or, how about this one: Have you ever picked up a rock and thrown it overhand? You're not looking at the rock while throwing it, and yet you have a good idea of it's orientation relative to you by touch. All the extra data points you get in touch detail might increase your proprioceptive sense. What if the rock was not round? What if it was a spear?

     Have you ever thrown darts? Typically darts have rough bands of highly textured grips, presumably to avoid slips. But texture of a dart's grip tells you a lot about the exact spatial orientation of a dart's axis when the dart is out of sight behind your head.

     Anyway, I'm not suggesting any specific tool is old enough to have mattered in context of evolution. I'm just illustrating an idea that might apply even in less clear examples. Maybe anything increasing total data points inbound from touch gives you proprioception advantage.

     The folklore on safecrackers says sanded fingertips may increase sensitivity. Is this a case of dampening perception of irrelevant surface texture when perceiving lock tumbler motion is what you care about?

Entries appear in reverse chronological order. Content here is permanent: Each entry has a permalink () to the long-lived persistent copy here. Clearly, to link anything, you'd best link the permanent copy.

12jun09 pulling weeds

dance card

     I wrote and deleted a couple late night pieces last week about what I'm doing, because description wasn't as interesting as the tasks themselves.

     More fiction is in the pipe, but I'm still composing. Fairly soon I'll have to start dumping some of it before it gets too complex to render. So far I plan to resume exactly where I left off.

     But the last two weeks I spent far more time designing something in code: a new runtime for async execution, with a model I find engaging. It's a lot more interesting than I'm making it sound; however, writing about a new thing with some parts still fuzzy is hard to do with useful clarity.

     I tend to think about it when I go to sleep and when I wake up again, and any time I lay down to take a short nap. In other words, it's my default task, and it's easier to do when I can visualize more fully in half sleep, or with closed eyes.

     You might think this would be easier when staring at code or at diagrams drawn on paper or on a whiteboard. But such concrete details distract when they represent conventions at odds with what you want. So for example, if I write in C or C++ notation, it tempts me to adopt a view expecting a C runtime model to be normal, when in fact the C runtime is incidental and merely required in the simulation of what I want, since I'm not going to write in assembler.

     The next section below lightly covers the organization, using terms borrowed from mathematical thought without any particular attempt to follow through as math. The idea is to invoke relevant metaphors to supply meaning. Note I tend to pick loosely geometrical models because that's how I think; I need terms meaningful when viewed as spatial roles.

     (Note I only describe this at all because you might find it interesting. I don't have any need for someone to like or understand this design, and I don't plan on getting anyone else to use it too. In fact, I considered never writing about it, just so folks can't bother me too much.)

spaghetti stacks

     If I wrote a dozen different motifs as mini spec descriptions, you could combine them all to get the spec I have so far. One motif is trampolined style, and another is explicit continuation-passing style, also known as CPS. What I'm planning to do is write code in C (actually C++) with explicit continuations. If you have any idea what I'm talking about, you know I'm suggesting something very hairy; I'm trying to plan a way to make it less hairy.

     The trampoline style in C notation will make it look seriously disturbing to folks who find pointers challenging, because function calls are done as returns to the trampoline. Stackless discipline requires thinking about where memory lives, which explains what I've been visualizing.

     I ask myself a large number of questions and work out the answers. For example, my current representation of a continuation is something like an object with both code and data, which many folks also recognize as a closure. In my mind's eye I see this closure as 4-tuple (f, g, n, p) which I actually express in notation fgnp standing for function, generation, node, and pointer. Of these four, f and p are simplest: just a C function pointer and a void* address. Node n is an optional refcounted something that keeps the pointer alive. And generation g validates and specializes f and/or p, so the meaning of both code and data can be spun on an individual closure basis.

     Think of a single fgnp instance is a point in space. If you like vector spaces, then fgnp is an arrow from an origin to the point in space where the continuation can be resumed. The np part of fgnp is a closure context; if you like Smalltalk runtimes, then np might be seen as a reference to a block context and a point inside.

     You might be asking: What is the call signature of function f? Yes, that question has a big tree of options to search. Below is what I had last night. (Note I made up these names on the spot, so take them lightly.)

typedef int (*f_t)(u_t* u, sad_t* frame);

     Here f_t is the type of f and u_t is the type of some universe u object representing the runtime to avoid using globals to represent the runtime (allowing multiple disjoint runtime instances in one C address space). The interesting argument type here is sad_t, representing the stack frame of a call represented as triple (s, a, d) where sad stands for source, argument (or arc, or arrow) and destination.

     Source s and destination d are both fgnp closures, representing a call from s to d, meaning s is the return address continuation and d is the callee. Thus the f_t function called above is f inside d's fgnp.

     The arc argument a might as well be another fgnp instance, even though the f part is not strictly needed. (But if we did not use fgnp for a, and later you wanted to pass a first class function as the sole argument, you'd have to embed fgnp in the frame that np indicates, and this forced case seems ugly.) So let's say each sad instance is a triple of fgnp instances.

     For purposes of visualization, I see each sad triple as a triangle with one point at origin u for the universe, and the other two points at s and d, representing points at the end of vectors from origin u to where closures for s and d are located. The side of the triangle between s and d is arc a representing the argument passed from s to d. (Note when you need N args, then a is obviously an N-tuple.)

     It's tempting to think of arc a as the unique vector from s to d, but since we can pass many different arguments to d, and not just one, this idea is wrong unless you allow N possible vectors with the same ends (a non-euclidean metaphor where many lines can pass through two points).

     The integer value return by calling each f tells a trampoline dispatcher what to do next with the returned sad arg—an in-out parameter—which is modified by f to tell a dispatcher whether the meaning is return, call, or tail-call, etc. If the local stack of f needs the original sad kept alive, it should go into some space allocated and presumably protected by the new context of the source s returned, representing a return continuation closure.

     Say you're in function A and you want to call function B and then return at some point in the middle of A. This means A is equal to sad.d.f when A is called, but in order to call B the new d must be a fgnp instance denoting B and it's stack frame, and the new s must be a fgnp instance denoting A's continuation. By returning to the dispatcher with a call opcode, A indirectly calls B by having the dispatcher do it. This means we effect the equivalent of B(a) by returning from A; this is apt to confuse folks reading code.

don't do that

     You can use the scheme above to implement stackless runtimes to do whatever you want. But with many more options than usual, your ability to create chaos is worse, not better. For example, the same way you can exhaust memory resources in one stack by infinite recursion, you can just as easily exhaust memory using N trees of spaghetti stacks. The rule is always simply: don't do that. Make sure you don't exhaust resources, even though rules of use don't automatically protect you.

     I could tell you how I plan to organize stack frame discipline to fix some of the risk inherent in totally dynamic typing of stack frames and arguments—magic numbers and lots of invariant assertions—but that's just busy detail and doesn't explain the meaning.

     Just the same, hanging yourself (or shooting yourself in the foot if you like that metaphor better) ought to be very easy unless you add abstraction shielding raw detail so it's harder to fubar. I expect to add abstraction by building an interpreted language using this runtime, then using language primitives when possible. But since I also plan to write libraries in C++ using raw CPS trampolines directly, I also need a few abstractions at that level, too, to protect myself from mindless errors quite common in the absence of strong static type checking in compilers.

knitting

     I came up with a visual metaphor from seeing each sad as a triangle in a stack frame space. You can hook them together like a chain-link fence, or like rows of stitches in knitting, in order to weave patterns of spaghetti stacks that can build up and later unravel at need. So you can view the state of async computations as a cloth composed of stitches in memory.

fgnp contracts

     Each function f needs a standard contract describing what actual usage is supported, since not everything you can write does something meaningful. So some of them are callbacks for async calls, others are useful in coroutine dispatching, and others might be ordinary single use function calls.

     Nothing about raw C or C++ typing will stop you from using a function or fgnp incorrectly, because dynamic typing will hide contract requirements at compile time—unless I think of a way to use macros or other conventions to add static typing so correct contract usage is enforced by compiler.

starting over

     I reached this design accidentally while planning to rewrite my thorn library with a consistent non-blocking api. After stubbing my toe repeatedly on C stack winding, I decided I should just go stackless. I had already worked out this general sort of runtime in the middle 90's when I was planning a spaghetti stack version of Lathe (which I called Mithril then). But that was when I had a programming language focus.

     Now my focus is "I want to write async simulations" for driving distributed, async, peer-to-peer software stacks. So it's funny to come up with the same sort of answer—except it might be a hammer I try to apply to any nail I see (making me see everything as a nail, of course).

     What's the difference? Now I'm just looking for the simplest thing that will work for writing simulations, as long as the basic idea doesn't cause a lot of trouble later. Value in efficient language execution isn't very important, since memory bandwidth is usually the limiting factor in contexts I write code for these days.

     There you go: that's the intro to this particular nascent runtime. I might not write about it again, unless I run across something fun as a surprise.

03jun09 stubborn lotus eaters

self flattery

     Yes, many folks define their own level of natural ability as the ideal, and they really believe it down deep—it's not rationalization—so self definition stops perception of unflattering evidence.

     However, this is just a subcase of a larger delusional pattern; it's not limited or even focused on self definition. Folks tend to avoid contradiction of all kinds, as if all negative feedback causes cognitive dissonance. You're a rare person if clarity trumps the pain of finding errors in plan or concept.

     When I was a teenager I was often surprised to hear folks emphasize the role of negative evidence in science: that one must seek out data undermining a theory. My reaction was always: of course. How else would you tell whether you were on track?

     I only really understood the issue in the last few years. Most people avoid negative evidence with a passion, and hang the consequences of being wrong. (You can just deny being wrong later, expecially if you prefer self delusion.)

     Where I work now I had a very rational coworker for a couple years, who often puzzled over why no one noticed a course of action was dubious. Folks often took care to avoid getting exposed to negative evidence. I kept telling him that's how most people are: they don't want to find out they're wrong, and they'll go to some trouble to stay ignorant.

     In short, most people are anti-rational. They don't want to see the world varies from wishes, and informing them makes you unpopular. A typical person can be so successful at willful blindness they are actually unaware they cultivate blinders.

02jun09 dreamy misfits

seeming smart

     In writing I seem nearly as smart as I am, because I learned how not to sound too dumb in writing. It just takes practice. However, since I never say more than a fraction of what I can, I never convey as much as I know; some views of intelligence include breadth of knowledge, oddly enough, and many folks assume you ran dry when you stop speaking. Go figure.

     Verbally, when talking, I seem much less smart than I am because I have neither gift of gab nor charisma. Also my manner is slightly diffident, because this is my idea of polite behavior, due to where I was raised. But by nature I'm not diffident at all. Most folks erroneously see diffidence as either uncertainty or disguised ineptness.

     In person I come off about two standard deviations below my actual ability. In writing, about one standard deviation. I use very short words—when I can—because one syllable words are clear and fast to say (helping folks who dialog internally when reading). But my vocabulary is huge; I just don't use it, since it often impedes clarity.

     I seem smarter in writing than in person because I'm visual, and writing is visual to me, even though words are involved. Consequently, at work I write great tech memos, but present badly in meetings, which often become attack festivals because it's feasible and makes an attacker seem clever. (Something in my manner must beg for it, like wearing a sign reading, "Kick me," because folks respond.)

single threading

     No, that's not how to best serialize access to single-threaded system calls, like those altering shared global (static) memory. You can explicitly use posix thread api to serialize—that works—but then you've added a dependency on pthreads. What do I recommend?

     For improved elegance, just use one thread only with non-thread-safe api. Have one particular thread access a method that isn't thread-safe. Yes, this implies the way you communicate with that thread is through a thread-safe queue of some kind. (I've seen really baroque communication paths, though, such as queueing one way, then sending events in return a completely different way.)

     However, you only know a single thread does that in your global perspective, where you see everything. Local code should not know another specific thread exists, since api is just a queue. The right api puts requests in a queue, then receives replies later. When a thread is involved, it's no one's business except code consuming the queue.

     It's an open question how many threads you need to handle calls that must be serialized. You can use one, but then that one thread is a bottleneck serializing every single threaded call, so each prevents any other until the queue reaches a given request. But, using N threads for N different calls needing serialized access may be too many threads. So compromise; I'm not even going to outline how to pick sensible tradeoffs: anything reasonable works.

     A code api should see message queues that are thread-safe, if you use threads, but remain agnostic about whether to use threads, or how many. I'd use one queue per seralized call, and some queues might be the same queue when one thread handles all those queues.

     (If some calls use others in the same public api, you need to know this and ensure all get processed by the same thread. Good luck if no one figured this was an important piece of semantics to document.)