Andy Matuschak

Doing-centric explanatory mediums: board game instruction manuals and an unusual Figma document

Publicly accessible version of this post: https://andymatuschak.org/doing-centric/ 

It’s board game night in a post-COVID world. You and a few friends gather around a table to try out a new game. The thing is: there are a lot of these little cardboard tokens and playing cards and—I’m not even sure… totems?

You pick up the instructions and begin to read aloud to the group, but after a few minutes, everyone becomes restless. So you figure you’ll just start and figure it out as you go. Now the first turn is taking forever because you’re returning to the instructions every ten seconds, but you keep losing your place and needing to reacquire it each time; and after the second turn’s over, you realize you need to unwind the first turn because you made an invalid move which will screw up the game; and you’re constantly straining to remember what you’re “supposed to do”; and it’s hard to emotionally commit to the game when you know you’ll only be able to play a few moments before returning to the manual.

Maybe let’s just watch a movie instead?

I’ve written plenty here and elsewhere about one problem with books: that we tend to rapidly forget all but the gist. So here’s another problem with books, and specifically with books meant to help you learn a skill: the medium makes it difficult to collapse the distance between prose and action. Books rarely involve doing what they’re about. Most books, even books intended for skill-building, are only about what they’re about. Reading about the board game, not playing the board game. Reading about counterpoint, not composing counterpoint.

So what? I’ll elide formal learning science here in favor of an appeal to experience.

If you ask people about the highest-growth periods of their life, you’ll notice that the most enabling environments tend to involve doing. A summer spent preparing intensely with your team for an upcoming competition; a startup which failed but taught many valuable lessons; a challenge accepted to write a new song every day for a month; a multiweek meditation retreat; an overwhelming apprenticeship; etc. Books are sometimes a source of knowledge in these stories, but they’re often secondary to a great mentor, teammates, contextual motives, etc—and crucially, to doing.

The analogue here for our board game problem is familiar: consider playing a board game for the first time, but with someone who’s played before. This is a completely different experience from the “cold start” I illustrated earlier. In this situation, the experienced player might give you a brief introduction—nothing which would strain your memory—and then you’d simply begin playing. They’d handle setting up the board, either by themselves or by telling others to shuffle these cards, distribute those tokens, etc. They might narrate: “I’ll go first, to demonstrate. So I start by drawing two cards, then I can choose to either move here or play this card. I’ll move here, which will block John from moving into this open area. Now you’re up. Your goal is to XYZ, and you might start by moving this way. Now if you drew an action card, you can play that immediately if you want; otherwise, read what it says on the card…” As events occur throughout the game, they might continue, narrating what you need to know, just-in-time, explaining the options you might consider, offering feedback. So long as your experienced friend is graceful enough to avoid veering into overbearing Clippy territory, this is a much more pleasurable—and effective—way to learn to play the game.

This might be a consistently great way to pick up a board game, but skill-building books have many practical advantages. Consider information density. If you want to learn to program a quantum computer, there’s a huge amount of material you need to absorb before you can “play” much yourself. This might involve tens of hours of explanation from your experienced companion, which would quickly become burdensome for most people. Explanatory prose might lack personalization and interpersonal connection, but it can be more carefully honed; it does not tire and is ready whenever you are; it can embed figures and abstract notation; it can be consumed non-linearly; it can be read more quickly than speech can be heard; and so on. Perhaps most importantly, it’s a mass medium. The deepest experts and sharpest communicators in the world can craft a book on this topic, and millions of people can hold it in their hands for effectively zero marginal cost.

So: how might we create a mass medium which possesses the book’s advantages, but which is situated in doing? How might we create an explanatory mass medium which feels more like playing a board game with an experienced friend than playing a board game while juggling its instruction manual?

The role of the dynamic medium

One reason it’s hard to create books which are situated in doing: the book is static, fixed. You, the reader, must ferry its words to an environment where you can “do”. And even then, there’s little opportunity for interplay between the doing and the words on the page. Authors can advise how to reflect on an exercise and generate your own feedback, but these are scripts you must execute in your own mind. Video doesn’t appreciably change this situation. But the promise of computers, and the dynamic mediums they enable, is representations which behave and respond.[1]

People often propose we leverage this property to integrate simulated environments for doing. Maybe the biology textbook can embed a little simulated petri dish, which you can use to “do” certain kinds of cell biology, reducing the distance between text and action.

But at least where it’s possible, I’m much more excited about what happens when a computational environment becomes the authentic—not simulated, not “educational”—environment for doing. A non-linear video editing interface isn’t a “toy” way to edit a film; it’s how professional filmmakers actually edit films. Mathematica isn’t a “toy” way to manipulate symbolic expressions; it’s how certain kinds of mathematical work are authentically best done. And so a dynamic “book” about video editing need not involve a “toy” environment for doing, a simulated petri dish. Rather, it can situate itself in the same sort of environment used to edit the best films on Earth.

But what does it mean for an explanatory medium to “situate itself” in an authentic environment like this? How might the explanatory content interact with the contents of the environment?

In the last decade, authors and programmers have created dozens of interactive articles which might inform our answer (see Communicating with Interactive Articles for a good overview). I find the work in this space very inspiring, personally speaking. But I don’t know of any which quite matches the aspirations we’ve discussed so far. These articles may be interactive—may involve some doing—but the doing is situated in little purpose-built sandboxes, not the actual environments you would use to deploy the skill being built. They’re integrated with simulated petri dishes, not an actual lab bench.

An article on a topic in programming, for example, might be structured as a long text interspersed with small interactive code editors you can use to explore a concept. This is certainly an improvement on the typical paper! But relative to our aspirations, “doing” is very much the secondary activity here. These editors don’t much resemble an actual programming environment; you’d have to jump through hoops to take anything you’d done and apply it to a real program. This is a bit like reading a board game instruction manual with interactive pictures depicting simplified parts of the board game. Or, somewhat more unfairly: it’s a bit like an elaborate pop-up book. You’re still not really doing the thing.

If these interactive articles are often structured as wide seas of prose which contain islands of interactivity, one path to integrating authentic environments might be to invert this structure. How might we create “articles” which are primarily interactive environments, but with embedded islands of prose? Taking programming as an example: rather than a textual document with embedded source listings, could we move the whole experience into your IDE of choice, while somehow still presenting the explanatory text?[2] Could we move the YouTube lesson on 3D modeling with Blender into Blender?

Video games excel at this. Sometimes tutorials appear in non-interactive cutscenes, sharply delimited from ordinary play, but better examples (e.g. Portal) present instruction and narrative as a seamless element of the interactive environment, never “stealing” the camera or the controls away from the player. The result is rich immersion in the game environment—a stark contrast with board game instruction manuals.[4]

Video games also improve upon another problem of interactive articles: the challenge of separating prose and dynamic representation. These articles close the distance between text and interactive elements, for instance by linking a number in the prose to a parameter you can directly manipulate in a figure. But in most cases, they’re still physically separated, not visually integrated. The reader’s eye bounces back and forth between the text and the interactive elements, churning working memory to attach referents to objects. This isn’t just a problem for dynamic elements. Static figures in traditional texts have the same problems. But the solutions described in Edward Tufte’s books are rarely applied in the dynamic domain, perhaps because the authoring tools are more complex and isolated. Almost everyone, almost always, is still “separating by mode of production”.

Video games take advantage of audio to layer instruction onto what the player’s seeing, but even when using only text, they can position that text right next to the relevant part of the action. This arrangement allows games to avoid disruptive ping-ponging between instruction and interaction, as we experience with board game manuals. And because the narrative communication is integrated into a dynamic environment, it can behave and respond just like the rest of the environment’s elements. In good games, the authored narrative feels like a continuous response to players’ actions. This collapses the sense of distance one feels when reading a text separated from the “doing” it’s about.

Enter the Figma document

Oddly enough, the proximal cause of this post was an extremely unusual Figma document.

(If you’re not familiar: Figma is, roughly, a collaborative tool for designing the visual representations of software interfaces.)

I’ve been assembling notes on this article’s topic over the past two years. At some point, I’d like to build some prototypes around these ideas and publish a much deeper treatment. I don’t feel I’ve gathered strong enough ideas for that yet. But you’re reading this now because a Figma document demanded that I write a preliminary “IOU” of sorts.

Figma changed the way copy and paste works in their interface. They made a document to explain the change to their users. I know this sounds awfully mundane. Stay with me. I encourage you to take a look yourself before continuing: click the “Duplicate” button to get started, then zoom in on the upper-left frame. You can explore the document with a free account from your web browser.

The document initially reads like a hypertext slide deck. It shows visually how the new clipboard functions behave. It’s cute that it’s a document about using Figma, both created and consumed in Figma, but that’s not so special. Interleaved with explanatory preamble, the document turns control over to you.

This is where we break down the wall between authored material and authentic “doing.” You’re invited to manipulate the objects which the author has created. The objects aren’t special; they’re the same “kind” of objects which you could create yourself elsewhere. You can copy and paste them into a new document. And when you manipulate these objects, you use the same tools the author used to create the document. More importantly: you use the same tools which you yourself would use to do authentic work in this space.

This could have been a blog post, with little interactive “demo” areas interspersed between paragraphs. But instead, you’re interacting with this document in the full-blown Figma environment. In the design world, this is a lab bench, not a simulated petri dish. Other than the scaffolding text, there’s no distance between the “doing” here and the “doing” in your own creative work. Reading this document, I found myself curious how the paste behavior would work if the groups were structured differently, so I simply used the tools I already understood to set that up, and answered the question for myself. Then later that day, I found myself—without even really thinking about it—using one of the new paste behaviors in a layout I was designing.

It’s important to recognize that there’s a lot of text in this document. That’s part of what makes it so interesting. “Worksheets” aren’t so unusual—there are lots of Figma documents like this which give you an exercise and some context to play in.


By comparison, the copy/paste Figma document is unusual because it’s roughly a thousand words long. There’s a lot of explanatory material here, and there could be even more. The possibility of expansive, in-depth documents opens the door to canonical works in the medium. Separately, the interaction between the authored material and the activities is much finer-grained in the copy/paste document. This creates shorter, more precise feedback loops, closer to the experience of a video game tutorial or to playing a board game with an experienced friend.

This is a Figma document about using Figma. That’s useful, insofar as an elaborate instruction manual can be. But it’s not a stretch to imagine a much more significant variation: a “textbook” about interface design, written in Figma[3]. In this primer, you wouldn’t just be reading about how to design. You’d actually be doing design, embedded within the environment you’d use as a professional. Because there would be no artificial boundaries between explanation and action, I believe such a book could come much closer to the feeling one gets playing a well-designed video game tutorial.

Elaborations on the Figma document concept

Of course, the Figma “meta-document” medium can itself be pushed much further.

The explanatory text in the copy/paste document does not behave and respond. It’s not really a dynamic medium for authors. The fine-grained exchange between explanation and action makes it easier to help readers generate their own feedback, but the medium could go further by reacting to the reader’s actions. Earth, a Primer demonstrates one simple approach, “checking off” suggestions as readers complete them.

We can also imagine topic-specific computational elements. A chapter on color theory could use linked representations to visualize secondary and complementary colors in response to your choice of “main” colors. A chapter on grid systems could help you visualize how different choices of type hierarchy ratios influence the baseline rhythms in your design. A chapter on accessibility might embed contrast ratio meters into your design canvas. Scaffolding elements can fade according to your comfort level with the material. And so on.
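To make the contrast-meter idea concrete, here is a minimal sketch of the calculation such a meter might run. It assumes the meter follows the WCAG 2.x definition of contrast ratio; the essay doesn’t specify one. The function names and the 4.5:1 check (WCAG’s AA level for normal-size text) are illustrative choices, not part of any Figma API.

```python
from __future__ import annotations


def linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Weighted sum of the linearized red, green, and blue channels."""
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """Ratio of lighter to darker luminance (each offset by 0.05); order-independent."""
    lighter, darker = sorted(
        (relative_luminance(a), relative_luminance(b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa_for_body_text(a: tuple[int, int, int], b: tuple[int, int, int]) -> bool:
    """Assumed meter rule: WCAG AA asks for at least 4.5:1 for normal-size text."""
    return contrast_ratio(a, b) >= 4.5
```

A meter embedded in the canvas could recompute `contrast_ratio` whenever the reader changes a text or background fill, so the feedback arrives while they are designing rather than after reading about the rule.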

Another interesting direction—and one which fits well with Figma’s multiplayer primitives—would be to integrate opportunities for collaborative learning into the text. A standard “move” in collaborative learning is: introduce a problem which can be solved with several different approaches; juxtapose several students’ contrasting solutions and help students learn from each other’s ideas. A Figma design “textbook” could incorporate both solo and “shared” artboards, and potentially asynchronous orchestration features, to help students exchange ideas. The approach could even involve a facilitator and include tools like Desmos’s to separately support their work.

More broadly, the interactive explanatory medium I’m suggesting wouldn’t be limited to “educational” scenarios. It could also be quite useful in the course of working on some meaningful project. For instance, if we’re working on a design for a new operating system, and I invent some new control, I might present it to the team by creating not just a static Figma document, but a document which, through “doing,” helps you see how to use it in a design of your own. Such documents would also be useful as just-in-time professional reference: for example, something like this would come in handy if I’m designing for expanded-color-gamut displays for the first time. “Portals” in these documents might allow you to bring your own design projects “into” the explanatory document, so that you can understand the concept in the context of some real work you’re doing.

Extensions to other environments

Figma is certainly not the only environment which could support a format like this. Where else could we instantiate something similar? What qualities must a system have to enable this kind of document? I can outline at least a few.

Documents must offer some way for authors to communicate explanatory content, potentially at length. Figma has text elements; programming environments have comments. In other environments—like an audio production tool—we may have to add such affordances to the document model. Or maybe the explanatory content can be delivered through audio (or video) channels, though it would be important to ensure that readers could manipulate objects in the document without disrupting playback of the author’s material.

Authors must be able to establish relationships between passages of their content and corresponding elements in the reader’s interactive environment. In Figma, we can position text immediately above each element you’re meant to manipulate. In Finale (a music composition environment), text can be interleaved between musical staves or positioned above specific phrases. In a Roam-based medium, the outliner’s hierarchy can be used to relate author text and reader text.

Authors need ways to linearize and structure their explanation; readers need affordances for “navigation.” In Figma, artboards are arranged in a sequence which can be easily navigated by scrolling or through keyboard shortcuts. Hyperlinks and a “table of contents” artboard support navigation. And the hierarchical list of layers offers a persistent table of contents of sorts. In a programming environment, a sequence of tabs might approximate Figma’s artboards. Or perhaps, as in Natto’s tutorial, we can improve on these with special affordances.

In what other environments might we easily imagine creating such documents?

  • A HyperCard document about writing good choose-your-own-adventure games (surely people have already used HyperCard in this way?)
  • A primer on game development presented as an Unreal Engine document.
  • A workshop on harmonic analysis presented as an extremely elaborate Finale score.
  • A hands-on adaptation of How to Take Smart Notes as a Roam graph.

Here, I’ve limited myself to documents one could create in environments as they exist today. But rather than constraining ourselves to existing affordances, perhaps we’ll one day design systems like Figma so that they better support documents of this kind.

————————

Thanks to Molly Mielke for discussion on the background of these Figma documents. Thanks also to Michael Nielsen and Jonathan Blow for past discussions which have helped shape these ideas.

————————

[1] This framing comes from Bret Victor’s “Stop Drawing Dead Fish”. Here, immutable figures (whether printed or animated) are “dead fish.”

[2] What about computational notebooks like Mathematica and Jupyter? I don’t think these qualify; I don’t know of any strong examples which are “doing-centric”, where the code listings interspersed throughout the text are really about doing the topic in question. As Pavel Panchekha pointed out to me in discussion about his Web Browser Engineering book: this format seems focused on helping readers understand the code the author has written; it’s not so well suited to supporting the reader as they write the code themselves.

[3] Such a primer would be particularly valuable since, oddly, there is no “standard text” for user interface design.

[4] Of course, some of the very best examples (including Braid and The Witness) use no overt explanation at all. This is an extremely powerful approach I’ll have to discuss at another time.


Architectures for a more flexible mnemonic medium

I thought I’d try something different for this month’s update—sharing some rough in-progress design work. The tension here is that it takes a huge amount of work to legibly present a design process to others, especially when large swaths of it are unresolved. Good storytelling requires lots of renderings you wouldn’t otherwise have made. Designers who work for agencies or on very large teams at product companies are used to paying this heavy tax, but it doesn’t make sense for me working solo. Thankfully, you all have a great deal of context, so I hope you’ll bear with me and bring your imaginations. That said, though much of this discussion is conceptual, it will rely on actual pictures of interfaces, and so there’s no audio version of this post.

Last month I introduced a key design problem with the mnemonic medium: it puts the author in the driver’s seat, assuming you want to read—and memorize!—linearly and completely. This is appropriate in some situations, but in many contexts readers will (appropriately) want to drive, reading selectively and strategically.

I’ve been rethinking the primitives of the medium to make this possible. The goal is to avoid making several different “modes” of the medium, but rather to identify some elemental representations which can be recombined in different ways in different situations, by both author and reader. When you find the right abstractions, it feels a bit like carving “with the grain” of the universe—more like discovering something that already is than creating something new. VisiCalc cells have that energy; so do files and folders; so do Bézier curves and their corresponding pen tools; Photoshop layers; etc.

Reading scenarios and behavioral axes

Finding such a primitive usually means identifying some pattern common to many kinds of activities. So I’ve been analyzing a variety of reading situations, trying to perceive some shared structure in the differences.

Four scenarios have emerged as useful anchors. These aren’t collectively exhaustive, of course, but they span a large space and illustrate some of the problem’s texture:

  1. Reading a guided primer to gain one’s footing in an unfamiliar subject of significant personal interest. Quantum Country is a good example if your interests align.
  2. Extracting pearls from important publications relevant to a creative project or interest. For instance, if you’re a machine learning researcher, you probably studied the GPT-3 paper with careful interest.
  3. Consulting a reference, almanac, or handbook for information on a specific question. This would include Wikipedia and many articles on Our World in Data. But it might also include handbooks like “The Great CEO Within”, which is structured to be read piecemeal in response to your situation’s needs.
  4. Reading for edification. New Yorker articles; reflective blog posts; general-audience non-fiction like Sapiens or Thinking, Fast and Slow (assuming those topics aren’t a creative focus for you). These are usually read with a light touch, in large part for enjoyment.

Should the mnemonic medium target all these different situations? It’s not clear to me. The benefit is much more obvious for primers, for instance, than for edification reading. Practically speaking, it would be easiest to focus on “linchpin” texts with a large number of readers who have consistent interests. But one principle I’ve learned in design is that while the 80/20 principle is useful, it’s often quite valuable to design for a much larger scope than you intend to immediately implement. The latter approach can help you find more general primitives, primitives which you can “grow into”. 80/20-ing often produces architectures which will make future expansion difficult. Worse—once implemented, limited architectures sometimes constrain how you can think about pursuing future expansion. The trick, of course, is to figure out exactly where to place the bounds on your problem. The design problem becomes much more difficult as you expand the scope, and often much less connected to the pressures of some authentic situation in which you’ll be prototyping. Should I try to produce a design architecture which incorporates future machine-learning-based prompt generation interactions? Which integrates fully with the rest of one’s personal knowledge management system? For the moment, I choose to answer “no”, but I’ll try to handle a large range of different reading contexts. These are all scenarios for which I feel expert-authored spaced repetition prompts could be tremendously useful.

One important axis of variation in these scenarios is completeness. If you’re reading a guided primer or studying an important publication, you may want to remember all the key details. But if you’re consulting a reference like Wikipedia, you’re probably reading quite non-linearly, focusing only on a few sections of the text—and the memory system should behave accordingly. Likewise, you’re usually less interested in completeness when reading for edification: you may read the whole essay front to back, but you mostly just care about a few high-level “take-aways”. Note that these latter two situations differ in the nature of their incompleteness: reference readers will want a memory system which supports selectivity and non-linearity, while edification readers will want a memory system which supports coarser granularity and a lower level of detail.

Another important axis of variation is legibility of reader intent. A primer positions itself as an explicitly instructional resource, so authors can assume that readers really do want to learn the material. It’s reasonable for the text to pause intermittently and challenge you to remember key details. But in most of these other situations, embedded review activities can feel misaligned with reader intent. We hear this clearly, for instance, in reader feedback to David Chapman’s mnemonic essay, “Maps, the territory, and meta-rationality”. You’re reading for edification, and then—bam—an interface element appears demanding that you remember some fine detail of the argument. As one reader noted, it “feels like school”: the demand is out of proportion with your level of commitment. In many situations, embedded review will seem pompous or presumptuous, as if the author’s proclaiming that readers must hang on their every word. The trouble here is that some readers of Chapman’s essay approach it as if it were an explicitly instructional resource. For these readers, the embedded reviews are a boon, but the author can’t know ahead of time which reader is which. The medium must permit self-selection without feelings of imposition.

A final axis I’d like to introduce: readers’ propensity to write their own prompts. Say you’re reading an important paper related to a creative project. You may want to remember all the key details the author describes, but the most important prompts may be about that paper’s implications for your own ideas. Likewise, if you’re reading an essay about relationships, you may find yourself wanting to write a prompt or two connecting it to your own life. But when reading a reference you’re typically just grabbing what’s there; when reading a primer, you generally don’t know the topic well enough to draw many connections of your own.

So we’d like to identify primitives for the mnemonic medium which can be used in different ways—by both readers and authors—to address different positions on these axes.

Workflows and primitive desiderata

One way to think about this is to consider typical workflows for different positions:

  • when completeness is high, “add all prompts in section/page” is the common-case interaction… but those prompts may get refined iteratively in future review sessions
  • when completeness is low, the medium should offer lightweight paths for opting into prompts about personally relevant details…
    • … either through outlets for spontaneous impulses of interest, in the moment while reading,
    • … or by retrospectively deciding in bulk what details seemed important at a suitable stopping point
  • when readers legibly intend to study, it’s welcome for the medium to intermittently prompt them to recall what they’ve read
  • when reader intent is not legible, the medium must not alienate readers by pressuring them to review
  • when readers are likely to write their own prompts, I see two workflows of interest:
    • when a reader suddenly notices a connection they want to remember in the middle of a passage, they should be able to immediately capture it
    • in my own practice, I find it natural to reflect on “so what?” at the end of a piece (or, for longer pieces, at section breaks), and to write prompts on my observations
  • when readers are unlikely to make personal connections, affordances for prompt-writing should not impede the workflow for collecting authored prompts

This list of workflows suggests three significant presentation contexts for prompts, in addition to the existing review context:

  • bulk sets of prompts presented retrospectively at section breaks or the end of a piece
  • inline contextual interactions for responding to in-the-moment impulses while reading (“ooh, let’s make sure we remember that!”)
  • iterative refinement during/after/between review sessions, in the context of the Orbit app rather than in the context of an author’s web page

Readers may fluidly move between these contexts, so we don’t want to create artificial boundaries. We’d ideally like some elemental primitive which can represent a prompt across many contexts—much as a Tweet feels like an elemental object equally at home embedded in isolation, displayed in a list, or partially constructed in a composer interface. The detailed affordances might differ in each of those contexts, but the form of a Tweet holds solid and feels like the “same thing” across those different environments.

Because some low-completeness workflows will involve readers quickly scanning and choosing from a list of potential prompts, it’s important that prompts can be displayed in a compact representation. This will likewise help in machine learning-driven contexts, where we probably can’t reliably generate exactly the prompt you want, but we may succeed often enough if we can generate ten prompts and let you choose.
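A back-of-the-envelope calculation shows why offering several candidates can help. The sketch below assumes, unrealistically, that each generated prompt is acceptable independently and with the same probability. Candidates from a single model would be correlated, so treat the result as an optimistic illustration. The function name and the 20% figure are invented for the example.

```python
def chance_of_usable_prompt(per_prompt_success: float, candidates: int) -> float:
    """Probability that at least one candidate is usable.

    Assumes each candidate succeeds independently with the same probability,
    which is a simplification made only for this illustration.
    """
    if not 0.0 <= per_prompt_success <= 1.0:
        raise ValueError("per_prompt_success must be between 0 and 1")
    if candidates < 0:
        raise ValueError("candidates must be non-negative")
    # The only way to fail is for every single candidate to miss.
    return 1.0 - (1.0 - per_prompt_success) ** candidates


# A hypothetical generator that is right only one time in five:
single_shot = chance_of_usable_prompt(0.2, 1)
ten_candidates = chance_of_usable_prompt(0.2, 10)
```

Under these assumptions, a generator that produces a usable prompt only 20% of the time yields at least one usable prompt in a list of ten roughly 89% of the time. That only pays off if scanning the list is cheap, which is why the compact representation matters.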

In his 2019 paper “Agency plus automation,” Jeffrey Heer identifies an important pattern for designing artificial intelligence into interactive interfaces: shared representations which can be created by either human or machine. This allows the machine to suggest possible domain actions which can be fluidly adapted/adopted by the user, and for the user’s actions to be understood in the same terms by the ML system. Google’s search result suggestions (which Heer excerpts in his paper) are a good simple example. This principle may apply directly if we pursue ML-based prompt generation in the future, but even in the present, it applies well to the problem of co-mingling author and reader prompts. Author prompts should be the same “kind of thing” as reader prompts. Readers should be able to co-opt and alter author prompts with minimal ceremony, just as they do with Google’s search result suggestions.
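One way to read “the same kind of thing” in data-model terms is a single prompt type in which authorship is ordinary metadata rather than a separate class. The sketch below is a hypothetical illustration of that reading, not Orbit’s actual schema; the `Prompt` fields and the `adopt` helper are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prompt:
    question: str
    answer: str
    origin: str  # "author", "reader", or "model": metadata, not a different type


def adopt(prompt: Prompt, **edits: str) -> Prompt:
    """Return a reader-owned copy of a suggested prompt, with optional edits.

    The suggestion itself is left untouched, so the same author prompt can
    still be offered to other readers.
    """
    return replace(prompt, origin="reader", **edits)


suggested = Prompt(
    question="What pattern does Heer identify for AI in interactive interfaces?",
    answer="Shared representations",
    origin="author",
)

# The reader co-opts the suggestion and sharpens the answer in one step.
mine = adopt(
    suggested,
    answer="Shared representations which either human or machine can create",
)
```

Because `adopt` returns the same type it receives, anything that can display, edit, or schedule a reader prompt can handle an adopted author prompt with no special cases. A model-generated suggestion would pass through the same path.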

There’s a related cluster of desiderata for a list representation, which I’ll summarize as fluidity. Given a list of potential prompts (either by author or by ML model), readers should be able to quickly choose the ones they want and make any necessary edits. If a reader discovers while writing a prompt that it actually wants to be two prompts, this shouldn’t involve moving through multiple “screens” or “modes.” Prompts are like sentences, and they’re often edited holistically; when writing or editing a prompt, I should always maintain “peripheral vision” of its “neighbors”, just like editing a sentence in a paragraph in a text editor.

For the last two years I’ve been writing prompts in plaintext files in a text editor, and I’ve come to really value the flexibility that comes with seeing and editing multiple prompts fluidly and simultaneously.[1] I’d like to bring this fluidity to embedded reading contexts, but with more structure than my freeform text files. I’ve been particularly inspired by to-do list apps, which often feature the fluidity of a continuous textual canvas. Apple’s Reminders does roughly what I have in mind:

A related benefit of the continuous textual canvas metaphor is that it suggests a malleable interchange format. You should be able to select many prompts and copy them—they’re just Markdown, perhaps with some extra markup. So you can paste them into Twitter, or into a PDF annotation, or into a source file’s comment section, or whatever. And you can likewise copy plaintext prompts from such places and paste them into Orbit. Complex prompt “generators” can be implemented as macros in your editor of choice. I think such malleability may lead to surprising user behaviors.
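To make the interchange idea concrete, here is a minimal sketch of round-tripping prompts through plaintext. The `Q.`/`A.` line prefixes, the `Prompt` type, and both function names are assumptions made for illustration; the post does not specify what Orbit's format would actually look like.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    question: str
    answer: str


def parse_prompts(text: str) -> list[Prompt]:
    """Parse blank-line-separated "Q. ... / A. ..." blocks into prompts.

    The "Q." and "A." prefixes are a made-up convention for this sketch.
    Blocks missing either a question or an answer are skipped.
    """
    prompts = []
    for block in re.split(r"\n\s*\n", text.strip()):
        question = answer = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("Q."):
                question = line[2:].strip()
            elif line.startswith("A."):
                answer = line[2:].strip()
        if question and answer:
            prompts.append(Prompt(question, answer))
    return prompts


def format_prompts(prompts: list[Prompt]) -> str:
    """Serialize prompts back to the same plaintext form."""
    return "\n\n".join(f"Q. {p.question}\nA. {p.answer}" for p in prompts)


# Usage: text copied from a tweet, a PDF annotation, or a code comment.
pasted = """
Q. What's the inverse of the X gate?
A. The X gate itself.
"""
prompts = parse_prompts(pasted)
```

Because the format is just text, a prompt "generator" could be any editor macro or script that emits lines in this shape.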

Even though it’s often not appropriate to impose tests on readers, section breaks remain an important place for embedded interactions, particularly in long pieces. Assuming the author’s doing their job, the section already represents a somewhat cohesive cluster of material. By anchoring an interaction at the end of the section, we can lighten the burden of evaluating and indicating relevant prompts: in many cases, the sections will naturally reflect boundaries of reader interest. Readers who were keenly interested in the section may “add all” then remove a few less interesting prompts; readers who mostly skimmed through the section may skip the associated prompts, or skim them and add one or two.

Separately, Quantum Country readers reported a few interesting implications of embedded interactions at section breaks. Those review sessions provide a sense of safety: if you’re feeling unsteady while reading, you can take some comfort in knowing that a review will soon reassure you that you understood what you were meant to—and if not, that you’ll get some feedback on what needs more attention. For some readers, the embedded reviews create a visceral sense of “progress” while reading: the reviews help them feel their progress in understanding the topic. For other readers, the embedded reviews help them regulate their reading: “I finished this section and thought I understood—but answering these questions, I found I had absorbed absolutely none of it! So I went back and re-read more carefully.” Of course, these effects are double-edged. They’re helpful if your stance towards the text is explicitly one of a diligent student; they’re representative of the medium’s inappropriate imposition when you’re reading more casually or tactically.

Solution sketches

I’ve been exploring the space of representations which might satisfy all those properties. It’s always an iterative process, and what I’ll show here is certainly interim work.

First, we should do no harm to the primer scenario. For texts like Quantum Country, we’d still like the “default” reading behavior to involve the interleaved review breaks, and to result in “saving” all the prompts encountered. But in the context of these defaults, we’d still like to offer more control. If a question doesn’t seem meaningful, you shouldn’t be studying it. If you’d like to reword a question, you should be able to do so. If you’d like to add a prompt of your own, you should have that ability.

My approach here is:

  • primers and other explicitly instructional media can embed interleaved review areas which initially behave much as they do today
  • but during the review, readers can incrementally assert control as they feel so inspired:
    • they can skip a prompt they find uninteresting
    • they can edit a prompt whose wording they find unappealing
    • they can write a prompt of their own if they feel so inspired
    • they can switch directly to a list view, which I’ll describe in a moment
  • prompts the reader reviews are automatically added to their account; prompts they skip are not
  • when they finish the review set, they’re presented with a bird’s-eye list of the prompts, where they can:
    • undo the automatic behavior of adding the prompts they reviewed
    • selectively add/remove individual prompts
    • edit and add new prompts as they like

The practical upshot: readers get the current default behavior with zero additional interactions, but they have a smooth gradient of opportunities to modify that behavior before, during, and after the review.
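The defaults described above can be sketched as a small state model. This is only an illustration under assumed names: `ReviewArea` and its methods are hypothetical and are not Orbit's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReviewArea:
    """Hypothetical model of one embedded review area (not Orbit's API)."""

    prompt_ids: list[str]
    reviewed: set[str] = field(default_factory=set)  # prompts the reader answered
    saved: set[str] = field(default_factory=set)     # prompts in the reader's account

    def review(self, prompt_id: str) -> None:
        """Default path: reviewing a prompt also saves it."""
        self.reviewed.add(prompt_id)
        self.saved.add(prompt_id)

    def skip(self, prompt_id: str) -> None:
        """Skipping leaves the prompt out of the account."""
        self.saved.discard(prompt_id)

    def write_own(self, prompt_id: str) -> None:
        """A reader-written prompt joins the same list and is saved."""
        self.prompt_ids.append(prompt_id)
        self.saved.add(prompt_id)

    def toggle(self, prompt_id: str) -> None:
        """List view: selectively add or remove one prompt."""
        if prompt_id in self.saved:
            self.saved.remove(prompt_id)
        else:
            self.saved.add(prompt_id)

    def undo_auto_add(self) -> None:
        """List view: remove every reviewed prompt from the account."""
        self.saved -= self.reviewed
```

A reader who only ever calls `review` gets today's behavior; every other method is an optional step away from that default.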

You’ll notice that the latter screenshot demonstrates a list-style interface. The idea is that it’s a continuous textual canvas, like the Apple Reminders interface above. And so writing new prompts is an inline experience, not a modal one:


The same list representation can be naturally repurposed in the Orbit app as a “library” view, for iteratively refining prompts over time or writing new ones.

The workflow I’ve depicted above makes sense for primers and other contexts in which readers legibly intend to study. In most other scenarios, it’s inappropriately pushy, but the same embedded element may work well in many contexts if we simply invert it—that is, if it defaults to displaying the list view, rather than the review experience. With that change, the element’s emotional posture shifts substantially: it becomes a resource at the reader’s disposal, rather than the demand of an insensitive instructor. If the reader does intend to carefully study the material, they can start a review from the list.

I think this kind of embedded list makes a lot of sense at the end of a blog post, and perhaps interleaved into explanatory material which may have readers of many different interest levels. It still strikes me as a notch too presumptuous for the context of many papers (it’s a large visual element which implies: “my findings are worth memorizing”). And it’s too coarse a workflow for particularly low-completeness reference scenarios, like an Our World in Data article.

For these latter scenarios, I think an inline contextual interaction will be valuable. At a high level, in some situations you want to be able to simply point at things you find important while you’re reading, and trust that you’ll magically internalize those details. Apart from the obvious tractability concerns, there are conceptual issues. Analyzing Quantum Country’s prompts, I found that about one third didn’t correspond exactly to any text range. Instead, they roughly synthesize the material presented in a general vicinity (e.g. “What’s the inverse of the X gate?”). These somewhat higher-level questions are often more conceptual—and often more useful!—than questions which represent some declarative fact verbatim. And so in many cases, you can’t really point to a particular phrase that would naturally correspond to the most useful prompts.

Rather than insisting on precise inline ranges, one interesting approach is to introduce an inline sigil which indicates “prompts are available covering this general vicinity”. General, synthesis-oriented questions might appear in one of these sigils at the end of several paragraphs introducing the concept.


Note the same list representation repeated in the popover behind the sigil.

Another somewhat more outlandish approach is to allow readers to select anything at all, and we’ll simply display all prompts associated with that vicinity—and the synthesis-oriented prompts I mention are associated with large text ranges. Readers can pick and choose which make sense.
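One way to picture this selection-based lookup is as an interval-overlap query. The sketch below assumes each prompt is anchored to a character range in the text; `AnchoredPrompt`, its fields, and `prompts_for_selection` are hypothetical names, and the offsets are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredPrompt:
    question: str
    start: int  # character offset where the anchored range begins (inclusive)
    end: int    # character offset where the anchored range ends (exclusive)


def prompts_for_selection(
    prompts: list[AnchoredPrompt], sel_start: int, sel_end: int
) -> list[AnchoredPrompt]:
    """Return prompts whose anchored range overlaps [sel_start, sel_end)."""
    return [p for p in prompts if p.start < sel_end and sel_start < p.end]


# A detail-level prompt anchored to one phrase, and a synthesis-oriented
# prompt anchored to the several paragraphs that introduce the concept.
prompts = [
    AnchoredPrompt("Detail question about one phrase", 120, 160),
    AnchoredPrompt("What's the inverse of the X gate?", 0, 900),
]

# Selecting inside the phrase surfaces both prompts; selecting elsewhere
# in the same passage surfaces only the synthesis prompt.
near_detail = prompts_for_selection(prompts, 130, 140)
elsewhere = prompts_for_selection(prompts, 400, 420)
```

Giving synthesis prompts a wide range is what lets them show up no matter where in the vicinity the reader points.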

Yet another approach is a persistent sidebar which again recapitulates the list representation, but in a fuzzily-anchored, Google Docs comments-inspired layout. Orbit’s art direction is too overbearing for this kind of persistent presentation.

Irrespective of which path we pursue, it should be possible to write your own contextually-anchored prompts anytime the inspiration strikes you, even if there aren’t any author-provided prompts nearby.

There are a number of challenges in conceptually integrating the embedded blocks with these more inline interactions. In the coming weeks I’ll continue to iterate on these architectures and move into interactive prototypes. More when I have it!

————————

There’s a new systematic review paper about retrieval practice in the classroom (Agarwal et al., 2021). On a whim, I live-streamed writing notes and SRS prompts about this paper. Those interested in my note-writing practices might find that interesting.

I’d like to thank Ozzie Kirkby for prototypes and discussions which contributed to the ideas presented in this post. Thanks also to Nick Barr, Sara LaHue, Marcos Ojeda, Taylor Rogalski, and Gary Wolf for helpful feedback and discussion. Finally, thanks to all of you for the ongoing support which makes this work possible.

————————

[1] I should note that RemNote has been pursuing a similar idea commercially. I’ve not used the product in my own work, but I was impressed by their recent redesign, which streamlined a bunch of formality which previously felt overbearing. My primary interest remains prompts in the context of communications media, so I’m very happy to see these folks making progress on prompt-writing environments in the context of personal notes!


Revamping the mnemonic medium around reader control

When you read non-fiction, you’re in the driver’s seat. You can skip to the last page and read only the conclusion. You can riffle through the pages, reading only the headings; or you can spend a week reading ten pages with extreme care. You don’t need to focus on just one text: you can compare one book’s ideas to another sitting by its side. Great non-fiction authors exert careful control over their prose, but once a book arrives in your hands, it becomes a tool in your service. You’ll use each book according to your own needs and interests.

This reader-centricity distinguishes non-fiction texts from other informational mass mediums, like videos and lectures. Those forms make it much more difficult for viewers to “drive” the experience, for instance to focus especially on the parts they find most interesting. In fact, abdication of control is part of the appeal. It’s fun to put yourself in the hands of a master explicator like 3blue1brown. I watch his videos when I don’t want to be in the driver’s seat; I want to sit back and enjoy seeing the topic through Grant’s eyes.

When the goal is enablement, though, reader centricity offers some important advantages. Authors model what their readers might already know and what they might be interested in, and then structure their texts accordingly. In most contexts, a one-size-fits-all (or even -fits-many) solution is impossible, but that’s okay: readers can collaborate with the author to mold the experience to their interests. In this way, texts enable readers in a wider range of contexts than author-centric mediums can reach. Perhaps more importantly, texts support readers in remaining relentlessly focused on their own sense of what’s meaningful.

All this said, I can now articulate a key problem for the mnemonic medium: it glues authors to the driver’s seat. Its key insight is combining spaced repetition memory prompts with narrative prose. Those author-provided memory prompts make it easy for people to remember what they read, but at the cost of sharply shifting control back to authors. Reading a mnemonic text is very unlike reading a normal text. The current interactions demand not only that you read a text in full, but that you repeatedly study—and commit to memory—whatever the author thinks is important, in whatever form the author chooses. The memory system isn’t “yours”; it’s on loan from the author, kept under glass. As Gary Wolf has pointed out to me, it’s an authoritarian medium.

Where today’s mnemonic medium succeeds and fails

Quantum Country succeeds despite these limitations because it’s a primer in a well-established technical field. Because it’s a primer, it can safely assume that most readers have little prior experience. They may be especially willing (and it may be especially appropriate) to defer to the author. Because Quantum Country is an introduction to a well-established field, there’s a set of topics it’s expected to cover. Its table of contents is partially a matter of authorial choice, but much is a reflection of general consensus. Readers who want to understand the field won’t commonly feel the need to pick and choose from these foundational concepts. And because quantum computing is a technical topic, the content of the prompts is less contingent on an author’s choice of metaphor or phrasing. It’s closer to capturing some standardized representation of physical law. So readers are more likely to be happy internalizing prompts as written, rather than wanting to rephrase them in terms which better match the way they think about the topic.

The mnemonic medium’s current design works much less well in contexts where these assumptions don’t hold. For instance, in my own mnemonic essay on How to write good prompts, many readers have substantial experience with prompt-writing, while others have never written one before. Readers’ areas of interest and levels of commitment vary widely. The essay’s topic is not well-established or standardized: I’m inventing my own abstractions, which resonate well with some readers’ experiences and poorly with others’. And because it’s a non-technical topic, the prompts are much more contingent on my authorial choices of metaphor, naming, phrasing, etc. Based on reader feedback, other authors’ similar mnemonic essays have experienced similar problems.

This isn’t just a problem with relatively informal topics, or with less-committed readers. One valuable application of the mnemonic medium is to augment important academic papers. Summer intern Ozzie Kirkby has been exploring topics in decentralized technologies, so as an experiment, I adapted the Interplanetary Filesystem (IPFS) paper into the mnemonic medium, and he logged his experience reading it. Many of my prompts worked as written, but others focused on aspects he didn’t personally care to explore. Ozzie found himself wanting to internalize much more detail in some sections, so he wrote additional prompts in his log (pretending he could add them to Orbit that way). A couple prompts felt too complex given his prior knowledge, so he split them into smaller prompts. I believe that issues like these would occur with most paper-reading experiences, since contexts and motivations vary so widely, and since papers are read much more tactically than essays.

Edit and delete buttons aren’t enough

Practically speaking, the reader feedback I get sounds like simple requests for control: can you add a button that lets me delete prompts? Can you let me edit the author’s text? I’ve been hesitant to implement these simple features because I think the mnemonic medium’s fundamental model needs re-shaping. A traditional text is a tool to be used as the reader sees fit. The reader’s in the driver’s seat. But edit and delete buttons aren’t enough to remove mnemonic medium authors from the driver’s seat. Even with those buttons, the reader would just be along for the ride, perhaps making adjustments along the way.

Part of the problem here is the medium’s positive starting assumption that the reader is expected to collect all the author’s prompts (except those they veto). This creates a school-like learning aesthetic, a sense that the author is assigning you to get X, Y, and Z out of the text. Some readers will respond to this with frustration and abandonment; others will respond by dutifully but passively studying things they don’t really care about, in practical misalignment with Orbit’s values (“helps you deepen your relationship with whatever you care about most … not for things you think you ‘should’ be engaged with … not ‘educational’ in tone”). This mismatch is really what keeps me from simply advancing the mnemonic medium as it exists, restricted to Quantum Country-like “primer” contexts in which many readers don’t mind abdicating control: I want to promote a more active, less dutiful stance towards learning.

Another part of the problem is that an editing affordance, absent a more complete authoring experience for readers, would still strongly privilege the author’s writing. Such a medium would permit readers to adjust the author’s wording to better match their understanding, but wouldn’t permit readers to capture an interesting connection they noticed to some idea outside the text. Readers couldn’t capture a detail they found interesting, but for which the author didn’t provide a prompt. Readers couldn’t capture an idea the text inspired the next day. More philosophically, this asymmetry promotes the idea that prompts are something you consume, not something you create. Imagine if you were forbidden to use any writing materials of your own while carefully studying a book: you’re only permitted to write in the page’s gutter margins. Your thoughts would shrink accordingly.

Of course, a simple solution here is to add an authoring interface to Orbit. One possibility I’m excited about is that authors’ prompt-writing practices will scaffold readers’ own prompt-writing abilities. But this won’t happen if readers can’t fluidly write prompts when inspired to do so. People today can add their own prompts in Anki or SuperMemo while they read a mnemonic essay. But my instinct is that it’s harmful to create a strong separation between engaging with author-provided prompts, and writing prompts of your own—for instance by requiring readers to visit a separate app or page to write their own prompts while reading. If you’re looking at prompts, you should be able to write prompts. Author-provided prompts should feel like material in your hands, malleable to your needs and interests, not a different “kind” of thing from prompts you write for yourself. I want the fluidity of plaintext: copy and paste between documents, edit and combine coarsely or finely, bulk-manipulate, generate, grep, pipe, tweet, etc.

Memory fantasies

Let’s step back and examine the medium’s original goals. To indulge in science fiction fantasies for a moment, it’d be great to jack a plug into our neck, flip a switch, and deeply understand any topic. That’s not possible (for now)—but how close can we get?

Imagine that whenever you read—or thought—something interesting or important, you’d simply remember that idea. Learning wouldn’t be quite as easy as plug-and-play, but you could read a book (once) and remember every meaningful detail; you could tinker with those ideas on your workbench and remember everything you noticed as you put them into practice; you could discuss those ideas with a collaborator and remember all the implications that arose. To fend off dystopian objections, imagine that you can also effortlessly correct any memories which turn out to be false or unhelpful. All this is also impossible, but we can at least come somewhat closer.

Spaced repetition memory systems let us remember a specific detail at the cost of 20-60 seconds over the course of the first year, and < 10 seconds per year thereafter. Not quite effortless, but also not terribly onerous—if you have a good prompt already written. Writing good prompts requires much more effort than reviewing does, and it requires a skill few people have yet developed. This suggests a slightly more achievable version of our fantasy: a genie automatically creates spaced repetition prompts for everything interesting you read, think, or hear; you remember all those details at the cost of, say, ten minutes’ daily review.
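As a back-of-envelope check on these figures, the sketch below treats the quoted numbers as upper bounds. The function names are illustrative, and using 60 and 10 seconds as ceilings is an assumption of the sketch, not a measured result.

```python
FIRST_YEAR_SECONDS = (20, 60)  # range quoted in the text for the first year
LATER_YEAR_SECONDS = 10        # "< 10 seconds per year", used here as a ceiling


def max_cumulative_seconds(years: int) -> int:
    """Upper bound on total review time for one prompt over `years` years."""
    if years < 1:
        return 0
    return FIRST_YEAR_SECONDS[1] + LATER_YEAR_SECONDS * (years - 1)


def yearly_budget_seconds(minutes_per_day: int) -> int:
    """Total review time in a year at a fixed daily budget."""
    return minutes_per_day * 60 * 365


five_year_cost = max_cumulative_seconds(5)
ten_minute_budget = yearly_budget_seconds(10)
```

Under these bounds, one prompt costs at most 100 seconds over five years, and ten minutes a day comes to 219,000 seconds a year.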

Our genie would need two skills: it must read your mind (i.e. to notice what you find interesting, in what context, and understood in what way); and it must write effective spaced repetition prompts (i.e. which cue retrieval of the details which constitute understanding). As a more plausible approximation, perhaps you can imagine hiring someone to follow you around all day, to sit in on your meetings, to read what you’re reading, to listen as you think aloud—and to write memory prompts for everything which seems important. You could call them your “chief of memory,” a nod to your chief of staff. This would be awfully expensive (and intrusive), of course; and because this assistant can’t read your mind, their work would be imperfect. But it’s interesting to consider as a non-sci-fi model you could actually implement if you had the means. Can we approximate this model more affordably?

Books are surprisingly analogous to at least part of this situation. If you’re wealthy, you can hire a personal tutor to teach you about a topic. This is inconvenient in some respects, and of course, it’s quite expensive. Happily, thanks to the printing press and the internet, you have the option to read a book about the topic instead. In some respects the book will offer a better experience than you’d have with a tutor: when you buy a book, you can (indirectly, partially) hire the greatest domain expert in the world to teach you. The book’s prose will (hopefully) represent careful editing and sculpted narrative, rather than improvised explanation. And of course you can read much more quickly than you can listen. There’s no equivalent conversational analogue to “flipping through the pages” or “scanning the headings.”

We can apply a similar logic to our “chief of memory,” at least for the portion of your day spent reading. The idea here is that many of the details you’d find important in those texts would overlap with details other readers would find important. And so perhaps the work that your “chief of memory” would need to do as you read this text would overlap substantially with the work that others’ “chiefs of memory” would need to do. If there’s enough overlap, it might become a high-leverage opportunity for a domain expert—perhaps the author, perhaps a different expert—to write prompts covering these overlaps. Indeed, that domain expert may be able to write higher-quality prompts than a general-purpose “chief of memory” could for that specific text.

Now we arrive at an idea resembling the mnemonic medium, but it’s still not the mnemonic medium. In our thought experiment, the idea is that you effortlessly remember everything you find interesting or meaningful. The author-provided prompts are just material for that process. More concretely, imagine that as you read, you signal to your “chief of memory” by raising an eyebrow or gesturing at material you find important. If the author’s already done the work to write prompts about that bit, and the prompts match what your assistant thinks you found interesting, they can take a shortcut and use the author’s prompt. Otherwise, they have to take the slow path of writing a new prompt by hand. Perhaps in some cases you speak aloud: “this detail actually relates to the problem I’ve been having with my research project; it’s a reason to consider doing X instead of Y.” Unlike the mnemonic medium, this workflow is driven by the reader’s interest, rather than the author’s specifications. The author-provided prompts are a shortcut, not a list of expectations.

Okay, now imagine you have no “chief of memory.” I recognize that we’ve come awfully far from “I know Kung Fu” at this point, but if we restrict ourselves to just the portion of your day spent reading, how close can we get in software using author-provided prompts? Ozzie and I have been exploring some approaches along these lines. Our prototypes are quite nascent, but perhaps you can imagine highlighting details you find interesting while you read; if the author has provided relevant prompts, you’ll collect those; otherwise, we might provide a bulk-editing interface for shaping annotations into prompts, including those you might create from nothing.

Trading away effortlessness

There are many problems with this approach! One central tension is that in almost all cases, the correct amount of authority to assign to the author is not zero; the correct amount of control and responsibility to assign to the reader is not 100%. How should we negotiate the spectrum of control between authors and readers?

The author’s prompts are not in fact just a shortcut, as I’d described earlier. They also carry meaning. Prompts signal what the author finds important. They communicate a norm around what it means (or at least what the author believes it means) to understand a topic. They give authors the opportunity to communicate the prose’s ideas in a different way, or even to create conversation between the prompts and the prose—but that’s a topic for another essay. They cue attention and participation; when they’re working well, they create a feeling of support and safety.

And so maybe we should still present the author’s prompts, perhaps as annotations, but ask readers to “opt in” to each prompt they’d like to collect. The trouble here is that interaction is a cost center in interface design. If 80% of the time, 80% of the users want to collect 80% of the author’s prompts, a naive opt-in mechanic would require readers to perform a huge number of interactions to indicate the common case. Imagine reading through Quantum Country’s first chapter and clicking on 112 prompts in the margins to collect them all. You can perhaps improve the situation by batching those interactions in end-of-session “review areas,” akin to Orbit’s current review areas—but I don’t think that’s enough.
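The cost asymmetry can be put in rough numbers. This sketch assumes exactly one interaction per prompt and no batching; `interaction_counts` is an illustrative name, and the 80% figure is the hypothetical from the paragraph above rather than data.

```python
from __future__ import annotations


def interaction_counts(total_prompts: int, wanted_fraction: float) -> dict[str, int]:
    """Interactions needed to end up with only the wanted prompts.

    Opt-in: one interaction per prompt the reader wants to collect.
    Opt-out: one interaction per prompt the reader wants to discard.
    """
    wanted = round(total_prompts * wanted_fraction)
    return {"opt_in": wanted, "opt_out": total_prompts - wanted}


# Quantum Country's first chapter has 112 prompts.
counts = interaction_counts(112, 0.8)
```

For 112 prompts with 80% wanted, this gives about 90 interactions under opt-in versus about 22 under opt-out.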

If we’re not careful, we won’t just require users to perform excessive interactions: we’ll also distract them and create decision fatigue. Imagine that as you’re reading a text, you’re constantly evaluating prompts that appear in the margin alongside the text: do I want to collect this prompt? Or do I want to write my own? Your eyes dart back and forth between the text and the sidebar; your attention is drawn away from the text. The frenzied nature of this design can be improved by bulk operations at the end of sections, but it’s still hard to imagine asking users to explicitly decide whether they’d like to keep each of the 112 prompts in Quantum Country’s first chapter. It’s already quite an imposition to ask them to try to remember the answers.

Another difficult problem is reconciliation. Imagine that you find a detail particularly interesting, so you highlight it. You see that the author’s provided a prompt about that passage. Great. But now you need to decide: is the author’s prompt about the sense I found interesting? You need to form some picture of the prompts you would write, if you were to write prompts, then read the author’s prompts and perform a sort of diff. Worse: if you don’t quite trust the author as a prompt writer, you also need to evaluate the quality of their prompts. My experiments with reading others’ mnemonic texts suggest that both these activities are quite taxing.

I think a successful approach here is likely to be more incremental, annealing the prompt set through a number of stages. Perhaps you mark passages you find interesting with some very coarse interaction. If a passage inspires some specific prompts you know the author won’t cover—for instance because they’re about a connection to your present project—you can write those inline as you read. Intermittently, at the end of sections, you review the author-provided prompts from the passages you’ve marked, both to reinforce your memory and to offer a lightweight opportunity to discard or edit those which obviously don’t work. Perhaps you notice at this point that the author didn’t provide prompts about some detail you found important, so you write some on the spot. Then, over the following weeks in review sessions, you refine the prompts from this text, modifying and trashing those which don’t inspire, removing inadvertent duplicates, filling in details with new prompts, adding connections you hadn’t noticed, and so on. But critically, the text’s prompts feel like yours; they’re co-mingled with your own prompts about your own ideas. Taylor Rogalski suggested a metaphor I like here: someone sent you their Google Doc, then you clicked File > Make a Copy so you could scribble all over it with impunity.

A postscript on machine learning and language models

I imagine that many of my readers have been chanting it this whole time. What about language models?! Why insist that authors (or those who adapt their texts to the mnemonic medium) do all this work? What about the long tail of texts which will never be adapted?

I know. I’m interested in these questions too. I’ve experimented with several approaches along these lines, and my impression so far is that some partial automation here is possible, for certain kinds of prompts, but that a decent solution will require a great deal of work and a great deal of insight. I don’t believe the simple approaches floating around are likely to represent viable paths. But I do think this work is worth doing; if you’re interested in and capable of taking it on as a research project, and you’d like me to supervise (and possibly help fund) or advise your work, please reach out.

My instinct is that we’ll be best off approaching this as an augmentation rather than an automation problem, at least in the near future. I have some specific workflow ideas along these lines, but they’ll have to wait for another essay.

————————————

I’d like to thank Ozzie Kirkby for prototyping ideas around this problem with me this summer; Nick Barr, Ty Jung, and Taylor Rogalski for extended discussions and whiteboarding; and Gary Wolf for valuable correspondence on this topic.

And I’d like to thank all of you, Patreon sponsors, for your kind support. It’s quite remarkable to have the opportunity to pursue open-ended exploration like this. Your contributions make it possible, and I’m grateful.


Armories for tool-maker / tool-user collaborations

I’ve previously argued that great tools for thought rarely come from contexts focused on creating tools. They’re usually created in the course of deep creative work in some domain, almost as a byproduct. And they’re usually made by people with significant expertise and investment in those creative problems. Stephen Wolfram created Mathematica to accelerate his original research on cellular automata; Alan Kay created Smalltalk to support PARC’s personal computing experiments; Dan Bricklin created VisiCalc in business school to help solve financial models. All three of these inventors were serious computer scientists invested in some original domain problem.

In these cases, the inventors’ domain problems were actually about math and computing, which certainly makes the overlap in skills likelier. We find another common source of overlap with computer scientists inventing tools in relatively universal domains. For instance, Charles Simonyi (inventor of the WYSIWYG word processor) was a computer scientist, but as a knowledge worker, he was also naturally a prodigious producer of memos. It’s telling that the affordances in Bravo (and later Word) are much more suited to memo-writing than to, say, novel-writing.

Outside these common areas of overlap, it’s harder to find great computer scientists who are also great domain practitioners. In most domains, great tool-makers are rarely great tool-users, and vice-versa. Michael Nielsen has pointed out that you would probably rather have Stradivarius make your violin than Joshua Bell, but you’d probably rather hear Joshua Bell play. Each activity—violin-making and violin-playing—requires independent virtuosic skill and a lifetime of practice. It’s rare to find both abilities in the same person!

Even if you did, there might be some advantages to separating the work: the tool-maker could constantly consider abstraction and generalization, and the tool-user could focus on meaningful problems at hand without being distracted by issues of systematization.

So certain kinds of progress in “tools for thought” may depend on figuring out a way for “tool-makers” to work very closely with “tool-users”—people who are in some sense devoting their lives to playing the instruments we create. This goes beyond the typical lite-ethnographic “design research” practiced by firms like IDEO: luthiers must become at least modest violinists themselves, embedded in those communities, and capable of making original (but probably limited) contributions to their problems. Likewise, the tool-users must become active participants in the design project, learning enough about how the tool is built to contribute meaningful ideas.

Photoshop is a good example: it was a collaboration between Thomas Knoll, a computer science graduate student, and his brother John Knoll, an artist working on special effects at Industrial Light & Magic. I pursued a similar arrangement with Michael Nielsen when developing Quantum Country, and with teachers like Scott Farrar when researching novel educational environments at Khan Academy.

Orchestrating this kind of pair collaboration is much more challenging than the simpler case of a single effective tool-maker/tool-user. I’ve been collecting notes on the space; today I’d like to describe one strategy that seems particularly important in the space of medium invention.

"How can working on the problems of another discipline, for the purpose of enhancing a collaborator, help me as a computer scientist? In many ways:
* It aims us at relevant problems, not just exercises or toy-scale problems.
* It keeps us honest about success and failure, so that we don’t fool ourselves so easily.
* It makes us face the whole problem, not just the easy or mathematical parts. In computational geometry, for example, we can’t avoid the cases of collinear point triples or coplanar point quadruples. We can’t assume away ill-conditioned cases.
* Facing the whole problem in turn forces us to learn or develop new computer science, often in areas we otherwise never would have addressed.
* Besides all of that, it is just plain fun to look over the shoulders of those discovering how proteins work, or designing submarines, or fabricating on the nanometer scale."
The Computer Scientist as Toolsmith II, Fred Brooks (1994).

Building an armory

In a collaboration focused on medium invention, the tool-user might be an author, filmmaker, or artist. They’re the one with the authentic problem—a question to be answered, something to be said, an aesthetic to explore. Practically speaking, if we expect the tool-user to drive the context of use in these collaborations, they’ll be the primary instigator of the pair’s creative projects: hey, this story idea I have really wants to be expressed in a non-linear medium. Now there’s a project, and the tool-maker can start riffing on ideas around mediums which would support those creative aims.

But there’s a tricky chicken-and-egg problem here. If great creative work should drive the invention of new mediums, how can the initial idea driving the creative work get started in the first place?

In this model, the instigating context starts in the tool-user’s mind, likely the result of semi-private tinkering or reflection. For our example of expressing a story concept in a non-linear medium, maybe the tool-user got their idea when reading an article about Twine, then ran up against some of its limitations and mentioned them to a tool-making collaborator. But this means that as the tool-user tinkers with ideas which might become new creative projects, they’ll mostly be incorporating tools that are already almost “within grasp.” If a pair hopes to explore some idea in tool-space through a serious creative project, that idea must already be concrete and accessible when the tool-user’s conceiving of a next project. The pair must proactively develop an armory of tool ideas (both embryonic and mature) to equip the tool-user’s explorations.

Tool ideas in the armory don’t have to be working software. They just have to be solid enough to stand on—understood well enough that the tool-user can tinker with them in emerging creative projects. For example, when Michael started writing Quantum Country, the software was only an early sketch: the important thing was that he understood the idea’s shape well enough to imagine a book which powerfully implemented it. While sketching the book, it was fine (for a while) to simply write “Question: How many dimensions does a qubit’s vector space have?” in the text. The core idea behind the mnemonic medium was already “in the armory,” built on years of experimenting with spaced repetition systems.

The armory has important implications for the pair’s relationship and the division of labor. The tool-maker isn’t conceiving the creative projects, so if there’s an idea the tool-maker finds promising, they must drive its development to the point that the tool idea is “in the armory,” so that it can then inspire future creative projects. On the other hand, the tool-user’s mostly not developing tool ideas, so if they're struck by a tool idea—but that idea is not yet solid enough to be “in the armory”—they should be able to lean on the tool-maker to drive the idea’s development until it’s ready for the armory.

In both cases, the pair must develop ideas for the armory without assurance that they’ll necessarily be used by a future project. Entering the armory is just a necessary (but not at all sufficient) precondition for use in a serious creative project.

This conception is mostly useful in the context of repeated collaborations, particularly across distinct projects. The pair can learn plenty about their tool ideas as the tool-user deploys them in serious creative projects, but each project has limited scope for iteration. That’s because it’s difficult to maintain emotional connection to a creative project across delays and breaks.

To see why, imagine that the tool-user has written a first draft of a book in a new medium. Lessons from that draft prompt new iterations of the pair’s medium ideas. Though the tool-user may be plenty interested in those explorations, they're likely not interested in totally rewriting the book using a new iteration of the medium. The medium can evolve a bit—particularly early in the drafting process—but it rapidly pushes against a gumption limit.

If the tools involved in each creative project can only evolve a limited amount during the course of that project, then each project’s tool frontier is substantially determined by the state of the tools at the start of the project. This challenge suggests that the pair can best evolve their ideas through a sequence of creative projects.

There’s a tension around the size of the projects in the sequence. Smaller projects will support more dramatic iteration in the tools—a higher “learning rate,” if you will. But small projects may not be serious enough to develop the tool ideas. Ideally, I suppose, the pair pursues the smallest sufficiently-serious projects they can.

I see these dynamics play out in practice as I speak with authors and teachers in the context of potential collaboration. They’re often excited about the medium ideas I’ve expressed, and so they suggest projects or contexts which make use of the concepts I’ve published. Very naturally, their ideas for projects skew towards the most understandable paths for those new medium concepts. But of course, the way to push these ideas forward is to focus on the least well-understood notions.

If I want to make that happen, I need to make those ideas more concrete and graspable—in other words, I need to add them into the shared armory. And, of course, I need to much more deeply internalize the creative domains of my collaborators, so that I’m stocking the armory with concepts best suited for their problems.


Finding research–context fit

The life of an early startup revolves around a desperate search for “product–market fit”—a state in which you’ve found a solution so compelling in some market that the world starts yanking the product out of you faster than you can make it. That’s when the exponential flywheels can start spinning and a startup can start to make good on its rocket-fueled ambitions.

Aspiring inventors of “tools for thought” aren’t chasing rapid customer growth, but their research velocity does critically depend on a related phenomenon: research–context fit. I’ll illustrate what I mean by describing my own struggles to find this fit.

Quantum Country’s poor research–context fit

I’ve been trying to create environments that make it much easier for readers to engage with complex ideas—environments that aim to substantially expand what people can think and do. In 2019, Michael Nielsen and I published Quantum Country, a textbook on quantum computation written in an experimental “mnemonic medium.” The medium integrates powerful ideas from cognitive science intended to make it much easier for people to remember what they read.

Now imagine that most of Quantum Country’s readers are new graduate students. They’re about to embark on original research in a new field; they’re trying to find themselves as independent thinkers; they’re desperately trying to learn a notoriously challenging new field; they’re perhaps more than a little overwhelmed. They’d viscerally feel the successes or failures of a resource like Quantum Country in their day-to-day experience of the most important activity in their lives. Conversation with students would drive forward the research on the medium. The qualitative impact on students’ understanding would suggest important new research questions. In this world, I’d have research–context fit.

But that’s not the situation. Most of Quantum Country’s readers—even readers who keep up with its review sessions for months—are simply curious people, interested to learn the basics of quantum computing. Quantum Country is a more elaborate version of some other dense non-fiction they might read on a Sunday morning with their coffee. These readers might enjoy the book, but the success or failure of the medium isn’t viscerally apparent in their experience.

This context doesn’t create enough pressure on the medium or its ideas. Where are its biggest deficiencies? Along what axes can it meaningfully expand readers’ capacity? When can a person perfectly recall hundreds of details but still fail to understand or act? It’s hard to say. Worse, the questions which naturally arise in this undemanding context tend to emphasize a weak framing for the medium, one focused on helping casual readers enjoy the feeling of learning. I aspire to a much more powerful framing: to develop a medium which significantly expands readers’ capacity to do whatever they find most meaningful. The best way to develop a system like that is in a context which would ravenously metabolize its benefits. Intense relief and deep frustration would drive both questions and answers.

The problem I’m describing is common in the “tools for thought” space, particularly for inventors focused mostly on augmenting other people, rather than themselves. As Michael and I noted in How can we develop transformative tools for thought?:

There’s a lot of work on tools for thought that takes the form of toys, or “educational” environments. Tools for writing that aren’t used by actual writers. Tools for mathematics that aren’t used by actual mathematicians… It’s very easy to slip into a cargo cult mode, doing work that seems (say) mathematical, but which actually avoids engagement with the heart of the subject. Often the creators of these toys have not ever done serious original work in the subjects for which they are supposedly building tools. How can they know what needs to be included?
Suppose you want to build tools for subject X… Unless you are deeply involved in practicing that subject, it’s going to be extremely difficult to build good tools. It’ll be much like trying to build new tools for carpentry without actually doing any carpentry yourself. This is perhaps part of why tools like Mathematica work quite well – the principal designer, Stephen Wolfram, has genuine research interests in mathematics and physics.
There’s a general principle here: good tools for thought arise mostly as a byproduct of doing original work on serious problems. They tend either to be created by the people doing that work, or by people working very closely to them, people who are genuinely bought in. Furthermore, the problems themselves are typically of intense personal interest to the problem-solvers. They’re not working on the problem for a paycheck; they’re working on it because they desperately want to know the answer.

It’s worth emphasizing: this principle seems to explain so much of the failure among aspiring inventors of tools for thought! I find it incredibly difficult to heed, myself. Like so many other technologists, I have a natural tendency towards tool-fixation. If I leave that tendency unchecked, I slip into exactly the kind of failure mode that passage describes.

As far as developing a tool for authors, Quantum Country managed this principle reasonably well. My co-author, Michael, was also the co-author of the standard textbook in quantum computing. He’s also quite serious about the challenges of being an effective writer, both generally and about this topic specifically. Likewise, in my work to expand the medium to other domains with Orbit, I’ve spent a lot of effort building relationships with potential authors for the mnemonic medium, so that their problems become my problems.

But I only recently realized that I’d been neglecting the same principle as it applies to readers. For those of us interested in creating communications media (like the mnemonic medium), we must create a close working context with both the authors and with the readers/consumers—or else we must authentically inhabit one of these roles ourselves.

Finding a better context

It’s not enough to just work much more closely with readers: I need to find readers in a context strong enough to support my research. Here are a few properties which seem important for good research–context fit, not only for the mnemonic medium but for tools for thought in general.

  1. Strong signal. You want to be able to run experiments and make observations with enough clarity to actually answer your primary research questions. With Quantum Country and Orbit right now, it’s quite hard to answer basic questions necessary to developing the medium. Very broadly, this might mean: what’s working? What’s not working? What is the actual impact of the medium on readers’ capacities? And more narrowly: how do specific changes to the medium affect its impact on readers, and on what it enables them to do? What sorts of understanding can different kinds of interactions support, in what topics?
  2. Real stakes. You want a context where the impact of your work has at least the potential to be transformative. You can’t expect instant success, but to guide your work, you need to be able to see at least glimmers of non-linear returns. One way to think about the goal of a project like the mnemonic medium is to create an environment that’s radically more enabling for some significant set of readers—to over-simplify, a 10x reading environment. But no amount of medium-level improvement can “transformatively enable” readers who aren’t interested in engaging with the subject beyond indulging casual curiosity.
  3. Drives questions. Perhaps most importantly, you want more than a context which can help answer your research questions: you want a context immersive and demanding enough that it defines its own compelling research questions. Ideally, you’re not coming with your own strongly-held research questions at all. Instead, they should arise naturally in response to the context.

One confusing aspect of this discussion is that I’m actually quite interested in augmenting curiosity-driven readers. Sunday morning reading might start casual, but it’s often the seed for later meaningful creative work. How can I reconcile this observation with the concerns I’ve been describing? Very roughly: I think there’s significant path dependence in this research space. Like a biologist without a microscope, I need higher amplification to understand the phenomena I’m studying. This seems like a common story in tools for thought. Tools are initially developed in some critical context, intense enough to foment particularly powerful ideas, and then they’re later deployed more broadly in less demanding contexts. Of course, that last step isn’t automatic. Wolfram created Mathematica to support his needs as a researcher; I’m sure it needed extensive adaptation to support more casual tinkerers. Likewise, an environment developed to augment deep reading will require careful modification to augment informal reading. But I suspect it’s probably much more difficult, or impossible, to evolve a medium in the opposite direction—to evolve Calca into Mathematica.

At the moment, my best-guess “ideal” reader context looks like: serious people trying to enter a difficult new field (probably technical) for some purposeful creative project, like original research or a startup. Because I’m developing a communications medium, I need to find the intersection of such readers and a highly-motivated author. To get enough signal, it would be best to find a tight, energetic community of such people, and to immerse myself in it.

Simultaneously, I’ll need to find domain-expert authors eager to help new entrants—and to collaborate deeply with them. It’s not enough to give an author a pre-made tool and to answer their questions as they try to write. Their sense of what the topic needs, and of their readers’ challenges, must guide the medium’s evolution. A powerful new communications medium must be radically enabling not just for readers, but also for authors.

More concretely, the two most promising author/reader contexts I’m exploring:

  1. A professor’s monograph/textbook/notes written to get new grad students in their department up-to-speed on key topics necessary for their research.
  2. An industry leader’s book meant to help people start or join companies in a challenging new field (e.g. biotech, machine learning). Readers might have either a scientist’s or a technologist’s background.

To illustrate the issues involved in choosing a context, I’ll discuss a few other contexts which seemed promising initially, but which I now fear present serious problems.

University courses

One key question I’d like to answer is: how does the fluency you build through retrieval practice relate to your capacity for understanding and creative problem-solving? University courses have built-in structures (like exams, projects, essays) which could help me explore that question.

But readers in this context have a different problem than the one I’m trying to solve. Most of them are not learning as part of some broader meaningful creative activity. Most of them are responding (appropriately!) to external incentives: learn this set of things you’re supposed to know; pass your classes; get good enough grades; etc.

I remember what it was like being an undergraduate. I was taking five classes. I was somewhat interested in half of them, and the others were requirements. I wanted to learn, sure, but campus life held lots of other fascinations. Besides, most undergraduate courses have low expectations of their students’ fluency. If a professor had told me “use this system for two hours throughout the semester to reliably remember everything from my class,” I might have believed their claim, but I probably wouldn’t have followed their advice. My existing practices seemed to work “well enough,” even though of course I knew that much of what I learned wasn’t “sticking.” And indeed, this attitude seems to match our experiences in an experiment with an undergraduate class this spring.

Lots of people have made systems meant to help people get better grades in their classes more efficiently and reliably. By contrast, I’m interested in systems which expand people’s capacity for thought and action, around whatever they find meaningful. The two goals are related, but they’re not the same. Test scores are a proxy for a certain kind of capacity, but the pressures they apply to the medium are unlikely to shed light on the problems I want to solve. I have spent enough time in the field to understand that “education” is a mighty force. It is much more likely to subvert my work than I am to subvert it.

[Medical / law / business] students

Part of the problem with the undergraduate context is that those students don’t actually need fluency to achieve their proximate goals. But medical students sure do. There’s already a huge community of medical students using spaced repetition to internalize the huge body of knowledge they need to learn for their work. I suspect that law and business students may be in a similar situation.

But as I read the medicalschoolanki subreddit, my sense is that these students are primarily driven by a desire to get a good grade on their high-stakes examinations, and secondarily by an abstract pleasure in accumulating knowledge which might someday be needed. My wife’s a physician, so I asked her what she thought drove medical students’ study practices. Her instantaneous response: “fear!” Fear of not passing, fear of a grade too low for the residency you want, fear of embarrassment in front of an authority figure. But not fear of harming patients (there are several layers of supervision); not fear of lacking knowledge essential to research projects; not fear of failing to understand something you desperately want to understand.

I’m sure that some students in these environments feel differently. Perhaps I can figure out how to work with them. But my growing suspicion is that these contexts won’t supply the right pressures for my research.

Onboarding

If you’ve just joined a new company, you’re (hopefully) eager to be productive as quickly as possible. But there’s usually a huge amount of basic knowledge necessary before that can happen. New employees at Stripe, for instance, must rapidly learn a huge volume of company- and industry-specific terms, concepts, and procedures. On top of that, a new developer will need to learn all kinds of details about the company’s internal infrastructure. This could make for quite an interesting context for the mnemonic medium.

One challenge here is that formal corporate learning systems are almost always soul-sucking monstrosities that feel like the worst of school. What banal horrors come to mind when you hear “compliance training” or “reskilling”? You can tell that every action you take when you use these systems is being compiled into a “reports dashboard” for some administrator somewhere. There’s a whole industry around this stuff called “enterprise learning and development.” Cynicism aside, I know of several well-intentioned startups (some dead, some still trying) which have applied spaced repetition in this space. I worry that they’re making enormous sacrifices to the medium’s potential in order to appease their buyer. One huge challenge is that the buyer in this instance is not in fact the user, and so the expected principal–agent problems prevail.

The trick in this space would be to find a company which is big enough for better onboarding to seem quite important, but small enough for it not to be awful-by-default. My instinct here is to reframe the interaction so that it’s an employee-centric tool: point your magic wand at anything which seems important to you while you’re coming onboard, and you’ll remember it effortlessly! The tool serves you, not some enterprise dashboard. This is more or less the opposite of the usual employer-centric curriculum-on-rails which “feeds” employees tasks. Framed in terms of a startup, such an effort would be an exercise in “disrupting” the enterprise learning and development space by introducing a “grassroots” tool which employees excitedly adopt on their own, before advocating for broader adoption within the organization. A bit like Slack’s path, I suppose.

This all sounds quite miserable to me—of course I’m trying to find a research context, not to “win” a “market segment”—but I still believe this path could be very interesting with the right partner.

High-stakes life changes

Certain key life events tend to be associated with a spree of book-buying: having your first child, founding your first company, building a house, grieving a loss, and so on. Many less-technical topics like these are nevertheless quite high-stakes, and they’re connected to deeply meaningful activities. Since many people in these situations are already buying books, perhaps there’s potential impact in making those books much more effective?

I hesitate here because while all these things are difficult, I’m skeptical that “learning complex ideas” is really the most important limiting factor for these situations. Of course, there are other interesting opportunities for better books in these domains, opportunities which aren’t about learning complex ideas. But I’ve spent a lot of time thinking about how to use the mnemonic medium to help people learn complex ideas, and I feel it would be a shame to shift focus without pushing harder on that problem.

I have similar concerns when considering the potential of augmenting books about personal development. For example, I’ve argued that a book like Atomic Habits could be much more powerful if the authored experience were extended over time, to help readers integrate its ideas into their lives. But my instinct is that taking such a challenge seriously would mean developing a very different medium. To give another example, studying the Buddhist dharma does involve internalizing a fair amount of precise knowledge. Maybe the mnemonic medium could help with that. But retrieval practice for the eightfold path and the four noble truths is probably not the most important opportunity to augment such books.

————————

Part of me is worried that I’m overthinking this and letting it stop me from building momentum within some “good-enough” context. Another part of me is worried that I’m not worrying about this nearly enough—that it’s actually by far the most important problem facing my work, and it should be the exclusive focus of my attention until it’s resolved.

I’ve had enough experience with the problems of poor research–context fit in my work at Khan Academy that I’m inclined to believe good fit is an essential condition to good work. It’s subtle. It doesn’t show up as an obvious blocker like missing equipment or skills. But if you aspire to augment other people by building systems for them, your domain of insight is substantially determined by their domain of use.


Crowdfunded research vs. the NSF CAREER grant; open-sourcing Orbit; new technical collaborators

Transcript:

Hello everyone, and happy May. I think celebrating is always more fun in video format, so I'm making this impromptu video today to celebrate a couple of exciting pieces of news with you all.

The first is that the Patreon community that you all are a part of, or perhaps you're visiting today, has hit an important milestone. I wanted to share that and some thoughts about what it might mean for crowdfunding research in general.

The second is that I'm open-sourcing Orbit today, and there’s some interesting discussion to be had about funding models, research, and that decision.

And the third is that I now have a modest budget which I'm hoping to use to involve some more people in my work.

Crowdfunding 2/3 of an NSF CAREER grant

First: this funding milestone. Now, if you were a new tenure-track professor at an American research university, one of the main grants that you would be seeking is this grant the NSF provides called the CAREER grant. Now, it's not a sure thing—funding rates are about 15% to 25% of applicants—but it's meant to support early-career researchers for about five years, at a relatively modest rate. They may need to supplement it with some other funding, but it’s enough that they can get their research going in a meaningful way. If you're at a top-tier research university, you'll see pretty much all of the faculty in the sciences having one of these grants.

While it's a relatively modest grant, it is our existing institutions’ most common entry-level grant for new investigators. And so it's an interesting thing to compare against, when talking about alternative funding models.

So the exciting milestone for this community is that we're now crowdfunding at a level about two thirds of one of these NSF CAREER grants—which is really meaningful! We're starting to substitute for this existing institution’s job in a real way. But of course there are some important differences between these grants in crowdfunding land and the NSF-provided CAREER grant.

I thought it would be interesting to talk about those differences.

Now, one of the most important differences is really a weakness in what I've been doing so far. These CAREER grants for new investigators are meant to include a really substantial section about your research plan, but also a really substantial education plan, and a plan for how you are going to integrate education into your research. That could mean a variety of things. It could mean mentoring post-doctoral scholars. It could just mean undergraduate education. But ideally what these grants are trying to support is a synergy between research—developing new knowledge on the one hand—and education—supporting the future generations of knowledge creators on the other hand. Part of the point of the grant is to subsidize those future researchers that you're going to be training. I don't really have any of that as part of my work right now, other than the lightweight community engagement that I do on Twitter.

It's really interesting to me to think about what it would mean to try to ramp up that component. One of the things that concerns me is that it's difficult to engage in education in a field which has such poorly defined methodology and practice. It's not like there's a textbook or a curriculum that I could refer people to, or really a strong preexisting tradition. Obviously there's a lot of prior art in human-computer interaction and in cognitive science for the kind of work that I do. But this inventing novel cognitive-augmenting interfaces thing? We haven't really figured out a great theory or framework for how to do that.

So I've been wary of engaging in any kind of large-scale education activities. But I do want to try to ramp that up over the coming years. So I'm investing a fair amount of effort into trying to capture what it is that I'm doing, as I'm doing it. I’m hoping to synthesize that into something that I might actually be able to use, both to mentor other people who might want to research with me, and also maybe to publish something which might help people getting into the space. But this is a spot where I would not qualify for one of these grants with what I'm doing. You can also see this as an advantage: you can say, well, these education components in new fields are something that just slows down the research, so I have more of a position as a “research scientist,” where I just get to focus on that. I don't really have to worry about teaching and service. There are goods and bads associated with that.

And of course, hiding in all of this is the fact that graduate students and post-doctoral scholars are a source of labor. I mean that not in the code-implementing way (although we'll talk about that a little later) but in the sense that there are avenues that I'd like to explore that I just don't have the capacity to explore right now. And it would be nice to have other people doing that with me.

But I also feel there's a moral challenge to taking on students for me in this field. If I were a professor at a university in an established field, and you came to me as a graduate student, then there's this implicit deal. You work with me for a few years, and after a few years, you are going to step out as an independent investigator of your own and establish your own agenda. You’re hopefully going to be able to do that by, say, getting a faculty slot at some other university. It's a pipeline. And of course you're not guaranteed a faculty slot. In fact, there are far too many graduate students for faculty slots, but that's the understanding when you’re taking on this kind of apprentice position. You're rolling the dice and saying: I'm competing for one of these faculty slots that are available at various institutions, and funding slots for those positions, which are provided by grant makers.

Now the moral issue is: that pipeline, that path, those institutions, and those funding sources don't really exist in this space. And so if I'm taking on someone who's excited about doing independent research of their own, they can work with me for a while and that might be helpful to both of us, and maybe push my ideas forward, and maybe help them develop their own practice. But there's really no end state where they get a shot at doing it on their own. They'd have to create that for themselves. And arguably, they'd be put in a pretty bad position to create it by working with me as an apprentice for a few years because they're not going to be amassing capital that they could use to support themselves. Maybe they could have been developing relationships with funders or with an audience or with grant makers, but they're going to have to figure that stuff out on their own.

This lack of a next step for a student is a problem. It's both a practical problem and a moral problem. I feel bad for taking someone on in this way when there's not clearly a place for them to go next. So that's one key difference between these funding sources.

Another interesting difference between an NSF CAREER grant and the Patreon crowdfunding model is that the latter is a highly continuous, incremental process.

If I'm applying for a CAREER grant, that's kind of like a “big bang” process. I'm going to spend a hundred-plus hours on the application for that. I'm going to submit this thing and there'll be a long process, which as many as six months later will maybe give me a decision. But that decision, if it's positive, will support me for five years. And so for five years, I basically just don't really have to worry about that source of funding. I guess I have to be worried about my next grant, you know, maybe a year or two before the five years have ended, but it's this very discrete funding model.

The Patreon model, on the other hand, has this really interesting property that it's continuous and incremental and ongoing. This is both good and bad. The good part is that you don't need this really well thought through plan in order to get started and get some funding. To get an NSF career grant, you're supposed to have a pretty clear picture of how you're going to carry out this research and what you hope to achieve with it, which arguably limits the types of research you can do, or requires that you've already done a bunch of the work upfront. In fact, it's pretty common for junior faculty to apply for this NSF career grant in their second or their third year of being faculty because of that. So it's kind of a shame. The discrete nature of the NSF career grant seems to discourage—or rather it's not meant for funding highly exploratory, speculative research that the investigator can’t articulate very clearly, or where there's going to be a lot of bricolage involved.

This incremental, continuous funding is also an opportunity for people who don't yet have much standing, or even potentially much skill, to incrementally build up their funding source. Something I'm excited about is the idea that maybe you have a day job, but in your nights and weekends, you start exploring some projects that you find interesting. And maybe you can build up a small crowdfunding community around that and start earning enough money from that to make you feel like those nights and weekends are like really well invested. And so you kind of double down on it, and maybe eventually you get to this point where you can leave your day job. Now, obviously there's a big jump there, but I think that's an exciting property of the crowdfunding versus the grant model.

Another interesting thing about the continuous, incremental nature of the crowdfunding model is that it doesn't expire, per se. The NSF CAREER grant is for five years, and it's granted to about 300 researchers per year, but the Patreon funding is potentially indefinite, as long as you keep doing work that people want to support. And that's both good and bad.

It's good in that you don't have to completely change up your funding model and have this all-or-nothing switch in year four of your work. But it's bad insofar as there's this feeling that you've got to keep producing, or maybe it's not as stable or secure-feeling as having all of these funds in a guaranteed way, as with the CAREER grant.

The last difference that I wanted to discuss between the NSF CAREER grant and the crowdfunding approach for research is this question of who gets to decide what to fund.

The answers are really different in the two spaces. For an NSF CAREER grant, the most important part of the decision process is what's called a merit review, wherein your proposed research is reviewed by about three peers or informed experts in the space that you propose to investigate. They will issue written comments and scores and have some discussions with coordinating officers who will then decide what to fund. It's a relatively small number of high-expertise people, and that has advantages and disadvantages. So you don't have to convince that many people, on the one hand. But there are also a lot of problems with peer review. Perhaps these are people who are established in your field, and you're trying to take your field in a weird direction, or maybe there isn't really a field around the work you're doing. There's some concern that peer review tends to perpetuate stasis, or more conservative ideas. I think that's probably at least partially true. Certainly in my adjacent field, human-computer interaction, peer review seems to promote a set of fairly conservative values and processes that would really impede my work, I think.

In the crowdfunding model, on the other hand, you have lay-people or interested fans deciding that work is interesting, and you have to persuade hundreds of them, instead of just a couple. That seems harder. But then, you don't have to persuade them very hard. Merit review is kind of a high-stakes thing. I think these reviewers feel like they're setting standards for a very important institutional body, so they're going to be kind of defensive about what qualifies and what's worthy. But a patron deciding to toss you five bucks, even five bucks a month? It's just not that high-stakes, by comparison. So while you have to convince a lot more people, you maybe don't have to convince them very hard. And that's interesting.

It's also interesting that they are much less likely to be, say, experts or peers or something like that. Now, obviously I do have a lot of funders who are experts and peers, and I'm grateful for you. But a lot of them are just random internet people who find the work interesting. Which is great! This is good in that it means if you have this work that doesn't really fit in, but other weirdos (who maybe aren't high-status or high-power) think it's interesting, then you can maybe cobble together some funding through crowdsourcing that might be difficult to achieve through traditional channels.

But a challenging part of this is: to what extent can the crowd actually evaluate research? In my case, I worry that, if I were investigating something that was like equally important, but less obviously legible to casual observers, that I would have a lot more trouble getting funded. And that's a real concern. I worry that this general effect might make crowdfunded research more boring. Or maybe more driven by fashion or mass market appeal.

Anyway, long story short, I'm really excited that this crowdfunding community has managed to go much of the way to matching our main government funding structure for new faculty in the sciences. I mean, that's really, really remarkable. I definitely would not have expected that crowd-funded research could do that two years ago.

Hopefully, it's something which can generalize to more than just me.

Open-sourcing Orbit

So all this leads us to the second topic I wanted to talk about, which is that I’m open sourcing Orbit, which is the platform I’ve been developing for the mnemonic medium, and programmable attention in general.

I'm doing that because we have reached this funding milestone.

The path that I've been on is—when I did not feel safe in my source of financial support, I felt like I had to leave my options open. Like I've spent a year working on the software, now jeez, am I going to have to like commercialize this? Am I going to have to raise VC money around this?

These are not things that I wanted to do, but I felt, well, unsafe. And so I felt like I had to leave that option open. That's why I had not open-sourced Orbit up until this point. Now I feel sufficiently safe that I am past that point.

But it's probably important to discuss: why open source this kind of thing? Something like Orbit, which is about knowledge management about your mind—I think it really wants to be open-source. There's a practical perspective on this, which is that the whole conceit is you're trying to kind of bring in these micro tasks, these pieces of knowledge from all kinds of different places. That feels, if not open-source, like it wants to be open formats, open protocol, things like this.

And then also just ideologically, it's very intimate. I think in terms of trust, I don't like to spend a lot of time putting my mind into a system that could potentially just be taken away someday. Or that has the smell of lock-in. Or that feels like some weird format that I'm never gonna be able to parse. I'm hoping that by open-sourcing, I’ll create some trust. Hopefully it'll engender some interesting mashups and experimentation.

Stepping back, I view my role as provisioning public goods. And open source software is generally in software land a better public good than closed source software. Now, there are types of software which really benefit from being commercial. In fact, the public benefits from those pieces of software being commercialized. Stripe, for instance, is an important piece of infrastructure. You could even argue that it’s like a public good. But it needs like a huge number of people maintaining it and all these operational people and there's like partnerships and stuff. All of that needs like a lot of resources to keep it running. But something small like this? I'm hopeful—though these words may bite me—that I can keep this service running with relatively little effort. I feel very anxious even just saying that. We'll see.

It is possible that long-term, the Orbit data formats and the application, and even the server (if you want to self-host), are all open source, but maybe running the service is so taxing eventually that some fees are necessary in order to make sure I can hire someone to be on PagerDuty or whatever.

I’m not really thinking about that right now. But it does come into play in the licensing model that I chose, which is kind of interesting. I haven't seen many projects do this, though a number have done things kind of like this. So what I'm doing, licensing-wise, is that all of the libraries are licensed under the permissive Apache license. And then the main application and the main backend server are dual licensed under AGPL (which is a strong copyleft license, and it’s viral over the network), as well as the Business Source License. It’s a relatively new software license that allows you to do whatever you want for non-production purposes. And after some number of years (three years in Orbit’s case), the code converts to the permissive Apache license.

This licensing scheme is meant to deter cheap copycat commercializations. Khan Academy had this problem where when it was open-sourced, a bunch of people downloaded the app, changed the name, put it on the app store, and charged $10. And at Khan Academy, we’d get these angry letters. It's really bad. If they're using your name, you can yell at them for trademark purposes, but sometimes they'd actually do it in a way that was totally compliant. And it just felt really bad. I don't want that. But then also, if I do end up needing to someday charge for the service model, I'd like to reserve some asymmetric power to capture some of that value as a payment for my upfront development costs. But like I said, that's not something that I'm thinking about now. It's also not something that I want to do. I’m not interested in commercializing this. It sounds boring, basically.

Along with the open source come some interesting dangers. I have run, or been a large part of, two large open-source projects before. Both of them were very taxing experiences. And I do not seek to repeat that experience!

In recent years, platforms like GitHub have kind of redefined people's norm for open-source to be a project which is community run. And the community decides on the roadmap and everybody submits issues and anybody can propose features and so on. That's fine and good for a lot of projects. But it would tax me beyond my abilities for Orbit. So we're going to have something… not like that. We’ll have to articulate what it will be, but essentially the idea is that at least for the near future, this thing is primarily a vehicle for the research that I'm doing, and perhaps that collaborators will be doing with me.

So I'm going to try to minimize extensive open-source community interactions that are not in direct service of those research goals. Probably that will open up over time. There's more information in the repository you can read about this.

That’s a lot of cynicism or negativity. But I do believe in, and have experienced, the positive side of open-source. Where generous contributors have delivered a lot of value, both to me and to the community, by coming in and implementing a feature that needed to get done, but that I didn't want to do. Or by identifying some horrible bug that no one knew existed but that I really cared about.

I look forward to those kinds of contributions and I really am excited to be open-sourcing the project.

In the coming weeks and months, I will probably launch some kind of mailing list for people who are interested in participating in, or following along with development decisions, roadmaps, things like this. But it's too early for like a user's list. Most of the things that people tell me that they want, or that are bad? I know! They bother me a lot too. I'm mostly not looking for that kind of engagement right now.

More technical collaborators

The final thing that I wanted to talk about is experimenting with expanding the set of people working on these tools. As we start approaching the funding level of an NSF CAREER grant, my expenses are increasingly taken care of, and maybe I can even save a little bit for a rainy day, which is nice. It makes me less nervous.

At this point, or relatively soon, marginal funds start being not for my pocket or for my mortgage, but rather for getting poured back into the project, which will almost invariably take the form of paying other people for their time working on it.

Ideally I'd love to be able to fund both technical staff who can really help me out by implementing important pieces of infrastructure and improving on my very rapidly written research-quality code, as well as research staff who can meaningfully investigate some of the open questions and design challenges that seem to bear. And actually, if the crowdfunding model does continue to grow at its current rate, it looks like by the end of next year, I should—maybe—be able to afford to pay an intern or two, or a contractor for half of the year who's relatively affordable. Something at that level. That's pretty exciting! If we do actually continue on pace to that, it means, maybe in a year or a year and a half, that this is a model that can support not just me, but potentially others, too.

This mirrors what the NSF CAREER grant is meant to do as well. It's something that is supposed to also potentially support at least part of maybe a graduate student's stipend, or possibly stipends for undergraduates who are assisting with research. This kind of thing.

But that's a little ways away: the end of next year. And one thing that I've talked about here before is that there's this substantial challenge in switching back and forth between “research mind” and “implementer’s mind”. I find that even just trying to do the two in the same week is very difficult. So the model that's worked best for me is spending months at a time in one of those mindsets and then switching back and forth between them.

So, for two-thirds of 2020, I was in “building things mind” with Orbit, and then actually for much of this year, I've been back in “research mind.” You've seen a little bit of that output, but more to come from that period. And now I'm starting to switch back to the “implementing mind,” but all of this is frustrating.

When I've been in research mind for a while, as I have been now, I'll feel like: “Oh my gosh, there's all these things that I know would make this better. And it would let me answer these questions.” But I can't. No progress has been made, and months have gone by, so it feels like the project is kind of staying still. And then when I'm spending a lot of time implementing, like I was last year, I feel like there are these dozen research questions that were articulated at the start of the year, and I've made no progress on them, and I'm not actually being a researcher. I'm just being an engineer. And that feels bad too.

Ideally, it would be nice to have some technical help before the end of next year. So to that end, a couple of exciting notes.

There've been a few volunteers who've reached out to collaborate with me on Orbit, maybe on their nights and weekends. And one of them has actually made his first commit to the repository. I'm very grateful for Ozzie’s help.

And the other exciting news there is that a small group of generous donors have made some one-off gifts to allow me to this year have a modest budget to hire a technical contractor. That's very exciting—to maybe have a little bit of professional help!

I've started the process of trying to find the right person for that. Ideally they are able to be somewhat technically independent, and take on moderately-sized infrastructure projects without a ton of oversight, which is a very difficult property to achieve simultaneous with my very modest budget for contractors at this time. I think the best candidates will be people who are already excited about the research, excited about contributing to open-source software—and are perhaps willing to accept more modest compensation in conjunction with those things. And perhaps in conjunction with getting to run some of their own ideas about related projects by me, and possibly get some mentorship there. We can do a light “barter” kind of thing.

So like I said, I’ve begun conversations with a number of people about that. But I am still on the lookout for a contract hire. And so if what I've just described—that set of impossible conditions—sounds like you or someone you know, please do send them my way.

————————

So to wrap things up, I just want to express a lot of gratitude. Really, this is a lengthy video about this bizarre funding model for research that I didn't think was possible. And it really is only possible because of you all. It really is a remarkable thing—you're doing something quite unusual here. Time will tell whether this is generalizable, or even whether it's sustainable for me personally. But I’m really grateful to be exploring this with you all.

[Audio version] Crowdfunded research vs. the NSF CAREER grant; open-sourcing Orbit; new technical collaborators

I'm terribly sorry for all the duplicate emails: Descript garbled my exports. Still learning about these "content creator" tools… anyway, here's a fixed version!

And a transcript:

Hello everyone, and happy May. I think celebrating is always more fun in video format, so I'm making this impromptu video today to celebrate a couple of exciting pieces of news with you all.

The first is that the Patreon community that you all are a part of, or perhaps you're visiting today, has hit an important milestone. I wanted to share that and some thoughts about what it might mean for crowdfunding research in general.

The second is that I'm open-sourcing Orbit today, and there’s some interesting discussion to be had about funding models, research, and that decision.

And the third is that I now have a modest budget which I'm hoping to use to involve some more people in my work.

Crowdfunding 2/3 of an NSF CAREER grant

First: this funding milestone. Now, if you were a new tenure-track professor at an American research university, one of the main early-career grants that you would be seeking is this grant the NSF provides called the CAREER grant. Now, it's not a sure thing—funding rates are about 15% to 25% of applicants—but it's meant to support early-career researchers for about five years, at a relatively modest rate. They may need to supplement it with some other funding, but it’s enough that they can get their research going in a meaningful way. If you're at a top-tier research university, you'll see pretty much all of the faculty in the sciences having one of these grants.

While it's a relatively modest grant, it is our existing institutions’ most common entry-level grant for new investigators. And so it's an interesting thing to compare against, when talking about alternative funding models.

So the exciting milestone for this community is that we're now crowdfunding at a level about two thirds of one of these NSF CAREER grants—which is really meaningful! We're starting to come up on substituting for this existing institution’s job in a real way. But of course there are some important differences between these grants in crowdfunding land and the NSF-provided CAREER grant.

I thought it would be interesting to talk about those differences.

Now, one of the most important differences is really a weakness in what I've been doing so far. These career grants for new investigators are meant to include a really substantial section about your research plan, but also a really substantial education plan, and a plan for how you are going to integrate education into your research. That could mean a variety of things. It could mean mentoring post-doctoral scholars. It could just mean undergraduate education. But ideally what these grants are trying to support is a synergy between research—developing new knowledge on the one hand—and education—supporting the future generations of knowledge creators on the other hand. Part of the point of the grant is to subsidize those future researchers that you're going to be training. I don't have really any of that as part of my work right now, other than the lightweight community engagement that I do on Twitter.

It's really interesting to me to think about what it would mean to try to ramp up that component. One of the things that concerns me is that it's difficult to engage in education in a field which has such poorly defined methodology and practice. It's not like there's a textbook or a curriculum that I could refer people to, or really a strong preexisting tradition. Obviously there's a lot of prior art in human-computer interaction and in cognitive science for the kind of work that I do. But this inventing novel cognitive-augmenting interfaces thing? We haven't really figured out a great theory or framework for how to do that.

So I've been wary of engaging in any kind of large-scale education activities. But I do want to try to ramp that up over the coming years. So I'm investing a fair amount of effort into trying to capture what it is that I'm doing, as I'm doing it. I’m hoping to synthesize that into something that I might actually be able to use, both to mentor other people who might want to research with me, and also maybe to publish something which might help people getting into the space. But this is a spot where I would not qualify for one of these grants with what I'm doing. You can also see this as an advantage: you can say, well, these education components in new fields are something that just slows down the research, so I have more of a position as a “research scientist,” where I just get to focus on that. I don't really have to worry about teaching and service. There are goods and bads associated with that.

And of course, hiding in all of this is the fact that graduate students and post-doctoral scholars are a source of labor. I mean that not in the code-implementing way (although we'll talk about that a little later) but in the sense that there are avenues that I'd like to explore that I just don't have the capacity to explore right now. And it would be nice to have other people doing that with me.

But I also feel there's a moral challenge to taking on students for me in this field. If I were a professor at a university in an established field, and you came to me as a graduate student, then there's this implicit deal. You work with me for a few years, and after a few years, you are going to step out as an independent investigator of your own and establish your own agenda. You’re hopefully going to be able to do that by, say, getting a faculty slot at some other university. It's a pipeline. And of course you're not guaranteed a faculty slot. In fact, there are far too many graduate students for faculty slots, but that's the understanding when you’re taking on this kind of apprentice position. You're rolling the dice and saying: I'm competing for one of these faculty slots that are available at various institutions, and funding slots for those positions, which are provided by grant makers.

Now the moral issue is: that pipeline, that path, those institutions, and those funding sources don't really exist in this space. And so if I'm taking on someone who's excited about doing independent research of their own, they can work with me for a while and that might be helpful to both of us, and maybe push my ideas forward, and maybe help them develop their own practice. But there's really no end state where they get a shot at doing it on their own. They'd have to create that for themselves. And arguably, they'd be put in a pretty bad position to create it by working with me as an apprentice for a few years because they're not going to be amassing capital that they could use to support themselves. Maybe they could have been developing relationships with funders or with an audience or with grant makers, but they're going to have to figure that stuff out on their own.

This lack of a next step for a student is a problem. It's both a practical problem and a moral problem. I feel bad for taking someone on in this way when there's not clearly a place for them to go next. So that's one key difference between these funding sources.

Another interesting difference between an NSF CAREER grant and the Patreon crowdfunding model is that the latter is a highly continuous, incremental process.

If I'm applying for a CAREER grant, that's kind of like a “big bang” process. I'm going to spend a hundred-plus hours on the application for that. I'm going to submit this thing and there'll be a long process, which as many as six months later will maybe give me a decision. But that decision, if it's positive, will support me for five years. And so for five years, I basically just don't really have to worry about that source of funding. I guess I have to be worried about my next grant, you know, maybe a year or two before the five years have ended, but it's this very discrete funding model.

The Patreon model, on the other hand, has this really interesting property that it's continuous and incremental and ongoing. This is both good and bad. The good part is that you don't need this really well thought through plan in order to get started and get some funding. To get an NSF career grant, you're supposed to have a pretty clear picture of how you're going to carry out this research and what you hope to achieve with it, which arguably limits the types of research you can do, or requires that you've already done a bunch of the work upfront. In fact, it's pretty common for junior faculty to apply for this NSF career grant in their second or their third year of being faculty because of that. So it's kind of a shame. The discrete nature of the NSF career grant seems to discourage—or rather it's not meant for funding highly exploratory, speculative research that the investigator can’t articulate very clearly, or where there's going to be a lot of bricolage involved.

This incremental, continuous funding is also an opportunity for people who don't yet have much standing, or even potentially much skill, to incrementally build up their funding source. Something I'm excited about is the idea that maybe you have a day job, but in your nights and weekends, you start exploring some projects that you find interesting. And maybe you can build up a small crowdfunding community around that and start earning enough money from that to make you feel like those nights and weekends are like really well invested. And so you kind of double down on it, and maybe eventually you get to this point where you can leave your day job. Now, obviously there's a big jump there, but I think that's an exciting property of the crowdfunding versus the grant model.

Another interesting thing about the continuous, incremental nature of the crowdfunding model is that it doesn't expire, per se. The NSF CAREER grant is for five years, and it's granted to about 300 researchers per year, but the Patreon funding is potentially indefinite, as long as you keep doing work that people want to support. And that's both good and bad.

It's good in that you don't have to completely change up your funding model and have this all-or-nothing switch in year four of your work. But it's bad insofar as there's this feeling that you've got to keep producing, or maybe it's not as stable or secure-feeling as having all of these funds in a guaranteed way, as with the CAREER grant.

The last difference that I wanted to discuss between the NSF CAREER grant and the crowdfunding approach for research is this question of who gets to decide what to fund.

The answers are really different in the two spaces. For an NSF CAREER grant, the most important part of the decision process is what's called a merit review, wherein your proposed research is reviewed by about three peers or informed experts in the space that you propose to investigate. They will issue written comments and scores and have some discussions with coordinating officers who will then decide what to fund. It's a relatively small number of high-expertise people, and that has advantages and disadvantages. So you don't have to convince that many people, on the one hand. But there are also a lot of problems with peer review. Perhaps these are people who are established in your field, and you're trying to take your field in a weird direction, or maybe there isn't really a field around the work you're doing. There's some concern that peer review tends to perpetuate stasis, or more conservative ideas. I think that's probably at least partially true. Certainly in my adjacent field, human-computer interaction, peer review seems to promote a set of fairly conservative values and processes that would really impede my work, I think.

In the crowdfunding model, on the other hand, you have lay-people or interested fans deciding that work is interesting, and you have to persuade hundreds of them, instead of just a couple. That seems harder. But then, you don't have to persuade them very hard. Merit review is kind of a high stakes thing. I think these reviewers feel like they're setting standards for a very important institutional body, so they're going to be kind of defensive about what qualifies and what's worthy. But a patron deciding to toss you five bucks, even five bucks a month? It's just not that high stakes, by comparison. So while you have to convince a lot more people, you maybe don't have to convince them very hard. And that's interesting.

It's also interesting that they are much less likely to be, say, experts or peers or something like that. Now, obviously I do have a lot of funders who are experts and peers, and I'm grateful for you. Uh, but a lot of them are just like random internet people who find the work interesting. Which is great! This is good in that it means if you have this work that doesn't really like fit in, but other weirdos (who maybe aren't high-status or high-power) think it's interesting, then you can maybe cobble together some funding through crowd sourcing that might be difficult to achieve through traditional channels.

But a challenging part of this is: to what extent can the crowd actually evaluate research? In my case, I worry that, if I were investigating something that was like equally important, but less obviously legible to casual observers, that I would have a lot more trouble getting funded. And that's a real concern. I worry that this general effect might make crowdfunded research more boring. Or maybe more driven by fashion or mass market appeal.

Anyway, long story short, I'm really excited that this crowdfunding community has managed to go much of the way to matching our main government funding structure for new faculty in the sciences. I mean, that's really, really remarkable. I definitely would not have expected that crowd-funded research could do that two years ago.

Hopefully, it's something which can generalize to more than just me.

Open-sourcing Orbit

So all this leads us to the second topic I wanted to talk about, which is that I’m open sourcing Orbit, which is the platform I’ve been developing for the mnemonic medium, and programmable attention in general.

I'm doing that because we have reached this funding milestone.

The path that I've been on is—when I did not feel safe in my source of financial support, I felt like I had to leave my options open. Like I've spent a year working on the software, now jeez, am I going to have to like commercialize this? Am I going to have to raise VC money around this?

These are not things that I wanted to do, but I felt, well, unsafe. And so I felt like I had to leave that option open. That's why I had not open-sourced Orbit up until this point. Now I feel sufficiently safe that I am past that point.

But it's probably important to discuss: why open source this kind of thing? Something like Orbit, which is about knowledge management about your mind—I think it really wants to be open-source. There's a practical perspective on this, which is that the whole conceit is you're trying to kind of bring in these micro tasks, these pieces of knowledge from all kinds of different places. That feels, if not open-source, like it wants to be open formats, open protocol, things like this.

And then also just ideologically, it's very intimate. I think in terms of trust, I don't like to spend a lot of time putting my mind into a system that could potentially just be taken away someday. Or that has the smell of lock-in. Or that feels like some weird format that I'm never gonna be able to parse. I'm hoping that by open-sourcing, I’ll create some trust. Hopefully it'll engender some interesting mashups and experimentation.

Stepping back, I view my role as provisioning public goods. And open source software is generally in software land a better public good than closed source software. Now, there are types of software which really benefit from being commercial. In fact, the public benefits from those pieces of software being commercialized. Stripe, for instance, is an important piece of infrastructure. You could even argue that it’s like a public good. But it needs like a huge number of people maintaining it and all these operational people and there's like partnerships and stuff. All of that needs like a lot of resources to keep it running. But something small like this? I'm hopeful—though these words may bite me—that I can keep this service running with relatively little effort. I feel very anxious even just saying that. We'll see.

It is possible that long-term, the Orbit data formats, the application, and even the server (if you want to self-host) are all open source, but running the service eventually becomes so taxing that some fees are necessary in order to make sure I can hire someone to be on PagerDuty or whatever.

I’m not really thinking about that right now. But it does come into play in the licensing model that I chose, which is kind of interesting. I haven't seen many projects do this, though a number have done things kind of like this. So what I'm doing, licensing-wise, is that all of the libraries are licensed under the permissive Apache license. And then the main application and the main backend server are dual licensed under AGPL (which is a strong copyleft license, and it’s viral over the network), as well as the Business Source License. It’s a relatively new software license that allows you to do whatever you want for non-production purposes. And after some number of years (three years in Orbit’s case), it converts to the permissive Apache license.

This licensing scheme is meant to deter cheap copycat commercializations. Khan Academy had this problem where when it was open-sourced, a bunch of people downloaded the app, changed the name, put it on the app store, and charged $10. And at Khan Academy, we’d get these angry letters. It's really bad. If they're using your name, you can yell at them for trademark purposes, but sometimes they'd actually do it in a way that was totally compliant. And it just felt really bad. I don't want that. But then also, if I do end up needing to someday charge for the service model, I'd like to reserve some asymmetric power to capture some of that value as a payment for my upfront development costs. But like I said, that's not something that I'm thinking about now. It's also not something that I want to do. I’m not interested in commercializing this. It sounds boring, basically.

Along with the open source come some interesting dangers. I have run, or been a large part of, two large open-source projects before. Both of them were very taxing experiences. And I do not seek to repeat that experience!

In recent years, platforms like GitHub have kind of redefined people's norm for open-source to be a project which is community run. And the community decides on the roadmap and everybody submits issues and anybody can propose features and so on. That's fine and good for a lot of projects. But it would tax me beyond my abilities for Orbit. So we're going to have something… not like that. We’ll have to articulate what it will be, but essentially the idea is that at least for the near future, this thing is primarily a vehicle for the research that I'm doing, and perhaps that collaborators will be doing with me.

So I'm going to try to minimize extensive open-source community interactions that are not in direct service of those research goals. Probably that will open up over time. There's more information in the repository you can read about this.

That’s a lot of cynicism or negativity. But I do believe in, and have experienced, the positive side of open-source. Where generous contributors have delivered a lot of value, both to me and to the community, by coming in and implementing a feature that needed to get done, but that I didn't want to do. Or by identifying some horrible bug that no one knew existed but that I really cared about.

I look forward to those kinds of contributions and I really am excited to be open-sourcing the project.

In the coming weeks and months, I will probably launch some kind of mailing list for people who are interested in participating in, or following along with development decisions, roadmaps, things like this. But it's too early for like a user's list. Most of the things that people tell me that they want, or that are bad? I know! They bother me a lot too. I'm mostly not looking for that kind of engagement right now.

More technical collaborators

The final thing that I wanted to talk about is experimenting with expanding the set of people working on these tools. As we start approaching the funding level of an NSF CAREER grant, my expenses are increasingly taken care of, and maybe I can even save a little bit for a rainy day, which is nice. It makes me less nervous.

At this point, or relatively soon, marginal funds start being not for my pocket or for my mortgage, but rather for getting poured back into the project, which will almost invariably take the form of paying other people for their time working on it.

Ideally I'd love to be able to fund both technical staff who can really help me out by implementing important pieces of infrastructure and improving on my very rapidly written research-quality code, as well as research staff who can meaningfully investigate some of the open questions and design challenges that seem to bear. And actually, if the crowdfunding model does continue to grow at its current rate, it looks like by the end of next year, I should (maybe) be able to afford to pay an intern or two, or a contractor for half of the year who's relatively affordable. Something at that level. That's pretty exciting! If we do actually continue on pace to that, it means, maybe in a year or a year and a half, that this is a model that can support not just me, but potentially others, too.

This mirrors what the NSF CAREER grant is meant to do as well. It's something that is supposed to also potentially support at least part of maybe a graduate student's stipend, or possibly stipends for undergraduates who are assisting with research. This kind of thing.

But that's a little ways away: the end of next year. And one thing that I've talked about here before is that there's this substantial challenge in switching back and forth between “research mind” and “implementers’ mind”. I find that even just trying to do the two in the same week is very difficult. So the model that's worked best for me is spending months at a time in one of those mindsets and then switching back and forth between them.

So, for two-thirds of 2020, I was in “building things mind” with Orbit, and then actually for much of this year, I've been back in “research mind.” You've seen a little bit of that output, but more to come from that period. And now I'm starting to switch back to the “implementing mind,” but all of this is frustrating.

When I've been in research mind for a while, as I have been now, I'll feel like: “Oh my gosh, there's all these things that I know would make this better. And it would let me answer these questions.” But I can't. No progress has been made, and months have gone by, so it feels like the project is kind of staying still. And then when I'm spending a lot of time implementing, like I was last year, I feel like there are these dozen research questions that are articulated at the start of the year, and I've made no progress on them, and I'm not actually being a researcher. I'm just like being an engineer. And that feels bad too.

Ideally, it would be nice to have some technical help before the end of next year. So to that end, a couple of exciting notes.

There've been a few volunteers who've reached out to collaborate with me on Orbit, maybe on their nights and weekends. And one of them has actually made his first commit to the repository. I'm very grateful for Ozzie’s help.

And the other exciting news there is that a small group of generous donors have made some one-off gifts to allow me to have a modest budget this year to hire a technical contractor. That's very exciting: to maybe have a little bit of professional help!

I've started the process of trying to find the right person for that. Ideally they are able to be somewhat technically independent, and take on moderately-sized infrastructure projects without a ton of oversight, which is a very difficult property to achieve simultaneous with my very modest budget for contractors at this time. I think the best candidates will be people who are already excited about the research, excited about contributing to open-source software—and are perhaps willing to accept more modest compensation in conjunction with those things. And perhaps in conjunction with getting to run some of their own ideas about related projects by me, and possibly get some mentorship there. We can do a light “barter” kind of thing.

So like I said, I’ve begun conversations with a number of people about that. But I am still on the lookout for a contract hire. And so if what I've just described—that set of impossible conditions—sounds like you or someone you know, please do send them my way.

————————

So to wrap things up, I just want to express a lot of gratitude. Really, this is a lengthy video about this bizarre funding model for research that, uh, I didn't think was possible. And it really is only possible because of you all. It really is a remarkable thing—you're doing something quite unusual here. Time will tell whether this is generalizable, or even whether it's sustainable for me personally. But I’m really grateful to be exploring this with you all.

[Audio version] Too easy to be effortless

As an experiment, I've made an audio version of the most recent post, Too easy to be effortless. Is this useful? Would you value consuming the material I write here as a feed in your podcasting app? Email/comment if so!

It does consume some extra time, so it's probably not something I'll do unless it's something you all would find valuable.

(Thanks to Bryan Clark for suggesting that I try this!)

Too easy to be effortless

Now that a few Orbit experiments are in flight, I’ve spent much of the last month digging back into data from Quantum Country. I’m struck by a surprising problem: basically everyone remembers basically everything, basically all the time.

Feelings-driven optimization

How effortless can memory be?

At the limit, we can imagine automatically remembering everything we perceive. We might not want that—savants like Shereshevsky often report curse-like symptoms of their perfect memory. Perhaps we’d settle for the ability to remember or forget something as easily as moving a muscle. What would be true of such a world? Certainly schools would not exist as we know them, but what of workplaces and studios? What of relationships? Borges, Chiang, the Wachowskis, and other great science fiction authors have dramatized these implications, but I’m also interested in the mundane: shifts in the give-and-take of workplace collaborations; coincidences and contradictions suddenly more salient.

(Of course, effortlessness is just one of many useful lenses! A contrary lens points out that maybe effortfulness is exactly what you want from your interactions with memory. You want to constantly be questioning things you think you “know”; you want everything to stay molten so that you can form new connections and see things in new ways; etc etc…)

Even with today’s systems, memory is far from effortless. How close can we get? The usual approach is to treat this as an optimization problem, but I find it generative to recognize that effortlessness is a feeling. Powerful technologies feel like an extension of the body. The edges melt away; the space between intention and action closes. Strap a brick to your pencil, though, and it ceases to feel like part of your hand. Likewise, learning can seem effortless in an energetic discussion with friends, but in a boring study hall, the same ideas may demand more effort than you can muster.

This lens gives us a different way to think about how we might “optimize” tools for thought. What kinds of interactions create a sense of separation, of dutifulness, of boredom?

In any kind of computerized learning system (including spaced repetition systems), one reliable source of boredom is material which feels too easy. This material isn’t the good kind of effortless. Flipping through this stuff feels almost like speed-running a license agreement prompt in a software installer. “Yeah, yeah, I know, I know.” I don’t really have to think; I’m not really engaged; I resent being asked. Sometimes the problem is that I don’t actually care about the material, in which case I should really remove it (perhaps fuzzily). Quite often, though, I do really care about the material. I’d engage more seriously if it felt less trivial in that moment.

This observation devolves into a classic problem in learning technology: correctly estimating the state of a student’s knowledge to optimize a study plan. The difference is that if we hold onto our feelings-based lens, we don’t see optimization itself as the problem to be solved. Our central goal is a feeling of effortlessness. Model optimization is an instrumental lever for that feeling. But there are other levers. You can’t play The Witness without memorizing many complex rules, but you’ll do that naturally as you interact with the environment: memorization itself is not the effortful part.

Quantum Country’s over-easy effortfulness

Having paid this lofty penance, let’s turn our attention to the performance of an unusual memory augmentation system: Quantum Country.

Please note: this is an informal discussion of data from Quantum Country. The analysis is preliminary and shouldn’t be cited or excerpted in other work. I’m working with the garage door up here.

On the one hand, Quantum Country delivers on its promise to help people remember what they read. After the fifth repetition, most readers have been able to recall 95%+ of questions across intervals of more than a month. That’s pretty remarkable. In my past experiences reading textbooks, I’d be lucky to remember a fraction of the details after a month.

Another way to look at this is “maintenance cost.” To maintain the first essay’s 112 questions for the first year, the median reader performs 567 reviews, consuming ~1.5 hours. Readers report that the first essay takes 2-4 hours to read, so we can frame the first year’s reviews as a ~50% extra time cost these readers could choose to pay to durably remember all the key details from that essay. I expect the second year to have roughly half the time cost, but we don’t have the data for that yet.

The problem, I suppose, is that Quantum Country works “too well.” Basically everybody remembers basically everything basically all the time.

The trouble we’ll discuss begins at the start of what I call the “maintenance” phase. For a given reader and question, histories are generally clustered into two phases: an initial (usually short) “learning” phase, in which readers absorb the material enough to remember it across sessions; followed by a (much longer) “maintenance” phase, in which repetitions mostly serve to combat the erosion of forgetting. You can approximate the delineation pretty well by saying that people transition to the maintenance phase after their first successful repetition.

After the first successful repetition of a given question—once they’re in the “maintenance phase”—the median reader answers 95% of subsequent repetitions correctly. In fact, 82% of all first-year question histories contain zero forgotten answers after that point (which is indeed what you’d expect from the typical first-year repetition count given a binomial variable with p=0.95).

That’s a bit abstract. To make it more concrete: after their first successful repetition, the median reader forgets just 15 times out of 448 reviews over the following year, across the 112 questions in the first essay.
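
As a rough check on the binomial claim above, here is a minimal sketch. It assumes every maintenance review is an independent trial with the same success probability, and it gets a reviews-per-question figure by dividing 448 by 112. Both are simplifying assumptions, not properties of the underlying dataset.

```python
# Figures quoted in the text; the independence assumption is only illustrative.
REVIEWS = 448      # median reader's maintenance reviews in the first year
QUESTIONS = 112    # questions in the first essay
P_CORRECT = 0.95   # per-review success rate in the maintenance phase


def p_no_lapses(reviews_per_question: float, p_correct: float) -> float:
    """Probability that every review of a single question succeeds."""
    return p_correct ** reviews_per_question


reviews_per_question = REVIEWS / QUESTIONS  # 4.0
print(f"{p_no_lapses(reviews_per_question, P_CORRECT):.1%}")
```

With four reviews per question this comes out to about 81%, close to the 82% of question histories reported as containing no forgotten answers.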

A whole year of diligent reviewing and just 15 misses! 433 successful recollections! The problem here isn’t exactly one of efficiency. Talking to readers, plenty of them would be (and have been) happy to pay a 50% time cost to thoroughly internalize the material. It’s not that 448 is too many reviews, or that it takes too long. The problem is that it feels tedious, like wasted time, to review material that you already know perfectly well. And that’s mostly what people are doing.

But actually, the forgetting is even more skewed than I’ve let on. If those 15 misses were drawn with equal probability from all the questions, it might not feel so bad: any question might be the one you miss today! As it happens, though, half of all long-term lapses come from just 12% of questions. Emotionally speaking, those are the questions which generate “oh, no, that question again…”. By contrast, the median question produces only one lapse for every ten readers across the entire first year of the “maintenance phase.” For 95% of questions, the median reader never forgets in the first year of the maintenance phase. So most reviews probably feel tedious and unnecessary.

We might worry that perhaps everything’s fine for the median reader, but many less-capable readers are struggling. After all, questions are highly power-law distributed in the lapses they produce. But readers are not nearly so sharply distributed. Our 25th percentile reader forgets 35 times in 483 repetitions over the first year of maintenance. The 10th percentile reader forgets 59 times in 516 repetitions. And again, this forgetting is localized in a relatively small pool of questions. The vast majority of questions produce no forgetting, even for relatively less successful readers.

When forgetting does happen, it’s usually not that bad. One way to look at this is to ask how often questions are forgotten multiple times back to back, so that the reader fails to recall a prompt across an interval they could previously span. This happens almost never: on about 2% of first-year reader/question histories. So our “demonstrated retention” progress metric is a pretty good one. Once you’ve demonstrated a given interval of retention, you’re very unlikely to lose it if you keep reviewing. And if a lapse does occur, it has only a 7% chance of “backsliding” to the point that a reader can no longer span five days. As a reminder, Quantum Country roughly halves the review interval when a question is forgotten. Anki’s default behavior of resetting the interval to zero upon every lapse seems particularly inappropriate in our context given this data.
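
To make the contrast between the two lapse policies concrete, here is a small sketch. The function names are hypothetical, the five-day floor is an assumption borrowed from Quantum Country's starting interval, and "resetting to zero" is modeled as restarting at a one-day first interval. Neither function is the actual implementation of either system.

```python
def halve_on_lapse(interval_days: float, floor_days: float = 5.0) -> float:
    """Roughly halve the interval after a lapse, never dropping below the floor."""
    return max(floor_days, interval_days / 2)


def reset_on_lapse(interval_days: float, first_interval_days: float = 1.0) -> float:
    """Discard all demonstrated retention and restart at the first interval."""
    return first_interval_days


# Compare what a single lapse costs at a few different intervals.
for interval in (10, 30, 120):
    print(interval, halve_on_lapse(interval), reset_on_lapse(interval))
```

If a lapse rarely means the reader has lost more than one step of demonstrated retention, the halving policy keeps most of that progress, while the reset policy throws all of it away.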

The implication here is that we should probably be much more aggressive with our expanding review schedule. Yes, this would make the experience more efficient; but what I really care about is that it would probably make the experience feel much less tedious.

What should the schedule be, exactly? Many papers suggest dynamic and complex models for these schedules, and perhaps I’ll implement one at some point. An ideal schedule would weigh tedium-avoidance with other important feeling-variables: connectedness to the material, the frustration of forgetting the same thing repeatedly, predictability of session timing. In terms of low-hanging fruit, it’s amazing how far simple heuristics could go. For instance, when readers begin by successfully answering a question both while reading the essay and in their first review session, 96% of those histories include zero lapses in the next year. It’s probably safe to stretch them out a great deal.

Just by focusing on too-easy questions, it’s pretty easy to imagine cutting the number of repetitions necessary for the first year of maintenance down by half, or perhaps more. Halving the 448 maintenance reviews would cut the number of reviews in the first year from 567 down to 343, a 40% reduction. The marginal time cost for the first year of retention would drop from 50% to 30%.
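
The arithmetic behind that halving scenario can be sketched as follows. The one assumption added here is that review time scales linearly with the number of reviews; the other figures come from the text above.

```python
TOTAL_REVIEWS = 567        # median reader, first year, first essay
MAINTENANCE_REVIEWS = 448  # reviews after the first successful repetition
TIME_COST = 0.50           # first-year review time relative to reading time


def reviews_after_cut(total: int, maintenance: int, fraction_cut: float) -> float:
    """Total reviews left after removing a fraction of the maintenance reviews."""
    return total - maintenance * fraction_cut


remaining = reviews_after_cut(TOTAL_REVIEWS, MAINTENANCE_REVIEWS, 0.5)
reduction = 1 - remaining / TOTAL_REVIEWS
new_time_cost = TIME_COST * remaining / TOTAL_REVIEWS  # assumes linear scaling

print(remaining, f"{reduction:.0%}", f"{new_time_cost:.0%}")
```

Only the maintenance reviews are halved; the reviews from the initial learning phase are left untouched, which is why the overall reduction is about 40% and not 50%.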

The data I’ve presented don’t have much to say about the counterfactual. If the intervals had been twice what they are, would we see only a bit more forgetting, or would we see bedlam? I’ve been running controlled experiments along these lines, and they’ve been producing very interesting and confusing results… which will have to wait for another time.

Scheduling for the mnemonic medium versus existing SRS modalities

Almost all work around spaced repetition systems—both academic and commercial—has focused on definitions: vocabulary for language learners, terminology for medical students, people and events for history classes, etc. This kind of knowledge tends to be arbitrary and disconnected, and so I suspect it’s forgotten much more rapidly.

Quantum Country’s schedule is pretty aggressive. We start at a five-day interval and grow by 2-3x on each repetition. By default, Anki starts at a one-day interval and grows by 1.8x. And yet we’re still seeing very little forgetting. I don’t think the problem is that Anki’s wildly conservative: I think it’s that conceptual knowledge, introduced in a narrative arc and thoroughly connected to prior knowledge, has very different memory dynamics from vocabulary words. Scheduling for the mnemonic medium should probably look quite different from scheduling for traditional spaced repetition systems.
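
For a sense of how differently those two schedules play out, here is a rough sketch using the starting intervals and growth factors quoted above. It assumes every review is answered correctly, picks 2.5x as a midpoint for "2-3x", and ignores everything else either system does (ease adjustments, fuzzing, learning steps).

```python
def review_days(first_interval: float, growth: float, horizon_days: float = 365):
    """Days on which reviews fall within the horizon, if every answer is correct."""
    days = []
    day, interval = 0.0, float(first_interval)
    while day + interval <= horizon_days:
        day += interval
        days.append(round(day, 1))
        interval *= growth
    return days


# Figures quoted in the text; 2.5x is an assumed midpoint of "2-3x".
quantum_country = review_days(first_interval=5, growth=2.5)
anki_like = review_days(first_interval=1, growth=1.8)

print(len(quantum_country), quantum_country)
print(len(anki_like), anki_like)
```

Under these assumptions the first schedule asks for 5 reviews of a question in its first year and the second asks for 9, so the same low rate of forgetting is being achieved with far fewer repetitions.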

SuperMemo models something like the effect I’m describing with “item complexity,” but because each user makes their own databases, it must estimate each item’s complexity from just a few point samples. The mnemonic medium’s shared questions create an interesting opportunity: item complexities can be estimated by pooling many prior users’ attempts, and a new user’s pre-existing proficiency with the material can be estimated by comparing their in-essay performance to that of prior students. This type of approach has been used in a model for scheduling Spanish vocabulary practice, and I’m interested to explore how it might fare on more conceptual topics. One distinguishing challenge for mnemonic essays (unlike vocabulary lists) is that the questions are highly interdependent. Reviewing one question makes readers more likely to be able to answer various other related questions. So I’ll probably need to mix a model like the one I’ve described with something like deep knowledge tracing, which can account for inter-item interactions.
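
As a very rough illustration of the pooling idea, the sketch below estimates each question's recall rate from all prior readers' attempts and then compares a new reader's in-essay answers against that baseline. The function names, the data shape, and the smoothing pseudo-counts are all hypothetical. This is not the vocabulary model mentioned above, and it ignores the inter-question dependencies that something like deep knowledge tracing would handle.

```python
from collections import defaultdict


def pooled_recall_rates(attempts, prior_correct=1.0, prior_total=2.0):
    """Estimate each question's recall rate from all prior readers' attempts.

    attempts: iterable of (question_id, was_correct) pairs.
    The prior pseudo-counts (an assumed choice) keep rarely seen questions
    from getting extreme estimates.
    """
    correct = defaultdict(float)
    total = defaultdict(float)
    for question_id, was_correct in attempts:
        total[question_id] += 1
        correct[question_id] += 1 if was_correct else 0
    return {
        q: (correct[q] + prior_correct) / (total[q] + prior_total) for q in total
    }


def relative_proficiency(reader_answers, rates):
    """Reader's in-essay accuracy minus prior readers' average on the same questions."""
    known = [(q, ok) for q, ok in reader_answers if q in rates]
    if not known:
        return 0.0
    actual = sum(1 for _, ok in known if ok) / len(known)
    expected = sum(rates[q] for q, _ in known) / len(known)
    return actual - expected


# Toy data, invented purely for illustration.
prior_attempts = [("q1", True), ("q1", True), ("q1", False), ("q2", True)]
rates = pooled_recall_rates(prior_attempts)
print(rates)
print(relative_proficiency([("q1", True), ("q2", True)], rates))
```

A positive proficiency score suggests the reader already knows the material better than prior readers did, which is the kind of signal that could justify stretching their first intervals.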

I’m not yet sure how deep I want to go on such optimization. There are so many opportunities to explore in this space, and my hours are so few! In fact, there are many simple levers for a feeling of effortlessness which don’t involve actually reducing the number of repetitions. For example, Quantum Country readers felt reviews were much less burdensome when we “batched” them so that small review sessions on adjacent days were combined into a single full-length session.

In a future post, I’ll explore how multiple experiments are struggling to measure any appreciable forgetting-over-time at all on Quantum Country. Until next time, thank you as always for your support.

Ratcheting progress in tools for thought

There are some people trying to develop tools for thought, but there isn’t yet a meaningful field around tools for thought. The difference is that a field is about ratcheting: developing a growing shared corpus of general knowledge and methods which allow projects to meaningfully build on each other, across researchers and across years, on and on in an upward cycle. Individuals here and there have contributed powerful insights, certainly, but true success for this field would mean a new practice of human tool-making, and the creation of many tools which transform what people can think and do.

Michael Nielsen and I have suggested that the most powerful tools for thought express deep insights into the underlying subject matter. Creating them involves what we call “insight through making,” in which powerful subject-matter ideas enable new systems, and observations of those systems enable new insights, and so on. An idealized cycle of activity might involve something like these key steps:

  1. Identifying powerful insights about some subject domain or about cognition in general which might be fruitfully systematized
  2. Building systems which express those insights in their primitives
  3. Observing serious use of those systems in authentic contexts, and of your theoretical insights refracted through them
  4. Distilling generalizable insight from those observations which produces new understanding about the subject domain or cognition, and which permits new, better systems to be built
  5. Disseminating that insight so that others can build on it

This high-level view glosses over a great many details, and every one of these steps is a complex practice in which one can build skill over decades. But we can see these practices in the “golden era” design work of computer-powered tools for thought: Engelbart’s NLS, PARC’s Alto, Sutherland’s Sketchpad, and so on. Unfortunately, if we look at the contemporary proto-field, we’ll find that most people interested in tools for thought (myself included) are not reliably performing all these steps—which has left our struggling field without a functioning ratchet.

Common failure modes

One failure mode, particularly common in academia, comes from lacking a serious context of use. Often this just means that the observation step centers on misleading artificial environments. Sometimes this happens because researchers don't invest seriously enough in the system being built, which makes it difficult to distill powerful new insights from what they’ve made. But more subtly, without a serious context of use driving the research project, their initial subject-matter insights are likely to be limited or misdirected.

Startups and tech businesses are powerful venues for tool development. They’re generally not trying to push the field of tools for thought forward. But we might hope that they happen to push it forward anyway. Unfortunately, a few common patterns prevent most tech industry efforts from contributing to a ratcheting field.

Perhaps the most common pattern of all is that people in the tech industry focus mostly on building systems. Those systems are usually expressions of technological or market insights, rather than of fundamental insights about a subject domain or about cognition. That’s not a problem as far as the business or its users are concerned! But if their systems don’t reify new ideas, we can’t draw much field-level insight from observation.

Another common pattern for tech companies is that they’re founded on the premise of some powerful insights, insights which motivated the founders to start a business (or for an existing business to conceive a new product line). The company instantiates those insights in systems, observes them in serious contexts, distills new ideas from observation, and improves the product. This is great! But the object of all this iteration is rarely “generalizable subject-matter insight,” and for good reason. These businesses are trying to improve their product’s performance for its customers and market. Sometimes we get lucky, and this happens to produce generalizable insights, typically only after others reverse-engineer and disseminate them. But often this difference in focus means a narrower scope, as far as the field is concerned. They’re fixing pain points, adding line extensions, smoothing workflows, adding features… generally without changing any of the foundational theory of the product. In fact, changes to the foundational theory of the product are usually (and often rightly) “off limits”, or just not even salient for product teams. I think good tools-for-thought research often focuses on transcending and discarding the current system, asking “how should we build the next system?”. But good businesses usually don’t throw out their core product and build a meaningfully different one every few years.

Dissemination may be another challenge for tech companies contributing to this field. It’s fascinating to see companies like Autodesk, Adobe, and Epic Games publish incredible papers about fundamental problems in computer graphics. But if tools-for-thought-ish companies are producing powerful new theories about cognition or their subject matter, we rarely see those ideas published.

Some of my favorite work in tools for thought comes from idiosyncratic Twitter tinkerers. This group often produces fascinating work, but it’s usually missing one or more of these steps. The most common pattern seems to be: a bricoleur identifies some powerful idea about a representation and designs a prototype, but then fails to engage seriously with observing and deriving insight from the systems they’ve built. Sometimes this comes from technical barriers—the prototype is too quick-and-dirty to be used in a serious context, so their insight is limited. But I think there’s also a cultural gap here, a missing research practice of careful, diligent observation and synthesis. Too often these projects have the flavor of “Look, I made a thing! Isn’t it cool? How many people can I get to use it?”. But the question they need to be answering is: “What powerful, generalizable ideas can we learn from this project? How should the next wave of systems build on this?”

Designing research insight production into the system

Institutional incentives aside, one reason it’s so rare to find this cycle operating smoothly is that it’s incredibly difficult to do. The steps are interdependent. You can’t just have a powerful idea, then design a system which expresses it, then observe it, and so on. You have to design the system and manipulate it in a way which will reveal insight about the idea the system represents. Or, to put it another way, the system has to be shaped in a way which allows you to ask the questions you want to ask. But often you can’t even identify the right questions to ask before you see the system in operation!

For instance, one of the key ideas behind Quantum Country is that authors can help readers deeply internalize complex, abstract topics by interleaving narrative and retrieval practice. We built a system which expresses that idea. We have lots of data. Readers seem to feel it works. We can see that people do indeed retain the material they read. But what should the field learn from this experiment? What generalizable conclusions can we draw? How can we improve our understanding of that initial idea so that we can create a better future medium?

Sometimes interesting answers come by accident. For instance, when interviewing readers about their experiences, we were surprised to discover that memory effects aside, the regular review sessions meaningfully changed how people related to the material. It caused them to think of themselves as “doing quantum computing” in a much more serious way than they would if they’d just read some essay on one afternoon. That insight gave rise to Timeful Texts and other related directions.

But many insights can’t be explored through passive observation and open-ended interviews. How exactly do the narrative and the retrieval practice interact? Is the main effect convenience—i.e. the prompts are delivered while reading, whereas if they came later, you wouldn’t bother? Or is it meta-cognitive—i.e. the prompts’ feedback regulates your reading, causing you to re-read passages where you fared poorly? Or is it mostly about memory—i.e. the most efficient retrieval practice schedule involves practicing and reinforcing knowledge immediately, rather than a few days later? Or something else entirely? Different answers to these questions would point to substantially different paths for the evolution of the mnemonic medium.

To answer these questions, the system has to be designed in a way which produces the necessary observations. Or you have to manipulate the system with an experiment, which may be difficult if you didn’t initially architect your system with those questions in mind. Somewhat more subtly, I’ve found that designing tools-for-thought experiments such that the results might tell you something possibly generalizable is particularly difficult—a contorted balancing act of theory, interface design, engineering, and experimental methods. It’s like what cognitive psychologists have to deal with in their experimental design, except the “apparatus” is a system which must both solve real-world problems and also produce the necessary experimental data.

Ben Shneiderman, a pioneering human-computer interaction researcher, offers this charming schematic for research project design in The New ABCs of Research. He calls it the “two parents, three children” pattern.

The challenge is similar to what learning scientists must do in designing educational interventions. In Principles and Methods of Development Research, Jan van den Akker offers a beautiful distillation of what a unit of progress looks like in that field (thanks to Sarah Lim for the pointer):

[Educational design] principles are usually heuristic statements of a format such as: “If you want to design intervention X (for the purpose/function Y in context Z), then you are best advised to give that intervention the characteristics A, B, and C (substantive emphasis), and to do that via procedures K, L, and M (procedural emphasis), because of arguments P, Q, and R [(theoretical emphasis)].”

The key thing it does is to explicitly connect the dots between a grounded theoretical claim, the implied design approach, and the desired outcome. I’m certainly wary of trying to fit all research into some kind of formula like this, but how clarifying it is to have this target painted so sharply! If you’re a researcher and you want to develop some new intervention, you need to design an experiment whose results can generate a statement of this kind.

I think that research cycles in tools for thought should strive to generate analogous statements. What are the consequences of our theories and our design decisions, the consequences which others can build on? Progress will mean being able to make lots and lots of fine-grained heuristic statements like “Retrieval practice systems should offer users the opportunity to retry a few minutes later when they fail to remember an answer [because I’ve run a controlled experiment and found that without that opportunity, lapses are about 10pp more likely to persist on the subsequent attempt].” But it will also be important to be able to make big-picture statements about core mechanisms of systems, like “Intermittent follow-on review sessions can be used to change readers’ emotional relationship to written material in ways A, B, and C through authorial methods X, Y, and Z, as predicted by theory P, Q, and R and supported by experiments K, L, and M.”
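
The “10pp” in the first example statement is a difference between two rates, measured in percentage points. Here is a minimal sketch of that arithmetic, assuming a hypothetical experiment that records, for each lapsed prompt, whether the lapse persisted on the next attempt. The function names and the toy numbers are invented for illustration; they are not results from any real trial.

```python
def persistence_rate(outcomes):
    """Fraction of lapsed prompts that were forgotten again on the next attempt."""
    return sum(outcomes) / len(outcomes)


def difference_in_pp(control, treatment):
    """Gap between two persistence rates, in percentage points."""
    return 100 * (persistence_rate(control) - persistence_rate(treatment))


# Made-up outcomes: True means the lapse persisted on the subsequent attempt.
no_retry = [True] * 6 + [False] * 4    # 6 of 10 lapses persisted
with_retry = [True] * 5 + [False] * 5  # 5 of 10 lapses persisted

gap_pp = difference_in_pp(no_retry, with_retry)
print(f"Lapses persisted about {gap_pp:.0f}pp more often without a retry")
```

A real analysis would also need a confidence interval and many more samples; the point is only that a heuristic statement of this kind rests on a comparison between two experimental arms.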

Or, to take another example, I know many of my readers are fans of outline-based text editing. This morning, inspired by a message from patron Ethan Plante, I went looking for academic work on the theoretical or empirical foundations of outline processors. I was shocked how little I could find. So, if you’re experimenting with building outline processors, or “block-based” tools, or whatever, some questions to be answered: what effects do these alternative writing primitives have on composition? on thinking? on reading? What is the theory which would explain these effects? How would we know if it were true? What else does that theory imply? What would research systems which could answer these questions look like, and how are those different from the commercial systems people build?

Some positive examples

I wrote this post to clarify my own challenges, so it’s necessarily coming from a place of frustration. I want to close on a more positive note, by pointing to a few contemporary examples of projects which complete the full cycle I’ve described:

Bret Victor’s projects are the classic modern example. I'll give a too-short summary of one branch of research. Explorable Explanations (2011) proposes that a reading environment could become an environment to think in if authors didn't just present data, but rather embedded their live models into the written medium. A sequence of projects on reading, writing, and interacting with dynamic systems followed, pursuing this line of thinking and several adjacent ones, each contributing substantive generalized insight. Eventually Bret and colleagues became dissatisfied with the limitations of screens and the asymmetric role of authors, leading to the wildly original concepts behind his current project, Dynamicland—a building that is a computer.

Ink and Switch, the industrial research lab, did a thoughtful and well-documented series of experiments with freeform multimodal tablet interfaces for supporting creative thought. Their first, Capstone, was based on a model of creative work as dependent on gathering and sifting raw material for patterns and insights. They built a number of interactions to support their model, identified opportunities and limitations with that system, and designed a new system based on those insights called Muse. That project put inking front-and-center and produced general ideas about designing ink-centric interfaces with no chrome. The research project is now a product company (co-founded by patron Adam Wiggins). I’m excited for their attempt to prove out a translational R&D model, and I’ll be interested to see whether they’re still able to generate and disseminate generalizable insights in that context.

Piotr Wozniak, the contemporary founder of spaced repetition, is another great example. He had the original key insight that a computerized system for spaced repetition could lead to very large, very cheap memory databases. But he’s been iterating on those ideas for decades now. He’s not just been optimizing the scheduling algorithm but using the data as part of proposing and exploring new models of human memory (e.g. How much knowledge can human brain hold). This research doesn’t seem to have been constrained by SuperMemo’s commercial setting, though that’s perhaps because it’s not run with the intensity of a modern U.S. software company.

Evan Wallace at Figma developed and documented a new primitive for representing and editing vector paths, motivated by practical problems with existing vector pen tools, which more directly (naively?) expose the underlying Bezier curves. He shows how this new representation makes certain common operations easier to perform. This work may not transformatively change and expand the thoughts people can think, but both Sketchpad and Illustrator’s original pen tool were certainly significant, and this seems to meaningfully extend that work. I do wish Evan had written about the work more substantively, but of course, he’s trying to improve a product, not contribute to a field. There’s a nice recent technical write-up, but it’s implementation-focused.

In the sphere of para-academic Twitter tinkerers, I want to applaud Omar Rizwan’s experiments with TabFS. That project expresses a deep insight of Omar’s: that a shortcut to end-user programming may lie in extending the architecture of operating systems up to application-level objects—like browser tabs. In many ways, this project is an extension of Plan 9, but with a powerful injection of worse-is-better folk/craft philosophy. But unlike many of my beloved Twitter tinkerers, Omar's been diligently synthesizing and disseminating new insights produced by how he and others have been using TabFS. Unfortunately, those insights are in sponsor email newsletters which lack permalinks, but you can get a sense from his Twitter. Longer-term, I'm sure Omar will produce some durable write-up of what he learns, something which others can build on—he's done it for past work.

What are your favorite contemporary examples of people who are completing “full cycles” of work in tools for thought? Please share them in the comments.

In search of better questions

One provocative litany I’ve used to frame my work is: what comes after the book? Is it pictures of pages on screens? Is it videos of lectures? Why are all the answers to this question so boring? Where are the powerful ideas about how people learn, feel, and act?

I’ve been exploring memory systems as one avenue in response. But in my mind, doing something interesting with memory systems means leaving behind spaced repetition systems as we understand them today. The goal isn’t to “scale Anki.” It’s to use prototype systems to understand something new about how people learn, feel, and act—then to use those new understandings to create new kinds of systems.

On the ground, day-to-day, the biggest challenge I face is in asking good enough questions—questions whose answers can significantly change the way we understand a phenomenon. I’ve written this somewhat grumpy piece to distill some of my struggles here. We’ll sort through some boring, bad questions and try to lever our way to some more interesting ones.

Quantum Country has accumulated several million data points. Seems great! People are often quite excited when they hear that, as if accumulating a mountain of data will necessarily produce new understanding. But to extract meaning from that data, you need good questions. More perniciously: unless you have good questions in mind, you’re probably not even collecting the right data.

I’ve run dozens of analyses across many controlled trials on Quantum Country’s data. If I were a junior academic, I could probably have turned these into several papers’ worth of experimental results. Gotta juice that publication/citation count! But I haven’t published any of these studies, because I don’t think the questions they’re asking are good enough. The results I have are too parochial, too conditional on local details.

Asking good questions is hard. Part of the problem is that most papers aren’t asking good questions. If you read an unbiased sample of papers, your taste will mostly be shaped by boring questions which do little to advance the field.

The obvious questions are usually incremental. They assume the parameters of existing frameworks, then attempt to clarify some extension or variation. “Does the spacing effect manifest… for first-graders… when learning science concepts?” Such experiments can accrete understanding, but they’re quite distinct from, say, the initial experiments which uncovered the effect.

Another set of obvious questions stems from asking “what can we do with the data we already have?”, rather than “what would we really like to know, and how might we collect the data to know it?” This kind of data-centric fixation is wearyingly common around Silicon Valley types: wow, you have all this data! Let’s optimize things! Surely we can use this data to produce a more efficient review schedule? Yes, sure, but inefficiencies are not what hold back memory systems: they’re wildly efficient, even using dumb schedules! Why are schedule optimization questions the ones you want to ask? In most cases, I think the answer is “because it’s easy.”

Analyzing Quantum Country’s memory data

Here’s a bad question you can ask about Quantum Country: does it work? There are several key problems with this question. The first is: what does “work” mean? The second is: a yes/no question like this tells you little. The third is: despite the phrasing, spaced repetition is sufficiently well-supported that “yes” should be considered the null hypothesis. But understanding these failures can help us write a better question.

Here’s a better—but still bad—question you can ask about Quantum Country: what fraction of participating readers eventually end up reliably remembering all the material? The boring complaints about this question are methodological: compared to what? What does “participating readers” mean? What does “eventually” mean? What about survivorship effects? But a lack of rigor isn’t the real problem with this question. The real problem is: what would an answer even mean? If the answer were 80%, how would your understanding differ from a world in which the answer was 70%? What does an answer to this question teach us about Quantum Country, much less about how people learn/feel/act in general?

Let’s try again: across varying time periods, how much of Quantum Country’s material would a reader remember if they didn’t do the review sessions, compared to those who did? This question seems worse because it’s less precise. Yes, you’d need to nail down several elements to get a real answer. But rigor aside, this is a better question because it starts to access the dynamics of learning. It’s the first of our example questions which might teach us something generalizable.

It’s important to remember, after all (and I’m reminding myself right here!): this is the point—to learn something generalizable. We’re trying to learn something which might help us build the next system, the next category of systems. The point is not (as it usually is in tech) to produce experimental data which can show that “our product works!” on some marketing page. Fuck that. The point is insight and its downstream consequences.

So let’s double down on generalizability. Here’s an even less precise question which I feel is nevertheless much more interesting: what is the effect of a particular review event on a person’s memory? A moment before that question appears, their mind is in one state. Then they answer the question, and their mind is in another state, with durable changes which persist for weeks or months. What happened? Can we characterize that change? What parameters does the change depend on? What’s its stationarity? Essentially, can we establish a function which describes the dynamics of retrieval on memory?

I became interested in this question last summer, only to realize that my millions of data points couldn’t actually help me answer the question, since they lack variation along the necessary axes. I’ve had to artificially introduce controlled variation (e.g. random variations in scheduling) and wait for new data to accumulate. This was a painful but valuable lesson.
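
To make “controlled variation” concrete, here is a minimal sketch of randomly assigning each reader and prompt pair to one of several first-review intervals. Everything in it is an assumption for illustration: the candidate intervals, the function name, and the seeding scheme are hypothetical, not Quantum Country’s actual implementation.

```python
import random

# Hypothetical candidate intervals (in days) for a prompt's first review.
CANDIDATE_INTERVALS_DAYS = [1, 3, 5, 14, 30]


def assign_first_interval(reader_id, prompt_id, trial_name="interval-trial"):
    """Pick a first-review interval at random, reproducibly per reader and prompt.

    Seeding a private generator with the trial, reader, and prompt means the
    same pair always receives the same interval, with no assignment table to store.
    """
    rng = random.Random(f"{trial_name}:{reader_id}:{prompt_id}")
    return rng.choice(CANDIDATE_INTERVALS_DAYS)


interval = assign_first_interval("reader-42", "prompt-7")
print(f"reader-42 first reviews prompt-7 after {interval} days")
```

Because the assignment is independent of anything the reader does, later differences in recall between intervals can be attributed to the interval itself rather than to who happened to review when.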

The other problem with this sort of question is that it probably means constructing a model. The literature around spaced repetition is full of models, predicting e.g. probabilities of recall after various intervals. I’m extremely skeptical of these models. They might be somewhat predictive, but I don’t think they’re very explanatory. What should we understand a “probability of recall” to mean, physically? When I have a 60% chance of remembering an answer in a given moment, what’s actually happening to make my mind differ from another answer I have a 70% chance of remembering? It’s not a matter of dice in my brain. There have been various attempts to align empirical probabilistic models to theoretical frameworks of memory, but such models are fraught with “let’s estimate a probability by assuming an exponential fit and doing a logistic regression…” More predictive than explanatory. I don’t trust it.

Quantum Country has an unusual opportunity to explore the dynamics of memory with less modeling chicanery. For example, the challenge that a system like SuperMemo faces is that each user writes their own questions. So there is only one sample—ever—of a given person answering a given question for, say, the third time. At that moment, it was either “remembered” or “not remembered”. Or, OK, a “grade” of 1-5. You don’t get a nice continuous value, and there’s no way to talk about the “probability of recall” for answering that question at that time without doing some kind of curve-fitting estimation. Was it 80%? 85%? How good was your estimation? Well, you have to use another model to evaluate that, according to how well the estimate explains the subsequent data points. This is what we call the “rub some linear algebra on it” approach to understanding. Don't get me wrong: you can produce useful systems without explanatory understanding! But it's helpful to identify such places as potential opportunities.

On Quantum Country, everyone answers the same questions, so we have many samples for every situation. We don’t need to estimate “retrieval probabilities”: we can look at how fractions of populations shift between various buckets. For example, of the 50k people who reviewed this question five days after initially remembering it while reading the essay, how many of them remembered the answer? How does that compare to the fraction of the population which was instead asked to review the same question several weeks after their initial read? No model necessary here. Or you can think of it as a frequentist probability estimation of some hidden “retrieval strength” variable, I guess. Whatever. I think it’s a stronger foundation for understanding.
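
As a rough sketch of that bucket comparison, assume a hypothetical log of first-review events for a single prompt, each with an `interval_days` field and a `remembered` flag. Neither the field names nor the toy records below come from Quantum Country; they only illustrate that the computation is counting, not model fitting.

```python
from collections import defaultdict


def recall_fraction_by_bucket(reviews, bucket_of):
    """Return {bucket: fraction of reviews in that bucket which were remembered}."""
    totals = defaultdict(int)
    remembered = defaultdict(int)
    for review in reviews:
        bucket = bucket_of(review["interval_days"])
        totals[bucket] += 1
        if review["remembered"]:
            remembered[bucket] += 1
    return {bucket: remembered[bucket] / totals[bucket] for bucket in totals}


def bucket_of(interval_days):
    """Hypothetical bucketing: within a week versus anything longer."""
    return "about five days" if interval_days <= 7 else "several weeks"


# Toy records, invented for illustration.
reviews = [
    {"interval_days": 5, "remembered": True},
    {"interval_days": 5, "remembered": True},
    {"interval_days": 5, "remembered": False},
    {"interval_days": 5, "remembered": True},
    {"interval_days": 21, "remembered": True},
    {"interval_days": 21, "remembered": False},
]

fractions = recall_fraction_by_bucket(reviews, bucket_of)
for bucket, fraction in fractions.items():
    print(f"{bucket}: {fraction:.0%} remembered")
```

With tens of thousands of readers per bucket, these fractions are stable enough to compare directly, which is what lets the analysis skip per-item curve fitting.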

When someone has trouble remembering a prompt, what should we do? Yes, we can change the schedule, but what else? I’ve run a controlled trial on the retry mechanism, and it seems to help, particularly early in the learning process. But that’s quite a blunt instrument. Re-reading? Breaking the forgotten topic down into more detailed constituents? Providing alternative examples? Supplementary explanations? Or, maybe you can do nothing, and if there are enough other adjacent prompts, those will eventually support memory of the troublesome prompt. To me, these intervention questions are much more interesting than issues of schedule optimization.

Bored of memorization studies

For the sake of discussion, let’s try a different approach. Let’s get further away from well-studied SRS paradigms. Imagine that you’re forbidden to ask: “did they remember that answer?” What questions can we ask about the mnemonic medium? About reading informational texts, in general?

This is a nice lens to hold, because it’s a reminder that Michael and I didn’t conceive of the medium simply as a more easily adoptable Anki. It’s just that it’s so easy to ask questions about whether or not people can remember the answers to the questions the medium asks. So it’s easy to accidentally fixate there, even though other elements may be much more important. But that’s lazy, and it’s unlikely to produce transformative insight.

We could say: look, the spacing effect and testing effect have been studied enough. They reliably produce stable memory encodings. If you understood them better, you could probably make them work more efficiently. But they’re really quite efficient already. As far as rote memory issues are concerned, maybe the problems are sufficiently solved.

But rote memory isn’t all that interesting. Memory is a proxy for learning, which is a proxy for meaningful enablement. So what can we say about learning? In what circumstances and to what extent does reliable memory transfer to open-ended tasks in the subject? That is: if you’ve studied Quantum Country, can you explain topics in quantum computing to someone else? Can you solve (simple) problems you haven’t seen before? Can you create circuits for a purpose? Can you spot unmentioned connections to your understanding of classical computers? Get more specific: what are the authorial implications? What characteristics of prompts seem to promote this type of transfer learning, and through what mechanisms?

One of our core hypotheses for Quantum Country (still untested!) is that the mnemonic medium may have significant effects on downstream topics. That is, if you study chapter 1 via a mnemonic text, can you learn chapter 2 more rapidly? Accurately? Deeply? Can you learn topics you wouldn’t practically have been able to learn before? What are the key interactions here? Presumably some prompts matter more than others—what characterizes that? Presumably there’s a non-linear relationship between the amount of practice and the impact on downstream topics—what is it, and in what ways is it malleable? What are the upper bounds on this effect? To paint a vivid concrete picture: can we reliably enable a typical teenager to engage with graduate-level material?

What about creativity? Where do ideas come from? In Seeing What Others Don’t, Gary Klein suggests that key patterns of insight generation include noticing connections and contradictions (along with a few other factors, less relevant here). Propensity to notice connections and contradictions seems awfully dependent on what’s in one’s memory! So: can memory systems make us more insightful? Presumably some kinds of prompts help here more than others—what characterizes that? Are special synthesis-oriented prompts helpful, or is the impact more a function of solidly understanding the basics? If we designed a new “memory system” with the sole aim of downstream impact on creative work, what would it look like? Would it involve retrieval practice at all?

Screw "learning": what about action? What sort of learning leads to downstream action in the world, as opposed to just learning for learning's sake? How might we design environments which support the factors which produce such action? Which promote great conversation with friends?

What about behavior change? Are “salience prompts” a thing? How do we write good ones, and what’s the scope of their effect? Is there value in author-provided prompts of this kind, or must they be created by readers? Perhaps there's some happy medium? I've suggested that for topics like meta-rationality, extended contact with the material may turn out to be the primary value of the medium. How would we know if that were true? If “extended contact” really is the primary goal, what fundamental “nouns” and “verbs” should we build a communications system around?

One surprising theme in Quantum Country user interviews was that the sessions had an impact on readers’ identities. Engaging with questions about quantum computing every few days over the span of months helped people start to think of themselves as “a person who studies quantum computing”, in a much more visceral way than if they’d simply read an explanatory essay on an afternoon a few months back. I don’t understand this at all! I don’t understand how to know whether it’s happening, or what’s happening, or what the implications of it are—much less how to characterize the interactions with details of the text or the medium in any more generalizable way. But despite my total inability to generate any good questions around this theme, it strikes me as fertile ground for good questions.

Most of the questions in this section aren’t stated crisply enough to actually explore in detail. Refining them to the point of actionability will require a great deal of insight—insight which may not be available without poking around at poorly-shaped versions of the questions. But asking these increasingly outlandish questions is an exercise, for me, in actively rejecting the stupendously boring questions which pervade the literature around memory systems and adjacent “learning” technologies.

————————

All this blather about questions isn’t just idle rumination. I have a few projects about to take flight for which my questions are quite inadequate!

I’m collaborating now with an economics professor on a class in which we’re running a randomized controlled trial around mnemonic-medium-like interactions. We’ve got the core mechanics up and running, so now the question is: what, exactly, should we be measuring in the course of the class? I mean, yes, sure, we’ll record their class test scores and a lot of their review attempts. But it would be quite uninteresting to simply find that “people who use SRS get better grades in the class.” That’s the null hypothesis at this point. The goal is to generate insight. So what should we be looking for in interviews? In open-ended projects? I’m not worried about pre-registering my hypotheses or anything like that. Everything we’re doing is exploratory, meant to improve the questions we’re asking. But I do want to make sure we’re recording what we need to record to answer a wide range of questions.

Likewise, I’m excited about David Chapman’s new meta-rationality essay, which incorporates Orbit prompts to reinforce its ideas. It’s quite unlike both Quantum Country and How to write good prompts: it’s in part a persuasive essay, though it’s also an explanatory essay, introducing much more abstract tools than those in the prompt-writing guide. The feedback so far has been interesting. Something about it isn’t working. But it isn’t not working either—I think. My questions here are still quite weak. I haven’t dug into the data we have at all, but I’ll do that in the coming week.

Member preview: try Orbit in your own writing

Hello, all! I'm having a contemplative start to the year, working to shift up some of my systems and plans in response to the reflections in my 2020 wrap-up essay. I hope your 2021 is already offering you many interesting rabbit holes.

Now that Orbit is being used so prominently in How to write good prompts, I'm receiving a lot of requests from people interested in experimenting with it for their own writing. I'm intentionally opening the floodgates slowly—I've found that authors require a fair amount of support, and of course the platform is still quite raw—but I wanted to let any interested members give it a try.

Please don't share that link with others, but it's OK to publish any pages you make using Orbit. Please do let me know about your experiences as you try it out!

Relatedly, I'm looking for folks who are interested in using Orbit in the context of a course (either in an academic setting or not). I'd love to collaborate with an instructor on experiments around research questions like:

  • Sure, mnemonic essays help people remember what they read, but what effect do they have on subsequent reading/learning experiences? Say someone reads an Orbit-enabled chapter 1. Can they then tackle more complexity in later chapters as a result? Learn later topics more rapidly? With less effort? More deeply? What factors in the medium seem most associated with these effects? Can those factors be amplified?
  • What impact does the medium have on creativity and insight generation? A more concrete experiment: if someone learns a topic through the mnemonic medium, how do their subsequent essays / projects / term papers / theses differ? What factors in the medium seem most associated with these effects? Can those factors be amplified?

It will probably be easiest to explore these questions in a technical context, but I'm also interested in working with authors of serious non-technical content (like that prompt-writing guide) to better understand potential paths for the medium in those contexts.

Reflections on 2020 as an independent researcher

Now available publicly if you'd like to link to it externally.

2020 was my second year as an “independent researcher.” It’s certainly not a well-defined job title. I’m grateful for the freedom it entails, but I’ve needed to grope around in the dark for patterns and structures which pre-existing institutions and roles would ordinarily provide. To begin 2021, I’ll share some of what I’ve learned and some of the outstanding questions which seem most pertinent.

These observations are necessarily quite personal, but I hope they’ll be of interest or use to others considering similar paths.

1. Why work independently?

I’m not interested in independence for its own sake. I’m interested in inventing environments which significantly expand what people can think and do. That aspiration is what drives my work, rather than a particular title, role, or practice. When I use phrases like “independent researcher” to describe my work, that’s a loosely-held shorthand for what I really value: freedom of inquiry.

For that matter, I’m skeptical that “independent researcher” is a stable or desirable long-term identity. Independence offers freedom, but it’s also quite limiting in various ways I’ll discuss later in this essay. If the work goes well, an independent researcher will likely find compelling opportunities to evolve into some higher-leverage institution—a studio, a foundation, an academic center, a business. If the work doesn’t go well, most independent researchers would have trouble sustaining their work.

In my case, the aspiration isn’t just to solve the object problem of, say, forgetting what you read. I want to more deeply understand the properties of enabling environments—principles of operation, design procedures and patterns, relationship to individual and social cognition—to foster a community which routinely invents new environments of this kind. Field creation will almost certainly involve institution creation. But my understanding is too weak right now to see much further.

Why not start a startup?

I understand theories by making software, and I live in San Francisco, so people often expect that I’m planning to launch a startup. But my interests are misaligned with a startup’s fundamental drive—growth. I’m interested in generating insight, not in generating growth, except insofar as growth is necessary to understanding the ideas I’m trying to explore.

Some companies figure out how to align these aspirations so that marginal revenue/usage enables marginal fundamental insight, which enables marginal revenue/usage and so on, in a virtuous cycle. Pixar is a good example. Cutting-edge graphics research enables new kinds of storytelling, which in turn funds more research. But this situation is quite rare.

Obviously, many startups do uncover fundamental insights, but typically as a means, not an end. You need a specific theory about why a given business model is connected to a given type of insight generation. You also need to explain why marginal insight remains an ongoing prerequisite to marginal business, rather than a “nice to have” which can be cut when convenient. I don’t have a theory like this for my own work yet. Such flywheels are particularly rare in my domain, since novel interface ideas are typically public goods. Their development might require significant investment, but once created they can often be copied cheaply by others, limiting returns to the inventor.

A related question I’ve had to straighten out for myself: am I really aiming for insight, or impact? Is my goal just to invent such environments—or also to operationalize, scale, and spread them? It’s true that my work is use-inspired; I identify more with applied researchers than with basic scientists motivated purely by discovery. So I wouldn’t want to work on a project without the possibility of transformative impact. But so long as I have some persuasive theory of how that impact could happen, I prefer to focus on producing insight through prototypes; I’d rather let others operationalize those insights into scaled products.

To put this another way, I’ll invoke a common trope about startups: “ideas are cheap; execution is what matters.” It’s a decent if overstated rule of thumb, but not because new ideas are unimportant. Startups are fragile. They usually focus on niches where execution is the primary factor because they have to. They tend to die if they tackle a niche which continuously demands both outstanding execution and also deep, original insight. This lens suggests a useful way to frame my goal: developing ideas far enough that they become “obvious,” the banal fodder for half a dozen companies in a future YC batch.

Another key misalignment makes me hesitant to consider a startup: culture. Tech culture is different from research culture, and I’m already quite overweight on tech culture. Anyone working in an industry for a while tends to adopt elements of that culture—its processes, its norms, its values, its tacit knowledge. Much of this is incredibly valuable, of course, but these norms also create constraints.

To give one example, tech culture is calibrated to a much faster pace than research culture. A “huge project” for a Silicon Valley tech person may be a year or two long; a “huge project” for a researcher may last a decade. Persistence with a difficult problem may require tens of hours for a tech person and hundreds (or thousands) of hours for a researcher, no matter how quickly they try to work. It’s not that the tech people are constitutionally lazy or something like that: in industry, it usually is, in fact, a bad idea to spend many hundreds of hours thinking about a single problem. Better to create an 80/20 solution or try a different approach. But foundational insights often do require more patient, focused thought than heuristics from tech culture would naturally encourage. Coming from the tech industry, my expectations around the pace of progress are often seriously miscalibrated for many problems I’m tackling. I’ll feel like I’ve been banging my head against a question forever, but it’s only been a few tens of hours. That’s nothing! With this mindset, I’ll miss the results I seek and, as a bonus, drive myself nuts. I’m working to become much more comfortable slowing down.

Why not become an academic?

If I’m interested in freedom of inquiry and focused on generating insight, why not join academia? There are some boring stock answers: publish-or-perish, the grant treadmill, overbearing administrative responsibilities, conservatism, etc. But the real reason is my choice of field. My goals, values, and practices align poorly with the academic discipline which would most naturally host my work, human-computer interaction (HCI). If in some alternate universe I were interested in a different topic, there’s a good chance I’d have become an academic.

That’s a dispassionate way to put it. A subjective and inflammatory way to put it is that I feel contemporary HCI research lacks vision and imagination. Part of the problem seems to be that the field is trying too hard to be a science. It emphasizes an empirical approach, often producing elaborate artificial evaluations of uninteresting systems. Peer reviewers seem to reward analysis and data over ingenuity and compelling direction. The field disincentivizes building systems significant enough to explore new paradigms; published systems are rarely driven by a serious context of use. I read the major conferences’ proceedings every year, yet I learn much more about HCI from studying game designers and idiosyncratic Twitter tinkerers. Of course, there are a few researchers who do great work in spite of the field’s challenges. It’s hard not to be inspired by people like Hiroshi Ishii or Ken Perlin. And I should be clear that these are comments about the field, not about the individuals in it, who have consistently struck me as kind and well-intentioned. The headwinds just seem too strong to justify.

I’ll share one more story about the field. I noticed a professor’s name on several papers I enjoyed, so I invited them to chat. We had a friendly, wide-ranging conversation about topics in the field. Then at one point, I asked: “How do you balance pressing your own long-term research agenda in your lab with supporting the inquiry of your grad students?” They replied with some consternation: “Oh, no no, you have it wrong. I view my primary role as helping students pursue their own research projects. I’m not trying to push any long-term agenda of my own.” I suspect they exaggerated their stance here. Beneath the surface, I can see some consistent theories being explored through the students’ papers. But if all professors really operated according to this one’s belief, we can see the problem that would be created by induction. The entire field would consist of nothing but student work, without any venue for senior researchers to develop an idea of their own over time. I’m sure that’s not actually what’s happening, but qualitatively, the sense I get reading the major conference proceedings isn’t so different. It feels like skimming a sea of churning froth—tiny isolated studies rarely accreting into a broader current.

I criticize academic HCI not out of ill will but out of genuine perplexity. I honestly don’t understand why the field behaves as it seems to behave. I’d really love to be wrong. If you think I’ve misunderstood something, I’d love to read your comments. Maybe the problem is that I’m actually trying to start a field which doesn’t exist yet. Maybe this is my version of Bryan Caplan’s sour grapes. Maybe I’m too arrogant or closed-minded.

So I work independently. Not because that’s an ideal arrangement, but because I don’t see a good alternative. I don’t yet know how to create or join an institution which would enable better work. Part of the trouble comes from working in a proto-field, without methodology or principles solid enough to effectively support a pipeline of new investigators. In fact, I don’t yet understand my own projects well enough to effectively coordinate a large team around them, though I could certainly put a few more hands to good use. The most concrete incremental constraint is, of course, funding. Let’s talk about funding.

2. The unexpected success of crowdfunding research

When I left Khan Academy in early 2019, my (very understanding) wife and I made a rough plan: in 2019, I’d focus on the work and avoid thinking about funding at all; in 2020, I’d figure out some plan for sustainability and begin moving towards it; hopefully by the end of 2021, we’d stop burning cash. My collaborator Michael Nielsen suggested we set up a low-stakes Patreon for our work, and I agreed, not thinking much of it. He’s now moved on, but I’m quite grateful to him for that early nudge.

Less than two years later, this Patreon now generates roughly a graduate student’s fellowship grant. It’s not lucrative, but it’s enough to cover my living expenses. That’s an important milestone! So long as this income stream continues, my runway has been extended indefinitely—or at least until I start getting nervous about not being able to save. My goal was to end 2020 with a plan for how to eventually fund my work, but I’m astonished to now find myself ending 2020 with solid funding in hand.

Is this a model which other independent researchers could use? My story doesn’t necessarily generalize, but I’ll share some observations which might be useful to others interested in experimenting with crowdfunded research.

Growing a crowdfunded research grant

First, we should examine the dynamics of growth and churn. The funding must be stable if it is to be a viable source of independent research. In Technology and Courage, Ivan Sutherland describes courage as one of the key ingredients for research and notes:

I find that I have only so much room for taking risks. When I can reduce the risk in some places in my life, I can more easily face risk in other areas. I provide myself the courage to do some things by reducing my need for courage in other areas.

This certainly matches my experience, not just with this Patreon but with marriage, home ownership, investments in well-being, and so on.

These charts depict the shape of monthly revenue and membership count growth over 2020 (the differences are due to volatility in the distribution of patrons’ pledge sizes):

On the whole, then, my Patreon experience has been characterized by modest but steady growth. To understand how stable the funding is, we should understand the dynamics of churn, which these charts don’t depict. The story there looks decent: monthly new patron rate (median, 25–75th %ile) was 9% (8–15%); monthly departing patron rate was 3% (2–4%). This suggests reasonable stability, though departure numbers should increase over time since the Patreon is new, while new patron growth may decline as I saturate my interested audience.
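To make the churn arithmetic concrete, here is a toy sketch in Python. It assumes the median monthly rates quoted above (9% joining, 3% departing) apply to the current patron count and stay constant, which, as noted, they probably won't. The function names and the starting count of 100 patrons are hypothetical, chosen only for illustration; they aren't figures from my Patreon.

```python
def project_patrons(start, months, join_rate=0.09, leave_rate=0.03):
    """Project patron counts for months 0..months.

    Assumes constant monthly join and departure rates, each applied to the
    current patron count. This is a simplification, not a model of real data.
    """
    counts = [float(start)]
    for _ in range(months):
        current = counts[-1]
        joined = current * join_rate    # new patrons this month
        left = current * leave_rate     # departing patrons this month
        counts.append(current + joined - left)
    return counts


def expected_tenure_months(leave_rate=0.03):
    """Mean months a patron stays if they leave with a fixed monthly probability."""
    return 1.0 / leave_rate


if __name__ == "__main__":
    # Hypothetical starting count of 100 patrons, projected over one year.
    for month, count in enumerate(project_patrons(100, 12)):
        print(f"month {month:2d}: {count:6.1f} patrons")
    print(f"expected tenure: {expected_tenure_months():.1f} months")
```

Under these assumptions, a 3% monthly departure rate implies that a typical patron stays for a few years, which is one way to read "reasonable stability." If departures rise or new-patron growth declines, the projection flattens accordingly.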

Annual billing has been a helpful boon for me. Roughly a quarter of new patrons elect to pledge on an annual basis. Assuming steady revenue growth, then, these patrons’ annual commitments let me fast-forward about three months into the future along my growth curve. Annual pledges also offer a helpful extra measure of stability.

On my Patreon, I offer three funding tiers: $5, $20, and $100. The tiers are mostly arbitrary for now—the only difference is that the $100 tier comes with public attribution (which no sponsor seems to care much about). But the tiers give people a way to modulate their support according to their interest and capacity. I’m quite happy with how the price mix has worked out. Most of my revenue comes from $5 members, while a smaller number of more generous sponsors’ contributions boost my funding considerably. In aggregate, that mix generates a grant large enough to live on, while freeing me from worries about any individual funder. This does wonders for my peace of mind. At Khan Academy, we were often quite dependent on a small number of huge-value donors, and we’d regularly fret about appeasing them; when one would decline to offer another grant, it would create tremendous stress. The “grassroots” situation creates a much healthier interpersonal relationship with funders, some of whom are personal friends—I’d hate for anyone to feel like they need to keep funding me out of some sense of guilt.

Patron growth appears to be roughly linear, but you’ll notice a discontinuity in the slope of the graph around May. That’s when I decided to try sharing more exclusive writing with patrons. This does seem to have worked as an enticement, though I’m actually not sure how important exclusivity is. I expect the articles drive new patron growth in part because they create repeated opportunities for others in patrons’ networks to stumble upon my work. Is the higher rate of growth because people want to unlock the paywall, or because articles create traffic? I hardly promote the Patreon at all—I’m not yet sure how to do that in a way that makes me feel comfortable—so this type of traffic may be important.

Researching in public

One surprise of crowdfunding research has been the pleasure of building a closer relationship with a community of people enthusiastic about my ideas. But to do good work, I mostly need to avoid thinking about funding and funders. The promise of ongoing exclusive content creates some tension, then. Unlike a typical paid newsletter or blog, funder-exclusive writing is a secondary by-product of my primary work. In this way, I’m not a traditional “content creator.” Sometimes I catch myself thinking in terms of what I’ll write or report next to my funders. That’s not good. Such a mindset, taken too seriously, encourages shallower work designed to appease others. Also, I’m human, so I naturally want to report successes. But this can create the same pressures which exist in scientific publishing: short-term-ism, conservatism, publication bias, harmful over-claiming. In research, it’s terribly important that you be brutally honest with yourself. I don’t think it’s possible to craft marketing-like messages about your “great progress” without closing your own eyes to what’s actually happening—which means you’d better be brutally honest when talking to others about your work.

As an antidote, I’ve tried to engage in what Michael Nielsen has called “anti-marketing”. That is, to make a point of focusing publicly on the least rosy parts of my projects—what’s confusing, what’s frustrating, what’s not working. It’s hard to do consistently, but when anti-marketing is the goal, then interesting challenges become something positive: useful fodder for public conversation, rather than something to be swept under the rug. I suspect it also builds a deeper, more authentic relationship with an audience.

More broadly, I’ve experimented this year with a mindset I’ve been calling “working with the garage door up.” I try to share rough, ongoing artifacts from my process, including the working notes where I do most of my daily thinking. This has worked quite well when I adopt the right mindset—that I'm sharing objects made as part of my primary work, rather than things created specifically for publication.

The practice generates more conversation and serendipitous inbounds "for free." It’s worth noting that in most ways, unusual inbounds are a better leading indicator for my work than page views or other more traditional metrics. Popular projects might garner a lot more mass attention but a lot less attention from unusual, singular people. Those people often introduce surprising (and more meaningful) insights and opportunities.

Apart from potential distraction and distortion, there’s another subtle issue with talking to a broad audience about ongoing research. The clearest, most familiar parts of your ideas are the ones which you’ll have the easiest time communicating and which your conversation partner will have the easiest time grasping. Often, those elements are already somewhat mainstream or even clichéd. Others are likely to have lots of cached thoughts around such ideas, and they’ll tend to be interpreted incrementally.

But if you’re doing something original, the most interesting aspects are the ones which others—and you!—understand least well. Particularly early on, you may not be able to articulate the new element you’re reaching for very clearly. It may just sound like an unusual adverb choice or an innocuous-seeming qualifier. Others’ replies will tend to emphasize the most mainstream elements, since they may not notice or know how to react to the aspects you least understand. Such conversation will often drag you back towards the mainstream. It’s a kind of “regression to the mean” for ideas.

Great colleagues and collaborators can take more active steps to mitigate the issue. For instance, if Michael Nielsen hears me talking about some idea that seems fairly banal on the surface, he’ll deliberately tug at the vague spots where I’m straining to reach past typical interpretations. This was part of a broader practice he called “listening for enablement.” My Khan Academy research colleague May-Li Khoe would respond to poorly-understood ideas by riffing in wild, unpredictable directions. Her sketches would often be exploring something else entirely, but on many occasions those vivid reactions helped me understand my own inklings better. Unfortunately, I’m not yet sure how to avoid the regression problem when discussing ideas regularly with a broad audience.

What are patrons buying?

Free web sites will often have a “donate” button with language like “if you enjoyed this (free) content, please consider showing your appreciation with a donation.” That’s how Michael Nielsen and I thought of this Patreon when we started it: roughly like a tip jar. But in a patronage model, people fund the work in an ongoing fashion. The tip jar model doesn’t explain this behavior very well. Why donate repeatedly over time out of gratitude for past material?

In my interactions with patrons, I’ve been surprised to find that altruism is rarely the dominant force. Patrons mostly don’t think of themselves as paying for consumption of past work; they’re buying into production of future work.

From this angle, it may make more sense to think of the production of the work itself as a product. What are patrons buying when they buy that product? In its most compelling form, a patron’s purchase causes future work to be produced which would not have been produced without it. Perhaps without that patron’s contribution, the creator must spend some of their time freelancing to pay the bills, so their projects were limited in scope; but with that patron’s contribution, the creator can work full-time on much more ambitious projects. Or maybe they can hire a freelance artist to illustrate their game, etc. This is like a Kickstarter crowdfunding campaign, except in an ongoing fashion instead of for a one-time effort. In my current situation, marginal funds buy something subtler: growing confidence and stability; freedom from foundation or investor appeasement. As funding continues to grow, marginal funds will likely buy contractor time to expand my scope.

In December of 2020, I asked my patrons to briefly explain why they support my work. Roughly a quarter of all patrons wrote back. The vast majority framed their motivations in terms of supporting production of future work. Some people quite specifically want to use a prototype I’m developing; others just want to see certain ideas developed further. About a third framed their funding in terms of “people, not projects,” expressing general confidence that I’ll do interesting work. Naturally, that’s my favorite kind of support. After this cluster of answers, the distant second most common motivation was access to the behind-the-scenes content.

I considered asking patrons to pick their top couple reasons from a pre-authored list, but I thought it would be more interesting to read open-ended responses. I’m glad I did! I wouldn’t have included any of the following motivations, but each was expressed by >10% of respondents:

• patronage creates a feeling of being “on the edge” of something new

• participating in the experiment of crowdfunding research

• seeing my work “up close” is a source of personal inspiration

• wanting to support “tools for thought” in general (rather than any of my specific projects)

Understanding all this helps me better frame how I describe and relate to the Patreon publicly. For example, relatively few readers cited exclusive content as an important motivation, but it clearly accelerated the growth of funding. Seeing these responses from patrons, I dug into the data and noticed that the exclusive content also reduced patron drop-off rates by roughly half. My current theory is that for many patrons, this insider content is important mostly because it reinforces the (approximately true) impression that their support is creating marginal production, which is by far the dominant motivation people cite.

My path to sustainability

How repeatable is my modest success with crowdfunding? Could others follow this path? I can’t know, of course, but it’s worth articulating some of the factors which may have been essential. At a high level, my story is one of leveraging career capital from the tech industry. As Adam Wiggins suggests, I suspect it’s possible for more tech people to do this once they have some years under their belt.

First and foremost, I was in a position financially to draw no income for (as originally planned) several years. In terms of the general population, this is a very unusual and fortunate position! For many tech workers, though, it’s not a terribly outlandish goal. Most of this capital came from five years spent at Apple: joining Khan Academy (a non-profit) cut my income by two thirds. San Francisco is quite expensive, but we’ve lowered our costs by owning rather than renting our home, which of course also required capital. My wife’s generous understanding was unusual and essential. Practically speaking, though, it helps that she’s a doctor. Having trained from 2011 through mid-2020 (medicine is wild!), she recently began her first “full” attending position, now at last with plenty of earning potential of her own. Also, we don’t have (or plan to have) kids; our mortgage is our only debt; our families don’t need our financial support. It’s a fairly ideal situation financially, except for the San Francisco part.

I should mention that there was another straightforward path for me (and for many other tech workers) which wouldn’t have required crowdfunding at all. Before I left Apple, my plan was to save enough that I could quit and pay myself a grad student’s stipend from the interest indefinitely. It would have taken another five years or so, depending on my diligence. That seems very achievable, though I was so uninspired by the work at Apple at that point that it might have done some permanent damage!

While I’d planned to live without income, I avoided burning through savings by becoming the lucky recipient of an Emergent Ventures grant. It covered my first year’s expenses. (As with the Patreon, this was a nudge from Michael Nielsen: I’d intended to ignore funding for my first year!) I would certainly recommend Emergent Ventures to others trying to find a way to support their independent work. The application took only a couple hours; I enjoyed a half hour of thoughtful conversation with Tyler Cowen; they made a decision within a few days; my only obligation was to send them a few short reports. Emergent Ventures stands in delightful and courageous contrast to the intensely burdensome experiences I’ve typically had interacting with foundations. The grant successfully bootstrapped me: by the time the grant expired, my patrons mostly covered my expenses. Self-obsolescence seems like an ideal outcome for a philanthropic grant. I suspect this kind of bootstrapping would be quite important for many people to crowdfund independent work: patrons take time to accumulate, but the first few hundred donations aren’t enough to quit your job.

Finances aside, a few related career capital factors have likely been crucial to my progress. They may be important for others doing similar work.

First and foremost, I’ve had enough professional experience to build up quite an unusual skillset. I can independently research, design, implement, analyze, and report on novel software environments. I’m certainly not a world-class graphic designer or literature reviewer, but being able to execute the whole end-to-end process myself is incredibly enabling. There’s also a matter of degree. With my industry experience, I’m comfortable rapidly building highly polished production systems, systems which can attract users organically and engage with serious, real-world contexts. I suspect one key challenge for academic HCI is that researchers often lack the skills (if not the inclination) to design and execute consumer-quality software. Without those skills, many projects are trapped in toy contexts.

Those past experiences have also yielded essential social capital. Most people don’t seem to care that I’m unaffiliated, but I suspect that’s only because I can introduce myself by saying I helped build iOS at Apple and led R&D at Khan Academy. Without some kind of strong social proof, it may have been hard to get anyone to engage seriously with my work. Along the same lines, my past work and writing has accumulated a modest but substantial online audience. I don’t need an academic journal or publisher to market my work for me because I can “publish” on Twitter. My audience isn’t large by “public intellectual” standards, but it’s enough to produce strong network effects which propagate the work quite broadly and prompt interesting inbound relationships. This same network is the source of most of my patrons. I don’t think I could have recruited this audience of funders without Twitter—at least not with such little effort and distraction.

So there seems to be some important path dependence to this career capital. If I’d tried to do this type of work immediately, I’d lack many key skills; I’d work too slowly; I’d be distracted by promoting my work and marketing to funders. Maybe I would have made good progress despite these challenges. I’m not sure! But my experiences do suggest that many more “staff/principal”-level tech workers could successfully pivot to an independent practice.

I’m much less sure about this, but I suspect my work depends on time spent living in San Francisco. Prior to the COVID pandemic, most of my evenings were filled with conversation which reinforced a mindset of earnestness, intensity, disagreeableness, resourcefulness, and so on. Patrick Collison advises: “Figure out a way to travel to San Francisco and to meet other people who’ve moved there to pursue their dreams. Why San Francisco? San Francisco is the Schelling point for high-openness, smart, energetic, optimistic people. Global Weird HQ.” I think he’s right. Living here has changed me deeply. Probably other places would have had a similar (or a better?) effect on similar axes. For example, I like what conversation in Cambridge, MA does to my state of mind. But when I lived in Portland, OR, the environment tended to emphasize a different set of values: community, craft, sustainability, enjoyment. I liked these values, too, but I suspect they would not so naturally reinforce my current work.

Can you absorb a scene’s values through the internet? I certainly did to some degree as a teenager living in Saint Louis. Social technology has only improved since then. In many ways, distributed networks like Twitter transmit values more fluidly than live interaction. But my sense is that bandwidth limitations in interaction are significant. There are depths which are hard to absorb without constant physical immersion.

3. Working alone

I may be working independently, but that doesn’t have to mean working alone.

Executing alone

I’m very grateful to have a modest and possibly sustainable source of funding for my work. But as pseudonymous blogger Applied Divinity Studies writes (after receiving an Emergent Ventures grant of their own):

I suppose the expectation is that I’ll just save the money or spend it on rent. The implicit assumption being that I have a burn rate, this defines my runway, and money is used to extend the time I have to continue to do what I’m already doing.
But that’s crazy. This isn’t how any ambitious person spends money, nor is it a path to long term growth. Surely there’s some way I can spend this money to actually do better work, earn more money, and grow exponentially? … Surely there’s something I can do with the money other than give it to my landlord?

I’ve been writing about a related metric in my work since 2018: how much capital do I feel I could productively deploy towards my goals (annually, say)? In early 2019, it was a few tens of thousands; now I’d put the number at a million or two. It’s still not a high number, but at least it’s climbing. The difference is that I now understand the work well enough to imagine how I could accelerate it with a team.

I also understand an enormous challenge I face in working alone, doing the type of projects I’m doing. I might be a jack-of-all-trades, but practically speaking, I worry that I can’t do good research if I’m also doing all the implementation work. The simple version of the problem is that there are prototype ideas which I don’t explore because it would take too long, and there are many concepts I’ve sketched yet haven’t had time to implement. But there’s a deeper problem here.

My approach requires developing new software interfaces which express insights, then studying those interfaces and their use to generate new insights, and so on. Michael Nielsen and I called this “insight through making.” In practice, it’s quite difficult to think deeply about theories while in the midst of a significant software development project. They’re different states of mind. And it’s hard to build momentum on software development when spending much of one’s day in reflection, writing, and study. Worse: switching costs are high between software development and research thinking. I’ve not had much success when dividing my days or even my weeks into “building” and “thinking” blocks.

In March 2020, I wrote a list of research questions for the mnemonic medium, then embarked on building Orbit, which I planned to use to study those questions. Nine months later, I’ve made little progress on those research questions. I’ve mostly been building. At some point I’ll need to execute a “hard switch” back to thinking about those questions for a while, during which time it’ll be difficult for me to build anything significant for Orbit.

Maybe that’s fine. I just need to spend months at a time in one mode or the other, and I need to get more patient. But when I’m deep in software development, reaching flow on a daily basis, my mind narrows to a kind of tunnel-vision, totally fixated on the software systems and their problems. The problem isn’t just about escaping tunnel-vision in that moment. I’m worried that engineering mindset stunts my growth as a researcher (still quite nascent), even in the weeks following an intense period of implementation.

Another problem with cycling slowly back and forth is that feedback loops become too long. For example, in February, Michael Nielsen and I published an experiment involving a new kind of spaced repetition prompt, an “application prompt” which we thought might help readers apply what they read. We’ve run some initial analyses and interviews around those prompts, but in truth we’ve learned surprisingly little about them since their introduction, mostly because I’ve been focused on building Orbit. A very reasonable criticism of my year is that I’ve built a grand new observatory before I’d come close to exhausting what I could see with the one I already had.

I understand at least some of my projects well enough to accelerate them through execution-oriented teammates. In 2021, if I can find the right people, I’d like to experiment with volunteer and (funding dependent) contract collaborators. It would be easy for coordination overhead to do net damage to my work—so this goal will depend on finding the right people.

Culture and scenius

I may not want to join academia, but I deeply envy a good field’s intellectual kinship. Solo engineering has fairly obvious limitations; solo research presents subtler problems. I’m grateful for intermittent conversations with others doing related work, but for the most part it’s not enough.

I want to be part of a rich scenius of serious, capable people doing full-time original research on enabling environments. I want to attend colloquia in which I’m regularly stunned by the ideas presented. I want peers who will candidly observe the limitations of my ideas, then work with me to improve them.

I don’t yet know how to make this happen. This proto-field has (hopefully) a proto-scenius: many part-time tinkerers, many startups doing research-ish work when they can spare the time. Funding is one limiting factor for a bloom of full-time work. But culture is another. This scene, as far as it exists, mostly draws its norms and values from tech culture—just as I originally did. I like the influence of arts culture on this scene, supporting a more expansive, playful design orientation. But I worry that we need a significant injection of research culture to support the patient, probing, self-critical work which can yield transformative insights.

I recognize the irony in calling for more research culture after earlier maligning academic HCI. But my problem is with norms and goals at the field level; many individuals working in it could still provide great influence. And we could draw on researchers in other fields, as I’ll describe in the next section.

I’d like to focus more effort on this problem in 2021.

Serious contexts of use

The most serious problem for me in working alone is that it means I lack a serious context of use. I’m building tools for tools’ sake. As Michael Nielsen and I described elsewhere, this is one of the most common failure modes for work like mine.

The Apollo program created countless powerful scientific and engineering tools, but the point was putting people on the moon (and showing up the Soviets). Likewise, when Pixar created its revolutionary animation tools, many teams had been working on computer graphics for years, but Pixar’s systems emerged from a zealous pursuit of a storytelling dream. Mathematica is great because Wolfram built it to help with his own mathematics research. And so on.

Practically speaking, such contexts provide deeply meaningful feedback. Many critical insights about a prototype system will only emerge in the context of a serious creative problem that’s not about the system itself. But perhaps most importantly, these projects also provide the intense personal connection which makes great work possible.

Adjacent to my work, there are many “explorable explanations” which attempt to explain a subject through novel interactive media. Some of these articles are quite striking. But most are primarily motivated by the author’s interest in the experimental medium. Most authors’ skills lie primarily in building interfaces, rather than in whatever discipline they’re trying to explain. These authors’ medium ideas are limited by their lack of domain expertise and domain motivation. By contrast, it was important that Quantum Country had a serious context of use: to help earnest students learn all the fundamental principles of quantum computing and quantum mechanics, as well as several applications. This richer context was possible because my co-author Michael Nielsen is a pioneering quantum computer researcher.

So one recipe for insight through making might be deep collaboration between some colleagues focused on some serious domain problem which might benefit from augmentation (“tool-users”), and other colleagues looking to use that context to drive the creation of new environments (“tool-makers”). Our plan for 2020 was to start an experimental media studio. We’d produce ambitious media projects which would make some significant original contribution to deeply important questions (e.g. what to do about climate change?). These projects would be worth doing in their own right, but they’d also serve as a vehicle for the development of new enabling environments for thinking, communicating, acting. This plan depended on Michael’s special skills as a synthesist: I’d be primarily “tool-maker,” while he’d be primarily “tool-user.” So I had to shelve that approach when his plans changed. But I’ve spent the last few months trying to build relationships with domain experts who might make interesting collaborators in a similar vein. It’s too early yet to declare success or failure, but this type of collaboration is not easy to establish.

————————

What are the limits of independent research, and of independent funding? If independent research is a stepping stone, what new paths does it open? Is my position a fluke or an example? Can independent researchers be coordinated into a scene without traditional institutions?

I’m not sure how to answer any of these questions, but I’m grateful to be trying. Happy new year.

Your help requested! Quick survey on funding independent research

As the year draws to a close, I'm assembling some reflections on being an independent researcher in 2020. One important section centers on patronage as a funding model—and I could use your help to complete it!

Willing to help? Please click here to send me a quick sentence or two about why you're a patron. The form is anonymous.

Your response will help me validate and elaborate some ideas I'm considering in this space. Thank you in advance! 🙇‍♂️

Prompt makeover logs

My past few weeks have been filled with plenty of interesting conversation about the prompt-writing guide and the challenges of formulating knowledge, both in the workshops and in email correspondence. 

Amusingly, one of the richest discussions I've been able to observe has been José (aka Artir)'s conversation with himself as he criticizes his own recent prompts and revises them. You can see his notes here. One example:

Q: Does LDL particle size have any relation to CVD?
A: No
Problem: It’s a Yes/No question, the same question can be leveraged into an explanation question
Q: Why does an increase in LDL particle size not correlate with CVD?
A: Because what matters is particle count, LDL is all sufficiently small to matter for CVD.

Reading this made me realize that others would probably benefit from the same exercise! And others could benefit if you would share some of your notes and revisions as José has done. If you're game, please comment with links to your notes.

I should also mention: notes like this really help me!

As a researcher, my main "differentiator" is that I explore ideas by expressing them in real-world systems, so that I can learn from how they refract through the lens of serious use. Those observations let me improve the ideas, which let me improve the systems, and so on in a virtuous cycle. Michael Nielsen and I called this the "insight through making" loop. This approach differs from typical industry practices in that iterations are meant to express deep theoretical insights. And it differs from typical academic research by building authentic, real-world systems which provide richer fodder for generating theoretical insights. I certainly can't claim that my insight-through-making loop is operating smoothly, but that at least is the aspiration.

The prompt-writing guide has involved creating a tiny insight-through-making loop. To write it, I needed to distill a number of ideas about learning into something like a system—a framework communicable to readers. So the most rewarding part of the last few weeks has been seeing those ideas refracted through readers' interpretations, both through the workshops and through email conversation. Thank you all for that; hopefully, what I learn will translate into better theories and better systems.

Early access: "Translating knowledge into spaced repetition prompts"

I spent November working on a detailed guidebook and style manual for modeling knowledge and writing spaced repetition prompts. The guide will also be the first large public demonstration of Orbit: the text is interleaved with prompts meant to help readers internalize its content.

I'd like to give you early access: click here to read. (please don't share this with others yet)

This guide is meant to help bootstrap the next major phase of my work, which will revolve around working with authors. I'd like to experiment with Orbit across a variety of subjects and contexts, both to support memory and to experiment with some of the more exotic prompt types discussed in that guide and in my other writing (e.g. Timeful Texts).

To those who have participated in my prompt-writing workshops: thank you! Those discussions helped me improve both my understanding and the guide itself. If you read the guide before last Friday, you may want to flip back through to try the embedded prompts. Also, the final part (now titled "Prompt writing, in practice") has been mostly rewritten.

Comments, questions, bug reports are of course encouraged.

Invitation: "Translating knowledge," workshop on spaced repetition prompt-writing

I've been shifting my focus to helping authors learn to write with Orbit. As part of that effort, I'm developing a set of materials to help people develop a personal spaced repetition practice, since I believe that may be a practical prerequisite to using prompts in communication.

If you're interested in learning to internalize complex knowledge using spaced repetition, I'll be running some live workshops on that topic. As members of my lab community, you're invited!

Click here to sign up, and I'll follow up with details. Note that it's a 90 minute workshop with 30–45 minutes of pre-reading, so it's a somewhat serious commitment.

That link's just for you all, so please don't share it—though if you think someone would be excited about this workshop, this might be a nice time to suggest they become a member. :)

You'll also be helping me learn about people's challenges and experiences with this medium: I just ran the first workshop with some kind authors yesterday and learned a great deal. I'll share those insights here as I understand them better.

Liquid olives and iPhones; problem-solving and problem-finding; The Uncertainty Mindset

Growing up years ago in the midwest, my perception of a fancy restaurant was awfully simple. Firstly, fancy restaurants have fancy waiters who make you feel uncomfortable for using the wrong utensil. And secondly, fancy restaurants use fancier ingredients: the menu might include ribeye and lobster instead of hamburgers and barbecue. But then I moved to California, and—to make a long story short—a bowl of tapioca made me weep. I learned that great restaurants are rarefied institutions of extraordinary innovation and artistry.

Around this time, I was an intern at Apple by day, and I was devouring high-end cooking texts by night. I felt a great deal of connection between the two worlds. They shared an essential shokunin-style craftsmanship important to me at that stage of my career: yes, invention matters, but first we must render each pixel perfectly, cut the brunoise precisely. But—and this was fairly mysterious to me at the time—both worlds also housed practitioners who thought the unthinkable, who pushed their fields forward. How do those two worlds co-exist? How do these teams operate?

At the world’s best restaurants, it’s not enough to deliver a great service every night. It’s an incredibly competitive sphere. Sitting still means falling behind. Apple’s situation is quite similar: the iPhone might be one of the most astonishing consumer products ever designed, but it’s not enough to simply manufacture the same device each day. Likewise, the world’s best restaurants spend lavishly on dedicated R&D operations which develop new cooking methods, source and understand new ingredients, invent surprising process improvements, and evolve the restaurants’ unique culinary styles.

I’m quite envious of the years Vaughn Tan spent in preparation for his new book, The Uncertainty Mindset. He embedded himself in the culinary labs of some of the best restaurant groups in the world, studying how they opened new locations, solved problems, developed new dishes—and convened a major conference in the mud beneath a circus tent. From his observations, Vaughn distilled several shared but unusual traits in these teams’ organizational practices. I was struck, reading the book, by how similar these practices were to the best of my experiences at Apple. I’ll share some of those stories and how they relate to Vaughn’s stories of culinary R&D teams. As excited as I am by this book, I’ve noticed that its practices seem to align less well with my experiences as they’ve shifted increasingly towards the “R” side of R&D. I'll try to characterize that evolution too.

Continuously-negotiated roles

In startups, roles are fluid. Everyone wears many hats: what’s important isn’t your job description but the problems which need to be solved. As companies get larger, though, abstraction becomes important. It’s awfully hard to have thousands of employees without clear job titles and areas of responsibility. Somehow, though, at least in the early days of iOS, Apple managed this. Roles were, as Vaughn puts it, “modular and provisional.” We fluidly assembled for each new problem, composed to provide the skills needed for that particular project.

My title was “software engineer,” but that’s not a useful descriptor for assembling a team in practice. It’s both too broad and too narrow. First: what kind of software engineer? Initially my experience was mostly in interfaces and API design, but I’d eventually develop skills in architecture and systems engineering which would get me pulled into a completely different set of projects. More importantly, though, my role evolved to include meaningful design and product management work, partially in response to the demands of projects, and partially in response to my interest. This was great for me personally—it kept me interested—but also great for the organization, since projects like parallax required interdisciplinary teams with unusual combinations of skill sets.

Vaughn argues that this kind of fluidly evolving role definition is essential for innovation in teams because of the activity’s inherent uncertainty. Consider el Bulli’s famous liquid olive: an ovoid appearing to be an olive arrives on a spoon before you, but when you pop it into your mouth, you discover that it is in fact a paper-thin membrane surrounding an explosive olive juice. To make one of these spheres, chefs drip juice into a bath of sodium alginate which reacts to form the delicate skin. Imagine that you’re hiring for a culinary R&D team, and you want to hire the chefs who would invent this technique. What title do you put on the job req? What qualifications? Material science? Chemical engineering? Pastry cookery? This prompt assumes, of course, that you’re even able to specify “invent the liquid olive” as the problem statement—which you can’t. The activities of such a team are inherently uncertain. Not in that they’re risky: risk is something you can understand, plan for, manage. Instead, they’re just laden with unknown unknowns. Your team members’ roles must deal with this inherent uncertainty, which itself shifts over time.

Demo-driven development

Of course, not every role in a kitchen is so nebulous. You will need an army of prep cooks who are simply prep cooks. And likewise, the vast majority of Apple’s software engineers were much more straightforwardly software engineers. Apple needs most of their software engineers to be happy just being software engineers. The process which separated those more unusual engineers at Apple sounds a great deal like the one Vaughn describes culinary teams using.

My journey away from being a straightforward software engineer relied on the process of “demo-driven development” which my mentor Ken Kocienda described in Creative Selection (recommended!). Our work revolved around a regular cadence of demonstrations. When I was working on the swipe-left-to-right gesture for navigating backwards on iOS, I demoed my work every day or two to designers, engineers, and executives. Craig Federighi, head of Apple's software engineering, would swing by my office, play with the gesture, notice what was working and not working, make suggestions. All this helped make the gesture itself good. But it also served an important organizational purpose: these projects and demos functioned as ongoing public tests. Everyone got to see how I handled (or failed to handle) the various problems which came up. When the next project came around, this would help leadership put together better teams. Because my peers were also present for many of these demos, they could see for themselves where I was doing well or poorly, which would both inform their own work and reduce feelings of resentment or confusion when future project assignments were made.

If you’re going to have a team with continuously negotiated roles, you need a context for that continuous negotiation. These demos unified the “tests” with the real work. Vaughn describes culinary R&D teams following a similar path, testing team members by asking them to solve a wide variety of problems in different operational configurations. Each day ends in a group tasting, just as our days often ended in a group demo.

This project-driven approach also creates a fertile training ground for new employees. The constant feedback is a great way for new team members to learn the house “style.” Different mentorship collaborations can be tried and discarded. This makes training feel a bit haphazard—and sometimes quite stressful—but it’s difficult to create consistent formal training programs when pursuing open-ended goals, for the same reason that it’s difficult to precisely specify anyone’s job description on such teams.

Desperation projects

Another common practice for culinary R&D teams overlapped a great deal with my experience at Apple: the strategic use of desperation. Excerpting from Vaughn:

Every point in the pattern looked like this: commit to a project beyond the team’s ability, freak out individually and collectively, work like mad, somehow pull victory from the jaws of defeat, breathe a massive sigh of relief. I ran into people from each of the teams periodically. When they were in the middle of one of these projects, they seemed desperate: emotionally and psychologically exhausted, worried (slightly terrified was often a better description) that things wouldn’t work out or (worse) would be disastrous. The teams seemed unable to learn from their mistakes and avoid these desperation projects. In fact, they kept committing to doing them. They would heave a sigh of relief that they’d scraped by and then—the next month or the next year—find something else to do that would make them desperate again.
Eventually, I came to understand that they put themselves into these terrible situations as a way to force themselves to innovate, that the desperation was productive, not destructive. It was desperation, but by design.

In late 2012, I landed after a long flight and shared in a ceremonial moment as three hundred of us simultaneously disabled airplane mode. Immediately, my phone was flooded with notifications: Scott Forstall (the executive primarily responsible for the creation and first few iterations of the iPhone) had been fired. Jony Ive, formerly responsible only for hardware design, would take on software design as well. In late November, a truly desperate project was declared. Jony wanted to redesign the entire operating system and every app we shipped. We’d be putting it in developers’ hands in early June. This would have been an insane proposition even if we were only planning to change the OS’s outermost skin, but the initial ambitions, at least, went far beyond that.

Just for flavor, one project I pursued was based on the observation that white things in the physical world are never actually white. They always take on some character of their surroundings, shifting with perspective and external lighting conditions. Perhaps we could develop a special material—“digital white”—which would embody a similar subtle dynamism. My prototypes used the gyroscope to create a subtle living shimmer, almost like a blind debossing on a book cover. The idea was to use this to indicate interactive elements on the screen, since we were stripping away the skeuomorphic trappings of buttons. We ended up tossing this idea: it was too heavy in power consumption—and in Jony’s words, it was “a bit… carnival.”

But still, that was one of at least half a dozen similar major, system-wide projects I led in those seven months. My routine was simple: each day, I woke up, rolled over, grabbed my laptop, and didn’t put it down until I went to sleep, every day for seven months. We worked from deep desperation, but as Vaughn describes, it was absolutely one of the most exhilarating and dynamic periods of my life.

Problem-solving and problem-finding

While I read Vaughn’s book, I felt a deep connection between my experiences at Apple and his stories of the culinary teams. But the connection was much weaker to my time at Khan Academy, and weaker still to my more recent research. I can’t characterize the difference completely, but I think it may rest on a distinction between innovation and invention, between problem-solving and problem-finding.

I joined Khan Academy with my friend May-Li Khoe, who had been heavily involved in innovative work at Apple. We started an R&D group together, hoping to bring this kind of exploratory work to Khan Academy. Its charter meandered substantially throughout the five years I was there, and in those meanderings I feel I can trace some limitations of the approaches Vaughn describes.

For some of our projects (like scaling open-ended problems online), our team acted almost like an in-house consultancy for the rest of the organization, creating solutions to some problem which could be at least vaguely defined. This was similar in many ways to my time at Apple: yes, we were creating interesting new interfaces, but in response to some sort of exogenous problem statement. In these situations, the problem statement is basically never fully defined—part of the project is negotiating what the problem to be solved really is, “ripping the brief” and so on. Part of the work between projects is lobbying for future problems-to-be-solved to executives. But the problem statement’s presence anchors the project and makes its edges finite. Regular demos make sense because you can evaluate them in the context of the evolving problem statement. Desperation can apply useful pressure because you can make pragmatic tradeoffs, perhaps sacrificing some bold artistry in pursuit of some practical solution.

But some of our projects (like Cantor) were more about problem-finding than problem-solving. There was no clear problem statement. There wasn’t really a client or a customer. In fact, the most important work to be done was in identifying a powerful problem statement—and for many of these efforts, we never did! The interesting parts of my work today are mostly of this kind. Yes, there are specific problems to be solved, many of which I’ve discussed here, but the most powerful forces in my work are about problem finding. This work felt much less connected to Vaughn’s descriptions than my other experiences had.

Why is it that knowledge workers seem so fundamentally unserious about improving their fundamental skills, compared to athletes and musicians? How might we create environments which do the job of a book, but which participate more actively in the impact the book hopes to have? These are already very unusual problems, and just posing them is a significant contribution. But these big-picture problem statements shatter fractally into a hundred sub-problems, and most of the progress in my work comes from identifying and improving articulations of these sub-problems. Actually solving those problems is important, of course, but that’s downstream.

When problem-finding, theories and concepts are often the locus of activity. Rapidly iterative demos and prototypes become much harder to produce—and, at this stage, much less relevant. Desperation becomes, at least in my experience, a much less helpful force. When it’s not yet possible to sharply articulate a clear problem to be solved, pouring jet fuel on the fire just produces a haphazard eruption. Long walks and hours-long lunch discussions are the order of the day.

I confess I don’t understand this distinction very well. My experience suggests that a certain amount of “demoing” and a certain amount of desperation are actually quite helpful in research. But I notice that most stories Vaughn describes are of a culinary R&D team reacting to an externally-defined problem (though perhaps a vague one): a palate cleanser is needed on this tasting menu; these cannelloni are produced inefficiently; this restaurant needs help opening; and so on. There are some glimpses of problem-finding, like the liquid olives from el Bulli described above, but these aren’t documented as clearly as the others, and it’s not clear that the stories follow the same principles quite so sharply. After all, as I understand it, it was Ferran Adrià himself (the head chef) who developed that technique, not some R&D team in his employ.

Is this the distinction between invention and innovation? Research and applied research? Academia and industrial R&D? The lines are fuzzy, and I don’t claim to understand them. Vaughn’s book is the most insightful treatment I’ve read of industrial R&D organizational practices—and now I’m hungry for more, focused more on the “R” side of the equation than the “D”!

My thanks to James Cham, both for recommending this book and for nudging me to compare its ideas to my own experiences.

Working with authors: text-writing requires prompt-writing requires text-writing

A few weeks ago, I had something to celebrate: Orbit had reached the point that authors could publish texts using it. Normally, when software projects reach a major milestone, there’s something highly visible that “goes live”—something to link others to! But in this instance, it’s a piece of infrastructure that’s “gone live”; now the work is in helping others use it to create something meaningful.

As I’ve worked with authors, I’ve realized something important I should have seen much earlier: authors who don’t already have a successful personal spaced repetition practice have a really hard time writing effectively with Orbit. This makes perfect sense in hindsight. Michael and I had already been using Anki for years; we drew heavily on that experience when creating Quantum Country. If we’d come in cold, without ever having written spaced repetition prompts, there’s no way we could have done it.

I’d been thinking of the challenge for authors in terms of “communicating effectively with prompts,” and that’s real—but they also need the basic skills of knowledge modeling that come with a personal practice. The challenge is that now my dependency graph has a cycle in it.

One of Orbit’s main theses is that writing good prompts is a huge barrier to widespread adoption for spaced repetition systems, but we can significantly lower that barrier by interleaving expert-authored prompts into contextualizing narrative. As you read more expert-authored prompts, you’ll absorb how to write in the medium yourself. Then the barrier will be lower to writing your own prompts and establishing a personal practice.

But if you need a strong personal spaced repetition practice to write texts with Orbit, and you’re meant to develop a personal practice by reading texts written with Orbit, we’re in trouble. We’ll need to bootstrap the cycle. Maybe we can start with many authors who already have a strong personal practice… but there are so few such people that this seems unlikely.

I think a more plausible approach is to put a lot of effort into helping authors develop the skills needed for a strong personal practice. I expect this to be an easier task than helping the general population learn to write prompts well: authors already think deeply about representing ideas in words, and they care a lot about precision.

I had thought I’d start working with authors by writing resources about how to communicate well with prompts. But now I see that the first step to writing good texts with Orbit for others is being able to write good prompts for yourself. So I’m spending this month writing a small handbook on prompt writing. I’ve invited a number of authors to join me for a small workshop to activate those concepts and to help me iterate on the handbook. If it goes well, I may hold more workshops for wider audiences. In any case, I’ll publish the handbook publicly (and first for you all), assuming it ends up worthwhile enough to publish. I’m planning to use Orbit when writing the handbook, to help readers retain the key ideas and techniques.

There have been a number of articles about writing good spaced repetition prompts, but they all feel too structureless—loose bags of tips like “avoid lists.” I think it’s possible to understand in terms of more fundamental principles what makes some prompts work and others fail.

In particular, one model that’s helped me is to understand that when you write a spaced repetition prompt, you are giving your future self a recurring task. Prompt design is task design. If your goal is to build recall, the purpose of those tasks is retrieval practice. That’s a principle from cognitive psychology which I’ll unpack in more detail in the handbook, but in short, you must design tasks which, when enacted, require you to retrieve the knowledge in question.

The process feels surprisingly similar to translating text between languages. When translating a passage, you’re searching for words which, when read, light up a similar set of bulbs in readers’ minds to those which might have been activated by the original language. It’s not a rote operation. If the passage involves allusion, metaphor, or humor, you won’t translate literally; you’ll try to find words which recreate the experience of reading the original for a member of a foreign culture. When writing learning-oriented prompts, you’re performing something similar to language translation: which tasks, when performed, require lighting the bulbs which are activated when you have that idea “fully loaded” into your mind?

If the idea is fairly simple, it may be possible to directly conceive a task which reliably lights all those bulbs (and not other extraneous ones!). But when the idea has too many important facets, it’s hard to design a task which reliably stimulates all those elements. So you have to break the idea down into many tasks. This is where knowledge modeling comes into play. Given a piece of knowledge, how do we express its constituents and relationships? The answers differ for declarative, procedural, and conceptual knowledge, but there are consistent frameworks one can use for each.

————————

Housekeeping note: if you live in Europe, you can now choose to fund your membership payments in pounds or euros instead of dollars. My thanks as always for your kind support. I can’t believe this little lab community now has over 400 people in it!

Preview: a brief explanation of Orbit

I've been tackling a tough writing challenge: briefly introducing Orbit, explaining how it works, and sketching what it aspires to.

The context is that, at least initially, people will encounter Orbit as a small embedded widget in a web site. There's not much room for explanation there, so the widget has a "learn more" button for people interested in more details. For now, I'll treat that page as a general-purpose explainer, linkable in other contexts.

So here's my attempt at distilling Orbit into a few hundred words. It's probably still much too long. I'd love your feedback: https://app.withorbit.com (n.b. mobile layout is not yet implemented!)


The carrying capacity of a regular memory practice; deliberate practice and flow

Slow, compounding progress is a subtly powerful force. Regular weightlifters might not perceive their progress in every session, but as the weeks go by, they’ll find they can handle loads which would previously have flattened them. Richard Hamming makes a similar observation for intellectual efforts in The Art of Doing Science and Engineering:

I had worked with John Tukey for some years before I found he was essentially my age, so I went to our mutual boss and asked him, “How can anyone my age know as much as John Tukey does?” He leaned back, grinned, and said, “You would be surprised how much you would know if you had worked as hard as he has for as many years”. There was nothing for me to do but slink out of his office, which I did. I thought about the remark for some weeks and decided, while I could never work as hard as John did, I could do a lot better than I had been doing.
In a sense my boss was saying intellectual investment is like compound interest, the more you do the more you learn how to do, so the more you can do, etc. I do not know what compound interest rate to assign, but it must be well over 6%—one extra hour per day over a lifetime will much more than double the total output. The steady application of a bit more effort has a great total accumulation.

It’s a nice heuristic, but it’s not so easy to see how to carry this out. Most knowledge work activities don’t actually compound in this way quite so reliably. But as has been described elsewhere, memory systems do seem to compound in this way. Small amounts of marginal effort yield compounding returns in one’s ability to recall a given piece of knowledge.

Most people have a regular exercise practice, a regular email practice, a regular news-reading practice. Might a regular memory practice someday become a mundane entry on that list? If so, what might the impact be? And what must have occurred to bring this change about?

We can begin by looking at some simple numbers. Say that each person is willing to review for about ten minutes a day. Even using Quantum Country’s extremely simple algorithm, we could add about 14 new questions per day while remaining under that time limit (assuming an accuracy rate of 90% and an average of 6 seconds per question). This amounts to roughly 5,100 questions per year. For reference, Quantum Country contains about 200 questions. So your practice could accommodate ingesting one Quantum Country-sized text every two weeks. The time factor scales linearly, so you could double your carrying capacity by reviewing for twenty minutes instead of ten.

I’ve previously noted that I feel most researchers in this space over-index on optimization and fancy scheduling algorithms. But in this efficiency-focused context, we can see that those choices really do matter! For instance, if we’d implemented some more accurate scheduler which reduced the average error rate from 10% to 5%, my simple simulator suggests you could add 16 questions per day instead of 14. That would come out to 730 extra questions over the course of a year—almost four extra Quantum Country-sized books. A regular memory practice exhibits outsized returns to small increases in scheduler performance.
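The arithmetic behind these figures is easy to check. Here is a minimal sketch which only re-derives the numbers quoted above. It is not the simulator: the rates of 14 and 16 new questions per day are taken as given, and every name in it is made up for illustration.

```python
# Back-of-the-envelope check of the carrying-capacity figures quoted in the text.
# The daily rates (14 and 16) are inputs from the text, not something derived here.

SECONDS_PER_QUESTION = 6        # average answer time assumed in the text
QUANTUM_COUNTRY_QUESTIONS = 200 # approximate size of Quantum Country


def reviews_per_day(budget_minutes: int) -> int:
    """How many individual reviews fit into a daily time budget."""
    return budget_minutes * 60 // SECONDS_PER_QUESTION


def yearly_capacity(new_per_day: int, days: int = 365) -> int:
    """New questions absorbed per year at a fixed daily rate."""
    return new_per_day * days


def days_per_text(new_per_day: int, text_size: int = QUANTUM_COUNTRY_QUESTIONS) -> float:
    """Days needed to take in one text of the given size."""
    return text_size / new_per_day


baseline = yearly_capacity(14)            # 5110, i.e. roughly 5,100 per year
improved = yearly_capacity(16)            # 5840, i.e. roughly 6,000 per year
extra = improved - baseline               # 730 extra questions per year
extra_books = extra / QUANTUM_COUNTRY_QUESTIONS  # 3.65, "almost four" books

print(reviews_per_day(10), baseline, improved, extra, extra_books)
```

Ten minutes at six seconds per question is a budget of 100 reviews a day, so at 14 new questions per day each question has to settle at about seven reviews per year on average over the long run. That makes it plausible that a modest drop in error rate frees up room for a couple more new questions each day.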

So what? What do we imagine the impact would be if people regularly durably ingested 6,000 atoms of knowledge each year? Well, if they used this newfound ability in the way most people start using memory systems—that is, for learning unimportant trivia—such a memory practice would mostly be wasting people’s time. These systems’ promise lies in using them not for meaningless factoids but for engaging more deeply with whatever matters most to you.

As Michael Nielsen and I wrote:

One of the ideas motivating Quantum Country is that memory systems aren’t just useful for simple declarative knowledge, such as vocabulary words and lists of capitals. In fact, memory systems can be extraordinarily helpful for mastering abstract, conceptual knowledge, the kind of knowledge required to learn subjects such as quantum mechanics and quantum computing. This is achieved in part through many detailed strategies for constructing cards capable of encoding this kind of understanding. But, more importantly, it’s possible because of the way the mnemonic medium embeds spaced repetition inside a narrative. That narrative embedding makes it possible for context and understanding to build in ways difficult in other memory systems. … In some sense, Quantum Country aims to expand the range of subjects users can comprehend at all. In that, it has very different aspirations to all prior memory systems.

Apart from the aspirations mentioned there, the mnemonic medium aspires to solve the biggest outstanding barrier to regular memory practices of the type I’m describing: the challenge of writing questions. It’s quite difficult to write questions which effectively encode abstract, conceptual knowledge. Even once one’s acquired the skills involved, it’s quite taxing. Reviewing 100 conceptual questions will take me a few minutes; writing 100 good conceptual questions will take me hours. What good is a system with a carrying capacity of 6,000 questions per year if you can only spare the time to write a few hundred?

The mnemonic medium solves this problem by supplying expert-authored questions. Of course, other SRS platforms allow users to share questions, but our experiences suggest this only works well for fairly simple declarative knowledge because the questions are highly atomized. Each little question must stand on its own, presentable in any order at any time. But that makes it difficult to use the questions to effectively communicate an idea which is itself highly structured and ordered. In practice, ideas generally depend on other ideas, and emotional salience requires relating those ideas to a larger whole. By contrast, as we noted in the excerpt above, the mnemonic medium gives narrative structure and meaningful context to these questions. At least with Quantum Country, this seems to have allowed us to create an effective set of expert-authored prompts without encountering the problems which shared questions usually suffer.

But I’m not excited about a regular memory practice which consists solely of ingesting and retaining knowledge others have written. My hope is that as more high-quality books are written in the mnemonic medium, these expert-written prompts will act as a scaffold which will help people develop fluency in writing their own. We’ve received some feedback from Quantum Country users along these lines, but of course it’s too early to tell.

Texts like Quantum Country might help people learn to write their own knowledge-encoding prompts, but there are lots of other interesting ways to use a regular memory practice. I use mine to keep my burgeoning theories top of mind, to study my past decisions, to provide myself with aesthetic kindling, to modify habits, and so on. Separately, Michael’s described how one can use these systems to “see through” complex ideas. These more agentic, purposeful practices are currently inaccessible to most memory system users, but I think authored texts can scaffold users here as well.

———————————

Merlin Mann came to prominence writing about “Inbox Zero” and practices for dealing with one’s to-do lists. But over time he grew disillusioned with that culture and with the framing of his own past suggestions. He observed: look, do you need a fancy digital to-do list or a lifehacking blog to make sure that you play video games? No? Maybe that’s the problem you need to fix with your to-do list.

Sure, that’s an overstatement when taken literally—akrasia is real etc. But it’s worth asking: when we say “regular memory practice,” are we imagining an “ought-to” or a “want-to” habit? That is, is it something like flossing, which you don’t really enjoy but which you do anyway for the long-term benefits? Or is it more like the habit of enjoying a glass of wine at the end of a hard day?

When spaced repetition is working most efficiently, it feels like you’re failing all the time: each session aspires to ask you questions you’ve almost forgotten the answer to. To put it another way, spaced repetition practice which optimizes for memorization efficiency should not produce flow. It’s closer to what K. Anders Ericsson calls deliberate practice: a planned training activity focused on producing performance just beyond your present abilities, with fast-paced feedback and reflection. Deliberate practice is generally not pleasant.

But when I engage with memory prompts which are more about visualizing past experiences, or remembering the key factors for an important decision, those aren’t really about “pushing my performance level.” Those activities are closer to catechism, and they can produce flow states.

Can these two activities coincide in the same practice? The emotional experience and the objectives are fairly different! But maybe that’s OK: people like unified inbox interfaces which combine their personal and work email accounts.

I’m not sure how to thread this needle. I’m worried that a discussion framed around efficiency and carrying capacities takes us in a dutiful direction. Perhaps it’s possible to reject the oughta/wanna dichotomy by designing activities which yield enduring benefits but which also appeal emotionally, even in the moment.

For inspiration, I’ll leave you with pianist Nahre Sol, who has many videos on her YouTube channel depicting how she makes repeated passages musically interesting through composition and improvisation during her practice. (Unfortunately, Patreon doesn't let me embed YouTube videos inline; click here to see her video!)


The galaxy brain problem; speed-running UIs

I’ve been spending a lot of time these last few weeks trying to make Orbit explain itself.

In Quantum Country, the essays themselves are, in part, essays about the mnemonic medium. The first essay spends 1,000+ words introducing, motivating, and elaborating the system; later essays spend hundreds of words. But we don’t want every author who uses Orbit in a book or article to have to write a lengthy introduction to the medium. The system needs to introduce itself—at least in part.

This is quite a challenge! If we make people click through a tutorial or something before they can use the embedded prompts, many people will simply leave. But we can’t delay too much explanation, since we need people to sign up in order to send them their review sessions.

Exacerbating this problem is a challenge typical to products with new paradigms. I’ll call it the “galaxy brain problem.”

See: the concise, easily-understood way to explain Orbit is: “effortlessly remember what you read.” But that’s too reductive. It sells short what the medium can do.

A more elaborate explanation is: “helps you understand complex topics more deeply, efficiently, and reliably.” Unfortunately, this is also much less concrete and immediately informative than “effortlessly remember what you read”. And still, Orbit’s aims are wider than either of these summaries.

As I’ve described (e.g. in Timeful Texts), an even broader framing is that Orbit keeps you in contact with material over time, potentially in a programmable way. It’s a way to “bring ideas into your orbit.” That phrase does a much better job of capturing what Orbit’s about… but it’s also mostly meaningless to anyone without context.

Finally, the most general, galaxy-brain-ish instantiation is: “Orbit is a system for orchestrating dynamically-scheduled microtasks.” This captures the “programmable attention” sense, beyond the instantiation in media you might read. But that explanation is totally meaningless to basically anyone outside this audience.

So what do we do? Right now, my approach is to initially present Orbit as a way to remember what you read, but to “season” that presentation with phrases representing the broader aspiration as you get deeper into the system. Then follow-up emails and content at the end of the review session will slowly unpack more of the ideas.

So your first glimpse of Orbit might look like this. Note that the banner text is introducing a shallow framing of what this is about: “Quickly review what you just read.” And there’s coach text by the button along those lines too.

The banner and coach marks continue to unpack the interface as you proceed through the review. Then once you finish, you see this longer explanation, which starts to introduce the broader aspiration. It’s still too wordy, but I’m at the point where I struggle to remove more detail while still giving someone enough motivation to make an account for yet another service.

When Orbit initially launches, I probably won’t have much more than this in place in terms of storytelling. That’s OK—I’ll add the extra pieces with time. This stuff is so hard!

————————

Have you seen /r/speedrun? This is a community of people who “speedrun” games. So they do things like try to beat Mario 64 as quickly as possible, exploiting bugs in the game to improve their time. But there are more exotic speedruns. Like speedrunning Windows 95 setup.

Michael and I noticed one behavior that makes Orbit’s onboarding design problem extra challenging: people speedrun interface text. Show me a modal dialog? I’m gonna find the “dismiss” button as fast as possible. Some tutorial screen I need to read? Click click click. We found that even people who are serious about the ideas in the interface do this—heck, we do this ourselves! Often, this behavior is justified: this kind of text is usually bad marketing copy, or text trying to paper over poor interface design. But even when we write interface text very carefully, showing great respect for the users’ time, people still don’t read it.

This is in stark contrast to essay text. People read essay text surprisingly carefully. We had people quote back to us whole sentences, verbatim, from the Quantum Country passages about the mnemonic medium. It’s not just that this prose is written by more capable authors: the same people wrote the interface text!

The author of the essay’s prose had worked hard to build credibility and trust with the reader—to show the reader that their time was being respected. One strategy we pursued was to trade on that trust in Quantum Country’s interface: we could use the same tone, the same diction, the same phrases in both essay prose and interface prose to unify the apparent authorship—and to perhaps get people to stop speedrunning the interface. This wasn’t a complete solution by any means, but it’s not a strategy we can use with Orbit.

I think my next best approach is to create essay-like prose contexts to talk about Orbit—contexts which aren’t in interfaces, so they’re not something to be speedrun. For instance, I might write a series of essays on various ideas about the medium and how it might be used, then send those out over time as a newsletter to Orbit users. Or maybe just let them circulate naturally around the web, via social media and the like.

I continue to make progress! We’re in a high-craft phase—the last 10% always is—but week over week things are feeling better and better.


“Skip”: exponential-backoff deferral mechanisms and fuzzy inboxes

If you search Google Scholar for research on spaced repetition systems, you’ll find a sea of papers focused on optimization. They’re tuning algorithms for less forgetting, more stabilization, better scheduling. To be clear, these are worthy aspirations. If we can improve response accuracy even just a couple percentage points, the exponential schedule magnifies our gain, substantially increasing the capacity of our system. But this extreme focus on efficiency misses the highest-order bit: almost no one actually uses these systems, so their practical efficiency rounds to something near zero. As Gwern framed this problem: “If you’re so good, why aren’t you rich?”

Happily, memory systems are already extremely efficient. Having experimented with these systems for a few years, I’ve come to believe that the critical thing to optimize is emotional connection to the review session and its contents, and conversely, to ruthlessly minimize elements which provoke a sigh. It’s a continuous process. These systems, left to their own inclinations, decay to produce dutiful sessions which feel disconnected from anything that matters to you.

Some techniques can help up front: don’t “stockpile” material for some future day; avoid writing prompts in response to a feeling of “should”; write about connections, consequences, implications—not just about facts; avoid tangential “orphan questions”; etc. But this isn’t enough. Review sessions need ongoing grooming to stay interesting. If you don’t add anything new for a while, your sessions will feel stale. If you often forget an answer, you must refactor the question or risk future eye-rolls.

For me, the most important type of maintenance concerns my shifting interests. I might stumble on a fascinating paper and write a few dozen prompts to help me internalize the details. Now fast forward six months. In the happy case, I’m thrilled when these questions recur because they reconnect me to the ideas of this fascinating paper, which now appear richer through the lens of my intervening experiences. Much of the time, I simply feel grateful to still remember the details. But sometimes, I no longer find the paper very interesting; the questions just feel like a chore.

If these review sessions are to be a long-term habit, it’s important not to spend time with prompts about things that don’t matter to you. Without culling, sessions stop feeling valuable and start feeling like a chore. Long before sessions feel like a chore, unimportant questions will dull your focus on their neighbors.

The problem is that culling is hard. It’s not functionally hard: the app has a simple delete button. It’s emotionally hard. Much of the value of computerized memory systems is that they save you from having to make endless tiny decisions about what to do and when. I see memory systems as an example of a more general class of “programmable attention” systems. Making a decision to delete a prompt is a relatively weighty decision. It might be nominally undoable (e.g. from a trash bin) but in practical terms, you know that you're unlikely to restore a deleted question. The irreversibility of the decision feels misaligned with the slight emotional aversion the prompt produced. So most people (myself included) will tend not to delete as much material as they should, which means their sessions will be less emotionally connected than they should be.

I believe this problem arises not just in memory systems but in email inboxes, reading lists, overflowing browser tabs, to-do lists, and so on. I’ve been exploring a mechanic for a fuzzier, less-destructive alternative to “delete” operations.

Imagine that this is what you see when you’re reviewing a prompt:

Note the button in the bottom-left: “Skip.” We could also call it “Not now,” “Later,” “Defer,” or simply: “Nah.” When we press it, we simply move on to the next prompt. The current prompt will reappear in, say, a few weeks. Maybe then we’ll be more interested; if we decide to answer at that point, the review schedule will continue as normal. But if we hit “Skip” again, we’ll defer the prompt for a few months. At which point if we skip it again, we’ll defer it for a year or more. So after a few consecutive “skip” operations, we’ve effectively “archived” this prompt, but we never had to make the destructive, non-contiguous decision to remove it. Instead, we just respond to our emotional reaction in each moment. If a prompt elicits “nah,” this mechanism gives that emotion a safe outlet. It’s “fuzzy delete.”
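To make the mechanic concrete, here is a minimal sketch of how “Skip” might be modeled. Everything in it is an assumption for illustration: the specific delays (21, 90, and 365 days standing in for “a few weeks,” “a few months,” and “a year or more”), the `Prompt` record, and the function names. None of this is Orbit’s actual implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical deferral ladder: "a few weeks", "a few months", "a year or more".
SKIP_DELAYS_DAYS = [21, 90, 365]


@dataclass
class Prompt:
    text: str
    due: date
    consecutive_skips: int = 0


def skip(prompt: Prompt, today: date) -> Prompt:
    """Defer the prompt; each consecutive skip climbs one rung of the ladder."""
    rung = min(prompt.consecutive_skips, len(SKIP_DELAYS_DAYS) - 1)
    prompt.due = today + timedelta(days=SKIP_DELAYS_DAYS[rung])
    prompt.consecutive_skips += 1
    return prompt


def answer(prompt: Prompt, today: date, next_interval_days: int) -> Prompt:
    """Answering resets the skip count; the regular review schedule takes over.

    The next interval is passed in because the ordinary scheduler is out of
    scope for this sketch.
    """
    prompt.consecutive_skips = 0
    prompt.due = today + timedelta(days=next_interval_days)
    return prompt


# Usage: three skips in a row push the prompt out by weeks, months, then a year.
p = Prompt("Example prompt text", due=date(2020, 1, 1))
skip(p, today=date(2020, 1, 1))   # due again in 21 days
skip(p, today=p.due)              # due again 90 days after that
skip(p, today=p.due)              # due again 365 days after that
```

In this sketch the delay is capped at the last rung, so a prompt which keeps getting skipped resurfaces about once a year rather than vanishing entirely. Whether the ladder should keep growing instead is a design choice the text leaves open.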

I believe this mechanism can be usefully applied to anything shaped like an inbox. I’ve experimented with it for reading and writing lists, and I’d like to expand that. Here’s a springboard to more general working notes on the “fuzzy inbox” topic.

This kind of mechanism is important not just in review sessions but in the reading experience of the mnemonic medium. Quantum Country assumes a goal of completionism: readers are expected to answer every prompt in the essay. That’s certainly appropriate for some texts, but in more casual texts, readers may have a wider range of prior knowledge and interests. The completionism requirement also pushes authors around: it restrains them from writing prompts which many readers would likely find interesting, but which feel inappropriate to make mandatory.

But if we handled that by asking readers to pick and choose which questions they’d “bring into their Orbit,” we’d create much the same emotional friction as with deletion. It’s too weighty a decision, too early. Many people would feel hesitant to drop questions they don’t really care about. In many cases, since they’re still reading about the topic, they may not even know what’s important yet! So this same fuzzy mechanism seems useful in the context of the initial reading experience.

One challenge for this mechanism is that the feeling of aversion which accompanies material you don’t care about is somewhat similar to the feeling of aversion which accompanies challenging material. We don’t want to encourage people to skip questions for the latter reason: desirable difficulties are important to learning! I’m not sure how to address this conflict within the interface: at the moment, the point (like many others) will have to be communicated through culture and the halo of “canonical literature” around the tool. This challenge is more fundamental than it may seem. Memory systems achieve their remarkable efficiency by scheduling prompts for when they should feel hard to answer. This is probably a major barrier to adoption: review sessions focus on the material you have the most trouble remembering, which both makes them somewhat unpleasant and also makes it appear that the system’s not working!

Another, more parochial challenge is that I’m not sure how to convey the state of a skipped question in the interface. If an article has 100 questions and you skip half of them, do we talk about your memory of the article in terms of the remaining half? Graphically, the rays of the starburst depict interval period. If I skip a prompt the first time I see it, does its ray jump to 1 month? This seems to convey a false sense of “progress”… but maybe it’s not false, since you don’t really intend to review that prompt. On the other hand, if we leave the ray at the minimum length, it’ll feel unpleasantly “left behind” as you review the rest of the article.

***

I think I was thirteen when I first read this quote on WikiWikiWeb:

The first ninety percent of the task takes ninety percent of the time, and the last ten percent takes the other ninety percent.

Why do I ever share estimates of project timelines? You’d think I’d have learned by now.

Anyway.

I’m deep in the second ninety percent of Orbit’s initial release. It’s a grab bag. I have replaced one binary serialization protocol with a different binary serialization protocol. I have patched a platform text renderer whose hinting was broken. I have gone cross-eyed staring at endless iterations of approaches to styles for embedded prompts.

We’re in the home stretch. Zeno keeps stretching the home stretch. More soon. In the meantime, I've created an Orbit account on Twitter which has been posting little visual flotsam that doesn't merit a full Patreon post; feel free to follow along. The good stuff will eventually end up here. Speaking of which, here's a fun little study I did recently on the starburst:

I’m giving a talk about Quantum Country at a small private conference this week. The occasion has pushed me to return to analysis of Quantum Country readers. One interesting finding to report: in How can we develop transformative tools for thought, Michael and I showed this pleasingly-exponential graph, which shows that readers reach around 54 days of demonstrated retention after six repetitions.

In the second half of 2019, we made a number of improvements to the platform, which have cumulatively resulted in compressing this curve by more than two repetitions. That is, readers now achieve a noticeably higher degree of demonstrated retention after four repetitions than readers did last year after six. Putting aside for the moment my hesitations about focusing on efficiency, there really does appear to be quite a lot of low-hanging fruit here.

As always, thank you for funding my burgeoning research grant. Relatedly, if you didn't see it, you might enjoy this little thread on Twitter expanding on some of the funding model ideas I brought up in the last post.


Celebrating a significant Patreon milestone; thoughts on crowdfunding tools for thought

We've now crowdfunded two thirds of a grad student-level grant for research on tools for thought.


Early access to new essay: "Timeful Texts"

Click here to read "Timeful Texts."

How might one create a medium which does the job of a book, but which escapes a book’s shackled sense of time? How might one create timeful texts—texts with affordances extending the authored experience over weeks and months, texts which continue the conversation with the reader as they slowly integrate those ideas into their lives?

These last few months, I've been writing elliptically about various ways we might expand our conception of "spaced repetition systems" beyond memory alone. This essay, joint with Michael Nielsen, elaborates on one idea: expanding an author's expression in time.

For this piece, we had the great pleasure of collaborating with Maggie Appleton, who thoughtfully illustrated the ideas presented in the essay.

Enjoy! (If you spot any issues, please do let me know so I can correct before broader release!)

Click here to read "Timeful Texts."


A nascent art direction for Orbit

(You may have received a duplicate notification for this post: Patreon lost all the images in the last one, so I had to recreate it! Bluh…)

So far, I’ve been using Quantum Country’s design language for Orbit, just to help me focus on all the architectural work I’ve had to do. That was a useful constraint for a while, but Orbit’s almost ready for its first publications, so now it’s time to switch gears: how might Orbit’s visuals reinforce its ideas?

Orbit has two enormous visual design challenges.

First: Orbit initially appears as an embedded “guest” inside some other publication. As a guest, it must behave itself! It shouldn’t clash with the host’s styling choices; it shouldn’t overly distract from the surrounding content. On Quantum Country, we just used the same design language for the interactive prompts and for the book itself. You collect prompts on Quantum Country, then review them on Quantum Country. But the model for Orbit is that you collect prompts on Quantum Country (and other places), then you review them all, in aggregate, in Orbit. This is much trickier! The reader navigated to Our World in Data (or whatever), but then this weird “Orbit” thing in the article is asking them to sign up, and later they’re getting an email from Orbit asking them to do stuff. So Orbit needs to introduce itself as a distinct “partner” entity while you’re reading the host publication. Orbit’s a research platform… but given these constraints, it needs to have a “brand,” to feel something like a “product.”

The other challenge is more of an opportunity: how might Orbit use visual language to transcend the Spockian, educationalist tendencies of tools in this space? How might we emphasize “engaging more deeply with what matters to you,” not “how much trivia can I memorize”? Spaced repetition memory systems make memory a choice, but retention is not valuable for its own sake. Memory is valuable insofar as it helps people do whatever gives their lives meaning. Unfortunately, the dominant culture around memory systems leans heavily toward memory for its own sake. You’ll see people who try to remember every detail of everything they read (“just in case”), but without seriously engaging with the texts in any other way. One gets the sense that they’re terrified of ever “losing” anything; you see the same obsession drive many “note-taking” enthusiasts. I don’t want to serve or expand that culture. But early adopters for my work will often come from that culture. This is a real hazard.

My visual design chops are not up to these challenges, so I was thrilled to collaborate with the wonderful Nio Ono for a few weeks in June on art direction for Orbit. (My thanks to Taylor Rogalski for the introduction!) I should be very clear that Nio is the hero behind the work you’ll see here: I brought a compass, but she drew the map! I’ll need to walk these trails myself in the coming months, so I’m writing this post as an exercise—to more deeply understand and interpret these visual ideas by explaining them to others.

Before we dive in: I realize that many of you have never interacted with nascent design work! Let me share a few words about how to engage with this material. The purpose of work in this phase is to open doors, to find some fingerholds in the problem space, to create momentum. Since this is a visual exercise, it’s particularly about exploring how certain aesthetic approaches feel. You’ll see imagery that looks like screenshots of interfaces, and it’ll be tempting to interpret them concretely, as interfaces. They are, in one sense… but they’re really about trying to see visual language, in context. Squint your eyes a bit; pay less attention to “interface” and more attention to vibe. This exercise is about exploring the textured interplay of symbols, type, grid, color, etc. There’s an annealing process between these visual primitives and the interface they inhabit. Before this sprint, I’d partially defined Orbit’s interface, but I couldn’t go further without developing the visual language. Likewise, we’ve now spent some time on the visual language, but we can’t go much further without taking some time to push the interface structure forward using these ideas. The design will go around in circles, teetering between form and function, before things finally settle. 

Ethos

What’s Orbit’s “personality”? What ethos is it trying to communicate? To give us a rough compass, we started with this aspirational trio of principles:

earnestness, ardor, curiosity — not: duty, instrumentalism, grinding

Orbit helps you deepen your relationship with whatever you care about most. The activities are largely cognitive, but your relationship to the material is emotional. It’s for ideas and imagery that you could talk about all night. Orbit’s for the stuff which gives your life meaning. 

Orbit is not about eating your broccoli. It’s not for things you think you “should” be engaged with, things which require you to summon your willpower. It’s not “educational” in tone. Orbit results in learning, but the goal isn’t to learn: it’s to be able to do something that brings you meaning, in the world.

wu wei, effortlessness, “trust the process” — not: inboxes, graphs, knobs

Putting something in Orbit is like gardening. You plant the seeds; you trust that they will grow and mature. You’re not terribly concerned about the specifics of when or how often you’re going to see a given item. You feel a kind of lazy confidence that the timing will be vaguely reasonable; that over time you’ll internalize everything you add; that over time items will move to longer orbits so that they don’t overstay their welcome. Your Orbit never feels “too full.”

Your daily Orbit practice is like your daily meditation practice. You’re never “done,” exactly, and you don’t have a progress graph going up and to the right. Like your meditation practice, you don’t and can’t ask for the specific outcomes of a specific exercise. But you show up, and you follow the breath, and over time, you see more clearly.

Orbit is not an inbox which demands grooming. It’s not a red badge with a three-digit number. You don’t “dial in” your Orbit items. You don’t obsessively track the “progress” of any one item (though you do feel some broader arc of “progress”).

diligence, seriousness, agency — not: complacence, passivity, deference

Meditation involves effortlessness at one scale but diligence at another. You show up for practice every day and follow the instructions, trusting that the activities will help you become wiser without understanding the exact mechanism. But that doesn’t mean that meditation is easy! If you’re not engaging seriously, fifteen minutes can slip by without you clearly observing a single breath.

Likewise, Orbit involves both effortlessness and diligence. You show up to your daily session without specific expectations or demands, but you engage attentively with what you find there. Orbit is ultimately a tool for serious people. It’s about going deeper, understanding more clearly; it’s about dissatisfaction with prior methods, with just hoping you’ll remember to think about something.

Authors sculpt the Orbit experience by providing prompts, but Orbit is a tool you must wield for yourself. It expects and rewards your own agency—in writing your own prompts, in remixing those given to you, in ditching ones you don’t care about, in molding the system to your values.

Visual themes and inspiration

Those principles are in tension! Seriousness vs. effortlessness; curiosity vs. diligence; earnestness vs. “trust the process.” But there’s a real thread running through them. Here are a few highlights from a lengthy set of themes and inspirations.

Gravity is both effortless and inevitable. It’s serious yet invisible. Visually, this connotes heavy and contrasting weights, perhaps stolid kinetics.

Poster by Josef Müller-Brockmann, 1958


Celestial mechanics are appropriately wondrous, “trust the process,” precise. The name “Orbit” naturally suggests this direction, of course.

Poster by Laura Csocsán for Tempus Futurum


Contrasting geometry echoes the theme of heavenly bodies and injects some notes of curiosity into the context of structured rigidity.

Kenzo Hara for hy-phen.jp


Analogous color suggests the nostalgic ardor of retrofuturism.

Poster by Bo Lundberg

A core identifying graphic: the starburst

The Voyager and Pioneer probes carry a metal plate with this starburst, designed by astronomer Frank Drake:

It’s a map of the solar system relative to nearby pulsars. It’s quite striking visually, but it also encodes a great deal of information: the lengths represent relative distance; the hashes encode the pulsars’ periods.

This figure (and others from—ironically—astrology!) inspired Orbit’s central identity: an information-rich starburst figure. There’s no fixed “logo”: this shape changes each time it appears, providing data visualization and navigation affordances. It’s a reflection of each user’s individual actions over time, but it’s distinctive and consistent enough as a motif to act as a recognizable “brand” symbol.

The tapering strokes create an optical illusion of a haloed star in the negative center. Weighty lines surrounding empty space echo the effortless action of “wu wei.”

Functionally, the starburst visualizes a collection of prompts—maybe a set you’re reviewing that day, or a given book’s prompts, or a folder of prompts you’ve made yourself. Each prompt is a stroke; each stroke’s length represents that prompt’s interval. A denser starburst represents a larger collection of prompts.
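To make that mapping concrete, here is a minimal sketch of how a list of prompt intervals might become stroke geometry. Everything in it is an assumption for illustration rather than Orbit's actual rendering code: the function name, the radii, the 12 o'clock starting point, and especially the logarithmic length scaling.

```python
import math


def starburst_strokes(intervals_days, inner_radius=10.0, max_length=90.0):
    """Return one (x0, y0, x1, y1) line segment per prompt.

    Strokes are spaced evenly around a circle, starting at 12 o'clock and
    proceeding clockwise. Stroke length grows with the log of the prompt's
    interval (an assumed scaling, chosen so short and long intervals both
    stay visible).
    """
    if not intervals_days:
        return []
    # "or 1.0" guards against dividing by log(1) == 0 when every interval is 0.
    scale = math.log(max(intervals_days) + 1) or 1.0
    strokes = []
    for i, interval in enumerate(intervals_days):
        # Angle measured clockwise from straight up.
        angle = 2 * math.pi * i / len(intervals_days)
        fraction = math.log(interval + 1) / scale
        outer_radius = inner_radius + max_length * fraction
        # Screen coordinates: x grows rightward, y grows downward.
        dx, dy = math.sin(angle), -math.cos(angle)
        strokes.append((inner_radius * dx, inner_radius * dy,
                        outer_radius * dx, outer_radius * dy))
    return strokes
```

Because the number of strokes equals the number of prompts, a larger collection automatically produces a denser figure, and the empty inner radius leaves room for the negative-space "star" in the center.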

In the context of a review session, the starburst becomes a kinetic radial progress indicator element, ticking clockwise with each completed prompt, using color to indicate completed items:

When visualizing large collections, we can emphasize the line “tips” rather than the strokes (left). In dense list contexts, the starburst can “unwrap” into a vertical stack of lines, each still representing one prompt (right).

Type

We explored a variety of geometric sans serifs but ultimately fell in love with Dr, an oddball from Production Type. It’s all about contrasts: circles and rectangles; heavy lines and negative space; super-contrasting proportions. It’s distinctive but flexible enough to work with many host publication styles. We can scale the weights up and down with the type size to maintain a consistent stroke:

Notice also the grid structure: there’s a disciplined rhythm to this layout. A sense of “ruled” pages carries the diligence and effortless action themes.

In the images above, we see some semblance of interface begin to take shape, but as I mentioned earlier, these aren’t trying to be “screenshots”: they’re here to explore proportion and hierarchy.

For a logotype, we’d like something which echoes the bold geometry of Dr, but with enough distinctive details to stand on its own. It shouldn’t depend on the starburst for identifiability. This is the current favorite, a riff Nio drew on Herbus:

The dominant “o” keeps the focus on circles and the celestial theme; the “r-b” ligature continues that circular motion. The rectilinear “i-t” composition contrasts distinctively with the curved forms.

Iconography

It’s hard to get far with icon design when Orbit’s interface is this nascent: we don’t yet understand what all the core actions are and how they might be arranged. But Nio observed that there’s an interesting opportunity for icons to reflect the geometric type, the tapering strokes of the starburst, and the emphasis on negative space. This collage shows how “add”, “next,” “close,” and other common actions might consistently carry the visual language we’ve already defined.

Orbit does have one unusual core action we do understand well: revealing a prompt’s answer. That action presents an opportunity to do something unique and memorable—see the “eye” at right above.

Color

Reviewing prompts in Anki can feel a bit like driving on miles of straight, unchanging roads through endless, flat fields. Color offers an opportunity to introduce some dynamism. Since the rest of our visual elements have been fairly austere and disciplined, color is also a chance to underscore that first theme—passion, excitement, curiosity.

Nio constructed a color wheel which rewards surprising, apparently-anarchic combinations. (It also, somehow, maintains WCAG-accessible contrast in most useful combinations?!?)

In this rendering, the idea is that the interface’s color scheme changes throughout the day, reinforcing the ritual notion of the review session. Perhaps if you open Orbit in the morning you’ll see orange tones, in the evening rich purples. The starburst color feels unpredictable, but it follows a simple structure: backgrounds are drawn from the inner ring of the color wheel; the starburst color is the outer spot one notch counter-clockwise (a close analogous hue).
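The structure is simple enough to sketch in a few lines. This is only an illustration under stated assumptions: the wheel size of 12 notches, the hour-to-notch mapping, and the convention that notch indices increase clockwise are all hypothetical, and the colors themselves are left as (ring, notch) positions rather than invented values.

```python
NOTCHES = 12  # assumed wheel size; the real wheel may have a different count


def notch_for_hour(hour):
    """Map an hour of the day (0-23) to a wheel notch (hypothetical mapping)."""
    return (hour * NOTCHES // 24) % NOTCHES


def scheme_for_hour(hour):
    """Return (background, starburst) as (ring, notch) positions on the wheel."""
    notch = notch_for_hour(hour)
    background = ("inner", notch)
    # One notch counter-clockwise, assuming notch indices increase clockwise.
    starburst = ("outer", (notch - 1) % NOTCHES)
    return background, starburst
```

With these assumptions, `scheme_for_hour(9)` returns `(("inner", 4), ("outer", 3))`: the background comes from the inner ring and the starburst from the neighboring outer spot.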

Another exciting potential use of color is to represent source contexts. That is: Quantum Country is purple, so maybe whenever prompts from Quantum Country come up in your review session, you see a splash of purple between the blue of Our World in Data and the yellow of a Tufte book. Spaced repetition prompts can feel quite atomized and detached from the original thing you actually cared about; this approach to color can reinforce the original emotional context.

The accents here play with a variety of hue angles, from analogous to quaternary to complementary. Is it too much? Maybe! This is what I was alluding to earlier: you really can’t tell in a static presentation like this. Bold solutions often feel much too aggressive “at rest” but exciting when “in motion.” If you tone graphic choices down to feel comfortable “at rest,” you’ll end up with something anodyne. I’ll have to actually live with it for a while to know.

The images above are depicting the in-app Orbit review experience—the place you go to review all the prompts you’ve collected from a variety of places. But as I mentioned at the start, Orbit must “behave itself” when embedded within host publications. We almost certainly can’t get away with these aggressive colors in that context, and we probably shouldn’t even try: review areas shouldn’t drag your gaze away from the surrounding prose.

But that’s OK. The same approach can be applied in a much more restrained fashion. Perhaps when viewed within Quantum Country, the same colors are used with restraint, but enough to create a connection with bolder use in the Orbit app.


Note that we emphasize the complementary green rather than the host purple, since our wheel’s purple may not quite match the host’s brand color. The background wash is a desaturated triadic relative.
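For readers unfamiliar with the terms: on a standard HSL hue wheel, a complementary color sits 180 degrees away and triadic relatives sit 120 degrees away. The sketch below shows that arithmetic. The host hue of 280 degrees and the saturation and lightness values are assumptions for illustration, and Nio's wheel need not follow plain HSL geometry.

```python
import colorsys


def rotate_hue(hue_degrees, offset_degrees):
    """Rotate a hue around the wheel, wrapping at 360 degrees."""
    return (hue_degrees + offset_degrees) % 360


def hsl_to_hex(hue_degrees, saturation, lightness):
    """Convert HSL (hue in degrees, others 0-1) to a hex string."""
    # Note: colorsys takes arguments in H, L, S order.
    r, g, b = colorsys.hls_to_rgb(hue_degrees / 360, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


host_hue = 280                          # assumed purple for the host publication
accent_hue = rotate_hue(host_hue, 180)  # complementary: 100, a green
wash_hue = rotate_hue(host_hue, 120)    # one triadic relative: 40

accent = hsl_to_hex(accent_hue, 0.6, 0.45)  # assumed saturation and lightness
wash = hsl_to_hex(wash_hue, 0.15, 0.92)     # low saturation for a quiet wash
```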

I expect authors will ultimately need some presentation knobs for the embedded context. A design which feels balanced in one site will feel overwhelming in another. Will people be satisfied with choosing from Orbit’s color wheel, rather than using their precise brand color? Probably a monochromatic option should be available, particularly for places where the “brand color” is monochrome, like gwern.net.

My sprint with Nio is over, so the next steps are on my shoulders. I’ll now bring this visual language to life. I’ll use it for my daily SRS practice, and I’ll prototype how it behaves in various host publication contexts. I’m sure lots of things will break, lots of new questions will be raised, etc etc.

The platform’s making lots of progress week over week. I’d targeted the end of July for the first articles with Orbit prompts being publishable, and I still think that’s not wildly wrong. I’m looking forward to having more bandwidth to work with author-collaborators.

(Incidentally: “Orbit prompts”… are they “Bits”? Premature, maybe.)

In a real sense, if you enjoyed this work, you have yourselves to thank for it: I’m still operating in the red, but your collective patronage has slowed the burn enough that I can intermittently engage professionals like Nio. In fact, an upcoming post will share another exciting project with a different visual collaborator.


Demonstrating a "personal mnemonic medium"

I've spent lots of time this past year thinking about how to use writing to develop ideas over time. Writing is an important part of my research process; those couple hours each morning are often "where the real thinking happens."

Of course, I also use spaced repetition systems to engage more deeply with ideas. But at least until late last year, the two systems were unpleasantly divorced. Existing spaced repetition systems treat prompts as write-once, atomic entities—not smaller parts of a larger whole, meant to evolve over time. So I've been iterating on a system connecting the two, creating a sort of "personal mnemonic medium."

In this system, the walls between SRS and static notes are removed, and one can fluidly write both simultaneously. The mental model is not that we "import" prompts from notes but rather that the prompts are in the notes, and the SRS displays them by-reference.

Writing prompts while writing notes has an interesting effect on the way I think when writing notes: framing ideas as questions (particularly questions which can work out of context) often requires more incisive understanding. I've only been writing in this way for about half a year, so I barely understand this medium—but it's certainly fascinating.

This video demonstrates my latest iteration, in which Orbit scans on-disk files for embedded prompts and tracks changes over time. Incidentally, though I don't demonstrate it, once the desktop Orbit has seen those prompts, they'll be available in the mobile and web experiences as well.
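For a rough sense of what "scanning for embedded prompts and tracking changes" could involve, here is a minimal sketch. The prompt syntax (a "Q." line immediately followed by an "A." line), the content-hash IDs, and the function names are all hypothetical choices for illustration, not a description of how Orbit actually does it.

```python
import hashlib
import re

# Hypothetical syntax: a "Q. ..." line immediately followed by an "A. ..." line.
PROMPT_PATTERN = re.compile(r"^Q\. (.+)\nA\. (.+)$", re.MULTILINE)


def extract_prompts(note_text):
    """Return {prompt_id: (question, answer)} for every prompt in a note."""
    prompts = {}
    for question, answer in PROMPT_PATTERN.findall(note_text):
        question, answer = question.strip(), answer.strip()
        # The ID is derived from the prompt's content, so it is stable
        # across scans as long as the text is unchanged.
        digest = hashlib.sha256(f"{question}\n{answer}".encode("utf-8")).hexdigest()
        prompts[digest[:12]] = (question, answer)
    return prompts


def diff_scans(previous, current):
    """Compare two scans of the same note; return (added_ids, removed_ids)."""
    return set(current) - set(previous), set(previous) - set(current)
```

One limitation of this naive approach: editing a prompt's wording changes its ID, so the edit looks like a removal plus an addition and the review history is lost. A real system would need some way to carry a prompt's identity across edits.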

Note that like last time, the Orbit interface itself is very much a placeholder! I've been focused on infrastructure. One important missing element is provenance: I've found it's very important with this type of prompt to display the title of (and a link to) the note it came from. That's extracted but not displayed in this demo. Art direction and more careful information architecture to come…


Bringing ideas into your Orbit

As I’ve mentioned in recent posts here, I’ve spent these last few months building infrastructure which I hope will help me (and others!) explore a wider set of ideas around systems like the mnemonic medium. In some real sense, it feels like I’m building my research lab!

One strange thing about building a system like this is that I have only a hazy picture of what I’m actually building. I build the thing to see, so that I can build the thing, and so on. My conception of the core structure is constantly evolving, and what follows is all extremely speculative! But I thought you all might enjoy hearing about how I’m thinking about it right now.

Computer operating systems have come with a predictable set of personal information management tools for decades: an address book, a calendar, an e-mail client, some basic note-taking function, files and folders, etc. These are structured differently from siloed “apps,” which typically aim to subsume some workflow from start to finish. This basic OS software is more general-purpose, each both a tool and a service, connected throughout the OS via API-powered integrations. You add an event to your calendar from an email, autocomplete a contact’s name within a chat app, save and open files to the same folder from many apps, and so on. More recently, we’ve come to expect these functions to seamlessly sync everywhere and to communicate with various web services.

What if there were an “OS-level” spaced repetition system (SRS)? What if, rather than living “inside an app’s shoebox”, as in Anki and other existing tools, prompts were framed more like files in folders—readable and writable throughout the system and by other services across the web?

Web articles could surface interleaved prompts, written by the author as in the mnemonic medium or perhaps by readers as on Genius / Hypothesis. You’d fluidly import these prompts as you read, just as your browser forms a history as you read.

Your PDF and e-book reader’s annotations could naturally be surfaced in this centralized SRS, rather than remaining siloed in some inaccessible sidebar.

Just as modern operating systems may create tentative calendar events or contacts based on chat messages or emails, the system may create tentative SRS prompts based on links you’ve bookmarked or phrases you’ve searched for repeatedly.

When you jot notes in your daily meetings, you could tag key insights with a special tag to surface them to this system—perhaps a future word processor’s formatting bar would include buttons for bold, italic, underline… and “to be reviewed.” I’ve built this kind of “personal mnemonic medium” for myself, and I’ve been using it for the last four months. I haven’t spoken about it much yet because it’s very odd, and I don’t understand it. But it’s fascinating, and—I think—promising.

But all these ideas become much more interesting once you think of SRS as useful for much more than memorization.

With the final chapter of Quantum Country, we’ve experimented with using spaced repetition prompts to help readers apply what they’ve learned, in addition to remembering what they’ve learned. In our personal Anki practice, Michael Nielsen and I have been experimenting for several years with using these interactions to prompt synthesis and reflection. Over the last year, I’ve been experimenting with using these interactions to support incremental creative work and for my reading queue.

For example, you can raise your smart watch today and say: “remind me to write about my idea that SRS could be framed as an OS-level service.” That’s a one-time reminder. But with this OS-level SRS, you could raise your watch and say: “remember to reflect: what novel contexts might benefit from an OS-level SRS service?” That would create not a one-time “to-do” but a prompt for repeated reflection over time. 

This all calls for a broader perspective on spaced repetition systems. The typical image is of flashcards, used to memorize things, marked as correct or incorrect. But a more general framing is: with an SRS, you can arrange to repeatedly engage with some task over time, and the timeline can evolve with your actions. Some instantiations of this…

  • memory: repeating recall questions; intervals expand/contract exponentially with success/failure
  • procedural fluency: repeating short exercises with many variants; intervals expand/contract exponentially with success/failure
  • writing prompts: repeating questions, inklings, and stems accumulate written responses over months; mark prompts as “fruitful” or “not fruitful” to expand/contract intervals
  • habit formation: repeating short reflection prompts, possibly changing according to sequence, applied to your daily experiences; intervals expand by default but can be slowed
  • reading queue: a queue of PDFs, tabs, books constantly revolves; if you ignore an item for a while, it’ll depart for a week, then a month, etc; say “I’ve read enough of that for now” and it’ll come back in a few days; say “not now” to an item, and it’ll depart for a few weeks, then a few months, etc…
  • email inboxes, task queues, personal communication, etc
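The common thread in these instantiations is a small scheduling rule: each item has an interval that stretches or shrinks by some factor depending on what you did with it. A minimal sketch follows, assuming a doubling/halving rule and a one-day floor; these numbers and names are illustrative, not Orbit's actual parameters.

```python
from datetime import date, timedelta


def next_interval(current_days, succeeded, growth=2.0, shrink=0.5, minimum=1.0):
    """Expand the interval on success, contract it on failure.

    "succeeded" is deliberately generic: it might mean "remembered" for a
    recall question or "fruitful" for a writing prompt.
    """
    factor = growth if succeeded else shrink
    return max(minimum, current_days * factor)


def schedule(last_engaged, current_days, succeeded):
    """Return (new_interval_days, next_due_date) after one engagement."""
    interval = next_interval(current_days, succeeded)
    return interval, last_engaged + timedelta(days=interval)
```

For example, `schedule(date(2020, 6, 1), 5, True)` returns `(10.0, date(2020, 6, 11))`. Repeated successes give intervals of 1, 2, 4, 8... days, which is the exponential expansion described in the list above. The reading-queue variant would only need different factors for "not now" versus "read enough for now."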

Central to what I’m imagining here is a daily practice habit somewhat akin to meditation: you open this thing up and engage with whatever microtasks it presents. You’ll work on your memory, maybe do some self-authorship with reflection questions, do some quick physics problems, some quick writing, etc. Then ten minutes later, the train arrives, you board, and that’s it for the day. The next day’s different.

In summary, this system is about giving you a way to bring ideas into your orbit. When something seems interesting, you can tie a string to it and throw it up in a lazy arc. It’ll swing back around at some point, but you’re not terribly concerned with when. You’ll give its string more or less slack over time. Floating above your head, then, is an ever-shifting constellation of inklings, facts, questions, prompts, obsessions. Every day you stare up at the slice of sky above you and respond to what’s there.

So at least for now, I’m calling this system Orbit.

————————

Your thoughts and comments are very welcome! As always, I’d like to thank you all for your generous support of my work. 

I’d also like to mark a milestone: as of a couple weeks ago, your collective support now covers half of my mortgage! A year ago I honestly expected Patreon to be something of a token support mechanism, but thanks to you all, this now makes a meaningful difference in my life. Of course, I’m still in the red here, so this is not on its own a sustainable source of support, but it does help push out the timeframe for seeking other funding sources—which in turn, helps me focus on the actual work with minimal distortions. Thank you for that.


Big milestone today: generic embeddable prompts

Hello, kind patrons! I mentioned in the last post that I was working on a next iteration of the system behind Quantum Country that can be embedded anywhere on the web: blog posts, e-books, Twitter threads, academic papers, etc. Today I hit a milestone I've been working toward for a couple months!

In the video, I've doctored collaborator Michael Nielsen's great notes on direct air capture to include prompts like the ones on Quantum Country. It looks like no big deal—after all Quantum Country's been doing this all year! But there are two enormous differences.

The first big difference is that Quantum Country's a package deal. The essays are built in a special way and are deeply integrated into the review system. But the notes I show in the video are just a plain web page that could be anywhere on the web: they don't have to be hosted on our special infrastructure or built in a special way. A simple HTML tag is all it takes to add the prompts seen in the demo.

The second big difference is that the video shows dedicated review apps. To review prompts from Quantum Country's essays, you can go to a "review page" on Quantum Country, but to review prompts from dozens of web pages, we'll need to establish a separate place that aggregates all those questions. What you see on the right are native macOS and iOS review apps, displaying a review session based on the prompts collected on the web page at left. But they'd also display prompts from anything else you read. There'll be a web-based review option, too, not shown here.

The interface shown in this video is basically a placeholder: I've been focused on the platform infrastructure. Lots of design work awaits. But I've imported my Anki library into this system, and for the last couple weeks, I've used these apps instead of Anki for my personal spaced repetition practice. That practice helps me feel all the rough edges viscerally and pushes me to make the experience great. But I'm already enjoying the automatic syncing and the modern native (not Electron-based!) macOS app.

Anticipating the obvious question: no, this stuff isn't ready for others to use yet. But that's what the next few months are about. You all will get access long before this is all public. Thank you for your ongoing support!

—Andy


What's next?

Hello, kind supporters! Andy here. Now that the final chapter of Quantum Country’s published, we wanted to share what we’re up to next.

First off: an important update to the Patreon. After years of scrappy independent projects, Michael’s decided it’s (perhaps past) time to pursue stable employment. We’ll still be talking about these projects all the time, but they’ll be a side-project for him, rather than a full-time gig. So starting next month, this becomes a solo Patreon. I’ll still be pursuing the same ideas—just with updated tactics, sans Michael (more on that below). Please feel zero guilt in modifying or canceling your support accordingly!

Now then: what’s next?

First off, Quantum Country’s still a laboratory. We’ve been running a long-term randomized controlled efficacy trial, and that data will be ready for analysis in the coming weeks. I’m also working to understand the impact of application prompts, both through data analysis and through feedback from readers.

But stepping back, now that Quantum Country’s finished, it’s time to push on the limits of the mnemonic medium. A few of my top questions:

  • Varying subject matter: How does the medium’s performance (real and perceived) vary when applied to technical primers like Quantum Country, but on other topics?
  • Varying genres: How might the medium’s mechanisms be adapted to other genres (e.g. persuasive writing, informal discussion, reference material, academic papers)? How might it accommodate the wider range of reader motivations in those contexts?
  • So what?: What is the big-picture impact of remembering what one reads, relative to what’s important in readers’ lives?
  • Towards virtuosic use: What are the most important attributes of good questions? What would virtuoso or even canonical uses of the mnemonic medium look like? How might we effectively coach authors?

We can explore some of those questions in the context of Quantum Country, but it’s clear that we’ll soon need more mnemonic texts to better understand the medium. So I’m rebuilding the mnemonic medium as a platform that can be embedded into any web page. 

Now, there’s an important path dependence in creating new mnemonic texts: the medium requires readers to adopt a new habit. Readers are (somewhat) willing to complete practice sessions for Quantum Country, since it’s a substantial text. But it wouldn’t make sense to onboard by regularly reviewing ten questions from a short essay—at least not until you’re regularly encountering lots of short material you might want to study. For now, readers’ first exposure to the medium must be a substantial text.

So in the next six months, I’ll be pursuing two parallel streams for creating new material in the mnemonic medium:

  1. More mnemonic books. The goal of this path is to produce more texts like Quantum Country, substantial and meaningful enough to onboard a reader on their own. I’m reaching out to the creators of some of the best existing web books to see if they’d like to integrate the mnemonic medium into their existing texts. Some of these books are licensed under Creative Commons, so I may adapt one or more myself, either as a demonstration or for wide use. I’m also talking with some favorite authors about writing new work or adapting existing work for the mnemonic medium. (Know someone who I should talk to? Please holler!)
  2. Free-ranging experiments via Anki (and QC) users. Books are heavy. They take time to write, and they’re hard to change. I’d like to be able to iterate on more nascent ideas in smaller contexts—like a Twitter thread! The trouble is that those contexts are too small to onboard users; they’re only usable by people with a pre-existing memory practice. So I’ll steal them from Anki. Lots of existing Anki users find the app repellent and don’t care about all the knobs and levers. They’d be eager to switch to a better-designed platform, particularly if it’s easy to import their libraries. This pool of early adopters (along with Quantum Country’s readers) can then enjoy smaller-scale experiments with the new platform—and perhaps create some of their own.

I’ll also be exploring more fundamental changes to the medium:

  • Different types of understanding: Continuing with the application prompts introduced in Quantum mechanics distilled, how might we help readers build the practical ability to apply what they’ve read? How might we support readers in synthesizing conceptual understanding?
  • Connecting to meaning: The review activities are abstract, isolated from a context in which that knowledge actually matters. How might we situate practice within reviewers’ sources of meaning, whether social, professional, or creative? Can we ditch “practice” altogether by constructing meaningful contexts which would naturally amount to review?
  • Taking emotion seriously: The mnemonic medium is in many ways aseptic, overly cognitive. What mediums might we invent if we began with authored experiences focused on deep emotional connection (like the best movies and video games), and blended them with methods for building deeper understanding? Mnemonic video is one early idea.
  • Timeful reading: The review sessions don’t just help you remember material. They also extend your relationship with the material beyond the moment of reading, across weeks and months. How might we take advantage of this property, e.g. to author texts which unfold over time, or prompts which build on others you’ve already mastered?
  • Programmable attention: One of the most powerful elements of the review sessions is that they orchestrate your repeated attention over time across hundreds of tiny tasks, too many to manage by hand. How else might that superpower support creative work—reading, thinking, expressing, problem-solving?

If you follow me on Twitter, you’ve perhaps seen that I’m also experimenting with an unusual note-taking system. That’s something of a side project right now, but it keeps knocking on the door of the primary project I’ve been describing. At some point I may let it in.

I’ve decided to continue bootstrapping all this for now, thanks in part to your kind support. Perhaps I’ll eventually apply for grants. Perhaps some parts of this work will become commercial (in which case, of course, you will all be given free passes). Figuring that out is not a high priority right now.

Your comments and questions on all this are very welcome! We’re both deeply grateful for your kind support over the last year.

—Andy (for Andy and Michael)
