
A casual tools-for-thought / weird-interfaces / future-of-personal-computing unconference to kick off the new year!
When: Sunday, Jan 29; 9AM-12PM PST
Where: Gather, a spatial video chat platform. Click here to join on Jan 29
Who: patrons, fellowship applicants, some others I'll invite from this scene
RSVP: Please let me know here if you think you'd like to join, so that I can pay Gather to provision the right amount of space. Not a binding commitment.
If you've never attended an unconference before, the big idea is that the attendees create the schedule. This event will be as good as you all make it. You are invited to host a discussion circle about a problem you've been thinking about, or to host a paper reading group, or to give a talk, or offer a hands-on workshop, or open a virtual help desk—whatever you like.
To host a session: sign up for one of the time slots at one of the tables on the unconference schedule here. Just give a short description of what you have in mind, and include your name.
No pressure! Casual, birds-of-a-feather type discussions with no real preparation are often the most interesting unconference sessions.
(If there's no more room for sessions, ping me and I'll open up some more…)
See you soon!
2023-01-20 18:47:21 +0000 UTC
Reflecting on my work with the mnemonic medium, I notice that I’m making good progress on questions of breadth and scale, but not nearly enough along the much more important axes of depth and impact. That’s not for lack of trying—my attempts to explore the latter keep fizzling. But I think I understand better now why that is, and at least partially what to do about it. In the new year, I’d like to aggressively reorient my work towards traction on this central question: how to make a radically more powerful memory system, one which makes it easy to build a deep, flexible, and durable understanding of complex conceptual material?
In this letter, I’ll lay out the situation and describe my best thinking on what to do about it. Beyond helping myself understand, my aim here is to create a good context for conversations with others who can help me improve my plans. I’m still quite confused about many of the points I discuss here, so this should be read as a provisional, interim discussion—but hopefully a useful one on the way to a better path.
Why I’m excited about memory systems
Since this is a letter about my progress being off-target, it’s worth carefully re-articulating the aspiration that drives me here.
I’ve spent almost a decade exploring how we might use computers (and specifically dynamic mediums) to create exceptionally high-growth environments. I’ve done projects around discovery learning in microworlds, computer-supported collaborative learning, mastery learning, ML-driven personalized learning, and so on. But my personal experiences with memory systems have been so vivid that all the other mechanisms I spent years investigating seem laughably weak by comparison. Memory systems conjure a powerful sense of vertigo for me. I feel like a caveman who’s stumbled on a box of simple machines. I might not understand the principles of mechanical advantage, but even primitive fumbling produces startling results!
Today’s memory systems are niche tools, used mostly for memorizing vocabulary and simple facts. These applications wouldn’t be so compelling on their own, but a few expert users have discovered clever ways to apply them much more broadly, supporting many kinds of thought. To give a few unusual examples, I use memory systems to amplify creative insights, and to digest meaningful experiences. In this letter, though, we’ll focus on where the impact has felt most immediately transformative: understanding difficult conceptual material.
When I use a memory system alongside a book or a paper on an unfamiliar topic—and when everything’s going just right—I feel an alien lightness. I may still struggle with the ideas, but there’s an internalized certainty: I can carve away truly reliable footholds; I can keep accumulating until I’ve understood as much as I want; I’ll be repaid with durable access to that understanding months later. The feeling is absolute discontinuity with all my prior experiences studying new subject matter.
I’m reminded of the move from analog to digital computers. When components accumulate error, there’s a pervasive sense of fragility, a sharply rising complexity as the system grows. By rectifying errors within each component, you can assemble reliable systems which grow without strain. This is an imperfect metaphor. Understanding isn’t binary; I’m not claiming that all thought can or should be made “error-free.” But I have no doubt that this sort of “rectifier” radically expands the cognitive ground I can cover.
With that glowing description, one might imagine that memory systems just need to be popularized, “scaled”, rough edges removed. But there are good reasons for memory systems’ limited adoption. In my view, the most important problem is that it’s very difficult to do what I just described—to reinforce arbitrary material with them. One must develop a specialized skill to do this at all; I’ve spent years accumulating strategies for digesting different kinds of ideas into memory system prompts. And it’s not enough to just help readers develop those skills. Even for me, writing these prompts takes so much time and mental energy that I cover only a fraction of the knowledge I find meaningful. To make matters worse, it’s especially tough to use memory systems well with unfamiliar material—which is of course when they’re most useful. As a non-expert, my prompts often emphasize the wrong elements, or miss the forest for the trees, or fail to encode enough connective tissue to support rich understanding.
Why I’m excited about the mnemonic medium as an avenue for more powerful memory systems
Prompt-writing involves productive struggle, and I’m sure that contributes enormously to deep understanding. Sometimes there’s no way around this work: many of my most valuable prompts draw connections to personal experience. Yet I believe we can do much more with much less. Today’s memory systems “make memory a choice”—but it’s a stark choice between moderate-cost/high-benefit and no-cost/no-benefit. We can design systems which offer intermediate points. More importantly, I believe we’re nowhere near the efficient frontier. We can both lower the costs and also radically increase the benefits.
Ideally: as I learn new material, rich representations of that knowledge are continuously encoded into my memory system. This system not only helps me reliably remember what I learned, but it actually deepens my understanding of the material over time. The whole process integrates naturally into my intellectual life, with so little burden that it can be applied pervasively. This ideal may be beyond reach, but if we can make something which comes close, it’s tough to imagine that such a system would not “change the thought patterns of civilization”.
An obvious initial approach here is for experts to create and share memory system prompts about various topics. Plenty of people have tried. Shared prompts seem to work moderately well for vocabulary and simple facts, but not for more conceptual material. When I’ve tried these collections, I’ve found that I can parrot back responses to the prompts, but my understanding is quite brittle—for instance, I’d struggle to explain the material to a friend. In user interviews, other users consistently express similar problems.
By contrast, my understanding is much less brittle when I digest a well-written explanation into my own memory system prompts. I’m sure part of the reason is that prompt-writing forces me to think much more about the material. But I notice that when I practice those prompts, I’m often transported back to the relevant passages in the narrative. The isolated prompts are anchored (at least partially) in a broader, well-connected context. That richer context also keeps me more emotionally engaged during review.
That’s why I’m so excited about the mnemonic medium. If we integrate expert-written prompts into an explanatory narrative, perhaps they can take root in more fertile soil. This strategy could bring us much closer to the ideal I described earlier.
In practice, we might worry: how much understanding and connection do we sacrifice when an expert writes those prompts, rather than the reader? Interviews suggest that readers rarely experience the kind of brittleness and detachment that I hear so consistently from users of downloadable memory system decks. But I need to understand this trade-off much better, and to improve it where I can.
Apart from its practical effects, the mnemonic medium has potential to become a powerful laboratory for experiments exploring: how can we make much better memory systems?
Here’s a simple example. What if, in the weeks after you read a book, you’re asked not only to recall what you’ve learned, but also to apply that material in simple practice exercises, ones you haven’t seen before? Might that help you deepen your understanding over time? It would be tough for you to set this up for yourself. Even if you could write good exercises, the idea is to apply the material to novel situations, and exercises you wrote yourself are no longer novel to you. The mnemonic medium offers a natural way for one person to author material like this, and for another person to use it with fresh eyes.
The mnemonic medium also makes it much easier to run iterative experiments. For example, a designer could include these new practice exercises only for certain topics, and then compare how the reader’s understanding grows for those concepts versus others. If the designer learns something new, they can iterate on the system, observing how their changes affect new readers.
There are dozens of improvement ideas I’d like to explore, and questions I’d like to answer—some about cognition and efficacy, others about emotion and feel. Ideally, the mnemonic medium would support rapid cycles of iteration towards a much better memory system. But I’m writing this letter because I haven’t yet managed to create the right environment to iterate on the most important questions. In fact, Quantum Country’s fourth essay implements the “practice exercise” idea I just described—and I haven’t managed to understand its impact, at all.
Struggling to push for depth
Over the past two years, most of my progress has been in solving problems specific to the mnemonic medium’s conceptual design. That is: problems with using expert-written memory system prompts, and with embedding memory systems into different kinds of texts in different kinds of situations. This is good and necessary! Such problems are central to the idea and must be solved to make the medium work.
But as I’ve made more progress here, I’ve slipped into solving increasingly peripheral problems—making the medium work for non-experts, in increasingly casual scenarios. I’m straying towards issues of scale, and away from my central aim: how to make a radically more powerful memory system, one which makes it easy to build a deep, flexible, and durable understanding of complex conceptual material?
To make matters worse, I can’t tell you very much about how well today’s mnemonic medium already accomplishes this. I’ve run plenty of experiments on readers’ ability to recall the embedded prompts, but I know little about how the medium impacts what actually matters—the depth of their understanding; their capacity to make meaningful use of the knowledge.
Through collaborations with university professors and course instructors, I’ve tried several times to create a context where I can observe the downstream impacts of my designs. I failed to learn much from any of these experiments. Few students seemed interested in deeply understanding the material; not just the memory system but the courses in general suffered from low participation.
Even without these problems, though, I now believe that my approach with these experiments was wrong. I was thinking in terms of statistical tests, trying to measure causal connections between properties of the memory system and downstream performance. But what I really need now is not statistical power but intimacy—a detailed, vivid picture of readers’ understanding as it develops in concert with the memory system. I’d rather establish a close relationship with a handful of serious test readers than run a high-powered experiment which can only report numeric change in test performance. The latter will be useful later, to optimize and validate, but it’s not the right data to drive early, open-ended design decisions—to identify the right fundamental primitives.
I’d thought the trouble with my course collaborations was that attrition made my sample sizes too small. But the real problem was that these contexts weren’t setting up the right kind of relationship between me and my test readers. The few students who participated actively in these courses were rarely willing to offer more than occasional perfunctory feedback on the memory system. This is reasonable: it’s not what they signed up for! It’s an extra burden. Even if they were more willing to help me, these students are generally just trying to “get through” the class. I can’t effectively learn how a memory system affects depth of understanding from someone who isn’t trying to understand the material deeply.
By contrast, my dream test readers would be quite serious about deeply understanding the material. They’d be extremely demanding of the system, ideally behaving like collaborators in its creation. They’d push at the edges of the system and creatively co-opt it for their own purposes. They’d happily tackle difficult challenges to proactively probe for weak spots in their understanding.
There’s an important distinction to recognize here—one that’s quite alien to typical Silicon Valley practices. Yes, of course, an ideal future memory system would also help less demanding people in less demanding contexts. But that doesn’t mean those are the right people to help invent the system. Feedback from that audience is the right force to broadly scale a system that’s already astoundingly transformative in the best case. It’s the wrong force to push a nascent-but-promising system to become astoundingly transformative, unless the system’s power pivotally depends on its social embedding.
To find the form of a system with the highest ceiling on its power, I want to design for (and with) demanding experts, in difficult situations. I want a close, collaborative relationship, not an arms-length “vendor” relationship.
Sketching a path forward
Now that I’ve articulated this aspiration—this shift to depth over breadth—how should I go about actually making it happen? My leading idea is to collaborate closely with a small group of people as they try to deeply understand a topic of great interest, with the help of a memory system. I’ll choose people who are already expert practitioners with today’s memory systems, and who are excited to collaborate on much better memory systems for the future. My aim is to form an intimate, nuanced picture of readers’ cognitive and emotional experiences—then to use that to drive an insight-through-making loop.
A few concrete practices I’d like to try:
- Personal “memory Wizard-of-Oz”: I follow my collaborator around, adapting whatever’s next on their reading list to the mnemonic medium—partially simulating a future ideal in which these affordances are available pervasively and effortlessly. Each adaptation can encode one or more new ideas to test (either at a system level or at a content level).
- Small-group course: I teach a small-group course on a difficult subject I know well. I create or adapt materials, using the mnemonic medium and whatever other augmentations seem most useful. I repeat the course several times, trying new ideas each time, and improving old ones through experience from the previous iteration.
- Memory coach: I help my collaborator become a much better practitioner of existing memory systems. We’ll meet regularly to review the material they’ve been writing, and to discuss how it’s working for them. I’ll help them contend with obstacles and suggest ideas for improvement. My aim here is to put pressure on my principles as an expert practitioner; this should help me test and distill my methods, and ideally develop improved ones. Moreover, this practice would help me focus on insights around object-level use of memory systems, rather than always systematizing for creation of new systems.
I’m not trying to set up sensitive statistical evaluations here, or to make rigorous claims which would generalize to large populations. I’m looking for qualitative surprises and glaring quantitative discontinuities—hints of transformative power I can exploit, clues about obstructions which might be cleared away. The theme here is small-N, bespoke, detail over generalization, exploration over validation.
To that end, I’ll focus on discussion and diaries, which I’ll ask my collaborators to keep while they read and practice. We want to notice moments when: understanding feels shaky, brittle, or parrot-like; thoughts seem sluggish; readers are looking things up; readers are bored, dutiful, or emotionally disconnected. But also moments of: unexpected ease, velocity, confidence, connection, or excitement. How does it feel to use the newfound knowledge—in creative work, in social settings, in personal reflection? How does it feel to explain the ideas to others, to learn downstream material, to solve problems?
As an adjunct, I’d also like to use the mnemonic medium much more myself. So much of my understanding of memory systems comes from years of experimentation to support my own creative and intellectual work. Those experiences have helped me develop a nuanced understanding for many aspects of memory systems. But I have no intimate personal experience of learning difficult material using the mnemonic medium. The trouble is that it’s surprisingly difficult to do that, since I’m usually in the position of creating mnemonic texts!
The closest I’ve gotten was during the development of Quantum Country, but I was too distracted by fixing early issues with the system to pay enough attention to my own learning process. More fundamentally, I didn’t have an intense personal drive to deeply understand quantum computing. No problem sets, projects, or social contexts put pressure on my knowledge, so I don’t have much sense of its solidity.
I stopped reviewing Quantum Country’s material long ago, mostly because it wasn’t integrated into my regular memory system. Enough time has elapsed that I could probably learn a lot about the mnemonic medium by repeating Quantum Country afresh, perhaps with a project in hand that I felt excited about. But this is a bit backwards. It would be much better to pursue this sort of experience in a domain where I already have a live need for deeper understanding.
I’d like to experiment with adapting a textbook of interest for my own use by hiring someone with expertise in both memory systems and in the relevant domain. It would be tough to iterate on structural changes to the memory system itself with this approach alone: I’d need to make new functionality available, then rely on someone else to use it well. Still, it’s hard to imagine that I wouldn’t benefit enormously from greater personal intimacy with the medium, particularly alongside tight collaborations with other readers.
————————
So: What better approaches am I missing? What key considerations am I ignoring? Where are my premises weak?
I’ve left the concrete details of my experimental design quite unspecified. What likely traps await me there? Why might I find myself unable to see what I need to see?
————————
This letter grew directly from a series of conversations with Michael Nielsen, and much of its thinking is due to him. I’m particularly grateful to Michael for the central observation that I’ve been slipping into issues of scale, the emphasis on small-N methods, and the idea of memory coaching.
I’d also like to thank Rob Ochshorn, Joe Edelman, and Joël Franusic for thoughtful comments.
2022-12-27 21:55:14 +0000 UTC
tl;dr: join me on Dec 24 at 9AM PST to discuss Bret's latest talk. Please watch the talk before attending; bring your noticings, wonderings, and ideas.
For me, Dynamicland is the most uniquely fascinating project on the future of personal computing. Like all radical ideas, it resists summary, but crudely: it's a startling vision of highly malleable computing, instantiated in the physical world so that it becomes naturally embodied and social.
It's tough to talk about Dynamicland in part because there's been no official presentation or publication about it. In fact, I believe its PI, Bret Victor, hasn't given a public talk in about eight years.
But I was delighted to see Bret give a talk on its recent progress in July, at a conference on designing molecular machines. That talk is publicly available online, though it's somewhat obscured: it's listed as a talk not by Bret, but by his collaborator Shawn Douglas. Shawn does fascinating research on DNA origami; he speaks for the first 14 minutes, teeing up the scientific context for the work Bret presents in the second half of the talk.
I saw Bret present the same talk again a couple weeks ago, and I realized it might be fun to discuss, particularly in connection with (or in contrast to?) our past few seminars on projects in malleable computing.
Since I'm already familiar with Realtalk, the most interesting thing for me in this new talk is that Bret has embedded himself (with collaborator Luke Iannini) in Shawn's biology lab. They're getting their hands dirty in the wet-lab experiments, gaining real intimacy with that serious context of use, and solving meaningful problems. I love seeing this work grounded in that way; I'm quite unhappy with my own work's lack of this kind of context.
If Dynamicland is totally new to you, here are a few more resources you might find interesting:
- From lab members:
- From others:
2022-12-15 21:50:48 +0000 UTC
A consistent challenge in my development as a researcher has been: how to cultivate deep, stable concentration in the face of complex, ill-structured creative problems?
In roles oriented around operation and execution, I benefited enormously from standard “productivity” advice. Task managers and time-planning tools were essential. But now, task managers and calendars only help with the least important pieces of my work.
Bill Thurston writes:
Mathematics is a process of staring hard enough with enough perseverance at the fog of muddle and confusion to eventually break through to improved clarity.
This description resonates more with my experience of design research than anything Getting Things Done has to say, valuable though it was in my past life. To make progress in my present work, I need to “stare hard enough with enough perseverance at the fog of muddle and confusion.” But if I’d read that last sentence five years ago, I don’t think I’d have really understood what it meant. I wouldn’t have grasped how difficult it is to stare this way, or how impossible progress is without this state of mind. Here’s what I might tell my past self:
“Why is this so hard? Because you’re utterly habituated to steady progress—to completing things, to producing, to solving. When progress is subtle or slow, when there’s no clear way to proceed, you flinch away. You redirect your attention to something safer, to something you can do. You jump to implementation prematurely; you feel a compulsion to do more background reading; you obsess over tractable but peripheral details. These are all displacement behaviors, ways of not sitting with the problem. Though each instance seems insignificant, the cumulative effect is that your stare rarely rests on the fog long enough to penetrate it. Weeks pass, with apparent motion, yet you’re just spinning in place. You return to the surface with each glance away. You must learn to remain in the depths.”
I’ve gotten much better at this. I need to get much better still! I’d like to share a few notes about my progress with this problem—mostly just reflecting “aloud”. My strategies aren’t at all intended to generalize, but I hope that my experiences might offer some hints to others seeking more depth.
Why flinch away? Some personal psychology
First: why do I flinch away when progress is slow and next steps are unclear? Why do some people seem not to flinch away? I’ve noticed a few patterns at play.
The first seems to be faulty expectations. I spent years in the tech industry; I internalized the pace of progress appropriate to industry problems. Some part of me expects that same pace on a totally different class of problems. The slow pace of research problems feels viscerally much less stimulating than I’m used to. Unbidden, my attention seeks out other more immediately rewarding targets. Sometimes this is obvious (“answer email”, “browse Twitter”); but behaviors like “read some papers” or “hack together a prototype” are often subtle grasps for more immediate stimulation.
It’s possible to get a feel for this effect on very short time scales. If I find myself sucked into an hour-long Twitter binge, I’ll become noticeably more habituated to ultra-fast reward cycles. For hours afterwards, everything else will feel much slower, way less stimulating. I’ll suddenly need real willpower to read a book for a solid hour. The acute effect wears off after a few hours, but some fraction of it persists into the next day.
What to do about faulty expectations? An illustrative list:
- Collect vivid stories which reinforce a more realistic pace of progress for this type of work. Memoirs of scientists and artists are great for this. Mason Currey’s Daily Rituals is a nice anthology in this vein.
- Practice mentally noting the impulses as they arise; make it a game to catch them as “early” as possible, listening for ever-quieter cravings.
- Savor the subtle insights which really do occur regularly in research. Think of it like cultivating a much more sensitive palate.
- Consciously fortify myself when interacting with industry people; don’t compare my pace to theirs; don’t accidentally internalize their values. This is quite tough for me living in San Francisco. I’m immersed in industry culture here, and most of my friends are founders.
Another important pattern for me is fear. When progress isn’t evident, I quietly wonder: “Is this a good idea after all? Is progress even possible? Is the problem here that I’m not good enough to make progress?” When I exert willpower to press on, I inflame those automatic fears. “Wait, I’m continuing anyway?! That’ll only make my incompetence more obvious! Others will lose respect for me; I’ll get cut out of things; I’ll end up alone and miserable.” As a deeply lonely teenager, I learned that I could earn others’ regard and become valued in a community by “doing cool stuff on the internet.” So, even today, my automatic response to these fears is to switch to an activity which produces some kind of visible output. Make a prototype, write up some notes, sketch a concept. These are appropriate behaviors at times, of course, but not when pursued as fearful substitutes for what I’m actually trying to do.
What to do about fearful reactions? An illustrative list:
- These fears are modulated by my faulty expectations; see previous list.
- Visualize close friends; notice that our relationships don’t seem even slightly contingent on the status of my work. Notice that less-close relationships which do seem contingent don’t feel terribly precious; notice that I don’t fear losing them.
- Feel into the past work that’s most gratifying in hindsight; notice that it is never the flailing result of an impulse to produce “output”. Notice that this work is exclusively the product of perseverance and unchartable paths.
- Mentally note these fearful sensations, as early as possible.
- Obvious tactics everyone recommends for this sort of thing: therapy, meditation, psychedelics.
Another pattern has to do with my stance towards the work. I’m much more likely to flinch away from difficulty when I view my research problem as a task, as something to be accomplished. I’m much less likely to flinch away when I’m feeling intensely curious, when I truly want to understand something, when it’s a landscape to explore rather than a destination to reach. Happily, curiosity can be cultivated. And curiosity is much more likely than task-orientation to lead me to interesting ideas.
I make a practice of regularly checking in about whether I have a dutiful stance towards some aspect of my research. Once I notice, I can usually summon curiosity by asking lots of questions, imagining potential implications, and so on[1]. Often I need to improve the framing, to find one which better expresses what I’m deeply excited about. If I can’t find a problem statement which captures my curiosity, it’s best to drop the project for now.
Curiosity can also totally change my relationship to setbacks. Say I’ve run an experiment, collected the data, done the analysis, and now I’m writing an essay about what I’ve found. Except, halfway through, I notice that one column of the data really doesn’t support the conclusion I’d drawn. Oops. It’s tempting to treat this development as a frustrating impediment—something to be overcome expediently. Of course, that’s exactly the wrong approach, both emotionally and epistemically. Everything becomes much better when I react from curiosity instead: “Oh, wait, wow! Fascinating! What is happening here? What can this teach me? How might this change what I try next?” The same applies to writing. For example, when one topic doesn’t seem to fit a narrative structure, it often feels like a problem I need to “get out of the way”. It’s much better to wonder: “Hm, why do I have this strong instinct that this point’s related? Is there some more powerful unifying theme waiting to be identified here?”[2]
Big morning block
I’m wary of metrics, and I don’t think concentration lends itself to sensible KPIs. But in my experience, most of my progress has come from careful monitoring and reflection over time—a mixture of qualitative and quantitative data, all seasoned with much salt.
The most important practical thing I’ve done for my depth of concentration is this: I do my primary creative work in one giant unbroken block, starting 7-8AM and ending 1-2PM, with no meetings or extended breaks. I think the important thing here is not the solution, which is probably specific to me, but the process I used to reach it.
My initial prompt was mundane: if I want to take a meeting, when should I schedule it to cause minimal disruption? I tried coupling meetings with lunch breaks; I tried end-of-day; I tried mid-afternoon coffee dates. Reviewing my journals, I noticed that no matter what I did, I almost never got much depth in the afternoon. Evenings usually felt even worse.
I’ve long been fascinated by what William James called “the energies of men”:
I wish to spend this hour on one conception of functional psychology, a conception never once mentioned or heard of in laboratory circles, but used perhaps more than any other by common, practical men — I mean the conception of the amount of energy available for running one’s mental and moral operations by. Practically every one knows in his own person the difference between the days when the tide of this energy is high in him and those when it is low, though no one knows exactly what reality the term energy covers when used here, or what its tides, tensions, and levels are in themselves.
Years ago, I got an Apple Watch app called Tracker. It’s very simple: it taps me on the wrist at a few random times each day and asks me to rate my mental energy level on a 1-to-5 scale[3]. What’s important about this app is that it samples randomly, rather than relying on me to choose when I record samples; and also that it’s an unobtrusive one-tap interaction. The notification appears on my wrist with five buttons; I tap one; it disappears. At no point am I looking at my phone.
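The key property here is that the app, not I, picks the sampling moments. I don't know how Tracker actually schedules its prompts; the sketch below is a minimal illustration of the general idea, assuming uniformly random times within a waking window (7 AM to 10 PM here) and a minimum gap between prompts so they don't bunch up. The function name and all parameter values are hypothetical.

```python
import random
from datetime import time


def sample_prompt_times(n_prompts, start_hour=7, end_hour=22,
                        min_gap_minutes=60, rng=None):
    """Pick n_prompts random times in [start_hour, end_hour), spaced apart.

    Hypothetical scheduler: uniform random minutes within the waking window,
    re-drawn (rejection sampling) until every gap is at least min_gap_minutes.
    """
    rng = rng or random.Random()
    window = (end_hour - start_hour) * 60  # length of the window in minutes
    if n_prompts > 1 and (n_prompts - 1) * min_gap_minutes > window - 1:
        raise ValueError("window too short for that many spaced prompts")
    while True:
        minutes = sorted(rng.randrange(window) for _ in range(n_prompts))
        if all(b - a >= min_gap_minutes for a, b in zip(minutes, minutes[1:])):
            # Convert minute offsets back into clock times.
            return [time(start_hour + m // 60, m % 60) for m in minutes]
```

Because I never choose when to record, the samples aren't biased toward moments when I happen to feel like reflecting (or happen to feel good).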
I’ve used this app for years to run experiments involving my energy levels. Turning to this data again in the context of my meeting question, a clear story quickly emerged. Here’s a year of data (~50-75 samples per hour):

Conditioned by years in a typical office, I’d been working from roughly 8 AM through noon, then taking an extended break for lunch and to recharge, then returning to my desk around 1:30 or 2 PM for a second round… which never seemed to go all that well. That was a huge mistake! I’ve still got lots of mental energy around noon, and even around 1 PM. I was taking a break during that period, then returning just in time for my 2 PM energy nose-dive, which doesn’t recover until a much shorter second wind at around 6-7 PM[4].
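The export was analyzed in R (see footnote [3]). As an illustration only, here is a Python sketch of the hour-of-day aggregation behind a chart like this one. The JSON layout it expects (a list of objects with a local ISO 8601 "timestamp" and an integer "value") is an assumption; the app's real export format may differ. It returns a sample count alongside each mean, because an hourly average is only as trustworthy as the number of samples behind it.

```python
import json
from collections import defaultdict
from datetime import datetime
from statistics import mean

def mean_energy_by_hour(json_text):
    """Group 1-5 energy ratings by hour of day; return {hour: (mean rating, sample count)}.

    Assumes a hypothetical export format: a JSON list of objects with an ISO 8601
    "timestamp" (local time) and an integer "value".
    """
    by_hour = defaultdict(list)
    for sample in json.loads(json_text):
        hour = datetime.fromisoformat(sample["timestamp"]).hour
        by_hour[hour].append(sample["value"])
    return {hour: (mean(values), len(values)) for hour, values in sorted(by_hour.items())}

# Usage (the file name is hypothetical):
# with open("tracker_export.json") as f:
#     for hour, (avg, n) in mean_energy_by_hour(f.read()).items():
#         print(f"{hour:02d}:00  mean={avg:.2f}  n={n}")
```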
So this year I’ve worked in one big, 6-7 hour block, starting between 7-8AM and wrapping up between 1-2PM. I prepare lunch in advance[5] and eat it at my desk. Depth of concentration is cumulative, and precious. An extra hour or two of depth is enormously valuable. I reliably get more done—and, more importantly, with more depth—in that 6-7 hour morning block than I’d previously done in 9-10 hours throughout the day.
This feels wonderful. By 2PM, I’ve done my important work for the day. I know that no more depth-y work is likely, and that I’ll only frustrate myself if I try—so I free myself from that pressure[6]. I take meetings; I exercise; I meditate; I go on long walks. I’ll often do shallower initial reads of papers and books in the afternoon, or handle administrative tasks. Sometimes I’ll do easy programming work. It’s all “bonus time”, nothing obligatory. My life got several hours more slack when I adopted this schedule, and yet my output improved. Wonderful!
Tuning breaks
I simplified a bit in this last section. I don’t actually work in a 6-7 hour unbroken block. My experience has been that intermittent short breaks—5 minutes, not 15 or 30—help depth of concentration enormously.
The best explanation I have is that the work is simply too difficult, or that I’m simply not conditioned enough, to focus deeply and continuously for hours. If I try, then I rapidly burn myself out. What this feels like, in the moment, is that I’ll find I need to apply more and more willpower to maintain my concentration, until the pressure’s too much, and I give in to distraction to release it.
A short break lets my willpower muscles recharge. Five minutes works well for me; longer periods more sharply erode depth. If it’s done right, it won’t disrupt depth of concentration much. The important thing is to diligently avoid doing anything cognitive during the break: reading or thinking about another topic imposes higher switching costs. My favorite activity is to do simple household chores. I’ll pick things up, wash dishes, and prep vegetables for dinner. The temptation is to look at my phone during breaks, to read something, to answer messages—but that’s a terrible idea. It brings me all the way back to the surface and makes me claw my way back to depth again when the break timer ends.
But what should the working interval be between breaks? The standard “pomodoro” recommendation is a 25 minute working period, followed by a 5 minute break. When I’m really struggling to concentrate, a shorter working period does help—it feels like lifting lighter weights at the gym. But the breaks do impose a real switching cost. Sometimes they interrupt a smoothly-chugging train of thought; sometimes it takes a few minutes to resume in the next session. And of course there's a practical time cost. The typical pomodoro schedule spends 17% of the working period on breaks. In practice, I find that I'll take a couple of minutes to finish what I'm doing and settle back in at my desk when the timer goes off. I was surprised to measure the true cost: 28% of my total time went to breaks!
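The arithmetic is easy to check. Under one simple model, a 25/5 cycle spends 5/30 ≈ 17% of its time on breaks. The model assumes that wrap-up and settling time comes out of the working period and does not lengthen the cycle. Under that same assumption, a measured 28% corresponds to roughly 3.4 lost minutes per cycle. The function names are illustrative.

```python
def break_share(work_min, break_min, settle_min=0.0):
    """Fraction of each timer-driven cycle that isn't productive work.

    settle_min models minutes lost per cycle finishing up and settling back in.
    Assumption: that time comes out of the working period, so the cycle length
    stays work_min + break_min.
    """
    return (break_min + settle_min) / (work_min + break_min)

def settle_minutes_for(share, work_min, break_min):
    """Invert break_share: the per-cycle overhead that would produce a measured share."""
    return share * (work_min + break_min) - break_min

print(f"{break_share(25, 5):.0%}")               # 17% (nominal pomodoro)
print(f"{settle_minutes_for(0.28, 25, 5):.1f}")  # 3.4 (minutes lost per cycle)
```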
Would longer working intervals be better? Or would the price paid in focus be higher than the benefit of less time spent on breaks? I ran an experiment. Every morning, I randomly chose a working interval: 25, 35, 45, or 55 minutes, with 5 minute breaks[7]. I recorded a time sheet with subjective focus ratings and task information. I did this for 40 days, 10 for each interval.
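If you draw an interval independently each morning, you won't necessarily end up with exactly ten days per condition. One way to guarantee that balance is to shuffle a list that contains each interval ten times. This is an assumption about the design, not necessarily how this experiment was randomized.

```python
import random
from collections import Counter

def balanced_schedule(intervals, days_per_interval, seed=None):
    """Assign each working interval to exactly days_per_interval days, in random order."""
    days = [minutes for minutes in intervals for _ in range(days_per_interval)]
    random.Random(seed).shuffle(days)
    return days

schedule = balanced_schedule([25, 35, 45, 55], days_per_interval=10, seed=1)
print(len(schedule), Counter(schedule))
```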
The aggregate findings were unsurprising. Time efficiency is much better for longer working intervals: the median 45 and 55 minute days had almost an hour of extra real working time. Conversely, longer intervals produced less reliable deep focus. 55 minute intervals were about twice as likely to have shallow focus as 25 minute intervals. But the most interesting result was that the effect of long working intervals on focus shifted throughout the day, as depicted in this figure:

For the first two hours of the day, longer working blocks don’t seem to impose a noticeable penalty on focus. But by the end of the day, long blocks have shallow focus half the time, while short blocks might actually have gotten more focused.
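The time sheet analysis comes down to grouping sessions and computing the share rated shallow in each group. In the sketch below, the field names, the 1-5 focus scale, the shallow cutoff, and the 10 AM split are all hypothetical. The point is that one function answers both questions: the aggregate comparison and the time-of-day interaction. Only the grouping key changes.

```python
from collections import Counter

def shallow_rate(sessions, key, shallow_threshold=2):
    """Share of sessions rated shallow, grouped by key(session).

    sessions: dicts with a 1-5 "focus" rating plus whatever fields key() reads.
    A rating <= shallow_threshold counts as shallow (an assumed cutoff).
    """
    total, shallow = Counter(), Counter()
    for session in sessions:
        group = key(session)
        total[group] += 1
        if session["focus"] <= shallow_threshold:
            shallow[group] += 1
    return {group: shallow[group] / total[group] for group in sorted(total)}

def by_interval(session):
    return session["interval"]

def by_interval_and_period(session):
    # Hypothetical split: sessions starting before 10 AM count as "early".
    period = "early" if session["start_hour"] < 10 else "late"
    return (session["interval"], period)
```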
So for the past two months, my schedule has been: 55 minute intervals before 10 AM; 45 minute intervals between 10 AM and noon; 25 minute intervals afterwards. I hit higher depth-of-focus ratings more often now than I did before this change, and I’ve added ~45 minutes of clocked working time onto my morning blocks—without actually making them any longer.
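As a lookup, this schedule is a step function of the session's start time. The text doesn't say which band a session starting at exactly 10:00 or 12:00 belongs to. This sketch assigns it to the later band, which is an assumption.

```python
from datetime import time

def working_interval(now: time) -> int:
    """Working-interval length in minutes for a session starting at `now`."""
    if now < time(10, 0):
        return 55
    if now < time(12, 0):
        return 45
    return 25

print(working_interval(time(8, 30)), working_interval(time(10, 15)), working_interval(time(12, 40)))
# 55 45 25
```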
The data also suggest that, as expected, some kinds of tasks are easier to focus on than others:

This figure suggests I might try varying the working interval with the difficulty of the task. I’d need to be careful there; for example, in the case of programming, the scores are poor in this sample because the programming I was doing was so mundane and annoying, and I didn’t want to be doing it—not because it was difficult.
Micro problem-solving
I’ve discussed some big changes I’ve made which have helped my depth of concentration. But day to day, much of the work is in training myself to notice small impediments to depth. Lots of tiny problem solving. Here’s a non-exhaustive list:
- One obvious category is around technology and internet use.
  - I used Focus to block obvious displacement activities during my working block (email, Twitter), but I noticed myself dodging difficult work by diving down literature rabbit holes, or endlessly chasing visual reference material, or pulling on some irrelevant technical thread. So now the WiFi is off by default on my computer in the morning.
  - I noticed that sometimes I’d re-enable the WiFi because I really actually needed to look something up… and then I’d forget to turn it off. So I made an Alfred workflow which turns it back on for an explicitly-specified period, so that I have to say “give me internet for 3 minutes.”
  - It goes without saying, but no internet on my phone before I sit down at my desk. I don’t want anyone else’s thoughts in my head before I start thinking my own.
  - I’d find myself looking at my phone during breaks, even though I know it harms focus. So now I use Forest in the morning. It lets me use my phone to play music but blocks just about everything else.
  - But then I had a problem: sometimes packages would arrive and I’d need to buzz them in, or my dog walker would let me know they’d arrive soon. So I configured a “Focus” mode on my iPhone which only allows notifications from the couple of sources I really want to be responsive to in the morning. This mode’s on a schedule, so that it’s automatically enabled during my working block.
- When I’m stuck, I’ll often find myself feeling sleepy. I think this is just a consequence of expecting more stimulation than I’m getting. When I notice this, I play energetic music, do a quick exercise, etc.
- If I spend a working interval flailing, never sinking below the surface, the temptation is to double-down, to “make up for it”. But the right move for me is usually to go sit in a different room with only my notebook, and to spend the next working interval writing or sketching by hand about the problem.
- Administrative tasks are a constant temptation for me: aha, a task I can complete! How tantalizing! But these tasks are rarely important. So I explicitly prohibit myself from doing any kind of administrative work for most of the morning. In the last hour or two, if I notice myself getting weary and unfocused, I’ll sometimes switch gears into administrative work as a way to “rescue” that time. Otherwise I do these tasks in the afternoon; I’ve trained myself not to mind if they’re very delayed or dropped. (My apologies to anyone waiting on a reply to an email.)
- I noticed that I’d often take a few minutes to transition fully from a break into a new working interval. After some experimentation, I’ve found that a 15-second meditation lets me make that shift immediately. Body scans work well for me.
- When I’m doing very difficult intellectual work, I find that being around other people harms it. When I’m doing work I don’t want to be doing (e.g. mundane programming), I find that being around other people helps my depth of focus.
- Music of some kind seems to help me focus, but the right kind depends on the work I’m doing. If what I’m doing is very difficult (tempting dullness), it should be something subdued, repetitive, and non-vocal; if it’s easy (tempting boredom and distraction), it should be active, something I’d want to sing along to.
- To preserve the sanctity of my morning working blocks, I almost never accept meetings before 2PM. This makes it tough to meet with Europeans, an awkward 7-9 hours away. I solve that problem by offering them weekend morning slots.
- If I want to make more progress on a difficult creative project, a good way to ratchet up the intensity of my work is to add weekend mornings. Trying to work more hours on weekdays rarely gets me anywhere with difficult intellectual work.
- I’ve noticed that unhealthy afternoon/evening activities can easily harm the next morning’s focus, by habituating me to immediate gratification.
- For a while, I’d regularly listen to audiobooks and podcasts when walking. But this contributed to an internal expectation of constant stimulation. I’ve pulled back: better to let my mind wander. Now I listen to those things when exercising and cooking, since I find that the physical stimulus is distracting enough to keep me from doing much useful thinking.
- Twitter, Mail, and so on are disabled on my phone so that even in the afternoon/evening, these automatic activities don’t scatter the next day’s focus.
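A short script can approximate the “give me internet for 3 minutes” workflow. This is a hypothetical sketch, not the actual Alfred workflow. It assumes macOS's networksetup command and a Wi-Fi device named en0 (you can check yours with networksetup -listallhardwareports). The try/finally switches Wi-Fi off again even if the wait is interrupted. The sketch doesn't show how to connect it to a launcher such as Alfred, for example through a Run Script action that calls grant_internet(3).

```python
import subprocess
import time

WIFI_DEVICE = "en0"  # assumption: the Wi-Fi interface name varies between Macs

def set_wifi(on, run=subprocess.run):
    """Switch Wi-Fi power on or off with macOS's networksetup tool."""
    run(["networksetup", "-setairportpower", WIFI_DEVICE, "on" if on else "off"], check=True)

def grant_internet(minutes, run=subprocess.run, sleep=time.sleep):
    """Turn Wi-Fi on for a fixed number of minutes, then always turn it back off."""
    set_wifi(True, run)
    try:
        sleep(minutes * 60)
    finally:
        set_wifi(False, run)
```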
Key to much of what I’ve described here is a journal, and a practice of weekly, monthly, quarterly, and annual review. I don’t think the details of those reviews are very important—most of the benefit just seems to come from regularly reflecting on what I’m trying and what’s happening as a result. It’s really about developing a rich mental model of what focus and perseverance feel like, and what factors seem to support or harm those states of mind.
On that note, I should mention: everything I’ve written here applies to the kind of work Thurston was talking about—work which requires staring hard enough and with enough perseverance at a fog of muddle and confusion. Sometimes what’s needed is to explore aimlessly. Sometimes a problem requires playful lightness and expansiveness. These modes need a completely different set of practices, and they’re often best with others. Sometimes I just need to execute; and then traditional productivity advice helps enormously. But deep insight is generally the bottleneck to my work, and producing it usually involves the sort of practices I’ve described here.
————————
Thanks to Catherine Olsson, Kanjun Qiu, and Michael Nielsen for many helpful conversations on this subject. Thanks to Sara LaHue for suggesting that I look into interactions between time of day and working interval on depth of focus.
————————
[1] Michael Nielsen has some great notes on tactics for this in his Notes on creative contexts.
[2] I notice that I really struggle to generate curiosity about problems in programming. Maybe it’s because I’ve been doing it so long, but I think it’s mostly because my problems are usually with ephemeral ideas, incidental to what I actually care about. When I’m fighting some godforsaken JavaScript build system, I don’t feel even slightly curious to “really” understand those parochial machinations. I know they’re just going to be replaced by some new tool next year. When I’m debugging some dataflow race condition, I can at least get a little interested in fundamental problems of program structure. Long ago, that was my full-time job! Unfortunately, most of my programming problems these days are the parochial kind. They make me resent programming, which is a shame: it was once the most joyful activity of my life!
[3] You can make it ask about whatever you like (rather than energy level) and configure whatever response scale you like. But there’s a significant caveat: this app includes no data visualizations of any kind. It just gives you a JSON dump, which I analyzed using R.
[4] I did an extended experiment with 20-30 minute afternoon naps around 2-3PM (~8th waking hour). They felt nice, so I still do them sometimes, but they didn’t substantially improve my afternoon mental energy levels. Null results also for mid-afternoon exercise.
[5] I’ve also done experiments around how lunch affects my mental energy. I can’t distinguish a chopped salad, a quiche, and a protein-centric sandwich in my energy levels, but egregiously carb-y choices like burritos and pastries do hurt my energy. I usually eat a chopped salad for lunch these days.
[6] I notice that some part of me feels ashamed to say that I’m “done” working at 2PM. This is probably because in my previous roles, I really could solve problems and get more done by simply throwing more hours at the work. That’s just obviously not true in my present work, as I’ve learned through much frustration. Reading memoirs of writers, artists, and scientists, I see that 2-4 hours per day seems to be the norm for a primary creative working block. Also, I don’t want to harp on this because I want this essay to be about quality, not quantity, but: I think most people are laughably misled about how much time they truly work. In a median morning block, I complete the equivalent of 12 25-minute pomodoros. When I worked at large companies, getting 8 done before 6PM was a rarity—even though I tried to assiduously arrange my calendar to maximize deep work!
[7] I just use the timer function on my Apple Watch. There are apps for this, but I’m often working with a paper notebook away from my computer. And I don’t want to use my phone.
2022-11-30 21:37:04 +0000 UTC
I'm trying something new! Work with me on a research project of mutual interest; get six months of funding and mentorship.
Details here (apply by Nov 25): https://notes.andymatuschak.org/Research_fellowship
If you know someone who might be a great fit, please send them that link! Copying the overview from that page here for convenience:
Who:
- you are some blend of a designer, a technologist, and a researcher—an aspiring interface inventor
- e.g. artist-technologists, creative tech industry people between positions, academic folks on sabbatical or leave, recent graduates still exploring, post-exit founders pursuing creative work, consultants looking for an interesting break
The project:
You’ll produce:
- a prototype of a novel tool for thought, tested with real people
- an essay (like this) documenting the prototype, the theory behind it, and what you learned in your test
You’ll receive:
- $25,000 (+$5,000 if you’ll move to / live in San Francisco)
- regular meetings with me for ideas, feedback, support
- introductions to potential colleagues and employers
- a larger audience for your work
When:
- six months, ~full-time, starting late 2022 or early 2023
- application deadline 2022-11-25
Where:
- San Francisco preferred; remote OK for the right candidate
2022-11-17 22:01:26 +0000 UTC
Join me and author Joshua Horowitz on Oct 29 from 9-10AM PDT to discuss Engraft, his new system for composable live programming (Joshua was formerly of Dynamicland and is now a PhD student at UofW). Please read the paper (or at least watch the video) and bring questions, concerns, noticings.
Please note that this paper is still in submission; do not share the link publicly.
We have computational notebooks; we have VisiCalc-like live expression editors; we have noodle-based programming tools like Blueprint. But these tools generally exist in their own universes. If you want to combine them with traditional software programming methods, or with each other—using "the right" live programming tool for each job—you're in trouble.
Engraft is a very interesting attempt to make live programming environments compose well with each other and with traditional programming environments.
Though the authors situate their papers in different historical lineages, I feel Engraft makes an interesting pairing with last month's discussion of Webstrates. Both of these systems are interested in breaking software primitives into more composable, malleable, interoperable elements. Of course, they have different priorities, and they take different approaches—but I've enjoyed pondering them both as pieces of the same puzzle.
I hope to see you next Saturday, Oct 29! And, reminder: office hours this Monday (Oct 24) at 2PM PDT.
2022-10-21 20:49:43 +0000 UTC
Trying office hours again this month (Google Calendar link; iCal URL):
- Monday, October 24, 2:00 PM PDT (meet)
Bring projects, prototypes, sketches, concepts, questions. Repeating guidelines from previous sessions:
- At least for now, there are no reservations. Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems. But! The chat transcript will be published as a comment on this post after the office hours, so that URLs which people share are more easily accessible. Identities in the chat transcript will be public.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
2022-10-18 20:31:26 +0000 UTC
Over the past few weeks, I’ve run twenty live observation sessions for my most recent mnemonic medium prototype, and I’ve read through a stack of diaries from asynchronous testers. I’d like to share some of what I’ve learned, and where I might go next.
For better or worse (and we’ll see a bit of both), I had trouble with tester screening during this iteration. I suspect a lot of people were just eager to see what I was working on, so they ignored my instructions about tester requirements. I wasted too much time talking to people who didn’t have a good reason to carefully read the test material, or who had never heard of spaced repetition systems (SRS). Of course, the system will eventually need to explain itself to people who aren’t familiar with SRS, but I wasn’t trying to test that this time—more on that later.
At the highest level, when testers fit my intended eligibility criteria (the tester authentically wanted to internalize the material, and they had prior SRS familiarity), the system worked much as I hoped. In other cases, it rarely did. By “worked”, let’s say: appeared to deliver substantial net benefit to the reader; the reader was vocally appreciative of that help and acted on the interface accordingly; the reader said they wanted to review their saved prompts subsequently; the reader (usually) expressed that they wanted this sort of interaction in a wider range of texts.
That said, “did it work?” blurs away most of the interesting insight. Let’s dig in.
What have I learned about the potential scope of the mnemonic medium?
One of the key questions I’m trying to answer is: how broadly might the ideas of the mnemonic medium apply? Someday, might we use affordances like these in every informational text? What does the medium want to become, to reach its full potential? The medium seems to work well in Quantum Country… but should we conclude that its range is limited to technical primers? I tested this latest prototype with two books: Introduction to Modern Statistics (IMS)—a formal, technical textbook; and Shape Up—an informal, non-technical book on product management.
At first glance, IMS seems quite similar to Quantum Country. But Quantum Country’s mechanics wouldn’t have worked well in this book. Quantum Country is a focused primer, meant for an audience ready to put themselves completely in the author’s hands. Large textbooks like IMS aren’t meant to be read linearly or uniformly. All of my test readers had studied statistics (albeit sometimes long ago); many wanted to pick and choose just a few prompts about new material. Others still wanted to review everything, but without actually saving prompts about familiar material. Neither of those workflows would have been possible with Quantum Country’s design. Last year, when I tried to extend that design to texts like IMS, I saw users routinely frustrated by the system’s authoritarian rigidity. But in my most recent tests, I observed non-linear workflows working happily, with little friction. I’m now fairly confident that the mnemonic medium can be effectively applied to a broad array of technical textbooks—not just linear primers. Because IMS is technical, I can’t yet say anything about the medium’s performance in non-technical textbooks.
Shape Up is instructional, but it’s informal and non-technical. Success with this text would expand the medium’s domain several notches. Here, results were more mixed. Readers in my target pool (material relevant, SRS familiar) did appreciate the medium’s help, but my qualitative sense was that they were getting noticeably less benefit than IMS readers. It seemed more like a nice-to-have than a transformative augmentation.
In hindsight, it helps to forget spaced repetition for a moment and ask: what would be the ideal personal high-growth environment for Shape Up? What really matters here—as several readers pointed out—is that you meaningfully change your product creation practices. This means you probably want plenty of hands-on exercises and activities, personal mentorship, and perhaps ongoing reflection/application prompts. You might want salience prompts, to help you connect the book’s ideas to events in the moment. Traditional retrieval practice would be quite helpful, too. I found myself forgetting the specifics of the book’s methods before I had a chance to apply them. But it seems clearly less critical than those other modalities. You don’t need cybernetic help with absorbing earlier chapters in order to understand later ones.
In a funny sense, Shape Up is a lot like a self-help book. In fact, many informal, non-technical instructional books are like self-help books. In genres like psychology, philosophy, or business, popular books are often really about changing your life in some way. And so augmentation should be about that too. As a reader with an established SRS practice, I do want the mnemonic medium for books like this. Retrieval practice really does help me bring these books’ ideas into my life. But as a researcher, my instinct is that this genre isn’t a strategic next target for the mnemonic medium. If I want to expand the scope of the mnemonic medium into less technical texts, I should try adapting softer science books, in fields like psychology or political science. If I want to aim for less formal texts, I should try adapting informal “explainers” about technical topics. And if I want to augment self-help-ish texts like Shape Up, I should focus on other support mechanisms, like timeful texts—ones meant to help people reshape their lives around the text’s ideas.
The impact of margin prompts on the reading experience
This latest prototype moved authors’ prompts into the page’s margin. That totally transformed the reading experience. Readers consistently noted that the margin prompts signal particularly important passages and hint at what to focus on. Readers felt that the prompts made them slow down and pay more attention. Most—but not all—readers welcomed that influence.
Here’s a more concrete story that several readers recounted. On first read, a passage didn’t seem to contain anything important. Then they noticed a prompt marker in the margin and felt unsure: wait, was there a key detail here after all? In response, they re-read the passage more closely, guided by the prompt’s focus. In each instance, the reader admitted that they’d missed the point highlighted by the prompt. They appreciated having that corrected.
We heard these same sorts of observations from Quantum Country readers. The embedded review sessions made people re-read certain sections, or read more carefully. But those sentiments were stronger and more frequent in this prototype, where margin prompts make their presence felt continuously.
Prompts as summaries
In Shape Up, when the reader’s screen is large enough, prompts’ “front” sides are always visible in the margins. But there’s not always room for that. On smaller screens, and in IMS, prompts are “collapsed”. That is, readers see only a symbol indicating the presence of a prompt. When they mouse over the symbol, the prompt text is displayed. This distinction made a big difference in the reading experience!

When prompt text was always visible, many readers used the prompts as lightweight summaries of the adjacent passages, reading prompts before reading their associated passages. These readers often used the prompt text to decide whether to read the associated passage. What I’d intended was the inverse: people would read the main text, and when something particularly struck their interest, they could scan horizontally into the margin to read and perhaps save the adjacent prompt.
I’m worried about this summary-oriented behavior. Prompt-first reading will often omit meaningful details: the full text contains narrative material which provides necessary context for later sections. More broadly, mnemonic medium prompts aren’t exactly summaries. And they aren’t meant to work in isolation. Prompts contain the information which should be reinforced through retrieval practice, but they lean on structure and detail in the associated narrative. In fact, that connection is part of how the mnemonic medium aims to solve a central problem: outside of rote material, studying other people’s SRS “decks” usually doesn’t work very well! Such prompts usually feel atomized, disconnected from real understanding. By contrast—at least aspirationally—when you recall the information on a mnemonic medium prompt, you’re involved with more than just its raw text. The review resurfaces the much richer narrative context where you found that prompt.
Some of these summary-oriented readers didn’t really care about the retrieval practice mechanics at all. They really just wanted paragraph- and section-level summaries of the text. That does seem like an interesting reading affordance to explore, but if I were trying to solve that design problem, I don’t think my solution would double as spaced repetition prompts. Summaries and prompts are related—but distinct—mediums. Another similar observation: some readers who were struggling with a passage mentioned that a prompt helped them by offering an alternate wording. This likewise strikes me as a nice second-order effect, but prompts are not the right tool for that job, either.
Completionist prompt-reading
Even when readers weren’t using the prompts as summaries, most people with large screens read every prompt in Shape Up, where prompts’ “front” sides were persistently displayed in the margins. I’m sure this was in part due to the novelty of the prototype. People were curious. But after half an hour, this behavior struck me as a touch compulsive—like they felt an obligation; like they weren’t “reading correctly” if they didn’t read the prompt text. Completionist prompt-reading worries me. It seems like a substantial disruption to the reading experience, like reading a heavily footnoted text. Your eyes scan erratically over the page; your attention jumps in and out of the narrative. Maybe the prompts helpfully guide your reading, but does that make up for the distraction? And is helpful guidance truly the reason why you let your eyes dart back and forth—or is the behavior more compulsive, a dutiful completionism?
Another surprising effect of engaging with prompts while reading: that behavior sometimes amounts to implicit, on-the-spot retrieval practice! That is, when people read the prompt text in the margin, challenge themselves to produce a response, then mouse into the prompt to read the author’s response, they’re doing the same thing they’d do in the “real” review session—they’re just not “grading” themselves. In fact, this may be a more natural way to review while reading than the “traditional” mnemonic medium embedded review box, which can feel like an obtrusive interruption. The catch is that these on-the-spot reviews are likely much less effective. They’re too soon. You just read the sentence containing that idea, so it’s (usually) easy to supply the response—probably from short-term memory. The spacing effect literature suggests that this immediate review probably won’t result in much memory consolidation. Better to wait a few minutes; or, probably, a few hours. Also, at least in this prototype, readers don’t inform the system whether their retrieval was successful or not. That means we can’t set the initial review interval appropriately. But maybe none of that matters. I believe the most important thing to get right in spaced repetition memory systems is the emotional experience. They’re plenty efficient, even with naive scheduling; the problem is that people don’t like to use them. Maybe it’s fine to accept less efficiency here if these on-the-spot reviews feel much more natural than the relatively obtrusive review boxes.
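A naive spaced repetition scheduler makes the grading problem concrete. This is a hypothetical sketch, not the prototype's or Quantum Country's actual scheduler. Intervals double after a successful recall and reset after a failure. An ungraded on-the-spot review (remembered=None) gives the scheduler nothing to act on, so the prompt keeps whatever default first interval was assumed, here one day.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

DEFAULT_FIRST_INTERVAL = timedelta(days=1)  # assumed default when no grade is available

@dataclass
class PromptState:
    interval: timedelta = DEFAULT_FIRST_INTERVAL

def next_interval(state: PromptState, remembered: Optional[bool]) -> timedelta:
    """Naive expanding schedule: double on success, reset on failure.

    remembered=None models an ungraded review (reading a margin prompt and
    checking the answer without telling the system), so the interval can't adapt.
    """
    if remembered is None:
        return state.interval
    if remembered:
        state.interval *= 2
    else:
        state.interval = DEFAULT_FIRST_INTERVAL
    return state.interval
```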
When prompts were “collapsed” (in IMS, and in Shape Up on smaller screens), the behaviors I’ve described shifted dramatically. A small handful of readers moused over each prompt marker to read its contents. But most readers only interacted with the prompts in response to some impulse, like when they found a passage particularly difficult or interesting. As a designer, I flinch at the notion of imposing extra interaction costs… but reading behaviors in the “collapsed” mode do strike me as healthier. Or at least closer to what I’d intended.
What to do about all this? My instinct is to make the “collapsed” behavior a user-controlled setting, and to have it default to “collapsed”. As I apply the medium to more texts and run more user observations, I’ll randomize that setting in each session and continue to watch how people behave.
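The proposed setting can be sketched as a small function that picks the initial display mode. The mode names, the observation_session flag, and the rule that an explicit user choice beats randomization are all assumptions made for illustration.

```python
import random
from typing import Optional

MODES = ("collapsed", "expanded")

def prompt_display_mode(user_setting: Optional[str], observation_session: bool, rng=random) -> str:
    """Initial margin-prompt mode: the user's choice if set, random in observation sessions, else collapsed."""
    if user_setting in MODES:
        return user_setting
    if observation_session:
        # Randomize per session so both reading behaviors keep getting observed.
        return rng.choice(MODES)
    return "collapsed"
```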
Is “saving a prompt” the right primitive verb?
Prompt saving as all-or-nothing gesture
In this new design, saving a prompt is a lightweight gesture which expresses your interest in a passage. “I care about this detail enough that I want to practice recalling it, so that I internalize it deeply and reliably.” It’s nice that the new design makes that gesture so fluid, spontaneous, and situated. But no matter how light the interaction, that’s a pretty intense desire to express. You’re signing up to expose yourself to repeated testing on that detail. Sure, if we do a good job with triage tools in the review interface, you can incrementally ditch prompts which bore you later. And sure, “enlightened” SRS users know that prompts are dirt-cheap—maybe thirty seconds in the first year and half that thereafter—so you’re not committing yourself to much. But that’s often not how it feels in the moment, when you’re making the decision to save a prompt floating in the margin.
Test readers constantly found themselves wanting to gesture at important passages—some way to emphasize, to express “this is important!”. But often their impulse was mismatched with the notion of “saving a prompt.” They had an instinctive emotional response, and they were looking for an expressive outlet. They weren’t necessarily looking to sign themselves up for future retrieval practice. My prototype forced their impulse into two distant choices: you can save (or create) a spaced repetition prompt about that passage, or else you can leave the text totally untouched.
I saw too many impulses fall into the chasm between those choices. I’d see readers hesitate, perhaps select a phrase… then ask for a highlighter, or to be able to “bookmark” a passage, or to extract it “for safe keeping” into some notes system. On a few occasions, readers created a dummy prompt “as a placeholder” to mark a passage of interest. Not good, but who could blame them? It’s all they could do!
This is a problem with my prototype, but it’s also a problem with digital reading in general. It’s like reading a book behind glass! The web is particularly bad. Even when you’re using a browser extension with the usual support for highlights and notes, it’s like reading a book with your hands wrapped in enormous mittens. You can’t really scribble in the margins: your notes hide behind some icon or in some non-spatialized sidebar. Your highlighter’s expressive range on an EPUB or a web page is: you can make little yellow rectangles, sometimes, where there’s text. Arbitrary markup? “Gosh, what would happen on reflow?!” Quiet, engineer—make it work. (Apple Pages sort of did! For PDFs, see LiquidText; for the web, see academic systems iAnnotate and SpaceInk.) Few digital readers have anything resembling the expressive range they’d get with a real paper book and such exotic tools as a pen, a highlighter, sticky notes, and a legal pad. Yes, hypertext is nice; search is nice; copying and pasting excerpts is nice. But on a computer, I still feel like I’m reading through a thick pane of glass. I can insert myself into a text only by filing a form in triplicate. Then, maybe, a yellow rectangle will show up.
I suspect this rant touches the heart of what many testers seemed to be feeling. I gave them a glimpse of something they didn’t realize they’d wanted: a spatialized tool for interacting with a web book, a way to “scribble in the margins”. Then it turned out to be yet another formal tool—literally, in this case, another form to be filed! Oops.
Incrementalism
Expressivity aside, there’s another good reason to smooth out this prototype’s all-or-nothing interaction: incrementalism. In some cases, the trouble was that the reader simply wasn’t yet familiar with spaced repetition, or that they had deluded beliefs about the cognitive value of highlighting. More often, the reader just wasn’t yet ready to commit to anything more than a simple highlight. They were interested—a bit. But they weren’t sure how much yet. Often you can’t clearly appraise what matters to you until you’ve read a whole section. Saving an author-provided prompt may be a bigger gesture than you wanted to make. The situation’s worse if no prompt is provided: for most readers, writing a new prompt requires enormous investment.
A highlight can be a provisional first step of a longer sequence. Some readers wanted to make a quick first pass with a highlighter; then, after they’d built a holistic grasp of the section, they’d re-read the most important areas more carefully. Others wanted to make a quick first pass, then to let the material marinate. If they found themselves thinking about it in the coming days, they’d use those highlights to guide further work with the text.
What if Orbit could facilitate this incremental approach? Here’s a quick sketch of how that might work. You make a first pass on an article, highlighting sections which strike you, perhaps jotting a few short notes. Then a week later, you’re presented with your excerpts and notes, maybe using an approach like LiquidText’s to show that material fluidly in the context of the full text. If you feel a renewed surge of interest in any of those markings, you could put in some more work to turn them into a prompt, or more generally give the text more attention. Taylor Rogalski referred to this approach as “inverted Orbit”: you’re starting with a distant relationship, then bringing the ideas that matter into tighter orbits.
I’m instinctively excited about an incremental workflow, but prior attempts here make me wary. Readwise implements a similar model: highlight, then (in response to periodic emails) revise the most interesting material into prompts or deeper notes. I’ve talked to many Readwise users, but I’ve met none who make much use of its incremental elaboration tools. I’d need to understand that better before pursuing this idea further. Speaking for myself, I notice that once days have gone by, I’ve often lost my emotional connection to the text. A few highlights are rarely enough to rekindle that interest. Sometimes I’ll mark up a physical book, intending to write prompts about it later. But next week, when I see the book on my desk, it feels like I’ve created homework for myself. By contrast, during the reading experience, the narrative creates a strong emotional connection to the text. That’s often enough to make me enthusiastic about writing prompts while I read. The next day, prompt-writing feels like a chore. I usually have to re-read the text for a while to get myself interested again. If author-provided prompts were already available for the associated passages, this emotional distance might not matter. I might just need to click a button to accept the author’s prompt. But I worry that the emotional issues broadly remain a barrier here.
Another important prior work comes from Piotr Wozniak, one of the contemporary originators of spaced repetition. His SuperMemo system includes “incremental reading”, which aims to fill a similar need. This design has a small but enthusiastic community of users. For me, at least, it's not quite the right set of primitives. In SuperMemo's incremental reading workflow, the main actions you take are to compress and excerpt. You start by reducing a full article to a few short excerpts which deserve more attention. In a later session, you'll see only those excerpts—each in isolation, out of context—and you might edit them into summaries focused on the elements you find most salient. Then in yet another session, you might transform those summaries into spaced repetition prompts. Typically these are cloze deletions, which are easy to make, but which rarely seem to work well. I like the incrementalism; I like Piotr’s exhortation to stop reading a passage as soon as you feel bored or unfocused. But I don't like that the primary verbs are all about decontextualization. In SuperMemo's conceptual framework, texts are wordy baskets of raw information, waiting to be strip-mined for Platonic nuggets you can “keep”. But for me, texts are narrative, texts are prose, texts are structure, texts are voice. Prompts are helpful cues, but they're subordinate to the richer original material. I don't want to whittle down the source text; I want to layer lenses on top of it.
Act on ideas, rather than acting on prompts?
My instinct is that focusing on incrementalism isn’t quite enough to solve the emotional problems I observed during testing, but we might be able to make some progress by aligning the core verb more closely with readers’ expressive intent. Here’s one approach I’d like to explore.
My latest prototype’s intended workflow is: read the main text until you hit an idea that feels important; look in the margin for an associated prompt; read and evaluate it; save it, if it captures the idea as you hoped. What if the primary workflow involved acting on ideas, rather than acting on prompts? Here’s how that might work: read the main text until you hit an idea that feels important; select some relevant text; click “Save”. That’s it! The text you selected is visually highlighted; it’s saved to some personal library of excerpts. Probably you can jot a quick associated note too.
So far, I’ve just described a typical annotation tool. The mnemonic medium twist is: if your saved text had associated author-provided prompts, those would be automatically saved too. The prompts would surface in future reviews as usual, perhaps with some extra design elements to ground them in your saved text (and its context). You’d see those prompts in the margin once they’re saved (and a hint of them while you’re selecting), so you can evaluate and edit/remove them if you like. But my intent is that you usually wouldn’t bother thinking about the prompts. You’d just provisionally accept author-provided prompts associated with your highlights; we’d make it cheap to ditch them later. If there are no author-provided prompts, you’d notice that via the empty margin. You could choose to write one immediately, but you might also refine the text into prompts later, through some separate resurfacing workflow. (I’m not yet sure how to solve the problems which Readwise users experience with the latter.)
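As a rough illustration of the idea-centric “Save” verb, here is a minimal sketch assuming that author-provided prompts are anchored to character ranges in the text (an assumption; the post doesn't say how prompts are anchored). Saving a selection records a highlight and automatically attaches every author prompt whose anchored range overlaps it. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthorPrompt:
    """An author-provided prompt anchored to a span of the text (hypothetical model)."""
    prompt_id: str
    start: int  # character offset where the anchored passage begins
    end: int    # character offset where it ends (exclusive)


@dataclass
class Highlight:
    """A reader's saved selection, plus any prompts saved along with it."""
    start: int
    end: int
    note: str = ""
    saved_prompt_ids: list[str] = field(default_factory=list)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) share any character."""
    return a_start < b_end and b_start < a_end


def save_selection(start: int, end: int, prompts: list[AuthorPrompt], note: str = "") -> Highlight:
    """Save the selected text and, implicitly, every author prompt anchored within it."""
    highlight = Highlight(start=start, end=end, note=note)
    highlight.saved_prompt_ids = [
        p.prompt_id for p in prompts if overlaps(start, end, p.start, p.end)
    ]
    return highlight
```

One consequence of the overlap rule: a passage-level summary prompt anchored to a long range gets attached whenever any highlighted phrase falls inside that range. That is the simple (and, as the post notes, not fully satisfying) approach to higher-level prompts.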
I’ve not exactly solved the stated problem. Readers wanted some lower-stakes way to express what they found important in the text, without making decisions about “saving prompts”, or committing themselves to future retrieval practice. But I think this workflow hints at an important observation: readers often just want to express their interest in an idea. Making people act on “prompts associated with their idea of interest” creates extra weight and indirection. If we could remove that distraction from the reading experience, I suspect people wouldn’t feel much need to explicitly pick and choose prompts. Serious readers might welcome high-quality prompts associated with their highlights in later review sessions—particularly if those prompts felt grounded in the readers’ selected text; if the prompts felt like provisional offerings rather than commitments; and if declining those offerings felt cheap, guilt-free, and reversible. We have plenty of latitude to tune the affective knob of opt-in vs. opt-out here.
An idea-centric interaction model can still produce many of the positive passive effects that I described earlier. Maybe un-saved text with associated prompts is very subtly highlighted, or there’s still some icon in the margins—some way to give you peripheral vision into this extra layer of the text, to subtly absorb “something important here!”. And if you’d like to take an actively studious stance towards a text, perhaps you could flip a switch to show unsaved prompts in the margins, like this prototype does.
On that note, it’s interesting to observe that an idea-centric design is mostly helpful when you’re not reading studiously, when you don’t necessarily care very much about the text. One could reasonably argue that I shouldn’t focus on such cases, since they don’t play to the strengths of the medium: serious people internalizing difficult material. In fact, when you’re reading quite studiously, you probably want to save every author prompt by default; any highlighting interaction is a fiddly nuisance. But my instinct at the moment is that the line is blurry; people’s stance towards texts will shift back and forth in ad-hoc ways.
This is just a sketch, and there are lots of unsolved problems. The most serious one is: how to handle higher-level summary or distillation prompts, which pertain to entire passages rather than to specific phrases? These are often the most useful prompts. One simple solution would be to include those prompts if you highlight any phrase within the long passage they cover. But I’m not satisfied with that yet.
More practically, I’m not excited about the prospect of creating yet another annotation and excerpting tool, particularly when I consider all the subsequent library management and integration workflows which users would naturally expect. I would probably only pursue this path if I found some way to avoid that.
In-context review
This idea-centric design direction rekindles my interest in an idea I’ve been tossing around for a while: can the review experience somehow happen in the context of the book? Right now, you read a mnemonic text; you save prompts; then later those prompts appear one by one, totally divorced from their source. You can click a link to return to the source location, but that means leaving the review for a separate interface and workflow. Review itself is completely isolated from the text, diminishing the prompts’ emotional connection to the original narrative. This separation also creates frictions around curiosity and remediation. For instance, if you find yourself recalling an answer but not quite understanding what it means, you should be able to fluidly and instantaneously peek into the source context, without disrupting the flow of your review. Likewise if you find yourself curious to see an illustration which you remember sat near the source text.
I’ve made a few attempts at designs like this over the past four (!!) years, but I’ve never been able to make them work. Here are a few notes on what I’ve found.
- You don’t want to provide too strong a cue for retrieval practice. So maybe the context only appears when the answer is revealed.
- All that extra text feels overwhelming and distracts from engaging with the answer. So maybe it’s progressively blurred but sharpens on touch or gesture.
- Review usually takes place on mobile devices, where you need to make significant tradeoffs between screen real estate for the answer and for the context. So maybe the context lies blurred “behind” the answer along the Z axis and can be “brought to the surface” on touch.
- One outlandish approach would be to display every prompt as a cloze deletion in context of the text. Review would consist of a shifting sequence of windows into source texts, each with some segment blacked out. But that’s far too much cueing, and cloze prompts don’t seem to work nearly as well as question/answer prompts.
- Finally, a broader problem: as you understand a topic better, your sense of the prompt often transcends any one source and becomes more about connections between them and your own ideas. Naively keeping a prompt visibly anchored in a single source text might actually restrain this process.
My instinct is that some good solution is possible here, and that it would radically transform the feeling of review. I think it would also help smooth the boundary between resurfacing highlights and resurfacing prompts, since both interactions would now be anchored in the context of the source text. Smoothing that boundary might in turn help smooth the in-text “saving” interaction. A solution might point towards a kind of “incremental reading” which helps you distill key ideas and connections as in SuperMemo, but while retaining the rich context of the source text.
“Onboarding”
Contrary to my instructions, half of my test readers either didn’t know what spaced repetition was, or didn’t really see why it might be relevant to them as a professional knowledge worker, outside of language learning. This irked me at first—I didn’t intend to test the system on SRS-naive readers—but, as I’ll explain, it was ultimately quite instructive. To make something of those sessions, I gave SRS-naive testers a 5-10 minute pitch on the value of spaced repetition for internalizing conceptual material. I had some success: among SRS-naive testers for whom the material was truly relevant, almost all ended up engaging seriously. But these conversations clearly illustrated the enormous challenges facing me in “onboarding” design.
In Quantum Country, we integrated a long introduction to the medium into the first essay—about two thousand words. We explained it over time, a section here and there, interleaved into the larger structure of the first essay, and contextualized alongside the concrete interface elements. Then the follow-up emails and end-of-review summaries did more explaining, incrementally over time. This really did seem to work, but I have no idea how to translate Quantum Country’s approach into a general system which can be layered onto every text. By default, users will speed-run interface text. Quantum Country readers only read our long explanations because they were written in the voice of the book’s authors; the authors had already built trust with the reader before discussing the medium; and the explanations were presented both stylistically and structurally as part of the primary text.
Some of my testers had already read this Quantum Country text. They understood my current prototype immediately. It’s good at least to see that the onboarding “transfers”. Other testers hadn’t seen the mnemonic medium, but they had read Michael Nielsen’s “Augmenting Long-Term Memory” or Nicky Case’s “How to Remember Anything Forever-ish” or Gwern’s “Spaced Repetition for Efficient Learning”. These testers also understood the potential benefits of the mnemonic medium immediately. That’s more evidence that lengthy introductory essays can do the job. However, each of those essays is much longer than the medium-centric material in Quantum Country. We’re not gaining much practical ground.
A final cluster of testers had extensive SRS experience from learning a language or from some similar rote subject matter (e.g. anatomy, pharmacology). These testers had more varied reactions to the mnemonic medium. One common reaction was: spaced repetition was so effective for learning languages, but I had no idea how to apply it to anything else—wow, this is great! But another common reaction was: spaced repetition was a useful hassle; it was all about memorizing rote piles of information; I don’t see how that relates to understanding or to anything I’m interested in now (sotto voce: and I don’t buy your explanation that it does). For these users, and for others with school-induced traumas, I suppose there’s some un-onboarding to do.
My impulse is to distance myself from the word “review”, and even the word “repetition”. People are (rightly) much more interested in “internalizing” a text’s ideas than in “remembering” them. Aspirationally, the system is about marination, about establishing a powerful (but lightweight!) ritual for deepening your relationship with ideas you find important. Not about “studying” or “reviewing” or even “practicing”. I’m not thinking about these substitutions in terms of some kind of facile rebranding: I want to shift the mechanics and feeling of the system to better reflect the words I’m highlighting.
Diction aside, how to actually make the system explain itself to SRS-naive readers? For the moment, I have no idea—and I’m inclined to punt. My intention for the near future is to focus on demos with carefully prepared texts, which means I can at least partially follow Quantum Country’s pattern. I think I’ll put together a sharp paragraph of introduction, which I’ll give to authors to present “in their voice” in the text (rewriting as necessary). I’ll link there and in the UI to a longer, essay-like explanation “on Orbit’s site.”
Longer-term, the “right” onboarding path will depend a great deal on context. If I’m exploring author-integrated mnemonic essays, then the author should introduce the medium, and I’ll help them do that. If I’m working on texts to be used in the context of a course or program, then the medium should be introduced by the facilitators, and I’ll help them do that. If I’m experimenting with sharable user-generated “layers” on arbitrary texts, I’ll need to rely on those communities to write and circulate canonical introductions like Nicky’s, Michael’s, and Gwern’s, or to refer to those. Maybe some canonical YouTube introduction videos will get made at some point—perhaps when I start working on mnemonic video?
What’s next
Apart from the more conceptual discussion above, I have some mundane design issues to resolve. For example, people expected prompts they saved to appear somehow in the prompt lists at the end of chapters. And when people went out of their way to cherry-pick relevant prompts in IMS, they were confused that inline review still contained every prompt from each section. The behavior of the “skip” button made sense to pretty much no one. And so on. These issues all seem tractable, and I’d like to resolve them before I do any more tests.
Then I’d like to explore some of the middle ground between IMS and Shape Up: less-formal “explainers” on technical topics, and serious texts on less technical topics. I’d also like to find a more authentic context to test the system, one where the readers really need to learn this material, perhaps as part of a self-motivated program or course.
Meanwhile, I’ll start producing design concepts around the more conceptual problems and opportunities I’ve described above. We’ll see where that goes.
----------------
I’d like to thank Taylor Rogalski for helpful discussions around prompt-centric vs. idea-centric interaction design. My thanks also to Hammad Bashir, who joined me in implementing this most recent design.
2022-10-14 19:00:39 +0000 UTC
Last month’s mnemonic medium design is now a live prototype. After a dozen live observation sessions with test readers, I’ve already learned a great deal—but I need more sessions and more distance before I can write thoughtfully about what I’m seeing.
For now, I’d like to share the prototype with you all. There’s a lot more design work necessary to make these elements introduce themselves to readers, so I explained the interface before handing control over to my test readers. It may only make sense if you’ve seen last month’s video. At this stage, I’m interested in how “an experienced user” would interact with these primitives.
The prototypes
First, as in the June prototype, I adapted two chapters of Ryan Singer’s Shape Up, a book on product management (subtitle: “stop running in circles and ship work that matters”). I chose this book because the topic is heuristic, contingent, a matter of opinion; the voice is informal; readers will have different backgrounds and views on the subject. And yet I think spaced repetition prompts can still really help readers internalize the material more deeply.
Click here to read the Shape Up prototype. I’m linking to the introduction here because it provides necessary context, but the prototype interface won’t appear until the first “real” chapter, which you’ll access from the bottom of the introduction. Chapters 2 and 3 are adapted for this prototype.
I’ve paired Shape Up with a chapter of OpenIntro’s Introduction to Modern Statistics. This is a formal textbook on a technical topic, like Quantum Country. But I don’t think Quantum Country’s design would work here: readers will have a wide range of prior knowledge, and perhaps a stronger opinion on which subtopics they might like to read. Click here to read the chapter. It’ll work best if you have some hazy memory of statistics from school days, but you’re quite rocky on the details.
These two texts differ substantially in voice, and in how much they demand of readers. Likewise, the adaptations use the mnemonic medium in different ways. Echoing its more relaxed stance, Shape Up embeds a small list of “takeaway” prompts at the end of each chapter, alongside subdued marginal prompts which offer more detail. For the statistics textbook’s explicitly instructional stance, I’ve interleaved embedded reviews like Quantum Country’s as the default path, but with extra affordances for more control.
On testing
If you’d like to play tester, I’d be grateful. A few words on that:
Bug reports are fine, but I’m mostly interested in how the interactions make you feel.
Likewise, if there isn’t some live, authentic reason for you to learn the material in these books, then the “right” way to interact with the interactive elements is to ignore them all! That’s totally fine, of course; it just means you’re not really my target audience for this test.
Stream-of-consciousness logs are very helpful. If you’re willing, consider opening a text file and just typing whatever thoughts and feelings come to mind as you read and interact with the interface. Alternately, you might start a screen recording and talk aloud as you read.
If you send feedback, please give me some context on your background and interest in the book’s topic, and on your background with spaced repetition.
The prototype doesn’t sync with the live Orbit server, but you can download your session as an Anki deck by clicking the Orbit icon in the bottom corner.
————————
I’d like to thank Hammad Bashir for tag-teaming the prototype implementation with me. It was exciting to go from demo video to high-fidelity prototype for something this complex in just two weeks. My thanks also to Adam Wiggins, Andrew Sutherland, Roam Research, and Sana Labs for making that collaboration possible by funding a modest special budget for hiring help.
2022-09-29 02:09:52 +0000 UTC
Summary: Join me on Sep 24 from 10-11AM PDT to discuss Webstrates, a research project with striking ideas for making software much more malleable. Please read the 2015 paper before joining; see also its many descendants and a good recent video overview. Bring questions, concerns, noticings!
---
Computing pioneers dreamed of creating personal dynamic media. That word "personal" meant, in part, that individuals could make the computational media their own—could co-opt and cobble pieces like they might with paper or wood. This aspiration is about more than "owning your data". It's about owning the verbs as well as the nouns, about having control over the tools as well as the objects they act upon.
Lots of systems tried to make good on this promise. In Smalltalk, everything's objects and messages between objects. If you like what a button does, you can take it apart. You can make your own weird control which sends that same message. You can edit or subclass the button to change how it looks. You can get a list control from Bob and a pen tool from Mary, then cobble together your own little Illustrator. There are no "apps". There are just arrangements of objects, and you can arrange your own.
Hypercard tried to negotiate similar ideas in a much more accessible setting. OpenDoc and OLE tried more restrictive tactics to reach a broader audience. None of these approaches has worked out. Today's prevalent solutions—one-off plugin systems for competing mega-apps—are arguably a step backwards from OpenDoc/OLE, where at least plugins could be used in multiple host apps.
But that argument makes it seem like this is all about scale and efficiency. For me, at least, that's not it. I want to be unreasonably demanding. I want to find—or make—one very sharp tool for each subtask and reuse it in combination with the other sharp tools I find, without boiling the ocean. We've roughly got this figured out for command line tools (the UNIX philosophy and all), but we never got there for visual interfaces.
---
Enter Webstrates. Webstrates has some powerful ideas about this problem! I want to talk about those ideas and their limitations. But I also want to talk about methods.
Listen, I make a lot of negative comments about academic HCI. But the 2015 paper introducing Webstrates is everything I want to see when I read design research. Klokmose et al. crisply take apart the problem, create a conceptual framework for a solution, instantiate those ideas in a very clever implementation, and then play it out in a variety of meaningful case studies.
It's a great example of how powerful it can be when designers communicate in terms of ideas rather than in terms of specific systems. A typical designer's tweets about "their new thinking tool" focus on the tool itself—how easy, cool, powerful, etc. it is. It's hard to accrete insight that way.
The Webstrates paper has a totally different stance. It's about the ideas, not about that team's system. It says: "Look, here are all the ways in which this problem is hard. A solution would probably have properties A, B, C. One way to achieve those is W, but that has problems X, Y." It makes a contribution, then equips you to build on it to make another. (Funny analogue there to the personal dynamic media dream…) If we had more work like this, I think we'd make some real progress.
So. Read the paper; maybe watch Klokmose's recent talk. Then come discuss it with me this Saturday! For extra credit, perhaps check out some of its descendants—I particularly recommend the Codestrates papers and "What Can Software Learn from Hypermedia". If you're interested in systems which try to address some of Webstrates' key limitations, see Varv and Philip Tchernavskij's Tangler.
I'll present some opening material to get the ball rolling, then we'll open the floor for discussion.
2022-09-21 01:23:04 +0000 UTC
In this talk, I present a new round of primitives for the mnemonic medium, focused on the issues and opportunities I distilled out of the last round of prototyping (see that talk here).
For those who aren't interested in this specific project, you may still find it interesting to see me unpack how I think about iteration on the architecture of an unusual design problem like this one.
Feedback is very welcome. I've included the text of the talk below to make it easier to point to passages (but don't expect it to make much sense without the accompanying visuals).
Script
- Welcome.
- Today I’ll be sharing a new iteration of the mnemonic medium.
- Like the designs I presented in May, this work is trying to solve a specific problem: the medium created for Quantum Country felt rigid and unpleasant when used outside of technical primers.
- How might we create a more flexible mnemonic medium—one which could be used in non-technical books, informal articles, reference material, papers, and so on?
- In this video I’ll be assuming you’ve watched the May talk. If you haven’t, I’d suggest watching the first five minutes, and minutes 10-13, for background on the problems I’m trying to solve.
- Now, here’s our plan: first, I’ll quickly recap the core elements of the design I presented in May; then I’ll describe the problems I uncovered through reader observation and design crit; and finally I’ll present a new version of the medium intended to address those problems.
- OK! First, here’s a two minute recap of the design from my May talk. [play 26:05-27:41 from May talk]
- To test this design, I implemented a prototype in the context of a relatively informal product management book called Shape Up.
- I chose this book because I thought it would exacerbate many of the problems I’ve seen with the medium.
- The material is often contingent, arguable, heuristic, a matter of values or opinion.
- Readers’ experience with this book will vary dramatically with their background: relatively inexperienced test readers will want to study it very carefully, while more seasoned readers will skim, mining for pearls to carry away.
- It’s not formal, not technical; and yet it’s a serious treatment which, for some readers, will reward study.
- In June and July, I observed a dozen people read the prototype book live, while speaking their thoughts and reactions aloud. I also received asynchronous think-aloud diaries from a few dozen additional readers, and benefited from thoughtful crit sessions with some generous designer friends.
- What I learned from those readers was encouraging—I think the medium can be made to work in a context like this book, and I think doing that would be extremely valuable to many readers.
- And yet, unsurprisingly, these sessions also uncovered many problems. Some might seem like surface-level design issues—certain things just need more polish—but the troubles I’ll be talking about today are truly structural. The right way to fix them is with changes to the conceptual architecture of the medium.
- Now let’s dig into some of those problems. Broadly, they can be divided into two groups: first, issues with the peritextual Orbit markers and sidebar, and second, issues with the embedded review boxes.
- The Orbit markers and sidebar presented an interconnected series of issues:
- First, a broad observation: the prompts feel unnecessarily “far away” from the text.
- That’s true literally, if you’re using a large monitor—they’ll be horizontally separated by a big white space.
- But it’s also true conceptually: there’s a surface boundary in the way. The prompts are trapped in this sidebar box, separated from the text they reference.
- Niko Klein pointed out to me that if you watch how people actually mark up their books, they’re not nearly so orderly. Their annotations happily impinge upon the text.
- In this design, Orbit prompts are timid, hesitant to touch the text. There are markers in the margins, but no indications in the text itself of connections to associated prompts. Readers must mentally draw connections between the prompts and the source material.
- Sometimes that’s appropriate: some prompts synthesize or summarize from a high level, and there’s no manageable length of text one could visually associate with them. A certain indirection is necessary.
- But plenty of prompts focus on verbatim phrases from a text passage. It feels strange to make that connection so indirect.
- Another issue of inappropriate indirection: for Orbit markers which hold just one prompt, the distinction between marker and prompt feels far too heavy.
- I notice this marker here; I click it—and it reveals both this single prompt over there, and also a contextual menu here with a single button to save this prompt.
- But the prompt itself also has a button I can click to add it.
- Here we have two different abstractions which both effectively represent the same conceptual object. At least in this case, the distinction between these two roles isn’t meaningful enough to warrant the doubling. And the connection between them is inappropriately indirect, indicated only by the prompt’s colored trim and its rough spatial position.
- Can’t these two things be the same thing?
- Of course, the main reason I distinguished between Orbit markers and the prompts themselves is that in many cases, Orbit markers represent several prompts, collectively reinforcing the same concept from different angles.
- The first chapter of Quantum Country has 112 prompts. You wouldn’t want 112 of these markers floating in the margin, particularly since many of the prompts are about the same idea.
- Bulk operations are important for controlling the interaction cost of the medium. That’s really what motivated these Orbit markers—bulk operations.
- That actually illustrates another issue with the markers: they feel subtly like they’re hiding information. Does this marker represent one prompt or six? I have to interact with it to find out. But that information is a useful cue: if there are lots of prompts here, that tells me this is an important passage. If I find this passage only mildly interesting, I might skip right over a six-prompt marker—too much bother—but I’d consider a one-prompt marker.
- Perhaps in part because of the way that markers “hide” the prompts on this separate surface, many readers found themselves fiddling with the sidebar, uncomfortable both with leaving it pinned open and with leaving it closed.
- Finally, the markers create unintended asymmetries—which are often a sign for a designer that you haven’t got your abstractions quite right.
- The first asymmetry is that if you write your own prompt, there’s no indication of that in the text itself. The marginal Orbit markers are only for the author’s prompts.
- A second asymmetry falls out of this: the interface readers get for writing prompts can’t be the interface authors use for writing prompts. In the May design, authors also need to place the Orbit markers and define how the prompts are divided between them. Asymmetry aside, that extra work seems unfortunate and perhaps unnecessary.
- The markers and sidebar work in tandem with review boxes embedded directly into the text. These elements had their own separate conceptual problems.
- As a reminder, in the context of Shape Up and other relatively informal texts, the idea was that review boxes placed at the end of a chapter would contain a curated selection of the most important ideas—maybe 5-10 prompts. But readers could opt into saving much more detail while they read by clicking Orbit markers next to passages they found particularly important. These prompts would be added to the review box at the end of the chapter.
- One problem with this scheme was that the review box’s behavior confused attentive readers who saved extra prompts while reading. Those embedded reviews include both the prompts curated by the author and also the prompts you yourself saved, without any clear distinction.
- It makes sense for these review boxes to show a curated selection of prompts as a sort of guided scaffold, but including reader-saved prompts muddies the conceptual identity of the review box as an authored object.
- Worse, when you’ve gone to the trouble of evaluating prompts as you read, choosing which to save—and more importantly, consciously deciding not to save some—it feels intrusive to have extra prompts inserted into the review, because it feels ambiguously like the point of the review is to look at the prompts you saved.
- Apart from this confusion, the reviews felt “too soon” for some readers.
- We didn’t get this kind of feedback with Quantum Country. One reason for the difference is probably that the material in Shape Up is much more straightforward. You’re less likely to have forgotten so quickly.
- But there’s a deeper issue: if you’re examining and evaluating prompts in the sidebar as you read, that means you’ve already done one informal review by the time you reach the embedded review box. It makes sense that this would feel too soon—you already saw those prompts just a few minutes ago.
- This is very different from the experience in Quantum Country, where you might have just read the text a few minutes earlier, but you haven't seen the actual questions before your first review.
- Other readers were highly mercenary. Much of the material was already familiar to them, and they were scanning for useful pearls they could take away. These mercenary readers found an embedded review session overly demanding, even though it can be switched to a list or simply skipped. The implication that they’re “supposed” to review the material felt misaligned with their stance towards the book.
- I think the meta-problem here is this: with the last prototype, I started by assuming Quantum Country’s basic design and added new primitives to handle more situations.
- But that led to a conceptual rift of sorts. There’s the world of the sidebar and the inline prompts, and then there’s the world of the embedded review box. I built a bridge of sorts between the two, but they weren’t yet conceptually unified.
- With this new design work, I explored what would happen if I didn’t start by assuming an embedded review box. Instead, I tried to design first for readers of a book like Shape Up, then I worked out how those design elements might be elaborated to accommodate a more demanding textbook like Quantum Country.
- With that, let’s take a look at a new iteration of the mnemonic medium, where I’ve tried to address all these issues.
- As in the May talk, these designs will assume that the reader is already familiar with the medium and with spaced repetition. The designs will require extra cues and flows for onboarding new users, but I’ll focus on those once I’m happy with the primitives.
- First and most dramatically, I’ve unified three primitives—the marginal prompts, the Orbit markers, and the sidebar—into one.
- A subdued prompt representation lives directly alongside the main text, rather than in a separate sidebar.
- When there’s a single prompt associated with a passage, we don’t need any kind of bulk indirection. We just see that prompt directly.
- Its full contents are unfurled on hover, and we can simply click it to save it to our Orbit.
- We’ll see it in our next review session, or we can start a review anytime through the floating Orbit menu.
- This prompt reinforces a verbatim phrase in the text, so that phrase is now highlighted. Hover interactions reinforce the association.
- Not all prompts will be associated with a specific phrase like this, but we can surface that connection, where it exists.
- In previous designs, I tried so hard to make the prompts unobtrusive, so that they wouldn’t bother readers who didn’t care about memorizing every detail. But the insight which really unblocked me here was that once a reader has taken action to save a prompt, that prompt can absolutely assert itself visually. When you mark up your own book with a colored highlighter, you don’t need to worry about the mess which you yourself make.
- I believe bulk interactions are still important for the mnemonic medium. Most of the time, readers will want to indicate their interest in a concept rather than in a specific prompt—and good coverage of a concept will often require several prompts.
- In this iteration of the medium, overlapping prompts are collapsed into a “bulk” representation alongside the text.
- Without any interaction, we can see how many prompts are involved. If we move our cursor over here, we can quickly save all the prompts with one click, or more selectively evaluate which we might like to keep.
- The authoring experience can now be made symmetrical for both readers and authors: if we add a prompt of our own, it’s presented alongside prompts we’ve saved from the author, with the same highlighting affordances in the text. No need to place Orbit markers and define their associations with prompts.
- A tighter connection between the text and the prompts lets us play one more trick: if we’re excited by this phrase, and we find ourselves wanting to internalize it, we might select the text as if to write a prompt. But if it’s something important, the author may have already provided a prompt about that idea. So when the anchor of an author’s prompt intersects a user’s selection, we surface the author’s prompt and make it easy to save.
- And of course, you can edit the author’s prompt to better match how you think about the idea, if you like. Editing is often easier than creating.
- Obviously, this style of floating prompts isn’t going to automatically work in every web layout.
- In simple cases, like where there’s a sidenote or figure in the margin, we can try to automatically shift the prompts to dodge those elements.
- In cases where the body text is laid out tightly against the edge of the screen—and, of course, on mobile—we’ll need some kind of “collapsed” representation.
- And the user should be able to hide these things if they’re feeling obtrusive.
- But at least for the moment, I’m interested in exploring the best case, where the author lays out their text with this medium in mind, or perhaps where someone has created a user stylesheet which can adapt the site’s layout appropriately. I’ll be interested in dealing with less-than-ideal cases once I’m convinced that the ideal cases are really working.
- So those are the new floating prompt primitives, replacing the Orbit marker, the sidebar, and the sidebar prompts.
- Now, these are all secondary interactions, occupying the same structural place in the hierarchy of the text as marginal sidenotes would. They offer extra depth to highly attentive readers, without asserting themselves in the primary linear flow of the text.
- But just as figures can appear either in the margin or within the body of a text, it makes sense for authors to be able to present prompts both as marginalia and as objects in the body of the text—for instance, at the end of a chapter, or after a passage summarizing the takeaways of a section. How often and how many prompts should be presented this way will depend on the book and on its intended audience.
- In the case of Shape Up, I think it would be quite valuable to offer a short list of curated prompts at the end of each chapter, covering the most important ideas.
- Because this book is relatively informal, and its material is quite intuitive, I don’t think we need to strongly push readers to review these prompts immediately by default, as the previous iterations of the medium did.
- The trouble with a book like this—as I’ve learned talking to readers—is not in understanding the content while you read, or in remembering it later that day. The trouble is remembering the details days or weeks later, when you first try to apply them.
- So the primary job of the mnemonic medium in this case should be to make it easy for people to circle back to the key points in review sessions on a future day.
- Here’s how that might look: a lightweight module which can be integrated directly into the body of the text.
- Structurally, this module is similar to the kind of summary callout you might find in a printed book, except of course that it’s part of a larger interactive workflow.
- I might scan my eyes over these prompts, decide they look generally good, and click to save them all.
- Or I might do a quick triage, scrubbing my mouse down the list to take a look at each, adding those which seem surprising or meaningful.
- Note that this scrubbing interaction doubles as an extremely lightweight informal review.
- This module’s contents are defined by the author. It doesn’t try to double as a list of the other prompts you’ve saved earlier in the text, as the previous design did.
- Now, if I’m studying this book seriously, and its material doesn’t feel so intuitive, I probably would benefit from doing a quick review right now.
- I can click here to make that happen.
- And now the review experience doesn’t need to be trapped in a little inline block—we can create a focused modal context.
- I can review as normal, or skip prompts which don’t interest me.
- I’m only reviewing the prompts I was just looking at in that box—other prompts I’ve saved aren’t mixed in.
- But once I finish reviewing, the interface inquires whether I might like to additionally review the other prompts I’ve saved so far.
- If I’ve opted into reviewing these curated prompts, I’m probably reading carefully, so I probably want to review everything else too. But by making this a separate “phase” of the review session, we avoid the confusion introduced by the previous review box’s design, which mixed curated and saved prompts together.
- After the review session is complete, the new prompts I reviewed are automatically saved to my Orbit. That's probably the most common desire for someone who’s opted into inline review. But of course, I can undo this behavior or modify the set of prompts I’ve saved in the list below.
- By embedding prompts directly into the body of the text, rather than into the margin, authors can be a little more assertive. It’s a way to say: hey, you should probably consider carrying some of these prompts with you. The reader gets the sense of a “default.” More attentive readers can up the stakes by choosing to review that material on the spot.
- But for more formal instructional texts like Quantum Country or this stats textbook—particularly where the material is more difficult to grasp—authors should be able to express a more demanding default. They should be able to say: reader, you should probably review the key ideas from this section, right now.
- For that situation, the mnemonic medium's traditional embedded review box still makes a lot of sense.
- These interleaved reviews do more than just reinforce the material on the spot. Many readers tell us the embedded reviews make them slow down and read—or reread—more carefully. Other readers tell us the reviews create a feeling of safety: they’re in the author’s hands, and they know that they won’t be able to read too far ahead of their understanding.
- More subtly, I suspect that inline review anchors the prompts in the context of the broader explanatory prose, so that when you see a prompt, it brings to mind that section of the text and all the connected ideas within. That may help avoid a key problem people have with separately downloadable spaced repetition decks: a feeling that the prompts are disconnected, atomized, or arbitrary.
- So we shouldn’t abandon the idea of embedded review. It’s a significant imposition on readers, but it also offers significant benefits. When those benefits are likely to be higher than the costs for the primary audience of a text, the author should make interleaved review the “default.”
- But we can still refine that primitive.
- In this iteration of the design, embedded reviews include only a set of prompts curated by the author, rather than intermingling any extra prompts you might have written or saved. But if you’ve added some extra prompts besides those, we’ll ask whether you’d like to tackle those after you finish reviewing the curated prompts, just as we saw a few minutes ago in the modal reviews.
- As before, when you finish reviewing, those prompts you reviewed (but not those you skipped) are automatically saved. And you can easily undo this.
- Readers who have some familiarity with the text’s material may not want to bother with this sort of guided review. They can switch the inline review module to the prompt list module we saw previously, and save any prompts which seem unfamiliar.
- In a text like Shape Up, which is relatively informal, and whose readers will vary enormously in background and interest, it makes sense to offer multiple levels of detail in the prompts.
- The prompt lists at the bottom of the chapter are tightly curated to include the most important ideas—things which any serious reader should probably take away from the book. Readers interested in more detail or in secondary ideas can save prompts about those ideas as they read using the floating prompts.
- But in a highly guided textbook, I think it would usually make more sense for the embedded review module to include every prompt from that section. Readers will tend to express their interest by choosing which sections they read and review, rather than by picking and choosing individual prompts.
- It may not make sense to distract readers of highly guided textbooks with floating prompts—they’re going to review all those prompts anyway, in a few minutes.
- And so perhaps authors can express a site default for this floating prompt visibility setting. A textbook might set this to show only saved prompts by default, so that floating prompts won’t distract while you read initially, but after you’ve completed that section’s embedded review, you'll see the saved prompts in context above.
- Of course, a reader could change that setting away from the default if they prefer.
- In summary, this design collapses the Orbit markers, sidebar prompts, and sidebar into a single floating prompt primitive. I’ve more crisply defined an embedded prompt list primitive, and refined the behavior of the embedded review primitive.
- Next up, I’ll do another round of design crit and revision, then build out this prototype so I can learn how it makes readers feel and behave. Your feedback is very welcome.
- I’d like to thank all the readers who tested the most recent prototype and provided feedback, synchronously or asynchronously. And I’d particularly like to thank the designers who generously offered extended crit: Cameron Burgess, Shan Carter, Gray Crawford, Joe Edelman, Niko Klein, Marisa Lu, Rob Ochshorn, and Yiliu Shen-Burke.
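The floating-prompt design in this talk treats each prompt’s anchor as a span of the text. Overlapping prompts collapse into one bulk marker, and a reader’s selection surfaces any author prompt whose anchor it intersects. Below is a minimal Python sketch of those two behaviors. It assumes anchors are half-open character offsets into a chapter. `Prompt`, `group_overlapping`, and `author_prompts_for_selection` are hypothetical names for illustration, not Orbit’s actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """Hypothetical prompt model: anchored to the half-open character range [start, end)."""
    prompt_id: str
    start: int
    end: int
    source: str  # "author" or "reader"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # Two half-open ranges intersect when each starts before the other ends.
    return a_start < b_end and b_start < a_end


def group_overlapping(prompts: list[Prompt]) -> list[list[Prompt]]:
    """Collapse prompts with overlapping anchors into groups.

    A one-prompt group could be drawn directly; a larger group could be drawn
    as a "bulk" marker whose prompt count is visible without interaction.
    """
    groups: list[list[Prompt]] = []
    group_end = -1
    for prompt in sorted(prompts, key=lambda p: (p.start, p.end)):
        if groups and prompt.start < group_end:
            # Overlaps the current group's span, so it joins that group.
            groups[-1].append(prompt)
            group_end = max(group_end, prompt.end)
        else:
            groups.append([prompt])
            group_end = prompt.end
    return groups


def author_prompts_for_selection(
    prompts: list[Prompt], sel_start: int, sel_end: int
) -> list[Prompt]:
    """Author prompts whose anchor intersects the reader's selection."""
    return [
        p
        for p in prompts
        if p.source == "author" and overlaps(p.start, p.end, sel_start, sel_end)
    ]


prompts = [
    Prompt("p1", 0, 20, "author"),
    Prompt("p2", 10, 30, "author"),
    Prompt("p3", 25, 40, "author"),
    Prompt("p4", 100, 120, "author"),
    Prompt("p5", 105, 110, "reader"),
    Prompt("p6", 200, 215, "author"),
]
print([len(g) for g in group_overlapping(prompts)])  # [3, 2, 1]
print([p.prompt_id for p in author_prompts_for_selection(prompts, 12, 14)])  # ['p1', 'p2']
```

One design choice to flag: the grouping is transitive. Here p1 and p3 land in the same bulk marker even though their anchors never touch, because p2 bridges them. A real layout might prefer to split long chains like that.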
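The revised embedded review described in this talk behaves like a small two-phase flow:

1. The reader reviews only the author’s curated prompts.
2. The interface offers an optional second phase for the reader’s other saved prompts.
3. When the session ends, curated prompts that were reviewed (but not skipped) are saved automatically, and that save can be undone.

Below is a minimal Python sketch of that flow. It assumes prompts are plain string IDs. `EmbeddedReview` and its methods are hypothetical names, not part of any real Orbit API.

```python
from dataclasses import dataclass, field


@dataclass
class EmbeddedReview:
    curated: list[str]  # prompt IDs chosen by the author for this review
    saved: set[str]  # prompt IDs already in the reader's collection
    reviewed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)

    def first_phase(self) -> list[str]:
        # Phase 1: only the author's curated prompts; reader-saved prompts are not mixed in.
        return list(self.curated)

    def record(self, prompt_id: str, skipped: bool = False) -> None:
        (self.skipped if skipped else self.reviewed).append(prompt_id)

    def second_phase(self) -> list[str]:
        # Phase 2 (offered, not forced): the reader's other saved prompts.
        return sorted(self.saved - set(self.curated))

    def finish(self) -> set[str]:
        # Auto-save curated prompts the reader reviewed, but not those they skipped.
        newly_saved = {p for p in self.reviewed if p in self.curated} - self.saved
        self.saved |= newly_saved
        return newly_saved  # returned so the auto-save can be undone

    def undo(self, newly_saved: set[str]) -> None:
        self.saved -= newly_saved


review = EmbeddedReview(curated=["c1", "c2", "c3"], saved={"r1", "c2"})
for prompt_id in review.first_phase():
    review.record(prompt_id, skipped=(prompt_id == "c3"))
print(review.second_phase())  # ['r1']
added = review.finish()
print(sorted(added))  # ['c1']
review.undo(added)
print(sorted(review.saved))  # ['c2', 'r1']
```

Keeping the two phases as separate lists is the point of the sketch. It mirrors the talk’s fix for the earlier review box, which mixed curated and reader-saved prompts together.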
2022-08-26 20:34:06 +0000 UTC

Another event experiment!
This Sunday, I'll be hosting a casual unconference for patrons, 11AM-12:30PM PDT in Gather, a spatial video chat platform. To join, visit this URL in a desktop browser on Sunday.
If you've never attended an unconference before, the big idea is that the attendees create the schedule. You are invited to host a discussion circle about a problem you've been thinking about, or to host a paper reading group, or to give a talk, or whatever you like.
To host a session, sign up for one of the time slots at one of the tables on the unconference schedule here. Just give a short description of what you have in mind, and include your name.
No pressure! Casual, birds-of-a-feather type discussions with no real preparation are often the most interesting unconference sessions.
See you Sunday?
2022-08-16 01:49:19 +0000 UTC
Relatively soon—maybe in our lifetimes, maybe next century—we’ll have full duplex input and output links from our brains to computers. We’ll have synthetic telepathy. We may find ourselves augmented with an effectively unlimited working memory. What will “collective intelligence” mean when the boundary between your thoughts and mine can be controlled by software?
The details here will depend enormously on understandings of the brain which we don’t yet have, and on the physical limits of neural interfaces which haven’t yet been developed. Skating to that particular puck might feel too outlandish—the fog of uncertainty just too thick. So, could I interest you instead in some more tractable neighbors? If meaningful brain-computer interfaces (BCIs) are decades out, consider: what “poor man’s BCIs” might we pursue in the meantime? What “apps” might we build with them?
With a true BCI, you could compose a manuscript silently, at the speed of thought. I don’t have a true BCI, but I do have wireless earbuds. I enjoy going on long walks with an audio recorder running continuously. I’ll talk through research problems while I roam about the city, or while I stare at the ceiling on my couch. I also use the earbud microphone when I’m curled up in an armchair with a paper book, where there’s often no good surface for my notepad. A running audio track can capture my thoughts as I read. My pipeline will even let me dictate spaced repetition prompts mid-recording. But when my wife’s at home, I stop my chatter to avoid being a nuisance. Likewise, I often want to linger at a café or library on my walks, but dictation would be unwelcome there.
What if I could talk to my computer without making sound? This is the premise of silent speech interfaces, a family of sensor systems for interpreting spoken input without audible vocalization. Details vary, but the general idea is that you go through the motions of talking without engaging your vocal cords. These interfaces don’t quite operate at the speed of thought, but ubiquitous, unobtrusive, screenless input—even at the mere speed of speech—strikes me as a plenty interesting “poor man’s BCI”. In this overview, I’ll offer an opinionated look at the field from the perspective of practical consumer design opportunities.
Reading through the literature, my sense is that silent speech interfaces are on the cusp of tractability. Now looks like a promising time for an inventive technologist to step in. In particular, I notice that most publications are focused on restoring communication to people with speech disabilities. That’s wonderful, of course. But it also means there’s ample space for creative designers to envision how healthy consumers might use these interfaces in everyday contexts.
More importantly, researchers have only recently begun to use modern machine learning techniques in silent speech interfaces. When I started my literature review, I was delighted to find a recent book-length overview: An Introduction to Silent Speech Interfaces. The book provided a helpful survey of the various sensing techniques which had been tried, but the error rates and form factors left me pessimistic. On a whim, I thought: well, the book’s from 2017; let’s check what’s happened since then. Wham! The field started using deep learning in earnest! As we’ve seen in domain after domain, deep learning excels at processing noisy signals with structured regularities—regularities like human language. State-of-the-art error rates suddenly look quite promising. And when fidelity improves, the design parameters of sensors become less constrained.
Sensing modalities: an overview
So you’re talking, but without talking. How can we possibly interpret this sort of speech? I’ll begin with a schematic overview, then we’ll dig into specifics in the modalities which seem more feasible.
Neural. Speech starts in the brain. We can intercept those cortical signals with intracranial implants, or with sensors arranged externally against the scalp. I don’t expect these systems to become relevant for consumers anytime soon. Implants require surgery, and scalp-based EEG sensors are cumbersome and too lossy at present for arbitrary speech (see e.g. Gonzalez-Lopez et al, 2020).
Muscular. From the brain, signals associated with speech travel to muscles in our jaw, lips, tongue, and throat. We can intercept the electrical activation of these muscles with electromyography (EMG). In some locations, we can use “surface” EMG sensors placed against the skin above those muscles. These are highly sensitive, but—since they need to be mounted on the skin—fairly obtrusive. Alternatively, we can measure the motion of these muscles with accelerometers, magnetic sensors, and piezoelectric sensors. Or we can measure that motion indirectly through imaging: video, ultrasound, radar, and so on.
Acoustic. Finally, all that muscle activity results in speech. Or at least, it would if you were speaking normally. If you don’t vibrate your vocal cords, your speech will be very quiet—but perhaps still interpretable by sensitive microphones or vibration sensors.
Let’s take a look at some specific systems which seem more promising for near-term consumer applications.
Visual speech recognition, a.k.a. lip reading
Neural networks can recognize gestures through solid walls (Li et al, 2019). By comparison, lip reading seems like a piece of cake! The technical term of art for this task is “visual speech recognition”. No surprise: the field has made rapid progress, most recently doubling accuracy against a standard benchmark over just three years (see review by Sheng et al, 2022).
One way to evaluate the accuracy of text input systems is with the “word error rate”, which is defined as the number of errors (substitutions, deletions, and insertions) divided by the number of words in the original speech. For example, suppose I speak ten words. My transcription software misses one completely and mis-identifies the word “total” as “too tally”. The word error rate in that example would be 0.3—one deletion, one substitution, and one insertion, divided by ten words. For reference, the Android dictation service has a word error rate of around 0.2-0.3 (Koenecke et al, 2020). Note that lower scores are better for this metric.
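The word error rate is a word-level edit distance divided by the length of the reference. Below is a minimal Python sketch that computes it with the standard Levenshtein dynamic program. It checks the result against an invented ten-word sentence that reproduces the example above: one word is dropped, and “total” is heard as “too tally”. The function name and the sentence are illustrative assumptions. The sketch splits on whitespace and does not normalize case or punctuation, which real evaluations typically handle.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "the total came to nine dollars and fifty cents today"
hypothesis = "the too tally came to nine dollars and fifty today"
print(word_error_rate(reference, hypothesis))  # 0.3
```

Because edits are counted against the reference length, WER can exceed 1.0 when a transcript contains many insertions. It is also undefined for an empty reference.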
The current state-of-the-art model (Prajwal et al, 2021) achieved word error rates of 0.23-0.31 against a data set of subtitled footage from BBC programs and TED talks. So at least for professionally-produced footage, we’re already at roughly the accuracy of consumer speech recognition software. There seems to be plenty of room for improvement: this is a relatively small model by deep learning standards, trained in two weeks on 4 GPUs. And they’re using GPT-2 as an auxiliary language model to choose from candidate sentences. Presumably newer models would perform even better.
The main technical limitation I see is that this model would need careful tuning and compression to run in realtime, rather than as an offline batch operation. But my impression is that this looks quite achievable.
Speaking more practically, suppose we have reliable lip reading. What does this mean, in terms of form factors and contexts? How would we actually deploy it?
One obviously relevant posture is the standard smartphone stance: one arm outstretched, face awash in the screen’s sickly glow. The front-facing camera is in a great position to read your lips. But if you’re holding your phone anyway, I’m not sure this buys us much relative to a software keyboard. One advantage is that you wouldn’t need to actually look at the screen. This seems fairly meager.
I can also strew cameras around my house, pointed at where my lips might be. If my couch or armchair faces a television, its built-in camera might do the job. But I certainly don’t like the aesthetic of constant surveillance. For what it’s worth, I don’t buy “smart devices” if I can help it, and I install camera covers on my devices’ built-in cameras. This system would need to be extremely valuable for me to accept an always-on camera in my home.
One more intriguing angle: what if the camera’s mounted on my glasses? Elgharib et al (2020) demonstrated a system which infers a front-facing video feed from a wide-angle camera mounted sideways on the arm of a pair of glasses. It works remarkably well, after per-subject training.

I’m already willing to put on a wireless earbud when I’d like to use dictation. It’s easy to imagine putting on a special pair of glasses, or an attachment for my glasses, when I’d like to use silent speech.
Hearing very quiet speech
Another promising route is much more boring: what if we make a system which listens very carefully—so carefully that it can hear what you’re saying, even when other humans can’t?
This isn’t a new idea. The company Jawbone got its name from the technique: their flagship product used bone conduction microphones to improve speech quality. The military uses throat-mounted microphones (akin to stethoscopes) to improve signal in noisy environments like helicopters. Unfortunately, most of these systems still require the wearer to speak audibly. They just improve the quality of the resulting audio.
What if you could whisper? It’s probably good enough for a library or café. In 2017, Grozdić and colleagues achieved high accuracies for whispered speech recognition with ordinary microphones—albeit in the ideal conditions of a recording booth. Throat-mounted microphones should help in noisier environments, and early adaptations for whisper recognition look promising (Jou et al, 2004).
A whisper would still be a nuisance in my tiny home if someone else is in the same room. Also, extensive whispering may actually harm your vocal folds (Robin et al, 2006). One interesting alternative is SilentVoice (Fukumoto, 2018), which recognizes ingressive speech. In this format, you speak as if you’re whispering, but you move the articulatory muscles while inhaling rather than exhaling, and you don’t vibrate your vocal cords at all. This style of speech is much quieter than a whisper—practically inaudible to another nearby person in a silent room. Fukumoto achieved error rates of 0.1-0.24 with roughly 30 minutes of per-speaker adaptation data.

What I like about these modalities is that they use boring, widely available hardware, deployed in a convenient and unobtrusive format. Machine learning systems for audio-based speech recognition already run in realtime and are widely deployed. I’m a little less willing to put on a throat microphone than an earbud, but it’s not a deal-breaker.
Other modalities
If you dig into the literature around silent speech interfaces, you’ll notice that discussion overwhelmingly focuses on modalities I’ve ignored, rather than the ones described above. As far as I can tell, this is mostly due to a cultural focus on restoring communication to patients with speech disabilities. Much of this work is quite interesting, though in my view it’s much less applicable in consumer contexts.
One celebrated example is AlterEgo (Kapur et al, 2018), which recognizes speech by detecting the electrical activation of muscles in the face, using surface-mounted sensors. Because it requires no involvement of the vocal tract, it may help patients with disabilities serious enough to prevent them from using microphones or lip-reading devices (Kapur et al, 2020). But AlterEgo, like other myographic sensors, requires obtrusive placement on the face, and its lexicon is limited to a small number of pre-set phrases.

Perhaps I can interest you in TongueBoard (Li et al, 2019), a device in the form of a dental retainer studded with electronics? It tracks the motions of your tongue using capacitive touch sensors. Its discriminatory capacity is quite limited, with a supported lexicon of 15 words.

Unlike lip-reading and microphone-based modalities, both AlterEgo and TongueBoard enable invisible speech. You can mouth the words without opening your mouth. Good for spies!
If you’re game for intracortical implants, you can get excellent results these days. The state-of-the-art system (Willett et al, 2021) manages 90 characters (~18 English words) per minute at very high accuracies. This is great news for paralyzed patients, but less relevant to consumers.
There are of course many more types of sensor systems, but you get the picture. None of these seem promising for consumer applications anytime soon.
Opportunities for silent speech
What would we do with a silent speech system? What does it enable that normal dictation systems do not?
We’ve already discussed unobtrusive note-taking while reading on the couch or in crowded environments. What about unobtrusive note-taking in a social setting? I bring a paper notebook to meetings in part because using my phone or laptop feels rude. I can imagine that physicians would love to take silent notes while performing an exam. I’d love to jot silent notes while on a walk with a friend—my preferred meeting format.
Full BCIs will enable synthetic telepathy: you and I will be able to chat with only our minds. Silent speech interfaces would enable a simpler form of this. With wireless earbuds providing ubiquitous unobtrusive audio, we have hands-free, screen-free, silent, bidirectional communications. What will we do with that? Well: as Vernor Vinge depicts in Rainbows End, classrooms will be utterly out of control. More seriously, I think the march towards increasing fidelity and ubiquity will continue to blur the lines between individuals. When the group chat is always playing in my mind, things will get weird. As someone who already struggles with the chaos of text-based group chat systems, I don’t exactly find this appealing… but it certainly intrigues me.
More broadly, I have a sort of blind faith here. Progress in personal computing is so often presaged by new input and output modalities. The mouse unlocked the desktop GUI; trackpads enabled mainstream laptops; the touch screen sparked the mobile revolution; haptics give us screenless turn-by-turn directions; e-ink screens give us digital books on the beach; commoditized projectors make Dynamicland possible; etc etc. We don’t know yet what devices like Leap Motion or laser eye trackers are for. Likewise, I don’t know exactly what we’ll do with silent speech interfaces. But I think it’s worth finding out.
2022-07-31 21:42:51 +0000 UTC
[Edit: oof, I accidentally recorded with no sound! Apologies! Thankfully, all who sent me questions came live, at least. We'll try something different next month…]
Hello, all! This Friday (July 22) at 2PM PDT, I'll be hosting a live Q&A session. [Add to your calendar]
I'm happy to (try to) answer questions about whatever you might like: design, engineering, cognitive psych / learning science, independent research, memory systems, new media, etc.
Here's the Google Meet link! I̵'̵l̵l̵ ̵r̵e̵c̵o̵r̵d̵ ̵t̵h̵i̵s̵ ̵a̵n̵d̵ ̵s̵h̵a̵r̵e̵ ̵a̵ ̵v̵i̵d̵e̵o̵ ̵a̵f̵t̵e̵r̵w̵a̵r̵d̵s̵.̵ (oops)
Send me questions by commenting on this post, sending me email, or showing up for the live session. I'll make space for live question-askers and discussion around the questions.
Background rumination
I'm trying this Q&A format in lieu of office hours this month. I've been glad to run the office hours experiment, but I'd like to branch out and try some other formats—I can't help feeling those sessions are stuck in a bit of a local minimum.
I was hoping that people would bring interesting work-in-progress that I and others could dig into and learn from, but it seems like most folks are more interested in informal requests for advice or information. That's fine, of course, but I bet it'd be more valuable to more people in a recorded, write-in format; the live format really shines for demos, fine-grained back-and-forth, etc.
I also get the sense that some office hours attendees like just being able to hang out with other people interested in "weird new computer ideas". I wonder how best to support that sort of thing in an ongoing way. I know Discord and Slack are popular in some communities, but I haven't been able to make them work for me—too scatterbrain-making. Maybe intermittent coordinated gather.town hangout time or something? I'm curious if other folks have ideas or have seen this done well elsewhere.
Other comments and ideas on event formats are very welcome!
2022-07-18 21:12:34 +0000 UTC
I designed something new, and I’ve spent much of the past few weeks watching people use it. Those sessions have been generative and energizing. They always are. There’s something magical about this part of the design process, in both the sense of delight and of mystery.
Rocket scientists have it easy. Well, at least along one narrow axis. When you’re trying to launch a rocket, you can’t just sum up its mass and add enough fuel to lift that mass into orbit. The rocket fuel itself adds mass, too. So you add enough extra fuel to launch the rocket, plus the fuel, and then you add enough to launch the extra fuel you just added, and so on. Happily, we know how to solve differential equations, and this infinite loop has a nice closed-form solution.
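The closed-form solution alluded to here is the Tsiolkovsky rocket equation, Δv = v_e ln(m_0 / m_f). As a toy sketch, assuming an idealized rocket with constant exhaust velocity and no gravity or drag losses (the function, parameter names, and numbers below are hypothetical, chosen only for illustration), the "fuel to lift the fuel" loop can be run step by step and compared against that closed form:

```python
import math


def required_initial_mass(dry_mass, delta_v, exhaust_velocity, steps=None):
    """Liftoff mass needed to give `dry_mass` a velocity change of `delta_v`.

    Toy model: constant exhaust velocity, no gravity or drag losses.
    With steps=None, use the closed-form Tsiolkovsky rocket equation.
    Otherwise, mimic the "add fuel to lift the fuel" loop: start from the dry
    mass and, for each small velocity increment, add just enough propellant
    to accelerate everything accumulated so far.
    """
    if steps is None:
        # Closed form: m0 = mf * exp(delta_v / v_e)
        return dry_mass * math.exp(delta_v / exhaust_velocity)
    mass = dry_mass
    dv = delta_v / steps
    for _ in range(steps):
        # Propellant for this increment is proportional to the mass it must push.
        mass += mass * dv / exhaust_velocity
    return mass


dry = 1000.0     # kg, illustrative payload plus structure
v_e = 3000.0     # m/s, illustrative exhaust velocity
target = 6000.0  # m/s, illustrative velocity change

print(required_initial_mass(dry, target, v_e))
print(required_initial_mass(dry, target, v_e, steps=10))
print(required_initial_mass(dry, target, v_e, steps=100_000))
```

With these illustrative numbers the closed form gives e² ≈ 7.39 times the dry mass, and the stepwise loop approaches it from below as the steps shrink. Unlike the design loop described next, this one settles on a single, predictable answer.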
Design is loopy in a more inconvenient fashion. If you’re designing a new sort of artifact (say, the automobile), you can’t just consider the mechanics of how it moves its passenger from point A to point B. Design is ecological. People who can move at 30mph on demand will often structure their lives quite differently from people who get around on foot. When a large group of these people interact, transformed city structures will emerge. And one element of that transformation will be better roads, which might permit 70mph motion, which begets more transformation, and so on.
This loop does not have a predictable fixed point. Sufficiently interesting new artifacts will immediately obviate a designer’s meticulous “user journey” flowchart.
All this is to say: as a designer, it is good to ask careful questions and develop careful theories, and to form artifacts using those theories. Then it is good to step back, to watch people in action, to see how the artifact’s contours reshape behavior and demand new theories. This is not “user testing.” You’re not primarily interested in “does it work?” You want an expansive, curious stance, excited to redefine the problem statement based on what you see.
In praise of lightness
In March, I arrived at a set of design primitives which seemed good enough to be worth trying. In April, I wrote a talk which unpacked my ideas in various contexts. Iteration on the talk drove iteration on the design. In May, I hacked together a prototype to provide the talk’s visuals, recorded it, and published it. Then I let it sit while I traveled for a few weeks.
When I returned, I watched the talk again and felt an enormous internal… blecch. I thought: yeah, I’m still excited about this solution, but it’s so intricate. There are still no onboarding affordances. It’ll take months of unpleasant drudgery to build it out for a public test.
Over dinner, my friend Rob Ochshorn pointed out something terribly important which I’d forgotten. Consider how the following two scenarios feel:
In the first, you’re sitting someone down in a chair next to your laptop. You show them a design and scoot the laptop over so you can watch how they use it. When stuff breaks, you explain what’s going on, laugh at your scrappy prototype, and reach over them to type some reset command. You issue footnotes in realtime as your user moves through the demo: “oh, yeah, that bit’s not implemented yet.”
In the second scenario, you’re emailing someone a link to a prototype. The email has to somehow explain all those caveats and footnotes you were providing in realtime. You realize, seeing that massive text block, that your recipient will probably skip over all that and just click the link. You can’t smooth over the prototype in realtime when it gets wedged, so you work through the dozens of rough edges on your punch list which aren’t really central to the concept.
I’ve been making scrappy prototypes for years, but somehow, I had forgotten this distinction. Rob observed that I seemed to be aiming for the second scenario (or even a more elaborate public variant of it). He gently asked what it might take to make the first scenario happen next week, rather than next month or next quarter. And I felt this absolutely enormous internal unclenching.
The smoke-and-mirrors prototype I’d made for my talk only worked in that one path I demonstrated. I’d planned to throw it out and build a “for real” prototype that integrated with the existing Orbit system. But a “what about next week?” framing made me take a scrappier path, hacking my talk’s demo prototype into something usable, with a lot of asterisks.
It also made me choose my test audience carefully. My design was too incomplete to put in front of people who’d never heard of spaced repetition. Learning from that audience would take many weeks of new design work, but I realized that I could put a prototype in front of experienced users much more rapidly. Ideally, I want primitives with low floors, wide walls, and high ceilings. Experienced users will help me with the ceilings and walls; I can focus on the floors in future iterations.
I had my first sessions with demo readers that next week, just as Rob had suggested. My prototype broke in all kinds of silly ways, but it didn’t matter, because I was there to laugh at it and smooth over the issues. More importantly, it worked well enough (in my presence, anyway) for people to read the demo text in seriousness—well enough for me to learn an enormous amount about how people interacted with it, and about how it affected their reading behavior.
The following week, I smoothed over the biggest rough edges I’d seen with my in-person observation and held a bunch of demo sessions live over Zoom. Prototypes must be in much better shape to get clear insight from this kind of observation: the rapport is weaker; you can’t easily gesture at elements on their screen; you can’t reach over their arms to fix things. Perhaps more importantly, you miss out on a huge amount of information in body language.
I kept fixing the rough edges people encountered, making the prototype stand on its own with a little more stability. The following week, I shared the demo with a limited, high-context audience (you!). I asked people to record stream-of-consciousness thoughts into a text file. This was a low-effort way to collect a lot more data, though of course each log carries only a tiny sliver of the information I gleaned from in-person sessions.
This whole sequence felt just great. The incremental steps freed me from the deep aversion I felt upon my return. More importantly, they kept me from building a much higher-fidelity prototype than was necessary to learn what I needed. I’m in the process of substantially redesigning the primitives now based on my observations. It’s much better to be doing that after a few weeks of refining a scrappy prototype than it would be after a few months of building a publicly accessible one!
The main thing I learned last year by shipping Orbit was: the mnemonic medium won’t work, as originally designed, in most other reading contexts; readers must be given much more control. I think I could have learned that a lot more rapidly and iteratively by building one-offs with a few handpicked authors, rather than building a real “platform” as I did. I made the implementation more solid than the design work it rested on, and that’s never a good idea.
All this is laughably design-101. Yes, dummy, you should make scrappy prototypes and test them and throw them out. On the ground, it’s been hard to reconcile that with my aspiration to explore systems in serious contexts. This month’s prototyping has taught me to pursue a greater feeling of lightness and motion in my work. I’d like to find ways to dodge elements which feel like “infrastructure.” If that means I need to stand there, holding up the retaining walls while I watch you work, maybe that’s fine.
Some surprising observations
Feedback from the last few weeks uncovered some important issues with the primitives which will lead to fundamental changes. But it’ll be much more interesting to talk about those bits once I have new designs in hand. Instead, I’d like to share a few examples of reader reactions which push around the problem definition, the precious and subtle kind of reactions which only emerge once people are actually using a design to do something real.
Induced demand
Here’s another analogue to the recursive rocket fuel weight problem. There’s too much traffic on the highway into your city, so you spend a billion dollars and build four extra lanes, massively increasing capacity. Now there are more cars on the road, but everyone’s still stuck in traffic, and their commute takes just as long. Oops: induced demand. “Pent-up” demand is released by increased supply.
I was fascinated to watch a similar dynamic play out with this prototype.
In this new design, I’ve made the spaced repetition prompts feel more like marginalia. They have an object-like tactility, a permanent spatial position on the page in a sidebar element.

Immediately, almost every single tester found themselves wanting all the other trappings of marginalia and its attendant facilities. People wanted to highlight; people wanted to make clippings; people wanted to scribble non-prompt textual notes. The cropping rectangle pulls still further back. People wanted to pull those clippings out and connect them across readings. People wanted to transclude and link bits into and out of their personal knowledge management systems.
All this makes sense—truly! It’s why I’ve been living with a “personal” mnemonic medium for years. But these requests were suddenly so much more intense. These impulses came up rarely in discussions about Quantum Country and Orbit as it exists today. Add a sidebar with a nod to peritext, and wham: induced demand. People suddenly felt much more sharply what they wanted.
Prompts as structured reading affordances
Another surprise was the way these peritextual prompts strongly influenced the reading experience. More than half of my readers found themselves sometimes scanning the spaced repetition prompts first—using them as a sort of high-level summary. If a prompt seemed interesting, they’d jump into the associated main text.
This makes some sense. Prompts are a sort of compression: they curate and distill the most important elements of the text for reinforcement. But if structured reading is the aspiration, prompts are probably not the best affordance. They contain syntax noise: extra words necessary to frame the point as a question. And often a single point requires several prompts to capture its various angles.
Several testers suggested a comparison to Christopher Alexander’s The Timeless Way of Building, which contains an unusual device for structured reading:

Even if I’m skeptical that spaced repetition prompts are the right primitive, structured reading affordances seem under-explored in the epoch of dynamic media.
Reading on computers is terrible
A huge fraction of my test readers would happily proceed through the demo, expressing interest in the various affordances and augmentations. Then, as we’re wrapping up, they’d say something like: “But you know, I don’t usually read stuff like this on the computer. I read on paper, or on a Kindle. I kind of hate reading on the computer.” Some of these people offered a grim silver lining: “If I could use this system with everything I read, that might be enough to make me want to read on a computer!”
I’m not sure why this didn’t come up more often during Quantum Country interviews. Maybe it’s because this new prototype invites readers to think about generalizing it to everything they read, whereas Quantum Country seems like a one-off.
These people are right. Reading long texts on computers is terrible, and I’ve been idly collecting notes on the problem for years. I’m certainly not happy with an outcome where we all hate reading texts on computers, but we grumpily put up with it because of the nice cybernetics. This is a problem not just for the mnemonic medium, but for the broader project of “what comes after the book?”. I’ve been mostly shoving this problem under the rug, but this consistent reaction has returned it to the spotlight.
If my thesis is “let’s augment readers using the dynamic medium,” I must either make reading on computers not-terrible, or I must bring the dynamic medium to physical reading experiences.
————————
My sincere thanks to all who met with me to try this demo, in-person and online, and to those who sent me notes on their experiences.
Much of the actionable insight occupying my attention has come from generous design crit sessions, rather than from user observations. More on that in a future letter (when I have new design work to share), but I’ll offer pre-emptive thanks here to Rob Ochshorn, Gray Crawford, Niko Klein, Joe Edelman, Yiliu Shen-Burke, Shan Carter, Cameron Burgess, and Marisa Lu.
2022-06-30 20:28:42 +0000 UTC
Making the demo/talk on the new mnemonic medium last month really helped me refine my ideas. The next step of course was to see how people actually behave. I turned the smoke-and-mirrors prototype in the video into something usable—for people already familiar with SRS and the mnemonic medium, at least.
I've been learning a great deal from live sessions with test readers in the past week (thanks to some of you for that!), and from crit with designer friends. So (as always), this prototype is already out of date; it fails to rise to my own present vision.
But I thought I'd share it with you all anyway: click here to give it a try. Don't share that link externally. I don't have permission to redistribute that book, and I haven't yet implemented the rewriting proxy scheme I'll use to skirt that issue.
The prototype adapts Shape Up, a book on product management and strategy. If that's highly relevant to important projects in your life—i.e. it's a book you'd read under normal circumstances—I'd love to hear about your experience. In particular, I'd be grateful if you would: open a blank text file when you start reading, and every time some thought or reaction occurs to you while you read, just type it, unedited, into the file. Stream-of-consciousness reaction is very helpful.
As you'll see, this prototype isn't wired up to Orbit, but you can save your work as an Anki deck. I'm quite keen to hear how these prompts fare in the following days and weeks in your reviews.
Some more guidance:
- There's no mobile support at all.
- I'm less interested in "testing" and "evaluating" (are there bugs? is this a good product?) than in notes on your qualitative experience, surprising behaviors, feelings, etc.
- At least for the purpose of writing to me about your experience, please try to read authentically. That is, don't review or save stuff just because you think I might expect you to. If you'd normally skim that chapter of the book and wouldn't want to review anything, that's what you should do.
- I'm aware that this won't currently make any sense to people who aren't already familiar with the mnemonic medium. I'll handle that, but I want to refine the core mechanism first.
To come, perhaps next week: a letter on what I've learned so far from this prototype and how I expect to evolve the medium.
This post and an excellent past couple weeks are brought to me and you in particular by Rob Ochshorn, who helped me see how to take incremental steps in making usable prototypes of this new design, and got me out of an oh-god-this-will-take-months malaise. Perhaps more on that in the next letter too.
2022-06-24 18:24:13 +0000 UTC
Announcing office hours for June! Note that the first one is this Friday (Google Calendar link; iCal URL):
- this Friday, June 17, 1:30 PM PDT (meet)
- Thursday, June 30, 1:30 PM PDT (meet)
Seeking testers for new mnemonic medium prototype
Separately, I've been preparing a higher-fidelity version of the prototype I showed last month to try with some real readers.
For the initial round, I'd like to look at the prototype together live for half an hour with a few interested readers. Might a few of you be interested?
The prototype will adapt the first couple chapters of Shape Up, a book on software product management which balances design, strategy, and operational perspectives. I'm specifically looking for initial test readers who:
- are in a product management or design leadership role; i.e. the book's material is highly relevant to your day-to-day work (it's OK if you've read it already)
- have tried a spaced repetition system at some point, for at least a couple review sessions (or you have an active practice)
I'd particularly like to find a few testers who can spare half an hour this Friday, in person, in San Francisco. If that's you, book a slot here. I'll come to you!
I'm also looking for a few folks to try it remotely next week; book that here.
I only have a few slots listed there, so don't be alarmed if there aren't any available—you'll have a chance to try it soon! I'm just doing some limited live testing first, to guide my iteration.
Office hour details
Now that I've described the prototype test, I'll repeat some information about the rough structure of office hours:
- At least for now, there are no reservations. Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems. But! The chat transcript will be published as a comment on this post after the office hours, so that URLs which people share are more easily accessible. Identities in the chat transcript will be public.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
2022-06-15 21:17:54 +0000 UTC
Announcing office hours for May! Note that the first one is tomorrow, this Friday (Google Calendar link; iCal URL):
- Friday, May 13, 9:30 AM PDT (meet)
- Tuesday, May 31, 1:30 PM PDT (meet)
Happy to discuss the mnemonic medium redesign tomorrow if it's of interest and if others' projects don't fill our time.
Recapitulating some information about the rough structure:
- At least for now, there are no reservations. Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems. But! The chat transcript will be published as a comment on this post after the office hours, so that URLs which people share are more easily accessible. Identities in the chat transcript will be public.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
2022-05-12 22:36:06 +0000 UTC
Last year, I worked with authors to test the mnemonic medium (i.e. Orbit) in a bunch of contexts beyond Quantum Country. And… it didn't work nearly as well! Since late last year I've been working on redefining the primitives in response to readers' experiences. This demo/talk presents a new framework for the mnemonic medium in the context of a textbook, two non-technical essays, encyclopedic references, and an academic paper.
It's not a product demo video. It's the kind of thing I'd produce to be useful to my peers in the interface invention world, and to get the kind of feedback I need. So there's a lot of discussion about theory and ideas, not just showing off UI designs. But perhaps some of you will find that interesting from the standpoint of learning about how I approach designing something like this.
It feels really lovely to publish this—it's the first significant design artifact I've put out in over a year. My thanks to you all for helping make it possible.
Comments, questions, and criticism are all very welcome.
Script
This is a scripted talk, so I've included that material below for easier searching / reference for anyone who might want to write comments.
---
- Spaced repetition is an incredibly powerful way to learn, so it seems surprising that it’s not more widely adopted.
- One key reason is that it’s actually quite difficult to write good prompts—that is, the questions and answers you review over time.
- That’s why Michael Nielsen and I developed the mnemonic medium: maybe we can make it almost effortless to internalize a text if we have the author provide the spaced repetition prompts, and if we make those prompts more richly connected by anchoring them in the narrative.
- For the past two years, I’ve been exploring what the mnemonic medium wants to become as it expands to texts beyond Quantum Country.
- Quite a few authors have adopted the system in various contexts now, but in most of them, readers are much less enthusiastic than they were with Quantum Country.
- Gary Wolf nicely summarized the problem: by leaning so heavily on authors, we’ve actually created an authoritarian medium.
- As I wrote in an earlier essay (which I’ll link in the video description), “the current interactions demand not only that you read the text in full, and in order, but that you repeatedly study, and commit to memory, whatever the author thinks is important, in whatever form the author chooses. The memory system isn’t ‘yours’; it’s on loan from the author, kept under glass.”
- Now, if you’re trying to study a well-specified topic carefully, and you’re totally new to that topic yourself, it makes a lot of sense to submit yourself to an authority!
- Quantum Country works because it’s a primer. There isn’t much relevant variation in readers’ prior knowledge. Most readers won’t know enough about the field to choose specific subtopics they’d like to study. They’re unlikely to have strong opinions on how the material should be framed. And it’s okay that they can’t add their own prompts, because they’re too novice to make lots of connections with a creative project of their own. But they still want to internalize the material comprehensively, and deeply. These readers benefit from a highly guided experience.
- But most reading doesn’t work like that. Most reading is less linear—more contingent on my prior knowledge, interest, and current projects.
- I want to internalize the ideas I find meaningful, but I don’t necessarily want to grant the author carte blanche to assign me homework.
- Even in a primer, I want prompts to feel less like fussy property of the author, protected under glass, and more like they’re “yours” as a reader, malleable material in your hands, there to help you deepen your understanding of what you care about most.
- If I give you a textbook, I want you to break the spine and write all over it.
- It’s tough to write good prompts, but that doesn’t mean authors should be the only ones with that privilege: we want to encourage active engagement, people making connections between the material and their own experiences or understandings.
- In an ideal world, you’d have a personal memory genie sitting on your shoulder. And every time you read something you found striking, and every time something you read sparked an idea of your own, you could effortlessly remember that detail—and, if you like, return to that notion over time to stimulate further engagement.
- That’s science fiction, but we’ll treat it as something like a north star. How close can we get to that degree of effortlessness?
- More practically, we can think of this as an information problem. How can readers indicate which prompts accord with their interests, using the fewest interactions and making the fewest decisions? And how can they most fluidly capture prompts reflecting their own insights?
- There is no one-size-fits-all answer here. Different readers will want different things from any given text. And the same reader will read different texts—or even passages within the same text—in very different ways.
- When you’re trying to learn a new subject, you really may want to defer to an author’s guided experience; but after a few projects under your belt, you’ll often read in a more mercenary fashion.
- Reading a non-fiction essay, you might care only about the high-level claims, or you might want to be able to reconstruct the entire argument from scratch.
- Reading reference or encyclopedic material, you’ll often jump to a specific subsection of interest and care only about a line or two, or even a few specific numbers or details within those lines.
- A single text might inspire one reader to write extensively, and another reader not at all, while a different text provokes the opposite reaction, according to how each interacts with the reader’s own creative projects and interests.
- The great thing about written language is that it copes well with all this ambiguity. The same page layout can serve a gossip column and a literary journal. Readers can skip around and drill in as they like; cues like headings and indexes help guide strategic reading; margins invite scribbling; and so on.
- How might we make the mnemonic medium feel that versatile?
Demo
- Let’s start with a familiar concept for the medium: a textbook. This is Introduction to Modern Statistics, an open-access textbook I’ve been adapting as a test.
- If I haven’t studied this topic before, I’ll probably appreciate a guided approach like Quantum Country’s. I’m going to read fairly linearly. I’ll want the author to make opinionated suggestions about what I should be internalizing here, and about when I might pause for a moment to review.
- So I’ll focus on reading, and on trying to understand the material. After a few minutes, I’ll see that the author has embedded a review box where I can quickly test myself on key details from the text, and bring that material into my Orbit so that I’ll absorb it more deeply over time. So far, so familiar.
- Now here’s one new detail: if I don’t care about a particular prompt, I can just skip it. We’ll move right on to the next one.
- If I’m reading more casually, this kind of guided review might feel pretty heavy-handed. I might prefer to quickly get a bird’s-eye view of the material in this box. Then I can save any prompts which particularly strike me.
- But if we continue with the review, or just switch back, here’s what we’ll see when we finish:
- The prompts we reviewed—but not the ones we skipped—are saved to our Orbit. You’ll also see that reflected in the state of the prompts in this list here.
- This default is meant to cover the common case: questions you cared enough to review are automatically saved; and those you skipped are not.
- We make two quick bulk actions available for other common cases.
- You might decide after reviewing: you know what? That wasn’t very interesting. I don’t want to go through those again. You can just click “Undo” here to reverse the automatic behavior and remove all the prompts.
- Alternatively, you might think: you know, those prompts I skipped… I didn’t feel like reviewing those right this second, but I do want to hang onto them, maybe see them at least once more. You can just click “Add all.”
- Apart from the bulk actions, you can of course add and remove prompts individually here.
- If you didn’t like how the author worded a prompt, you can edit it directly, right here.
- My goal with this design is to give you the best of both worlds: you benefit from the author’s meticulous care in constructing this material—but this is your copy of the book, and it’s yours to tear up or scribble on as you like.
- What we’ve seen so far is a relatively incremental change to the existing mnemonic medium, because we’re looking at a reading context very similar to Quantum Country’s—one where I’m quite unfamiliar with stats, and I want to put myself in the author’s hands for a carefully guided experience.
- But that’s often not how we read. In many situations, we’d be better off exploring the same stats textbook non-linearly, in a mercenary fashion.
- To take myself as an example: I studied stats years ago in university, but as I analyzed reader data from Quantum Country, I noticed that I’m feeling pretty fuzzy on certain topics in stats. When I opened this book, my natural approach was to jump around, to focus on material which seems relevant to my project, or which feels unfamiliar.
- In this context, even the new “list mode” we’ve seen in the review areas would feel pretty cumbersome.
- Reading through, this material was new to me: I’d never learned how to fit a linear model by hand. I’d always used computers, but this gives me a greater sense of tactility for the concept.
- The rest of this chapter is mostly old news for me, but I’d like to bring these details into my Orbit. How can I do that?
- Well, I’d need to scroll down until I found a review box, then skip through all the prompts except the ones which relate to the concept I wanted to reinforce.
- I could switch into list mode instead, but then I’d have to read through all of the questions to find only the ones which correspond to the concept I care about.
- There are really two problems here: cost—I need to perform a large number of actions to indicate a small subset of interest; and decontextualization—I can’t indicate my specific interest where and when it actually occurs, while reading the text.
- This new version of the mnemonic medium has an affordance which should help.
- Let’s jump back to the passage I found interesting. Say I’ve just finished reading, and I think: wow, I really hadn’t understood this, but I think I get it now. I want to make sure I really internalize this concept.
- When I have a thought like that, I can notice this little Orbit marker in the margin. This marker indicates that there are prompts available about the adjacent text.
- When I click it, a sidebar appears presenting the relevant prompts. I can add all the prompts in one click.
- Or if only a few of the prompts interest me, I can click to add each of them.
- My goal with this design is to capture that instinctual spark in the moment—that feeling where you read something and go “ooh, juicy!” And, hopefully, to minimize interaction cost by using the marker’s position within the text to implicitly indicate your subset of interest.
- In the context of a textbook like this one, readers can freely move between the two modalities I’ve shown, picking and choosing prompts as they go, or adding them in bulk as part of an end-of-section review.
- The marginal Orbit markers offer a sort of peripheral vision as you read—hey, this passage might be worth remembering, and you can do so if you like.
- Let’s switch gears now and talk about non-technical essays, informal articles. These contexts show another way we might combine the two modalities we’ve seen so far.
- My friend David Chapman very kindly agreed to try using Orbit in this philosophical essay he published early in 2021.
- I often find David’s essays quite striking. So historically, I’ve gone to the trouble of writing lots of my own Orbit prompts about his ideas.
- In many cases, it seems like it’d be even better if he’d done that work for me, or had at least given me a head start.
- Now, David used Orbit for this essay just as I’d suggested, writing prompts to reinforce all of the details of his argument. But while Quantum Country readers were overwhelmingly positive, feedback about this experiment was more mixed. Many readers told us they found the experience unpleasantly overbearing—“like being in school.” I received similar feedback about several other articles which tried using Orbit.
- I think the first issue here is reader stance. Quantum Country is explicitly instructional—as a reader, you’re there to learn. When the text asks you to review what you’ve read, it feels like it’s doing its job.
- But when a non-technical essay like this one starts demanding, partway through, that you memorize the author’s arguments, that can easily feel presumptuous.
- My stance towards an essay like this will evolve while I’m reading it. I’ll often start quite casually, reading mostly for edification or curiosity. If the author really catches my interest, I might decide to study the text more carefully. The medium will offend if it demands the latter stance prematurely.
- This phrase “like being in school” is telling. I’d guess this person means that what’s being asked of them isn’t aligned with the shape of their own interest. That’s the essence of being in school for most people.
- The second issue is level of detail. David wrote prompts tracing all the fine-grained details of his argument, just as I suggested. That’s a great service for anyone who wants to really internalize this essay in extreme depth.
- But mixed in with these detailed prompts are a handful of high-level prompts—by my count, 7 of the 60—which represent the main ideas of the essay.
- It’s totally reasonable for readers to be interested in this essay at different levels of detail.
- As an author, I believe it’s usually best to write for your most serious, demanding readers. They’ll want to internalize enough detail to reconstruct your full argument.
- But that doesn’t mean you should actively alienate your (say) 80th percentile readers: those who are less invested but still quite attentive, and who want to carry your main ideas with them into future weeks.
- In the initial design for the mnemonic medium, you could only offer the highest level of detail to everyone. To provide full coverage, you needed to embed a review area every few hundred words.
- I believe it would be better to follow the familiar design principle of progressive disclosure—to reveal the presence of detail, but to avoid having it take center stage unless the reader directs their attention that way. Ideally, again, with as little interaction cost as possible.
- In our discussion of the stats textbook, we talked about people being interested in specific parts of a text, rather than the whole. That’s not quite the same as people being interested in different levels of depth, but I think the same primitives can help address both needs.
- Here I’ve made a prototype of David’s essay using the new design elements.
- Instead of confronting readers with a review box every few paragraphs, the essay offers just one, at the very end.
- This review box is intentionally high-level, focused on the main ideas of the essay. Just 7 prompts here.
- Casual readers can easily treat this final review box as an appendix, if they aren’t interested in any review at all.
- But readers interested in more detail will see Orbit markers scattered throughout the essay.
- Say that I’m particularly struck by this point David makes about marshes causing trouble for a correspondence theory of maps and territory. I can click this marker to easily bring that idea into my Orbit.
- Likewise throughout the essay. This approach might sound fiddly, but it doesn’t mean there are 60 markers for the essay’s 60 prompts.
- The markers indicate prompts about the adjacent passage. It’s an intentionally rough association. Prompts tend to cluster, and there are often several describing a single idea from different angles.
- By my count, we’d use 22 markers for the 60 prompts in this 6,500-word essay. Spread uniformly, that would amount to roughly one marker every 300 words, or about one every minute and a half of reading, though of course in practice some sections are denser than others.
- Readers interested in more detail can indicate that interest as they’re reading, by clicking the Orbit markers.
- They’ll see any prompts they add in their next review session, whether that’s in a review box at the end of the essay, or in the Orbit app alongside prompts from other essays.
- Readers can also stop and review at any time, which might be helpful with a longer text.
- As in the statistics textbook, the presence of the markers provides a sort of peripheral vision—more here, if you want.
- Readers can incrementally opt into more detail.
- Apart from the markers and sidebar, if you read through and complete the review box at the end, you’ll see the other available prompts here in the list tab.
- Conceptually, none of the prompts “live” in the review box anymore. The review box is just the author’s curation of prompts which appeared throughout the preceding text.
- So if you read through the text and complete the review box, you’ll notice that certain Orbit markers throughout the text are now marked as added, or partially added.
- The unifying concept is this sidebar surface, which is now where all prompts “live.”
- The review boxes are like a window into parts of this sidebar. From a review box, you can always jump back to where a prompt “came from.”
- Note that we don’t surface a literal character range associated with these prompts—they’re just associated with a general region.
- I think this is important: in my design experiments, I’ve found that many prompts resist a precise association with the source text. This is particularly true of prompts which synthesize, distill, and connect—and these are often the most important ones.
- A better metaphor is often a post-it note stuck to the side of a page, gesturing at a vicinity.
- The sidebar gives the prompts a sort of “object permanence” they never had in past versions of the medium. Before, once you’d reviewed a prompt, it was gone. It didn’t “live” anywhere you could get your hands on it.
- That ephemerality certainly didn’t foster the sense of malleability I want to create, but it created practical problems too. Sometimes people wanted to give feedback on a prompt that hadn’t worked well for them, but they couldn’t find it again!
- The sidebar also provides a home for your own prompts.
- For instance, when I read this passage on adventure rationality, I was struck by an observation my friend Alec Resnick has made about the phrase “bicycle for the mind.”
- How does adventure rationalism echo Alec’s criticism of “bicycle for the mind”? As usually employed, it implies that you know the destination already!
- I can add a prompt to bring that connection into my Orbit right here as I read, and it’s a first-class citizen, like those the author provided.
- I can also add prompts during or after a review: I find that’s often a context where I’ll realize what prompts I wish I had.
- The Orbit markers offer a relatively quiet peripheral vision. They just say “hey, there are prompts here.” But if I’m reading quite carefully, I can pin the sidebar open to keep the prompts in my periphery while I read.
- Now I get a continuous extra channel of information about what the author views as important.
- I may also find this mode more convenient if I’m writing a lot of prompts of my own—like keeping a notebook open alongside a text as I read.
- I can just click in the margin alongside the text and start typing.
- This design deliberately resembles margin notes in books and digital annotation tools, so it naturally suggests sharing.
- Maybe if I find an article very interesting, I can mark it up using the Orbit sidebar and share my prompts with others, even if the author hasn’t provided their own.
- Or maybe we should think of this as a shared, wiki-like surface, Genius-style.
- I’d like to explore these angles, but at least initially I’m going to focus on expert-authored prompts: Quizlet and Ankiweb have amply demonstrated the challenges of crowdsourced prompt-writing.
- In my experiments so far, the pattern I’ve shown seems to work well for a variety of essays, informal articles, and blog posts.
- Here’s another example: Donella Meadows’s famous essay on leverage points for complex systems.
- As in David’s essay, I’ve put one review box at the end with a few curated questions on the main ideas, and Orbit markers throughout for more detail.
- But I’ve found that we’ll often want an even finer-grained affordance for texts with a very high density of detail: texts like references, encyclopedias, and technical papers.
- Consider this article on life expectancy from Our World in Data.
- This is an encyclopedic, or reference-like resource, covering the topic from many different angles. Most readers will jump around, focusing on the aspects of life expectancy which most interest them.
- Now take a moment to scan the first few paragraphs of this section. Notice how data-dense they are: “no country in the world had a longer life expectancy than 40 years”; “Global inequality in health was enormous in 1950: People in Norway had a life expectancy of 72 years, whilst in Mali this was 26 years.”
- As a reader, what sort of affordances might be most helpful for me to carry away the details which most interest me?
- If you add prompts to cover all this fine-grained detail, you’ll end up with one Orbit marker per paragraph.
- I think that would feel pretty overwhelming. Visual noise aside, I’d be deciding every few seconds whether or not to click on an Orbit marker, and evaluating the prompts behind each.
- But I think it would also be too ambiguous. Look at this third paragraph. There are so many details here! Say I’m particularly struck by this one sentence—“The global inequality in health was enormous in 1950: People in Norway had a life expectancy of 72 years, whilst in Mali this was 26 years.”
- If this text weren’t so dense, I could act on my interest by noticing that there’s an adjacent Orbit marker and clicking on it.
- But in a paragraph this detailed, the marker no longer does that job: I’d have to sift through a bunch of prompts about the other sentences to find the ones corresponding to the bit I’m interested in.
- I think a more practical mental model for these Orbit markers is: “here the author has curated the key things to remember about this subtopic.”
- Unless I’m reading very closely, I don’t want to be making constant decisions. I want the author to help me stay focused on what’s most important.
- Here’s how that might look. This section has one Orbit marker, adjoining the paragraph which sums up the discussion.
- If I found this section striking, and I’d like to carry it with me, I can click this marker, and I’ll see these five prompts which summarize the most important points.
- Now, these prompts are still quite detailed. But sometimes I find myself wanting more detail. So I’ve designed an interaction meant for bringing fine details into your Orbit: things like numbers, proper nouns, dates.
- When I first read this section, I was really surprised to learn that no country in the world had a life expectancy longer than 40 years.
- I can easily capture that by selecting 40 and clicking “New fill-in-the-blank prompt.”
- In the sidebar, I see a preview of the prompt which would be created. Orbit guesses that I’d want the surrounding sentence as context. But I can drag these handles to give myself more or less context.
- Or I can manually edit the context down myself.
- If I create multiple fill-in-the-blank prompts with overlapping context, Orbit merges them for me.
- I’ll make the same interaction available for images, so that I can easily make fill-in-the-blank prompts for details in images like these maps, or figures like these from Cell Biology by the Numbers.
- Now, this “fill-in-the-blank” type of prompt is often called a “cloze deletion.”
- They’re easy to make, so new spaced repetition users often use them for everything: just copy and paste huge text passages verbatim and delete chunks. It’s an easy way to make prompts, but these often don’t work very well. The context is too noisy, or it’s not clear what’s being asked for, or you end up pattern matching, just parroting phrases back without really absorbing anything.
- But in my experience fill-in-the-blank prompts have a higher success rate when used for focused, precise details, like the ones I’ve demonstrated. Numbers, dates, names, terms, diction. Ideally with tight context.
- I find that it’s often best to focus on higher-level prompts which synthesize and summarize; without those, fine-grained fill-in-the-blanks tend to create knowledge which feels atomized and brittle.
- I’m hoping that we’ll create a good balance by having authors focus on those higher-level prompts—which are harder to write—and giving readers this way to easily create super-fine-detail prompts.
- Now, in this kind of encyclopedic article, I wouldn’t expect a review box at the end of the page. Reading patterns are just too non-linear.
- So for this type of content, I think readers would be well served by a handful of Orbit markers scattered throughout, plus the fill-in-the-blank interaction for finer details.
- If it makes sense to offer even coarser, article-level summary prompts, you might put those in an Orbit marker associated with the abstract, or the conclusion.
- My experiments suggest the same pattern applies to technical papers. For instance, here’s the paper introducing IPFS, a distributed file system.
- The structure here is pretty typical. The paper begins with a high-level motivation, then provides background situating this work relative to its predecessors, a summary of the high-level design, sections containing progressively higher levels of detail, and some concluding discussion of the work’s implications.
- Different readers will want very different things out of this paper. Am I just curious about how IPFS works? Am I trying to implement a client for IPFS? Am I trying to understand the design so that I can build my own distributed file system?
- Orbit markers let me indicate my interest on a topic-by-topic basis. Maybe I’m not really interested in the motivation or background—I’ve understood that already from other sources—but what I actually want to know is how the file system works.
- I’ll focus on section 3, clicking the Orbit markers about the topics where I learned something new.
- For the project I have in mind, this threshold on the size of “small” values stored directly in the distributed hash table is actually quite important. It’s not something most readers would care to take away from this section, so the author didn’t write a prompt about it, but I can capture that fine detail quickly with a fill-in-the-blank prompt.
- Reading this list of potential use cases, I’m struck by a connection to another topic of interest, and I can add a prompt about that as I read.
- In what sense might IPFS and TimBL’s Solid complement each other? You could use IPFS to implement a distributed Solid data pod.
- This way I’ll be nudged to think more about my idea a few times in the coming weeks and months.
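The fill-in-the-blank interaction from the Our World in Data and IPFS demos can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Orbit’s implementation: `ClozePrompt`, `sentence_around`, `overlaps`, and `add_cloze` are hypothetical names; sentence boundaries are guessed naively from `.`, `!`, or `?` followed by whitespace; and “overlapping context” is taken to mean overlapping character ranges in the source. It mirrors the two behaviors described above: the default context is the surrounding sentence, and a new blank whose context overlaps an existing prompt’s is merged into that prompt. The sample text is adapted from the life expectancy passage quoted earlier.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ClozePrompt:
    """Hypothetical model: a visible context range plus blanked ranges,
    all as (start, end) character offsets into the source text."""
    context: tuple[int, int]
    blanks: list[tuple[int, int]] = field(default_factory=list)

    def render(self, source: str) -> str:
        """Return the context with every blank replaced by '[...]'."""
        start, end = self.context
        pieces, cursor = [], start
        for b_start, b_end in sorted(self.blanks):
            pieces.append(source[cursor:b_start])
            pieces.append("[...]")
            cursor = b_end
        pieces.append(source[cursor:end])
        return "".join(pieces)


def sentence_around(source: str, start: int, end: int) -> tuple[int, int]:
    """Guess a default context: the sentence containing source[start:end].
    Naive assumption: a sentence ends at '.', '!' or '?' followed by
    whitespace or the end of the text."""
    left = 0
    for m in re.finditer(r"[.!?]\s+", source[:start]):
        left = m.end()
    m = re.search(r"[.!?](?:\s|$)", source[end:])
    right = end + m.start() + 1 if m else len(source)
    return left, right


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two half-open ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def add_cloze(prompts: list[ClozePrompt], source: str,
              start: int, end: int) -> list[ClozePrompt]:
    """Blank out source[start:end], merging with any prompt whose context
    overlaps the new default context (repeated until nothing overlaps)."""
    new = ClozePrompt(sentence_around(source, start, end), [(start, end)])
    remaining = list(prompts)
    merged_something = True
    while merged_something:
        merged_something = False
        for p in remaining:
            if overlaps(p.context, new.context):
                new = ClozePrompt(
                    (min(p.context[0], new.context[0]),
                     max(p.context[1], new.context[1])),
                    p.blanks + new.blanks,
                )
                remaining.remove(p)
                merged_something = True
                break
    return remaining + [new]


source = ("No country in the world had a life expectancy longer than 40 years. "
          "Global inequality in health was enormous in 1950: People in Norway "
          "had a life expectancy of 72 years, whilst in Mali this was 26 years.")
prompts: list[ClozePrompt] = []
for detail in ["40", "72", "26"]:
    i = source.index(detail)
    prompts = add_cloze(prompts, source, i, i + len(detail))
for p in prompts:
    print(p.render(source))
```

Selecting “40”, then “72”, then “26” yields two prompts: one for the first sentence, and one for the second sentence with both blanks, since the last two selections share a context. A real version would also need the draggable context handles and manual editing shown in the demo, which this sketch omits.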
Summary and conclusions
- Wrapping up, let’s review the primitives of the proposed new mnemonic medium.
- Prompts, both yours and the author’s, live in a sidebar, loosely positioned alongside the relevant source material.
- Authors can surface these prompts in review boxes. As in today’s mnemonic medium, these give readers the opportunity to review what they’ve read while also saving prompts to return to later.
- In this new design, readers can triage while they review by skipping prompts.
- And a new list screen offers a bird’s-eye view and bulk actions.
- But authors can also surface prompts while you read, via Orbit markers. These mean: “click me to bring the main ideas from this passage into your Orbit.”
- These markers let you jump around the text, picking and choosing which topics to take with you.
- And if you like, you can keep the prompts in view while you read by pinning the sidebar open.
- Authors can mix and match review boxes and Orbit markers to shape the reading experience:
- In an essay, authors might embed a single review box covering the main ideas at the end, while also possibly surfacing detailed per-topic prompts via Orbit markers.
- In explicitly instructional texts like Quantum Country and the stats textbook I’ve shown, authors can embed review boxes regularly, at the end of each section, to help readers stay on track.
- Meanwhile, encyclopedia articles and technical papers will probably use no review boxes at all.
- Readers can write their own prompts which live alongside those of the author.
- And for detail too fine-grained for authors to reasonably curate, a lightweight interaction allows readers to quickly extract fill-in-the-blank type prompts.
- The process of triaging, editing, and writing prompts doesn’t end in the reading context: I’ll be adapting these same primitives for the Orbit app as well.
- So if a prompt comes up that doesn’t resonate with you, you can just skip it, and it’ll back off for increasingly longer intervals. Or you might improve the wording. Or, in many cases, it’ll be best to simply delete it.
- The list representation of prompts we’ve seen in the sidebar and in the list tab of review boxes will have an analogue in the app for browsing and editing your library.
- So far, these new primitives exist only in smoke-and-mirrors prototypes.
- Next, I plan to build higher-fidelity prototypes to test with readers in some real contexts.
- I also have a lot of tricky design work ahead on onboarding flows for everything I’ve shown. In this video, I’ve focused on the primitives, playing the role of a user already familiar with the system. But of course the real system will have to help introduce itself to new users.
- Thanks for watching. Criticism, questions, confusions, and riffing are all welcome.
- My special thanks to Ozzie Kirkby, who prototyped some early approaches to this problem last summer, and to Nick Barr, Taylor Rogalski, and Gary Wolf for helpful discussions.
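The primitives summarized above imply a small data model, which might be sketched like this. Again, this is a hypothetical Python illustration rather than Orbit’s actual code: every name here (`Prompt`, `ReviewBox`, `Reader`, `finish_review`, `undo_review`, `add_all`, `marker_status`) is invented for the example. It encodes three behaviors from the demo: prompts live in one collection and are anchored only to a coarse region id, not a character range; a review box is just the author’s list of prompt ids; and finishing a review saves the prompts you reviewed but not the ones you skipped, with “Undo” and “Add all” as bulk overrides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Prompt:
    pid: str
    text: str
    region: str  # coarse anchor, e.g. a paragraph or section id; not a character range
    by_author: bool = True  # False for prompts the reader wrote


@dataclass
class ReviewBox:
    prompt_ids: list[str]  # the author's curation; the prompts themselves live in the sidebar


@dataclass
class Reader:
    saved: set[str] = field(default_factory=set)


def finish_review(reader: Reader, box: ReviewBox, reviewed: set[str]) -> set[str]:
    """Default behavior: save prompts that were reviewed, not ones skipped.
    Returns the ids newly added, so the action can be undone."""
    added = {pid for pid in box.prompt_ids if pid in reviewed} - reader.saved
    reader.saved |= added
    return added


def undo_review(reader: Reader, added: set[str]) -> None:
    """'Undo': reverse the automatic save (assumed to touch only what it added)."""
    reader.saved -= added


def add_all(reader: Reader, box: ReviewBox) -> None:
    """'Add all': save every prompt in the box, including skipped ones."""
    reader.saved |= set(box.prompt_ids)


def marker_status(reader: Reader, prompts: list[Prompt], region: str) -> str:
    """State shown on an Orbit marker: 'none', 'partial', or 'added'."""
    ids = {p.pid for p in prompts if p.region == region}
    saved = ids & reader.saved
    if not saved:
        return "none"
    return "added" if saved == ids else "partial"


sidebar = [
    Prompt("p1", "First main idea?", "section-1"),
    Prompt("p2", "Second main idea?", "section-1"),
    Prompt("p3", "Third main idea?", "section-2"),
]
box = ReviewBox(["p1", "p2", "p3"])
reader = Reader()
added = finish_review(reader, box, reviewed={"p1", "p3"})  # p2 was skipped
print(sorted(reader.saved), marker_status(reader, sidebar, "section-1"))
# prints ['p1', 'p3'] partial
```

One design choice here is an assumption: `undo_review` removes only the prompts that this particular review added, so a prompt already saved earlier via an Orbit marker stays saved.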
2022-05-12 20:23:49 +0000 UTC
Hi, all! First, a timely reminder that the second April office hours is tomorrow (Google Calendar link; iCal URL):
- Friday, April 29th, 9:30 AM PDT (meet)
Prototyping new mnemonic medium design
After a lot of Quantum Country data analysis early this year, last month I switched back into design mode, focused on creating new primitives for the medium which give readers much more control (yet without creating burdensome interaction cost). See these two letters for more background there.
I have a new set of interactions which are holding together well in my "paper prototypes" so far, so I've been working on a talk which will include extended live demos of prototypes in action. I hoped to finish that this week, but that's not happening—so in recompense, I recorded my morning prototyping session.
It's more than five and a half hours long, so I don't expect anyone to watch it all the way through, but many of you may be interested in the opening few minutes of remarks about how I see the role of demo presentations in the process of inventing new user interfaces.
One more small offering: a little story on the self-inflicted harms of inappropriate time pressure in open-ended creative work.
That story presents a funny contrast, actually. There, time pressure was self-defeating because it actually subverted my own creative goals: it kept me from engaging seriously with the subject matter I was purporting to adapt into my new medium. And how else am I going to know if the medium's really working? But in this morning's livestream, I'm afraid the last two hours are me wrestling with various LaTeX rendering bugs. Those, of course, have nothing at all to do with the actual ideas under consideration, and so time pressure is actually helpful, insofar as it nudges me to give up and use hackier solutions.
2022-04-29 04:55:52 +0000 UTC
Just sending out this reminder of office hours for April; note that the first one is this Thursday (Google Calendar link; iCal URL):
- Thursday, April 14th, 2:00 PM PDT (meet)
- Friday, April 29th, 9:30 AM PDT (meet)
One change to previous office hour policies: we'll post the chat transcript as a comment afterwards, so that shared links can be persisted. So while the A/V conversation has Chatham House rules applied, the chat itself does not.
Recapitulating some information about the rough structure:
- At least for now, there are no reservations. Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems. But! The chat transcript will be published as a comment on this post after the office hours. So identities are public there.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels like we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
2022-04-12 01:38:19 +0000 UTC
The best scientists, entrepreneurs, and engineers I know pour themselves into their work. You couldn’t capture their working hours on a timecard. Their creative gears turn restlessly, and insights produced in the shower or on walking conversations are no less valuable than those produced at the office. Yet I’ve noticed that top knowledge workers relate to their skills quite differently than top athletes and performing artists do.
Competitive athletes, musicians, and dancers work tirelessly—often with a stable of coaches—to assess, develop, and maintain the core skills of their disciplines. They watch tape of themselves. They measure their performance at microtasks intended to isolate specific core skills. Decades into their career, they still practice scales, or perform plyometric exercises, or whatever else they need to do to maintain top performance.
By contrast, knowledge worker friends will sometimes tell me about studying a new programming language, or brushing up on their statistics with a tutor. But I notice that these “training” efforts are usually temporary and focused on subject matter, rather than on “core skills” analogous to those an athlete or performing artist might refine daily. It’s rare that a knowledge worker tells me about a diligent ongoing training program to improve their skills at reading difficult texts, or synthesizing insights, or sharpening their research questions.
In his book summarizing a career spent studying deliberate practice and elite performance, K. Anders Ericsson suggests[1] that we shouldn’t be surprised by the omission. The core skills of tennis and ballet have been systematically characterized; they can be easily and objectively assessed; for each skill, we know practice activities which can improve performance. The same can’t be said (yet) for the skills of a scientist, or a startup founder.
But I don’t think this is the whole story. When I talk to serious knowledge workers about this disparity between themselves and athletes, I’ll often hear a response which sounds like: “I do practice the skills you’re talking about, every day, as part of my work. I’m reading memos and synthesizing insights and formulating questions all the time.” The implied belief is that they practice these skills implicitly, as part of their routine work—so they don’t need the dedicated assessment and development used in these other fields.
Ericsson and co-authors tackle this objection in another paper[2]:
Although work activities offer some opportunities for learning, they are far from optimal. In contrast, deliberate practice would allow for repeated experiences in which the individual can attend to the critical aspects of the situation and incrementally improve her or his performance in response to knowledge of results, feedback, or both from a teacher. … During a 3-hr baseball game, a batter may get only 5-15 pitches (perhaps one or two relevant to a particular weakness), whereas during optimal practice of the same duration, a batter working with a dedicated pitcher has several hundred batting opportunities, where this weakness can be systematically explored … In contrast to play, deliberate practice is a highly structured activity, the explicit goal of which is to improve performance. Specific tasks are invented to overcome weaknesses, and performance is carefully monitored to provide cues for ways to improve it further.
I’ve learned (the hard way) this past year that there’s a type of situation in which implicit practice will often fail—and fail invisibly. I hope this story might help you spot places where a similar pattern occurs in your life.
A sight reading parable
I’ve been playing piano since I was eight years old. Unfortunately, I didn’t take the instrument seriously until I was a teenager, and a vocal music obsession diverted my musical attention for much of my adult life. So I don’t have the fluency one might hope for after a couple decades. Still, I can learn and perform “early advanced” classical repertoire, and I take great joy in my time at the piano.
Last year, I discovered that despite the efforts of multiple teachers and thousands of hours at the piano, a gaping—yet invisible—hole in my skills has been seriously handicapping my progress, and my enjoyment. My repertoire and technical skills may have been those of a modestly experienced amateur, but until I discovered this problem and started working on it deliberately, my sight reading skills were those of a beginner perhaps three years into playing.
Sight reading is the skill of picking up and performing a piece of music you’ve never seen before, with little preparation or practice. By contrast, “studying” a piece is like reading by slowly sounding out a piece of literature written in a foreign language, in a foreign alphabet. I’d had that experience in high school, translating Homer’s epics from ancient Greek. For two years, I’d only ever experienced Greek at the pace of two lines of verse per hour of study. Then we picked up the New Testament, and for the first time I had the experience of “sight reading” Greek: the language was simple enough that I could translate it on the fly. (It is to the advantage of a proselytizing text to use accessible language!) What a joyful, freeing feeling that was! So utterly different from the plodding experience of cross-indexing multiple scholarly references to understand each phrase.
I didn’t notice that I was always “studying” but never “reading” as a pianist, because no student expects to be able to sight read challenging piano music. Such pieces require weeks or months of study—not to read the notes off the page, but to practice difficult physical motions, to interpret the movement of many voices, and so on.
With piano, my teachers and I focused on studying repertoire “at my learning edge.” Each of these pieces would take months to learn. Almost all the time with those pieces was spent on interpretation and technique. After the first few sessions, I had the score memorized, so I didn’t need to read it anymore. But that meant that in a whole year, I’d only read a few pages of new music! Imagine learning to read with only a few pages of prose per year. No wonder I read music so slowly.
Unfortunately, this situation only made itself worse. As the pieces I learned became more musically challenging, each piece took longer to learn, which further reduced the amount of new music I would read each year. My poor sight reading skills made new pieces take even longer to learn: because I couldn’t read most of that music in real-time, I’d need to memorize passages before I could practice them. So each year, I’d read fewer bars of more difficult music, and atrophy still further in sight reading, and so on, in a downward spiral. Thus poor sight reading skills resulted in fewer implicit opportunities to develop sight reading skills. An awful feedback loop!
Emotionally, my poor sight reading skills gave rise to a powerful feeling of scarcity in my piano experience. Whenever I’d start a new piece, I knew that I’d have to study for months before I could play it. And I knew that I could only study a few new pieces per year. So choosing a piece to study felt like a high-stakes decision. I couldn’t respond to impulses I felt each time I sat down to the piano: I’d have to stick with one piece for a long time. That weightiness made piano less joyful.
I couldn’t quite articulate this, but I really wanted to be able to sit down and just play new music on a whim. Of course, I understood that the “at-level” pieces I was studying were quite difficult, so I couldn’t play those spontaneously. But even when I chose pieces which seemed much easier, I still couldn’t play them on the spot. These simpler pieces might take five sessions of practice instead of fifty, but they still felt like “sounding out the words” rather than “reading”. And I felt that if I couldn’t play even these easier pieces spontaneously, there was no point: I’d just be taking time away from the “at-level” pieces which would develop me as a pianist.
The moral here is that implicit practice wasn’t enough to improve my poor sight reading skills. New pieces took months to learn, but my teachers didn’t notice a problem because such pieces should be hard, should take a long time… though in hindsight perhaps not that long. The real problem was that all music took me quite a long time to learn, even music at a level I might have studied years earlier. But I never worked on “easier” music like that with a teacher, so no expert ever had the opportunity to notice the problem.
The irony in this situation is that piano is one of the classic domains which expertise researchers reference when discussing deliberate practice. The skills are well characterized and readily assessed; we have practice methods for improving performance at each skill at all levels; we have well-known teaching practices; etc. In fact, it was this kind of formal structure which finally identified my sight reading as a problem. A potential new piano teacher wanted me to sketch my abilities using the rubric of the Royal Conservatory of Music’s syllabus, which helpfully delimits “levels” for various skills, and provides learning resources for each. I measured myself at level 8 or 9 along each axis—except for sight reading, which was around level 3. Oops.
I didn’t grasp right away how important that gap was. I thought, almost as a matter of hygiene: well, maybe I should bring that straggling skill up to the level of the others. So I bought some sight reading workbooks. These books provide snippets of pieces organized by difficulty. The idea is that you find an appropriate “starting place”, simple enough to sight read, then you read a page or two of new music each day. The music slowly becomes more complex over time, much like graded reading books for children. I made progress rapidly, but that meant playing little eight-bar snippets of simple folk songs—so the growth didn’t feel terribly profound.
A few months into this process, I saw a YouTube video suggesting that pianists practice sight reading by using books which compile “easy” arrangements of music they enjoy. I purchased a book of Disney music intended for beginners, and—embarrassing as it sounds—that book gave me one of the most profound musical experiences of my life. The night it arrived, I sat down to the piano and opened to the first page. I played the first piece, then the next, then the next, straight through, until I reached the end of the book, over 200 pages in a single night. I read more music in that one night than I’d played in the prior decade of cumulative practice. After years of pieces which required weeks of study before they could really be played, it was absolutely exhilarating to play dozens of beloved songs on the spot. The arrangements were simple, but that didn’t matter. In some strange way, these arrangements made me feel more like a pianist than the difficult Chopin pieces I’d been studying. They ended the feeling of scarcity I hadn’t recognized; they gave me a feeling of agency I didn’t know I’d been missing. I’ve practiced sight reading daily for much of the past year, and the progress continues to feel deeply rewarding.
Many musicians reading this will suggest that my experience was quite unlucky. I could have avoided this problem if I’d had teachers with a broader focus, or if I’d studied traditions like jazz which rely on improvisation and session play. But I feel I got lucky in this situation. My weak skill happened to be in a domain amenable to deliberate practice. It was easy to accidentally stumble into an assessment which revealed the problem. And once the problem was identified, it was easy to make rapid progress. But my weakness could have been hiding instead in a much less well-defined domain, one without properties so friendly to deliberate practice.
When I told this story to Rob Ochshorn, he asked: are there other situations like this lurking in my life? Are there other weak skills, like sight reading, which have caused similarly harmful feedback loops? Skills which might feel rewarding in the same way to practice at an embarrassingly simplified level?
A design parable
I realized in that conversation that another much more important skill has fallen into the same spiral for me: the visual practices of user interface design. Just as my sight reading fell behind because I focused on learning pieces “at my learning edge”, this design skill never got a chance to grow because my design projects have always focused on conceptually difficult, and often novel, interaction designs.
Many young designers hone their skills by composing iteration after iteration of layouts in conceptually “simple” UIs—a sign-up screen, a list of search results, a news feed. With ample (if perhaps mundane) experience, they gain a deep intimacy with common patterns which allows them to do something like “sight reading” with a new interface: to converge spontaneously and in near-real-time to high-quality layouts.
But I came to design sideways, as an engineer, so my Apple projects were unusual concepts: iOS’s 3D page curl, novel multi-touch gesture interactions, physics-based UI animations, the gyroscope-driven 3D parallax effects, etc. At Khan Academy, I worked on designs like interactive number block manipulatives, an illustrated math “platformer” game, and a semi-synchronous peer learning environment. These projects were all quite difficult conceptually, so each one took many months. My collaborators and I would spend some of that time on the visual elements of the interface designs, sure, but each required us to focus mostly on challenging conceptual issues. This situation parallels “at-level” piano pieces, which took months of focus on technique and interpretation, but whose scores I’d no longer need to read after a few sessions. I’ve spent many years working as a designer, but I’ve laid out only a handful of interfaces—just as I’d spent many years learning advanced piano repertoire, only reading a handful of pages each year.
I can see now that my weak visual skills for interface design have created a feeling of scarcity similar to the one I felt at the piano. Interface ideas take me a long time to refine, so I feel like I need to choose projects carefully—I’ll only get to flesh out a handful each year, just as I’d only get to choose a few piano pieces to play each year. As my career has progressed, I’ve taken on more and more challenging design projects, which has generally meant that I design fewer and fewer new interfaces in a given year. But I’ve been (unintentionally) relying on implicit practice to develop my visual skills for interface design, and so I’ve been caught in a cycle: my slow visual design skills lead to fewer opportunities for implicit practice, which in turn leaves those skills further and further behind my “learning edge.”
I escaped this cycle in piano with deliberate practice. That’s trickier to arrange in design: the skills aren’t as well characterized; assessment is much more challenging; we don’t have strong practice methods. But I’ve had some promising experiences by constructing explicit practice routines for myself. I brainstormed a big list of software which I wish existed. Then I chose a few examples which I felt required no unusual representations, no unusual conceptual or interaction models. These examples could just use the standard platform controls, in standard layouts. Then I designed visual layouts mocking up these apps.
The experience felt much like playing “beginner” arrangements of Disney music. On the one hand, the exercise felt sort of “beneath me”: shallow, hyper-simplified. Not something I’d want to share with others. But on the other hand, I felt the same exhilarating taste of fluency and spontaneity. Not all interfaces must take months to design—look, I can come up with a software idea and design an interface for it on the spot! What freedom.
I can feel clearly that this skill is much more difficult to develop than sight reading. It’s harder to assess my own work; it’s less clear what I should work on next, or how to fix problems. But I’m excited at the progress, and excited to continue explicit practice in this vein.
With these two stories under my belt, I’ve experienced the limitations of implicit practice quite viscerally. The most important lesson for me has been that what’s hard about developing these skills is not figuring out how to practice or generating the right kind of feedback, but rather identifying the skills which must be improved in the first place. I’m now on the lookout for other skills I’ve neglected which have followed a similar pattern. I imagine there are other important patterns of atrophied skills which I’ve not yet identified—I’ll be searching for those, too.
[1] See his book Peak (2016) with Robert Pool, p. 98.
[2] Ericsson et al. (1993). The role of deliberate practice in the acquisition of expert performance. See page 368.
2022-04-01 05:24:22 +0000 UTC
Just sending out this reminder of office hours for March; note that the first one is tomorrow (Google Calendar link; iCal URL):
- Thursday, March 10th, 9:30 AM PST (meet)
- Monday, March 28th, 1 PM PDT (meet)
Recapitulating some information about the rough structure:
- At least for now, there are no reservations. Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
2022-03-10 03:22:22 +0000 UTC
Now equipped with several years of data and the results of several controlled experiments, I’ve spent the last few months making sense of what’s happening on Quantum Country. I’d like to discuss some of what I’ve found along the way, and some of what still confuses me.
Please note: this is an informal discussion of data from Quantum Country. The analysis is preliminary and shouldn’t be cited or excerpted in other work. I’m working with the garage door up here. There’s no audio recording for this one because it’s quite dependent on data visualization.
An update on the exponential
The most important and surprising claim of spaced repetition memory systems is that they can offer exponential returns (in memory stability) on your time. Take a moment to consider just how unusual this is! If you read or jog for two hours a week and suddenly double that to four hours a week, you’re likely to get less than double the benefit. Most activities have diminishing returns at everyday scales. But at least in many cases, it seems an extra few minutes of spaced repetition practice can double how long you’ll reliably remember the material.
In 2019, Michael Nielsen and I published some preliminary results displaying this exponential in Quantum Country, but we have a stronger picture now, with two years of additional data and experiments. I’ll focus on the first essay here, since we have the most data for that.
Here’s a bird’s-eye view of the cost-benefit trade, aggregating much of the detail (clipping a handful of outliers which would uncomfortably shrink the figure).

Very informally, after half an hour of practice most readers can remember the answers to almost all of the essay’s 112 questions across intervals of at least 2 weeks; after an hour, at least 5 weeks; after 1.5 hours, at least 9 weeks. Notice the exponential growth in retention vs. practice time.
To unpack the graph a tad, each point represents a snapshot of a reader after a particular repetition. On the vertical axis is an aggregate measure of their memory stability: the interval over which they’ve successfully recalled 90%+ of questions. On the horizontal axis is the total time they’d practiced through that repetition. The “bold” points in the foreground represent the medians of each repetition; the dashed line is an exponential fit.
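The vertical-axis metric can be made concrete with a small sketch. It assumes (for illustration only) that each question has a per-question "demonstrated retention": the longest interval across which the reader has recalled it so far. The aggregate is then the largest interval that at least 90% of questions have reached. The function name and the example numbers are hypothetical.

```python
import math
from fractions import Fraction


def aggregate_demonstrated_retention(per_question_days, threshold=0.9):
    """Largest interval (in days) that at least `threshold` of the questions have reached.

    per_question_days: for each question, the longest interval across which the reader
    has successfully recalled it so far (an assumed definition, for illustration).
    """
    if not per_question_days:
        raise ValueError("need at least one question")
    # Exact arithmetic avoids float surprises when threshold * n is near an integer.
    needed = max(1, math.ceil(Fraction(str(threshold)) * len(per_question_days)))
    ranked = sorted(per_question_days, reverse=True)
    # The needed-th largest value has been reached by at least `needed` questions.
    return ranked[needed - 1]


# Hypothetical reader with ten questions: eight at four months, two stragglers.
example = [120] * 8 + [30, 7]
print(aggregate_demonstrated_retention(example))  # 30
```

Note how a couple of straggler questions can pull the aggregate far below where most of the reader's questions sit.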
Of course, as you can see, there’s a great deal of dispersion in the data. Much of that is an artifact of details of the scheduler, which I’ll not discuss now. But I don’t think this changes the overall story. Here I’ve highlighted the 25th and 75th %ile results. For the top quartile, three months of retention can be had for an hour of practice. The 25th %ile grows more slowly but still makes steady progress, reaching a month of demonstrated retention after an hour and a half, still under 50% beyond the initial reading time.

It’s interesting also to note the negative correlation between practice time and retention within each repetition. “Slower” readers tend to have more trouble remembering. That’s one of many signals we could use in future attempts to optimize the scheduler.
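To illustrate what "exponential growth in retention vs. practice time" means, here is a minimal sketch that fits retention = a·exp(b·hours) to only the three rounded points quoted earlier (half an hour to 2 weeks, an hour to 5 weeks, 1.5 hours to 9 weeks). This is not how the dashed line in the figure was produced; that fit used the full data, with a method not described here. With these three points, the fit implies retention roughly doubles for each additional half hour or so of practice.

```python
import math

# Rounded (practice hours, demonstrated retention in weeks) pairs quoted in the text.
POINTS = [(0.5, 2.0), (1.0, 5.0), (1.5, 9.0)]


def fit_exponential(points):
    """Least-squares fit of retention = a * exp(b * hours), done linearly on log(retention)."""
    xs = [x for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


a, b = fit_exponential(POINTS)
# Extra practice time needed to double demonstrated retention under this fit.
doubling_hours = math.log(2) / b
print(f"retention ~ {a:.2f} weeks * exp({b:.2f} * hours); doubles every {doubling_hours:.2f} hours")
```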
Problems with demonstrated retention as a metric
The biggest problem with this representation is that what we’re really seeing here is an approximate lower bound of a reader’s true retention for a question. What we’d really like to know is the maximum interval a reader could wait and still have (say) a 90% chance of remembering. I’ll call that the “stable retention interval.” But we can’t measure that directly. Instead, the way we measure their retention after five repetitions is that we look at the sixth repetition’s time and result. And that measurement is entangled with our schedule.
In many cases, if the sixth repetition were scheduled later, the reader would still remember, which in turn would produce a larger demonstrated retention value. On the other hand, if a reader fails to remember after 40 days, they might have succeeded after 30… but we didn’t ask then, and so we can only report the last successful interval we attempted (say, 20 days). In this case, too, we under-report.
Because of this sampling limitation, the decisive exponential curves we see in the plot above are mostly a result of our scheduler, which is itself exponential (doubling and halving intervals according to success and failure). The stable retention intervals after five repetitions are likely much higher; if we could measure them, we’d see an even more dramatic curve. But as we’ll see later in this post, the true intervals for the first few repetitions are certainly much higher in most cases; if we could measure them directly, the exponential curve would likely flatten.
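The under-reporting is easy to reproduce in a toy model. This sketch pits a doubling/halving scheduler against an idealized, hypothetical reader who recalls a question whenever the interval is at most some fixed "true" stable retention interval. That stability never grows here, which is a deliberate oversimplification, and the 5-day first interval is an assumed parameter rather than Quantum Country's actual setting.

```python
def demonstrated_retention(true_stable_days, first_interval=5, repetitions=5):
    """Longest interval actually attempted and recalled under a doubling/halving schedule.

    The hypothetical reader recalls whenever interval <= true_stable_days; real memory
    would strengthen with each repetition, which this sketch ignores.
    """
    interval = first_interval
    demonstrated = 0
    for _ in range(repetitions):
        if interval <= true_stable_days:
            demonstrated = max(demonstrated, interval)
            interval *= 2  # success: double the next interval
        else:
            interval = max(1, interval // 2)  # failure: halve it
    return demonstrated


for true_days in (30, 100, 1000):
    print(true_days, "->", demonstrated_retention(true_days))
```

A reader whose true limit is 30 days fails at 40 and shows only 20, matching the example in the text. Readers with 100-day and 1000-day limits look identical at 80 days, because the schedule simply hasn't asked about longer intervals yet.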
The plots I’ve shown so far aggregate across all the questions in the essay by defining a reader’s demonstrated retention as the interval that 90%+ of questions have reached. But there’s a lot of interesting detail to be seen at the next level down the ladder of abstraction.
Slicing by reader
For instance, readers near the bottom of this graph are doing better than it might appear. They can remember the vast majority of questions across long intervals—it’s just a handful of straggler questions left behind.
Here’s the 10th %ile user’s distribution of demonstrated retention by question.

The vast majority of questions are being retained across four months or more. There’s just a handful they’re having trouble pushing over a month. And with one more repetition, a big chunk of those will continue to shift upwards; here’s this same user’s next repetition:

Slicing by question
These “straggler questions” are pretty consistent across readers. Just 5 questions account for more than a quarter of the instances in which a reader’s demonstrated retention of a question remains under one month after five repetitions. 13 questions account for half of such instances. From there, it’s mostly scraps spread across the remaining 99 questions.
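A concentration figure like "5 questions account for more than a quarter of the instances" can be computed by greedily taking the most frequent questions first. In this sketch, the input format (one question ID per straggler instance) and the example IDs are hypothetical.

```python
from collections import Counter


def questions_needed_for_share(straggler_question_ids, share):
    """Fewest questions that together account for at least `share` of straggler instances.

    straggler_question_ids: one entry per (reader, question) instance whose demonstrated
    retention is still under one month after five repetitions.
    """
    counts = Counter(straggler_question_ids)
    total = sum(counts.values())
    covered = 0
    # Take the most frequent questions first.
    for k, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= share * total:
            return k
    return len(counts)


# Hypothetical instances, heavily concentrated on question "a".
example_ids = ["a"] * 10 + ["b"] * 5 + ["c"] * 3 + ["d", "e"]
for share in (0.25, 0.5, 0.75, 1.0):
    print(share, questions_needed_for_share(example_ids, share))
```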
You can see that trend more clearly in this plot, which shows the demonstrated retention for every question after five repetitions. Each “column” here is a boxplot for one question, with each reader a sample.

About three quarters of questions have their 25th %ile above 4 months, and most of the rest sit well above 2 months—which is excellent!—though the wider dispersion suggests a clear opportunity for better scheduling. But those bottom few questions seem like they would clearly benefit from some additional support. In that bottom group are mostly difficult rote memory questions: “What’s the matrix representation of the X/Y/Z/Hadamard gate?”, “What are three common names for the dagger operation?”, “Who has made progress on using quantum computers to simulate quantum field theory?”. There’s one notably more conceptual question in this group: “What’s an example circuit where the target state input to a CNOT gate is left unchanged by the gate, but the control state changes?” And then a few on key details of Dirac notation and matrix unitarity.
This mode of analysis can point us towards the problems, but it’s not obvious what’s really going on with these troublesome questions. For that, we need to dig into the dynamics of forgetting.
The counterfactual: forgetting without practice
For the mnemonic medium to truly be transformative, (at least) these four things must be true:
- Memory: If you practice as suggested, you’ll durably remember ~everything.
- Cost: It won’t take that much time.
- Counterfactual: If you don’t practice, you’ll forget.
- Transfer: And this durable recall transfers into real understanding and ability outside of artificial practice sessions.
The previous section sketched the state of the first two points. I’ll confess right now that I can’t yet say much about transfer[1]. But I’ve learned a lot about the counterfactual third point in these past few months.
Most of the data you’ll see in this section comes from an experiment I ran in 2021 which randomly assigns new readers different schedules. For instance, the initial review intervals vary between one week, two weeks, one month, and two months. This variation should let us see what happens if you don’t review.
But as I’ve previously discussed, the picture’s not so clear. Here are the recall rates, averaged across all questions and readers:

Really? A 13pp difference between 1 week and 2 months? It’s hard to believe! I discussed a number of interpretations and subanalyses in my previous article, but after another hundred hours on this question, I think the high-order bit is: many questions are easy; many readers are highly proficient; and so you need to look more closely to see dramatic forgetting curves.
My summary impression now from these data: without practice, you’ll likely forget answers to the “harder” questions; if you struggled with the material, you’ll forget many of the rest too. But you’ll probably remember answers to “easy” questions for a month or two without much support.
Make-up sessions
One of the more dramatic forgetting curves can be seen in what I call “make-up” sessions. In this newest schedule, if you forget an answer while reading the essay, we’ll prompt you to review it again one day later—a “make-up” session for extra reinforcement. If you recall the answer at that session, then we’ll schedule you for the first longer interval. (So samples used in the forgetting curve above are for the first repetition after these make-up sessions—i.e. after the reader’s demonstrated they can remember for one day.)
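The make-up rule can be sketched as a tiny scheduling function. The function name, and the choice to count the first longer interval from the day recall was demonstrated, are assumptions for illustration. The text doesn't say what happens when a make-up session is also forgotten; this sketch simply schedules another make-up the next day.

```python
from datetime import date, timedelta

MAKE_UP_DELAY = timedelta(days=1)


def next_review_date(attempt_date, recalled, first_interval):
    """Next scheduled review after an in-essay attempt or a make-up session.

    Forgotten: a make-up session one day later.
    Recalled: the condition's first longer interval (1 week to 2 months in the
    experiment), counted from the day recall was demonstrated (an assumption).
    """
    if recalled:
        return attempt_date + first_interval
    return attempt_date + MAKE_UP_DELAY


# Hypothetical reader in the two-week condition who forgets a question while reading.
first_interval = timedelta(weeks=2)
make_up = next_review_date(date(2022, 1, 1), recalled=False, first_interval=first_interval)
first_delayed = next_review_date(make_up, recalled=True, first_interval=first_interval)
print(make_up, first_delayed)  # 2022-01-02 2022-01-16
```

These are only the scheduled dates; as the next paragraph notes, readers often review later than assigned.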
But people don’t always do the review right when it’s assigned. Emails can sit around for a few days. So we can see how much forgetting happens by examining recall rates at the actual time of review.

The red line is the forgetting curve for the readers we’ve been discussing, who were assigned a “make-up” session after 1 day. The blue line tracks a different pool of users using an older schedule, which initially assigned forgotten questions after 5 days. So recall drops from ~85% to ~55% over three weeks.
The blue line also addresses a concern I had about doing this type of analysis: selection bias. Aren’t “late” students less conscientious, less serious, less likely to remember? The close overlap between the red and blue lines suggests that this effect isn’t very substantial after all: tardy readers perform about the same as dutiful readers; the interval dominates.
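A forgetting curve like this one comes from grouping reviews by how much time actually elapsed before the reader showed up, not by the scheduled interval. Here is a minimal sketch; the `(days_elapsed, recalled)` input format, the weekly bucket width, and the sample data are hypothetical choices.

```python
from collections import defaultdict


def forgetting_curve(reviews, bucket_days=7):
    """Recall rate by actual elapsed time.

    reviews: iterable of (days_elapsed_at_actual_review, recalled) pairs.
    Returns {bucket_start_day: recall_rate}, ordered by bucket.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for days, recalled in reviews:
        bucket = (days // bucket_days) * bucket_days
        totals[bucket] += 1
        hits[bucket] += int(recalled)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical make-up reviews: most on time, a few done a week or more late.
sample = [(1, True), (1, True), (2, False), (8, True), (9, False), (15, True), (20, False)]
print(forgetting_curve(sample))
```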
These initially-forgotten questions also have much steeper curves in the next session, the first one scheduled across a longer interval which varies between the experimental groups.

Most readers forget about a dozen questions while reading the essay. For these questions at least, the counterfactual seems clear: without multiple rounds of practice, you’re quite likely to forget in the subsequent weeks.
There’s an interesting compounding effect here, too, which makes it harder to recover from forgetting. If you forget an answer at this first delayed repetition, you’ll get a “make-up” session one day later, just as would happen if you forgot in-essay. Readers perform worse in these make-up sessions when they forget after longer intervals. When the initial interval is 2 months, readers will forget in both the delayed repetition and the following make-up session 16% of the time, compared to just 2% of the time for readers in the 1-week condition.
Slicing by question
“Difficult” questions have steeper forgetting curves, too. Here, by “difficult”, I mean “had a low in-essay recall rate”. This turns out to correlate moderately well (r=0.65) with recall rate in the first repetition—actually, better than difficulty parameters fit through an item response theory-based model.

Here’s the forgetting curve for the first repetition, sliced by question quartile. The top line represents the “easiest” questions and the bottom line the “hardest” ones (i.e. highest and lowest in-essay recall rates).

So without practice, you’re pretty likely to forget “hard” questions after two months, and you’ll drop to C-level performance on the middle quartiles. The easiest questions might be fine!
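Slicing by question quartile means first labeling each question by its in-essay recall rate; each curve is then, presumably, first-repetition recall averaged within one quartile for each initial interval. A sketch of the labeling step, using `statistics.quantiles` for the cut points; the question IDs and rates are hypothetical.

```python
import statistics
from bisect import bisect_right
from collections import Counter


def difficulty_quartiles(in_essay_recall_by_question):
    """Label each question 1 (hardest: lowest in-essay recall) through 4 (easiest)."""
    cuts = statistics.quantiles(in_essay_recall_by_question.values(), n=4)
    return {
        question: bisect_right(cuts, rate) + 1
        for question, rate in in_essay_recall_by_question.items()
    }


# Hypothetical in-essay recall rates for eight questions.
rates = {f"q{i}": r for i, r in enumerate([0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95], start=1)}
labels = difficulty_quartiles(rates)
print(labels)
print(Counter(labels.values()))
```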
You can see the shape of the penalty somewhat more clearly in this figure, which shows the recall rates for every question (ranked low to high) for each initial interval. Vertical grid lines indicate question deciles.

The gap is quite narrow for the easiest ~third of questions, then widens until the bottom decile or so before converging again. (You can see also that we have fewer samples for the two months interval—another interesting result, but I’ll skip that discussion for now.)
A natural next angle would be to slice by reader quartile, but we don’t have enough samples for that. Still, we can get a pretty good sense for this effect by noticing that a) struggling readers will forget more questions in-essay; and b) questions forgotten in-essay have steep forgetting curves, as we discussed in the previous section.
The value of practice
In this experiment, readers who begin with a 1-week interval follow that with a 3-week interval. This lets us evaluate the following rough situation: say that you’re going to need some knowledge one month from now. If you review that material one week from now, then wait three weeks (for a total of about a month), how will your performance compare to someone who didn’t review at all over that period?

Unfortunately, I didn’t set this up to be able to make a totally fair comparison: if a reader forgets a question at the 7 day mark, they’ll try again at 7 days before attempting the 21 day interval. So some of these questions got a little extra practice. And there’s also probably a meaningful selection effect in the pool of readers who stuck around for the second repetition here. I’ve tried to control for that in the other comparisons I’ve shared in this post, but I don’t have enough data to do that here.
But still, the take-home message is clear: more practice probably makes a much bigger difference than scheduling differences. To a first approximation, and with some notable exceptions we’ve discussed, the schedule appears not to matter that much. Repetition is what matters.
But we have to think of repetition in terms of cost and benefit. For instance, if you look at the “easiest” quartile of questions in the graph above, the extra round of practice doesn’t seem to matter much: it’s a bump from 95% to 100%! The cost may be greater than the benefit. Arguably this is true for the next quartile, too. Of course, we wouldn’t want to take this too far. Recall rates don’t tell the whole story. The extra repetition probably reinforces understanding in small ways we’re not seeing here. More subtly, it reinforces the emotional connection to the material. I think the decay of that connection is part of why fewer of the readers in the 2-month initial review condition stick around.
Ideally, we’d like to be able to compare schedule A and schedule B, to plot an efficient frontier. If I’m interested in putting in X minutes, what’s the best performance I can get? Or if I’m interested in a particular stable retention interval, what’s the lowest-cost schedule to get there?
There are many papers on this subject, of course, but all involve constructing predictive models for forgetting curves, and I’ve been idiosyncratically trying to avoid that, to focus on eye-visible patterns in the data. But I’ve not succeeded. If I want to push this cost/benefit topic further, I expect I’ll need to build some models.
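Once some model supplies an estimated (cost, stable retention) pair for each candidate schedule, both questions above become simple queries over an efficient frontier. This sketch assumes those estimates already exist; the schedule names and numbers are hypothetical, and producing them is exactly the modeling work that remains to be done.

```python
def efficient_frontier(schedules):
    """Schedules not dominated by one that costs no more and retains at least as long.

    schedules: list of (name, cost_minutes, stable_retention_days) tuples.
    Returned in order of increasing cost (and therefore increasing retention).
    """
    frontier = []
    best_retention = float("-inf")
    for name, cost, retention in sorted(schedules, key=lambda s: (s[1], -s[2])):
        if retention > best_retention:
            frontier.append((name, cost, retention))
            best_retention = retention
    return frontier


def best_within_budget(schedules, budget_minutes):
    """Longest retention achievable for at most `budget_minutes` of practice."""
    affordable = [s for s in efficient_frontier(schedules) if s[1] <= budget_minutes]
    return affordable[-1] if affordable else None


def cheapest_for_target(schedules, target_days):
    """Lowest-cost schedule whose estimated retention reaches `target_days`."""
    for s in efficient_frontier(schedules):
        if s[2] >= target_days:
            return s
    return None


# Hypothetical estimates: (name, minutes of practice, stable retention in days).
candidates = [("A", 30, 14), ("B", 45, 10), ("C", 60, 35), ("D", 60, 30), ("E", 90, 60)]
print(efficient_frontier(candidates))
print(best_within_budget(candidates, 75), cheapest_for_target(candidates, 40))
```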
Forgetting without in-essay prompts
So far, we’ve discussed the counterfactual of what would happen if you read Quantum Country as it exists today and then didn’t practice for some period of time. But it’s also worth asking: what would happen if you just read Quantum Country’s text—without any of the embedded prompts, and also without practice?
We ran an experiment in 2020 which we can (scrappily) combine with the experimental data above to estimate this. In the 2020 experiment, we hid one set of 9 questions from some readers’ essays, then surreptitiously re-inserted those questions into a review session one month later.
Happily, these experimental cards span all four quartiles of “question difficulty”, as we discussed earlier. I’m running out of steam for making plots, so I’ll just compare the “easiest” and “hardest” of these questions to illustrate the range of results. Those questions, respectively, are “|ψ> is an example of a…?” and “How can you write the jkth component of the matrix M, in terms of the Dirac notation and the unit vectors |e_j>?”.
- “Hard” questions need support to be reliably recalled at a month:
  - Without in-essay prompts or practice, one month later: 42%
  - In-essay practice and make-up sessions, then one month later: 71%
  - In-essay practice and make-up sessions, practice at 1 week (possibly with make-up sessions), then 3 weeks later: 90%
- “Easy” questions need less support:
  - Without in-essay prompts or practice, one month later: 89%
  - In-essay practice and make-up sessions, then one month later: 91%
  - In-essay practice and make-up sessions, practice at 1 week (possibly with make-up sessions), then 3 weeks later: 100%
But the situation also varies by reader.
- For readers whose in-essay recall rates were in the bottom quartile:
  - The “hard” question figures are: 23%; 62%; 75%
  - The “easy” question figures are: 79%; 93%; 100%
- For readers whose in-essay recall rates were in the top quartile:
  - The “hard” question figures are: 56%; 67%; 100%
  - The “easy” question figures are: 97%; 87% (?); 100%
This data makes the counterfactual claims somewhat starker. Without support of any kind, difficult details are very likely to be forgotten. And for readers who struggled with the material, even “easy” questions need at least in-essay prompts to maintain reliable recall.
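The cost/benefit framing above can be applied to these figures by computing the percentage-point gain from each added layer of support. The rates below are copied from the lists above. As noted, these conditions combine two different experiments, so the differences are rough. The negative value for top-quartile readers on the "easy" question reflects the 87% figure already marked with a question mark.

```python
# Recall rates (%) from the lists above, in order: no prompts or practice;
# in-essay practice and make-up sessions; plus practice at 1 week.
RESULTS = {
    ("all readers", "hard"): [42, 71, 90],
    ("all readers", "easy"): [89, 91, 100],
    ("bottom quartile", "hard"): [23, 62, 75],
    ("bottom quartile", "easy"): [79, 93, 100],
    ("top quartile", "hard"): [56, 67, 100],
    ("top quartile", "easy"): [97, 87, 100],
}


def marginal_gains(rates):
    """Percentage-point change contributed by each added layer of support."""
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


for (readers, question), rates in RESULTS.items():
    print(f"{readers:16} {question}: {marginal_gains(rates)}")
```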
————————
Thanks to Gary Bernhardt, Michael Nielsen, and Giacomo Randozzo for helpful discussions on these topics.
[1] Of course there are many prior experiments on this question, mostly in a laboratory setting; see e.g. Butler (2010) for a review.
2022-03-01 06:08:27 +0000 UTC
Continuing the office hours experiment in February and March (Google Calendar link; iCal URL):
- Monday, February 14th, 8 AM PST (meet)
- Friday, February 25th, 1 PM PST (meet)
- Thursday, March 10th, 9:30 AM PST (meet)
- Monday, March 28th, 1 PM PDT (meet)
My apologies that the dates/times are somewhat irregular—travel in Feb/Mar is pushing things around a bit.
I also wanted to ask: how might these office hours be improved? My sense is that they're fun, but not obviously all that actually helpful. Maybe one issue is that we really can't discuss anything deeply enough in 10-15 minutes. Maybe these sessions should be more structured? I'd be curious to hear your thoughts.
2022-02-08 19:46:03 +0000 UTC
URL for the "official" public version of this post: https://andymatuschak.org/2021
The boundary of a year is arbitrary, sure. But that doesn’t stop rituals from yielding wisdom and warmth. I begin each year by reflecting on what I’ve learned in the previous one. In the spirit of last year’s essay, I’d like to share a few lessons which others might find meaningful.
Suffering and creative work
Writing a book is a horrible, exhausting struggle, like a long bout with some painful illness. One would never undertake such a thing if one were not driven on by some demon whom one can neither resist nor understand.
—George Orwell, Why I Write
When you’re thinking about something that you don’t understand, you have a terrible, uncomfortable feeling called confusion. It’s a very difficult and unhappy business. And so most of the time you’re rather unhappy, actually, with this confusion. You can’t penetrate this thing. Now, is the confusion’s because we’re all some kind of apes that are kind of stupid working against this, trying to figure out [how] to put the two sticks together to reach the banana and we can’t quite make it, the idea? And I get this feeling all the time that I’m an ape trying to put two sticks together, so I always feel stupid. Once in a while, though, the sticks go together on me and I reach the banana.
—Richard Feynman, 1963 interview
Writers and artists of all kinds share a pervasive trope with scientists: creative work is immensely satisfying, but the moment-to-moment experience of producing it can be extraordinarily unpleasant. You’re lost in a blizzard, searching for something just outside your grasp, constantly feeling stupid and inadequate. The day ends—and look at the scraps you have to show for it. Days turn into weeks; little of the work seems usable. Yet something tantalizes. So you chain yourself to the desk. You press on against the freezing wind. And when you finally reach the finish line, a transcendent creative joy redeems all that suffering.
I don’t think it has to be this way.
Until this year, I hadn’t taken seriously the possibility of escaping what Orwell called a “horrible, exhausting struggle”. The trope is too ubiquitous. I’m sure some creatives have already engineered their getaway, but at least among my writer and researcher friends, conversation regularly turns to commiseration over this kind of pervasive pain. I think we take the suffering for granted much too readily. I think we can develop a relationship to creative work in which the doing in each moment is joyful, irrespective of the outcome.
I don’t want to get your hopes up. This isn’t going to be a complete guide to that kind of enlightenment. I certainly don’t have it all figured out myself. But at least in my experience, the high-order bit has been to stop taking this suffering as a given, and to interrogate it instead.
Why is the feeling of confusion uncomfortable? Why specifically do I feel pain when I spend all day poking at a problem without feeling any real progress? What am I worried would happen if I just let myself fall into the problem’s contours, to let it take as long as it takes, whether or not I find an answer? For me, when I dig far enough, the answers usually bottom out in social anxieties of various kinds. Fear of not achieving “enough”, of appearing incapable, foolish, slow, unoriginal; these feelings arise in turn from deeper fears of not “belonging”, not being accepted by others.
You may have different answers. Some friends have found that their creative suffering stems from a sort of self-flagellation: they’ve convinced themselves that creative success is just a matter of will, and so if they fail to make progress, that’s a sign of their own essential weakness as an individual. In this scheme, creative troubles produce self-loathing rather than compassion and curiosity. I have another friend whose creative suffering stems in part from a hyperactive fear of running out of money, following some family financial traumas.
At least for me, and for the friends I’ve discussed, these fears arise from deeply internalized—but false—beliefs. It’s worth identifying them and rooting them out. I can’t prescribe a solution here, just the journey. It is possible to rewire your emotional system so that the creative experience does not cause suffering. Consult your local therapist, meditation guide, executive coach, psychedelic dealer, etc., or several in combination.
Abating creative suffering has been one part of the solution for me; the second is its mirror: cultivating creative joy. It’s possible to draw much more satisfaction from the moment-to-moment experiences of creative work. I found this tough to do initially because I’d trained myself to draw satisfaction in my work from outputs, achievements, and others’ approval. In that scheme, when I face a setback, or I spend a day exploring without making apparent progress, that means postponing gratification. Writing workshops, art classes, and research memoirs ritualistically emphasize the benefits of focusing on “process over product.” I’d only understood that in a purely practical sense: when you focus on product in creative work, you’ll often produce worse work. But the adage also applies to the emotional experience. The outcome can’t be the daily reward. It’s too far away, too uncertain. That’s a brittle source of satisfaction. The process has to feel rewarding, too.
Happily, I’ve found that the moment-to-moment experience of my work is quite rewarding once I get out of my own way: the pleasure of following a trail of curiosity; the serendipity of making little connections; the surprise of noticing I’m confused in an unexpected way; the satisfaction of choosing a better word to sharpen my understanding. It’s surprisingly engrossing to watch the gears of my own mind turn.
Advice about “productivity” and “motivation” often rests on an adversarial foundation. The notional goal is to focus better, work harder, achieve more. And the obstacle is you. Your will is just too weak. If you were only more organized, less lazy—you’d reach the stars! Remove distractions; bind yourself to the mast; keep up your streak; set measurable goals; up and to the right. Such techniques tend to assume that the work is a bitter pill to be swallowed, that your impulses are distortions to be overridden. The focus is on enduring the present in service of some distant future. I’m not going to pretend these techniques aren’t useful. We really do have monkey minds which benefit from taming. We really do inappropriately discount long-term rewards. But the adversarial framing is missing something important. Too often “productivity” and “motivation” techniques treat a symptom without addressing its cause. I don’t want to train myself to shout over my impulses, especially when my work depends on following creative instinct. I’d rather cultivate a healthier relationship to the sensations of doing the work, in the moment, so that my impulses will naturally serve me well.
Don’t let me give you the wrong impression. My work is not some kind of daily enlightened bliss. But I’ve made surprisingly rapid progress here this year, and I sense that much more is possible. The trope of the anguished creative seems to be a tragic, self-induced mirage. This illusion can be overcome, and that tractability may be the most important thing I learned this year.
Cultivating a better “tools for thought” scene
One key reason I work independently is that my work, “tools for thought”, doesn’t really have a natural home in an academic field or industry niche. But tools for thought do have a scene, of sorts.
The good news here is that the scene looks enormously better now than it did five years ago. I see much more ambient interest in novel and enabling user interfaces now than I’ve ever seen before. We have more part-time tinkerers; and even more importantly, we have a larger stable of serious people publishing exemplary work.
But it’s still an awfully anemic scene. There are many cheerleaders but very few full-timers producing very few powerful ideas. My instinct is that there are many more transformative ideas hiding just out of reach. It’s hard not to feel that we could move much faster, do much more.
So what’s holding us back? Here are a few bottlenecks I can see.
Money. It’s the most obvious constraint. Few people have the funding needed to support this kind of work even temporarily, much less sustainably. What models might apply?
- Projects like Jupyter and Scratch sustain themselves with grants from philanthropic foundations and corporations, but it seems to me that these grants apply mostly to projects which have already spent years traversing the difficult stages of creative conception.
- In theory, national grant-making agencies should be well-suited to funding those exploratory early stages, but in practice, ambitious systems work is discouraged in the corresponding academic field (human-computer interaction).
- I’ve had some modest success with crowdfunding, but I worry it may not replicate well: others who have attempted this path seem not to have had as much luck, at least so far. I’ll give an update on my crowdfunding experiment later in this essay.
- Small grants from low-overhead programs like Emergent Ventures are a wonderful development, but I fear it would be difficult to cobble together enough of these to make substantive progress on a research direction. And then what? Perhaps small grants like these could bridge people to larger philanthropic funding models. More experimentation here would be good.
- Though modern venture funding is generally a poor fit (a startup’s fundamental drive is growth), perhaps a less steroidal business model would work? Mathematica and VisiCalc and Photoshop are all success stories here, in various senses. The main concern I have is: how much ongoing fundamental exploration can you get done once you turn these things into a business? This might be a way of capturing value from an initial discovery phase, but it’s not obviously a good way of funding an ongoing effort of invention. Mathematica has fared moderately well in this respect, but its success seems quite rare.
- I feel better about the approach Ink & Switch is bravely attempting with Muse. Theirs is a translational model in which spin-out projects might fund a separate “parent” lab’s open-ended explorations. It’s too early to judge this approach, but I’m excited to learn from their experiences.
The intense quantity of money suffusing the tech sector slices both ways. If you can make significant progress on a “tools for thought” research project, you can probably also earn many hundreds of thousands of dollars per year as an employee, or raise a seed round on excellent terms. “Vanilla” tech aside, there are AI and crypto gold rushes going on, and those fields have plenty of interesting problems; why not participate? On the other hand, this cash-flush environment means that a huge number of people can readily accumulate the capital to work on whatever they’d like. There could be so many more “gentle[wo]man scholars” coming out of the tech industry. Of course this is not an ideal choice for funding research, but I do think it’s one of the more promising paths in the short term. I’d like to encourage more people to consider this option. If you’re excited about pivoting into original research, I encourage you to take the rivers of cash seriously. Do the projections yourself. The details vary enormously with your circumstances and lifestyle, but you may be able to buy your creative freedom for a few years’ labor.
Skills. We’re overweight on engineering and underweight on design and theory-building.
One problem seems to be that if you’re an engineer, you can actually build and iterate on a system yourself. Your system may not contain any interesting interface ideas, but you can certainly make things! By contrast, if you’re a designer or synthesist without engineering skills, you can sketch novel concepts for the future of computing. The trouble is that you can’t iterate on an interactive system very far without actually interacting with it. Design and theory-building are necessary but not sufficient.
So to make progress in this space, we need either individual engineer-designer-theorist hybrids, or teams of people with complementary skills. Both situations are somewhat rare. It’s tough for a single person to build strong skills in both engineering and design, because they’re both deep fields, and design in particular usually requires apprenticeship. Teams are rare because finding someone willing to work on weird, unprofitable research projects is hard enough; finding two people willing to simultaneously work on the same one involves multiplying low probabilities.
That said, I believe it’s possible to jump-start more engineer–designer dyads. My sense is that there are many engineers who are very interested in this problem space, but who harbor no delusions about doing the design or theory work themselves. If we could arrange small grants for thoughtful designers, perhaps they could develop concepts far enough that we could matchmake an eager technologist to partner with them for prototyping and iteration.
Part of the trouble here is cultural. Representative framings like “augmenting human cognition” easily attract many engineers but sound awfully Spockian to many designers I know. The situation improves if we start talking about transformative environments for creativity or expression or consciousness. Another problem, as Joe Edelman has pointed out to me, is that these various augmentations are most often discussed in terms of the relationship between an individual and their computer, rather than the relationships between people, which might involve computers. Individualistic framings often resonate poorly with design’s collectivist cultural leanings. But there’s plenty of opportunity in augmenting collective intelligence and creativity. Projects along those lines may attract a broader coalition.
Development. There aren’t enough pathways for legitimate peripheral participation, mentorship, and skill-building. This is most obviously true and perhaps most pressing for people just beginning their journey, but I feel it myself too. I think the best way to work on this problem is by funding grad-student-like apprenticeships, but these will only make sense once we improve the broader funding sustainability problems I’ve described above. Until then, we’ll be widening the top of a pipeline with gaping holes in its middle.
Teamwork. We lack exemplars and models for doing this kind of work in teams—it’s mostly loners. That limits our reach, but it also excludes the substantial majority of people who strongly prefer to work in teams, or as part of a larger institution. Ink & Switch is the best modern team exemplar we’ve got, and the scene would benefit from a written discussion of its operating model. That said, I notice that they’re now mostly focused on technical infrastructure for interfaces, rather than the interfaces themselves. I suspect the latter requires different models. I’m not sure what the right next step is here. Grants for more dyads, as I described in the section on skills, seem like a good place to start.
Campus. We have no faculty lounge or Hamming-style lunch tables, no workshops. No regular context for spontaneous deep discussion of ongoing work, at both the project level and the field level. Most of us don’t even have a consistent context for design crits. Twitter is a poor substitute. There are some enjoyable podcasts and show-and-tell series, but these venues are focused on sharing ideas, rather than criticizing and generating them. A literal campus may not be the right solution, but it’s at least evocative of what we’re missing: an environment which supports collective honing and pollination. With a relatively small amount of funding, I think a good place to start would be a small in-person summit focused on depth. More outlandishly, I’d love to find a physical space in San Francisco to host regular events and co-working sessions.
Accelerating research with execution-oriented teammates
Last year I wrote about an important practical problem in developing new tools for thought: we can only really understand novel interface ideas by using them in authentic contexts, but real-world software systems take a lot of work to build. Worse: because it’s quite expensive to switch back and forth between “research mindset” and “engineering mindset”, a single researcher will tend to see only the forest or the trees, for weeks or months at a time. This makes it tough for one person to build momentum while iterating on research systems.
So this year, with some extra funding generously provided by a few kind donors, I experimented with hiring interns and contractors to help me with execution-oriented work. Some of these collaborations are still ongoing, but I’d like to share a few early insights about how team structures in tools for thought might work best.
The biggest problem with research systems is that uncertainties mount quickly and intensely—much more so than for normal software development work. In a more traditional product development setting, you often understand the limitations and opportunities of your system well enough to curate a robust roadmap of features to build, experiments to run, problems to fix. The team will need guidance and feedback as they’re building new functionality, sure. But at least for some projects, you can specify the work well enough up front that teams can move for weeks at a time without blocking on feedback.
Experimental software development doesn’t really work like this. Sometimes you’ll understand some sub-project well enough that you can specify a few weeks of execution work, but there’s no guarantee that the “incoming” rate of well-defined projects will match the “outgoing” rate of projects completed. In practice, I found that I could often specify only a couple of days’ worth of effort in a particular direction—and sometimes that articulation would require half a day or more of preparation! Figuring out what to do is often the most time-intensive step of these projects. In many cases, I could only see what to do next after the last small increment was completed. Tight feedback loops like these meant that I’d often end up blocking others. And because I hate being the blocker in team situations, I’d often switch away from my ostensible focus to help plan others’ projects a few steps further.
One solution would be to delegate not only execution but also pieces of the core creative problem-solving work to teammates. I’d love that, of course, but now we’re talking about a very different job description, and a much rarer candidate. With a lot of mentorship and attention, I’m confident that more people could develop these skills. But that’s a long road, and much more demanding than hiring a mainstream programmer to help me execute. This path would be more like mentoring a graduate student. We should expect it to take several years, and it’s hard to see how to make that plausible economically.
An alternative I’m trying right now is to hire contractors intermittently and for shorter periods, whenever some specific work is well-specified enough for a bit of execution-oriented attention. This introduces transactional overhead and loss of contextual continuity, but at least it focuses the effort where it’s probably highest-leverage. Fixed-scope engagements minimize the priority inversions which can occur when teammates’ work queues dry up.
I like a metaphor Adam Wiggins has used to describe a related model. Film production typically begins with a “pre-production” phase involving just a few creative staff, figuring out what the movie actually is. Then when you think you’ve got enough unknowns pinned down, you hire hundreds of people for a “production” phase and make it happen. Months later, almost everyone parts ways until the next project gets under way.
All the way at the other end of the spectrum, I’ve had wonderful experiences in creative partnerships with peers capable of outstanding autonomous work. Those relationships are precious and rare. Such people are not often hirable, especially with my limited means; they’re more likely to be partnerships than employees. I’d love to tackle a collaboration like this in 2022, but I recognize that it probably wouldn’t solve the problem I’d originally described—that is, accelerating the implementation of these experimental systems. These sorts of creative partnerships are great because they transform my conception of the problem space. They don’t (in my experience) necessarily mean moving more quickly.
There’s a substantial chasm between these two points on the employment spectrum: hourly contractors, and deep full-time partners. I’d probably make a lot more progress if I could more effectively leverage ongoing full-time staff, but that’s a puzzle which remains to be solved. Indeed, it may not be solvable! In Philip Guo’s recent retrospective on ten years of PythonTutor, he credits working solo as an important driver of the project’s sustainability. I’ve been grateful to run my own experiment with a few engineers and designers this year. I’ll keep trying.
Crowdfunding depends on highly visible public work
Crowdfunded research is still awfully rare, so I’d like to help others learn from my unusual situation as much as possible. Let’s see what we can glean from another year of this funding experiment.
Last year saw slow but steady growth throughout the year, ending with funding roughly the size of a grad student’s fellowship grant. This year, that growth continued for the first half, then halted for the second half, plateauing at around three quarters of an NSF CAREER grant (a typical grant for early-career faculty in the sciences).

I can stay afloat with this level of funding, but what should we make of the pause in growth? Should we be concerned?
One important detail here is that the rate at which patrons end their subscriptions has remained constant since mid-2020, at about 1-3% per month. The plateau is due instead to a lower rate of new patrons. And that decline can be explained by a drop in visitors to the Patreon page. Roughly the same fraction of visitors convert into members, across this entire period.
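The funnel described above can be sketched as a simple stock-and-flow model. This is a hypothetical illustration, not the analysis behind these observations: it assumes a constant monthly churn rate and a constant visitor-to-patron conversion rate, and the visitor and conversion numbers are invented. Under those assumptions the patron count settles at visitors × conversion ÷ churn, so halving the visitor rate halves the plateau even when churn and conversion stay fixed.

```python
def simulate_patrons(initial: float, monthly_visitors: float,
                     conversion: float, churn: float, months: int) -> list[float]:
    """Hypothetical stock-and-flow model of a crowdfunding funnel.

    Each month a fixed fraction `churn` of existing patrons cancels, and
    `monthly_visitors * conversion` new patrons join.
    """
    patrons = [initial]
    for _ in range(months):
        current = patrons[-1]
        patrons.append(current * (1 - churn) + monthly_visitors * conversion)
    return patrons


def steady_state(monthly_visitors: float, conversion: float, churn: float) -> float:
    # Fixed point of P = P * (1 - churn) + V * c, which gives P = V * c / churn.
    return monthly_visitors * conversion / churn


# Illustrative numbers only (not from the post): 2% churn, 1% conversion.
high_traffic = steady_state(monthly_visitors=2000, conversion=0.01, churn=0.02)
low_traffic = steady_state(monthly_visitors=1000, conversion=0.01, churn=0.02)
print(f"{high_traffic:.0f} vs {low_traffic:.0f}")  # → 1000 vs 500
```

In this toy model, churn sets how quickly the count approaches its plateau, while the visitor rate sets where the plateau is.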
This all adds up to a prosaic, somewhat bleak story for crowdfunding research: growth, and to a lesser extent sustainability, depends on driving new eyeballs to your work. This hypothesis carries some important implications for crowdfunded research. My slowed growth shouldn’t surprise us, because I didn’t publish any major, attention-getting work in 2021.
First, don’t get me wrong: 2021 was a productive year! But my work (like any researcher’s work) varies a great deal in legibility and attractiveness to a new audience. I spent the year running a variety of experiments, none of which “worked” in a showy or summative fashion—but all of which yielded helpful insights that drive my current work. If I were a traditional academic, I’d probably have published papers on these experiments anyway, since I’d be expected to chalk up a handful of submissions each year. Absent these pressures, I’d rather publish my work when I can tell a more complete story.
As it happens, I actually published about 45,000 words in 2021—a new record for me by a wide margin—but all in smaller, informal essays for patrons, rather than highly polished work I’d want to publicize for a wide audience. I’m proud of this writing for what it is, but I don’t expect these minor essays to attract lots of new visitors. In fact, many of these essays can’t attract new visitors: they’re only available for patrons!
The unchanged cancellation rate suggests that my patrons aren’t particularly bothered by what might look like a “slow” year. The problem, again, is that this type of publication pattern isn’t going to attract much new audience. That’s okay for me, since I can at least pay my bills at current funding levels, and I expect to publish glitzier work in 2022.
But the pattern I experienced this year illustrates some of crowdfunding’s limitations. This funding model probably won’t work well for investigators who need to spend a couple years on each major project, without anything to attract a popular audience in between. And that situation may not be predictable in advance. Research doesn’t work on a schedule. “Slow” years are part of the deal.
The crowdfunding funnel’s tight conversion rate implies that the work must appeal to a fairly wide audience. And, alas, researchers can’t completely ignore marketing if they’d like to stay afloat. A cancellation rate of 1-3% per month is fairly gentle, but it means that treading water requires modest, continuous audience growth. How rapidly will these research crowdfunding audiences saturate? Could anyone actually maintain this for their entire career? I have around 650 patrons now, but to maintain my present funding at this churn rate, 1,650 additional members must come and go over the next decade.
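The 1,650 figure follows from holding the patron count steady: every cancellation must be matched by a new patron. A back-of-the-envelope check, assuming a flat monthly churn rate (the post gives a 1 to 3% range; the exact rate behind 1,650 isn't stated, though it works out to a little over 2% per month):

```python
def replacements_needed(patrons: int, monthly_churn: float, years: float) -> float:
    """New patrons needed just to hold `patrons` constant at a flat churn rate.

    If the count stays fixed, cancellations per month are patrons * churn,
    so replacements over the period are patrons * churn * months.
    """
    months = years * 12
    return patrons * monthly_churn * months


for churn in (0.01, 0.02, 0.03):
    print(f"{churn:.0%} churn: {replacements_needed(650, churn, 10):,.0f} replacements")
```

At 650 patrons this spans roughly 780 to 2,340 replacements over ten years, which brackets the 1,650 estimate.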
We’ll see, I suppose! This has been a somewhat bleak way to close the year’s reflections, but I’m personally quite optimistic for the coming year—and for lots more transformative software environments to emerge.
————————
To my patrons: you have personally enabled the past few years of my life. Thank you; thank you! I hope your 2022 shines bright.
My thanks also to the following folks for conversation which helped shape the ideas above: Adam Wiggins, Andrew Sutherland, Catherine Olsson, Danny Hernandez, Joe Edelman, José Luis Ricón, Kanjun Qiu, Michael Nielsen, Molly Mielke, Nadia Eghbal, Nick Cammarata, Ozzie Kirkby, and Philip Guo.
2022-01-26 18:25:35 +0000 UTC
Happy new year, all! The office hour experiments last month were intriguing enough that I'd like to experiment with twice-monthly office hours, at least for a few months. I'll alternate times to try to cover both hemispheres.
This month's times, if you'd like to join (Google Calendar link; iCal URL):
For those of you working on tools-for-thought-y things, I'd love to offer design crit or generative sketching during these sessions. Otherwise, bring work or questions to discuss. We can workshop writing, refine research questions, identify relevant references, look for ways to improve methods/practices, etc. I won't have all the answers, but I'll do my best to help, and when appropriate I'll facilitate discussion with other attendees, who may also be able to help.
Logistical details, same as last time:
- At least for now, there are no reservations. Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels like we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
Tier experiments
I'm also experimenting with some new benefits for the "centi-grant" and "sponsor" tiers, formalizing a degree of 1:1 access for those who would like it. (Those folks will receive a separate message.)
It makes me uncomfortable to transactionalize the social sphere. I want to be very clear that I welcome correspondence from everyone—patron or not! Conversation with online strangers has made my life immeasurably richer. Of course, like everyone with an inbox, I can't promise I'll always reply; these new benefits just add a layer of professional obligation (and a more explicit invitation) to an otherwise best-effort situation.
2022-01-07 21:47:17 +0000 UTC
“It is the destiny of computers to become interactive intellectual amplifiers for all people pervasively networked worldwide.”
— J.C.R. Licklider
If you want to help make good on this destiny—to invent human-computer interfaces which radically expand human cognition and creativity—then what do you actually need to do? How does progress happen?
Moreover, how can we make collective progress? What potential is there for shared knowledge, frameworks, methods, values, and traditions which mutually accelerate many separate lines of exploration?
Can—or, should—“tools for thought” become a field of science? A design discipline? A “scene” in the arts? A practice of craft?
It’s not an abstract question for me: it’s a very real question of how to shape my work throughout the day! And I believe this same confusion handicaps collective progress in this space.
For me, at least, no one of those labels seems quite right. Day to day, the process feels more like following my nose than following a playbook. But it can be awfully hard to tell what I’m actually sniffing for at any given moment, much less what I “should” be sniffing for. Is this trail about understanding something more deeply? About a sense of what could be? About a yearning for how things ought to be? About manifesting some kind of perfection?
My instincts draw on all four perspectives. Each mindset helps with the process of invention in its own way. But I think they can be synthesized into a clearer picture of what it means to contribute collectively to this project, to the proliferation and refinement of tools which augment human capacity.
Tools for thought as scientific field
The prize is the pleasure of finding a thing out, the kick in the discovery, the observation that other people use it — those are the real things.
—Richard Feynman, on the Nobel Prize
Tools for thought are not “discovered” in the same sense that DNA was discovered. Tools for thought are not governed by natural laws in the way that the strong nuclear force is. And so tools for thought are not a field of natural science.
That said, the drive to understand is a central feeling in my work, and an essential ingredient for most of my contributions. Likewise, some of the most valuable outputs I produce are a kind of knowledge. I don’t think of myself as a scientist, but I do feel a kinship with scientists through these traits.
What kinds of understanding and knowledge production are most useful to progress in tools for thought? Where are they a less helpful framing?
Herb Simon introduced a concept I’ve found helpful: some fields can be understood as “sciences of the artificial”. Physics and chemistry strive to understand natural phenomena, but certain kinds of progress in architecture and business management come from studying artificial phenomena, “artifacts” which we ourselves have constructed.
This framing suggests a few powerful spaces where those pursuing tools for thought can create and share knowledge. One lies in deeply understanding the natural phenomena operating at the meeting point—the interface!—between the artificial environment and the natural environment in which it operates.
For software interfaces, that usually means cognitive psychology and domain details. Vannevar Bush’s arguments for the memex (giving rise to present-day hyperlinks, among many other concepts) rested on the associative nature of human memory. If you’d understood symbolic logic deeply enough, you too might have been able to invent Mathematica. Powerful interfaces often reify deep ideas about the reality they represent, so there’s great potential in cultivating such ideas.
SuperMemo, the first computerized spaced repetition system, owes its existence to ideas about human forgetting first established by Ebbinghaus in the late nineteenth century. Critically, though, author Piotr Wozniak’s real-world experiences with the system led him to form his own theories of forgetting, which in turn allowed him to improve his artifact. It seems paradoxical, but studying an artificial phenomenon unlocked novel insights about a natural phenomenon. This loop is what Michael Nielsen and I have called “insight through making.”
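For readers who haven't met it, Ebbinghaus's forgetting curve is often summarized today with a simple exponential model, R(t) = exp(-t/S), where R is the probability of recall t days after a review and S is a "stability" parameter (the time for recall to fall to about 37%). This is a common modern simplification, not Ebbinghaus's own fitted formula and not Wozniak's theory, and the stability values below are made up. The sketch shows the idea spaced repetition exploits: if each successful review raises stability, the same target recall level is reached only after progressively longer gaps.

```python
import math


def recall_probability(days: float, stability: float) -> float:
    """Simplified exponential forgetting curve: R = exp(-t / S)."""
    return math.exp(-days / stability)


def days_until(target_recall: float, stability: float) -> float:
    """Invert R = exp(-t / S) to find when recall falls to target_recall."""
    return -stability * math.log(target_recall)


# Hypothetical stabilities after successive reviews (illustrative only).
for stability in (2.0, 5.0, 12.0, 30.0):
    print(f"S={stability:>4}: recall hits 80% after {days_until(0.8, stability):.1f} days")
```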
We can also study—perhaps even quantitatively and experimentally—the artificial constructs we’ve created, and their materials.
When Wozniak works to improve the algorithm of his memory system, some part of that work depends on the natural phenomenon of human memory, but much of the opportunity has also come from understanding the “artificial” dynamics of the scheduling system he created. Years later, I benefit from that knowledge when I build my own memory system, even though I have different goals. That’s the march of science, in its own way!
Usability studies are often about understanding the “materials” we build with. Fitts’s law led us to put menus and key actions at edges and corners. Nielsen Norman Group’s usability trials have honed standard controls like type-ahead search fields. My impression is that this sort of knowledge is helpful on the margin, but is rarely transformative.
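For reference, one common (Shannon) formulation of Fitts’s law is below. T is the time to acquire a target, D the distance to it, W its width along the direction of motion, and a and b are constants fitted empirically for a given device and user. Because the cursor stops at a screen edge, a target placed there behaves as if W were very large in that direction, which shrinks the logarithmic term; that is the usual argument for edge and corner placement.

```latex
T = a + b \log_2\!\left(1 + \frac{D}{W}\right), \qquad T \to a \ \text{ as } \ W \to \infty
```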
For me, the most powerful kinds of understanding about tools for thought are often insights about what kinds of artifacts can be made in the first place.
The graphical user interface created at PARC—the one which established most of the primitives we still use today—rested on the deep idea that “doing with images makes symbols.” This idea produced a universal paradigm: we use a mouse to manipulate icons, which in turn represent abstract operations.
What kind of knowledge is this? What kind of knowledge is embodied in the creation—or is it discovery?—of abstractions like a “file” or an “app” or a “retweet”?
These sorts of knowledge expand the space of artifacts which can be produced. They suggest whole new categories of artifacts. When I’m on the trail of something like this, chasing a new abstraction, part of what I'm doing comes from a deep impulse to understand. But most of it comes from a strong sense of the possible, some tantalizing form calling to me through the haze.
When my attention shifts away from trying to understand what is, and toward chasing what could be, that’s a way of being I associate with design.
Tools for thought as a design discipline
The natural sciences are concerned with how things are. … Design, on the other hand, is concerned with how things ought to be, with devising artifacts to attain goals.
— Herb Simon, The Sciences of the Artificial
What an astonishing thing the computerized spreadsheet is. The central innovation—that a cell can contain not just a value but an expression which references other cells’ values—is embodied in a new primitive abstraction: the “formula cell”. Critically, apart from their values being automatically calculated, those formula cells behave just like other cells. Formula cells compose neatly into a larger whole, one which is already exquisitely tuned by years of use.
What does it feel like to invent something like this? Obviously, I’m not Dan Bricklin; I haven’t invented something on the order of VisiCalc. But I’ve had lesser tastes of the experience.
For me, it feels like tracing my finger over the seams of reality, the edges surrounding a problem space. In my other hand, I’m fiddling with a bag of puzzle pieces, some of which have shapes which partially match those seams in reality. I’ll start collecting pieces which might fit, rotating them this way and that, combining and sometimes reshaping their edges, until all of a sudden there’s a sort of ker-chunk. Some group of puzzle pieces click into place against reality’s seams. Now I see a new whole, where there had once only been parts. Now I see those seams of reality more clearly, and I can create a better fit. Now I glimpse my puzzle pieces in a new way, and I see how to use them to reach this larger seam over here. Now I watch the seams resonating against the edges of the artifact, and I see how to play with the modes of oscillation. I start to harmonize.
In Bricklin’s case, I can imagine some of the puzzle pieces that might have been at play. I can imagine turning them in my hand. Homoiconicity. Symbolic reference. Linked representations. Array programming. Late binding. And so on. It must have felt extraordinary.
When I’m working on primitives like this, the main drive I feel is a sense of creative possibility. Understanding is involved, and it’s important; big steps often come from understanding more deeply. But the urge to understand feels secondary in these moments to some broader process of creation, of making good on what could be. Sometimes it feels like I’m not creating, but discovering. I’m seeing an image suddenly pop out of a Magic Eye rendering. It was hiding in plain sight all along!
Progress in design comes from inventing new primitives, finding new ways to combine old ones, spotting new places to apply them, and so on. Growth feels like an accumulation of patterns, principles, and methods. We may find unifying principles and frameworks now and then, but they’re forever contingent. We’ll never “hit bottom” because problems, and their solutions, are endless.
Collective progress, then, looks like effectively sharing this accumulation. Some new patterns (ideas about coordinating Mechanical Turks?) will only apply to a few practitioners. But many will be at least partially relevant across many domains. Direct manipulation, linked representations, copy and paste, and menus are good examples. More recent wide-reaching primitives include “multiplayer” editing affordances, increasingly reliable voice-to-text interactions, and, yes, contextual backlinks. At the periphery, more tenuous ideas—the pervasive financialization of computational primitives; ML-generated media; etc—may end up becoming important puzzle pieces.
Can this sort of design become a science, a design science? Simon suggests this will require “a body of intellectually tough, analytic, partly formalizable, partly empirical, teachable doctrine about the design process.” If we had such a thing, walking the search space might feel more like following GPS directions and less like stumbling through fog. Certainly many people have tried to formalize the design process in this way. I’ve not read that literature deeply, but I’m pessimistic about this. At least for the near future, I think innovative design ideas in tools for thought will continue to require a fuzzy process of ingenuity, luck, and fortitude.
For me, at least, progress also requires a kind of expressive yearning.
Tools for thought as artistic scene
Every portrait that is painted with feeling is a portrait of the artist, not of the sitter. The sitter is merely the accident, the occasion. It is not he who is revealed by the painter; it is rather the painter who, on the coloured canvas, reveals himself.
—Oscar Wilde, The Picture of Dorian Gray
The dominant methodology in design these days is “human-centered design”. Reductively, in this framework, a designer begins by immersing themselves in the worlds of potential users, trying to understand their values, goals, challenges, barriers. Then, often with the direct participation of the users, designers iteratively create artifacts which will solve the users' problems.
This is a remarkably effective method for creating products and solving problems. But, among other limitations, I think it’s missing something that’s been central to many of the most transformative tools for thought: a strong perspective on how the world should be, what’s beautiful, what’s worth amplifying.
“Dream Machines” is a telling title. Ted Nelson had a dream of what computers could mean for personal creativity and freedom. This is not “design thinking”. Much of Alan Kay’s work was driven by an almost spiritual belief in the wasted creative potential of young children. Consider his metrics: “Where some people measure progress in answers-right/test or tests-passed/year, we are more interested in Sistine-Chapel-Ceilings/lifetime.”
Bret Victor is adamant that we must break computation free of “little black rectangles.” There are problem-solving explanations for this, but it seems clear to me that they’re not behind this drive. Bret doesn’t want to physicalize computing because screens are too small and hurt your eyes, or because embodied cognition is higher bandwidth. It’s because he thinks being in the world, with our bodies, with each other, is humane and beautiful and the way things should be. He’s described this drive as “a yearning.”
Yearnings like this are sometimes the driving force for my work, too. Orbit could be framed as a tool for retaining what you learn. That’s what every other memory system does. But that’s not how I think about it. What gets me excited is the feeling of imbibing ideas more deeply, of being supported in forming an ongoing communion. In some very real sense, this project is an expression of how I want to relate to knowledge.
The impulse I just described feels quite different from the sense of possibility I feel when designing, or the hunger to understand I feel when engaging analytically. It’s more like a desire to manifest that which I think is beautiful. It’s personal, idiosyncratic. It’s a reflection of my telos.
When this drive dominates, I feel like I’m making a kind of art.
Much of the most interesting work going on now in the vicinity of “tools for thought” is driven at least in part by this kind of artistic expression. Omar Rizwan’s work consistently reflects his strong and unusual aesthetics. The Ink and Switch projects around local-first computing feel to me as much about artistic expression as about problem-solving. Sprout reflects how its creators believe collaboration should feel. In Gentle I see Rob’s consistent reverence for orality. In Cuttle I see Toby making a statement about the contemporary role of craft.
Collectively, both this modern group and the computing pioneers express a consistent yearning: the radical power of computation can be wrested from a beige-world of centralized tabulation, and instead invested in human-scale forms, where it will transform what we can think and do, individually and together. That’s not a design statement. It’s certainly not a scientific statement. It’s more like a manifesto.
If we think of tools for thought as an artistic “scene”, I think collective progress comes from transcendent work which inspires, instructs, and redefines. The ten year anniversary of Inventing on Principle is coming up. Perhaps we’re due at last for another contribution so revelatory?
Tools for thought as practice of craftsmanship
Vi Hart and Bret Victor have this way of being nuts part of the time, and the other part of the time having exquisite attention to detail at the level of an obsessive. It’s like Michelangelo: first he had to imagine putting something on the ceiling of the Sistine Chapel, but he also personally spent four years lying on his back with candle wax dripping into his eyes, painting the goddamn thing. That’s the simplest recipe: find Michelangelos.
—Alan Kay
One more impulse often shapes my practice, and I’m less certain of its proper place.
To set this up, I’d like to tell a story I haven’t shared publicly, about my first week at Apple. I was fresh out of college, joining UIKit, the technical team responsible for the iPhone and iPad user interface and “app” structure. At the time (iOS 3), it was a team of around eight. The youngest member was at least ten years older than me. I felt more than a little out of place, and therefore somewhat sheepish when I had to ask about the little black objects everyone had on their desk.

“Oh. That’s a jeweler’s loupe. We use it to see the pixels.”
That knocked me on my back. It was an absurd answer. I’d spent my entire young life with software people, and I’d never met one with a jeweler’s loupe. Yet it was also obviously right. It was a potent symbol of what people there valued. And those loupes were, in fact, quite practically useful! In my first weeks I was given a subtle bug to fix: the edges of a particular kind of button were “fuzzy.” I looked. I couldn’t see what they were talking about. I looked with the loupe, and then I saw. A straight edge seemed to be antialiased into the neighboring pixels. I took the loupe away, and I couldn’t un-see it.
The loupe is synecdoche for my whole experience at Apple. This same story happened again and again over the years—with touch interactions, with animation, with conceptual models. Each time the burr in my colleague’s eye was at first invisible to me; each time I was helped to see; and each time I could no longer un-see thereafter. I left Apple, but the indoctrinated obsession for craftsmanship has not left me.
This impulse feels different from a drive to understand, or a sense of opportunity, or an expressive yearning. It feels like a consuming desire to personally manifest a kind of perfection. It feels like a ritualistic respect for things finely honed. And, because I can rarely achieve the sublimity this mindset demands, it’s often frustrating.
I don’t know how to feel about this impulse.
On the margin, it’s obvious to me that academic human-computer interaction would benefit from an infusion of craftsmanship. It’s absurd to run elaborate evaluative studies on some new interface when that interface is executed so sloppily that few people would voluntarily use it outside a study. Yet this is exactly what happens in most systems papers published at the field’s premier conference.
Sometimes craftsmanship is essential to expressing an idea at all in this space. The Mother of All Demos was a visionary work of design, but at the time, making that show happen in realtime on a stage in San Francisco required the absolute pinnacle of technical craft. The same is true of Dynamicland today.
Another factor here is that reality has a surprising amount of detail. The line between essential design elements and craftsmanship is often unclear. When the folks at Xerox PARC added proportional-width fonts to Bravo (the first WYSIWYG word processor), was that an inseparable part of the design solution, or was it an urge to make something limited more perfect? When I obsess about the perceived lightness of spaced repetition prompts, that can feel like sanding edges, but I suspect it’s actually a “load-bearing” design element.
Another reason the craftsmanship impulse is useful is that many insights are only reachable if users make a system a part of their lives, if they use it to do something that really matters to them. No one wants to use a piece of junk for anything important. So a finely honed system has the potential to involve more real people in real situations, and therefore to produce more insight.
More prosaically, craftsmanship helps draw attention. It makes one’s work more legible and more attractive. I don’t think this is a high-order bit, but it’s worth considering.
On the margin, I think this impulse is probably too strong in me. I’ll often spend an afternoon polishing the fine details of an interaction I’ll soon discard. Some amount of this is necessary, but I suspect I could often get away with much less.
All in all, my instinct is that craftsmanship is not usually the high-order bit or the bottleneck for progress in this space, past some moderate threshold. I think there’s value to cultivating a tradition of craftsmanship among practitioners in this space, for the reasons I’ve described above, but I’m not sure what it would mean to make collective progress through craft as a primary drive.
Negotiating these forces
I don’t think I could make do with any one of these impulses. They all seem essential, in their own way, to my practice.
At least for me, the most consistently powerful lens is that of design. Progress depends on bold, imaginative ideas about how to shape an abstraction so that it dances with the seams of reality. These new artifacts do usually depend on deep understanding, both for their creation and for their improvement. And the most interesting artifacts tend to express some personal yearning, beyond fitness for purpose. It may be that craftsmanship is best understood in this framework as one element of the iterative design process—I’m less sure about that.
Likewise, I think our best model for collective progress looks less like a scientific field and more like a design discipline. We already build on a substantial shared collection of patterns, principles, and methods. Sometimes we can also build on empirical knowledge and theories of the relevant domain. These accelerate our work, but each project is necessarily bespoke. There are no formulas for creating new abstractions. Progress will continue to require the lightning strike of creative ingenuity.
————————
Happy New Year’s Eve to you all. Reflecting on this year—which I’ll likely do at greater length in the coming weeks—I’m somewhat overwhelmed by how strange and wonderful it is to pursue my work with the freedom I have. It is an incredible privilege, and it’s one which each of you makes possible. In this context, I imagine it’s easy to feel like a small part of a large crowd. A symbolic drop in a bucket, like one’s vote for president. But actually, there are only a few hundred of you! You—you, personally—play a surprisingly significant part in my ability to do the work I do. Thank you.
A few more specific thanks. I’m grateful to Michael Nielsen for introducing me to the notion of design science and for shaping how I think about the role of science in interface invention; to Bret Victor for an email exchange which prodded me to think harder about this; and to Kanjun Qiu for helpful discussions about the distinction between the “science” and “design” drives here.
2022-01-01 04:41:57 +0000 UTC
I hope you all are having a peaceful and bright holiday season. This isn't one of my usual "letters from the lab"—more a time-sensitive invitation.
tl;dr: I'm holding "office hours" next Monday and Wednesday (12/27 and 12/29) from 2-3PM PST. Come by and ask questions, pose problems, or offer work for feedback. https://meet.google.com/bby-nmzu-apx
I've noticed that my mental energy falls into a trough in the early afternoon, usually around 1-3. I'm usually just wasting my time if I try to do difficult work or reading during this period, so I often take the time to stroll, nap, or do low-effort tasks. But I've noticed that it's also a good time to take meetings: social energy seems to draw from a different well.
In that spirit, I thought it'd be fun to try experimenting with "office hours" during this time. I'll "open my office door" from 2-3 PM PST next Monday and Wednesday (12/27 and 12/29). I invite you to come by with questions, problems, and work to share for feedback: https://meet.google.com/bby-nmzu-apx
A few logistical details:
- At least for this experiment, I'm not offering "reservations". Just show up; we'll form a queue.
- In the fashion of academic office hours, eavesdropping is encouraged. You may have to wait a while to ask your question, but listening in on others' questions may turn out to be more valuable than whatever motivated you to attend, anyway. Likewise, feel free to chime in if you have thoughts on a question someone else brings—just be graceful in sharing airtime.
- These conversations aren't 1:1, but we'll have better discussions if we feel safe to speak. The Chatham House Rule is in effect: paraphrasing is okay, but don't identify anyone. And of course, we'll treat each other with generosity and nobility; I'll moderate problems.
- Unless few people show up, I'll probably cut off any one line of discussion at a maximum of about ten minutes. If it feels like we've gotten the most out of a topic after just a few minutes, I may switch us up sooner. Take that as a sign of success, rather than a critical judgment!
- Rough work and ill-specified questions are very welcome. Several people have told me that they're waiting to show me design work until it's more polished. Honestly: that's silly!
I recognize that this timing is unfriendly to people in Europe, Asia, and Africa—forgive me! If these events seem fruitful, we'll work something out.
I know people will ask if I'll record these. My answer, initially, is no: I want to make sure we can create the right kind of vulnerable space first, and I'm worried recording will hamper that. If this proves fruitful, recording may be possible once we all get more comfortable.
See you next week, or not: in any case, I wish you clarity and equanimity!
2021-12-24 16:59:46 +0000 UTC
Since 2019, I’ve been running a series of randomized controlled experiments on Quantum Country readers. The main reason I haven’t published these is that I don’t understand what’s going on. Readers are forgetting very slowly. Too slowly! I’ve been crossing off theories with each experiment, but the data seem to defy a core assumption of memory systems: that we forget more and more over time, along a steep curve. This has left me awfully confused all year, but I now have one theory which might explain what’s going on. If I’m right, it’ll mean that general-purpose memory systems won’t be able to reuse the approaches which traditional vocabulary-focused memory systems have used to evaluate and improve themselves.
Please note: this is an informal discussion of data from Quantum Country. The analysis is preliminary and shouldn’t be cited or excerpted in other work. I’m working with the garage door up here.
Quantum Country’s ecological niche
Quantum Country is a sort of observatory for human memory—or at least a very limited slice of it. There have of course been countless studies of human memory, but Quantum Country lets us observe an underserved niche:
- readers are adults in a self-motivated setting, rather than kids or undergrads in a formal schooling or artificial lab environment
- the material is largely conceptual, rather than facts, vocabulary, definitions, arbitrary data, etc
This niche is important because it represents the sort of meaningful learning which most adults must do in the course of their own creative work.
Of course, some professionals use tools like SuperMemo, Anki, and Mnemosyne in the course of their work, but analyses of those data have an important limitation: they have only one data point per item per repetition, because items are (generally) authored by each user. Developers must rely on significant model assumptions to make sense of this sparse data. With Quantum Country, we can (I hope!) analyze how large cohorts of readers perform on the same set of questions, and largely avoid these model assumptions. Duolingo and Quizlet can make the same move, but mostly for vocab/fact-oriented material, rather than conceptual topics. Meanwhile, datasets from academic studies are almost exclusively restricted to artificial and classroom contexts—though, it should be noted, their data is often much cleaner and better controlled!
For me, the point of all this data is to learn something about how memory systems and human memory work, so that we can design better systems, so that we can give people superpowers. I’m not studying Quantum Country data to understand how people learn Quantum Country, but to indirectly understand how people might learn with these systems in general. At a high level, we want to answer questions like:
- How efficient can these systems become? Optimally, when should a given user review a given prompt?
- What are the boundary conditions of these systems? What are they terrible at? How far can their scope be pushed?
- How do these review interactions transfer to a person’s capacity to solve problems and do things with this knowledge in other contexts?
- … and so on.
These of course shatter fractally. Two related sub-questions I’ve been trying to answer:
- What’s the counterfactual? Without the review mechanisms, to what extent do people remember the text’s key details?
- Is Quantum Country’s current schedule too short? Too long? For which readers and questions?
These turn out to be much more difficult to answer than I’d expected!
Schedule experiments: the basics
In this post I’ll focus on the most recent experiment I’ve run, since at least for the questions I’ve described above, it’s better controlled.
Each new reader is assigned to one of four schedules, having initial intervals of one week, two weeks, one month, and two months. That is, a reader in the “two week” condition will be prompted to review a question two weeks after they initially answer it in the essay. I’m over-simplifying a bit here, but this is enough to get us started.
These conditions, taken together, should help us find a “sweet spot” for an initial review. Additionally, the two month condition should tell us something about the counterfactual: what happens if a reader doesn’t review for two whole months?
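A minimal sketch of this setup, assuming uniform random assignment and approximating a month as 30 days. The names (`INITIAL_INTERVALS`, `assign_condition`, `first_review_due`) are hypothetical, not Quantum Country’s code, and the real schedule has more details than this.

```python
import random
from datetime import date, timedelta

# Initial intervals for the four conditions, in days. Treating a month as
# 30 days is an assumption made for this sketch.
INITIAL_INTERVALS = {"1 week": 7, "2 weeks": 14, "1 month": 30, "2 months": 60}


def assign_condition(rng: random.Random) -> str:
    """Assign a new reader uniformly at random to one schedule condition."""
    return rng.choice(list(INITIAL_INTERVALS))


def first_review_due(answered_on: date, condition: str) -> date:
    """The first delayed review falls one initial interval after the in-essay answer."""
    return answered_on + timedelta(days=INITIAL_INTERVALS[condition])


rng = random.Random(42)  # seeded only so the example is reproducible
condition = assign_condition(rng)
print(condition, first_review_due(date(2021, 6, 1), condition))
```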
And so, here are the median reader’s accuracy rates at their first delayed review, by condition (25th and 75th %ile reader accuracies in parentheses):
- 1 week: 87% (81-92%, N=35 readers)
- 2 weeks: 87% (81-91%, N=35 readers)
- 1 month: 85% (77-92%, N=25 readers)
These data are constrained to the first essay of Quantum Country—where we have the most data—and represent readers who have collected at least 50 questions and reviewed 90%+ of those they collected. (You’ll notice I’m intentionally eschewing models and statistical tests in this article. That’s because we’re discussing effects which I expect to be large enough to be obvious at a glance!)
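A sketch of the per-reader statistic described above, under an assumed data layout (`ReaderRecord` and the function names are hypothetical): keep only eligible readers, compute each reader’s own accuracy, then take quartiles across readers. Note that `statistics.quantiles` defaults to its “exclusive” method; other percentile conventions give slightly different quartiles.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical per-reader record: (questions collected in the essay,
# outcomes of the first delayed reviews completed; True = remembered).
ReaderRecord = Tuple[int, List[bool]]


def is_eligible(collected: int, outcomes: List[bool]) -> bool:
    """At least 50 questions collected and at least 90% of them reviewed."""
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    return collected >= 50 and 10 * len(outcomes) >= 9 * collected


def reader_quartiles(readers: Dict[str, ReaderRecord]) -> Tuple[float, float, float]:
    """(25th percentile, median, 75th percentile) of per-reader accuracy."""
    accuracies = [
        sum(outcomes) / len(outcomes)
        for collected, outcomes in readers.values()
        if is_eligible(collected, outcomes)
    ]
    # Older Python versions require at least two data points here.
    q1, median, q3 = statistics.quantiles(accuracies, n=4)
    return q1, median, q3
```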
Only a handful of readers in the 2-month condition have fully completed their first review, so I can’t report per-reader statistics for that condition yet, but we can get some sense of what’s going on by lumping all reviews in each condition into one big bucket and looking at the fraction of remembered questions in each bucket:
- 1 week: 86% (N=138 readers, 6381 reviews)
- 2 weeks: 84% (N=142 readers, 6319 reviews)
- 1 month: 83% (N=90 readers, 4477 reviews)
- 2 months: 81% (N=50 readers, 1744 reviews)
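Pooled rates and per-reader medians answer slightly different questions, which is worth keeping in mind when comparing the two lists. A small sketch with made-up data (function names hypothetical): pooling weights each reader by how many reviews they did, while the median counts every reader once.

```python
import statistics
from typing import Dict, List


def pooled_accuracy(outcomes_by_reader: Dict[str, List[bool]]) -> float:
    """Fraction remembered across every review in the bucket, regardless of reader."""
    remembered = sum(sum(outcomes) for outcomes in outcomes_by_reader.values())
    total = sum(len(outcomes) for outcomes in outcomes_by_reader.values())
    return remembered / total


def median_reader_accuracy(outcomes_by_reader: Dict[str, List[bool]]) -> float:
    """Median of each reader's own accuracy; every reader counts equally."""
    return statistics.median(
        sum(outcomes) / len(outcomes) for outcomes in outcomes_by_reader.values() if outcomes
    )


# Toy data (not from Quantum Country): one heavy reviewer, two light ones.
bucket = {
    "heavy": [True] * 90 + [False] * 10,
    "light_1": [True] * 3 + [False] * 2,
    "light_2": [True] * 3 + [False] * 2,
}
print(f"pooled: {pooled_accuracy(bucket):.2f}, median reader: {median_reader_accuracy(bucket):.2f}")
```

Here the pooled figure is 0.87 while the median reader’s is 0.60, because one prolific reviewer dominates the pool.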
We can add one more data point by adding readers from early 2019, when the first review came after just one day. This isn’t a clean comparison, both because there may be cohort effects and because these users didn’t have a “retry” feedback mechanism, but just to get a sense:
- 1 day: 89% (N=2210 readers, 122139 reviews)
This is an almost unbelievably gentle forgetting curve: from 89% to 81% across two months! This shallow slope is at the heart of my confusions, but first let’s talk about the parts which make sense.
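One way to see how strange this is: under a textbook single-exponential forgetting curve R(t) = exp(-t/S), each data point implies a memory “stability” S. That model is an assumption for this illustration, not Quantum Country’s model. If one exponential described these readers, S would stay roughly constant across delays.

```python
import math

# Pooled first-review recall from the text, keyed by delay in days. The 1-day
# point comes from the 2019 cohort, and months are approximated as 30 days.
observed = {1: 0.89, 7: 0.86, 14: 0.84, 30: 0.83, 60: 0.81}


def implied_stability(delay_days: float, recall: float) -> float:
    """Stability S, in days, for which exp(-t / S) equals `recall` at t = delay_days."""
    return -delay_days / math.log(recall)


for delay, recall in observed.items():
    print(f"{delay:>2} days: {recall:.0%} recalled, implied S ~ {implied_stability(delay, recall):.0f} days")
```

Instead, the implied S climbs from roughly 9 days at the 1-day point to roughly 285 days at two months, so the observed curve is far flatter at long delays than any single exponential would predict.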
Initially-forgotten questions should get tighter schedules
The data get more predictable if we look exclusively at delayed recall accuracy of questions which readers forgot when first answering the question in the essay. Such questions will first be assigned again one day later, repeatedly if necessary, until the reader remembers, after which the question will be assigned after a delay. Recall accuracies, by delay:
- 1 week: 84% (N = 79 readers, 626 reviews)
- 2 weeks: 77% (N = 73 readers, 447 reviews)
- 1 month: 69% (N = 57 readers, 341 reviews)
- 2 months: 56% (N = 27 readers, 138 reviews)
This data paints a more familiar picture, and it suggests a fairly clear course for memory system authors. If the goal is to ensure that recall rates stay above, say, 90%, then when a reader forgets a question initially, we should assign it to be reviewed again quite soon. The automatic one-day-later “retry” session is not enough to support lengthy subsequent intervals.
In fact, this effect compounds. If readers forget a question in this first delayed review, then it’s assigned to them again one day later. Readers with longer initial intervals were less likely to recover in that subsequent session—that is, they were more likely to forget yet again. Recovery rates by first review session delay (note that these sample sizes are now getting small):
- 1 week: 90% (N = 31 readers, 79 reviews)
- 2 weeks: 78% (N = 33 readers, 86 reviews)
- 1 month: 83% (N = 24 readers, 70 reviews)
- 2 months: 57% (N = 9 readers, 44 reviews)
OK, so that’s one pretty clear implication for memory system designers that we’ve extracted: the initial schedule should contract decisively when a prompt is forgotten in-essay.
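The retry-then-delay flow, with that contraction added, might look like the minimal sketch below. The interval values and the function name are hypothetical, not Quantum Country’s actual parameters; the data here argue only that the interval should contract, not by how much.

```python
from datetime import timedelta

# Hypothetical values chosen for illustration only.
RETRY_INTERVAL = timedelta(days=1)
DEFAULT_FIRST_INTERVAL = timedelta(weeks=2)
CONTRACTED_FIRST_INTERVAL = timedelta(days=3)


def next_interval(remembered: bool, forgot_in_essay: bool) -> timedelta:
    """Interval after an in-essay answer or a one-day retry.

    remembered: whether the reader recalled the answer on this attempt.
    forgot_in_essay: whether the reader missed this prompt while reading.
    """
    if not remembered:
        # Missed prompts come back one day later, repeatedly, until recalled.
        return RETRY_INTERVAL
    if forgot_in_essay:
        # Recalled only after retrying: use a much shorter first delay.
        return CONTRACTED_FIRST_INTERVAL
    return DEFAULT_FIRST_INTERVAL
```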
The trouble begins: when questions are initially remembered
But questions aren’t often forgotten in-essay. Aggregated across readers in these four conditions, in-essay accuracy rates are 91-92%. So what about the common case where questions are initially remembered? Here’s where the trouble comes in.
First review recall rates for questions remembered in-essay:
- 1 week: 86% (N=137 readers, 6174 reviews)
- 2 weeks: 85% (N=140 readers, 5814 reviews)
- 1 month: 84% (N=89 readers, 4108 reviews)
- 2 months: 83% (N=49 readers, 1599 reviews)
As in the previous section, we can irresponsibly use data from 2019 readers to add a data point at 1 day: 90% (N = 2207 readers, 109031 reviews). As before, note that this isn’t a well-controlled comparison.
That forgetting curve is unreasonably shallow! Sure, if we want to target a 90% recall rate, this data suggests we should schedule the first review after less than one week. But each review has a cost; if readers can skip an initial review or two in exchange for a few percentage points lower accuracy, I suspect most would take that trade. After all, each full repetition of the first essay’s 112 questions takes about 25 minutes. How should we think about this?
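Rough arithmetic for this trade-off, using only figures from this section. It deliberately ignores downstream effects of a deferred review (spacing, recovery), so treat it as a back-of-envelope framing rather than an answer.

```python
QUESTIONS = 112        # questions in the first essay (from the text)
MINUTES_PER_PASS = 25  # approximate time for one full review pass (from the text)

seconds_per_question = MINUTES_PER_PASS * 60 / QUESTIONS

# First-review recall for questions remembered in-essay (from the text).
recall_after_1_week = 0.86
recall_after_1_month = 0.84
extra_forgotten = (recall_after_1_week - recall_after_1_month) * QUESTIONS

print(f"~{seconds_per_question:.0f} s per question per pass")
print(f"~{extra_forgotten:.1f} more questions forgotten at a 1-month first review")
```

On these numbers, deferring the first review from one week to one month costs roughly two additional lapses out of 112, weighed against the roughly 25 minutes each extra early review would take.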
One consideration is the “recovery” effect we saw in the previous section. Do the people who forget after longer periods struggle more to recover in the following session, relative to those who forget after one week? Here are the recovery rates (i.e. accuracy rates one day after forgetting in the first delayed session, after remembering in-essay):
- 1 week: 84% (N=68 readers, 600 reviews)
- 2 weeks: 81% (N=66 readers, 529 reviews)
- 1 month: 85% (N=44 readers, 384 reviews)
- 2 months: 74% (N=16 readers, 147 reviews)
This doesn’t look very compelling. Maybe there’s trouble at 2 months, but I’d like to see more samples first. It sure looks here like we can defer the first review for a month without real penalty.
Another reason to delay initial reviews is to invoke the spacing effect, but I’ll skip discussing that in this post. Suffice it to say that (with sparse data) I don’t yet observe a spacing effect in the interactions between first and second session intervals.
What about slicing by question? Looking at the ten questions in the first essay with the lowest initial accuracies, but where readers remembered the answers to those questions while reading, we still see a sharp forgetting curve in that first delayed review:
- 1 week: 74% (N=74 readers, 273 reviews)
- 2 weeks: 71% (N=67 readers, 277 reviews)
- 1 month: 65% (N=47 readers, 210 reviews)
- 2 months: 65% (N=25 readers, 85 reviews)
This is pretty compelling, but the curve rapidly disappears. Here are the accuracies for the next ten “hardest” questions, by initial delay:
- 1 week: 73% (N=129 readers, 474 reviews)
- 2 weeks: 74% (N=114 readers, 446 reviews)
- 1 month: 74% (N=77 readers, 312 reviews)
- 2 months: 75% (N=40 readers, 151 reviews)
We don’t have enough data to extract trustworthy forgetting curves on a per-question basis, but the flat curves continue with increasing intercepts for the remaining groups of ten. The median ten are flat at 82%; the easiest ten are flat at 95%.
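The slicing in this section can be sketched as follows. The record layout and function names are hypothetical, and reviews are assumed to be pre-filtered to questions the reader remembered in-essay. As noted above, groups of ten are too small for trustworthy per-question curves; this only shows the bookkeeping.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical first-review record: (question_id, initial_delay_days, remembered).
Review = Tuple[str, int, bool]


def difficulty_groups(in_essay_accuracy: Dict[str, float], size: int = 10) -> List[List[str]]:
    """Order questions from hardest to easiest by in-essay accuracy, then chunk them."""
    ordered = sorted(in_essay_accuracy, key=lambda q: in_essay_accuracy[q])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def accuracy_by_delay(reviews: List[Review], group: List[str]) -> Dict[int, float]:
    """First-review accuracy for one group of questions, keyed by initial delay."""
    members = set(group)
    tallies: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # delay -> [remembered, total]
    for question_id, delay, remembered in reviews:
        if question_id in members:
            tallies[delay][0] += int(remembered)
            tallies[delay][1] += 1
    return {delay: hits / total for delay, (hits, total) in sorted(tallies.items())}
```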
So questions vary in difficulty, but don’t seem to decline in recall rates as more time passes. What should we take from this? Sure, we could schedule “hard” prompts earlier, but would that actually do anything? Except for the ten hardest prompts, this data shows no improvement in recall with shorter intervals.
One interpretation is that what matters most is simply that people practice; the timing is not terribly important. Indeed, we previously found that once the median reader remembers an answer after a delay (of any length), their recall rate over the following year of reviews is 95%!
But I find myself simply not believing this data. The forgetting curves are too flat. This just doesn’t reflect my experience. If I don’t practice something I’ve learned for two months, I’m much less likely to remember it than I would be one week later. Our data suggest that after the first successful delayed recall, we could safely delay subsequent reviews for many months. I just don’t buy it.
What’s going on here?
My theory: cuing effects
I think the picture gets clearer if you look at a specific question. Consider this question (which has roughly 75th-percentile in-essay recall accuracy):

This task strongly shapes the retrieval you perform: it makes you look for connections between the normalization condition and measurement probabilities. You might have instant access to this answer, but you might also consider the question on the spot and infer that this is the only reasonable answer.
The accuracy rates we collect don’t distinguish between these two possibilities. But the difference matters! If we asked you instead to solve some problem which indirectly relies on this property, you might fail to make the leap the problem requires.
What we really care about here is fluency: your readiness to think interesting thoughts, to solve interesting problems, to notice connections and apply your knowledge creatively. You want to train a richly patterned reasoning apparatus.
My hunch is that even though cued recall doesn’t seem to dip substantially between 1 week and 2 months, free recall and transfer tasks would show a sharper curve. The kind of fluency I just described does decline. And if you could see that decline, you might want to schedule the next review earlier.
If this theory is right, it means Quantum Country and general-purpose memory systems need to follow a very different path than most prior work in this space. Following SuperMemo’s lead, most systems frame scheduling in terms of a simple threshold: schedule a review when the estimated probability of recall drops to 90%. That way, your expected recall rate at any given moment should remain at least 90%.
I think this is a reasonable heuristic for language learning, facts, and term-definition pairs. You can’t usually re-derive those answers on the spot. The goal is to produce the answer from memory. The effect of explicitly cuing the retrieval should be much smaller than we’d observe for conceptual material like Quantum Country’s.
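The threshold rule described above can be sketched under one simplifying assumption: that recall decays exponentially, p(t) = exp(-t / S), with a per-prompt “stability” S. This is a hypothetical model for illustration, not SuperMemo’s actual algorithm, and `estimate_stability` and `next_interval_days` are names invented here. Fitting S from a single observed recall rate and solving for the 90% point shows why flat curves matter for this approach: a prompt recalled 74% of the time after a week gets a next interval of about two and a half days, while one sitting at 95% after two months gets roughly four months.

```python
import math

def estimate_stability(delay_days, recall_rate):
    """Fit the hypothetical model p(t) = exp(-t / S) through one observed point."""
    if not 0.0 < recall_rate < 1.0:
        raise ValueError("recall_rate must be in (0, 1)")
    return -delay_days / math.log(recall_rate)

def next_interval_days(stability_days, target_recall=0.9):
    """Days until predicted recall decays to target_recall under the same model."""
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be in (0, 1)")
    return -stability_days * math.log(target_recall)

# Usage: the "hardest ten" at 1 week versus the "easiest ten" plateau at 2 months.
hard = next_interval_days(estimate_stability(7, 0.74))
easy = next_interval_days(estimate_stability(60, 0.95))
```

The model takes cued-recall rates at face value; if those rates overstate how well conceptual material is encoded, the intervals it produces will be too long in exactly the way described above.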
If we can’t approximate a conceptual detail’s depth of encoding with cued recall rates, we can’t use the traditional scheduling heuristics. We need to establish some other way to drive the control loop.
Response time seems like an interesting proxy for fluency, but I’ve had surprisingly little success finding patterns in Quantum Country readers’ response times.
A more intrusive approach would be to insert questions which require readers to indirectly use knowledge in some new context. If it’s true that cuing effects are particularly significant for conceptual knowledge, we should see a decline in transfer performance over time even if recall accuracies remain steady. I’d like to do something like this anyway, to establish the flexibility of the knowledge reinforced by the review system.
Another way to test this theory is to consider questions which I expect to be more “rote” and less conceptual. These should have a more pronounced forgetting curve. For example, here are the recall rates for the questions asking for the matrix values of the X, Y, Z, and H gates:
- 1 week: 56% (N=91 readers, 234 reviews)
- 2 weeks: 60% (N=87 readers, 215 reviews)
- 1 month: 56% (N=59 readers, 144 reviews)
- 2 months: 48% (N=26 readers, 54 reviews)
Not a lot of samples here, but this data doesn’t support my theory. The flat curve between 1 week and 1 month still strikes me as highly implausible. I suppose people could be re-deriving these values from what they remember of these gates’ desired effects, but that seems unlikely to me.
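To make “not a lot of samples” concrete, a 95% Wilson score interval can be put around each bucket’s accuracy. The sketch below uses the 1-week and 2-month figures for the gate-matrix questions; the two intervals overlap substantially. One caveat: it treats every review as an independent trial, but reviews are clustered within readers, so the true uncertainty is wider than these intervals suggest.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

one_week = wilson_interval(0.56, 234)    # gate-matrix questions, 1-week bucket
two_months = wilson_interval(0.48, 54)   # gate-matrix questions, 2-month bucket

# True when the two intervals share any range of plausible accuracies.
overlap = two_months[0] < one_week[1] and one_week[0] < two_months[1]
```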
One easy explanation here, and perhaps for this whole confusion, is that people are simply lying. Quantum Country is self-graded. Maybe people are inappropriately marking answers as remembered? I don’t find this plausible. Remember, the median reader has a self-reported accuracy of 85-87% from 1 week to 1 month. That median user is still marking plenty of questions as forgotten. What’s confusing is why the median 1-month user doesn’t mark more questions as forgotten than the median 1-week user.
Another important factor distorting my data is survivorship bias. Readers who come back and review after 2 months are probably more conscientious than readers who review after 1 week. They likely care more about the topic and read more closely. This effect probably inflates the performance of the later intervals, but I don’t have a good way to establish by how much.
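A toy calculation shows how survivorship bias alone could flatten a curve. Every number below is hypothetical, not Quantum Country data: two kinds of readers each forget steadily, but the more engaged readers make up a larger share of each later bucket. The pooled accuracy then creeps upward with the interval even though both groups decline by twelve points.

```python
# All numbers below are hypothetical, chosen only to illustrate the mechanism.
DELAYS = ["1 week", "2 weeks", "1 month", "2 months"]
engaged_recall = [0.90, 0.86, 0.82, 0.78]  # each group forgets steadily...
casual_recall = [0.65, 0.61, 0.57, 0.53]
engaged_share = [0.3, 0.5, 0.7, 0.9]       # ...but engaged readers dominate later buckets

def observed_accuracy(share, p_engaged, p_casual):
    """Pooled accuracy of a bucket whose readers are a mix of the two groups."""
    return share * p_engaged + (1 - share) * p_casual

observed = [
    observed_accuracy(s, e, c)
    for s, e, c in zip(engaged_share, engaged_recall, casual_recall)
]
for delay, acc in zip(DELAYS, observed):
    print(f"{delay}: {acc:.1%}")
```

Comparing the same readers across intervals, rather than pooling different readers per bucket, would remove this particular composition effect, though it would shrink the sample further.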
I think my next step here is to dig deeper into the literature, which does include numerous memory experiments focused on conceptual knowledge and transfer learning. Perhaps some of those methods or discussions can help me here.
————————
Thanks to Gary Bernhardt for helpful discussions about this topic. And thank you to you all for your ongoing support, which makes it possible for me to conduct long-term studies like this. We’re about 3/4 of the way to the equivalent of an NSF CAREER grant now, and I’m continuously shocked that such a thing might be possible. Happy holidays!
2021-12-01 06:35:19 +0000 UTC